qa_rapi: Test inter-cluster instance move script
This test moves an instance on the same cluster and, if successful, moves it back. While not testing a real move between two clusters, this is certainly better than nothing.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
backend: Add support for import/export magic
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
import/export daemon: Add support for a magic prefix
This “magic” value will be used to ensure that we don't accidentally connect to the wrong daemon (e.g. due to a bug), comparable to DRBD's per-disk secret. Just depending on the SSL certificate isn't enough...
import/export daemon: Simplify command building
Instead of appending strings, stage parts in a list. Building the "dd" command is moved to a separate function.
import/export: Limit max length of socat options
import/export: Validate remote host/port
The hostname and port received from the remote cluster should be validated, just in case.
utils: Add function to validate service name
Handle ESRCH when sending signals
Upon sending signals, ESRCH can be reported when the target no longer exists.
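A minimal sketch of the idea, not the actual Ganeti code: the hypothetical helper `send_signal_ignoring_esrch` below treats ESRCH as "the process is already gone" and reports it through its return value instead of raising. The function name and return convention are assumptions made for illustration.

```python
import errno
import os


def send_signal_ignoring_esrch(pid, signum):
    """Send signum to pid; return False if the process no longer exists.

    Hypothetical helper for illustration only.
    """
    try:
        os.kill(pid, signum)
    except OSError as err:
        if err.errno == errno.ESRCH:
            # The target exited before the signal could be delivered.
            return False
        # Any other error (e.g. EPERM) is still reported to the caller.
        raise
    return True
```

Callers that only want the process gone can treat a `False` result as success.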
Add missing directory from Makefile.am
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Add example gnt-debug submit-job json files
These files are being used to test the job queue performance with various changes and conditions. Adding them here for posterity.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Cache a few bits of status in jqueue
Currently each time we submit a job we check the job queue size, and the drained file. With this change we keep these pieces of information in memory and don't read them from the filesystem each time.
Significant changes include:...
ListVisibleFiles: do optional sorting
Fix a TODO in _QueuedJob
Rather than raising Exception use GenericError and explain a bit better what happened.
Remove unused parameter from function
This also removes the relevant pylint disable. No point in keeping unused parameters around: if/when we need them it's easy to add them back.
Optimize _GetJobIDsUnlocked
Currently we sort the list of job queue files twice (once in utils.ListVisibleFiles with sort and then later with NiceSort). We apply the _RE_JOB_FILE regular expression twice (once in _ListJobFiles and once in _ExtractJobID). This simplifies the code a little, and a couple...
jqueue: Rename _queue_lock to _queue_filelock
The name clarifies the difference between this and the internal lock. Also explain a bit better what it is.
jstore._ReadNumericFile: use utils.ReadFile
Improve import-export unittest a bit
- Increase timeouts from 10 to 30 seconds (this still breaks when the machine is busy, e.g. using bonnie++)
- Depend on only one timeout per test instead of three
- Reset variables before each test
Test client timeout for import-export daemon
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Generate import-export unittest certs in parallel
Generating certificates can be slow.
Remove locking._CountingCondition
This class is unused and untested. We must have forgotten to remove it.
Remove the job queue drain rpc call
This call was introduced but never used. In two years. Since it's just creating/removing a file it can also be done in simpler ways, without a special rpc call, if/when we need it again. In the meantime, let's give it to history....
_BaseCondition: allow saving/restoring state
SharedLock _acquire_restore and _release_save
If a shared lock is used inside a condition, we need to make sure that it's reacquired in the same way as it was originally, after the wait.
Submit[*each*]Pending job
This is useful so we can test both SubmitJob and SubmitManyJobs.
Add unittest for ganeti-cleaner
cfgupgrade: Local variable for cluster-domain-secret filename
This is necessary to allow cfgupgrade to work on a non-standard directory.
Start to prepare documentation for 2.2 release
- Update NEWS file
- Remove dependency on OpenSSL (pyOpenSSL remains)
- Update manpages, fix typos and other things
gnt-job auto-completion: suggest "all" too
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
import/export: Allow script to predict size
Once we have a size for an export (in the context of the import/export daemon), we can provide the user with a percentage and ETA.
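As a rough illustration of how a percentage and ETA follow from a known size, here is a small sketch. The function name `estimate_progress`, its parameters, and the choice of bytes and bytes per second as units are assumptions for this example and are not taken from the daemon's code.

```python
def estimate_progress(transferred, total, throughput):
    """Return (percent_done, eta_seconds) for a running transfer.

    transferred and total are in bytes, throughput in bytes per second.
    eta_seconds is None when no throughput has been measured yet.
    Hypothetical helper for illustration only.
    """
    if total <= 0:
        raise ValueError("total size must be positive")

    percent = min(100.0, 100.0 * transferred / total)
    remaining = max(0, total - transferred)

    if throughput > 0:
        eta = remaining / throughput
    else:
        eta = None

    return percent, eta
```

For example, with 250 of 1000 bytes sent at 50 bytes per second, the sketch reports 25 percent done and 15 seconds remaining.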
backend: Enable export size prediction
Show formatted ETA for disk sync and import/export
import/export daemon: Record amount of data transferred
This reports the amount of data transferred and the throughput (averaged over 60 seconds) to the master daemon. While not yet fully implemented, once the export scripts report the expected data size, we can even provide...
import/export: Show progress updates to user
With this patch, we show progress updates approx. once per minute.
ensure-dirs: don't fail if no rapi log is present
Sometimes a node has never been a master, or has never run rapi. In that case we need to create the file (because if rapi gets started later, it won't be able to create it itself).
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Introduce hardcoded timeouts for each RPC call
This patch adds a table with per-opcode timeouts. They were chosen in an empirical, rather than scientific, way - see the comments in lib/rpc.py.
The patch also shows how custom timeouts can be used - call_test_delay...
http client: support per-request read timeout
Currently, the read timeout is hardcoded in the HttpClientRequestExecutor class. The patch changes the timeout so that it's a per-request property, and makes the rpc.Client class pass one explicitly in. Furthermore, we modify the rpc.RpcRunner class to support...
Let daemon-utils fix the owners for ganeti-rapi
This is a workaround until we have fully switched to user separation; it fixes the owners of directories/log files so ganeti-rapi will start flawlessly. This is right now run for every daemon but as it operates on a relatively small subset...
Modify ganeti-masterd to set permission and owner of masterd-socket
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Let ganeti-rapi run under a different user/group
Make it possible to call utils.Daemonize with uid and gid to run as
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Adding customized user/group as configure flags
Merge branch 'devel-2.1'
_ExecuteKVMRuntime: fix hv parameter fun
When executing the kvm runtime we were accessing a mix of the parameters as currently configured on the instance and the ones it was started with. We were doing it without precise criteria, but quite by...
Update FinalizeMigration docstring
This is used not only for aborted migrations, so the docstring should reflect that.
LUGrowDisk: fix operation on down instances
Currently it's impossible to grow a disk if an instance is shut down, because the disk cannot be assembled. Now we take care of assembling it, and shutting it down afterwards.
Allow disk operation to act on a subset of disks
If the disks= parameter is passed, we can assemble/wait for sync/shutdown only some disks belonging to an instance, rather than all.
This is useful to only activate/sync/shutdown the affected disk when growing it....
NEWS: add release date for 2.1.3
utils: Add function to format seconds
Bump up version for the 2.1.3 release
import/export unittest: Test large(r) transfer
import/export unittest: Improve logging and fix one race condition
Apart from improved logging, one race condition is fixed. If the destination's status file became available, the port would be returned immediately, even if it was still “None”. Most of the time it worked, but not always. Now an additional check...
Convert ganeti-masterd's main thread to mainloop
Not much changes with this patch. The main loop for the IOServer is replaced by mainloop.Run() and the main thread now uses asyncore to handle connections to the master socket. Once it accepts them, though,...
daemon.AsyncAwaker
This new asyncore dispatcher can be used to force a thread running the asyncore loop to awake from the select, by signaling it on one of its selected sockets.
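The wake-up technique can be sketched without asyncore, using a plain socket pair and select. The class name `Awaker` and its methods are hypothetical stand-ins for illustration; they are not the daemon.AsyncAwaker interface.

```python
import select
import socket


class Awaker:
    """Wake a thread blocked in select() by writing to a socket pair.

    Hypothetical sketch; not the asyncore-based class from the patch.
    """

    def __init__(self):
        self._reader, self._writer = socket.socketpair()
        self._reader.setblocking(False)

    def fileno(self):
        # Lets the object itself be passed to select().
        return self._reader.fileno()

    def signal(self):
        # A single byte is enough to make the read side readable.
        self._writer.send(b"\0")

    def drain(self):
        # Discard pending wake-up bytes so select() blocks again.
        try:
            while self._reader.recv(4096):
                pass
        except BlockingIOError:
            pass

    def close(self):
        self._reader.close()
        self._writer.close()
```

The loop thread includes the awaker in its read set; any other thread calls `signal()` to interrupt the wait.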
daemon.AsyncStreamServer
This is a new asyncore server which handles listening stream sockets by calling a non-implemented function for each connection it accepts. It's the stream-oriented cousin of the AsyncUDPSocket.
daemon.AsyncTerminatedMessageStream
This is the counterpart of the AsyncStreamServer and can be used to handle connected sockets returned from connected clients if the protocol is a terminator separated message stream. Nothing in this class is server specific though: it can be used as a client as well, if the client is...
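To illustrate what a terminator separated message stream involves, the sketch below buffers incoming bytes and yields only complete messages. `TerminatedMessageBuffer` and its `feed` method are hypothetical names; the real class also handles the socket I/O, which is left out here.

```python
class TerminatedMessageBuffer:
    """Split a byte stream into messages ending with a terminator.

    Hypothetical sketch for illustration only.
    """

    def __init__(self, terminator):
        if not terminator:
            raise ValueError("terminator must not be empty")
        self._terminator = terminator
        self._buffer = b""

    def feed(self, data):
        """Add received bytes and return the list of complete messages."""
        self._buffer += data
        # Everything after the last terminator is an incomplete message
        # and stays in the buffer until more data arrives.
        *messages, self._buffer = self._buffer.split(self._terminator)
        return messages
```

Because the whole buffer is split on every call, a message or terminator arriving across several reads is still reassembled correctly.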
Test the new streaming daemon classes
Unittests cover AsyncStreamServer and AsyncTerminatedMessageStream with both tcp and unix sockets.
ganeti-watcher should attempt to fix ganeti-rapi
Update ganeti-watcher so that it tests the master's RAPI port with a simple test (in this case GetVersion). If it fails, make one attempt at restarting ganeti-rapi and retest.
- daemons/ganeti-watcher: Test rapi and make one attempt at restarting it....
TestAsyncUDPSocket: remove dead code and add test
- _ThreadedClient was added on the idea of making this unittest concurrent, which was actually never done (we could test everything without it, so well)
- handle_write() was never called without filling the send queue, and...
TestAsyncUDPSocket: test for oversized sends
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Luca Bigliardi <shammash@google.com>
Document the check-man change
Since this affects developers' systems, document it in NEWS and devnotes.rst
Update NEWS for Ganeti 2.1.3
Second attempt at fixing check-man
I was wrong, actually LANG-vs-LC_ALL only fixed one case, by mistake. To get proper UTF-8 encoding, we need to enforce any UTF-8 locale. We choose the 'default' of en_US.UTF-8.
Signed-off-by: Iustin Pop <iustin@google.com>...
Fix check-man for newer man-db
Again, check-man :)
Commit 5fa1642226 removed LC_ALL=C, since that breaks the check. However, with no LANG/LC_* variables, man-db is still broken.
We import the new lintian behaviour, i.e. LANG=C (which seems to differ from LC_ALL=C, even with empty environment). I'm not sure of the...
Add RemoveDir utility function
Backported from master, 72087dcd5b06c0127e2ec3bf8c80f7f54da3fb01
Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Merge remote branch 'origin/devel-2.1'
Conflicts: test/ganeti.daemon_unittest.py...
AsyncUDPSocket: fix IgnoreSignals usage and test
This bug was found in the asyncore master patch series, but actually applies to 2.1 for AsyncUDPSocket as well.
Explicitly return None from IgnoreSignals
Same result, but what happens is clearer.
Add KVM chroot feature
This patch adds a new boolean hypervisor parameter to the KVM hypervisor, named 'use_chroot'. If it's turned on for an instance, then KVM is started in "chroot mode": Ganeti creates an empty directory for the instance and passes the path...
utils: Add function to check whether process handles a signal
This will be used to avoid a race condition between starting a program (dd for import/export) and sending signals to it.
Fix and Improve TryToRoman unittest
1) Don't break when the roman module is not found
2) Test that not finding the roman module doesn't make TryToRoman fail (currently that is the case)
move-instance: Use error message instead of multiple state variables
Until now, move-instance used different status variables: “success”, “abort” and “error_message”. With this patch, everything is changed to use “error_message” only. This simplifies the code a bit....
Distribute cluster domain secret
The cluster domain secret file was not distributed to other nodes.
Convert gnt-instance list and info to use roman
Finally gnt-instance has roman support as well.
gnt-cluster info --roman
Convert to roman (if so the user wishes) the following:
- cluster candidate size
- uid pool
- any integer be or hv parameter
FormatUidPool: provide optional roman conversion
The convert= option of compat.TryToRoman is used to do optional conversion without duplicating formatting code.
gnt-node: remove latinfriendlyfields
Rather than relying on a static list of fields, we opportunistically convert all integers.
Move roman conversion to compat
The new TryToRoman function provides optional easy to use roman conversion. Nunc cum demonstrationi unitati.
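The "optional" part can be sketched as follows, assuming the third-party `roman` module and its `toRoman` function. The name `try_to_roman` and the exact fallback behaviour are assumptions for illustration and may differ from the function in compat.

```python
try:
    import roman
except ImportError:
    # The module is optional; without it values are passed through.
    roman = None


def try_to_roman(value, convert=True):
    """Return value as a roman numeral if possible, else unchanged.

    Hypothetical sketch for illustration only.
    """
    if not convert or roman is None:
        return value
    try:
        return roman.toRoman(value)
    except roman.RomanError:
        # Not an integer, or outside the range roman numerals cover.
        return value
```

Callers can pass any field value through the helper and format the result as usual, which is what allows the conversion to stay opportunistic.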
ssconf: error out when writing oversized files
Since we impose a maximum limit when reading ssconf files, let's error out when trying to write them too big, so we don't pretend everything is ok, and make mistakes when we actually read partial files.
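A minimal sketch of a symmetric size check on write. The limit value, the constant `MAX_FILE_SIZE`, and the function `write_limited` are assumptions for illustration; the actual limit and error type used by ssconf are not given here.

```python
# Assumed limit for illustration; the real value is not stated above.
MAX_FILE_SIZE = 128 * 1024


def write_limited(path, data, max_size=MAX_FILE_SIZE):
    """Write data (bytes) to path, refusing anything a reader would truncate.

    Hypothetical helper for illustration only.
    """
    if len(data) > max_size:
        raise ValueError("refusing to write %d bytes to %s (limit is %d)" %
                         (len(data), path, max_size))
    with open(path, "wb") as handle:
        handle.write(data)
```

Failing at write time surfaces the problem where it is caused, rather than later when a reader silently sees a partial file.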
Add a new opcode timestamp field
Since the current start_timestamp opcode attribute refers to the initial start time, before locks are acquired, it's not useful to determine the actual execution order of two opcodes/jobs competing for the same lock.
This patch adds a new field, exec_timestamp, that is updated when the...
Fix IgnoreSignals on socket.error
Some confusion arose handling EINTR on this function: in python 2.6 socket.error is an IOError, and thus:
- It's an EnvironmentError
- It has an .errno member
In 2.4 and 2.5 it's not, and so its errno variable must be extracted...
Master core scalability design doc
This initial design still lacks information about the job queue lock contention decrease.
design-2.2: job queue lock analysis/remediation
This builds on the "Master core scalability design doc", detailing the critical situations in the job queue and proposing how to fix them. The bulleted point list at the beginning is changed to subparagraph, as the...
reraise exceptions in async tests' error handlers
This makes sure that any unforeseen error raises an exception rather than just increasing a counter. It makes unittest debugging a lot easier.
Move hash functions to the compat module
Since the hash functions changed their module name between python 2.4 and 2.6, and we have to do a try/import/except trick, we'll do it just once, for both hash functions, and in compat.py. This also fixes a use...
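The try/import/except trick has roughly the following shape. The exported names `md5_hash` and `sha1_hash` are used here for illustration and should be treated as assumptions rather than as the confirmed contents of compat.py.

```python
try:
    # Newer Python versions provide both functions in hashlib.
    from hashlib import md5 as md5_hash
    from hashlib import sha1 as sha1_hash
except ImportError:
    # Older versions only have the separate md5 and sha modules.
    from md5 import new as md5_hash
    from sha import new as sha1_hash
```

Other modules then import the two names from one place instead of repeating the fallback logic.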
Fix {Ignore, RetryOn}Signals on socket.error
Some confusion arose handling EINTR on those functions: in python 2.6 socket.error is an IOError, and thus:
- It's an EnvironmentError
- It has an .errno member
RAPI client should convert urllib2.URLError to GanetiApiError
Signed-off-by: Tom Limoncelli <tlim@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
KVM: Migration bandwidth and downtime control
Introduce 2 new hypervisor options, migration_bandwidth and migration_downtime, and implement KVM migration bandwidth and downtime control.
migration_bandwidth controls KVM's maximal bandwidth during migration, in...
Make utils.EnsureDirs() ignore umask
EnsureDirs() should create directories with the exact mode requested in the arguments, but it currently applies the umask. This patch makes it independent from the umask.
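One common way to get an exact mode regardless of the umask is to follow the mkdir with an explicit chmod, as sketched below. Whether the patch uses this approach is an assumption, and `make_dir_exact_mode` is a hypothetical name.

```python
import os


def make_dir_exact_mode(path, mode):
    """Create path so that its permission bits are exactly mode.

    Hypothetical helper for illustration only.
    """
    # mkdir applies the process umask to the requested mode...
    os.mkdir(path, mode)
    # ...while chmod sets the permission bits as given.
    os.chmod(path, mode)
```

This avoids temporarily changing the umask, which is process-wide and would affect other threads.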
Signed-off-by: Balazs Lecz <leczb@google.com>...
import/export daemon: Move command building into separate module
The import/export daemon code is already large. Moving some code to a separate module will make it smaller and easier to test.
import/export daemon: Move some I/O processing code to module
The code parsing the child process' output is moved to a separate class in the impexpd module. As more programs are added, it'll become more complex and should be separated.
import/export daemon: Move command building into class
Instead of passing around many variables for building the executed command, they're now kept as instance variables.
Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fix two race conditions in reboot instance
If the instance crashes between backend.InstanceReboot checking the list of running instances and the execution of hv_xen.RebootInstance, ini_info will be None. And if the instance doesn't reboot fast enough, new_info will be None. Both cases lead to “TypeError: unsubscriptable...
Support for latin friendly output in node list
Test for errors during inotify callback
- Create a new _MyErrorLoggingAsyncNotifier class which registers error counts, rather than logging them
- Add an additional ERR notifier to test with
- Check that no error was returned, for tests that weren't supposed to...