Add JobQueue.SafeLoadJobFromDisk
This will be used to read a job file without having to deal with exceptions from _LoadJobFromDisk.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
jqueue._LoadJobFromDisk: remove safety archival
Currently _LoadJobFromDisk archives job files it finds corrupted. Since we want to use it to load files without holding locks, this could cause a conflict: we just move the feature to _LoadJobUnlocked which is always...
Add repetition count to the TestDelay opcode
If the repetition count is not passed or is passed as 0 we sleep exactly one time, otherwise we sleep "repeat" times and log in between.
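A minimal sketch of the behaviour described above, not the actual opcode implementation. The function name `run_test_delay` and the injectable `sleep` and `log` parameters are hypothetical, added only so the example can run without waiting.

```python
import time


def run_test_delay(duration, repeat=0, sleep=time.sleep, log=print):
    """Sleep once when repeat is 0, otherwise sleep `repeat` times.

    Returns the number of sleeps performed. All names here are
    illustrative assumptions, not Ganeti's API.
    """
    if repeat == 0:
        # No repetition count given (or 0): sleep exactly one time.
        sleep(duration)
        return 1
    for i in range(repeat):
        # Log between the individual sleeps.
        log("Test delay iteration %d/%d" % (i + 1, repeat))
        sleep(duration)
    return repeat
```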
Merge branch 'devel-2.1'
Signed-off-by: Iustin Pop <iustin@google.com>...
Add "adopt" to the allowed disk parameters
"adopt" was missing from bd061c3, thus breaking disk adoption.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Improve pylintrc for pylint 0.21+
While we'll need to update the source files too, at least this change makes pylint 0.21 not fail on the current source tree.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Fix warnings with Python 2.6
'format' is a new built-in function, and 'bytes' is a new built-in type. We rename the variables that used these names to make pylint happy (and remove potential bugs).
Fix a small bug introduced in cf26a87a
Commit cf26a87a added a tiny typo, which would break non-FQDN arguments to modify node storage.
Fix the type of 'valid' attribute in LUDiagnoseOS
The update of the valid status in LUDiagnoseOS says:
valid = valid and osl and osl[0][1]
However, in Python, “True and []” (we get '[]' for an invalid OS) will result in “[]”, and thus the valid field for an OS will be either...
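The short-circuit behaviour behind this bug can be reproduced in isolation. This is only a sketch: `update_valid` is a hypothetical helper, the shape of `osl` (a list of tuples whose second element is a status flag) is assumed from the expression quoted above, and wrapping the result in `bool()` is one possible way to keep the type consistent, not necessarily what the commit does.

```python
def update_valid(valid, osl):
    """Return a real boolean instead of whatever operand `and` yields.

    `osl` is assumed to be a list of (path, status) tuples; an empty
    list stands for an invalid OS.
    """
    return bool(valid and osl and osl[0][1])


# `and` returns one of its operands, not necessarily True or False.
leaked = True and []
```

Here `leaked` is the empty list itself, which is why the unwrapped expression can store a list in a field that is meant to hold a boolean.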
Merge branch 'stable-2.1'
Bump up version for the 2.1.4 release
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Update NEWS about the latest 2.1 change
Fix handling of errors from socket.gethostbyname
Socket functions can raise more than just gaierror. Most of the time, socket.gethostbyname_ex will raise gaierror, but rarely it will also raise herror. For completeness, we catch all socket exceptions with data...
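A hedged sketch of catching the whole family of socket exceptions rather than gaierror alone. The helper name `resolve_or_none` and its injectable `resolver` argument are hypothetical; in Python 3 both `socket.gaierror` and `socket.herror` derive from `socket.error` (an alias of `OSError`), so one `except` clause covers them.

```python
import socket


def resolve_or_none(hostname, resolver=socket.gethostbyname_ex):
    """Resolve a hostname, returning None on any socket-level failure.

    Illustrative only: the real code may report the error differently.
    """
    try:
        return resolver(hostname)
    except socket.error:
        # Covers gaierror, herror and any other socket exception.
        return None
```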
Update a comment in qa-sample.json
Fix the sentence to say what it means.
gnt-debug: remove @todo from GenericOpCodes
- the function is not broken, and we're using it nowadays
- we have example json files and all, which show its usage
=> the todo is incorrect
count the number of tasks done in the wp unittest
Currently there's no way to know if something actually gets done. After this check we actually test that the threads do their job.
Workerpool.AddManyTasks: check tasks type
Each task has to be a sequence, or the RunTask call will fail.
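A small sketch of such a type check, under the assumption that "sequence" means a tuple or list of arguments. `add_many_tasks` and the plain list used as a queue are hypothetical stand-ins, not the WorkerPool API.

```python
def add_many_tasks(queue, tasks):
    """Append all tasks to `queue` after checking each one is a sequence.

    All tasks are validated before any is added, so a bad entry does
    not leave the queue half-filled.
    """
    tasks = list(tasks)
    for task in tasks:
        if not isinstance(task, (tuple, list)):
            raise TypeError("Task %r is not a tuple or list" % (task,))
    queue.extend(tasks)
```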
jqueue.AddManyJobs: use AddManyTasks
Rather than adding the jobs to the worker pool one at a time, we add them all together, which is slightly faster, and ensures they don't get started while we loop.
RAPI client: Add support for Python 2.6
The httplib module used by urllib2 requires its sockets to have a makefile() method to provide a file-like interface (or rather file-in-Python-like) to the socket. PyOpenSSL doesn't implement makefile() as the semantics require files to call dup(2) on the...
Bump RPC protocol version to 40
Many RPC calls have changed in Ganeti 2.2, hence bumping the RPC protocol version.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Change ganeti-cleaner unittest to not use random values
Using random values in unittests isn't good. This one broke exactly when building the 2.2.0~beta0 release. I suspect there were duplicate job IDs generated (due to $large being not so large).
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Update NEWS for Ganeti 2.1.4
Bump version to 2.2.0~beta0
Fix parameter names in SimpleFillBE/NIC docstrings
WorkerPool.AddManyTasks
Useful if we want to add many tasks at once, without contending with the previously added task as it starts.
AsyncAwaker: use shutdown on the socketpair
This makes sure the out_socket can only be used for writing, and the in_socket for reading.
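A minimal sketch of the idea using the standard library, assuming a platform where `socket.socketpair()` is available. The function name `make_one_way_pair` is hypothetical; only the `in_socket`/`out_socket` naming comes from the description above.

```python
import socket


def make_one_way_pair():
    """Create a socket pair restricted to one direction of traffic."""
    in_socket, out_socket = socket.socketpair()
    # The reading end gives up its ability to write...
    in_socket.shutdown(socket.SHUT_WR)
    # ...and the writing end gives up its ability to read.
    out_socket.shutdown(socket.SHUT_RD)
    return in_socket, out_socket
```

Data written to `out_socket` can still be read from `in_socket`; the opposite direction is closed.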
jqueue: make replication on job update optional
Sometimes it's useful to write to the local filesystem, but immediate replication to all master candidates is not needed.
The _WriteAndReplicateFileUnlocked function gets renamed to _UpdateJobQueueFile, as calling "write and replicate, but don't...
s/queue._GetJobInfoUnlocked/job.GetInfo/
The job queue currently has a static _GetJobInfoUnlocked method. Change it to be a normal method of _QueuedJob, which makes more sense.
Abstract loading job file from disk
Move the work from _LoadJobUnlocked to _LoadJobFileFromDisk, which can then be used in other contexts as well. Also, if we fail to deserialize the job, archive it as well (previously we archived it only if we failed to create the related object, but kept it there if deserialization failed....
Makefile: Add support for local Makefile additions
With the recent addition of a check for directories listed in Makefile, local custom directories are always reported as unlisted. This patch adds support for a “Makefile.local” file, which can adjust settings in...
jqueue: simplify removal from _nodes
In some places we do try/del/except and in others just pop. Using pop everywhere saves lines of code.
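The two idioms side by side, as a generic sketch. The function names and the plain dict standing in for `_nodes` are hypothetical.

```python
def remove_node_verbose(nodes, name):
    """Remove `name` from the dict, ignoring a missing key."""
    try:
        del nodes[name]
    except KeyError:
        pass


def remove_node(nodes, name):
    """Same effect in one line: pop with a default never raises."""
    nodes.pop(name, None)
```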
ListVisibleFiles: do not sort output
Among all users, it turns out just one may need the output to be sorted. All the others can cope without.
Improve gnt-debug man page
Signed-off-by: Manuel Franceschini <livewire@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Remove a TODO
Since OS objects are not stored in the configuration, we cannot put os_hvp there, therefore the TODO is obsolete…
Rework LUSetInstanceParams._GetUpdatedParams
Currently, this function does three things:
- special handling of constants.VALUE_DEFAULT
- type enforcing of the resulting dict
- filling the dictionary with defaults
However, except for the first one, the second two do not belong in this...
Split the core-OS and instance-specific env
Since we'll need to be able to generate the OS-specific environment separately from the instance one, we move it to a separate function. We also add a new OS_NAME env. var which is identical to the INSTANCE_OS one (which won't exist for OS-only environments)....
Add cluster.SimpleFill*() functions
Currently, the existing cluster.Fill* functions take as argument an instance. This means that in any case where we don't have an actual instance object, we have to resort to calling the low-level objects.FillDict function....
Merge branch 'devel-2.1' into master
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
Fix a bug in instance startup with custom hvparams
Since the introduction of OS-specific hvparams, we shouldn't ever use objects.FillDict directly for instances, but instead go via the cluster object. Otherwise the os_hvp will be ignored.
Fix unsafe variant initializer in _TryOSFromDisk
In case an OS has inconsistent declarations, we might get into a situation where one node reports a valid variants list (with OS API >=15), and another node has OS API < 15, in which case its supported_variants gets...
Makefile: Add check for DIRS consistency
It's easy to forget to add a new directory to DIRS. This check should report such inconsistencies.
Disallow DES for SSL connections
Older OpenSSL versions include DES-CBC3-* ciphers when specifying the HIGH group of ciphers. Removing potentially weak ciphers from the list of allowed ciphers ensures only strong ciphers are considered for SSL connections....
Start instance after creating snapshots for export
This restores functionality lost in commit 387794f8. Found during tests using QA scripts. An instance should be started after it has been temporarily shut down for an export.
Use import/export magic for backup/import and inter-cluster moves
This should prevent bugs in our code from accidentally overwriting disks.
Disable compression for all intra-cluster imports/exports
Tests have shown that usually we're CPU-bound for intra-cluster imports/exports. Disabling compression will help with this.
Some versions of OpenSSL, depending on the build options, also compress transparently. This will need further work in Ganeti....
qa_rapi: Test inter-cluster instance move script
This test moves an instance on the same cluster and, if successful, moves it back. While not testing a real move between two clusters, this is certainly better than nothing.
backend: Add support for import/export magic
import/export daemon: Add support for a magic prefix
This “magic” value will be used to ensure that we don't accidentally connect to the wrong daemon (e.g. due to a bug), comparable to DRBD's per-disk secret. Just depending on the SSL certificate isn't enough...
import/export daemon: Simplify command building
Instead of appending strings, stage parts in a list. Building the "dd" command is moved to a separate function.
import/export: Limit max length of socat options
import/export: Validate remote host/port
The hostname and port received from the remote cluster should be validated, just in case.
utils: Add function to validate service name
Handle ESRCH when sending signals
Upon sending signals, ESRCH can be reported when the target no longer exists.
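A sketch of tolerating ESRCH while still propagating other errors. The helper name `kill_ignoring_missing` and the injectable `kill` argument are hypothetical; the actual change may handle the case differently.

```python
import errno
import os


def kill_ignoring_missing(pid, signum, kill=os.kill):
    """Send a signal; return False if the target process is gone.

    Any error other than ESRCH (e.g. EPERM) is re-raised.
    """
    try:
        kill(pid, signum)
    except OSError as err:
        if err.errno != errno.ESRCH:
            raise
        # The process exited before the signal could be delivered.
        return False
    return True
```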
Add missing directory from Makefile.am
Add example gnt-debug submit-job json files
These files are being used to test the job queue performance with various changes and conditions. Adding them here for posterity.
Fix RpcResult.Raise error code
A typo in the Raise() method of rpc.RpcResult means that any remote errors will lack an appropriate error code; this will confuse e.g. RAPI users.
Cache a few bits of status in jqueue
Currently each time we submit a job we check the job queue size, and the drained file. With this change we keep these pieces of information in memory and don't read them from the filesystem each time.
Significant changes include:...
ListVisibleFiles: do optional sorting
Fix a TODO in _QueuedJob
Rather than raising Exception use GenericError and explain a bit better what happened.
Remove unused parameter from function
This also removes the relevant pylint disable. No point in keeping unused parameters around: if/when we need them it's easy to add them back.
Optimize _GetJobIDsUnlocked
Currently we sort the list of job queue files twice (once in utils.ListVisibleFiles with sort and then later with NiceSort). We apply the _RE_JOB_FILE regular expression twice (once in _ListJobFiles and once in _ExtractJobID). This simplifies the code a little, and a couple...
jqueue: Rename _queue_lock to _queue_filelock
The name clarifies the difference between this and the internal lock. Also explain a bit better what it is.
jstore._ReadNumericFile: use utils.ReadFile
Improve import-export unittest a bit
- Increase timeouts from 10 to 30 seconds (this still breaks when the machine is busy, e.g. using bonnie++)
- Depend on only one timeout per test instead of three
- Reset variables before each test
Test client timeout for import-export daemon
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Generate import-export unittest certs in parallel
Generating certificates can be slow.
Enforce consistency in disks and nics input dicts
With this change unknown disk and nic parameters will be refused, rather than silently ignored, so that one can't pass them in by mistake and not realize what went wrong.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
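One way to refuse unknown keys instead of silently dropping them, as a sketch only. The helper `check_known_params` and the `ALLOWED_DISK_PARAMS` set are illustrative assumptions; the real parameter names and error type live in the Ganeti source.

```python
# Illustrative set of accepted keys; the real list is defined elsewhere.
ALLOWED_DISK_PARAMS = frozenset(["size", "mode", "adopt"])


def check_known_params(params, allowed):
    """Raise ValueError if `params` contains keys outside `allowed`."""
    unknown = sorted(set(params) - set(allowed))
    if unknown:
        raise ValueError("Unknown parameters: %s" % ", ".join(unknown))
    return params
```

A misspelled key such as `"sise"` then fails loudly at submission time rather than being ignored.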
TLMigrateInstance: pass lu to _Check*
The various _Check* helper functions expect an lu to be passed in, but the TL is passed instead. This works... sometimes! :)
Remove locking._CountingCondition
This class is unused and untested. We must have forgotten to remove it.
Remove the job queue drain rpc call
This call was introduced but never used. In two years. Since it's just creating/removing a file it can also be done in simpler ways, without a special rpc call, if/when we need it again. In the meantime, let's give it to history....
Move fake hypervisor run dir under ganeti
This makes it uniform with the other hypervisors.
_BaseCondition: allow saving/restoring state
SharedLock _acquire_restore and _release_save
If a shared lock is used inside a condition, we need to make sure that it's reacquired in the same way as it was originally, after the wait.
Submit[*each*]Pending job
This is useful so we can test both SubmitJob and SubmitManyJobs.
Add unittest for ganeti-cleaner
cfgupgrade: Local variable for cluster-domain-secret filename
This is necessary to allow cfgupgrade to work on a non-standard directory.
Start to prepare documentation for 2.2 release
- Update NEWS file
- Remove dependency on OpenSSL (pyOpenSSL remains)
- Update manpages, fix typos and other things
gnt-job auto-completion: suggest "all" too
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
import/export: Allow script to predict size
Once we have a size for an export (in the context of the import/export daemon), we can provide the user with a percentage and ETA.
backend: Enable export size prediction
Show formatted ETA for disk sync and import/export
import/export daemon: Record amount of data transferred
This reports the amount of data transferred and the throughput (averaged over 60 seconds) to the master daemon. While not yet fully implemented, once the export scripts report the expected data size, we can even provide...
import/export: Show progress updates to user
With this patch, we show progress updates approx. once per minute.
ensure-dirs: don't fail if no rapi log is present
Sometimes a node has never been a master, or has never run rapi. In that case we need to create the file (because if rapi gets started later, it won't be able to create it itself).
Introduce hardcoded timeouts for each RPC call
This patch adds a table with per-opcode timeouts. They were chosen in an empirical, rather than scientific, way - see the comments in lib/rpc.py.
The patch also shows how custom timeouts can be used - call_test_delay...
http client: support per-request read timeout
Currently, the read timeout is hardcoded in the HttpClientRequestExecutor class. The patch changes the timeout so that it's a per-request property, and makes the rpc.Client class pass one explicitly in. Furthermore, we modify the rpc.RpcRunner class to support...
Let daemon-utils fix the owners for ganeti-rapi
This is a workaround until we have fully switched to user separation; it fixes the owners of directories/log files so ganeti-rapi will start flawlessly. This is right now run for every daemon, but as it operates on a relatively small subset...
Modify ganeti-masterd to set permission and owner of masterd-socket
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Let ganeti-rapi run under a different user/group
Make it possible to call utils.Daemonize with uid and gid to run as
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Adding customized user/group as configure flags
_ExecuteKVMRuntime: fix hv parameter fun
When executing the kvm runtime we were accessing a mix of the parameters as currently configured on the instance and the ones it was started with. We were doing it without precise criteria, but quite by...
Update FinalizeMigration docstring
This is used not only for aborted migrations, so the docstring should reflect that.
LUGrowDisk: fix operation on down instances
Currently it's impossible to grow a disk if an instance is shut down, because the disk could not be assembled. Now we take care of assembling it, and shutting it down afterwards.
Allow disk operation to act on a subset of disks
If the disks= parameter is passed, we can assemble/wait for sync/shutdown only some disks belonging to an instance, rather than all.
This is useful to only activate/sync/shutdown the affected disk when growing it....
NEWS: add release date for 2.1.3
utils: Add function to format seconds