watcher: Write per-group instance status, merge into global one
Each per-group watcher process writes its own instance status file. Once that's done it tries to acquire an exclusive lock on the global file and will proceed to read all status files, merging them based on each file's...
utils.ReadFile: Add pre-read callback
This will be used by the watcher to store the file's fstat(2). It must be done from the filehandle.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
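A minimal sketch of the pre-read callback idea, not the actual utils.ReadFile code: the names read_file, preread and remember_stat are hypothetical. The callback receives the open file handle before any data is read, so fstat(2) describes exactly the file whose contents are returned.

```python
import os


def read_file(path, preread=None):
    """Read a whole file, calling preread(fh) on the open handle first.

    Hypothetical stand-in for a ReadFile helper with a pre-read callback.
    """
    with open(path, "rb") as fh:
        if preread is not None:
            preread(fh)
        return fh.read()


# Example callback: remember the fstat(2) result of the handle being read.
stat_holder = []


def remember_stat(fh):
    stat_holder.append(os.fstat(fh.fileno()))
```

Calling os.stat on the path instead would open a window in which the file could be replaced between the stat and the read; using the handle avoids that.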
Merge branch 'stable-2.4'
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fixed a typo in utils/process.py
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Remove 15-second sleep from LUInstanceCreate
Remove the 15-second sleep when wait_for_sync is not set. LUInstanceCreate already calls _WaitForSync with oneshot=True, which already performs an internal wait-loop for disks to start syncing.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>...
Add a readability alias
lu.glm.list_owned becomes lu.owned_locks, which is clearer for the reader.
Also rename three variables (which were previously named owned_locks) to make clearer what they track.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Fix broken object references in docstrings
The module is called “objects”, not “object”.
Add “gnt-instance change-group” command
Add opcode to change instance's group
This is quite similar to evacuating a group, but the locking is different.
Factorize checking instance's node groups
Remove WATCHER_STATEFILE constant
ganeti-watcher: Split for node groups
This patch brings a huge change to ganeti-watcher to make it aware of node groups. Each node group is processed in its own subprocess, reducing the impact of long-running operations.
The global watcher state file, $datadir/ganeti/watcher.data, is replaced...
Lock potential target nodes for group evacuation
All potential target nodes should be locked while calculating a group evacuation.
Small changes in group evacuation
- Use OpPrereqError in CheckPrereq
- Clarify command synopsis
cmdlib: Factorize getting iallocator
The same logic will be used for changing an instance's group.
Pause DRBD sync for OS install if not wait_for_sync
When wait_for_sync is set to False in LUInstanceCreate, Ganeti lets DRBD sync in the background while performing the rest of the installation steps, including OS installation.
However, OS installation is a very disk-intensive task that interferes badly...
Fix documentation of gnt-instance failover
Explain that we only start the instance on the new node if it was originally running.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix small typo in docstring
Change the backend.InstanceLogName signature
This now uses the component for the transfer (if available), otherwise (e.g. in installs/renames) nothing.
Instance transfer: export component name to backend
This modifies the RPC layer to export the component name too to the backend, so that it can be used in log files and messages.
Instance transfer: add argument for the 'component'
Currently, transfer data is done mainly with just the instance name, but when we have instances with multiple disks this is not enough to distinguish between the different transfers being done for the instance....
Optimise use of repeated/looping GetInstanceInfo
Similar to the previous patch, this adds a helper function to eliminate repeated calls into ConfigWriter.
Optimise use of repeated/looping GetNodeInfo
This adds a new ConfigWriter.GetMultiNodeInfo function and replaces multiple/looping calls to GetNodeInfo with it.
Fix lint errors
It turns out that the only use of the operator module was for itemgetter, so patch eb62069e should have removed that import too.
Add two more compat functions
operator.itemgetter(0) → fst
operator.itemgetter(1) → snd
snd is not used yet, but it makes sense to add both.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
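As an illustration of the mapping above, the two helpers can be sketched as plain functions; this is an assumed shape, not the code from the compat module, and the sample data is made up.

```python
def fst(seq):
    """Return the first element of a sequence, like operator.itemgetter(0)."""
    return seq[0]


def snd(seq):
    """Return the second element of a sequence, like operator.itemgetter(1)."""
    return seq[1]


# Hypothetical sample data: (name, count) pairs.
pairs = [("node1", 3), ("node2", 5)]
names = [fst(p) for p in pairs]
counts = [snd(p) for p in pairs]
```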
Fix types passed to IAllocator
Iallocator mode reloc, parameter reloc_from takes a list; half of the code already forced this parameter to list, we add the other two cases where it is needed.
jqueue: Add short delay before detecting job changes
By sleeping for 100ms after receiving a notification for a changed job file the job is given some additional time to change again. This significantly reduces the number of LUXI calls for WaitForJobChanges...
Add primary/second nodes' group as query fields
These will be very useful for ganeti-watcher as it needs to retrieve instances by group.
Fix doclint failures
Commit 54ca6e4b2 renamed some arguments, but didn't also rename them in the docstrings.
watcher: Separate function for writing instance status file
For now this will do another query to the master daemon, but with the split for node groups this issue will go away.
watcher: Make RAPI error messages less technical
watcher.state: Use strings, not objects
Until now the state class would receive instances as objects (ganeti.watcher.Instance), but this is not necessary. By using strings the interface is simplified.
This patch also simplifies some code accessing the internal structures,...
watcher: Raise error on unknown hook status
Also, remove punctuation from one error message.
watcher: Reformat constants
Make them match with style guide.
Add new watcher constants
WATCHER_STATEFILE will be removed at the end of this patch series.
Fix formatting of frozensets
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
cli: Add constant for node group option
ganeti-watcher will use this constant to pass the option to itself for processing all node groups.
Replace %r with '%s' in masterd/instance.py
I still don't know why Michael is a fan of %r, but in the meantime this patch changes:
WARNING: import u'import-2011-07-29_01_39_33-y3gZKV' on node1 failed: Exited with status 1
into:
WARNING: import 'import-2011-07-29_01_39_33-y3gZKV' on node1 failed:...
Add "reboot_behavior" hypervisor flag
During instance installations, you do not want the instance to reboot and start again with the same parameters, as that will most likely re-start the install process. Therefore, when the instance requests a reboot it should instead shut down. This flag allows this to be...
Clear the OS scripts environment
The OS scripts currently run with the whole noded environment; this is different from the hooks which run with a cleared one and most likely an oversight.
This might create problems when upgrading, so it needs to be clearly...
watcher: Split state class into separate module
Rename watcher's constant for instance status file
“upfile” is a bad name.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
watcher: Split node maintenance into separate module
The node maintenance class is standalone.
Merge branch 'devel-2.4'
Remove requirement for variants on OS API v15+
This removes:
- the check in backend that such OSes have a variants file or if it exists that is non-empty; in order for this to work, we also rework the logic in backend._TryOSFromDisk to allow for optional OS files...
Revert "cli.JobExecutor: Feedback function for info output"
This reverts commit 7421df8e5f2cf31022085b332d1300640ba5854b.
The feedback_fn argument to JobExecutor is used for PollJob, and thus has a fixed signature: a single arg, tuple of (timestamp, log type,...
Fix group verification of offline nodes
Commit aef59ae7 reworked the file verification, but forgot to take into account offline nodes.
The fact that this was not detected yet is due to the fact that we don't test clusters with offline nodes in QA :(
Signed-off-by: Iustin Pop <iustin@google.com>...
Disallow variants for OSes that don't support them
Otherwise we get no variant checks at all, but the variant is still recorded.
Fix OS queries for API v20 w/parameters
OS parameters are a list of tuples, so we can't pass them directly to utils.NiceSort, hence we use a sort key.
This was not detected in QA since QA only tests API v10 :(
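To illustrate why a sort key is needed, here is a hedged sketch with a hypothetical natural-sort helper (nice_sort and nice_key are stand-ins written for this example, not utils.NiceSort itself, and the parameter data is made up). The helper expects strings, so a list of (name, description) tuples is sorted by extracting the name through a key function.

```python
import re


def nice_key(text):
    """Split text into digit and non-digit runs so 'param10' sorts after 'param2'."""
    return [(1, int(part)) if part.isdigit() else (0, part)
            for part in re.split(r"(\d+)", text) if part]


def nice_sort(items, key=None):
    """Natural sort; key extracts the string to compare from each item."""
    if key is None:
        return sorted(items, key=nice_key)
    return sorted(items, key=lambda item: nice_key(key(item)))


# Hypothetical OS parameters as (name, description) tuples.
os_params = [("param10", "doc for ten"), ("param2", "doc for two")]
sorted_params = nice_sort(os_params, key=lambda pair: pair[0])
```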
Add helper for declaring all locks shared
This patch adds a function for abstracting “dict.fromkeys(locking.LEVELS, 1)”. It also removes a duplicate assignment for the share_locks in LUInstanceQuerydata.
Additionally, it moves the _SupportsOob function to the helper...
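The abstraction described above amounts to a one-line helper. The sketch below uses a stand-in LEVELS list with assumed level names rather than the real locking.LEVELS, and share_all is a hypothetical name.

```python
# Assumed stand-in for locking.LEVELS; the real list lives in the locking module.
LEVELS = ["cluster", "instance", "nodegroup", "node"]


def share_all():
    """Return a fresh dict declaring every lock level as shared (value 1)."""
    return dict.fromkeys(LEVELS, 1)
```

Returning a new dictionary on every call means one caller can later change a single level without affecting others.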
Add ht-based result checks to opcodes
This adds the infrastructure necessary to check opcode results using ht-based functions. Checks are added for two opcodes.
Change OpClusterVerifyDisks to per-group opcodes
Until now verifying disks, which is also used by the watcher, would lock all nodes and instances. With this patch the opcode is changed to operate per node group, requiring fewer locks.
Both “gnt-cluster” and “ganeti-watcher” are changed for the...
cmdlib: Give instance name in error message on group evacuation
cmdlib: Factorize mapping instance LVs to node/volume
cli.JobExecutor: Feedback function for info output
This will be used in the watcher where we don't want to pollute stdout unless in debug mode.
Add OS search path to gnt-cluster info
Otherwise, it's pretty hard to figure it out from the command line.
Signed-off-by: Ben Lipton <benlipton@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fix recompilation of htools on regen-vcs-version
Currently, most htools code depends on Constants.hs which is generated from constants.py and also depends on _autoconf.py. Also, _autoconf.py depends on vcs-version, which all together means that when 'make...
Add another name for the --yes-do-it option
Most boring patch ever
s/'/"/ in (hopefully) the right places.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Reopen daemon's stdio on SIGHUP
Before this patch daemons would continue to refer to an old logfile for their standard I/O if they had been asked to reopen the log (SIGHUP).
Reopen log file only once after SIGHUP
Commit b6fa9a44 added a re-openable log handler. The log file is reopened when a daemon is sent a HUP signal. Due to a bug in the code, fixed by this patch, the log file would be reopened for every single log message thereafter....
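The commit text does not say what the bug was. One plausible shape, shown here purely as an assumption, is a "reopen requested" flag that is never cleared. The sketch below (hypothetical class ReopenableHandler, not Ganeti's handler) clears the flag after reopening so that a single request causes a single reopen.

```python
import logging


class ReopenableHandler(logging.Handler):
    """Log handler that reopens its file once per reopen request (sketch)."""

    def __init__(self, filename):
        super().__init__()
        self._filename = filename
        self._stream = open(filename, "a")
        self._reopen = False
        self.reopen_count = 0  # only here to make the behaviour observable

    def request_reopen(self):
        """Meant to be called from the SIGHUP path; only records the request."""
        self._reopen = True

    def emit(self, record):
        if self._reopen:
            self._stream.close()
            self._stream = open(self._filename, "a")
            # Without this reset every later message would reopen the file.
            self._reopen = False
            self.reopen_count += 1
        self._stream.write(self.format(record) + "\n")
        self._stream.flush()

    def close(self):
        self._stream.close()
        super().close()
```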
Don't leak file descriptors when setting up daemon output
When a daemon's output is configured using “utils.SetupDaemonFDs”, the function must use dup2(2). Unfortunately the code didn't close the original file descriptors, leaking them in the process.
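The pattern at issue can be sketched as follows; redirect_fd is a hypothetical helper for illustration, not the body of utils.SetupDaemonFDs. After dup2(2) the target descriptor refers to the file, so the descriptor returned by open is no longer needed and must be closed.

```python
import os


def redirect_fd(path, target_fd):
    """Make target_fd refer to path without leaking the temporary descriptor."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        os.dup2(fd, target_fd)
    finally:
        # dup2 duplicated fd onto target_fd; the original is now redundant.
        if fd != target_fd:
            os.close(fd)
```

In a daemon, target_fd would typically be 1 or 2 (standard output and standard error).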
gnt-instance info: Return static info if node offline
Before this patch “gnt-instance info” would fail with the error message “Error checking node $node: Node is marked offline” if the instance's primary node is marked offline and the user didn't explicitly request...
Ignore offline primary when failing over
When the source node for a failover is marked offline, there's no need to require the user to specify “--ignore-consistency”.
To make it work at all, a number of bugs introduced by the merge of migration and failover are also fixed by this patch....
gnt-instance console: Use query instead of opcode
This means opening the console no longer requires the instance lock, allowing it to be used during long-running operations (e.g. replacing a disk).
Add opcode attribute for comments
This attribute allows programmatic submitters of jobs (e.g. iallocator) to add a comment to each opcode, describing its purpose. Example:
$ gnt-job info 123
Job ID: 123
  …
  Opcodes:
    OP_INSTANCE_REPLACE_DISKS
      …...
gnt-node volumes: Fix instance names
Commit 84d7e26b changed “objects.Instance.MapLVsByN” to not just return the LV name, but to include the volume group name (e.g. “xenvg/d67e8700….disk0_data”). This in turn broke the mapping of volume names in LUNodeQueryvols, stopping instance names from being displayed in...
Fix instance failover (missing argument)
More fallout from commit 323f9095b49d.
Implement instance failover via RAPI
No idea why this was missed before.
Export job dependencies through lock monitor
This makes them visible to the user. Example:
$ gnt-debug locks -o name,pending
Name    Pending
job/890 job:891,892
job/892 job:894
locking.GLM: Allow adding locks to monitor
This will be used for exporting job dependencies through the lock monitor.
Make lock monitor more versatile
With this change it'll be possible to register other lock information providers. One use case for this is job dependencies, which can be shown in the output of “gnt-debug locks”, too.
The lock monitor is changed to accept more than one return value from...
Add error state to LUGroupEvacuate's exceptions
Rename *_STATUS_WAITLOCK to …_WAITING
This patch renames the {JOB,OP}_STATUS_WAITLOCK constants to {JOB,OP}_STATUS_WAITING, as per design document for chained jobs.
gnt-group: Add command to evacuate whole group
Add new opcode for evacuating group
Fix locking issue with job dependencies
When jobs waiting for a dependency are notified, they're re-added to the queue. This would require owning the queue lock in exclusive mode, but since the function doing so is called from within the job/opcode processor, it only holds the lock in shared mode....
jqueue: Read-only jobs don't need processor lock
Add support for KVM keymaps
Signed-off-by: Sébastien Bocahu <zecrazytux@zecrazytux.net>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
gnt-debug: Add tests for job dependencies
jqueue: Implement submitting multiple jobs with dependencies
With this change users of the “SubmitManyJobs” interface can use relative job dependencies. Relative job IDs in dependencies are resolved before handing the job off to the workerpool.
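A rough sketch of such a resolution step follows. It assumes that relative IDs are negative numbers counting back through the jobs of the same submission (-1 being the job submitted just before); this encoding, the function names and the data layout are assumptions made for the example, not taken from the jqueue code.

```python
def resolve_relative(dep_id, assigned_ids):
    """Map a relative (negative) job ID to an already assigned absolute one."""
    if dep_id >= 0:
        return dep_id  # already absolute
    if -dep_id > len(assigned_ids):
        raise ValueError("relative job ID %d is out of range" % dep_id)
    return assigned_ids[dep_id]


def resolve_batch(first_id, jobs):
    """Resolve dependencies for a batch; jobs is one dependency-ID list per job."""
    assigned = []
    resolved = []
    for offset, deps in enumerate(jobs):
        # Only jobs earlier in the same submission can be referenced relatively.
        resolved.append([resolve_relative(d, assigned) for d in deps])
        assigned.append(first_id + offset)
    return assigned, resolved
```

For example, resolve_batch(890, [[], [-1], [-2, 100]]) assigns IDs 890 to 892 and turns both relative references into job 890, leaving the absolute ID 100 untouched.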
Fix node evacuation
- Adjust for new iallocator result format
- Split some code into helper functions
jqueue: Add “writable” flag to memory objects
Basically only one instance of the job, the one being processed, should be serialized to disk and replicated to other nodes. With this flag assertions can be added in various places.
Implement chained jobs
An overview is available in the design document for this change, doc/design-chained-jobs.rst.
When a job enters the job processor, the current opcode's dependencies are evaluated. If a referenced job has not yet reached the desired...
Remove constants for iallocator multi-relocate
They're no longer necessary.
Fix assertion error on unclean master shutdown
Commit 66bd7445 added an assertion to ensure a finalized job has its “end_timestamp” attribute set. Unfortunately it didn't cover a case when the queue is recovering from an unclean master shutdown.
Make SharedLock._is_owned public
This will be useful for assertions. GanetiLockManager._is_owned is exported, too.
Adding a wrapper around connecting to kvm console
The wrapper will connect to the console, and check in the background if the instance is paused, unpausing it as necessary.
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Adding a wrapper around "xm console"
Add opcode attribute for chained jobs
Set startup_paused to False when restarting
This fixes the lint error:
E1120:1220:InstanceReboot: No value passed for parameter 'startup_paused' in function call
gnt-cluster {command|copyfile}: Support per-group operations
This patch allows commands to be run on and files to be copied to all nodes within a specific group.
cli.GetOnlineNodes: Support node group filter, use query2
This patch changes cli.GetOnlineNodes to use query2, which does the filtering in the master daemon, and adds a new parameter to filter by node group.
Unittests were added for the old implementation and then adapted to...
ht.WithDesc: Work around pylint warning
Explicitly defining “__call__” silences a pylint warning when wrapped type check functions are used directly. I had no idea pylint is this intelligent.
ht: Add new check for numbers
Places which receive floats can usually also deal with integers, e.g. OpTestDelay. Tests are added and the new check function is used for the aforementioned opcode and verifying query results.
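A check of this kind could look like the sketch below. The name t_number is hypothetical, and rejecting booleans is a choice made for this example, not something the commit text states.

```python
def t_number(value):
    """Accept ints and floats; bool is rejected although it subclasses int."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)
```

With such a check, a delay of 5 and a delay of 0.5 both pass, while the string "5" does not.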
Fix off-by-one bug in job serial generation
Commit 009e73d0 (September 2009) changed the job queue to generate multiple job serials at once. Ever since it would return one more than requested.
The “serial” file in the job queue directory is defined to contain the...
Reverts the patch series about console wrappers
This reverts commits 030a9cb8022b83bf43ec14dfbafd943299bc01c4 and ae082df0000a785b693b2f4aa434650a81a94bdf.
There are two problems:
- Makefile.am breakage, which is trivial to revert
- unittest breakage, which honestly I'm not sure how to fix and how...