Clear the OS scripts environment
The OS scripts currently run with the whole noded environment; this is different from the hooks, which run with a cleared one, and is most likely an oversight.
This might create problems when upgrading, so it needs to be clearly...
watcher: Split state class into separate module
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Rename watcher's constant for instance status file
“upfile” is a bad name.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Fixed a typo in the installation tutorial
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
watcher: Split node maintenance into separate module
The node maintenance class is standalone.
Fixed doc compilation under Sphinx 1.0.7
Sphinx 1.0.7 complains if an indented block in .warning starts with :option. This fixes it.
Merge branch 'devel-2.4'
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Remove requirement for variants on OS API v15+
This removes:
- the check in backend that such OSes have a variants file or if it exists that is non-empty; in order for this to work, we also rework the logic in backend._TryOSFromDisk to allow for optional OS files...
Add support for cluster/OS parameters in QA
Currently there is no way to QA with (for example) an initrd because the QA only inits the cluster with the default parameters. This makes it impossible to QA using anything but the default parameters, which doesn't always work....
Revert "cli.JobExecutor: Feedback function for info output"
This reverts commit 7421df8e5f2cf31022085b332d1300640ba5854b.
The feedback_fn argument to JobExecutor is used for PollJob, and thus has a fixed signature: a single arg, tuple of (timestamp, log type,...
Extend the ovf-support design with format translation
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Add a QA constant for cluster verify command
This seems to be used and reused multiple times, let's abstract it…
Fix group verification of offline nodes
Commit aef59ae7 reworked the file verification, but forgot to take into account offline nodes.
The fact that this was not detected yet is due to the fact that we don't test clusters with offline nodes in QA :(
Signed-off-by: Iustin Pop <iustin@google.com>...
Disallow variants for OSes that don't support them
Otherwise we get no variant checks at all, but the variant is still recorded.
Fix QA OS API failure
The patch changing the OS api in QA to 20 was not complete, sorry.
QA: test using OS API v20
v20 is (mostly) a superset of the other versions, so testing with it should be better than with v10. This properly detects the breakage fixed by the previous patch.
Fix OS queries for API v20 w/parameters
OS parameters is a list of tuples, so we can't pass it directly to utils.NiceSort, hence we use a sort key.
This was not detected in QA since QA only tests API v10 :(
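To illustrate the sort-key idea, here is a minimal Python sketch. It is not the real utils.NiceSort; nice_sort, nice_sort_key and the sample parameter tuples are hypothetical stand-ins that only show why a key is needed when the items are tuples rather than plain strings.

```python
import re


def nice_sort_key(value):
    """Split a string into text and number chunks for natural ordering.

    Each chunk is tagged so that numbers and text never get compared
    directly (which would raise TypeError in Python 3).
    """
    return [(0, int(part)) if part.isdigit() else (1, part)
            for part in re.split(r"(\d+)", value)]


def nice_sort(items, key=None):
    """Hypothetical stand-in for a natural sort that accepts a key."""
    if key is None:
        key = lambda item: item
    return sorted(items, key=lambda item: nice_sort_key(key(item)))


# Assumed shape of OS parameters: (name, documentation) tuples.
os_params = [
    ("param10", "doc ten"),
    ("param2", "doc two"),
    ("param1", "doc one"),
]

# Sort on the parameter name only, which is the first tuple element.
sorted_params = nice_sort(os_params, key=lambda param: param[0])
```

Without the key, the natural-sort helper would receive whole tuples and fail when it tries to split them as strings.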
Add helper for declaring all locks shared
This patch adds a function for abstracting “dict.fromkeys(locking.LEVELS, 1)”. It also removes a duplicate assignment for the share_locks in LUInstanceQuerydata.
Additionally, it moves the _SupportsOob function to the helper...
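A small sketch of what such a helper could look like. The LEVELS list and the name share_all are assumptions made for illustration; only the “dict.fromkeys(locking.LEVELS, 1)” expression comes from the commit message.

```python
# Hypothetical lock levels; the real list lives in locking.LEVELS.
LEVELS = ["cluster", "instance", "nodegroup", "node"]


def share_all():
    """Return a share_locks mapping declaring every level as shared.

    A value of 1 marks the level as shared; the value is an immutable
    integer, so reusing it across keys via dict.fromkeys is safe.
    """
    return dict.fromkeys(LEVELS, 1)


share_locks = share_all()
```

Callers then assign the result instead of repeating the dict.fromkeys expression in every logical unit.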
Add ht-based result checks to opcodes
This adds the infrastructure necessary to check opcode results using ht-based functions. Checks are added for two opcodes.
Change OpClusterVerifyDisks to per-group opcodes
Until now verifying disks, which is also used by the watcher, would lock all nodes and instances. With this patch the opcode is changed to operate per nodegroup, requiring fewer locks.
Both “gnt-cluster” and “ganeti-watcher” are changed for the...
cmdlib: Give instance name in error message on group evacuation
cmdlib: Factorize mapping instance LVs to node/volume
cli.JobExecutor: Feedback function for info output
This will be used in the watcher where we don't want to pollute stdout unless in debug mode.
Add OS search path to gnt-cluster info
Otherwise, it's pretty hard to figure it out from the command line.
Signed-off-by: Ben Lipton <benlipton@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
cluster-merge: remove a hardcoded constant
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
cluster-merge: remove option list from usage
It doesn't make sense to have to keep them up to date twice, and --help already lists all of them with help strings.
cluster-merge: add instance restart strategy opt
Right now we always restart all instances, which is not right if some instances were already down for other reasons. Thus we add an option to decide how to handle this. The right default should be "up" which is:...
Fix recompilation of htools on regen-vcs-version
Currently, most htools code depends on Constants.hs which is generated from constants.py and also depends on _autoconf.py. Also, _autoconf.py depends on vcs-version, which all together means that when 'make...
Add another name for the --yes-do-it option
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Most boring patch ever
s/'/"/ in (hopefully) the right places.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Reopen daemon's stdio on SIGHUP
Before this patch daemons would continue to refer to an old logfile for their standard I/O if they had been asked to reopen the log (SIGHUP).
Reopen log file only once after SIGHUP
Commit b6fa9a44 added a re-openable log handler. The log file is reopened when a daemon is sent a HUP signal. Due to a bug in the code, fixed by this patch, the log file would be reopened for every single log message thereafter....
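The class of bug described here can be sketched in Python as follows. This is not the handler from commit b6fa9a44; ReopenableFileHandler, request_reopen and reopen_count are hypothetical names. The point is only that the "reopen requested" flag has to be cleared once the file has been reopened, otherwise every later message triggers another reopen.

```python
import logging
import os
import tempfile


class ReopenableFileHandler(logging.FileHandler):
    """File handler that reopens its file once per reopen request."""

    def __init__(self, filename):
        super().__init__(filename)
        self._reopen = False
        self.reopen_count = 0  # Only kept to make the behaviour observable.

    def request_reopen(self):
        """Would be called from a SIGHUP handler in a real daemon."""
        self._reopen = True

    def emit(self, record):
        if self._reopen:
            if self.stream is not None:
                self.stream.close()
            self.stream = self._open()
            # Clearing the flag is the important part: without it the
            # file would be reopened for every single message.
            self._reopen = False
            self.reopen_count += 1
        super().emit(record)


def demo():
    """Log before and after a reopen request; return count and lines."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "daemon.log")
        handler = ReopenableFileHandler(path)
        logger = logging.getLogger("reopen-demo")
        logger.propagate = False
        logger.setLevel(logging.INFO)
        logger.addHandler(handler)
        logger.info("before")
        handler.request_reopen()
        logger.info("after 1")
        logger.info("after 2")
        logger.removeHandler(handler)
        handler.close()
        with open(path) as logfile:
            lines = logfile.read().splitlines()
    return handler.reopen_count, lines
```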
Don't leak file descriptors when setting up daemon output
When a daemon's output is configured using “utils.SetupDaemonFDs”, the function must use dup2(2). Unfortunately the code didn't close the original file descriptors, leaking them in the process.
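A rough Python sketch of the dup2(2) pattern and the missing close, assuming nothing about the real “utils.SetupDaemonFDs” code. The helper redirect_fd and the demo around it are hypothetical; the demo redirects a pipe descriptor rather than the real stdout so that it is safe to run.

```python
import os
import tempfile


def redirect_fd(path, target_fd):
    """Make target_fd refer to the file at path.

    The descriptor returned by os.open is only needed until dup2 has
    copied it onto target_fd; closing it afterwards avoids the leak.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.dup2(fd, target_fd)
    finally:
        # dup2 is a no-op if both numbers are equal; in that case the
        # descriptor must stay open.
        if fd != target_fd:
            os.close(fd)


def demo():
    """Redirect the write end of a pipe to a file and write through it."""
    read_fd, write_fd = os.pipe()
    try:
        with tempfile.TemporaryDirectory() as tmpdir:
            path = os.path.join(tmpdir, "daemon.out")
            redirect_fd(path, write_fd)
            os.write(write_fd, b"hello\n")
            with open(path, "rb") as output:
                return output.read()
    finally:
        os.close(read_fd)
        os.close(write_fd)
```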
htools: rework the algorithm for ChangeAll mode
I think I've identified the problem with the current ChangeAll mode. The current algorithm works as follows:
- identify a new primary by choosing the node which gives best score as new secondary
- failover to it...
gnt-instance info: Return static info if node offline
Before this patch “gnt-instance info” would fail with the error message “Error checking node $node: Node is marked offline” if the instance's primary node is marked offline and the user didn't explicitly request...
Ignore offline primary when failing over
When the source node for a failover is marked offline, there's no need to require the user to specify “--ignore-consistency”.
To make it work at all, a number of bugs introduced by the merge of migration and failover are also fixed by this patch....
htools: replace two hardcoded uses of pri+sec nodes
These two cases replace explicit uses of primary and secondary nodes with Instance.allNodes, which means the code is more flexible if the internal layout of the instance changes.
I've verified that the output of involvedNodes is not required to be...
htools: add target_node member to migrate opcode
… and failover too. Not many changes otherwise except for serialisation and unittests.
htools: do not change node disk for non-local storage
htools: add more functions for local disk storage
These will be used in Node.hs for proper add/remove instance code. Furthermore, we restrict the movable status to the right disk templates only, so that we don't attempt to move the 'wrong' instance types....
Initial design doc for OVF support
Signed-off-by: Agata Murawska <agatamurawska@google.com>
[iustin@google.com: fixed formatting issues]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix aliases in bash completion
Ever since commit 2d48a3a2 aliases were not included in the bash completion script. This patch also replaces one tab with two spaces.
gnt-instance console: Use query instead of opcode
This means opening the console no longer requires the instance lock, allowing it to be used during long-running operations (e.g. replacing a disk).
Add opcode attribute for comments
This attribute allows programmatic submitters of jobs (e.g. iallocator) to add a comment to each opcode, describing its purpose. Example:
$ gnt-job info 123
Job ID: 123 … Opcodes: OP_INSTANCE_REPLACE_DISKS …...
gnt-node volumes: Fix instance names
Commit 84d7e26b changed “objects.Instance.MapLVsByN” to not just return the LV name, but to include the volume group name (e.g. “xenvg/d67e8700….disk0_data”). This in turn broke the mapping of volume names in LUNodeQueryvols, stopping instance names from being displayed in...
Fixed one option name and a typo in the docs
The -g vg-name option was deprecated in commit 04367e70ad71eea3f0f19e7889dc68fb9783c98a.
Fix instance failover (missing argument)
More fallout from commit 323f9095b49d.
Implement instance failover via RAPI
No idea why this was missed before.
Export job dependencies through lock monitor
This makes them visible to the user. Example:
$ gnt-debug locks -o name,pending
Name    Pending
job/890 job:891,892
job/892 job:894
locking.GLM: Allow adding locks to monitor
This will be used for exporting job dependencies through the lock monitor.
Make lock monitor more versatile
With this change it'll be possible to register other lock information providers. One use case for this is job dependencies, which can be shown in the output of “gnt-debug locks”, too.
The lock monitor is changed to accept more than one return value from...
Update documentation regarding Haskell dependencies
These were forgotten when the supported library versions were changed.
htools: add two more small unittests
This adds tests for the opToResult and eitherToResult functions from Types.hs, and changes two other tests for the same module to test JSON serialisation (which automatically also tests the lower-level to/from string conversion functions)....
htools: update hail man page with the new modes
Also mark the deprecated modes we no longer support.
htools: a few more hlint fixes
Tested only on GHC 7.x, will test on 6.1x too before commit.
htools: further docstring fixes
This adds parameter documentation for Cluster.iMoveToJob (I think it was not clear if the new or old node list is needed) and fixes other docstring style issues.
After this patch, all modules except for CLI.hs (which has many...
htools: add JSON instance for EvacMode
This abstracts the JSON parsing of the type EvacMode near its definition, and simplifies its conversion in IAlloc.parseData.
htools: add human-readable output to hspace
Currently, hspace can only output a machine-readable format that (while detailed) is hard to parse quickly by people. This patch adds (and enables by default) a human-readable output that shows the most important metrics in a simple format....
Fix job constants use in htools
Commit 56c094b4 added use of job constants, but I didn't pay attention and ended up mixing things: job constants were used for opcode ones, and the job ones didn't get converted.
This patch corrects it and uses only C.* constants throughout the Jobs...
Add error state to LUGroupEvacuate's exceptions
Rename *_STATUS_WAITLOCK to …_WAITING
This patch renames the {JOB,OP}_STATUS_WAITLOCK constants to {JOB,OP}_STATUS_WAITING, as per design document for chained jobs.
gnt-group: Add command to evacuate whole group
Add new opcode for evacuating group
Fix locking issue with job dependencies
When jobs waiting for a dependency are notified, they're re-added to the queue. This would require owning the queue lock in exclusive mode, but since the function doing so is called from within the job/opcode processor, it only holds the lock in shared mode....
jqueue: Read-only jobs don't need processor lock
Add support for KVM keymaps
Signed-off-by: Sébastien Bocahu <zecrazytux@zecrazytux.net>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
gnt-debug: Add tests for job dependencies
jqueue: Implement submitting multiple jobs with dependencies
With this change users of the “SubmitManyJobs” interface can use relative job dependencies. Relative job IDs in dependencies are resolved before handing the job off to the workerpool.
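One way to picture the resolution step is the Python sketch below. It rests on an assumption that is not stated in this log: a negative dependency ID -N refers to the job submitted N positions earlier in the same batch. The function resolve_dependencies and its argument layout are hypothetical and do not mirror the jqueue code.

```python
def resolve_dependencies(assigned_ids, dependencies):
    """Replace relative (negative) job IDs with the assigned job IDs.

    assigned_ids: real job IDs for the batch, in submission order.
    dependencies: per job, a list of job IDs it depends on; a negative
        value is assumed to be relative to the job's own position.
    """
    resolved = []
    for index, deps in enumerate(dependencies):
        job_deps = []
        for dep in deps:
            if dep < 0:
                target = index + dep
                if target < 0:
                    raise ValueError(
                        "Relative job ID %d of job %d points before the "
                        "start of the batch" % (dep, index))
                job_deps.append(assigned_ids[target])
            else:
                # Absolute IDs are passed through unchanged.
                job_deps.append(dep)
        resolved.append(job_deps)
    return resolved


# Three jobs: the second waits for the first, the third waits for the
# first (relative) and for an unrelated, already existing job 100.
resolved = resolve_dependencies([890, 891, 892], [[], [-1], [-2, 100]])
```

Resolving up front means the worker pool only ever sees absolute job IDs.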
Fix node evacuation
- Adjust for new iallocator result format
- Split some code into helper functions
Do proper name lookup for the -O option
hspace and hbal treat -O differently, and use aliases for short names (although hbal succeeds in that, and hspace doesn't). Unify this with a name lookup, using the same functions we used for instance selection/exclusion....
jqueue: Add “writable” flag to memory objects
Basically only one instance of the job, the one being processed, should be serialized to disk and replicated to other nodes. With this flag assertions can be added in various places.
Implement chained jobs
An overview is available in the design document for this change, doc/design-chained-jobs.rst.
When a job enters the job processor, the current opcode's dependencies are evaluated. If a referenced job has not yet reached the desired...
Add implementation details to design for chained jobs
As requested by Iustin.
Add support for GPT by using parted for disks bigger than 2TB.
Signed-off-by: Pedro Macedo <pmacedo@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Remove constants for iallocator multi-relocate
They're no longer necessary.
htools: add a machine-readable CLI flag
This will be used in hspace to toggle between "human" readable and machine readable output formats.
htools: move the '-p' option to htools.rst
Since this is a common option and has a big description.
htools: move tiered spec map helper to Hspace.hs
This is used just in hspace, so let's help in making Cluster.hs smaller. We also split the function in two, as computing the spec map and formatting it are two different tasks.
htools: import the program modules in QC.hs
This adds the binaries code to the coverage, and thus the coverage finally shows the real coverage over all logic code (except for the htools.hs code, which is not logic code related to the algorithms, so it doesn't matter; plus it's also very small)....
htools: switch hspace to the generic binary
This is the last patch of the binaries conversion.
As information, we now have a single binary that is approx. 5.4MiB in size, compared to 4 binaries that were approx. 5.1-5.2MiB in size; this will result in a smaller package and install size, and the single...
htools: switch hscan to the generic binary
htools: switch hbal to the generic binary
In addition, the patch adds a separate Makefile variable for holding the binary roles to make it more clear what we symlink.
htools: switch hail to the generic binary
This converts the first binary to the generic 'htools' binary.
htools: add a generic binary
This is the start of a series of patches that will unify all the binaries currently in use in a single one, which can perform different roles based on the name it is installed as.
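The "role based on installed name" idea can be illustrated with a short Python sketch. The actual htools binary is written in Haskell; the ROLES table, the placeholder role functions and main below are hypothetical and only show the dispatch on the program name.

```python
import os


def run_hail(args):
    """Placeholder for the hail role."""
    return ("hail", args)


def run_hbal(args):
    """Placeholder for the hbal role."""
    return ("hbal", args)


# Map of installed names (e.g. symlinks to one binary) to their roles.
ROLES = {
    "hail": run_hail,
    "hbal": run_hbal,
}


def main(argv):
    """Pick the role from the name the program was invoked as."""
    role = os.path.basename(argv[0])
    try:
        handler = ROLES[role]
    except KeyError:
        raise SystemExit("Unknown role '%s', expected one of: %s" %
                         (role, ", ".join(sorted(ROLES))))
    return handler(argv[1:])


result = main(["/usr/bin/hbal", "-L"])
```

A real entry point would pass sys.argv to main; each role name would be a symlink pointing at the single installed binary.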
htools: add a compatibility module
When compiling with the parallel-3.x library, we get a deprecation warning, which makes understanding any other error messages harder. This patch adds a compatibility module that will hold such code for transitioning libraries....
htools: remove no-longer-needed tryMG* functionality
… which was deprecated by the previous patch.
htools: remove ialloc/relocate and multi-evacuate
Since the new node-evacuate mode does both their work and also has better support for multi-group clusters (including handling split instances).
htools: fix potential bug in ialloc/change-group
Currently, the ChangeAll mode of nodeEvac computes the primary group of the instance and then uses the resulting group index for computing the group score. However, during the change-group operation (which...
htools: run IAllocator input through checkData
As the IAllocator backend is using a different data path than the others, it doesn't get the full functionality that loadExternalData does. This results in the current situation where checkData is not run on the input cluster state, which means the node memory properties are...
htools: abstract a function for displaying warnings
This will make it possible to reuse this in IAllocator too.
htools: use maybePrintNodes in hail.hs
This eliminates duplication of code (and was forgotten back when maybePrintNodes was added).
htools: add cluster state saving support to hail
This adds support for saving the cluster state (both pre- and post-iallocator run) to a text file such that it can be fed back into any of the htools commands.
htools: return the final instance map in ialloc
Similar to the previous patch, this returns the final instance map from the iallocator run, which will allow saving the cluster state for further examination/post-processing.
htools: implement post-alloc cluster status display
This patch changes the IAllocator result formatting workflow to return the final node list, which can be then used to display the final node status too; currently only the initial status can be shown, which is...
Update node group iallocator design to use job dependencies
While working on a function to submit jobsets, I realized that we actually don't need them anymore. With the new job dependencies, the iallocator plugin can just generate the right dependencies and gets the...
Fix assertion error on unclean master shutdown
Commit 66bd7445 added an assertion to ensure a finalized job has its “end_timestamp” attribute set. Unfortunately it didn't cover a case when the queue is recovering from an unclean master shutdown.
Make SharedLock._is_owned public
This will be useful for assertions. GanetiLockManager._is_owned is exported, too.
htools: return new state from new IAllocator modes
The old modes already return the node list (as part of AllocSolution); this patch makes the new modes provide this new information.