Document OpNodeMigrate's result for RAPI
- Commit b7a1c8161 changed the LU to generate jobs- Mention documented results in NEWS
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Fail if node/group evacuation can't evacuate instances
If an instance can't be evacuated, only a message would be printed. Withthis change the operation always aborts. Newly added unittests check forthis behaviour.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
LUInstanceRename: Compare name with name
… instead of object with name.
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
LUClusterRepairDiskSizes: Acquire instance locks in exclusive mode
Instances are modified if their disk size doesn't match.
jqueue: Allow zero jobs to be submitted at once
If cmdlib.LUNodeMigrate was called for a node without primary instancesit would try to submit an empty list of jobs. This was never visible viaCLI as there we check the list of primary instances first.
Merge branch 'devel-2.4' into stable-2.5
Signed-off-by: René Nussbaumer <rn@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix queue archive creation with wrong permissions
On a master failover some of the archive dirs might have wrongpermissions in the non-root model. This is due to the nature of nodedstill running as root and the job queue is synced that way. This patchwill fix this behaviour by setting the permissions accordingly....
Ensure permission on the job queue version file
Signed-off-by: René Nussbaumer <rn@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
OpGroupVerifyDisks: Fix wrong result type declaration
If an instance had actually a missing disk, the type check would fail.
RAPI: Make node evacuation actually work
Commit e1f23243 changed te LU and opcode for node evacuation to receivea “mode” parameter (among other things). Commit de40437a changed theRAPI code accordingly, but did so for an earlier version of the firstpatch. Obviously this couldn't work, so here's the fix....
RAPI: Fix resource for replacing disks
Commit d1c172deb4f inadvertently changes the“/2/instances/[instance_name]/replace-disks” resource to use bodyparameters. There were no QA tests and the issue wasn't noticed.
This patch re-introduces support for query parameters and adds a QA...
rpc: Disable HTTP client pool and reduce memory consumption
We noticed that “ganeti-masterd” can use large amounts of memory,especially on large clusters. Measurements showed a single PycURL clientusing about 500 kB of heap memory (the actual usage depends on versions,...
Fix issue when verifying cluster files
If a cluster has any non-master-candidate nodes, those don't contain allfiles (e.g. config.data). With commit aef59ae764dc (March 31st, 2011)the logic was changed and subsequently verifying a cluster with non-mcnodes would complain....
Revert "utils.log: Write error messages to stderr"
This reverts commit 34aa8b7c4bb6f5e2e788108e024c9cd70bdb3431. Writingerror messages to stderr would also include backtraces, something wetried to avoid in the past.
Fix adding nodes after commit 64c7b3831dc
Commit 64c7b3831dc changed the RPC call for verifying SSH connections.Unfortunately this case in adding nodes was missed.
LUClusterVerifyGroup: Spread SSH checks over more nodes
When verifying a group the code would always check SSH to all nodes inthe same group, as well as the first node for every other group. On bigclusters this can cause issues since many nodes will try to connect to...
Optimise cli.JobExecutor with many pending jobs
In the case we submit many pending jobs (> 100) to the masterd, theJobExecutor 'spams' the master daemon with status requests for thestatus of all the jobs, even though in the end it will only choose asingle job for polling....
utils.log: Write error messages to stderr
When “gnt-cluster copyfile” failed it would only print “Copy of file …to node … failed”. A detailed message is written using logging.error.Writing error messages to stderr can be helpful in figuring out whatwent wrong (the messages also go to the log file, but not everyone might...
ssh: Quote strings in error message
Fix handling of cluster verify hooks
The change to enforce boolean results for cluster verify group opcodemissed the HooksCallBack, which uses a very ugly 1/0logic. Furthermore, the logic is wrong, since it unconditionallyresets the verify result to true....
Redistribute the RAPI certificate
This reverts to the old behaviour in Ganeti 2.4 and before.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
RAPI: Fix wrong check on instance shutdown
Commit 7fa310f6d84 (April 1st, 2011) converted the RAPI resource forshutting down an instance to FillOpCode. Unfortunately it missed thefact that the shutdown resource gets its parameters as query arguments.
baserlib: Accept empty body in FillOpcode
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: René Nussbaumer <rn@google.com>(cherry picked from commit c6e1a3eef05674d637570c39f25a799cec7ba187)
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Fix assertion error on unclean master shutdown
Commit 66bd7445 added an assertion to ensure a finalized job has its“end_timestamp” attribute set. Unfortunately it didn't cover a case whenthe queue is recovering from an unclean master shutdown.
Fixes to errors/warnings raised by pylint 0.24
Running pylint 0.24.0 revealed 2 errors and 1 warning. Here is how Ifixed them:
DeprecationWarning fixes for pylint
In version 0.21, pylint unified all the disable-* (and enable-*)directives to disable (resp. enable). This leads to a lot ofDeprecationWarning being emitted even if one uses the recommendedversion of pylint (0.21.1, as stated in devnotes.rst)....
Merge branch 'devel-2.4' into devel-2.5
Conflicts: NEWS (trivial) configure.ac (trivial) daemons/ensure-dirs.in (deleted)
utils: Fix UnescapeAndSplit parsing bug
If a value passed to UnescapeAndSplit ended with a backslash anexception would be raised:
$ gnt-instance modify -H mem=x\\ inst1.example.com[…] e2 = slist.pop(0)IndexError: pop from empty list
Two more PEP8 fixes
cmdlib: Avoid wrapping using backslash
gnt_group: Avoid * magic using keyword arguments (the “pep8” tooldoesn't like the inline comment in this case and will complain aboutspaces around the “*” operator)
PEP8 style fixes
Identified using the “pep8” utility.
Documentation fix for importing with --src-dir option
Signed-off-by: Agata Murawska <agatamurawska@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>(cherry picked from commit b7d7876bd0e9844fab8be28bfa1fd5d563ec7412)
Conflicts:
lib/cmdlib.py (easily fixed)
Fix a parsing issue with DRBD 8.3.11 in the Linux Kernel
In the Linux kernel commit 4b0715f096 introduced a display bug into/proc/drbd which broke our regex.
The bug was first introduced into Linux 2.6.39-rc1. This bug is stillunfixed as of today.
This patch adapt the regular expression to workaround this bug for the...
watcher: Wait for child processes by default
This patch retains the behaviour of ganeti-watcher in previous Ganetiversions.
sphinx_ext: workaround epydoc warning
Similar to commit c29e35f, this works around epydoc breakage byaliasing the module. Makes 'apidoc' pass again on my machine.
Unify some file headers
Remove unnecessary commas, add empty lines where necessary to make themconsistent.
I'm working on a script to check this, but it's not yet ready.
ensure-dirs: Fix epydoc error
Signed-off-by: Agata Murawska <agatamurawska@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
ensure-dirs: Check mode and owner before changing
This avoids many calls to chmod(2) and chown(2), and thereby ctimeupdates.
Since I had to update the unittests anyway I untangled the code a bit,split it into more separate functions and added some more tests....
ensure-dirs: Refine error handling on stat(2)
The “_stat_fn” function is renamed to “_lstat_fn” to reflect itsfunction. The try/except block just wraps calling lstat(2) and nothingelse.
ensure-dirs: Change wording of some messages
ensure-dirs: Implement debug logging
There was no logging at all.
ensure-dirs: Set permissions on job files in queue
This was a regression from 2.4.
ensure-dirs: Set permissions on queue lock file
ensure-dirs: Set correct permissions on ssconf files
The files should be 0444, not 0400. This was a regression from 2.4.
Handle network interfaces without IPs
If the user specified a network interface with no IPs, he would receivean unhelpful "list index out of range" error. Fixed that.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>Signed-off-by: Guido Trotter <ultrotter@google.com>...
Fixed potential unreferenced variable usage
I noticed a path in the code that would use spice_ip_version even ifit was not initialized. This patch fixes it.
Added basic support for SPICE
Implemented the following parameters:- spice_bind- spice_ip_version
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Fix exit code of “gnt-cluster verify”
With commit fcad7225e3fc4 LU-generated jobs are used, but theexit code must still be backwards-compatible.
Small improvements for cluster verify
- Check if BGL is actually owned- Show group name as feedback
watcher: Use locks when querying for resource information
Allow locking to be used via OpQuery
The original design for query2 specifically excluded locking, but nowit's turned out that it would be a good thing to have in watcher. Thispatch adds a new parameter to OpQuery and enables its use in LUQuery. Amissing function is added to LUGroupQuery, a comment clarified in...
opcodes: Add more result checks, add some comments
Some of these will be used by the RAPI documentation.
sphinx_ext: Allow documenting opcode results
Will be used by RAPI documentation.
ht: Allow adding comment to type descriptions
This will be used to add some more details to type descriptions, e.g. onopcode parameters or result values. The implementation is very similarto “WithDesc”.
I chose to use “[…]” after finding “/*…*/” hard to read and spot. At...
Clarify job ID-related type checks, add unittests
Instead of a rather complicated expression only “JobId” is output. JobID lists (like generated by “SubmitManyJobs”) are limited to two-itemlists. Unittests are added.
Change OpClusterVerifyConfig's result, verify results
This patch removes the list of node groups (not used anymore sincecommit fcad7225e3fc) from OpClusterVerifyConfig's result and adds resultverification to all OpClusterVerify* opcodes.
Use LU-generated jobs for verifying cluster
This patch moves the logic for verifying the various node groups in acluster into the master daemon. Job dependencies are used to ensure theconfiguration, which requires the BGL, is verified first.
With this change it will be possible to expose whole-cluster...
opcodes: Use variables for verification parameters
Just some cleanup before the 2.5 release.
mcpu: Specify actual received type on opcode issue
This helped me debug an issue with opcodes.
Use resource kind as OpQuery*'s description
This gives a hint as to what's queried. “QUERY” or“QUERY” are way better than just “QUERY”.
Added helper functions in netutils and related constants
Added the following functions to netutils:- IsValidInterface- GetInterfaceIpAddresses- _GetIpAddressesFromIpOutput
Added the following static methods to netutils.IPAddress:- GetAddressFamilyFromVersion...
Fix epydoc error in rlib2.py
I blindly assumed epydoc would use normal reST, but turns out it usesits own “epytext” in our configuration. Since the latter doesn't supportblockquotes, I just make the paragraph a literal block.
Fix typo in rlib2's docstring
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Benjamin Lipton <benlipton@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Documentation fixes and clarification
- In README, refer to “install.rst”, not “install.html”- In rapi.rst, wrap line longer than 72 characters- In rlib2.py, update and clarify description of POST vs. PUT
gnt-instance: Rename SHUTDOWN* to EXPAND*
Once upon a time these constants were only used for stopping instances,but pretty soon they became more useful. Let's rename them.
rlib2: Exclude oplog/opresult from bulk job list
These fields can get rather large. Excluding them from the big bulk listreduces the amount of data. They are still available via per-jobrequests.
rlib: Expose node group tags
Commit 1ffd26739d3 added support for tagging node groups. Also add acheck for exposed fields.
rapi: Bulk support for jobs
This was requested in issue 181.
Fixed an error in the documentation of _GetKVMVersion
Fixed an epydoc compilation error that I introduced with last commit.
Removed code duplication for calls to _GetKVMVersion
Fix epydoc breakage caused by f8638e288c7a
Changed NET_PORT_CHECK to REQ_NET_PORT_CHECK, to improve consistency
I originally made this change because I needed the OPT_NET_PORT_CHECK,and I am committing it even if I don't need anymore OPT_NET_PORT_CHECKbecause IMO it improves the consistency of the name of the wrappers....
Added check for the ip command at configure time
Also, corrected a few places where the ip command was hardcoded.
Detect globbing patterns as query arguments
Short: this patch enables the use of “gnt-instance list '*.site'”.
Detailed description: This patch changes the command line interface codeto try to deduce the kind of filter from the arguments to a “list”command. If it's a list of plain names an old-style name filter is used....
Allow fixing of split instances via relocate
Currently, the IAllocator code requests strictly that the (set of) groups ofthe nodes we're relocating from is equal to the set of groups we'rerelocating to.
This, however, makes is impossible to fix split instances, since (by...
Further cleanup after multi-evacuate removal
Commit f0edfcf6 removed the parsing of multi-evacuate result, but thecode went from:
if mode in (multi-evac, relocate): … if mode relocate: …
to:
if mode relocate: … if mode == relocate...
Fix bug in IAllocator parsing of Evacuate result
Commit 342f9172 added stricter checks for the iallocator result inevacuate mode, but it does this irrespective of the resultstatus. When the result has failed and (according to the design) thelist of nodes is empty, this code will trigger the following:...
Implement globbing operator for filters
The operators “=*” and “!*” do globbing in filters, e.g.:
$ gnt-instance list --no-headers -o name 'name =* "*.site"'inst1.site.example.com
Zero DRBD metadata before creation
The docstring of the DRBD8 class says:
… The meta device is checked for valid size and is zeroed on create.
which is not done today, hence we havehttp://code.google.com/p/ganeti/issues/detail?id=182:
node1# mkreiserfs -f /dev/xenvg/t8...
Remove iallocator's “multi-evacuate” mode
It is no longer used and has been deprecated in 2.5.
confd.querylib: Remove long-deprecated query mode
This was never used by a stable version.
Add docstring to cmdlib.TLReplaceDisks._FindFaultyDisks
watcher: Fix breakage caused by 9bb69bb52fb9
The first argument to str.split is the separator, not the maximum numberof splits.
LUGroupVerifyDisks: Use _CheckInstanceNodeGroups' result
… instead of getting the list of instances once again from theconfiguration.
cmdlib: Factorize checking node groups' instances
utils.ReadFile: Add pre-read callback
This will be used by the watcher to store the file's fstat(2). It mustbe done from the filehandle.
watcher: Write per-group instance status, merge into global one
Each per-group watcher process writes its own instance status file. Oncethat's done it tries to acquire an exclusive lock on the global file andwill proceed to read all status file, merging them based on each file's...
Merge branch 'stable-2.4'
Fixed a typo in utils/process.py
Signed-off-by: Agata Murawska <agatamurawska@google.com>Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Remove 15-second sleep from LUInstanceCreate
Remove 15 second sleep when wait_for_sync is not set. LUInstanceCreate alreadycalls _WaitForSync with oneshot=True, which already performs an internalwait-loop for disks to start syncing.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>...
Add a readability alias
lu.glm.list_owned becomes lu.owned_locks, which is clearer for thereader.
Also rename three variables (which were before named owned_locks) tomake clearer what they track.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: René Nussbaumer <rn@google.com>
Fix broken object references in docstrings
The module is called “objects”, not “object”.
Add “gnt-instance change-group” command
Add opcode to change instance's group
This is quite similar to evacuating a group, but the lockingis different.
Factorize checking instance's node groups
Remove WATCHER_STATEFILE constant
ganeti-watcher: Split for node groups
This patch brings a huge change to ganeti-watcher to make it aware ofnode groups. Each node group is processed in its own subprocess,reducing the impact of long-running operations.
The global watcher state file, $datadir/ganeti/watcher.data, is replaced...
Lock potential target nodes for group evacuation
All potential target nodes should be locked while calculatinga group evacuation.
Small changes in group evacuation
- Use OpPrereqError in CheckPrereq- Clarify command synopsis