Fix gnt-group --help display
Copy-paste mismatch :)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
(cherry picked from commit 36c70d4ddc508dd1ffdcc806d617d5100f4cb265)
Signed-off-by: Iustin Pop <iustin@google.com>...
Fix hardcoded Xen kernel path
We already have a ./configure-time variable for this, but it seems to be actually unused.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit 3c4afa2e93d9499ce39c1aed575dacb549d35083)...
Fix grow-disk handling of invalid units
The reason why grow-disk was doing:
$ gnt-instance grow-disk instance3 0 -64
Unhandled Ganeti error: Invalid format
Is because it does its own ParseUnit call, and doesn't transform that into a nicer message.
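A minimal sketch of the idea, not the actual Ganeti code: `parse_unit`, `parse_grow_amount` and both exception classes below are hypothetical stand-ins. The only point is that the low-level parse error is caught and re-raised with a message that names the offending value.

```python
class UnitParseError(Exception):
    """Low-level error raised by the (hypothetical) unit parser."""


class UserFacingError(Exception):
    """Error whose message is meant to be shown directly on the CLI."""


# Assumed suffixes; values are multipliers relative to mebibytes.
_SUFFIXES = {"m": 1, "g": 1024, "t": 1024 * 1024}


def parse_unit(text):
    """Parse a size such as '64', '512m' or '2g' into mebibytes."""
    value = text.strip().lower()
    factor = 1
    if value and value[-1] in _SUFFIXES:
        factor = _SUFFIXES[value[-1]]
        value = value[:-1]
    try:
        number = float(value)
    except ValueError:
        raise UnitParseError("Invalid format")
    if number < 0:
        raise UnitParseError("Invalid format")
    return int(number * factor)


def parse_grow_amount(text):
    """Wrap the parser so the caller gets a descriptive message."""
    try:
        return parse_unit(text)
    except UnitParseError as err:
        raise UserFacingError("Invalid disk size '%s': %s" % (text, err))
```

With this wrapper, a call such as `parse_grow_amount("-64")` raises an error that mentions `-64` instead of a bare "Invalid format".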
Accept both PUT and POST in noded
This is a partial cherry-pick from 7530364ddbe949bc34fc26f25ba3f5d921beb021 on master:
Currently, noded requires PUT, even though the semantics of the RPC calls do not match a PUT. We change the code to accept both PUT and POST,...
Preserve bridge MTU in KVM ifup script
Closes: #201 - KVM_IFUP does not set bridge-MTU on tap devices
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit a1ec8695a6b453acdc2fa746a27be73c614b2e87)...
Update synopsis for “gnt-cluster repair-disk-sizes”
Mention that instances can be passed on the CLI when “--help” is used.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
(cherry picked from commit eb5ac108d146644200df98b9f90dae003dcea426)...
Reconcile Makefile.am and test data files
Sorry, forgot this in previous commit.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit 1a1e7ab3f7cfe156152fb69961115a2c85b2a82d)
Workaround changed LVM behaviour
The vgreduce command has changed behaviour from when we initially wrote the code (2.02.02 versus 2.02.66, 4 years delta):
- if there are LVs which will be impacted, it requires --force
- otherwise refuses to proceed, but it still returns exit code 0...
Enable lvmstrap to run under Linux 3.x
Extend the kernel version check to also accept Linux 3.x as valid.
Signed-off-by: Alexander Schreiber <als@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit 1bf72492d381aacb5c488f1a87ac7665b9ddc6c7)...
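A sketch of what such an extended kernel version check could look like. This is not the lvmstrap code: the function name is hypothetical, and the assumption that the previous check accepted only 2.6.x is mine, not stated in the commit message.

```python
import re

# Matches the leading "major.minor" of a release string such as the one
# reported by "uname -r".
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)")


def is_supported_kernel(release):
    """Return True for 2.6.x and 3.x release strings (assumed policy)."""
    match = _RELEASE_RE.match(release)
    if not match:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    # The second clause is the extension: any 3.x release is accepted.
    return (major == 2 and minor == 6) or major == 3
```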
Add a default PATH variable to OS scripts env
In commit 896a03f6 I cleaned up the environment for OS scripts, however I think that was a bit too extreme - it breaks our own instance-debootstrap hooks, because for example dpkg (called from the grub script) requires PATH to be set....
Move hooks PATH environment variable to constants
Move the contents of the PATH environment variable for hooks to constants, and use its value in the code and in the hooks documentation.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>...
Add note to the install doc about bridge MAC issues
Thanks to Faidon Liambotis for explaining this on the external IRC channel.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Faidon Liambotis <paravoid@gmail.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Fix exception re-raising in Python Luxi clients
Commit e687ec01 (present in 2.5 since the 2.5 beta 3) did consistency fixes across the code-base. Unfortunately this was done without enough checks on the actual meaning of one of the fixes, which means error...
Fix LVM volume listing with newer LVM
Per commit 0304f0e, newer LVM has extended the lv_attr field. However, that commit was incomplete as we examine this attribute in another place in the code.
Thanks to user alperhome, the _LVSLINE_REGEX in lib/backend.py also...
Bump version for 2.5.0 final release
Also update NEWS file.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Merge branch 'devel-2.4' into stable-2.5
configure.ac: Fix “too many arguments” error
If GHC_PKG_QUICKCHECK contains multiple values, the test would fail with “too many arguments”.
Fix extra whitespace
Sorry, didn't catch this before…
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit 54b010cad1ea0a536ed037bf315a04dd1c079964)...
Further fixes concerning drbd port release
Commit 3b3b1bc does not entirely fix the bug introduced in commit f396ad8. It fixes consistency of config data in permanent storage, but does not ensure consistency in data held in runtime memory of masterd.
The bug of duplicate ports is still triggered when LUInstanceRemove()...
Fix a bug concerning TCP port release
Commit f396ad8 returns the TCP port used by DRBD disk back to the TCP/UDP port pool using AddTcpUdpPort().
However, AddTcpUdpPort() writes the config on every invocation,using _WriteConfig(). This causes two problems:...
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
LUOobCommand: acquire BGL in shared mode
Fixed a typo so that now LUOobCommand acquires the BGL in shared mode, as intended.
Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
LUNodeAdd: Verify version in Prereq
There are other ways to leave the cluster in a broken state than just the version check. However they are not very trivial to fix in 2.5. So leave it up to 2.6 for a nicer fix.
Signed-off-by: René Nussbaumer <rn@google.com>...
Fix LV status parsing to accept newer LVM
LVM version 2.02.93 (or at least, sometime after .88) has extended the lv_attr field with two more flags; we only care about the first digit, so let's change the "!= 6" check to "< 6".
Thanks to Robin H Johnson <robbat2@gentoo.org> for finding this issue....
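The relaxed length check can be sketched as follows. The function name and the decision to keep only the first six characters are assumptions for illustration; this is not the code from the patch.

```python
def parse_lv_attr(lv_attr):
    """Validate an lv_attr string and return the part that is inspected.

    Older LVM versions print six characters; newer ones append more flags.
    Requiring "at least six" instead of "exactly six" accepts both.
    """
    if len(lv_attr) < 6:  # the stricter variant would be: != 6
        raise ValueError("lv_attr too short: %r" % (lv_attr,))
    # Assumption: only the leading six characters are ever consulted.
    return lv_attr[:6]
```

Both `"-wi-ao"` and the longer `"-wi-ao--"` pass this check, while a shorter string is still rejected.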
Bump version for 2.5.0~rc6 release
Revert "Stop acquiring BGL for LUXI queries"
This reverts commit 0fa753bad2cf5a0cf88953347e5da3aebbf21956.
Turns out there are more queries acquiring locks than we'd like. This patch goes to version 2.6 and a separate patch fixes the immediate issues in LUClusterVerifyConfig....
LUClusterVerifyConfig: Share BGL, acquire all locks in shared mode
Instead of acquiring the BGL in exclusive mode (which blocks all other operations), we acquire all locks for groups, nodes and instances in shared mode before verifying the configuration....
KVM: don't add -nographic using spice
This fixes issue 222.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Stop acquiring BGL for LUXI queries
Short description: This fixes an issue whereby masterd would become unresponsive on the LUXI socket, leading to client timeouts. While made worse in 2.5, the underlying issue was already present in 2.4.
Longer description: Until now all LUXI queries would acquire the BGL...
Fix type error in LUInstanceChangeGroup
If a specific list of groups has been requested, then the code used that, without transforming it to a (frozen)set first, which results in:
unsupported operand type(s) for &: 'list' and 'frozenset'
Trivial fix is to do that in the 'then' branch....
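The error and the fix can be reproduced in a few lines. The function and variable names below are hypothetical; the sketch assumes only what the message states, namely that a requested list ends up on the left-hand side of `&` with a frozenset on the right.

```python
def pick_target_groups(requested, available):
    """Intersect the requested group names with the available ones.

    `available` is a frozenset; `requested` is a list or None.
    """
    if requested:
        # Convert first: a plain list does not support the & operator.
        wanted = frozenset(requested)
    else:
        wanted = available
    return wanted & available
```

Without the conversion, an expression such as `["g1"] & frozenset(["g1"])` raises the TypeError quoted above.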
Fix Makefile.am compatibility with automake 1.11.2
Automake 1.11.2 made the following change:
Unfortunately, this breaks our Makefile.am (issue 216) exactly because...
Fix type check for OpQuery.filter
Just using ht.TListOf as a type check doesn't work correctly. The function must be called with the expected item type. In this specific case TListOf was always called with the filter as a value, and the result of that call evaluated to truth. Since filters can be quite...
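The pitfall is easy to reproduce with a check factory. The helpers below are simplified stand-ins written for this illustration, not the real `ht` module: `t_list_of` plays the role of TListOf and `t_any` is an item check that accepts everything.

```python
def t_list_of(item_check):
    """Build a checker that accepts lists whose items pass item_check."""
    def check(value):
        return isinstance(value, list) and all(item_check(i) for i in value)
    return check


def t_any(_value):
    """Item check that accepts any value."""
    return True


# Wrong: the factory itself is used as the check. Calling it with a value
# returns a new function object, which is always truthy.
wrong_check = t_list_of

# Right: call the factory with the item check to obtain the real checker.
right_check = t_list_of(t_any)
```

Here `bool(wrong_check("not a list"))` is true for any input, while `right_check("not a list")` is false.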
Fix explanation of gnt-node evacuate --primaries-only
Furthermore, correct the --help display on evacuate.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Makefile.am: fix permissions for Python scripts on install
Some Python scripts in /usr/lib/ganeti/ were getting the wrong permissions (their 'x' bit was cleared). This patch fixes that behavior.
This patch renames the variable 'dist_tools_PYTHON' to 'python_scripts'....
devel/upload: Fix permissions for installed directories
Permissions for the directories created during install depended on the umask of the user running the script. Now umask is reset inside the script to remove such dependency.
Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>...
Fix cluster verification issues on multi-group clusters
This patch attempts to fix a number of issues with “gnt-cluster verify” in presence of multiple node groups and DRBD8 instances split over nodes in more than one group.
- Look up instances in a group only by their primary node (otherwise...
Migrate: don't check for free memory on cleanup
Cleanup just updates the config with the correct location of the instance, or informs of its down status, but never starts it. As such there's no point in checking for enough free memory. Actually this check...
Bump version to 2.5.0~rc5, update NEWS
KVM: support version reported by 1.0
This of course was working for all the rcs, but broke with 1.0 itself.
In addition:
- split between running kvm --version and parsing its output
- unittest parsing for various known --help outputs
- updated NEWS file...
jqueue: Fix epylint errors introduced in 37d76f1e4
jqueue: Fix deadlock between job queue and dependency manager
When an opcode is about to be processed its dependencies are evaluated using “_JobDependencyManager.CheckAndRegister”. Due to its nature that function requires a lock on the manager's internal structures. All of this happens while the job queue...
Add UnescapeAndSplit unittest for multi-escapes
This would have caught the bug in the first place. Argh, hand-generated test cases!
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Fix a bug in command line option parsing code
Fix bug affecting command line options of "keyval" type. Although escaping commands with \ is supported, it is not applied to the input recursively.
Signed-off-by: Nikos Skalkotos <skalkoto@grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>...
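One way to handle any number of escaped separators is a single left-to-right scan, sketched below. This is a simplified scanner written for illustration and is not Ganeti's UnescapeAndSplit: the function name, the default separator and the rule that a backslash makes the next character literal are all assumptions.

```python
def unescape_and_split(text, sep=","):
    """Split text on sep, treating a backslash as an escape character."""
    parts = []
    current = []
    chars = iter(text)
    for char in chars:
        if char == "\\":
            # Take the following character literally; a trailing lone
            # backslash is kept as a backslash.
            current.append(next(chars, "\\"))
        elif char == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(char)
    parts.append("".join(current))
    return parts
```

Because every character is visited exactly once, an input with several escapes in one element, such as `a\,b\,c,d`, yields `["a,b,c", "d"]`.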
ConfigWriter: Fix epydoc error
The parameter is called “mods”, not “modes”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
(cherry picked from commit 1730d4a1ab56ef36d082b614d3d0ab13f3e14a85)
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
LUGroupAssignNodes: Fix node membership corruption
Note: This bug only manifests itself in Ganeti 2.5, but since the problematic code also exists in 2.4, I decided to fix it there.
If a node was assigned to a new group using “gnt-group assign-nodes” the...
Fix pylint warning on unreachable code
Commit c50452c3186 added an exception when all instances should be evacuated off a node, but did so in a way which made pylint complain about unreachable code.
LUNodeEvacuate: Disallow migrating all instances at once
There is a design issue in the iallocator interface which prevents us from doing this.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
LUNodeEvacuate: Locking fixes
When evacuating a node, only an assertion without informative text was used to check if the necessary node locks had been acquired. This was on top of evaluating the list of nodes without having a node group lock, so this was changed as well....
Fix error when removing node
ConfigWriter.GetAllInstancesInfo returns a dictionary, not a list. Removing a node would fail with “too many values to unpack”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
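The failure mode is typical of iterating a dictionary as if it were a list of pairs. The sketch below uses a hypothetical data layout (instance name mapped to a dict with a `primary_node` key) and a hypothetical function name; how the actual code consumed the return value is not shown in the message.

```python
def instances_on_node(all_instances, node_name):
    """Return the sorted names of instances whose primary is node_name.

    `all_instances` maps instance name -> dict with a 'primary_node' key.
    """
    # Iterating a dict directly yields only its keys, so unpacking each
    # element as a (name, info) pair fails; iterate over items() instead.
    return sorted(name for name, info in all_instances.items()
                  if info["primary_node"] == node_name)
```

Writing `for name, info in all_instances` instead tries to unpack each key string into two variables, which raises ValueError for any name longer than two characters.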
htools: rework message display construction
While diagnosing some (unrelated) memory usage in htools, I've stumbled upon some very bad behaviour in checkData: mapAccum is non-strict, and the tuple we use also, so that results in the list of list of messages being very bad space-wise (hundreds of MB of memory...
hbal: handle empty node groups
This patch changes an internal assert (which can only be triggered when a node group is empty) into properly handling this case (and returning empty node/instance lists).
While we could handle this in the backend (Cluster.splitNodeGroup)...
Document OpNodeMigrate's result for RAPI
- Commit b7a1c8161 changed the LU to generate jobs
- Mention documented results in NEWS
Ensure unused ports return to the free port pool
Ensure ports previously allocated by calling ConfigWriter's AllocatePort() are returned to the pool of free ports when no longer needed:
Re-wrap a paragraph to eliminate a sphinx warning
This just makes sure that the paragraph doesn't contain lines that start with :, which make Sphinx (1.0.7) complain.
Fail if node/group evacuation can't evacuate instances
If an instance can't be evacuated, only a message would be printed. With this change the operation always aborts. Newly added unittests check for this behaviour.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
LUInstanceRename: Compare name with name
… instead of object with name.
LUClusterRepairDiskSizes: Acquire instance locks in exclusive mode
Instances are modified if their disk size doesn't match.
Update NEWS for 2.5.0~rc4
I forgot this in the previous patch.
Bump version to 2.5.0~rc4
Merge branch 'stable-2.4' into stable-2.5
Conflicts: configure.ac: Trivial
jqueue: Allow zero jobs to be submitted at once
If cmdlib.LUNodeMigrate was called for a node without primary instances it would try to submit an empty list of jobs. This was never visible via CLI as there we check the list of primary instances first.
Update NEWS and increase to 2.4.5
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
hail: don't select the primary as new secondary
This just adds the primary node of the instance as 'non-allocable' during the choosing of the new secondary.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit 7073b3a86856bcd8d8a62c0b72f82deaabb8d8f1)...
hail: add an extra safety check in relocate
If we select the primary as new secondary, better to fail than return wrong data to Ganeti.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit f25508bef4e85032f0468e5a6f0f8930ff154e66)...
Bump version to 2.5.0~rc3
Fix queue archive creation with wrong permissions
On a master failover some of the archive dirs might have wrong permissions in the non-root model. This is due to the nature of noded still running as root and the job queue is synced that way. This patch will fix this behaviour by setting the permissions accordingly....
Ensure permission on the job queue version file
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
OpGroupVerifyDisks: Fix wrong result type declaration
If an instance had actually a missing disk, the type check would fail.
RAPI: Make node evacuation actually work
Commit e1f23243 changed the LU and opcode for node evacuation to receive a “mode” parameter (among other things). Commit de40437a changed the RAPI code accordingly, but did so for an earlier version of the first patch. Obviously this couldn't work, so here's the fix....
Bump version to 2.5.0~rc2
Conflicts: NEWS: Trivial
Update NEWS for unreleased 2.4.5
I need this for another 2.5 release.
RAPI: Fix resource for replacing disks
Commit d1c172deb4f inadvertently changes the “/2/instances/[instance_name]/replace-disks” resource to use body parameters. There were no QA tests and the issue wasn't noticed.
This patch re-introduces support for query parameters and adds a QA...
rpc: Disable HTTP client pool and reduce memory consumption
We noticed that “ganeti-masterd” can use large amounts of memory, especially on large clusters. Measurements showed a single PycURL client using about 500 kB of heap memory (the actual usage depends on versions,...
hail: Fix result for node evacuation
According to the iallocator documentation the “node-evacuate” call needs to return a list of jobs, not a list of lists of jobs.
Bump version to 2.5.0~rc1
Fix issue when verifying cluster files
If a cluster has any non-master-candidate nodes, those don't contain all files (e.g. config.data). With commit aef59ae764dc (March 31st, 2011) the logic was changed and subsequently verifying a cluster with non-mc nodes would complain....
Revert "utils.log: Write error messages to stderr"
This reverts commit 34aa8b7c4bb6f5e2e788108e024c9cd70bdb3431. Writing error messages to stderr would also include backtraces, something we tried to avoid in the past.
Fix adding nodes after commit 64c7b3831dc
Commit 64c7b3831dc changed the RPC call for verifying SSH connections. Unfortunately this case in adding nodes was missed.
LUClusterVerifyGroup: Spread SSH checks over more nodes
When verifying a group the code would always check SSH to all nodes in the same group, as well as the first node for every other group. On big clusters this can cause issues since many nodes will try to connect to...
Optimise cli.JobExecutor with many pending jobs
In the case we submit many pending jobs (> 100) to the masterd, the JobExecutor 'spams' the master daemon with status requests for the status of all the jobs, even though in the end it will only choose a single job for polling....
listrunner: Don't pass arguments if there are none
If no arguments were specified the “exec_args” variable was “None”, leading to the command being run as “… ./… None”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>...
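The pattern behind this kind of bug is string formatting with an unset value. A small sketch, with a hypothetical `build_command` helper that is not taken from listrunner:

```python
def build_command(script, exec_args=None):
    """Build the command line, appending arguments only when given."""
    command = ["./%s" % script]
    # Formatting None with %s would produce the literal text "None",
    # so the arguments are appended only if there are any.
    if exec_args:
        command.append(exec_args)
    return " ".join(command)
```

Formatting unconditionally, as in `"./%s %s" % (script, exec_args)`, is what yields a trailing `None` when no arguments are passed.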
ssh: Quote strings in error message
utils.log: Write error messages to stderr
When “gnt-cluster copyfile” failed it would only print “Copy of file … to node … failed”. A detailed message is written using logging.error. Writing error messages to stderr can be helpful in figuring out what went wrong (the messages also go to the log file, but not everyone might...
Add signal handling doc to hbal man page
Also remove a bug note, since hbal has been able to execute jobs directly for a long time now.
Fix handling of cluster verify hooks
The change to enforce boolean results for cluster verify group opcode missed the HooksCallBack, which uses a very ugly 1/0 logic. Furthermore, the logic is wrong, since it unconditionally resets the verify result to true....
Redistribute the RAPI certificate
This reverts to the old behaviour in Ganeti 2.4 and before.
QA: Add tests for instance start/stop via RAPI
This would have detected the issue fixed in the previous patch.
RAPI: Fix wrong check on instance shutdown
Commit 7fa310f6d84 (April 1st, 2011) converted the RAPI resource for shutting down an instance to FillOpCode. Unfortunately it missed the fact that the shutdown resource gets its parameters as query arguments.
baserlib: Accept empty body in FillOpcode
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
(cherry picked from commit c6e1a3eef05674d637570c39f25a799cec7ba187)
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Fix assertion error on unclean master shutdown
Commit 66bd7445 added an assertion to ensure a finalized job has its “end_timestamp” attribute set. Unfortunately it didn't cover a case when the queue is recovering from an unclean master shutdown.
Version bump for 2.5.0~beta3
Makefile: Use $(LN_S) instead of “ln -s”
Some platforms apparently don't support “ln -s”, otherwise Autoconf wouldn't have AC_PROG_LN_S.
Fixes to errors/warnings raised by pylint 0.24
Running pylint 0.24.0 revealed 2 errors and 1 warning. Here is how I fixed them: