Generate a shared HMAC key at cluster init time
This key is shared on all nodes (via cmdlib._RedistributeAncillaryFiles) and will be used for HMAC authentication of confd messages.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Fix unittests broken by commit 2bb5c9115f
  File "../test/ganeti.hooks_unittest.py", line 239, in setUp
    self.lu = FakeLU(FakeProc(), self.op, self.context, None)
  File "…/ganeti/cmdlib.py", line 92, in __init__
    self.LogStep = processor.LogStep
AttributeError: FakeProc instance has no attribute 'LogStep'...
cmdlib: Move code doing disk replacements into separate class
This class will be used for a new opcode to evacuate nodes.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
cmdlib: Pass config and rpc objects directly to IAllocator
Before, IAllocator would access them using “self.lu.cfg” and “self.lu.rpc”. It shouldn't know about the internals of the LU.
Fix backend import errors from GetHypervisorClass
The merge of commit 360b0dc into branch-2.1 broke import of backend, since it uses hypervisor.GetHypervisor() which returns an instance of the hypervisor. Some of the hypervisors create directories at init time,...
Merge branch 'next' into branch-2.1
Conflicts: lib/backend.py: non-trivial conflict but easy to solve
backend: Only build once the list of upload files
The list of upload files is currently built at every UploadFile() call. This patch moves it to a separate variable which is initialized only once.
This won't make much difference but I regard it as cleanup....
Merge commit 'origin/next' into branch-2.1
Conflicts: lib/cli.py: trivial extra empty line
Fix gnt-instance reinstall
Commit 55efe6dabe48e5c37dc1ff6099e0bb8afde7a468 "Convert instance reinstall to multi instance model" actually broke instance reinstall for single-instance cases. This one-liner fixes it.
Signed-off-by: Iustin Pop <iustin@google.com>...
Fix a couple of epydoc warnings
It seems epydoc needs fully-qualified references, and doesn't deal with relative ones (not even in the current module) if there are any ambiguities.
There are other epydoc warnings, in the rapi docstrings, but those are left as-is as they're removed in 2.1....
job queue: fix loss of finalized opcode result
Currently, unclean master daemon shutdown overwrites all of a job's opcode status and result with error/None. This is incorrect, since any already finished opcode(s) should have their status and result preserved, and only not-yet-processed opcodes should be marked as...
Switch gnt-debug submit-job to JobExecutor
Currently gnt-debug submits jobs individually, but in 2.1 JobExecutor uses the optimized SubmitManyJobs luxi call and as such should be used whenever multiple jobs need to be submitted.
This patch converts gnt-debug submit-job to use it and also removes an...
Convert instance reinstall to multi instance model
This patch converts ‘gnt-instance reinstall’ from single-instance to multi-instance model; since this is dangerous, it's required to pass “--force --force-multiple” to skip the confirmation.
gnt-instance batch-create: use the job executor
This small patch changes the batch create functionality to use the job executor instead of single-job submits.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>...
Modify cli.JobExecutor to use SubmitManyJobs
This patch changes the generic "multiple job executor" to use the many jobs submit model, which automatically makes all its users use the new model.
This makes, for example, startup/shutdown of a full cluster much more...
Add a luxi call for multi-job submit
As a workaround for the job submit timeouts that we have, this patch adds a new luxi call for multi-job submit; the advantage is that all the jobs are added to the queue and only afterwards can the workers start processing them....
job queue: fix interrupted job processing
If a job with more than one opcode is being processed, and the master daemon crashes between two opcodes, we have the first N opcodes marked successful, and the rest marked as queued. This means that the overall...
Fix an error path in job queue worker's RunTask
In case the job fails, we try to set the job's run_op_idx to -1. However, this is the wrong variable, which wasn't detected until the slots addition. The correct variable is run_op_index.
Add slots on objects in jqueue
Adding slots to _QueuedOpCode decreases memory usage (of these objects) by roughly four times. It is a lesser change for _QueuedJobs.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
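A minimal sketch of what slots change, assuming a stand-in class: `QueuedOpCodeSketch` and its attribute names are illustrative and not the real _QueuedOpCode definition. Objects with `__slots__` carry no per-instance `__dict__`, which is where the memory saving comes from, and assigning to an undeclared attribute fails, which is how a misspelled name such as run_op_idx gets noticed.

```python
class QueuedOpCodeSketch(object):
    """Stand-in for a queued opcode; attribute names are illustrative."""
    __slots__ = ["input", "status", "result", "log"]

    def __init__(self, op):
        self.input = op
        self.status = "queued"
        self.result = None
        self.log = []


op = QueuedOpCodeSketch("OP_EXAMPLE")

try:
    # Not listed in __slots__, so this raises instead of silently
    # creating a new attribute.
    op.run_op_idx = -1
    typo_accepted = True
except AttributeError:
    typo_accepted = False
```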
ganeti.initd: Pass $*_ARGS to programs when restarting them
Optimize OpCode loading
This patch converts the opcode loading to a pre-built map (at import time) instead of iteration over the globals dict at each call.
Microbenchmarks show that this should be around three times faster, and burnin still passes.
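The pre-built map can be sketched like this. The class names, the `OP_ID` values and the helper names are assumptions made for illustration; the actual code in opcodes.py may differ.

```python
class OpCode(object):
    """Illustrative base class; OP_ID identifies the opcode on the wire."""
    OP_ID = "OP_ABSTRACT"


class OpExampleDelay(OpCode):
    OP_ID = "OP_EXAMPLE_DELAY"


class OpExampleVerify(OpCode):
    OP_ID = "OP_EXAMPLE_VERIFY"


def _build_op_map():
    """Scan the module globals once and index opcode classes by OP_ID."""
    return dict(
        (obj.OP_ID, obj)
        for obj in list(globals().values())
        if isinstance(obj, type) and issubclass(obj, OpCode)
        and obj is not OpCode
    )


# Built once, at import time.
OP_MAPPING = _build_op_map()


def load_opcode(op_id):
    """Instantiate the opcode class for op_id with a single dict lookup."""
    try:
        return OP_MAPPING[op_id]()
    except KeyError:
        raise ValueError("Unknown opcode ID %r" % (op_id,))
```

Each load is then one dictionary lookup rather than a scan over every module-level name.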
Yet another fallout from the pylint fixes
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
Merge branch 'master' into next
Fix another issue with hypervisor_name change
Update NEWS and version for 2.0.2 release
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Improve the description of node flags in man page
[iustin@google.com: slightly reworded the explanation for offline and changed the commit message]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Add enabled hypervisors to TestConfigRunner
This parameter is now mandatory for the cluster config to work.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Add a few more checks to verify config
- Check that the enabled hypervisors list is valid
- Check that the master node is a valid node
Make sure enabled_hypervisors list is valid
Get rid of the default_hypervisor slot
Currently we have both a default_hypervisor and an enabled_hypervisors list. The former is only settable at cluster init time, while the latter can be changed with cluster modify.
This becomes cumbersome in a few ways: at cluster init time for example...
design-2.1: Update OS Flavours section
This reflects a discussion we had, according to which the full "parameters" implementation is too heavyweight for 2.1, and we should have a partial version for now, and decide again later.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Change default stripe count to 1
In order not to change the default during a stable series, we modify configure.ac to default to one stripe, in effect keeping the status quo (well, minus the LVM Attach() changes).
cmdlib: Use dict.fromkeys instead of custom loop
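For reference, the two forms compare as follows. The node names are made up and the loop is only a guess at the shape of the code being replaced; the cmdlib code itself is not shown in this log.

```python
node_names = ["node1", "node2", "node3"]

# Custom loop: one entry per key, all with the same initial value.
result_loop = {}
for name in node_names:
    result_loop[name] = None

# Equivalent single call; the value defaults to None.
result_fromkeys = dict.fromkeys(node_names)

# Caveat: a mutable default is shared between all keys, so fromkeys
# only fits immutable initial values.
shared = dict.fromkeys(node_names, [])
shared["node1"].append("x")
```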
Simplify InitConfig and remove SimpleConfigWriter
InitConfig currently creates the cluster config_data, then puts it into a dict, passes it to SimpleConfigWriter to load it from a dict (which just reuses the dict value) and then saves it. The SimpleConfigWriter is...
InitCluster, don't use SimpleConfigWriter
InitConfig returns a SimpleConfigWriter to InitCluster, which then passes it on to ssh.WriteKnownHostsFile, which extracts a couple of values from it. One line later the full ConfigWriter is initialized.
By initializing it one line before we can pass the full writer to...
Fix python 2.4 compatibility
I got overexcited and forgot we have to remain compatible with python 2.4. With this patch we move from sha256 to sha1 for hmac authenticated serialized messages, and we handle both newer and older python, by importing the right module for each....
Use full-stripe size in LVM growth
LVM has issues when growing striped volumes, so it's best to specify the growth in exact multiples of the full stripe size (as precise as possible). For this we need to do a couple of changes:
 - in LVM Attach(), we query additionally the VG extent size and the LV...
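The arithmetic can be sketched as below, under two assumptions that the truncated message does not confirm: that the full stripe size is the extent size multiplied by the stripe count, and that the requested growth is rounded up. The function and parameter names are hypothetical.

```python
def round_up_to_full_stripe(amount_mib, extent_size_mib, stripes):
    """Round a growth amount up to a multiple of the full stripe size.

    Assumes full stripe size = extent size * number of stripes.
    """
    full_stripe = extent_size_mib * stripes
    rest = amount_mib % full_stripe
    if rest:
        amount_mib += full_stripe - rest
    return amount_mib
```

With 4 MiB extents and 3 stripes the full stripe is 12 MiB, so a request for 100 MiB becomes 108 MiB, while a request for 24 MiB is left unchanged.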
Remove ConfigWriter.InitConfig
It's been replaced by a simpler bootstrap.InitConfig function, which does the same job, and is currently unused.
Conflicts:
daemons/ganeti-masterd...
Remove SimpleConfigWriter.SetMasterNode
This function is not used.
_GenerateDiskTemplate: use base_index in the name
Currently if a disk is added later the base_index is not considered, and all the disks are called disk0. This patch fixes it.
ganeti-masterd: avoid SimpleConfigReader
SimpleStore is a lot less heavyweight than SimpleConfigReader, and to just get the master name we can use that. This is the only usage of SimpleConfigReader currently, but we're not going to delete the class, as new usages will come in for ganeti-confd (in 2.1). Using it there,...
HMAC authenticated json messages
This patch adds HMAC authenticated json messages to the serializer. The new interface works on any json-encodable data type, and can sign it with a private key and an optional salt. The same private key must be used upon message loading to verify the message....
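A minimal sketch of signing and verifying a json message with a key and an optional salt. The function names and the envelope layout (`msg`, `salt`, `hmac`) are assumptions for illustration, not the serializer's actual interface; sha1 is used because a later fix in this log mentions moving from sha256 to sha1.

```python
import hashlib
import hmac
import json


def dump_signed_json(data, key, salt=None):
    """Serialize data and wrap it with an HMAC over salt + message."""
    txt = json.dumps(data, sort_keys=True)
    if salt is None:
        salt = ""
    digest = hmac.new(key, (salt + txt).encode("utf-8"),
                      hashlib.sha1).hexdigest()
    return json.dumps({"msg": txt, "salt": salt, "hmac": digest})


def load_signed_json(blob, key):
    """Verify the HMAC with the same key, then return (data, salt)."""
    envelope = json.loads(blob)
    msg, salt = envelope["msg"], envelope["salt"]
    expected = hmac.new(key, (salt + msg).encode("utf-8"),
                        hashlib.sha1).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, envelope["hmac"]):
        raise ValueError("HMAC verification failed")
    return json.loads(msg), salt
```

The key is passed as bytes, for example `dump_signed_json({"a": 1}, b"secret", salt="123")`; loading the result with a different key raises ValueError.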
rapi: Implement /2/nodes/[node_name]/role resource
This resource can be used to retrieve and set the role of a node.
rapi: Add generic “force” parameter
cmdlib: Fix typo in LUQueryClusterInfo
This was broken by my pylint fixes patch.
RAPI: implement instance reinstall
This patch adds instance reinstall to RAPI, with two optional parameters:
 - ‘os’, in order to change the OS on reinstall
 - ‘nostartup’, in order to leave the instance down after reinstall
The call will first shutdown the instance, then reinstall it, and unless...
Extend call_node_start_master rpc with no_voting
When the parameter is set to True and start_daemons is also True, ganeti-masterd will be started with the new --no-voting --yes-do-it options.
This new option is set to True only on masterfailover, when no_voting is...
Create a new --no-voting option for masterfailover
This allows failing over in certain corner cases, such as a 2 node cluster with one node down. The man page is also updated to document this dangerous option and how to recover from this situation.
ganeti-masterd: allow non-interactive --no-voting
This will be used by ganeti-noded to start ganeti-masterd in a --no-voting masterfailover.
Fix pylint warnings
Add custom pylintrc
bootstrap: Don't leak file descriptor when generating SSL certificate
Fix problem with EAGAIN on socket connection in clients
If a user used ^Z to stop the program, poll() in socket.recv would return EAGAIN due to SIGSTOP. This patch changes luxi.Transport.Recv to ignore EAGAIN.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
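Ignoring EAGAIN on receive can be sketched as a retry wrapper. This is not the luxi.Transport.Recv code: the function name, the `recv_fn` callable and the retry cap are hypothetical, and a real client would normally wait for the socket to become readable again instead of retrying immediately.

```python
import errno


def recv_ignoring_eagain(recv_fn, bufsize, max_retries=100):
    """Call recv_fn(bufsize), retrying when it fails with EAGAIN.

    recv_fn stands in for a socket's recv method. Other errors are
    re-raised unchanged.
    """
    for _ in range(max_retries):
        try:
            return recv_fn(bufsize)
        except OSError as err:
            if err.errno != errno.EAGAIN:
                raise
    raise OSError(errno.EAGAIN,
                  "still EAGAIN after %d retries" % max_retries)
```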
Fix some typos
Increase maximum accepted size for a DRBD meta dev
With the change to striped LVs, the actual size of a meta device (which is small) can be more than we expected (for non-striped LVs). This patch increases the accepted size from 160MB to 1GB, and updates the...
Cleanup config data when draining nodes
Currently, when draining nodes we reset their master candidate flag, but we don't instruct them to demote themselves. This leads to “ERROR: file '/var/lib/ganeti/config.data' should not exist on non master candidates...
Fix node readd issues
This patch fixes a few node readd issues.
Currently, the node readd consists of two opcodes:
 - OpSetNodeParms, which resets the offline/drained flags
 - OpAddNode (with readd=True), which reconfigures the node
The problem is that between these two, the configuration is inconsistent...
backend.DemoteFromMC: don't fail for missing files
If the config file is missing when the DemoteFromMC() function is called, it will raise a ProgrammerError. Instead of changing the utils.CreateBackup() file which is called from multiple places, for now we only change the DemoteFromMC() function to not call it if the file is...
Allow GetMasterCandidateStats to ignore some nodes
This patch modifies ConfigWriter.GetMasterCandidateStats to allow it to ignore some nodes in the calculation, so that we can use it to predict cluster state without some nodes (which we know we will modify, and thus...
Fix error message for extra files on non MC nodes
Currently the message for extraneous files on non master candidates is confusing, to say the least. This hopefully makes it clearer.
Merge branch 'master' into branch-2.1
Rename the volume_list RPC call to lv_list
There are volume-related rpc calls. This patch renames the ‘volume_list’ call to ‘lv_list’ to make its purpose clearer.
GenericMain, handle ParameterError from _ParseArgs
Before, this case was not covered, and a stack trace was printed.
check_ident_key_val, handle no_ and - prefixes
If an ident member of an IdentKeyVal relationship starts with no_ or -, handle it the same way we do for a key. Some unittests are added to check that check_ident_key_val behaves as expected.
This patch also changes ForceDictType to, for now, fail on such an...
_SplitKeyVal with no data return an empty dict
If an empty string is passed to _SplitKeyVal, we should return {}, rather than {'': True}. Also test for the correct behavior.
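The empty-input behavior can be sketched with a simplified parser. This is not the real _SplitKeyVal: the function name is hypothetical, and mapping the no_ prefix to False and the - prefix to None is an assumption made for illustration.

```python
def split_key_val(data):
    """Parse 'k1=v1,k2,no_k3,-k4' into a dict (simplified sketch)."""
    kv = {}
    if data:  # an empty string yields {} instead of {'': True}
        for elem in data.split(","):
            if "=" in elem:
                key, val = elem.split("=", 1)
            elif elem.startswith("no_"):
                key, val = elem[len("no_"):], False  # assumed meaning
            elif elem.startswith("-"):
                key, val = elem[1:], None            # assumed meaning
            else:
                key, val = elem, True
            if key in kv:
                raise ValueError("Duplicate key %r" % key)
            kv[key] = val
    return kv
```

Without the `if data` guard, `"".split(",")` returns `[""]`, which is what produces the spurious `{'': True}` entry.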
Introduce OS api version 15
Also, since Ganeti 2.1 will be compatible with both 10 and 15, change the OS_API_VERSION constant to be an OS_API_VERSIONS set, and update the places in the code that used that constant to use something else.
In particular:
 - in the qa for now we just create a fake version 10 OS...
_OSOndiskAPIVersion: save a loop
The api_versions list is first stripped and then converted to integer. This patch combines the two operations.
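Combining the two passes might look like this. The function names and the one-version-per-line input are assumptions; the original code is not shown in this log.

```python
def parse_api_versions_two_loops(lines):
    """Before: strip in one pass, convert in another."""
    stripped = [line.strip() for line in lines]
    return [int(version) for version in stripped]


def parse_api_versions(lines):
    """After: strip and convert in a single pass."""
    return [int(line.strip()) for line in lines]
```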
Fix adjustment of candidates in cluster modify
The code for adjusting the candidate pool size was run after the config update, and this means we triggered the save of the config file without fixing the candidate pool, which aborts with an error.
The patch just moves it above. The old comment was valid, but we anyway...
Add a new node list field
This patch adds a ‘role’ node list field, which shows a one-character node status. This is a simpler way to see the node status than selecting all the flags individually.
Use ReadFile.splitlines() rather than readlines
A few places in the code open a file "manually" rather than using our wrapper function, because they need an array with the lines. Combining the result of utils.ReadFile with splitlines() we get rid of the exceptions....
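A sketch of the pattern, using a local `read_file` helper as a stand-in for utils.ReadFile (whose real signature is not shown in this log); `read_lines` is a hypothetical name.

```python
def read_file(file_name):
    """Stand-in for the wrapper: return the whole file as one string."""
    with open(file_name, "r") as fd:
        return fd.read()


def read_lines(file_name):
    """Get the lines via the wrapper instead of opening the file by hand."""
    return read_file(file_name).splitlines()
```

One difference to keep in mind: readlines() keeps the trailing newline on each line, while splitlines() drops it.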
Rename _OSOndiskVersion to _OSOndiskAPIVersion
This makes what versions we're talking about clearer.
Convert ssconf._ReadFile to utils.ReadFile
Making ssconf._ReadFile a wrapper over utils.ReadFile
Signed-off-by: Guido Trotter <ultrotter@google.com>
backend.StartMaster: fix variable name
As per comments for patch “Convert node_start_master to new style result”, the ‘payload’ variable is renamed to ‘err_msgs’.
Fix HTTP server library handling of credentials
Currently the http library only checks credentials when authentication is required. This means that any credentials are accepted on the root resource, for example, which makes problems hard to diagnose - the...
Fix a typo in backend.InstanceReboot docstring
The documentation for the reboot was wrong. This patch fixes it and updates the docstring with more details.
Update RAPI docs for the dry-run mode
rapi: implement dry-run mode
This patch implements dry-run mode for the operations which modify the state of the cluster. Dry-run mode is enabled by passing a 'dry-run' query argument with a positive integer value.
LUCreateInstance: the node list as return value
Currently LUCreateInstance has no result; this patch changes it so that both the normal result and the dry-run result are the node list of the selected instance.
Implement dry-run mode at cli level (partially)
This patch adds support for the dry-run mode for all command line operations, and also makes use of this for commands using the SubmitOrSend function. For the ones not using it, the flag has no effect (future patches)....
LU execution: implement dry-run framework
This patch adds a new (global) opcode flag 'dry_run' which, when True, causes early exit from the LU workflow, returning a special value from the LU object (initialized in the parent LogicalUnit class, and which if...
Introduce slots deriving in opcodes.py
This simple patch makes all opcodes extend the base opcode slots. This way we can add slots across all opcodes, for example 'dry-run'.
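Deriving slots from the base class can be sketched as follows. The class names, the keyword-checking constructor and the `instance_name` field are illustrative assumptions; only the idea of extending the parent's `__slots__` list comes from the message.

```python
class BaseOpCode(object):
    __slots__ = []

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            if key not in self.__slots__:
                raise TypeError("Unknown parameter %r" % key)
            setattr(self, key, value)


class OpCode(BaseOpCode):
    # A flag added here becomes valid for every opcode deriving from OpCode.
    __slots__ = BaseOpCode.__slots__ + ["dry_run"]


class OpExampleCreate(OpCode):
    # Extend the parent's list rather than replacing it. CPython accepts
    # the re-listed inherited names, but they allocate duplicate slots and
    # the Python documentation discourages redefining a base class slot.
    __slots__ = OpCode.__slots__ + ["instance_name"]


op = OpExampleCreate(instance_name="inst1", dry_run=True)
```

Keeping the full list on each class lets a single membership test against `self.__slots__` validate both the common and the opcode-specific parameters.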
Fix some small epydoc warnings
gnt-instance(8) one more batch-create update
Document the new nics list, as an alternative to the one nic which you can create with the old mac, ip, mode, link/bridge keys.
Also specify that 'bridge' is still accepted as well.
Update gnt-instance batch-create for NIC params
This is compatible with the previous version, but also allows specifying more than one nic, by giving a "nics" list of dicts. The two methods (individual fields for the first nic, and list of all nics) are incompatible with each other....
Fix various pylint warnings
There were multiple issues:
 - copy-paste resulted in wrong indentation
 - wrong function name
 - missing spaces around assignment
 - overriding built-in names (type, dir) or already defined ones (errors, hypervisor)
Document iallocator proposed improvements
Fix handling of 'vcpus' in instance list
Currently running “gnt-instance list -o+vcpus” fails with a cryptic message:
  Unhandled Ganeti error: vcpus
This is due to multiple issues:
 - in some corner cases cmdlib.py raises an errors.ParameterError but this is not handled by cli.py...
RAPI: move to nic parameters
In query we ask for nic.links, rather than nic.bridges.
In create we accept both "link" and "bridge" and let the opcode deal with it. Note that we still can create only one nic per instance.
Update manpages for NIC parameters
Update a forgotten docstring for nic parameters
Properly document the expected nic format.
Fix QueryInstanceData for nic parameters
This CL updates QueryInstanceData to return NICs in the new format (mac, ip, mode, link) and fixes gnt-instance info to properly display them.
Update instance query for NIC parameters
Compatibility with the old parameters is maintained by allowing queries for "bridge", "nic.bridges" and "nic.bridge/N", but None is returned in that case for routed nics.
Rename _PreBuildNICHooksList to _NICListToTuple
We're going to use this helper function for more than just hooks, so we'll give it a more generic name.
Fix checking for valid OS in instance create
The current check in LUCreateInstance.CheckPrereq() is wrong - it only checks if we got an OS, but not if we got a valid OS. This patch fixes it.