Switch gnt-debug submit-job to JobExecutor
Currently gnt-debug submits jobs individually, but in 2.1 JobExecutoruses the optimized SubmitManyJobs luxi call and as such should be usedwhenever multiple jobs need to be submitted.
This patch converts gnt-debug submit-job to use it and also removes an...
Convert instance reinstall to multi instance model
This patch converts ‘gnt-instance reinstall’ from single-instance tomulti-instance model; since this is dangerours, it's required to pass“--force --force-multiple” to skip the confirmation.
Signed-off-by: Iustin Pop <iustin@google.com>...
gnt-instance batch-create: use the job executor
This small patch changed the batch create functionality to use the jobexecutor instead of single-job submits.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>...
Modify cli.JobExecutor to use SubmitManyJobs
This patch changes the generic "multiple job executor" to use the manyjobs submit model, which automatically makes all its users use the newmodel.
This makes, for example, startup/shutdown of a full cluster much more...
Add a luxi call for multi-job submit
As a workaround for the job submit timeouts that we have, this patchadds a new luxi call for multi-job submit; the advantage is that all thejobs are added in the queue and only after the workers can startprocessing them....
job queue: fix interrupted job processing
If a job with more than one opcodes is being processed, and the masterdaemon crashes between two opcodes, we have the first N opcodes markedsuccessful, and the rest marked as queued. This means that the overall...
Fix an error path in job queue worker's RunTask
In case the job fails, we try to set the job's run_op_idx to -1.However, this is a wrong variable, which wasn't detected until theslots addition. The correct variable is run_op_index.
Add slots on objects in jqueue
Adding slots to _QueuedOpCode decreases memory usage (of these objects)by roughly four times. It is a lesser change for _QueuedJobs.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
ganeti.initd: Pass $*_ARGS to programs when restarting them
Signed-off-by: Michael Hanselmann <hansmi@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Optimizie OpCode loading
This patch converts the opcode loading to a pre-built map (at importtime) instead of iteration over the globals dict at each call.
Microbenchmarks show that this should be around three times faster, andburnin still passes.
Yet another fallout from the pylint fixes
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Olivier Tharan <olive@google.com>
Merge branch 'master' into next
Fix another issue with hypervisor_name change
Update NEWS and version for 2.0.2 release
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>
Improve the description of node flags in man page
[iustin@google.com: slightly reworded the explanation for offline andchanged the commit message]Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Add enabled hypervisors to TestConfigRunner
This parameter is now mandatory for the cluster config to work.
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
Add a few more checks to verify config
- Check that the enabled hypervisors list is valid- Check that the master node is a valid node
Signed-off-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Make sure enabled_hypervisors list is valid
Change default stripe count to 1
In order not to change the default during a stable series, we modifyconfigure.ac to default to one stripe, in effect keeping the status quo(well, minus the LVM Attach() changes).
Use full-stripe size in LVM growth
LVM has issues when growing stripped volumes, so it's best to specifythe growth in exact multiples of the full stripe size (as precise aspossible). For this we need to do a couple of changes: - in LVM Attach(), we query additionally the VG extent size and the LV...
Remove ConfigWriter.InitConfig
It's been replaced by a simpler bootstrap.InitConfig function, whichdoes the same job, and is currently unused.
Remove SimpleConfigWriter.SetMasterNode
This function is not used.
_GenerateDiskTemplate: use base_index in the name
Currently if a disk is added later the base_index is not considered, andall the disks are called disk0. This patch fixes it.
ganeti-masterd: avoid SimpleConfigReader
SimpleStore is a lot less heavyweight than SimpleConfigReader, and tojust get the master name we can use that. This is the only usage ofSimpleConfigReader currently, but we're not going to delete the class,as new usages will come in for ganeti-confd (in 2.1). Using it there,...
cmdlib: Fix typo in LUQueryClusterInfo
This was broken by my pylint fixes patch.
RAPI: implement instance reinstall
This patch adds instance reinstall to RAPI, with two optional parameters: - ‘os', in order to change the OS on reinstall - ‘nostartup’, in order to leave the instance down after reinstall
The call will first shutdown the instance, the reinstall it, and unless...
Create a new --no-voting option for masterfailover
This allows failing over in certain corner cases, such as a 2 nodecluster with one node down. The man page is also updated to documentthis dangerous option and how to recover from this situation.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
ganeti-masterd: allow non-interactive --no-voting
This will be used by ganeti-noded to start ganeti-masterd in a--no-voting masterfailover.
Fix pylint warnings
Add custom pylintrc
bootstrap: Don't leak file descriptor when generating SSL certificate
Fix problem with EAGAIN on socket connection in clients
If a user used ^Z to stop the program, poll() in socket.recv would returnEAGAIN due to SIGSTOP. This patch changes luxi.Transport.Recv to ignore EAGAIN.
Signed-off-by: Michael Hanselmann <hansmi@google.com>...
Fix some typos
Increase maximum accepted size for a DRBD meta dev
With the change to stripped LVs, the actual size of a meta device (whichis small) can be more than we expected (for non-stripped LVs). Thispatch increases from 160MB to 1GB the accepted size, and updates the...
Cleanup config data when draining nodes
Currently, when draining nodes we reset their master candidate flag, butwe don't instruct them to demote themselves. This leads to “ERROR: file'/var/lib/ganeti/config.data' should not exist on non master candidates...
Fix node readd issues
This patch fixes a few node readd issues.
Currently, the node readd consists of two opcodes: - OpSetNodeParms, which resets the offline/drained flags - OpAddNode (with readd=True), which reconfigures the node
The problem is that between these two, the configuration is inconsistent...
backend.DemoteFromMC: don't fail for missing files
If the config file is missing when the DemoteFromMC() function iscalled, it will raise a ProgrammerError. Instead of changing theutils.CreateBackup() file which is called from multiple places, for nowwe only change the DemoteFromMC() function to not call it if the file is...
Allow GetMasterCandidateStats to ignore some nodes
This patch modifies ConfigWriter.GetMasterCandidateStats to allow it toignore some nodes in the calculation, so that we can use it to predictcluster state without some nodes (which we know we will modify, and thus...
Fix error message for extra files on non MC nodes
Currently the message for extraneous files on non master candidates isconfusing, to say the least. This makes it hopefully more clear.
Fix adjustement of candidates in cluster modify
The code for adjusting the candidate pool size was done after the configupdate, and this means we triggered the save of the config file withoutfixing the candidate pool, which aborts with an error.
The patch just moves it above. The old comment was valid, but we anyway...
Add a new node list field
This patch adds a ‘role’ node list field, which shows a one-characternode status. This is a simpler way to see the node status than selectingall the flags individually.
Fix HTTP server library handling of credentials
Currently the http library only checks credentials when authenticationis required. This means that any credentials are accepted on the rootresource, for example, which makes problems hard to diagnose - the...
Fix a typo in backend.InstanceReboot docstring
The documentation for the reboot was wrong. This patch fixes it andupdates the docstring with more details.
Fix handling of 'vcpus' in instance list
Currently running “gnt-instance list -o+vcpus” fails with a cryptic message: Unhandled Ganeti error: vcpus
This is due to multiple issues: - in some corner cases cmdlib.py raises an errors.ParameterError but this is not handled by cli.py...
Fix checking for valid OS in instance create
The current check in LUCreateInstance.CheckPrereq() is wrong - it only checksif we got an OS, but not if we got a valid OS. This patch fixes it.
Show disk size in instance info
The size of the instance's disk was not shown in “gnt-instance info”.This patch adds it and formats it nicely if possible.
gnt-cluster(8) fix --backend-parameters opt name
It was mistakenly called --backend
LUQueryInstances: fix querying for nic data
Currently we support querying for "mac" "ip" or "bridge", meaning "theone of the first nic. We are not checking that there is a first nic,though, and thus could incur in errors. This patch fixes it by returning...
Specify the object type in two docstring
Update NEWS and version for 2.0.1 release
gnt-{instance,backup}(8) --nic is actually --net
Fix a typo in the man pages that used the wrong option name.
Fix a wrong function name in backend.DrbdAttachNet
Commit cf8df3f30c2dcd0ab398d835fa9f64d61578a4f7 "bdev: forward-portReAttachNet/DisconnectNet" forward-ported 1.2's bdev.DRBD8.ReAttachNet()to 2.0 while renaming it to AttachNet(), but commit6b93ec9d798ed53089a06bc0ced58ef1d8a9e4b0 "Forward-port DrbdNetReconfig"...
GNT-CLUSTER fix search-tags example
Reported in issue 59.
Enable stripped LVs
This patch enables stripped LVs, falling back to non-stripped if thestripped creation fails. If the configure-time lvm-stripecount is 1,this patch becomes a noop (with an insignificant python-level overhead,but no extra lvm calls)....
Add a lvm stripecount configure parameter
This patch adds a configure-time customizable parameter that will beused to enable stripped LVs. The default of the parameter is 3.
Add more constants for DRBD and change sync tests
This patch adds constants for the connection status, peer roles and diskstatus, and it changes the rules for when the disk is considered as“resyncing” - previously it was only for syncsource/synctarget, but...
Wait for a while in failed resyncs
This patch is an attempt at fixing some very rare occurrences of messages like: - "There are some degraded disks for this instance", or: - "Cannot resync disks on node node3.example.com: [True, 100]"
What I believe happens is that drbd has finished syncing, but not all...
Assemble DRBD using the known size
This patch changes DRBD disk attachment to force the wanted size, as opposed toletting the device auto-discover its size.
This should make the disks more resilient with regard to small differences insize (e.g. due to LVM rounding). This still works with regard to disk...
Fix two issues with exports and snapshot errors
This patch fixes two issues related to failed snapshots during exports: - first, the error messages used disk.logical_id1, which is a node name for DRBD, and it resulted in strange error messages like...
Set the size on new DRBDs in replace secondary
Currently the code in cmdlib doesn't set the device size to new DRBDdevices in replace secondary, but we need to do it otherwise it getsinitialized to None.
Change the bdev init signatures
This patch changes all the bdev.BlockDev constructors to take anadditional ‘size’ parameter, all the backend functions that call thosefunctions to pass it and also changes backend.BlocdevCreate() to not usethe size passed via the rpc call but instead directly disk.size (this is...
Merge branch 'next'
Release 2.0.0 final
This is simply a version bump, no changes from rc5.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Guido Trotter <ultrotter@google.com>Reviewed-by: Michael Hanselmann <hansmi@google.com>
watcher: automatically restart noded/rapi
This patch makes the watcher automatically restart the node and rapidaemons, if they are not running (as per the PID file).
This is not an exhaustive test; a better one would be TCP connect to theport, and an even better one a simple protocol ping (e.g. get / for rapi...
watcher: handle full and drained queue cases
Currently the watcher is broken when the queue is full, thus notfulfilling its job as a queue cleaner. It also doesn't handle nicely thequeue drained status.
This patch does a few changes: - first archive jobs, and only after submit jobs; this fixes the case...
rapi: rework error handling
Currently the rapi code doesn't have any custom error handling; anyexceptions raised are simply converted into an HTTP 500 error, withoutmuch explanation.
This patch adds a couple of generic SubmitJob/GetClient functions that...
Fix backend.OSEnvironment be/hv parameters
Commit 67fc3042c20f5893abf71a0b4c445c356f9603b9 added some morevariables to be exported to OSEnvironment, but it has two bugs: - wrong variable name (env vs. result) - in OSEnvironment we don't have the automatic converstion to strings...
rapi: make tags query not use jobs
Currently the rapi tags query implementation is similar to the commandline one: it submits OpGetTags jobs. This not good, since this being anAPI it can be used a lot and can pollute the job queue with many suchtrivial jobs....
Change failover instance when instance is stopped
Currently, if the instance is stopped, we still check for enough memoryon the target node. This is a little bit too strict, since in case toomany nodes have failed and one is out of the memory, this prevents...
Export more instance information in hooks
Currently we miss in hooks the instance's hypervisor, hypervisorparameters and backend parameters. This forces hooks to query back intoganeti, which is dangerous due to possible luxi sockets exhaustion.
This patch adds these three as INSTANCE_HYPERVISOR, INSTANCE_HV_*,...
Signed-off-by: Guido Trotter <ultrotter@google.com>
watcher: write the instance status to a file
This patch modifies the watcher to keep on-disk a file with the instancestatus; this can be used from outside of ganeti to react to instancesbeing down (when the watcher cannot restart them).
Release 2.0rc5
Fix the SafeEncoding behaviour
Currently we have bad behaviour in SafeEncode: - binary strings are actually not handled correctly (ahem) - the encoding is not stable, due to use of string_escape
For this reason, we replace the use of string_escape with part of the...
Move more hypervisor strings into constants
This patch adds constants for the mouse and boot order strings; whilethere are still some issues remaining, we're trying to cleanup hardcodedstrings from the hypervisors.
Since the formatting of frozensets is currently wrong, we also add an...
watcher: try to restart the master if down
Bugs in either our code or in associated libraries can bring the master daemondown, and this (due to the 2.0 architecture) stops all work on the cluster.
Since the watcher already does periodic checks on the cluster, we modify...
IAllocator: export total disk size for instances
This patch adds for current instance a ‘disk_space_total’ key, similarto the key for the new instance in case of new allocations.
Add -H/-B startup parameters to gnt-instance
This patch modifies the start instance script, opcode and logical unitto support temporary startup parameters.
Different from 1.2, where only the kernel arguments were supportingchanges (and thus xen-pvm specific), this version supports changing all...
call_instance_start: add optional hv/be parameters
This patch modifies the rpc.call_instance_start - the master side - totake optional hv/be parameters. The noded side is unchanged andoblivious to the change.
This will allow implementation of single-user capability and such on...
Fix gnt-job list argument handling
Currently QueryJob returns "None" when a wrong job ID is passed.Handle this in gnt-job list, by printing an error for each wrong job,and still giving output for all the jobs which actually do exist.
Instance reinstall: don't mix up errors
If the remote info rpc call fails we can't assume that the instance isup.
Don't check memory at startup if instance is up
gnt-cluster modify: fix --no-lvm-storage
Currently doing a gnt-cluster-modify --no-lvm-storage is silentlyignored, as it passes a None value in vg_name, which is the same as notmodifying that parameter. Explicitely set the passed value to '', so thenon-true not-None value can be evaluate to actually remove a volume...
LUSetClusterParams: improve volume group removal
Currently LUSetClusterParams will remove the volume group if the vg_namefield passed in is not true, but not None. Setting the target volumegroup to False or the empty string, though, is a bad idea because it's...
gnt-cluster info: show more cluster parameters
Even if we cannot modify all of them, they are useful information aboutthe current cluster.
LUQueryClusterInfo: return a few more fields
Some fields can be set at cluster init, and perhaps even modifed withSetClusterParams but there's no way to know them. With this patch weexport them in the cluster info query.
KVMHypervisor: return memory and cpus as integers
Currently the KVM hypervisor returns strings for the memory and cpuvalues, while the xen hypervisor returns integers. Making this uniformconverting the values to integers in KVM as well.
LUSetInstanceParam: don't assume memory is integer
LUSetInstanceParam currently assumes that the 'memory' value of acall_instance_info result is an integer, while the rest of the codeexplicitely converts it to int(). Converting it to int works around a...
Add the new DRBD test files to the Makefile
These were forgotten in commit 01e2ce3a6e4ca68983f50dedaddd0d0fc7b77026,and caused “make distcheck” to fail.
Fix QA and documentation about no initrd case
In Ganeti 1.2, “none” was used to signify no initrd. In 2.0 we havechanged to “no_” as a prefix (i.e. “-H no_initrd_path”) and thus wedocument in the manpage this.
The QA suite is changed accordingly.
Remove an unused function
The _TransformPath function is not used anymore in 2.0, let's remove it.
Exporting the instance network_port on the RAPI
Patch for adding network_port to the instance attributes exported by theRAPI.
[iustin@google.com: slightly changed the formatting]Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Minor patch to rapi documentation
Minor patch to clarify the URL necessary for accessing the RAPI.
Signed-off-by: Iustin Pop <iustin@google.com>Reviewed-by: Iustin Pop <iustin@google.com>
Small doc change in README
The version is 2.0, and we don't build PDFs by default, only HTMLfiles.
Remove some superfluous imports
This is for Python 2.6 compatibility.
Make Python interpreter selectable for test scripts
The Python interpreter used to run the test cases is hard-coded to be/usr/bin/python. If we use the first one from $PATH instead, it ismuch easier to test ganeti with other Python versions.
Pass optional arguments to the daemons
These can be set in the defaults file, default to no arguments beingpassed, and make it easy for local installation to customize the way theganeti daemons are called.