Wait for a while in failed resyncs
This patch is an attempt at fixing some very rare occurrences of messages like: - "There are some degraded disks for this instance", or: - "Cannot resync disks on node node3.example.com: [True, 100]"
What I believe happens is that drbd has finished syncing, but not all...
Fix two issues with exports and snapshot errors
This patch fixes two issues related to failed snapshots during exports: - first, the error messages used disk.logical_id1, which is a node name for DRBD, and it resulted in strange error messages like...
rapi: rework error handling
Currently the rapi code doesn't have any custom error handling; any exceptions raised are simply converted into an HTTP 500 error, without much explanation.
This patch adds a couple of generic SubmitJob/GetClient functions that...
Fix backend.OSEnvironment be/hv parameters
Commit 67fc3042c20f5893abf71a0b4c445c356f9603b9 added some more variables to be exported to OSEnvironment, but it has two bugs: - wrong variable name (env vs. result) - in OSEnvironment we don't have the automatic conversion to strings...
rapi: make tags query not use jobs
Currently the rapi tags query implementation is similar to the command-line one: it submits OpGetTags jobs. This is not good: since this is an API, it can be used a lot and can pollute the job queue with many such trivial jobs....
Change failover instance when instance is stopped
Currently, if the instance is stopped, we still check for enough memory on the target node. This is a little bit too strict, since in case too many nodes have failed and one is out of memory, this prevents...
Export more instance information in hooks
Currently we miss in hooks the instance's hypervisor, hypervisor parameters and backend parameters. This forces hooks to query back into ganeti, which is dangerous due to possible luxi sockets exhaustion.
This patch adds these three as INSTANCE_HYPERVISOR, INSTANCE_HV_*,...
Merge branch 'master' into next
Signed-off-by: Guido Trotter <ultrotter@google.com>
watcher: write the instance status to a file
This patch modifies the watcher to keep on-disk a file with the instance status; this can be used from outside of ganeti to react to instances being down (when the watcher cannot restart them).
Signed-off-by: Iustin Pop <iustin@google.com>...
Fix the SafeEncoding behaviour
Currently we have bad behaviour in SafeEncode: - binary strings are actually not handled correctly (ahem) - the encoding is not stable, due to use of string_escape
For this reason, we replace the use of string_escape with part of the...
Move more hypervisor strings into constants
This patch adds constants for the mouse and boot order strings; while there are still some issues remaining, we're trying to clean up hardcoded strings from the hypervisors.
Since the formatting of frozensets is currently wrong, we also add an...
IAllocator: export total disk size for instances
This patch adds for current instance a ‘disk_space_total’ key, similar to the key for the new instance in case of new allocations.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Add -H/-B startup parameters to gnt-instance
This patch modifies the start instance script, opcode and logical unit to support temporary startup parameters.
Different from 1.2, where only the kernel arguments were supporting changes (and thus xen-pvm specific), this version supports changing all...
call_instance_start: add optional hv/be parameters
This patch modifies the rpc.call_instance_start - the master side - to take optional hv/be parameters. The noded side is unchanged and oblivious to the change.
This will allow implementation of single-user capability and such on...
Fix gnt-job list argument handling
Currently QueryJob returns "None" when a wrong job ID is passed. Handle this in gnt-job list, by printing an error for each wrong job, and still giving output for all the jobs which actually do exist.
Signed-off-by: Guido Trotter <ultrotter@google.com>...
Instance reinstall: don't mix up errors
If the remote info rpc call fails we can't assume that the instance is up.
Don't check memory at startup if instance is up
LUSetClusterParams: improve volume group removal
Currently LUSetClusterParams will remove the volume group if the vg_name field passed in is not true, but not None. Setting the target volume group to False or the empty string, though, is a bad idea because it's...
LUQueryClusterInfo: return a few more fields
Some fields can be set at cluster init, and perhaps even modified with SetClusterParams, but there's no way to know them. With this patch we export them in the cluster info query.
KVMHypervisor: return memory and cpus as integers
Currently the KVM hypervisor returns strings for the memory and cpu values, while the xen hypervisor returns integers. Make this uniform by converting the values to integers in KVM as well.
LUSetInstanceParam: don't assume memory is integer
LUSetInstanceParam currently assumes that the 'memory' value of a call_instance_info result is an integer, while the rest of the code explicitly converts it to int(). Converting it to int works around a...
Exporting the instance network_port on the RAPI
Patch for adding network_port to the instance attributes exported by the RAPI.
[iustin@google.com: slightly changed the formatting]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Remove some superfluous imports
This is for Python 2.6 compatibility.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Avoid DeprecationWarning on Python >= 2.6
Python 2.6 complains about module 'sha' being deprecated. It makes execution of Ganeti commands a bit annoying, and when you run 'ganeti-watcher' in cron jobs, you get a mail message after every execution.
Tests pass under Python 2.6 and Python 2.4....
Fix compatibility with DRBD 8.3
DRBD 8.3 changes two more things compared to 8.2: - /proc/drbd format changed in multiple ways; the part we're interested in is the ‘st:’ to ‘ro:’ change (named in the changelog as “Renamed 'state' to 'role'”) - “drbdsetup /dev/drbdN show” changed the ‘device’ stanza from:...
Fix compatibility with DRBD 8.2
This patch adds (and suppresses) the extra ipv4/ipv6 words before the actual address that newer DRBD versions add.
[iustin@google.com: slightly changed the patch to conform to style guide, and changed the commit message]
Signed-off-by: Iustin Pop <iustin@google.com>...
RunCmd: log command line for missing cmd case
In case of missing programs, currently utils.RunCmd doesn't show any information to help debugging, only 'No such file or directory'. This patch adds error handling for the ENOENT case such that at least we have...
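The ENOENT case can be illustrated with a minimal sketch in modern Python. It is not the utils.RunCmd code: the function name run_cmd, the log message and the return shape are assumptions made for this example only.

```python
import errno
import logging
import subprocess


def run_cmd(argv):
    """Run argv; if the program is missing, log the command line, then re-raise."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True)
    except OSError as err:
        if err.errno == errno.ENOENT:
            # Without this, the caller only sees "No such file or directory".
            logging.error("Can't execute '%s': not found (%s)",
                          " ".join(argv), err)
        raise
    return proc.returncode, proc.stdout, proc.stderr
```

The exception is still propagated; the only change in behaviour is that the failing command line ends up in the log.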
Abstract Linux node information in hv_base
Currently both hv_fake and hv_kvm implement practically identical code to get the node information. Since future container-like hypervisors will also need this functionality, this patch moves it into the base class (as a separate function) which can then be called from classes...
Fix argument checking in LUSetClusterParams
This patch fixes two issues with LUSetClusterParams and argument checking.
First, this LU used the wrong function name (CheckParameters instead of CheckArguments), which means that no parameter checking was done at all;...
Small optimisation in utils.WriteFile
Currently we always try to remove the new file, even if the rename succeeded. This patch tracks the existence of the new file and doesn't try to remove it if we managed to rename it.
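A minimal sketch of the idea, assuming a simplified write_file helper (the name, signature and temporary-file prefix are hypothetical and much smaller than utils.WriteFile):

```python
import os
import tempfile


def write_file(path, data):
    """Replace path with data via a temp file; clean up only when still needed."""
    dir_name = os.path.dirname(path) or "."
    fd, new_name = tempfile.mkstemp(prefix=".new", dir=dir_name)
    do_remove = True  # the temporary file exists until the rename succeeds
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(data)
        os.rename(new_name, path)
        do_remove = False  # renamed away: there is nothing left to remove
    finally:
        if do_remove:
            os.unlink(new_name)
```

The flag saves one unlink call that is bound to fail on every successful write.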
Include node name in hypervisor validation errors
The current validation routine just says "failed", without specifying the node name. This is very confusing, and we should log the node name too.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Alexander Schreiber <als@google.com>
Fix gnt-cluster getmaster on non-master nodes
The current implementation of “gnt-cluster getmaster” doesn't work on non-master nodes, which is a regression from 1.2. This patch implements it (again) via ssconf.
LUDiagnoseOS: change locking and error handling
Since the “list OSes” call is exported via RAPI, this can be used pretty easily to DOS the master daemon during long jobs.
The implementation of LUDiagnoseOS makes an RPC call to all nodes; we lock nodes here in order to prevent node removal....
Fix verify-disks with broken volume groups
When a remote node returns invalid LVM data, we check it, but we don't stop; instead we continue with the rest of the checks (which require a valid volume group). This raises an internal error and breaks verify disks.
This seems unchanged for a long while, I don't know why it surfaced just...
Prevent errors when xenvg is broken cluster verify
When vg_name is not returned at all, we currently abort with an internal error. This is because we don't catch KeyError.
This patch adds a custom message for this case, and also adds KeyError to the list of caught exceptions, just for safety....
A bunch of doc and other small fixes
This patch addresses a couple of issues reported both externally and internally: - missing SGML tags (Issue 54), report and patch by superdupont - wrong variable used in the init.d script, report and patch by Karsten Keil <karsten-keil@t-online.de>...
Fix Xen soft reboot via polling
This patch fixes the Xen soft reboot ("xm reboot") via polling for a specific time for either changed domain ID or decreased CPU run-time.
This should prevent the race-conditions discussed on the mailing list for reboots....
Add a new ssconf file with the cluster tags
Since the cluster tags are/should be more-or-less static, add them as an ssconf key, so that querying them is possible without creating a job/requiring the masterd to be running.
Reviewed-by: imsnah
Fix _NOQUOTE regexp
Allow expressions longer than one character to match.
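One way such a bug can arise is a character class without a repetition operator. The sketch below only illustrates that class of mistake; both patterns and the helper name are assumptions, not the actual _NOQUOTE definition.

```python
import re

# Hypothetical patterns: the character class is an assumption for this sketch.
ONE_CHAR = re.compile(r"^[-.\w]$")   # matches a single safe character only
NOQUOTE = re.compile(r"^[-.\w]+$")   # matches one or more safe characters


def needs_quoting(arg, pattern=NOQUOTE):
    """Return True if arg does not consist purely of safe characters."""
    return pattern.match(arg) is None
```

With the one-character pattern every argument longer than one character would be quoted, even a plain word.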
Mainloop: avoid calculating timeout every time
set timeout_needs_update to False after calculating the timeout.
kvm: use the correct vnc bind address
There is a bug in kvm: when binding vnc to a specific address the constant 'vnc_bind_address' is passed in, instead of the actual requested address. This patch fixes it.
Reviewed-by: iustinp
Xen: Remove one hardcoded constant
s/"vnc_bind_address"/constants.HV_VNC_BIND_ADDRESS/
Handle ghost instances in temp DRBD map
Currently cluster-verify doesn't handle the (admittedly invalid) case where we have reservation for instances that were removed in the meantime.
This patch adds a check for this and prevents code errors in cluster-verify in...
Fix error handling in replace-disks with new node
Currently the _CreateSingleBlockDev function only raises OpExecError and not BlockDeviceError. This means that we don't release the instance's temporary minors properly, and this creates problems later if the instance is removed...
Export tags to cluster verify hooks
This patch exports the cluster and node tags to the cluster verify hook scripts. The tags are exported as a space-separated list, which allows easy parsing from the shell (e.g. “for tag in $GANETI_CLUSTER_TAGS; do...”) and therefore requires the previous “Don't allow spaces in tag...
Don't allow spaces in tag names
This patch restricts the use of spaces in tags, as this does not allow nice exporting of tags to environment in hooks. One can use underscores or dashes instead of spaces.
Reviewed-by: schreiberal
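The link between the space restriction and the space-separated export can be sketched as follows. The helper names and the sample tags are hypothetical; only the GANETI_CLUSTER_TAGS variable name comes from the commit messages above.

```python
def check_tag(tag):
    """Reject tags that contain whitespace (hypothetical validator)."""
    if any(ch.isspace() for ch in tag):
        raise ValueError("Invalid tag %r: whitespace is not allowed" % tag)
    return tag


def tags_to_env(tags):
    """Join validated tags into one space-separated hook environment value."""
    return " ".join(check_tag(tag) for tag in sorted(tags))


env = {"GANETI_CLUSTER_TAGS": tags_to_env(["web", "rack_12", "prod-east"])}
```

Because no tag can contain a space, splitting the exported value on whitespace always gives back the original tags.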
Update the iallocator documentation
This updates the iallocator documentation to 2.0, bumps up the iallocator version (and moves a constant to lib/constants.py), and fixes a style issue in install.rst.
Reviewed-by: ultrotter
Fix a bug in utils.EnsureDirs
This fixes a bug introduced in rev 2562 and also fixes the indentation.
Use EnsureDirs in KVM as well.
The KVM hypervisor also has code to ensure a list of directories exists. Substitute it with our new utils function.
Create runtime dir in bootstrap
Some hypervisors (KVM) need RUN_GANETI_DIR to exist even at cluster init time. This patch creates it in InitCluster just before hv parameter checking. Since the code to make list of directories is already repeated twice in the code, and this would be the third time, we abstract it into...
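A helper of this kind could look roughly like the sketch below. The name ensure_dirs and the list-of-(path, mode) argument are assumptions for illustration; the real utils function may differ.

```python
import os


def ensure_dirs(dirs):
    """Create each (path, mode) directory unless it already exists."""
    for path, mode in dirs:
        try:
            os.mkdir(path, mode)
        except FileExistsError:
            # An existing directory is fine; anything else at that path is not.
            if not os.path.isdir(path):
                raise
```

Callers then pass their own directory list instead of repeating the mkdir and error-check loop.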
LUVerifyCluster: Handle the "no volume group" case
If we're only file based and our volume group is set to "None" there's no point in asking nodes for their volume groups, logical volumes, and drbd devices, and checking those.
Fix some epydoc style issues
99% of the epydoc return tags are "@return:", but each of the modified files had one "@returns:" line. We fix this for consistency.
Fix typos in utils.WriteFile's docstring
Fix mixed pvm/hvm clusters and instance listing
The current implementation of the combining of the instance lists will only do this for instances whose all four-fields match in both hypervisors; however, this is broken for the dynamic fields (state, times) which can change between the invocations of the two different...
Fix xen-hvm and KERNEL_ARGS
xen-hvm doesn't have KERNEL_ARGS, and I just changed blindly all old extra_args to HV_KERNEL_ARGS. This makes xen-hvm work again.
Update some version-related constants
Since we are quite close to final RPC and hooks APIs, we update the hooks and protocol_version constants.
Update some hooks settings
While reviewing the hooks document, I realised we are not correctly exporting the instance properties.
This patch fixes: - export the disk and disk template in all LUs, not only (hardcoded) in the instance create - removes the instance create INSTANCE_ prefix on some non-instance...
Remove the extra_args parameter in instance start
This patch removes the extra_args parameter and instead switches the instance to the HV_KERNEL_ARGS hypervisor option.
This is a big change, but it's a needed cleanup, this extra parameter on all RPC calls is not generic and we also need to have a persistent value...
Simplify a little the hypervisor routines
Instead of “instance.hvparams”, we use the shorter name “hvp” to improve readability.
Add definitions for the root_args hypervisor param
This patch adds a new hypervisor parameter for the hypervisors that can actually start an instance with external kernels.
Make gnt-instance info work with offline nodes
This simply makes LUQueryInstanceData return the same information as for a static query when one or both of the nodes are down.
Update version numbers to beta2
Note that the RAPI change is in a docstring (i.e. example), not in code.
Show more details for failed xen commands
This patch also logs the output of the xm commands in case of failures; some corner cases were forgotten in the last redo.
Fix some bugs in reboot
There are two issues fixed in this patch: - first, the recent RPC changes caused loss of data in hard reboot type; we weren't reporting any results from the stop/start instance calls; - second, in soft or hard reboots, we didn't initialize the disk...
Convert IOErrors for /proc/drbd into our errors
If /proc/drbd can't be opened, this raises an IOError, but all the error-handling behaviour in backend treats only BlockDeviceErrors. This creates a plain failure in cluster verify and in other RPC calls.
This patch simply converts EnvironmentErrors into BlockDeviceErrors, and...
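The conversion can be sketched as below. The BlockDeviceError class here is a local stand-in and read_proc_drbd is a hypothetical helper; neither is the project's code.

```python
class BlockDeviceError(Exception):
    """Stand-in for the project's block device error class."""


def read_proc_drbd(path="/proc/drbd"):
    """Return the lines of the status file, mapping I/O errors to our error."""
    try:
        with open(path) as handle:
            return handle.read().splitlines()
    except EnvironmentError as err:
        # Callers only handle BlockDeviceError, so convert the low-level error.
        raise BlockDeviceError("Can't read %s: %s" % (path, err))
```

Code that already catches BlockDeviceError then reports a missing or unreadable file like any other block device problem.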
Convert default root partition to msdos style
As discussed with 2.0 msdos partition style should be the default in the instance OS, so we're changing the default instance params accordingly. A followup patch will update the debootstrap os.
RAPI: documentation updates
This patch fixes the version and does some update to the RAPI resources docs.
RAPI: fixes related to write mode
This patch fixes many small issues related to write functions: - update documentations w.r.t. how to add users - update the instance add function for latest API - add instance delete - fix addition of tags - update some error messages...
Some small improvements to the fake hypervisor
This patch modifies the fake hypervisor to subtract the memory “used” by “running” instances from the free memory, so the actual node information changes based on the running instances.
Also some style changes and fixes are added....
SetInstanceParams: export nic changes to hooks
Currently we export the old instance "as is" and any nic changes get lost, so hooks won't know of a different ip, bridge, or mac address. This patch fixes it by putting the nics in the override dict, if any changes are done....
Remove two fixed FIXME and convert one to TODO
The cli FIXME is not something broken, but rather some better handling feature we'd rather have, and the two backend FIXME are done (disks have their read only parameter set, and the error is raised and thus reaches...
RAPI: format error messages as JSON
This patch changes the format of the HTTP error messages from text/html, which is hard to parse from RAPI clients, to JSON which can be automatically parsed.
The error message is an object, which contains always three keys:...
Make RAPI return 502/504 errors for luxi errors
This changes the RAPI error codes for luxi errors; a timeout error is now reported properly as 504, while any other luxi error is reported as 502.
It would be good to convert even more errors into proper return codes in...
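The mapping described above can be sketched as a small lookup. The exception classes are hypothetical stand-ins for the luxi error types, and the 500 fallback is an assumption based on the earlier "rapi: rework error handling" message.

```python
class LuxiError(Exception):
    """Stand-in for a generic luxi protocol error."""


class LuxiTimeout(LuxiError):
    """Stand-in for a luxi timeout error."""


def luxi_error_to_status(err):
    """Map an exception to the HTTP status code the RAPI would return."""
    if isinstance(err, LuxiTimeout):
        return 504  # Gateway Timeout: the master daemon did not answer in time
    if isinstance(err, LuxiError):
        return 502  # Bad Gateway: any other luxi failure
    return 500      # assumption: everything else stays an internal error
```

The timeout check has to come first, since in this sketch the timeout class derives from the generic one.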
job queue: log the opcode error too
Currently we only log "Error in opcode ...", but we don't log the error itself. This is not good for debugging.
LUSetInstanceParams: Fix nic handling
CheckArguments: Use constants.VALUE_NONE rather than hardcoding the string "none" If we're adding a nic fill the nic_dict with default values Check if the mac is syntactically valid, if we have one Don't allow the mac to be 'auto' when modifying a nic...
ConfigWriter.AddInstance check instance mac
There is a race condition in CreateInstance, since the mac address is generated early and only added to the config (and thus really assured to be unique) only at this point. Since it's possible that another instance...
Instance Creation: generate nics earlier
We want the real nic to be shown to the hooks and the allocators, so we'll generate them in CheckPrereq. We also write a comment about the race condition we generate. This race condition existed even before, so moving this generation will just lengthen it a bit. A separate patch...
Handle better broken disks
While running burnin:
  File "/usr/lib/python2.4/site-packages/ganeti/objects.py", line 497, in __str__
    val += ", size=%dm)>" % self.size
TypeError: int argument required
This happened while handling another error, so we lose the original...
Do not check 'None' disk IDs for duplicates
In case of 'None' logical or physical IDs, we don't need to check them for duplicates. This case can happen for DRBD devices in case of newly added disks, for example.
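A duplicate check that skips unset IDs might look like this sketch. The function name and the tuple-shaped sample IDs are hypothetical and only illustrate the rule stated above.

```python
def find_duplicate_ids(ids):
    """Return IDs that occur more than once, ignoring IDs that are None."""
    seen = set()
    duplicates = []
    for item in ids:
        if item is None:
            continue  # not assigned yet (e.g. a newly added disk): skip it
        if item in seen:
            duplicates.append(item)
        else:
            seen.add(item)
    return duplicates
```

Several disks without an ID therefore never count as duplicates of each other.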
Prevent race condition on MAC addresses
This patch adds a temporary set for MACs that have been requested but are not yet in the configuration (as part of an instance NIC). The MACs of an instance are automatically removed from this set when the instance...
Some small fixes
This patch removes the admin_ram LUQueryInstances field (it is broken anyway) and fixes the VNC address checks in the Xen Hypervisor.
Fix LUQueryInstances fields.
The query fields are now regular expressions. We need to quote the dots, otherwise invalid fields will be accepted but they will lose special formatting in the cli scripts.
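The effect of an unquoted dot can be shown with a short sketch. The field names and helper functions are hypothetical; only the use of regular expressions for field names comes from the message above.

```python
import re

KNOWN_FIELDS = ["name", "sda.size"]  # hypothetical field names


def field_regex(name, quote=True):
    """Compile an anchored pattern for a field name, optionally escaping it."""
    body = re.escape(name) if quote else name
    return re.compile("^%s$" % body)


def is_known_field(candidate, quote=True):
    """Return True if candidate matches one of the known field patterns."""
    return any(field_regex(name, quote).match(candidate)
               for name in KNOWN_FIELDS)
```

Without quoting, a name such as "sdaXsize" is accepted because the dot matches any character; with quoting only the literal "sda.size" passes.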
Apply the right permissions to /etc/hosts
In the current Ganeti version when modifying /etc/hosts we mistakenly give it the permissions of the temporary file we create to define its content, which is by default 0600. This breaks most non-root applications, and thus must be corrected. This patch forces the mode to...
Fix RPC result handling in _AssembleInstanceDisks
For (status, data)-style RPC calls, the result data is in the ‘payload’ attribute. This was missed in the conversion patch, with the only side effect that gnt-instance activate-disks didn't show a nice output...
ConfigWriter: add checks for duplicate disk IDs
This patch adds a safety check for duplicate disk logical/physical IDs, in order to prevent possible software bugs.
Switch the instance_shutdown rpc to (status, data)
This patch changes the return type from this RPC call to include status information and renames the backend method to match the RPC call name.
The patch is a little bigger than the reboot one, since this call is...
Switch the instance_reboot rpc to (status, data)
This small patch changes the return type from this RPC call to include status information and renames the backend method to match the RPC call name.
FileStorage: abort creating over an existing file
In FileStorage there is a TODO: decide whether we should check for existing files and abort or not
After Ganeti ate my instance data I decided. Let's abort.
In general there is no reason we should overwrite existing files, and...
_GenerateDiskTemplate: correct file disk index
Currently when adding disks the base for the index is not taken into account, and disk 0 is added twice.
HTS_USE_VNC, rename and remove KVM
Currently we use the HTS_USE_VNC constant only to copy the vnc password file. While KVM uses vnc it currently has no password support, nor will we be in time to add it for 2.0, so we rename the constant to HTS_COPY_VNC_PASSWORD and only put Xen HVM in it. In the future...
Some fixes to node add and re-add
The patch changes the pre-checks in node-add and re-add: - if the node is not already in the cluster, refuse to re-add - when re-adding, reuse the secondary IP from the cluster configuration - when re-adding, reset the offline and drained flags, so that RPC...
Instance parameters: force typing
We want all the hv/be parameters to have a known type, rather than a random mix of empty string, boolean values, and None, so we declare the type of each variable and we enforce/convert it.
- Add some new constants for enforceable value types...
Implement modification of the drained flag
This patch adds LU and cli-level support for modification of the node drained flag. It is similar to the offline changes.
Prevent allocations on drained nodes
This patch adds checks for drained nodes in the logical units that allocate or move instances around. We also update an error message (not style-compliant).
cluster verify: show correctly drained nodes
This patch changes slightly the output of gnt-cluster verify for drained nodes, and also adds a note with the total number of drained nodes (similar to the offline nodes note).
ConfigWriter: handle the drained node flag
This patch changes the master candidate pool computations in ConfigWriter to properly handle drained nodes. They are now excluded from counting towards the reachable number of candidates.
The patch also adds verification of consistency for the node status....
Allow query of the drained node attribute
This patch exports the drained attribute: - LUQueryNodes accepts now the drained field - RAPI exports it for node objects - gnt-node info shows it now (along newly-added master_candidate and offline flags)...
Add a ‘drained’ attribute to node objects
This attribute will be used to prevent any allocation on the node (any of replace-disks with new secondary this node, failover to the node, migration to the node).
The patch adds the attribute and initializes it correctly in cluster...
Some error message cleanups