4 Documents Ganeti version |version|
8 .. highlight:: shell-example
This document serves as a more example-oriented guide to Ganeti; while
the administration guide takes a conceptual approach, here you will
find a step-by-step example of managing instances and the cluster.
Our simulated example cluster will have three machines, named
``node1``, ``node2``, ``node3``. Note that in real life machines will
usually have FQDNs but here we use short names for brevity. We will use
a secondary network for replication data, ``192.0.2.0/24``, with each
node's address on it ending in the node's index (so node1 uses
``192.0.2.1``). The cluster name will be ``example-cluster``. All nodes
have the same simulated hardware configuration: two 750GB disks, 32GB
of memory and 4 CPUs.
25 On this cluster, we will create up to seven instances, named
26 ``instance1`` to ``instance7``.
32 Follow the :doc:`install` document and prepare the nodes. Then it's time
33 to initialise the cluster::
35 $ gnt-cluster init -s %192.0.2.1% --enabled-hypervisors=xen-pvm %example-cluster%
The creation was fine. Let's check that the one node we have is
functioning correctly::
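
  $ gnt-node list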
42 Node DTotal DFree MTotal MNode MFree Pinst Sinst
43 node1 1.3T 1.3T 32.0G 1.0G 30.5G 0 0
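  $ gnt-cluster verify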
45 Mon Oct 26 02:08:51 2009 * Verifying global settings
46 Mon Oct 26 02:08:51 2009 * Gathering data (1 nodes)
47 Mon Oct 26 02:08:52 2009 * Verifying node status
48 Mon Oct 26 02:08:52 2009 * Verifying instance status
49 Mon Oct 26 02:08:52 2009 * Verifying orphan volumes
50 Mon Oct 26 02:08:52 2009 * Verifying remaining instances
51 Mon Oct 26 02:08:52 2009 * Verifying N+1 Memory redundancy
52 Mon Oct 26 02:08:52 2009 * Other Notes
53 Mon Oct 26 02:08:52 2009 * Hooks Results
56 Since this proceeded correctly, let's add the other two nodes::
58 $ gnt-node add -s %192.0.2.2% %node2%
60 Performing this operation is going to replace the ssh daemon keypair
61 on the target machine (node2) with the ones of the current one
62 and grant full intra-cluster ssh root access to/from it
64 Unable to verify hostkey of host xen-devi-5.fra.corp.google.com:
65 f7:…. Do you want to accept it?
67 Mon Oct 26 02:11:53 2009 Authentication to node2 via public key failed, trying password
69 Mon Oct 26 02:11:54 2009 - INFO: Node will be a master candidate
70 $ gnt-node add -s %192.0.2.3% %node3%
72 Performing this operation is going to replace the ssh daemon keypair
73 on the target machine (node3) with the ones of the current one
74 and grant full intra-cluster ssh root access to/from it
77 Mon Oct 26 02:12:43 2009 - INFO: Node will be a master candidate
79 Checking the cluster status again::
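
  $ gnt-node list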
82 Node DTotal DFree MTotal MNode MFree Pinst Sinst
83 node1 1.3T 1.3T 32.0G 1.0G 30.5G 0 0
84 node2 1.3T 1.3T 32.0G 1.0G 30.5G 0 0
85 node3 1.3T 1.3T 32.0G 1.0G 30.5G 0 0
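  $ gnt-cluster verify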
87 Mon Oct 26 02:15:14 2009 * Verifying global settings
88 Mon Oct 26 02:15:14 2009 * Gathering data (3 nodes)
89 Mon Oct 26 02:15:16 2009 * Verifying node status
90 Mon Oct 26 02:15:16 2009 * Verifying instance status
91 Mon Oct 26 02:15:16 2009 * Verifying orphan volumes
92 Mon Oct 26 02:15:16 2009 * Verifying remaining instances
93 Mon Oct 26 02:15:16 2009 * Verifying N+1 Memory redundancy
94 Mon Oct 26 02:15:16 2009 * Other Notes
95 Mon Oct 26 02:15:16 2009 * Hooks Results
98 And let's check that we have a valid OS::
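
  $ gnt-os list
  Name
  debootstrap
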
Now that the cluster is created, it is time to check that the hardware
works correctly, that the hypervisor can actually create instances,
etc. This is done with the *burnin* tool (using the debootstrap OS
definition), as described in the admin guide. Similar output lines are
replaced with ``…`` in the log below::

113 $ /usr/lib/ganeti/tools/burnin -o debootstrap -p instance{1..5}
114 - Testing global parameters
123 * Submitted job ID(s) 157, 158, 159, 160, 161
124 waiting for job 157 for instance1
126 waiting for job 161 for instance5
127 - Replacing disks on the same nodes
129 run replace_on_secondary
130 run replace_on_primary
133 run replace_on_secondary
134 run replace_on_primary
135 * Submitted job ID(s) 162, 163, 164, 165, 166
136 waiting for job 162 for instance1
138 - Changing the secondary node
140 run replace_new_secondary node3
142 run replace_new_secondary node1
145 run replace_new_secondary node1
146 * Submitted job ID(s) 167, 168, 169, 170, 171
147 waiting for job 167 for instance1
151 increase disk/0 by 128 MB
154 increase disk/0 by 128 MB
155 * Submitted job ID(s) 173, 174, 175, 176, 177
156 waiting for job 173 for instance1
158 - Failing over instances
162 * Submitted job ID(s) 179, 180, 181, 182, 183
163 waiting for job 179 for instance1
165 - Migrating instances
167 migration and migration cleanup
170 migration and migration cleanup
171 * Submitted job ID(s) 184, 185, 186, 187, 188
172 waiting for job 184 for instance1
174 - Exporting and re-importing instances
178 import from node3 to node1, node2
184 import from node1 to node2, node3
186 * Submitted job ID(s) 196, 197, 198, 199, 200
187 waiting for job 196 for instance1
189 - Reinstalling instances
191 reinstall without passing the OS
192 reinstall specifying the OS
195 reinstall without passing the OS
196 reinstall specifying the OS
197 * Submitted job ID(s) 203, 204, 205, 206, 207
198 waiting for job 203 for instance1
200 - Rebooting instances
202 reboot with type 'hard'
203 reboot with type 'soft'
204 reboot with type 'full'
207 reboot with type 'hard'
208 reboot with type 'soft'
209 reboot with type 'full'
210 * Submitted job ID(s) 208, 209, 210, 211, 212
211 waiting for job 208 for instance1
213 - Adding and removing disks
221 * Submitted job ID(s) 213, 214, 215, 216, 217
222 waiting for job 213 for instance1
224 - Adding and removing NICs
232 * Submitted job ID(s) 218, 219, 220, 221, 222
233 waiting for job 218 for instance1
235 - Activating/deactivating disks
237 activate disks when online
238 activate disks when offline
239 deactivate disks (when offline)
242 activate disks when online
243 activate disks when offline
244 deactivate disks (when offline)
245 * Submitted job ID(s) 223, 224, 225, 226, 227
246 waiting for job 223 for instance1
248 - Stopping and starting instances
252 * Submitted job ID(s) 230, 231, 232, 233, 234
253 waiting for job 230 for instance1
259 * Submitted job ID(s) 235, 236, 237, 238, 239
260 waiting for job 235 for instance1
The output above shows which operations the burn-in performs. Ideally,
the burn-in should proceed successfully through all the steps and end
cleanly, without throwing errors.
At this point, Ganeti and the hardware seem to be functioning
correctly, so we'll follow up with creating the instances manually::

  $ gnt-instance add -t drbd -o debootstrap -s %256m% %instance1%
278 Mon Oct 26 04:06:52 2009 - INFO: Selected nodes for instance instance1 via iallocator hail: node2, node3
279 Mon Oct 26 04:06:53 2009 * creating instance disks...
280 Mon Oct 26 04:06:57 2009 adding instance instance1 to cluster config
281 Mon Oct 26 04:06:57 2009 - INFO: Waiting for instance instance1 to sync disks.
282 Mon Oct 26 04:06:57 2009 - INFO: - device disk/0: 20.00\% done, 4 estimated seconds remaining
283 Mon Oct 26 04:07:01 2009 - INFO: Instance instance1's disks are in sync.
284 Mon Oct 26 04:07:01 2009 creating os for instance instance1 on node node2
285 Mon Oct 26 04:07:01 2009 * running the instance OS create scripts...
286 Mon Oct 26 04:07:14 2009 * starting instance...
287 $ gnt-instance add -t drbd -o debootstrap -s %256m% -n %node1%:%node2% %instance2%
288 Mon Oct 26 04:11:37 2009 * creating instance disks...
289 Mon Oct 26 04:11:40 2009 adding instance instance2 to cluster config
290 Mon Oct 26 04:11:41 2009 - INFO: Waiting for instance instance2 to sync disks.
291 Mon Oct 26 04:11:41 2009 - INFO: - device disk/0: 35.40\% done, 1 estimated seconds remaining
292 Mon Oct 26 04:11:42 2009 - INFO: - device disk/0: 58.50\% done, 1 estimated seconds remaining
293 Mon Oct 26 04:11:43 2009 - INFO: - device disk/0: 86.20\% done, 0 estimated seconds remaining
294 Mon Oct 26 04:11:44 2009 - INFO: - device disk/0: 92.40\% done, 0 estimated seconds remaining
295 Mon Oct 26 04:11:44 2009 - INFO: - device disk/0: 97.00\% done, 0 estimated seconds remaining
296 Mon Oct 26 04:11:44 2009 - INFO: Instance instance2's disks are in sync.
297 Mon Oct 26 04:11:44 2009 creating os for instance instance2 on node node1
298 Mon Oct 26 04:11:44 2009 * running the instance OS create scripts...
299 Mon Oct 26 04:11:57 2009 * starting instance...
302 The above shows one instance created via an iallocator script, and one
303 being created with manual node assignment. The other three instances
304 were also created and now it's time to check them::
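
  $ gnt-instance list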
307 Instance Hypervisor OS Primary_node Status Memory
308 instance1 xen-pvm debootstrap node2 running 128M
309 instance2 xen-pvm debootstrap node1 running 128M
310 instance3 xen-pvm debootstrap node1 running 128M
311 instance4 xen-pvm debootstrap node3 running 128M
312 instance5 xen-pvm debootstrap node2 running 128M
317 Accessing an instance's console is easy::
319 $ gnt-instance console %instance2%
320 [ 0.000000] Bootdata ok (command line is root=/dev/sda1 ro)
321 [ 0.000000] Linux version 2.6…
322 [ 0.000000] BIOS-provided physical RAM map:
323 [ 0.000000] Xen: 0000000000000000 - 0000000008800000 (usable)
324 [13138176.018071] Built 1 zonelists. Total pages: 34816
325 [13138176.018074] Kernel command line: root=/dev/sda1 ro
326 [13138176.018694] Initializing CPU#0
328 Checking file systems...fsck 1.41.3 (12-Oct-2008)
330 Setting kernel variables (/etc/sysctl.conf)...done.
331 Mounting local filesystems...done.
332 Activating swapfile swap...done.
333 Setting up networking....
334 Configuring network interfaces...done.
335 Setting console screen modes and fonts.
336 INIT: Entering runlevel: 2
337 Starting enhanced syslogd: rsyslogd.
338 Starting periodic command scheduler: crond.
340 Debian GNU/Linux 5.0 instance2 tty1
At this moment you can log in to the instance and, after configuring
the network (and doing this on all instances), we can check their
connectivity::

348 $ fping %instance{1..5}%
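  instance1 is alive
  instance2 is alive
  instance3 is alive
  instance4 is alive
  instance5 is alive
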
359 Removing unwanted instances is also easy::
361 $ gnt-instance remove %instance5%
362 This will remove the volumes of the instance instance5 (including
363 mirrors), thus removing all the data of the instance. Continue?
368 Recovering from hardware failures
369 ---------------------------------
371 Recovering from node failure
372 ++++++++++++++++++++++++++++
374 We are now left with four instances. Assume that at this point, node3,
375 which has one primary and one secondary instance, crashes::
377 $ gnt-node info %node3%
379 primary ip: 198.51.100.1
380 secondary ip: 192.0.2.3
381 master candidate: True
384 primary for instances:
386 secondary for instances:
At this point, the primary instance of that node (instance4) is down,
but the secondary instance (instance1) is not affected, except that it
has lost disk redundancy::

395 $ fping %instance{1,4}%
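  instance1 is alive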
397 instance4 is unreachable
400 If we try to check the status of instance4 via the instance info
401 command, it fails because it tries to contact node3 which is down::
403 $ gnt-instance info %instance4%
404 Failure: command execution error:
405 Error checking node node3: Connection failed (113: No route to host)
So we need to mark node3 as being *offline*, and thus Ganeti won't talk
to it anymore::

411 $ gnt-node modify -O yes -f %node3%
412 Mon Oct 26 04:34:12 2009 - WARNING: Not enough master candidates (desired 10, new value will be 2)
413 Mon Oct 26 04:34:15 2009 - WARNING: Communication failure to node node3: Connection failed (113: No route to host)
416 - master_candidate -> auto-demotion due to offline
419 And now we can failover the instance::
421 $ gnt-instance failover %instance4%
422 Failover will happen to image instance4. This requires a shutdown of
423 the instance. Continue?
425 Mon Oct 26 04:35:34 2009 * checking disk consistency between source and target
426 Failure: command execution error:
427 Disk disk/0 is degraded on target node, aborting failover.
428 $ gnt-instance failover --ignore-consistency %instance4%
429 Failover will happen to image instance4. This requires a shutdown of
430 the instance. Continue?
432 Mon Oct 26 04:35:47 2009 * checking disk consistency between source and target
433 Mon Oct 26 04:35:47 2009 * shutting down instance on source node
434 Mon Oct 26 04:35:47 2009 - WARNING: Could not shutdown instance instance4 on node node3. Proceeding anyway. Please make sure node node3 is down. Error details: Node is marked offline
435 Mon Oct 26 04:35:47 2009 * deactivating the instance's disks on source node
436 Mon Oct 26 04:35:47 2009 - WARNING: Could not shutdown block device disk/0 on node node3: Node is marked offline
437 Mon Oct 26 04:35:47 2009 * activating the instance's disks on target node
438 Mon Oct 26 04:35:47 2009 - WARNING: Could not prepare block device disk/0 on node node3 (is_primary=False, pass=1): Node is marked offline
439 Mon Oct 26 04:35:48 2009 * starting the instance on the target node
Note that in our first attempt, Ganeti refused to do the failover since
it wasn't sure what the status of the instance's disks was. We pass the
``--ignore-consistency`` flag and then the failover can proceed::
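
  $ gnt-instance list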
447 Instance Hypervisor OS Primary_node Status Memory
448 instance1 xen-pvm debootstrap node2 running 128M
449 instance2 xen-pvm debootstrap node1 running 128M
450 instance3 xen-pvm debootstrap node1 running 128M
451 instance4 xen-pvm debootstrap node1 running 128M
But at this point, both instance1 and instance4 are without disk
redundancy::

457 $ gnt-instance info %instance1%
458 Instance name: instance1
459 UUID: 45173e82-d1fa-417c-8758-7d582ab7eef4
461 Creation time: 2009-10-26 04:06:57
462 Modification time: 2009-10-26 04:07:14
463 State: configured to be up, actual state is up
467 Operating system: debootstrap
468 Allocated network port: None
470 - root_path: default (/dev/sda1)
471 - kernel_args: default (ro)
472 - use_bootloader: default (False)
473 - bootloader_args: default ()
474 - bootloader_path: default ()
475 - kernel_path: default (/boot/vmlinuz-2.6-xenU)
476 - initrd_path: default ()
482 - nic/0: MAC: aa:00:00:78:da:63, IP: None, mode: bridged, link: xen-br0
484 - disk/0: drbd8, size 256M
486 nodeA: node2, minor=0
487 nodeB: node3, minor=0
489 auth key: 8e950e3cec6854b0181fbc3a6058657701f2d458
490 on primary: /dev/drbd0 (147:0) in sync, status *DEGRADED*
492 - child 0: lvm, size 256M
493 logical_id: xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_data
494 on primary: /dev/xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_data (254:0)
495 - child 1: lvm, size 128M
496 logical_id: xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta
497 on primary: /dev/xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta (254:1)
The output is similar for instance4. In order to recover this, we need
to run the node evacuate command, which will change the instances'
secondary node from the current one to a new one (in this case, we only
have two working nodes, so all instances will end up on nodes one and
two)::

504 $ gnt-node evacuate -I hail %node3%
505 Relocate instance(s) 'instance1','instance4' from node
506 node3 using iallocator hail?
508 Mon Oct 26 05:05:39 2009 - INFO: Selected new secondary for instance 'instance1': node1
509 Mon Oct 26 05:05:40 2009 - INFO: Selected new secondary for instance 'instance4': node2
510 Mon Oct 26 05:05:40 2009 Replacing disk(s) 0 for instance1
511 Mon Oct 26 05:05:40 2009 STEP 1/6 Check device existence
512 Mon Oct 26 05:05:40 2009 - INFO: Checking disk/0 on node2
513 Mon Oct 26 05:05:40 2009 - INFO: Checking volume groups
514 Mon Oct 26 05:05:40 2009 STEP 2/6 Check peer consistency
515 Mon Oct 26 05:05:40 2009 - INFO: Checking disk/0 consistency on node node2
516 Mon Oct 26 05:05:40 2009 STEP 3/6 Allocate new storage
517 Mon Oct 26 05:05:40 2009 - INFO: Adding new local storage on node1 for disk/0
518 Mon Oct 26 05:05:41 2009 STEP 4/6 Changing drbd configuration
519 Mon Oct 26 05:05:41 2009 - INFO: activating a new drbd on node1 for disk/0
520 Mon Oct 26 05:05:42 2009 - INFO: Shutting down drbd for disk/0 on old node
521 Mon Oct 26 05:05:42 2009 - WARNING: Failed to shutdown drbd for disk/0 on oldnode: Node is marked offline
522 Mon Oct 26 05:05:42 2009 Hint: Please cleanup this device manually as soon as possible
523 Mon Oct 26 05:05:42 2009 - INFO: Detaching primary drbds from the network (=> standalone)
524 Mon Oct 26 05:05:42 2009 - INFO: Updating instance configuration
525 Mon Oct 26 05:05:45 2009 - INFO: Attaching primary drbds to new secondary (standalone => connected)
526 Mon Oct 26 05:05:46 2009 STEP 5/6 Sync devices
527 Mon Oct 26 05:05:46 2009 - INFO: Waiting for instance instance1 to sync disks.
528 Mon Oct 26 05:05:46 2009 - INFO: - device disk/0: 13.90\% done, 7 estimated seconds remaining
529 Mon Oct 26 05:05:53 2009 - INFO: Instance instance1's disks are in sync.
530 Mon Oct 26 05:05:53 2009 STEP 6/6 Removing old storage
531 Mon Oct 26 05:05:53 2009 - INFO: Remove logical volumes for 0
532 Mon Oct 26 05:05:53 2009 - WARNING: Can't remove old LV: Node is marked offline
533 Mon Oct 26 05:05:53 2009 Hint: remove unused LVs manually
534 Mon Oct 26 05:05:53 2009 - WARNING: Can't remove old LV: Node is marked offline
535 Mon Oct 26 05:05:53 2009 Hint: remove unused LVs manually
536 Mon Oct 26 05:05:53 2009 Replacing disk(s) 0 for instance4
537 Mon Oct 26 05:05:53 2009 STEP 1/6 Check device existence
538 Mon Oct 26 05:05:53 2009 - INFO: Checking disk/0 on node1
539 Mon Oct 26 05:05:53 2009 - INFO: Checking volume groups
540 Mon Oct 26 05:05:53 2009 STEP 2/6 Check peer consistency
541 Mon Oct 26 05:05:53 2009 - INFO: Checking disk/0 consistency on node node1
542 Mon Oct 26 05:05:54 2009 STEP 3/6 Allocate new storage
543 Mon Oct 26 05:05:54 2009 - INFO: Adding new local storage on node2 for disk/0
544 Mon Oct 26 05:05:54 2009 STEP 4/6 Changing drbd configuration
545 Mon Oct 26 05:05:54 2009 - INFO: activating a new drbd on node2 for disk/0
546 Mon Oct 26 05:05:55 2009 - INFO: Shutting down drbd for disk/0 on old node
547 Mon Oct 26 05:05:55 2009 - WARNING: Failed to shutdown drbd for disk/0 on oldnode: Node is marked offline
548 Mon Oct 26 05:05:55 2009 Hint: Please cleanup this device manually as soon as possible
549 Mon Oct 26 05:05:55 2009 - INFO: Detaching primary drbds from the network (=> standalone)
550 Mon Oct 26 05:05:55 2009 - INFO: Updating instance configuration
551 Mon Oct 26 05:05:55 2009 - INFO: Attaching primary drbds to new secondary (standalone => connected)
552 Mon Oct 26 05:05:56 2009 STEP 5/6 Sync devices
553 Mon Oct 26 05:05:56 2009 - INFO: Waiting for instance instance4 to sync disks.
554 Mon Oct 26 05:05:56 2009 - INFO: - device disk/0: 12.40\% done, 8 estimated seconds remaining
555 Mon Oct 26 05:06:04 2009 - INFO: Instance instance4's disks are in sync.
556 Mon Oct 26 05:06:04 2009 STEP 6/6 Removing old storage
557 Mon Oct 26 05:06:04 2009 - INFO: Remove logical volumes for 0
558 Mon Oct 26 05:06:04 2009 - WARNING: Can't remove old LV: Node is marked offline
559 Mon Oct 26 05:06:04 2009 Hint: remove unused LVs manually
560 Mon Oct 26 05:06:04 2009 - WARNING: Can't remove old LV: Node is marked offline
561 Mon Oct 26 05:06:04 2009 Hint: remove unused LVs manually
564 And now node3 is completely free of instances and can be repaired::
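
  $ gnt-node list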
567 Node DTotal DFree MTotal MNode MFree Pinst Sinst
568 node1 1.3T 1.3T 32.0G 1.0G 30.2G 3 1
569 node2 1.3T 1.3T 32.0G 1.0G 30.4G 1 3
572 Re-adding a node to the cluster
573 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
575 Let's say node3 has been repaired and is now ready to be
576 reused. Re-adding it is simple::
578 $ gnt-node add --readd %node3%
579 The authenticity of host 'node3 (198.51.100.1)' can't be established.
580 RSA key fingerprint is 9f:2e:5a:2e:e0:bd:00:09:e4:5c:32:f2:27:57:7a:f4.
581 Are you sure you want to continue connecting (yes/no)? yes
582 Mon Oct 26 05:27:39 2009 - INFO: Readding a node, the offline/drained flags were reset
583 Mon Oct 26 05:27:39 2009 - INFO: Node will be a master candidate
585 And it is now working again::
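
  $ gnt-node list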
588 Node DTotal DFree MTotal MNode MFree Pinst Sinst
589 node1 1.3T 1.3T 32.0G 1.0G 30.2G 3 1
590 node2 1.3T 1.3T 32.0G 1.0G 30.4G 1 3
591 node3 1.3T 1.3T 32.0G 1.0G 30.4G 0 0
.. note:: If Ganeti has been built with the htools
   component enabled, you can shuffle the instances around to make
   better use of the nodes.
A disk failure is simpler than a full node failure. First, a single
disk failure should not cause data loss for any redundant instance;
only the performance of some instances might be reduced due to more
network traffic.
Let's take the cluster status in the above listing, and check what
volumes are in use by the instances on node2::

608 $ gnt-node volumes -o phys,instance %node2%
620 You can see that all instances on node2 have logical volumes on
621 ``/dev/sdb1``. Let's simulate a disk failure on that disk::
625 $ echo offline > /sys/block/sdb/device/state
627 /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
628 /dev/sdb1: read failed after 0 of 4096 at 750153695232: Input/output error
629 /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
630 Couldn't find device with uuid '954bJA-mNL0-7ydj-sdpW-nc2C-ZrCi-zFp91c'.
631 Couldn't find all physical volumes for volume group xenvg.
632 /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
633 /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
634 Couldn't find device with uuid '954bJA-mNL0-7ydj-sdpW-nc2C-ZrCi-zFp91c'.
635 Couldn't find all physical volumes for volume group xenvg.
636 Volume group xenvg not found
639 At this point, the node is broken and if we are to examine
640 instance2 we get (simplified output shown)::
642 $ gnt-instance info %instance2%
643 Instance name: instance2
644 State: configured to be up, actual state is up
649 - disk/0: drbd8, size 256M
650 on primary: /dev/drbd0 (147:0) in sync, status ok
651 on secondary: /dev/drbd1 (147:1) in sync, status *DEGRADED* *MISSING DISK*
This instance has a secondary only on node2. Let's verify a primary
instance of node2::

656 $ gnt-instance info %instance1%
657 Instance name: instance1
658 State: configured to be up, actual state is up
663 - disk/0: drbd8, size 256M
664 on primary: /dev/drbd0 (147:0) in sync, status *DEGRADED* *MISSING DISK*
665 on secondary: /dev/drbd3 (147:3) in sync, status ok
666 $ gnt-instance console %instance1%
668 Debian GNU/Linux 5.0 instance1 tty1
670 instance1 login: root
671 Last login: Tue Oct 27 01:24:09 UTC 2009 on tty1
672 instance1:~# date > test
674 instance1:~# cat test
675 Tue Oct 27 01:25:20 UTC 2009
676 instance1:~# dmesg|tail
677 [5439785.235448] NET: Registered protocol family 15
678 [5439785.235489] 802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
679 [5439785.235495] All bugs added by David S. Miller <davem@redhat.com>
680 [5439785.235517] XENBUS: Device with no driver: device/console/0
681 [5439785.236576] kjournald starting. Commit interval 5 seconds
682 [5439785.236588] EXT3-fs: mounted filesystem with ordered data mode.
683 [5439785.236625] VFS: Mounted root (ext3 filesystem) readonly.
684 [5439785.236663] Freeing unused kernel memory: 172k freed
685 [5439787.533779] EXT3 FS on sda1, internal journal
686 [5440655.065431] eth0: no IPv6 routers present
As you can see, the instance is running fine and doesn't see any disk
issues. It is now time to fix node2 and re-establish redundancy for the
involved instances.
.. note:: For Ganeti 2.0 we need to manually fix the volume group on
   node2 by running ``vgreduce --removemissing xenvg``.

::

698 $ gnt-node repair-storage %node2% lvm-vg %xenvg%
699 Mon Oct 26 18:14:03 2009 Repairing storage unit 'xenvg' on node2 ...
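  $ ssh %node2% vgs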
701 VG #PV #LV #SN Attr VSize VFree
702 xenvg 1 8 0 wz--n- 673.84G 673.84G
This has removed the 'bad' disk from the volume group, which is now
left with only one PV. We can now replace the disks for the involved
instances::

709 $ for i in %instance{1..4}%; do gnt-instance replace-disks -a $i; done
710 Mon Oct 26 18:15:38 2009 Replacing disk(s) 0 for instance1
711 Mon Oct 26 18:15:38 2009 STEP 1/6 Check device existence
712 Mon Oct 26 18:15:38 2009 - INFO: Checking disk/0 on node1
713 Mon Oct 26 18:15:38 2009 - INFO: Checking disk/0 on node2
714 Mon Oct 26 18:15:38 2009 - INFO: Checking volume groups
715 Mon Oct 26 18:15:38 2009 STEP 2/6 Check peer consistency
716 Mon Oct 26 18:15:38 2009 - INFO: Checking disk/0 consistency on node node1
717 Mon Oct 26 18:15:39 2009 STEP 3/6 Allocate new storage
718 Mon Oct 26 18:15:39 2009 - INFO: Adding storage on node2 for disk/0
719 Mon Oct 26 18:15:39 2009 STEP 4/6 Changing drbd configuration
720 Mon Oct 26 18:15:39 2009 - INFO: Detaching disk/0 drbd from local storage
721 Mon Oct 26 18:15:40 2009 - INFO: Renaming the old LVs on the target node
722 Mon Oct 26 18:15:40 2009 - INFO: Renaming the new LVs on the target node
723 Mon Oct 26 18:15:40 2009 - INFO: Adding new mirror component on node2
724 Mon Oct 26 18:15:41 2009 STEP 5/6 Sync devices
725 Mon Oct 26 18:15:41 2009 - INFO: Waiting for instance instance1 to sync disks.
726 Mon Oct 26 18:15:41 2009 - INFO: - device disk/0: 12.40\% done, 9 estimated seconds remaining
727 Mon Oct 26 18:15:50 2009 - INFO: Instance instance1's disks are in sync.
728 Mon Oct 26 18:15:50 2009 STEP 6/6 Removing old storage
729 Mon Oct 26 18:15:50 2009 - INFO: Remove logical volumes for disk/0
730 Mon Oct 26 18:15:52 2009 Replacing disk(s) 0 for instance2
731 Mon Oct 26 18:15:52 2009 STEP 1/6 Check device existence
733 Mon Oct 26 18:16:01 2009 STEP 6/6 Removing old storage
734 Mon Oct 26 18:16:01 2009 - INFO: Remove logical volumes for disk/0
735 Mon Oct 26 18:16:02 2009 Replacing disk(s) 0 for instance3
736 Mon Oct 26 18:16:02 2009 STEP 1/6 Check device existence
738 Mon Oct 26 18:16:09 2009 STEP 6/6 Removing old storage
739 Mon Oct 26 18:16:09 2009 - INFO: Remove logical volumes for disk/0
740 Mon Oct 26 18:16:10 2009 Replacing disk(s) 0 for instance4
741 Mon Oct 26 18:16:10 2009 STEP 1/6 Check device existence
743 Mon Oct 26 18:16:18 2009 STEP 6/6 Removing old storage
744 Mon Oct 26 18:16:18 2009 - INFO: Remove logical volumes for disk/0
At this point, all instances should be healthy again.
.. note:: Ganeti 2.0 doesn't have the ``-a`` option to replace-disks, so
   there you have to run the loop twice, once over primary instances
   with argument ``-p`` and once over secondary instances with argument
   ``-s``, but otherwise the operations are similar::
754 $ gnt-instance replace-disks -p instance1
756 $ for i in %instance{2..4}%; do gnt-instance replace-disks -s $i; done
758 Common cluster problems
759 -----------------------
761 There are a number of small issues that might appear on a cluster that
762 can be solved easily as long as the issue is properly identified. For
763 this exercise we will consider the case of node3, which was broken
764 previously and re-added to the cluster without reinstallation. Running
765 cluster verify on the cluster reports::
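
  $ gnt-cluster verify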
768 Mon Oct 26 18:30:08 2009 * Verifying global settings
769 Mon Oct 26 18:30:08 2009 * Gathering data (3 nodes)
770 Mon Oct 26 18:30:10 2009 * Verifying node status
771 Mon Oct 26 18:30:10 2009 - ERROR: node node3: unallocated drbd minor 0 is in use
772 Mon Oct 26 18:30:10 2009 - ERROR: node node3: unallocated drbd minor 1 is in use
773 Mon Oct 26 18:30:10 2009 * Verifying instance status
774 Mon Oct 26 18:30:10 2009 - ERROR: instance instance4: instance should not run on node node3
775 Mon Oct 26 18:30:10 2009 * Verifying orphan volumes
776 Mon Oct 26 18:30:10 2009 - ERROR: node node3: volume 22459cf8-117d-4bea-a1aa-791667d07800.disk0_data is unknown
777 Mon Oct 26 18:30:10 2009 - ERROR: node node3: volume 1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data is unknown
778 Mon Oct 26 18:30:10 2009 - ERROR: node node3: volume 1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta is unknown
779 Mon Oct 26 18:30:10 2009 - ERROR: node node3: volume 22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta is unknown
780 Mon Oct 26 18:30:10 2009 * Verifying remaining instances
781 Mon Oct 26 18:30:10 2009 * Verifying N+1 Memory redundancy
782 Mon Oct 26 18:30:10 2009 * Other Notes
783 Mon Oct 26 18:30:10 2009 * Hooks Results
As you can see, *instance4* has a copy running on node3, because we
forced the failover when node3 failed. This case is dangerous, as the
two copies will have the same IP and MAC address, wreaking havoc on the
network environment and on anyone who tries to use the instance.
Ganeti doesn't directly handle this case. It is recommended to log on
to node3 and run::

797 $ xm destroy %instance4%
799 Unallocated DRBD minors
800 +++++++++++++++++++++++
There are still unallocated DRBD minors on node3. Again, these are not
handled by Ganeti directly and need to be cleaned up on the node itself
via DRBD commands::

807 $ drbdsetup /dev/drbd%0% down
808 $ drbdsetup /dev/drbd%1% down
At this point, the only remaining problem should be the so-called
*orphan* volumes. This can also happen in the case of an aborted
disk-replace, or a similar situation where Ganeti was not able to
recover automatically. Here you need to remove them manually on node3
via LVM commands::

822 Do you really want to remove active logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_data"? [y/n]: %y%
823 Logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_data" successfully removed
824 Do you really want to remove active logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta"? [y/n]: %y%
825 Logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta" successfully removed
826 Do you really want to remove active logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data"? [y/n]: %y%
827 Logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data" successfully removed
828 Do you really want to remove active logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta"? [y/n]: %y%
829 Logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta" successfully removed
832 At this point cluster verify shouldn't complain anymore::
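
  $ gnt-cluster verify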
835 Mon Oct 26 18:37:51 2009 * Verifying global settings
836 Mon Oct 26 18:37:51 2009 * Gathering data (3 nodes)
837 Mon Oct 26 18:37:53 2009 * Verifying node status
838 Mon Oct 26 18:37:53 2009 * Verifying instance status
839 Mon Oct 26 18:37:53 2009 * Verifying orphan volumes
840 Mon Oct 26 18:37:53 2009 * Verifying remaining instances
841 Mon Oct 26 18:37:53 2009 * Verifying N+1 Memory redundancy
842 Mon Oct 26 18:37:53 2009 * Other Notes
843 Mon Oct 26 18:37:53 2009 * Hooks Results
Since redundant instances in Ganeti have a primary/secondary model,
each node needs to set aside enough memory so that, if one of its peer
nodes fails, all the instances that have the failed node as primary and
this node as secondary can be failed over to it. More specifically, if
instance2 has node1 as primary and node2 as secondary (and node1 and
node2 do not have any other instances in this layout), then node2 must
have enough free memory so that if node1 fails, we can failover
instance2 without any other operations (thus reducing the downtime
window). Let's increase the memory of the current instances to 4G, and
add three new instances, two on node2:node3 with 8GB of RAM and one on
node1:node2, with 12GB of RAM (numbers chosen so that we run out of
memory)::

861 $ gnt-instance modify -B memory=%4G% %instance1%
862 Modified instance instance1
865 Please don't forget that these parameters take effect only at the next start of the instance.
866 $ gnt-instance modify …
868 $ gnt-instance add -t drbd -n %node2%:%node3% -s %512m% -B memory=%8G% -o %debootstrap% %instance5%
870 $ gnt-instance add -t drbd -n %node2%:%node3% -s %512m% -B memory=%8G% -o %debootstrap% %instance6%
872 $ gnt-instance add -t drbd -n %node1%:%node2% -s %512m% -B memory=%8G% -o %debootstrap% %instance7%
873 $ gnt-instance reboot --all
874 The reboot will operate on 7 instances.
875 Do you want to continue?
885 Submitted jobs 677, 678, 679, 680, 681, 682, 683
886 Waiting for job 677 for instance1...
887 Waiting for job 678 for instance2...
888 Waiting for job 679 for instance3...
889 Waiting for job 680 for instance4...
890 Waiting for job 681 for instance5...
891 Waiting for job 682 for instance6...
892 Waiting for job 683 for instance7...
We rebooted the instances for the memory changes to take effect. Now
the cluster looks like this::
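
  $ gnt-node list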
899 Node DTotal DFree MTotal MNode MFree Pinst Sinst
900 node1 1.3T 1.3T 32.0G 1.0G 6.5G 4 1
901 node2 1.3T 1.3T 32.0G 1.0G 10.5G 3 4
902 node3 1.3T 1.3T 32.0G 1.0G 30.5G 0 2
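  $ gnt-cluster verify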
904 Mon Oct 26 18:59:36 2009 * Verifying global settings
905 Mon Oct 26 18:59:36 2009 * Gathering data (3 nodes)
906 Mon Oct 26 18:59:37 2009 * Verifying node status
907 Mon Oct 26 18:59:37 2009 * Verifying instance status
908 Mon Oct 26 18:59:37 2009 * Verifying orphan volumes
909 Mon Oct 26 18:59:37 2009 * Verifying remaining instances
910 Mon Oct 26 18:59:37 2009 * Verifying N+1 Memory redundancy
911 Mon Oct 26 18:59:37 2009 - ERROR: node node2: not enough memory to accommodate instance failovers should node node1 fail
912 Mon Oct 26 18:59:37 2009 * Other Notes
913 Mon Oct 26 18:59:37 2009 * Hooks Results
916 The cluster verify error above shows that if node1 fails, node2 will not
917 have enough memory to failover all primary instances on node1 to it. To
918 solve this, you have a number of options:
920 - try to manually move instances around (but this can become complicated
921 for any non-trivial cluster)
922 - try to reduce the minimum memory of some instances on the source node
923 of the N+1 failure (in the example above ``node1``): this will allow
  it to start and be failed over/migrated with less than its maximum
  memory
926 - try to reduce the runtime/maximum memory of some instances on the
927 destination node of the N+1 failure (in the example above ``node2``)
928 to create additional available node memory (check the :doc:`admin`
929 guide for what Ganeti will and won't automatically do in regards to
930 instance runtime memory modification)
- if Ganeti has been built with the htools package enabled, you can run
  the ``hbal`` tool, which will try to compute an automated cluster
  rebalancing solution that complies with the N+1 rule (see the sketch
  below)
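
For example, a minimal ``hbal`` session could look like the following
(a sketch; the exact moves proposed depend on the cluster state)::

  $ hbal -L       # print the current cluster score and the proposed moves
  $ hbal -L -X    # submit and execute the corresponding jobs
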
In case a node has problems with the network (usually the secondary
network, as problems with the primary network will render the node
unusable for Ganeti commands), it will show up in cluster verify as::
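
  $ gnt-cluster verify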
943 Mon Oct 26 19:07:19 2009 * Verifying global settings
944 Mon Oct 26 19:07:19 2009 * Gathering data (3 nodes)
945 Mon Oct 26 19:07:23 2009 * Verifying node status
946 Mon Oct 26 19:07:23 2009 - ERROR: node node1: tcp communication with node 'node3': failure using the secondary interface(s)
947 Mon Oct 26 19:07:23 2009 - ERROR: node node2: tcp communication with node 'node3': failure using the secondary interface(s)
948 Mon Oct 26 19:07:23 2009 - ERROR: node node3: tcp communication with node 'node1': failure using the secondary interface(s)
949 Mon Oct 26 19:07:23 2009 - ERROR: node node3: tcp communication with node 'node2': failure using the secondary interface(s)
950 Mon Oct 26 19:07:23 2009 - ERROR: node node3: tcp communication with node 'node3': failure using the secondary interface(s)
951 Mon Oct 26 19:07:23 2009 * Verifying instance status
952 Mon Oct 26 19:07:23 2009 * Verifying orphan volumes
953 Mon Oct 26 19:07:23 2009 * Verifying remaining instances
954 Mon Oct 26 19:07:23 2009 * Verifying N+1 Memory redundancy
955 Mon Oct 26 19:07:23 2009 * Other Notes
956 Mon Oct 26 19:07:23 2009 * Hooks Results
This shows that both node1 and node2 have problems contacting node3
over the secondary network, and node3 has problems contacting them.
From this output it can be deduced that, since node1 and node2 can
communicate between themselves, node3 is the one having problems, and
you need to investigate its network settings/connection.
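
For example, you can check the secondary addresses directly from node1
(a sketch using the addresses of this walkthrough)::

  $ ping -c 3 192.0.2.3
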
968 Since live migration can sometimes fail and leave the instance in an
969 inconsistent state, Ganeti provides a ``--cleanup`` argument to the
970 migrate command that does:
972 - check on which node the instance is actually running (has the
973 command failed before or after the actual migration?)
974 - reconfigure the DRBD disks accordingly
976 It is always safe to run this command as long as the instance has good
data on its primary node (i.e. not showing as degraded). If so, you
can run::

980 $ gnt-instance migrate --cleanup %instance1%
981 Instance instance1 will be recovered from a failed migration. Note
982 that the migration procedure (including cleanup) is **experimental**
  in this version. This might impact the instance if anything goes
  wrong. Continue?
986 Mon Oct 26 19:13:49 2009 Migrating instance instance1
987 Mon Oct 26 19:13:49 2009 * checking where the instance actually runs (if this hangs, the hypervisor might be in a bad state)
988 Mon Oct 26 19:13:49 2009 * instance confirmed to be running on its primary node (node2)
989 Mon Oct 26 19:13:49 2009 * switching node node1 to secondary mode
990 Mon Oct 26 19:13:50 2009 * wait until resync is done
991 Mon Oct 26 19:13:50 2009 * changing into standalone mode
992 Mon Oct 26 19:13:50 2009 * changing disks into single-master mode
993 Mon Oct 26 19:13:50 2009 * wait until resync is done
994 Mon Oct 26 19:13:51 2009 * done
997 In use disks at instance shutdown
998 +++++++++++++++++++++++++++++++++
If you see something like the following when trying to shut down or
deactivate disks for an instance::

1003 $ gnt-instance shutdown %instance1%
1004 Mon Oct 26 19:16:23 2009 - WARNING: Could not shutdown block device disk/0 on node node2: drbd0: can't shutdown drbd device: /dev/drbd0: State change failed: (-12) Device is held open by someone\n
1006 It most likely means something is holding open the underlying DRBD
1007 device. This can be bad if the instance is not running, as it might mean
1008 that there was concurrent access from both the node and the instance to
1009 the disks, but not always (e.g. you could only have had the partitions
1010 activated via ``kpartx``).
To troubleshoot this issue you need to follow standard Linux practices,
and pay attention to the hypervisor being used (a short sketch of these
checks follows the list):
1015 - check if (in the above example) ``/dev/drbd0`` on node2 is being
1016 mounted somewhere (``cat /proc/mounts``)
1017 - check if the device is not being used by device mapper itself:
1018 ``dmsetup ls`` and look for entries of the form ``drbd0pX``, and if so
1019 remove them with either ``kpartx -d`` or ``dmsetup remove``
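
As a sketch, on node2 these checks could look like this (using the
``/dev/drbd0`` device from the warning above)::

  $ grep drbd0 /proc/mounts      # is the device mounted somewhere?
  $ dmsetup ls | grep drbd0      # any drbd0pX device-mapper entries?
  $ kpartx -d /dev/drbd0         # if so, remove the partition mappings
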
For Xen, check whether the hypervisor itself is using the disks::

1023 $ xenstore-ls /local/domain/%0%/backend/vbd|grep -e "domain =" -e physical-device
1024 domain = "instance2"
1025 physical-device = "93:0"
1026 domain = "instance3"
1027 physical-device = "93:1"
1028 domain = "instance4"
1029 physical-device = "93:2"
You can see in the above output that the node exports three disks to
three instances. The ``physical-device`` key is in major:minor format
in hexadecimal, and ``0x93`` represents DRBD's major number. Thus we
can see from the above that instance2 has /dev/drbd0, instance3
/dev/drbd1, and instance4 /dev/drbd2.
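
To double-check such a mapping, you can convert the hexadecimal pair to
decimal and compare it with the device nodes (a sketch for the first
entry above)::

  $ echo $((0x93)):$((0x0))
  147:0
  $ ls -l /dev/drbd0    # should show major 147, minor 0
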
1038 LUXI version mismatch
1039 +++++++++++++++++++++
1041 LUXI is the protocol used for communication between clients and the
1042 master daemon. Starting in Ganeti 2.3, the peers exchange their version
1043 in each message. When they don't match, an error is raised::
1045 $ gnt-node modify -O yes %node3%
1046 Unhandled Ganeti error: LUXI version mismatch, server 2020000, request 2030000
Usually this means that the server and client are from different Ganeti
versions, or that they import their libraries from different paths
(e.g. an older version installed in another place). You can print the
1051 import path for Ganeti's modules using the following command (note that
1052 depending on your setup you might have to use an explicit version in the
1053 Python command, e.g. ``python2.6``)::
1055 python -c 'import ganeti; print ganeti.__file__'
1057 .. vim: set textwidth=72 :