   1 Ganeti walk-through
   2 ===================
   3
   4 Documents Ganeti version |version|
   5
   6 .. contents::
   7
   8 .. highlight:: text
   9
  10 Introduction
  11 ------------
  12
  13 This document serves as a more example-oriented guide to Ganeti; while
  14 the administration guide shows a conceptual approach, here you will find
  15 a step-by-step example of managing instances and the cluster.
  16
  17 Our simulated, example cluster will have three machines, named
  18 ``node1``, ``node2``, ``node3``. Note that in real life machines will
  19 usually have FQDNs, but here we use short names for brevity. We will use a
  20 secondary network for replication data, ``192.168.2.0/24``, with nodes
  21 having the last octet the same as their index. The cluster name will be
  22 ``example-cluster``. All nodes have the same simulated hardware
  23 configuration: two disks of 750GB, 32GB of memory and 4 CPUs.
  24
  25 On this cluster, we will create up to seven instances, named
  26 ``instance1`` to ``instance7``.
  27
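The naming and addressing scheme described above can be written down as a small sketch. This is only an illustration of the layout used in this walk-through; the helper name ``secondary_ip`` is an assumption made for the example and is not part of Ganeti.

```python
import ipaddress

# Layout taken from the introduction: three nodes on the replication
# network 192.168.2.0/24, with the last octet equal to the node index.
REPLICATION_NET = ipaddress.ip_network("192.168.2.0/24")


def secondary_ip(index):
    """Return the replication-network address of node number `index`."""
    return str(REPLICATION_NET.network_address + index)


# node1..node3 and instance1..instance7, as used throughout the document.
nodes = {f"node{i}": secondary_ip(i) for i in range(1, 4)}
instances = [f"instance{i}" for i in range(1, 8)]

for name, ip in nodes.items():
    print(name, ip)
```

The addresses produced here are the values passed to ``-s`` when the cluster is initialised and when nodes are added.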
  28
  29 Cluster creation
  30 ----------------
  31
  32 Follow the :doc:`install` document and prepare the nodes. Then it's time
  33 to initialise the cluster::
  34
  35   node1# gnt-cluster init -s 192.168.2.1 --enabled-hypervisors=xen-pvm example-cluster
  36   node1#
  37
  38 The creation was fine. Let's check that the one node we have is
  39 functioning correctly::
  40
  41   node1# gnt-node list
  42   Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  43   node1   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  44   node1# gnt-cluster verify
  45   Mon Oct 26 02:08:51 2009 * Verifying global settings
  46   Mon Oct 26 02:08:51 2009 * Gathering data (1 nodes)
  47   Mon Oct 26 02:08:52 2009 * Verifying node status
  48   Mon Oct 26 02:08:52 2009 * Verifying instance status
  49   Mon Oct 26 02:08:52 2009 * Verifying orphan volumes
  50   Mon Oct 26 02:08:52 2009 * Verifying remaining instances
  51   Mon Oct 26 02:08:52 2009 * Verifying N+1 Memory redundancy
  52   Mon Oct 26 02:08:52 2009 * Other Notes
  53   Mon Oct 26 02:08:52 2009 * Hooks Results
  54   node1#
  55
  56 Since this proceeded correctly, let's add the other two nodes::
  57
  58   node1# gnt-node add -s 192.168.2.2 node2
  59   -- WARNING --
  60   Performing this operation is going to replace the ssh daemon keypair
  61   on the target machine (node2) with the ones of the current one
  62   and grant full intra-cluster ssh root access to/from it
  63
  64   The authenticity of host 'node2 (192.168.1.2)' can't be established.
  65   RSA key fingerprint is 9f:…
  66   Are you sure you want to continue connecting (yes/no)? yes
  67   root@node2's password:
  68   Mon Oct 26 02:11:54 2009  - INFO: Node will be a master candidate
  69   node1# gnt-node add -s 192.168.2.3 node3
  70   -- WARNING --
  71   Performing this operation is going to replace the ssh daemon keypair
  72   on the target machine (node3) with the ones of the current one
  73   and grant full intra-cluster ssh root access to/from it
  74
  75   The authenticity of host 'node3 (192.168.1.3)' can't be established.
  76   RSA key fingerprint is 9f:…
  77   Are you sure you want to continue connecting (yes/no)? yes
  78   root@node3's password:
  79   Mon Oct 26 02:11:54 2009  - INFO: Node will be a master candidate
  80
  81 Checking the cluster status again::
  82
  83   node1# gnt-node list
  84   Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  85   node1   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  86   node2   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  87   node3   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  88   node1# gnt-cluster verify
  89   Mon Oct 26 02:15:14 2009 * Verifying global settings
  90   Mon Oct 26 02:15:14 2009 * Gathering data (3 nodes)
  91   Mon Oct 26 02:15:16 2009 * Verifying node status
  92   Mon Oct 26 02:15:16 2009 * Verifying instance status
  93   Mon Oct 26 02:15:16 2009 * Verifying orphan volumes
  94   Mon Oct 26 02:15:16 2009 * Verifying remaining instances
  95   Mon Oct 26 02:15:16 2009 * Verifying N+1 Memory redundancy
  96   Mon Oct 26 02:15:16 2009 * Other Notes
  97   Mon Oct 26 02:15:16 2009 * Hooks Results
  98   node1#
  99
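When checking many nodes, it can help to read the ``gnt-node list`` table programmatically. The following is a minimal sketch, assuming the output is the whitespace-separated table shown above and that no column value contains spaces; ``parse_table`` is a hypothetical helper, not a Ganeti API.

```python
# Sample text copied from the listing above.
SAMPLE = """\
Node  DTotal DFree MTotal MNode MFree Pinst Sinst
node1   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
node2   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
node3   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
"""


def parse_table(text):
    """Turn a whitespace-separated table into a list of dicts keyed by header."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


rows = parse_table(SAMPLE)

# Nodes that hold neither primary nor secondary instances.
idle = [row["Node"] for row in rows if row["Pinst"] == "0" and row["Sinst"] == "0"]
print(idle)
```

All values are kept as strings, so entries such as ``1.3T`` would still need unit handling before any arithmetic.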
 100 And let's check that we have a valid OS::
 101
 102   node1# gnt-os list
 103   Name
 104   debootstrap
 105   node1#
 106
 107 Running a burnin
 108 ----------------
 109
 110 Now that the cluster is created, it is time to check that the hardware
 111 works correctly, that the hypervisor can actually create instances,
 112 etc. This is done via the burnin tool as described in the admin
 113 guide. Similar output lines are replaced with ``…`` in the log below::
 114
 115   node1# /usr/lib/ganeti/tools/burnin -o debootstrap -p instance{1..5}
 116   - Testing global parameters
 117   - Creating instances
 118     * instance instance1
 119       on node1, node2
 120     * instance instance2
 121       on node2, node3
 122     …
 123     * instance instance5
 124       on node2, node3
 125     * Submitted job ID(s) 157, 158, 159, 160, 161
 126       waiting for job 157 for instance1
 127       …
 128       waiting for job 161 for instance5
 129   - Replacing disks on the same nodes
 130     * instance instance1
 131       run replace_on_secondary
 132       run replace_on_primary
 133     …
 134     * instance instance5
 135       run replace_on_secondary
 136       run replace_on_primary
 137     * Submitted job ID(s) 162, 163, 164, 165, 166
 138       waiting for job 162 for instance1
 139       …
 140   - Changing the secondary node
 141     * instance instance1
 142       run replace_new_secondary node3
 143     * instance instance2
 144       run replace_new_secondary node1
 145     …
 146     * instance instance5
 147       run replace_new_secondary node1
 148     * Submitted job ID(s) 167, 168, 169, 170, 171
 149       waiting for job 167 for instance1
 150       …
 151   - Growing disks
 152     * instance instance1
 153       increase disk/0 by 128 MB
 154     …
 155     * instance instance5
 156       increase disk/0 by 128 MB
 157     * Submitted job ID(s) 173, 174, 175, 176, 177
 158       waiting for job 173 for instance1
 159       …
 160   - Failing over instances
 161     * instance instance1
 162     …
 163     * instance instance5
 164     * Submitted job ID(s) 179, 180, 181, 182, 183
 165       waiting for job 179 for instance1
 166       …
 167   - Migrating instances
 168     * instance instance1
 169       migration and migration cleanup
 170     …
 171     * instance instance5
 172       migration and migration cleanup
 173     * Submitted job ID(s) 184, 185, 186, 187, 188
 174       waiting for job 184 for instance1
 175       …
 176   - Exporting and re-importing instances
 177     * instance instance1
 178       export to node node3
 179       remove instance
 180       import from node3 to node1, node2
 181       remove export
 182     …
 183     * instance instance5
 184       export to node node1
 185       remove instance
 186       import from node1 to node2, node3
 187       remove export
 188     * Submitted job ID(s) 196, 197, 198, 199, 200
 189       waiting for job 196 for instance1
 190       …
 191   - Reinstalling instances
 192     * instance instance1
 193       reinstall without passing the OS
 194       reinstall specifying the OS
 195     …
 196     * instance instance5
 197       reinstall without passing the OS
 198       reinstall specifying the OS
 199     * Submitted job ID(s) 203, 204, 205, 206, 207
 200       waiting for job 203 for instance1
 201       …
 202   - Rebooting instances
 203     * instance instance1
 204       reboot with type 'hard'
 205       reboot with type 'soft'
 206       reboot with type 'full'
 207     …
 208     * instance instance5
 209       reboot with type 'hard'
 210       reboot with type 'soft'
 211       reboot with type 'full'
 212     * Submitted job ID(s) 208, 209, 210, 211, 212
 213       waiting for job 208 for instance1
 214     …
 215   - Adding and removing disks
 216     * instance instance1
 217       adding a disk
 218       removing last disk
 219     …
 220     * instance instance5
 221       adding a disk
 222       removing last disk
 223     * Submitted job ID(s) 213, 214, 215, 216, 217
 224       waiting for job 213 for instance1
 225       …
 226   - Adding and removing NICs
 227     * instance instance1
 228       adding a NIC
 229       removing last NIC
 230     …
 231     * instance instance5
 232       adding a NIC
 233       removing last NIC
 234     * Submitted job ID(s) 218, 219, 220, 221, 222
 235       waiting for job 218 for instance1
 236       …
 237   - Activating/deactivating disks
 238     * instance instance1
 239       activate disks when online
 240       activate disks when offline
 241       deactivate disks (when offline)
 242     …
 243     * instance instance5
 244       activate disks when online
 245       activate disks when offline
 246       deactivate disks (when offline)
 247     * Submitted job ID(s) 223, 224, 225, 226, 227
 248       waiting for job 223 for instance1
 249       …
 250   - Stopping and starting instances
 251     * instance instance1
 252     …
 253     * instance instance5
 254     * Submitted job ID(s) 230, 231, 232, 233, 234
 255       waiting for job 230 for instance1
 256       …
 257   - Removing instances
 258     * instance instance1
 259     …
 260     * instance instance5
 261     * Submitted job ID(s) 235, 236, 237, 238, 239
 262       waiting for job 235 for instance1
 263       …
 264   node1#
 265
 266 You can see in the above what operations the burnin does. Ideally, the
 267 burnin log would proceed successfully through all the steps and end
 268 cleanly, without throwing errors.
 269
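A saved burnin log can be skimmed for the phases it went through and the jobs each phase submitted. The sketch below assumes the log format shown above (phase lines starting with ``- `` and ``Submitted job ID(s)`` lines); ``summarize`` is a hypothetical helper written for this example.

```python
import re

# Matches the job list on lines such as "* Submitted job ID(s) 157, 158".
JOBS_RE = re.compile(r"Submitted job ID\(s\) (.+)")

# Short excerpt in the format of the log above.
SAMPLE = """\
- Testing global parameters
- Creating instances
  * instance instance1
    on node1, node2
  * Submitted job ID(s) 157, 158, 159, 160, 161
    waiting for job 157 for instance1
- Growing disks
  * instance instance1
    increase disk/0 by 128 MB
  * Submitted job ID(s) 173, 174, 175, 176, 177
"""


def summarize(log):
    """Return a list of (phase name, [job ids]) in the order they appear."""
    phases = []
    for raw in log.splitlines():
        line = raw.strip()
        if line.startswith("- "):
            # A new burnin phase begins.
            phases.append((line[2:], []))
            continue
        match = JOBS_RE.search(line)
        if match and phases:
            phases[-1][1].extend(int(j) for j in match.group(1).split(","))
    return phases


for phase, jobs in summarize(SAMPLE):
    print(phase, jobs)
```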
 270 Instance operations
 271 -------------------
 272
 273 Creation
 274 ++++++++
 275
 276 At this point, Ganeti and the hardware seem to be functioning
 277 correctly, so we'll follow up with creating the instances manually::
 278
 279   node1# gnt-instance add -t drbd -o debootstrap -s 256m -I hail instance1
 280   Mon Oct 26 04:06:52 2009  - INFO: Selected nodes for instance instance1 via iallocator hail: node2, node3
 281   Mon Oct 26 04:06:53 2009 * creating instance disks...
 282   Mon Oct 26 04:06:57 2009 adding instance instance1 to cluster config
 283   Mon Oct 26 04:06:57 2009  - INFO: Waiting for instance instance1 to sync disks.
 284   Mon Oct 26 04:06:57 2009  - INFO: - device disk/0: 20.00% done, 4 estimated seconds remaining
 285   Mon Oct 26 04:07:01 2009  - INFO: Instance instance1's disks are in sync.
 286   Mon Oct 26 04:07:01 2009 creating os for instance instance1 on node node2
 287   Mon Oct 26 04:07:01 2009 * running the instance OS create scripts...
 288   Mon Oct 26 04:07:14 2009 * starting instance...
 289   node1# gnt-instance add -t drbd -o debootstrap -s 256m -n node1:node2 instance2
 290   Mon Oct 26 04:11:37 2009 * creating instance disks...
 291   Mon Oct 26 04:11:40 2009 adding instance instance2 to cluster config
 292   Mon Oct 26 04:11:41 2009  - INFO: Waiting for instance instance2 to sync disks.
 293   Mon Oct 26 04:11:41 2009  - INFO: - device disk/0: 35.40% done, 1 estimated seconds remaining
 294   Mon Oct 26 04:11:42 2009  - INFO: - device disk/0: 58.50% done, 1 estimated seconds remaining
 295   Mon Oct 26 04:11:43 2009  - INFO: - device disk/0: 86.20% done, 0 estimated seconds remaining
 296   Mon Oct 26 04:11:44 2009  - INFO: - device disk/0: 92.40% done, 0 estimated seconds remaining
 297   Mon Oct 26 04:11:44 2009  - INFO: - device disk/0: 97.00% done, 0 estimated seconds remaining
 298   Mon Oct 26 04:11:44 2009  - INFO: Instance instance2's disks are in sync.
 299   Mon Oct 26 04:11:44 2009 creating os for instance instance2 on node node1
 300   Mon Oct 26 04:11:44 2009 * running the instance OS create scripts...
 301   Mon Oct 26 04:11:57 2009 * starting instance...
 302   node1#
 303
 304 The above shows one instance created via an iallocator script, and one
 305 being created with manual node assignment. The other three instances
 306 were also created and now it's time to check them::
 307
 308   node1# gnt-instance list
 309   Instance  Hypervisor OS          Primary_node Status  Memory
 310   instance1 xen-pvm    debootstrap node2        running   128M
 311   instance2 xen-pvm    debootstrap node1        running   128M
 312   instance3 xen-pvm    debootstrap node1        running   128M
 313   instance4 xen-pvm    debootstrap node3        running   128M
 314   instance5 xen-pvm    debootstrap node2        running   128M
 315
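The two ways of placing an instance (letting an iallocator pick the nodes, or naming them by hand) differ only in one option. The sketch below builds both command lines without running them. It assumes that ``-I`` selects the iallocator for ``gnt-instance add`` in the same way it does for ``gnt-node evacuate`` later in this walk-through; ``instance_add_cmd`` is a hypothetical helper.

```python
import shlex


def instance_add_cmd(name, nodes=None, iallocator=None,
                     size="256m", os_name="debootstrap"):
    """Build a gnt-instance add command for a DRBD instance.

    Exactly one of `nodes` (primary, secondary) or `iallocator` must be set.
    """
    if (nodes is None) == (iallocator is None):
        raise ValueError("pass exactly one of nodes or iallocator")
    cmd = ["gnt-instance", "add", "-t", "drbd", "-o", os_name, "-s", size]
    if nodes is not None:
        # Manual placement: primary and secondary joined by a colon.
        cmd += ["-n", ":".join(nodes)]
    else:
        # Automatic placement (assumed -I flag, see the note above).
        cmd += ["-I", iallocator]
    cmd.append(name)
    return cmd


auto = instance_add_cmd("instance1", iallocator="hail")
manual = instance_add_cmd("instance2", nodes=["node1", "node2"])
print(shlex.join(auto))
print(shlex.join(manual))
```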
 316 Accessing instances
 317 +++++++++++++++++++
 318
 319 Accessing an instance's console is easy::
 320
 321   node1# gnt-instance console instance2
 322   [    0.000000] Bootdata ok (command line is root=/dev/sda1 ro)
 323   [    0.000000] Linux version 2.6…
 324   [    0.000000] BIOS-provided physical RAM map:
 325   [    0.000000]  Xen: 0000000000000000 - 0000000008800000 (usable)
 326   [13138176.018071] Built 1 zonelists.  Total pages: 34816
 327   [13138176.018074] Kernel command line: root=/dev/sda1 ro
 328   [13138176.018694] Initializing CPU#0
 329   …
 330   Checking file systems...fsck 1.41.3 (12-Oct-2008)
 331   done.
 332   Setting kernel variables (/etc/sysctl.conf)...done.
 333   Mounting local filesystems...done.
 334   Activating swapfile swap...done.
 335   Setting up networking....
 336   Configuring network interfaces...done.
 337   Setting console screen modes and fonts.
 338   INIT: Entering runlevel: 2
 339   Starting enhanced syslogd: rsyslogd.
 340   Starting periodic command scheduler: crond.
 341
 342   Debian GNU/Linux 5.0 instance2 tty1
 343
 344   instance2 login:
 345
 346 At this point you can log in to the instance. After configuring the
 347 network (and doing this on all instances), we can check their
 348 connectivity::
 349
 350   node1# fping instance{1..5}
 351   instance1 is alive
 352   instance2 is alive
 353   instance3 is alive
 354   instance4 is alive
 355   instance5 is alive
 356   node1#
 357
 358 Removal
 359 +++++++
 360
 361 Removing unwanted instances is also easy::
 362
 363   node1# gnt-instance remove instance5
 364   This will remove the volumes of the instance instance5 (including
 365   mirrors), thus removing all the data of the instance. Continue?
 366   y/[n]/?: y
 367   node1#
 368
 369
 370 Recovering from hardware failures
 371 ---------------------------------
 372
 373 Recovering from node failure
 374 ++++++++++++++++++++++++++++
 375
 376 We are now left with four instances. Assume that at this point, node3,
 377 which has one primary and one secondary instance, crashes::
 378
 379   node1# gnt-node info node3
 380   Node name: node3
 381     primary ip: 172.24.227.1
 382     secondary ip: 192.168.2.3
 383     master candidate: True
 384     drained: False
 385     offline: False
 386     primary for instances:
 387       - instance4
 388     secondary for instances:
 389       - instance1
 390   node1# fping node3
 391   node3 is unreachable
 392
 393 At this point, the primary instance of that node (instance4) is down,
 394 but the secondary instance (instance1) is not affected except it has
 395 lost disk redundancy::
 396
 397   node1# fping instance{1,4}
 398   instance1 is alive
 399   instance4 is unreachable
 400   node1#
 401
 402 If we try to check the status of instance4 via the instance info
 403 command, it fails because it tries to contact node3 which is down::
 404
 405   node1# gnt-instance info instance4
 406   Failure: command execution error:
 407   Error checking node node3: Connection failed (113: No route to host)
 408   node1#
 409
 410 So we need to mark node3 as being *offline*, so that Ganeti won't talk
 411 to it anymore::
 412
 413   node1# gnt-node modify -O yes -f node3
 414   Mon Oct 26 04:34:12 2009  - WARNING: Not enough master candidates (desired 10, new value will be 2)
 415   Mon Oct 26 04:34:15 2009  - WARNING: Communication failure to node node3: Connection failed (113: No route to host)
 416   Modified node node3
 417    - offline -> True
 418    - master_candidate -> auto-demotion due to offline
 419   node1#
 420
 421 And now we can failover the instance::
 422
 423   node1# gnt-instance failover instance4
 424   Failover will happen to image instance4. This requires a shutdown of
 425   the instance. Continue?
 426   y/[n]/?: y
 427   Mon Oct 26 04:35:34 2009 * checking disk consistency between source and target
 428   Failure: command execution error:
 429   Disk disk/0 is degraded on target node, aborting failover.
 430   node1# gnt-instance failover --ignore-consistency instance4
 431   Failover will happen to image instance4. This requires a shutdown of
 432   the instance. Continue?
 433   y/[n]/?: y
 434   Mon Oct 26 04:35:47 2009 * checking disk consistency between source and target
 435   Mon Oct 26 04:35:47 2009 * shutting down instance on source node
 436   Mon Oct 26 04:35:47 2009  - WARNING: Could not shutdown instance instance4 on node node3. Proceeding anyway. Please make sure node node3 is down. Error details: Node is marked offline
 437   Mon Oct 26 04:35:47 2009 * deactivating the instance's disks on source node
 438   Mon Oct 26 04:35:47 2009  - WARNING: Could not shutdown block device disk/0 on node node3: Node is marked offline
 439   Mon Oct 26 04:35:47 2009 * activating the instance's disks on target node
 440   Mon Oct 26 04:35:47 2009  - WARNING: Could not prepare block device disk/0 on node node3 (is_primary=False, pass=1): Node is marked offline
 441   Mon Oct 26 04:35:48 2009 * starting the instance on the target node
 442   node1#
 443
 444 Note that in our first attempt, Ganeti refused to do the failover since
 445 it wasn't sure of the status of the instance's disks. We then passed the
 446 ``--ignore-consistency`` flag and the failover succeeded::
 447
 448   node1# gnt-instance list
 449   Instance  Hypervisor OS          Primary_node Status  Memory
 450   instance1 xen-pvm    debootstrap node2        running   128M
 451   instance2 xen-pvm    debootstrap node1        running   128M
 452   instance3 xen-pvm    debootstrap node1        running   128M
 453   instance4 xen-pvm    debootstrap node1        running   128M
 454   node1#
 455
 456 But at this point, both instance1 and instance4 are without disk
 457 redundancy::
 458
 459   node1# gnt-instance info instance1
 460   Instance name: instance1
 461   UUID: 45173e82-d1fa-417c-8758-7d582ab7eef4
 462   Serial number: 2
 463   Creation time: 2009-10-26 04:06:57
 464   Modification time: 2009-10-26 04:07:14
 465   State: configured to be up, actual state is up
 466     Nodes:
 467       - primary: node2
 468       - secondaries: node3
 469     Operating system: debootstrap
 470     Allocated network port: None
 471     Hypervisor: xen-pvm
 472       - root_path: default (/dev/sda1)
 473       - kernel_args: default (ro)
 474       - use_bootloader: default (False)
 475       - bootloader_args: default ()
 476       - bootloader_path: default ()
 477       - kernel_path: default (/boot/vmlinuz-2.6-xenU)
 478       - initrd_path: default ()
 479     Hardware:
 480       - VCPUs: 1
 481       - memory: 128MiB
 482       - NICs:
 483         - nic/0: MAC: aa:00:00:78:da:63, IP: None, mode: bridged, link: xen-br0
 484     Disks:
 485       - disk/0: drbd8, size 256M
 486         access mode: rw
 487         nodeA:       node2, minor=0
 488         nodeB:       node3, minor=0
 489         port:        11035
 490         auth key:    8e950e3cec6854b0181fbc3a6058657701f2d458
 491         on primary:  /dev/drbd0 (147:0) in sync, status *DEGRADED*
 492         child devices:
 493           - child 0: lvm, size 256M
 494             logical_id: xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_data
 495             on primary: /dev/xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_data (254:0)
 496           - child 1: lvm, size 128M
 497             logical_id: xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta
 498             on primary: /dev/xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta (254:1)
 499
 500 The output is similar for instance4. To recover from this, we need
 501 to run the node evacuate command, which will change from the current
 502 secondary node to a new one (in this case, we only have two working
 503 nodes, so all instances will end up on nodes one and two)::
 504
 505   node1# gnt-node evacuate -I hail node3
 506   Relocate instance(s) 'instance1','instance4' from node
 507    node3 using iallocator hail?
 508   y/[n]/?: y
 509   Mon Oct 26 05:05:39 2009  - INFO: Selected new secondary for instance 'instance1': node1
 510   Mon Oct 26 05:05:40 2009  - INFO: Selected new secondary for instance 'instance4': node2
 511   Mon Oct 26 05:05:40 2009 Replacing disk(s) 0 for instance1
 512   Mon Oct 26 05:05:40 2009 STEP 1/6 Check device existence
 513   Mon Oct 26 05:05:40 2009  - INFO: Checking disk/0 on node2
 514   Mon Oct 26 05:05:40 2009  - INFO: Checking volume groups
 515   Mon Oct 26 05:05:40 2009 STEP 2/6 Check peer consistency
 516   Mon Oct 26 05:05:40 2009  - INFO: Checking disk/0 consistency on node node2
 517   Mon Oct 26 05:05:40 2009 STEP 3/6 Allocate new storage
 518   Mon Oct 26 05:05:40 2009  - INFO: Adding new local storage on node1 for disk/0
 519   Mon Oct 26 05:05:41 2009 STEP 4/6 Changing drbd configuration
 520   Mon Oct 26 05:05:41 2009  - INFO: activating a new drbd on node1 for disk/0
 521   Mon Oct 26 05:05:42 2009  - INFO: Shutting down drbd for disk/0 on old node
 522   Mon Oct 26 05:05:42 2009  - WARNING: Failed to shutdown drbd for disk/0 on oldnode: Node is marked offline
 523   Mon Oct 26 05:05:42 2009       Hint: Please cleanup this device manually as soon as possible
 524   Mon Oct 26 05:05:42 2009  - INFO: Detaching primary drbds from the network (=> standalone)
 525   Mon Oct 26 05:05:42 2009  - INFO: Updating instance configuration
 526   Mon Oct 26 05:05:45 2009  - INFO: Attaching primary drbds to new secondary (standalone => connected)
 527   Mon Oct 26 05:05:46 2009 STEP 5/6 Sync devices
 528   Mon Oct 26 05:05:46 2009  - INFO: Waiting for instance instance1 to sync disks.
 529   Mon Oct 26 05:05:46 2009  - INFO: - device disk/0: 13.90% done, 7 estimated seconds remaining
 530   Mon Oct 26 05:05:53 2009  - INFO: Instance instance1's disks are in sync.
 531   Mon Oct 26 05:05:53 2009 STEP 6/6 Removing old storage
 532   Mon Oct 26 05:05:53 2009  - INFO: Remove logical volumes for 0
 533   Mon Oct 26 05:05:53 2009  - WARNING: Can't remove old LV: Node is marked offline
 534   Mon Oct 26 05:05:53 2009       Hint: remove unused LVs manually
 535   Mon Oct 26 05:05:53 2009  - WARNING: Can't remove old LV: Node is marked offline
 536   Mon Oct 26 05:05:53 2009       Hint: remove unused LVs manually
 537   Mon Oct 26 05:05:53 2009 Replacing disk(s) 0 for instance4
 538   Mon Oct 26 05:05:53 2009 STEP 1/6 Check device existence
 539   Mon Oct 26 05:05:53 2009  - INFO: Checking disk/0 on node1
 540   Mon Oct 26 05:05:53 2009  - INFO: Checking volume groups
 541   Mon Oct 26 05:05:53 2009 STEP 2/6 Check peer consistency
 542   Mon Oct 26 05:05:53 2009  - INFO: Checking disk/0 consistency on node node1
 543   Mon Oct 26 05:05:54 2009 STEP 3/6 Allocate new storage
 544   Mon Oct 26 05:05:54 2009  - INFO: Adding new local storage on node2 for disk/0
 545   Mon Oct 26 05:05:54 2009 STEP 4/6 Changing drbd configuration
 546   Mon Oct 26 05:05:54 2009  - INFO: activating a new drbd on node2 for disk/0
 547   Mon Oct 26 05:05:55 2009  - INFO: Shutting down drbd for disk/0 on old node
 548   Mon Oct 26 05:05:55 2009  - WARNING: Failed to shutdown drbd for disk/0 on oldnode: Node is marked offline
 549   Mon Oct 26 05:05:55 2009       Hint: Please cleanup this device manually as soon as possible
 550   Mon Oct 26 05:05:55 2009  - INFO: Detaching primary drbds from the network (=> standalone)
 551   Mon Oct 26 05:05:55 2009  - INFO: Updating instance configuration
 552   Mon Oct 26 05:05:55 2009  - INFO: Attaching primary drbds to new secondary (standalone => connected)
 553   Mon Oct 26 05:05:56 2009 STEP 5/6 Sync devices
 554   Mon Oct 26 05:05:56 2009  - INFO: Waiting for instance instance4 to sync disks.
 555   Mon Oct 26 05:05:56 2009  - INFO: - device disk/0: 12.40% done, 8 estimated seconds remaining
 556   Mon Oct 26 05:06:04 2009  - INFO: Instance instance4's disks are in sync.
 557   Mon Oct 26 05:06:04 2009 STEP 6/6 Removing old storage
 558   Mon Oct 26 05:06:04 2009  - INFO: Remove logical volumes for 0
 559   Mon Oct 26 05:06:04 2009  - WARNING: Can't remove old LV: Node is marked offline
 560   Mon Oct 26 05:06:04 2009       Hint: remove unused LVs manually
 561   Mon Oct 26 05:06:04 2009  - WARNING: Can't remove old LV: Node is marked offline
 562   Mon Oct 26 05:06:04 2009       Hint: remove unused LVs manually
 563   node1#
 564
 565 And now node3 is completely free of instances and can be repaired::
 566
 567   node1# gnt-node list
 568   Node  DTotal DFree MTotal MNode MFree Pinst Sinst
 569   node1   1.3T  1.3T  32.0G  1.0G 30.2G     3     1
 570   node2   1.3T  1.3T  32.0G  1.0G 30.4G     1     3
 571   node3      ?     ?      ?     ?     ?     0     0
 572
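The recovery steps used above follow a fixed order: mark the node offline, fail over its primary instances, then evacuate it. The sketch below only assembles that command sequence from the commands shown in this section. It does not run anything, and the interactive confirmations seen in the logs would still appear. ``recovery_plan`` is a hypothetical helper.

```python
def recovery_plan(failed_node, primaries, iallocator="hail"):
    """Return the ordered list of commands used to recover from a dead node.

    `primaries` are the instances that had `failed_node` as primary node.
    """
    # 1. Stop Ganeti from talking to the dead node.
    plan = [["gnt-node", "modify", "-O", "yes", "-f", failed_node]]
    # 2. Restart each primary instance on its secondary node.
    for instance in primaries:
        plan.append(["gnt-instance", "failover", "--ignore-consistency", instance])
    # 3. Move all secondaries away to restore disk redundancy.
    plan.append(["gnt-node", "evacuate", "-I", iallocator, failed_node])
    return plan


for cmd in recovery_plan("node3", ["instance4"]):
    print(" ".join(cmd))
```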
 573 Re-adding a node to the cluster
 574 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 575
 576
 577 Let's say node3 has been repaired and is now ready to be
 578 reused. Re-adding it is simple::
 579
 580   node1# gnt-node add --readd node3
 581   The authenticity of host 'node3 (172.24.227.1)' can't be established.
 582   RSA key fingerprint is 9f:2e:5a:2e:e0:bd:00:09:e4:5c:32:f2:27:57:7a:f4.
 583   Are you sure you want to continue connecting (yes/no)? yes
 584   Mon Oct 26 05:27:39 2009  - INFO: Readding a node, the offline/drained flags were reset
 585   Mon Oct 26 05:27:39 2009  - INFO: Node will be a master candidate
 586
 587 And it is now working again::
 588
 589   node1# gnt-node list
 590   Node  DTotal DFree MTotal MNode MFree Pinst Sinst
 591   node1   1.3T  1.3T  32.0G  1.0G 30.2G     3     1
 592   node2   1.3T  1.3T  32.0G  1.0G 30.4G     1     3
 593   node3   1.3T  1.3T  32.0G  1.0G 30.4G     0     0
 594
 595 .. note:: If you have the ganeti-htools package installed, you can
 596    shuffle the instances around to have a better use of the nodes.
 597
 598 Disk failures
 599 +++++++++++++
 600
 601 A disk failure is simpler than a full node failure. First, a single disk
 602 failure should not cause data-loss for any redundant instance; only the
 603 performance of some instances might be reduced due to more network
 604 traffic.
 605
 606 Let's take the cluster status in the above listing, and check what
 607 volumes are in use::
 608
 609   node1# gnt-node volumes -o phys,instance node2
 610   PhysDev   Instance
 611   /dev/sdb1 instance4
 612   /dev/sdb1 instance4
 613   /dev/sdb1 instance1
 614   /dev/sdb1 instance1
 615   /dev/sdb1 instance3
 616   /dev/sdb1 instance3
 617   /dev/sdb1 instance2
 618   /dev/sdb1 instance2
 619   node1#
 620
 621 You can see that all instances on node2 have logical volumes on
 622 ``/dev/sdb1``. Let's simulate a disk failure on that disk::
 623
 624   node1# ssh node2
 625   node2# echo offline > /sys/block/sdb/device/state
 626   node2# vgs
 627     /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
 628     /dev/sdb1: read failed after 0 of 4096 at 750153695232: Input/output error
 629     /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
 630     Couldn't find device with uuid '954bJA-mNL0-7ydj-sdpW-nc2C-ZrCi-zFp91c'.
 631     Couldn't find all physical volumes for volume group xenvg.
 632     /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
 633     /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
 634     Couldn't find device with uuid '954bJA-mNL0-7ydj-sdpW-nc2C-ZrCi-zFp91c'.
 635     Couldn't find all physical volumes for volume group xenvg.
 636     Volume group xenvg not found
 637   node2#
 638
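The ``gnt-node volumes -o phys,instance`` listing shown earlier tells us which instances are affected when a physical device fails. A minimal sketch, assuming the two-column output format from that listing; ``instances_by_device`` is a hypothetical helper.

```python
from collections import defaultdict

# Sample text copied from the volume listing for node2.
SAMPLE = """\
PhysDev   Instance
/dev/sdb1 instance4
/dev/sdb1 instance4
/dev/sdb1 instance1
/dev/sdb1 instance1
/dev/sdb1 instance3
/dev/sdb1 instance3
/dev/sdb1 instance2
/dev/sdb1 instance2
"""


def instances_by_device(listing):
    """Map each physical device to the sorted instances with volumes on it."""
    result = defaultdict(set)
    # Skip the header line, then read "device instance" pairs.
    for line in listing.splitlines()[1:]:
        parts = line.split()
        if len(parts) == 2:
            device, instance = parts
            result[device].add(instance)
    return {device: sorted(names) for device, names in result.items()}


print(instances_by_device(SAMPLE))
```

Each instance appears twice in the listing because a DRBD disk is backed by two logical volumes (data and metadata); the set removes the duplicates.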
 639 At this point, the node is broken, and if we examine
 640 instance2 we get (simplified output shown)::
 641
 642   node1# gnt-instance info instance2
 643   Instance name: instance2
 644   State: configured to be up, actual state is up
 645     Nodes:
 646       - primary: node1
 647       - secondaries: node2
 648     Disks:
 649       - disk/0: drbd8, size 256M
 650         on primary:   /dev/drbd0 (147:0) in sync, status ok
 651         on secondary: /dev/drbd1 (147:1) in sync, status *DEGRADED* *MISSING DISK*
 652
 653 This instance only has its secondary on node2. Let's check an instance
 654 that has node2 as its primary::
 655
 656   node1# gnt-instance info instance1
 657   Instance name: instance1
 658   State: configured to be up, actual state is up
 659     Nodes:
 660       - primary: node2
 661       - secondaries: node1
 662     Disks:
 663       - disk/0: drbd8, size 256M
 664         on primary:   /dev/drbd0 (147:0) in sync, status *DEGRADED* *MISSING DISK*
 665         on secondary: /dev/drbd3 (147:3) in sync, status ok
 666   node1# gnt-instance console instance1
 667
 668   Debian GNU/Linux 5.0 instance1 tty1
 669
 670   instance1 login: root
 671   Last login: Tue Oct 27 01:24:09 UTC 2009 on tty1
 672   instance1:~# date > test
 673   instance1:~# sync
 674   instance1:~# cat test
 675   Tue Oct 27 01:25:20 UTC 2009
 676   instance1:~# dmesg|tail
 677   [5439785.235448] NET: Registered protocol family 15
 678   [5439785.235489] 802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
 679   [5439785.235495] All bugs added by David S. Miller <davem@redhat.com>
 680   [5439785.235517] XENBUS: Device with no driver: device/console/0
 681   [5439785.236576] kjournald starting.  Commit interval 5 seconds
 682   [5439785.236588] EXT3-fs: mounted filesystem with ordered data mode.
 683   [5439785.236625] VFS: Mounted root (ext3 filesystem) readonly.
 684   [5439785.236663] Freeing unused kernel memory: 172k freed
 685   [5439787.533779] EXT3 FS on sda1, internal journal
 686   [5440655.065431] eth0: no IPv6 routers present
 687   instance1:~#
 688
 689 As you can see, the instance is running fine and doesn't see any disk
 690 issues. It is now time to fix node2 and re-establish redundancy for the
 691 involved instances.
 692
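To find out which side of an instance's disks needs attention, the ``gnt-instance info`` output can be scanned for the ``*DEGRADED*`` marker. This sketch assumes the simplified output format shown above; ``degraded_sides`` is a hypothetical helper.

```python
# Simplified output for instance2, copied from the listing above.
SAMPLE = """\
Instance name: instance2
State: configured to be up, actual state is up
  Nodes:
    - primary: node1
    - secondaries: node2
  Disks:
    - disk/0: drbd8, size 256M
      on primary:   /dev/drbd0 (147:0) in sync, status ok
      on secondary: /dev/drbd1 (147:1) in sync, status *DEGRADED* *MISSING DISK*
"""


def degraded_sides(info_text):
    """Return 'primary'/'secondary' for every disk line marked *DEGRADED*."""
    sides = []
    for raw in info_text.splitlines():
        line = raw.strip()
        if line.startswith("on ") and "*DEGRADED*" in line:
            # "on secondary: ..." -> "secondary"
            sides.append(line.split(":", 1)[0][len("on "):])
    return sides


print(degraded_sides(SAMPLE))
```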
 693 .. note:: For Ganeti 2.0 we need to manually fix the volume group on
 694    node2 by running ``vgreduce --removemissing xenvg``
 695
::

  node1# gnt-node repair-storage node2 lvm-vg xenvg
  Mon Oct 26 18:14:03 2009 Repairing storage unit 'xenvg' on node2 ...
  node1# ssh node2 vgs
    VG    #PV #LV #SN Attr   VSize   VFree
    xenvg   1   8   0 wz--n- 673.84G 673.84G
  node1#

This has removed the 'bad' disk from the volume group, which is now left
with only one PV. We can now replace the disks for the involved
instances::

  node1# for i in instance{1..4}; do gnt-instance replace-disks -a $i; done
  Mon Oct 26 18:15:38 2009 Replacing disk(s) 0 for instance1
  Mon Oct 26 18:15:38 2009 STEP 1/6 Check device existence
  Mon Oct 26 18:15:38 2009  - INFO: Checking disk/0 on node1
  Mon Oct 26 18:15:38 2009  - INFO: Checking disk/0 on node2
  Mon Oct 26 18:15:38 2009  - INFO: Checking volume groups
  Mon Oct 26 18:15:38 2009 STEP 2/6 Check peer consistency
  Mon Oct 26 18:15:38 2009  - INFO: Checking disk/0 consistency on node node1
  Mon Oct 26 18:15:39 2009 STEP 3/6 Allocate new storage
  Mon Oct 26 18:15:39 2009  - INFO: Adding storage on node2 for disk/0
  Mon Oct 26 18:15:39 2009 STEP 4/6 Changing drbd configuration
  Mon Oct 26 18:15:39 2009  - INFO: Detaching disk/0 drbd from local storage
  Mon Oct 26 18:15:40 2009  - INFO: Renaming the old LVs on the target node
  Mon Oct 26 18:15:40 2009  - INFO: Renaming the new LVs on the target node
  Mon Oct 26 18:15:40 2009  - INFO: Adding new mirror component on node2
  Mon Oct 26 18:15:41 2009 STEP 5/6 Sync devices
  Mon Oct 26 18:15:41 2009  - INFO: Waiting for instance instance1 to sync disks.
  Mon Oct 26 18:15:41 2009  - INFO: - device disk/0: 12.40% done, 9 estimated seconds remaining
  Mon Oct 26 18:15:50 2009  - INFO: Instance instance1's disks are in sync.
  Mon Oct 26 18:15:50 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:15:50 2009  - INFO: Remove logical volumes for disk/0
  Mon Oct 26 18:15:52 2009 Replacing disk(s) 0 for instance2
  Mon Oct 26 18:15:52 2009 STEP 1/6 Check device existence
  …
  Mon Oct 26 18:16:01 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:16:01 2009  - INFO: Remove logical volumes for disk/0
  Mon Oct 26 18:16:02 2009 Replacing disk(s) 0 for instance3
  Mon Oct 26 18:16:02 2009 STEP 1/6 Check device existence
  …
  Mon Oct 26 18:16:09 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:16:09 2009  - INFO: Remove logical volumes for disk/0
  Mon Oct 26 18:16:10 2009 Replacing disk(s) 0 for instance4
  Mon Oct 26 18:16:10 2009 STEP 1/6 Check device existence
  …
  Mon Oct 26 18:16:18 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:16:18 2009  - INFO: Remove logical volumes for disk/0
  node1#

At this point, all instances should be healthy again.

.. note:: Ganeti 2.0 doesn't have the ``-a`` option to replace-disks, so
   for it you have to run the loop twice, once over primary instances
   with argument ``-p`` and once over secondary instances with argument
   ``-s``, but otherwise the operations are similar::

     node1# gnt-instance replace-disks -p instance1
     …
     node1# for i in instance{2..4}; do gnt-instance replace-disks -s $i; done

Common cluster problems
-----------------------

There are a number of small issues that might appear on a cluster that
can be solved easily as long as the issue is properly identified. For
this exercise we will consider the case of node3, which was broken
previously and re-added to the cluster without reinstallation. Running
cluster verify on the cluster reports::

  node1# gnt-cluster verify
  Mon Oct 26 18:30:08 2009 * Verifying global settings
  Mon Oct 26 18:30:08 2009 * Gathering data (3 nodes)
  Mon Oct 26 18:30:10 2009 * Verifying node status
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: unallocated drbd minor 0 is in use
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: unallocated drbd minor 1 is in use
  Mon Oct 26 18:30:10 2009 * Verifying instance status
  Mon Oct 26 18:30:10 2009   - ERROR: instance instance4: instance should not run on node node3
  Mon Oct 26 18:30:10 2009 * Verifying orphan volumes
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: volume 22459cf8-117d-4bea-a1aa-791667d07800.disk0_data is unknown
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: volume 1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data is unknown
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: volume 1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta is unknown
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: volume 22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta is unknown
  Mon Oct 26 18:30:10 2009 * Verifying remaining instances
  Mon Oct 26 18:30:10 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 18:30:10 2009 * Other Notes
  Mon Oct 26 18:30:10 2009 * Hooks Results
  node1#

Instance status
+++++++++++++++

As you can see, *instance4* has a copy running on node3, because we
forced the failover when node3 failed. This case is dangerous as both
copies of the instance have the same IP and MAC address, wreaking havoc
on the network environment and on anyone who tries to use it.

Ganeti doesn't directly handle this case. It is recommended to log on to
node3 and run::

  node3# xm destroy instance4

Unallocated DRBD minors
+++++++++++++++++++++++

There are still unallocated DRBD minors on node3. Again, these are not
handled by Ganeti directly and need to be cleaned up via DRBD commands::

  node3# drbdsetup /dev/drbd0 down
  node3# drbdsetup /dev/drbd1 down
  node3#

Orphan volumes
++++++++++++++

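At this point, the only remaining problem should be the so-called
*orphan* volumes. These can also appear after an aborted disk-replace,
or a similar situation where Ganeti was not able to recover
automatically. Here you need to remove them manually via LVM commands.
Note that ``lvremove xenvg`` asks about every logical volume in the
volume group, so confirm only the volumes reported as unknown (in this
example the prompts cover only the four orphans)::
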
  node3# lvremove xenvg
  Do you really want to remove active logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_data"? [y/n]: y
    Logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_data" successfully removed
  Do you really want to remove active logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta"? [y/n]: y
    Logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta" successfully removed
  Do you really want to remove active logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data"? [y/n]: y
    Logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data" successfully removed
  Do you really want to remove active logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta"? [y/n]: y
    Logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta" successfully removed
  node3#

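When the volume group also holds volumes that must survive, it can be
safer to name each orphan explicitly. The following is only a sketch:
``orphan_lvremove_commands`` is a hypothetical helper, not part of
Ganeti, and it assumes the error wording shown in the cluster verify
output above. It reads that output on standard input and prints one
``lvremove`` command per unknown volume, so that the list can be
reviewed before anything is removed:

```shell
# Hypothetical helper: turn "volume ... is unknown" errors for one node
# into lvremove commands. It only prints them; nothing is removed.
orphan_lvremove_commands() {
    node=$1   # node name as printed by gnt-cluster verify, e.g. node3
    vg=$2     # volume group name, e.g. xenvg
    sed -n "s|.*ERROR: node $node: volume \([^ ]*\) is unknown.*|lvremove $vg/\1|p"
}

# Intended use (run on the master node, then review the output):
#   gnt-cluster verify | orphan_lvremove_commands node3 xenvg
```

The printed commands use the ``VG/LV`` form of ``lvremove``, which
touches only the named logical volume.
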
At this point cluster verify shouldn't complain anymore::

  node1# gnt-cluster verify
  Mon Oct 26 18:37:51 2009 * Verifying global settings
  Mon Oct 26 18:37:51 2009 * Gathering data (3 nodes)
  Mon Oct 26 18:37:53 2009 * Verifying node status
  Mon Oct 26 18:37:53 2009 * Verifying instance status
  Mon Oct 26 18:37:53 2009 * Verifying orphan volumes
  Mon Oct 26 18:37:53 2009 * Verifying remaining instances
  Mon Oct 26 18:37:53 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 18:37:53 2009 * Other Notes
  Mon Oct 26 18:37:53 2009 * Hooks Results
  node1#

N+1 errors
++++++++++

Since redundant instances in Ganeti have a primary/secondary model,
each node must keep enough memory free so that, if one of its peer
nodes fails, all instances that have the failed node as primary and
this node as secondary can be failed over to it. More specifically, if
instance2 has node1 as primary and node2 as secondary (and node1 and
node2 do not have any other instances in this layout), then node2 must
have enough free memory so that if node1 fails, we can fail over
instance2 without any other operations (to reduce the downtime window).
Let's increase the memory of the current instances to 4GB, and add three
new instances, two on node2:node3 with 8GB of RAM and one on node1:node2
with 12GB of RAM (numbers chosen so that we run out of memory)::

  node1# gnt-instance modify -B memory=4G instance1
  Modified instance instance1
   - be/memory -> 4096
  Please don't forget that these parameters take effect only at the next start of the instance.
  node1# gnt-instance modify …

  node1# gnt-instance add -t drbd -n node2:node3 -s 512m -B memory=8G -o debootstrap instance5
  …
  node1# gnt-instance add -t drbd -n node2:node3 -s 512m -B memory=8G -o debootstrap instance6
  …
  node1# gnt-instance add -t drbd -n node1:node2 -s 512m -B memory=12G -o debootstrap instance7
  node1# gnt-instance reboot --all
  The reboot will operate on 7 instances.
  Do you want to continue?
  Affected instances:
    instance1
    instance2
    instance3
    instance4
    instance5
    instance6
    instance7
  y/[n]/?: y
  Submitted jobs 677, 678, 679, 680, 681, 682, 683
  Waiting for job 677 for instance1...
  Waiting for job 678 for instance2...
  Waiting for job 679 for instance3...
  Waiting for job 680 for instance4...
  Waiting for job 681 for instance5...
  Waiting for job 682 for instance6...
  Waiting for job 683 for instance7...
  node1#

We rebooted the instances for the memory changes to take effect. The
cluster now looks like this::

  node1# gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G  6.5G     4     1
  node2   1.3T  1.3T  32.0G  1.0G 10.5G     3     4
  node3   1.3T  1.3T  32.0G  1.0G 30.5G     0     2
  node1# gnt-cluster verify
  Mon Oct 26 18:59:36 2009 * Verifying global settings
  Mon Oct 26 18:59:36 2009 * Gathering data (3 nodes)
  Mon Oct 26 18:59:37 2009 * Verifying node status
  Mon Oct 26 18:59:37 2009 * Verifying instance status
  Mon Oct 26 18:59:37 2009 * Verifying orphan volumes
  Mon Oct 26 18:59:37 2009 * Verifying remaining instances
  Mon Oct 26 18:59:37 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 18:59:37 2009   - ERROR: node node2: not enough memory on to accommodate failovers should peer node node1 fail
  Mon Oct 26 18:59:37 2009 * Other Notes
  Mon Oct 26 18:59:37 2009 * Hooks Results
  node1#

The cluster verify error above shows that if node1 fails, node2 will not
have enough memory to fail over all of node1's primary instances to it.
To solve this, you have a number of options:

- try to manually move instances around (but this can become complicated
  for any non-trivial cluster)
- try to reduce the memory of some instances to fit the available node
  memory
- if you have the ganeti-htools package installed, you can run the
  ``hbal`` tool, which will try to automatically compute a cluster
  layout that complies with the N+1 rule

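The arithmetic behind the error can be checked by hand. The sketch below
is a simplified illustration, not Ganeti's actual verification code:
``n1_ok`` is a hypothetical helper, and the failover set is an
assumption, namely that all four of node1's primary instances (three of
4GB and the 12GB instance7) have node2 as their secondary. The free
memory figure comes from the ``gnt-node list`` output above:

```shell
# Hypothetical helper: succeed if a node's free memory (in MiB) can hold
# every instance that would fail over to it when a peer node dies.
n1_ok() {
    free=$1; shift
    needed=0
    for mem in "$@"; do
        needed=$((needed + mem))   # sum the memory of all failover candidates
    done
    echo "free=${free}MiB needed=${needed}MiB"
    [ "$free" -ge "$needed" ]
}

# node2 reports 10.5G free (10752 MiB). Assumed failover set if node1
# fails: three 4G instances plus the 12G instance7.
n1_ok 10752 4096 4096 4096 12288 || echo "node2 fails the N+1 check"
# prints: free=10752MiB needed=24576MiB
# prints: node2 fails the N+1 check
```

Under these assumptions node2 would need 24GB free but has only 10.5GB,
which matches the error reported by cluster verify.
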
Network issues
++++++++++++++

In case a node has problems with the network (usually the secondary
network, as problems with the primary network will render the node
unusable for Ganeti commands), it will show up in cluster verify as::

  node1# gnt-cluster verify
  Mon Oct 26 19:07:19 2009 * Verifying global settings
  Mon Oct 26 19:07:19 2009 * Gathering data (3 nodes)
  Mon Oct 26 19:07:23 2009 * Verifying node status
  Mon Oct 26 19:07:23 2009   - ERROR: node node1: tcp communication with node 'node3': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009   - ERROR: node node2: tcp communication with node 'node3': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009   - ERROR: node node3: tcp communication with node 'node1': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009   - ERROR: node node3: tcp communication with node 'node2': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009   - ERROR: node node3: tcp communication with node 'node3': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009 * Verifying instance status
  Mon Oct 26 19:07:23 2009 * Verifying orphan volumes
  Mon Oct 26 19:07:23 2009 * Verifying remaining instances
  Mon Oct 26 19:07:23 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 19:07:23 2009 * Other Notes
  Mon Oct 26 19:07:23 2009 * Hooks Results
  node1#

This shows that both node1 and node2 have problems contacting node3 over
the secondary network, and node3 has problems contacting them. From this
output it can be deduced that since node1 and node2 can communicate
between themselves, node3 is the one having problems, and you need to
investigate its network settings/connection.

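On a larger cluster the same deduction can be made by counting how many
error lines each node is involved in. The sketch below is a hypothetical
helper (``tally_network_errors`` is not a Ganeti command) and assumes
the error wording shown above. It reads cluster verify output on
standard input and lists the nodes, most frequently involved first:

```shell
# Hypothetical helper: for every secondary-network error line, count the
# reporting node and the node it failed to reach (once per line each).
tally_network_errors() {
    sed -n "s/.*ERROR: node \([^:]*\): tcp communication with node '\([^']*\)'.*/\1 \2/p" |
    awk '{ seen[$1]++; if ($2 != $1) seen[$2]++ }
         END { for (n in seen) print seen[n], n }' |
    sort -rn
}

# Intended use (run on the master node):
#   gnt-cluster verify | tally_network_errors
```

For the output above this reports node3 in all five error lines and
node1 and node2 in two each. The count is only a hint about where to
start looking, not a diagnosis.
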
Migration problems
++++++++++++++++++

Since live migration can sometimes fail and leave the instance in an
inconsistent state, Ganeti provides a ``--cleanup`` argument to the
migrate command that does the following:

- checks on which node the instance is actually running (did the
  command fail before or after the actual migration?)
- reconfigures the DRBD disks accordingly

It is always safe to run this command as long as the instance has good
data on its primary node (i.e. it is not showing as degraded). If so,
you can simply run::

  node1# gnt-instance migrate --cleanup instance1
  Instance instance1 will be recovered from a failed migration. Note
  that the migration procedure (including cleanup) is **experimental**
  in this version. This might impact the instance if anything goes
  wrong. Continue?
  y/[n]/?: y
  Mon Oct 26 19:13:49 2009 Migrating instance instance1
  Mon Oct 26 19:13:49 2009 * checking where the instance actually runs (if this hangs, the hypervisor might be in a bad state)
  Mon Oct 26 19:13:49 2009 * instance confirmed to be running on its primary node (node2)
  Mon Oct 26 19:13:49 2009 * switching node node1 to secondary mode
  Mon Oct 26 19:13:50 2009 * wait until resync is done
  Mon Oct 26 19:13:50 2009 * changing into standalone mode
  Mon Oct 26 19:13:50 2009 * changing disks into single-master mode
  Mon Oct 26 19:13:50 2009 * wait until resync is done
  Mon Oct 26 19:13:51 2009 * done
  node1#

In-use disks at instance shutdown
+++++++++++++++++++++++++++++++++

If you see something like the following when trying to shut down or
deactivate disks for an instance::

  node1# gnt-instance shutdown instance1
  Mon Oct 26 19:16:23 2009  - WARNING: Could not shutdown block device disk/0 on node node2: drbd0: can't shutdown drbd device: /dev/drbd0: State change failed: (-12) Device is held open by someone\n

it most likely means something is holding open the underlying DRBD
device. This can be bad if the instance is not running, as it might mean
that there was concurrent access from both the node and the instance to
the disks, but not always (e.g. you might only have had the partitions
activated via ``kpartx``).

To troubleshoot this issue you need to follow standard Linux practices,
and pay attention to the hypervisor being used:

- check whether (in the above example) ``/dev/drbd0`` on node2 is
  mounted somewhere (``cat /proc/mounts``)
- check whether the device is being used by device mapper itself: run
  ``dmsetup ls`` and look for entries of the form ``drbd0pX``, and if
  any exist remove them with either ``kpartx -d`` or ``dmsetup remove``

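Both checks can be scripted. A minimal sketch follows; the helper names
``mounts_of`` and ``dm_partitions_of`` are hypothetical, and the second
one assumes that ``dmsetup ls`` prints the mapping name in the first
column:

```shell
# Hypothetical helper: print the mount points of a block device. The
# optional second argument is a file in /proc/mounts format.
mounts_of() {
    awk -v dev="$1" '$1 == dev { print $2 }' "${2:-/proc/mounts}"
}

# Hypothetical helper: filter `dmsetup ls` output (stdin) for partition
# mappings such as drbd0p1 that sit on top of the given DRBD device.
dm_partitions_of() {
    awk -v base="$1" 'index($1, base "p") == 1 { print $1 }'
}

# Intended use on the node named in the warning (node2 above):
#   mounts_of /dev/drbd0
#   dmsetup ls | dm_partitions_of drbd0
```

Empty output from both commands suggests that neither a mount nor a
device mapper partition mapping is holding the device open.
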
For Xen, check whether it is using the disks itself::

  node1# xenstore-ls /local/domain/0/backend/vbd|grep -e "domain =" -e physical-device
  domain = "instance2"
  physical-device = "93:0"
  domain = "instance3"
  physical-device = "93:1"
  domain = "instance4"
  physical-device = "93:2"
  node1#

You can see in the above output that the node exports three disks, to
three instances. The ``physical-device`` key is in major:minor format in
hexadecimal, and 0x93 (147 in decimal) is DRBD's major number. Thus we
can see from the above that instance2 has ``/dev/drbd0``, instance3
``/dev/drbd1``, and instance4 ``/dev/drbd2``.

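The conversion from a ``physical-device`` value to a device path can be
sketched as follows. ``physdev_to_drbd`` is a hypothetical helper, not
part of Xen or Ganeti, and it assumes the usual ``/dev/drbdN`` naming
where N is the minor number:

```shell
# Hypothetical helper: decode a Xen "physical-device" value (hexadecimal
# major:minor) into a DRBD device path.
physdev_to_drbd() {
    major=$((0x${1%%:*}))   # text before the colon, read as hexadecimal
    minor=$((0x${1##*:}))   # text after the colon, read as hexadecimal
    if [ "$major" -ne 147 ]; then   # 147 (0x93) is the DRBD block major
        echo "not a DRBD device: $1" >&2
        return 1
    fi
    echo "/dev/drbd$minor"
}

physdev_to_drbd 93:0   # prints /dev/drbd0
physdev_to_drbd 93:2   # prints /dev/drbd2
```

Note that the minor is hexadecimal too, so a value such as ``93:a``
would refer to ``/dev/drbd10``.
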
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: