Revision 832f8c6a
--- a/doc/walkthrough.rst
+++ b/doc/walkthrough.rst

 .. contents::

-.. highlight:: text
+.. highlight:: shell-example
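
With the ``shell-example`` lexer that Ganeti's documentation build provides
(an assumption about the toolchain; the diff itself only switches the
highlight language), ``$`` marks the shell prompt and ``%...%`` brackets
values the reader substitutes, so a line such as::

  $ gnt-node add -s %192.0.2.2% %node2%

renders ``192.0.2.2`` and ``node2`` as placeholders. This is why the
revision rewrites every ``node1#`` prompt as ``$``, wraps user-supplied
arguments in ``%`` signs, and escapes literal percent signs as ``\%``.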

 Introduction
 ------------
...
 Follow the :doc:`install` document and prepare the nodes. Then it's time
 to initialise the cluster::

-node1# gnt-cluster init -s 192.0.2.1 --enabled-hypervisors=xen-pvm example-cluster
-node1#
+$ gnt-cluster init -s %192.0.2.1% --enabled-hypervisors=xen-pvm %example-cluster%
+$

 The creation was fine. Let's check that one node we have is functioning
 correctly::

-node1# gnt-node list
+$ gnt-node list
 Node  DTotal DFree MTotal MNode MFree Pinst Sinst
 node1   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
-node1# gnt-cluster verify
+$ gnt-cluster verify
 Mon Oct 26 02:08:51 2009 * Verifying global settings
 Mon Oct 26 02:08:51 2009 * Gathering data (1 nodes)
 Mon Oct 26 02:08:52 2009 * Verifying node status
...
 Mon Oct 26 02:08:52 2009 * Verifying N+1 Memory redundancy
 Mon Oct 26 02:08:52 2009 * Other Notes
 Mon Oct 26 02:08:52 2009 * Hooks Results
-node1#
+$

 Since this proceeded correctly, let's add the other two nodes::

-node1# gnt-node add -s 192.0.2.2 node2
+$ gnt-node add -s %192.0.2.2% %node2%
 -- WARNING --
 Performing this operation is going to replace the ssh daemon keypair
 on the target machine (node2) with the ones of the current one
 and grant full intra-cluster ssh root access to/from it

-The authenticity of host 'node2 (192.0.2.2)' can't be established.
-RSA key fingerprint is 9f:…
-Are you sure you want to continue connecting (yes/no)? yes
-root@node2's password:
+Unable to verify hostkey of host xen-devi-5.fra.corp.google.com:
+f7:…. Do you want to accept it?
+y/[n]/?: %y%
+Mon Oct 26 02:11:53 2009 Authentication to node2 via public key failed, trying password
+root password:
 Mon Oct 26 02:11:54 2009 - INFO: Node will be a master candidate
-node1# gnt-node add -s 192.0.2.3 node3
+$ gnt-node add -s %192.0.2.3% %node3%
 -- WARNING --
 Performing this operation is going to replace the ssh daemon keypair
-on the target machine (node2) with the ones of the current one
+on the target machine (node3) with the ones of the current one
 and grant full intra-cluster ssh root access to/from it

-The authenticity of host 'node3 (192.0.2.3)' can't be established.
-RSA key fingerprint is 9f:…
-Are you sure you want to continue connecting (yes/no)? yes
-root@node2's password:
-Mon Oct 26 02:11:54 2009 - INFO: Node will be a master candidate
+…
+Mon Oct 26 02:12:43 2009 - INFO: Node will be a master candidate

 Checking the cluster status again::

-node1# gnt-node list
+$ gnt-node list
 Node  DTotal DFree MTotal MNode MFree Pinst Sinst
 node1   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
 node2   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
 node3   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
-node1# gnt-cluster verify
+$ gnt-cluster verify
 Mon Oct 26 02:15:14 2009 * Verifying global settings
 Mon Oct 26 02:15:14 2009 * Gathering data (3 nodes)
 Mon Oct 26 02:15:16 2009 * Verifying node status
...
 Mon Oct 26 02:15:16 2009 * Verifying N+1 Memory redundancy
 Mon Oct 26 02:15:16 2009 * Other Notes
 Mon Oct 26 02:15:16 2009 * Hooks Results
-node1#
+$

 And let's check that we have a valid OS::

-node1# gnt-os list
+$ gnt-os list
 Name
 debootstrap
 node1#
...
 etc. This is done via the debootstrap tool as described in the admin
 guide. Similar output lines are replaced with ``…`` in the log below::

-node1# /usr/lib/ganeti/tools/burnin -o debootstrap -p instance{1..5}
+$ /usr/lib/ganeti/tools/burnin -o debootstrap -p instance{1..5}
 - Testing global parameters
 - Creating instances
 * instance instance1
...
 * Submitted job ID(s) 235, 236, 237, 238, 239
 waiting for job 235 for instance1
 …
-node1#
+$
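
A side note on the ``instance{1..5}`` argument: it is ordinary shell brace
expansion, not a burnin feature, so the tool simply receives five instance
names. A quick way to see what the shell passes along (standard bash
behaviour)::

  $ echo instance{1..5}
  instance1 instance2 instance3 instance4 instance5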

 You can see in the above what operations the burn-in does. Ideally, the
 burn-in log would proceed successfully through all the steps and end
...
 At this point, Ganeti and the hardware seem to be functioning
 correctly, so we'll follow up with creating the instances manually::

-node1# gnt-instance add -t drbd -o debootstrap -s 256m -n node1:node2 instance3
+$ gnt-instance add -t drbd -o debootstrap -s %256m% -n %node1%:%node2% %instance3%
 Mon Oct 26 04:06:52 2009 - INFO: Selected nodes for instance instance1 via iallocator hail: node2, node3
 Mon Oct 26 04:06:53 2009 * creating instance disks...
 Mon Oct 26 04:06:57 2009 adding instance instance1 to cluster config
 Mon Oct 26 04:06:57 2009 - INFO: Waiting for instance instance1 to sync disks.
-Mon Oct 26 04:06:57 2009 - INFO: - device disk/0: 20.00% done, 4 estimated seconds remaining
+Mon Oct 26 04:06:57 2009 - INFO: - device disk/0: 20.00\% done, 4 estimated seconds remaining
 Mon Oct 26 04:07:01 2009 - INFO: Instance instance1's disks are in sync.
 Mon Oct 26 04:07:01 2009 creating os for instance instance1 on node node2
 Mon Oct 26 04:07:01 2009 * running the instance OS create scripts...
 Mon Oct 26 04:07:14 2009 * starting instance...
-node1# gnt-instance add -t drbd -o debootstrap -s 256m -n node1:node2 instance2
+$ gnt-instance add -t drbd -o debootstrap -s %256m% -n %node1%:%node2% %instance2%
 Mon Oct 26 04:11:37 2009 * creating instance disks...
 Mon Oct 26 04:11:40 2009 adding instance instance2 to cluster config
 Mon Oct 26 04:11:41 2009 - INFO: Waiting for instance instance2 to sync disks.
-Mon Oct 26 04:11:41 2009 - INFO: - device disk/0: 35.40% done, 1 estimated seconds remaining
-Mon Oct 26 04:11:42 2009 - INFO: - device disk/0: 58.50% done, 1 estimated seconds remaining
-Mon Oct 26 04:11:43 2009 - INFO: - device disk/0: 86.20% done, 0 estimated seconds remaining
-Mon Oct 26 04:11:44 2009 - INFO: - device disk/0: 92.40% done, 0 estimated seconds remaining
-Mon Oct 26 04:11:44 2009 - INFO: - device disk/0: 97.00% done, 0 estimated seconds remaining
+Mon Oct 26 04:11:41 2009 - INFO: - device disk/0: 35.40\% done, 1 estimated seconds remaining
+Mon Oct 26 04:11:42 2009 - INFO: - device disk/0: 58.50\% done, 1 estimated seconds remaining
+Mon Oct 26 04:11:43 2009 - INFO: - device disk/0: 86.20\% done, 0 estimated seconds remaining
+Mon Oct 26 04:11:44 2009 - INFO: - device disk/0: 92.40\% done, 0 estimated seconds remaining
+Mon Oct 26 04:11:44 2009 - INFO: - device disk/0: 97.00\% done, 0 estimated seconds remaining
 Mon Oct 26 04:11:44 2009 - INFO: Instance instance2's disks are in sync.
 Mon Oct 26 04:11:44 2009 creating os for instance instance2 on node node1
 Mon Oct 26 04:11:44 2009 * running the instance OS create scripts...
 Mon Oct 26 04:11:57 2009 * starting instance...
-node1#
+$

 The above shows one instance created via an iallocator script, and one
 being created with manual node assignment. The other three instances
 were also created and now it's time to check them::

-node1# gnt-instance list
+$ gnt-instance list
 Instance  Hypervisor OS          Primary_node Status  Memory
 instance1 xen-pvm    debootstrap node2        running   128M
 instance2 xen-pvm    debootstrap node1        running   128M
...

 Accessing an instance's console is easy::

-node1# gnt-instance console instance2
+$ gnt-instance console %instance2%
 [ 0.000000] Bootdata ok (command line is root=/dev/sda1 ro)
 [ 0.000000] Linux version 2.6…
 [ 0.000000] BIOS-provided physical RAM map:
...
 network (and doing this on all instances), we can check their
 connectivity::

-node1# fping instance{1..5}
+$ fping %instance{1..5}%
 instance1 is alive
 instance2 is alive
 instance3 is alive
 instance4 is alive
 instance5 is alive
-node1#
+$

 Removal
 +++++++

 Removing unwanted instances is also easy::

-node1# gnt-instance remove instance5
+$ gnt-instance remove %instance5%
 This will remove the volumes of the instance instance5 (including
 mirrors), thus removing all the data of the instance. Continue?
-y/[n]/?: y
-node1#
+y/[n]/?: %y%
+$


 Recovering from hardware failures
...
 We are now left with four instances. Assume that at this point, node3,
 which has one primary and one secondary instance, crashes::

-node1# gnt-node info node3
+$ gnt-node info %node3%
 Node name: node3
   primary ip: 198.51.100.1
   secondary ip: 192.0.2.3
...
     - instance4
   secondary for instances:
     - instance1
-node1# fping node3
+$ fping %node3%
 node3 is unreachable

 At this point, the primary instance of that node (instance4) is down,
 but the secondary instance (instance1) is not affected except it has
 lost disk redundancy::

-node1# fping instance{1,4}
+$ fping %instance{1,4}%
 instance1 is alive
 instance4 is unreachable
-node1#
+$

 If we try to check the status of instance4 via the instance info
 command, it fails because it tries to contact node3 which is down::

-node1# gnt-instance info instance4
+$ gnt-instance info %instance4%
 Failure: command execution error:
 Error checking node node3: Connection failed (113: No route to host)
-node1#
+$

 So we need to mark node3 as being *offline*, and thus Ganeti won't talk
 to it anymore::

-node1# gnt-node modify -O yes -f node3
+$ gnt-node modify -O yes -f %node3%
 Mon Oct 26 04:34:12 2009 - WARNING: Not enough master candidates (desired 10, new value will be 2)
 Mon Oct 26 04:34:15 2009 - WARNING: Communication failure to node node3: Connection failed (113: No route to host)
 Modified node node3
  - offline -> True
  - master_candidate -> auto-demotion due to offline
-node1#
+$

 And now we can failover the instance::

-node1# gnt-instance failover --ignore-consistency instance4
+$ gnt-instance failover --ignore-consistency %instance4%
 Failover will happen to image instance4. This requires a shutdown of
 the instance. Continue?
-y/[n]/?: y
+y/[n]/?: %y%
 Mon Oct 26 04:35:34 2009 * checking disk consistency between source and target
 Failure: command execution error:
 Disk disk/0 is degraded on target node, aborting failover.
-node1# gnt-instance failover --ignore-consistency instance4
+$ gnt-instance failover --ignore-consistency %instance4%
 Failover will happen to image instance4. This requires a shutdown of
 the instance. Continue?
 y/[n]/?: y
...
 Mon Oct 26 04:35:47 2009 * activating the instance's disks on target node
 Mon Oct 26 04:35:47 2009 - WARNING: Could not prepare block device disk/0 on node node3 (is_primary=False, pass=1): Node is marked offline
 Mon Oct 26 04:35:48 2009 * starting the instance on the target node
-node1#
+$

 Note that in our first attempt, Ganeti refused to do the failover since
 it wasn't sure about the status of the instance's disks. We pass the
 ``--ignore-consistency`` flag and then we can fail over::

-node1# gnt-instance list
+$ gnt-instance list
 Instance  Hypervisor OS          Primary_node Status  Memory
 instance1 xen-pvm    debootstrap node2        running   128M
 instance2 xen-pvm    debootstrap node1        running   128M
 instance3 xen-pvm    debootstrap node1        running   128M
 instance4 xen-pvm    debootstrap node1        running   128M
-node1#
+$

 But at this point, both instance1 and instance4 are without disk
 redundancy::

-node1# gnt-instance info instance1
+$ gnt-instance info %instance1%
 Instance name: instance1
 UUID: 45173e82-d1fa-417c-8758-7d582ab7eef4
 Serial number: 2
...
 secondary node to a new one (in this case, we only have two working
 nodes, so all instances will end up on nodes one and two)::

-node1# gnt-node evacuate -I hail node3
+$ gnt-node evacuate -I hail %node3%
 Relocate instance(s) 'instance1','instance4' from node
 node3 using iallocator hail?
-y/[n]/?: y
+y/[n]/?: %y%
 Mon Oct 26 05:05:39 2009 - INFO: Selected new secondary for instance 'instance1': node1
 Mon Oct 26 05:05:40 2009 - INFO: Selected new secondary for instance 'instance4': node2
 Mon Oct 26 05:05:40 2009 Replacing disk(s) 0 for instance1
...
 Mon Oct 26 05:05:45 2009 - INFO: Attaching primary drbds to new secondary (standalone => connected)
 Mon Oct 26 05:05:46 2009 STEP 5/6 Sync devices
 Mon Oct 26 05:05:46 2009 - INFO: Waiting for instance instance1 to sync disks.
-Mon Oct 26 05:05:46 2009 - INFO: - device disk/0: 13.90% done, 7 estimated seconds remaining
+Mon Oct 26 05:05:46 2009 - INFO: - device disk/0: 13.90\% done, 7 estimated seconds remaining
 Mon Oct 26 05:05:53 2009 - INFO: Instance instance1's disks are in sync.
 Mon Oct 26 05:05:53 2009 STEP 6/6 Removing old storage
 Mon Oct 26 05:05:53 2009 - INFO: Remove logical volumes for 0
...
 Mon Oct 26 05:05:55 2009 - INFO: Attaching primary drbds to new secondary (standalone => connected)
 Mon Oct 26 05:05:56 2009 STEP 5/6 Sync devices
 Mon Oct 26 05:05:56 2009 - INFO: Waiting for instance instance4 to sync disks.
-Mon Oct 26 05:05:56 2009 - INFO: - device disk/0: 12.40% done, 8 estimated seconds remaining
+Mon Oct 26 05:05:56 2009 - INFO: - device disk/0: 12.40\% done, 8 estimated seconds remaining
 Mon Oct 26 05:06:04 2009 - INFO: Instance instance4's disks are in sync.
 Mon Oct 26 05:06:04 2009 STEP 6/6 Removing old storage
 Mon Oct 26 05:06:04 2009 - INFO: Remove logical volumes for 0
...
 Mon Oct 26 05:06:04 2009 Hint: remove unused LVs manually
 Mon Oct 26 05:06:04 2009 - WARNING: Can't remove old LV: Node is marked offline
 Mon Oct 26 05:06:04 2009 Hint: remove unused LVs manually
-node1#
+$

 And now node3 is completely free of instances and can be repaired::

-node1# gnt-node list
+$ gnt-node list
 Node  DTotal DFree MTotal MNode MFree Pinst Sinst
 node1   1.3T  1.3T  32.0G  1.0G 30.2G     3     1
 node2   1.3T  1.3T  32.0G  1.0G 30.4G     1     3
...
 Re-adding a node to the cluster
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-
 Let's say node3 has been repaired and is now ready to be
 reused. Re-adding it is simple::

-node1# gnt-node add --readd node3
+$ gnt-node add --readd %node3%
 The authenticity of host 'node3 (198.51.100.1)' can't be established.
 RSA key fingerprint is 9f:2e:5a:2e:e0:bd:00:09:e4:5c:32:f2:27:57:7a:f4.
 Are you sure you want to continue connecting (yes/no)? yes
...

 And it is now working again::

-node1# gnt-node list
+$ gnt-node list
 Node  DTotal DFree MTotal MNode MFree Pinst Sinst
 node1   1.3T  1.3T  32.0G  1.0G 30.2G     3     1
 node2   1.3T  1.3T  32.0G  1.0G 30.4G     1     3
...
 Let's take the cluster status in the above listing, and check what volumes
 are in use::

-node1# gnt-node volumes -o phys,instance node2
+$ gnt-node volumes -o phys,instance %node2%
 PhysDev   Instance
 /dev/sdb1 instance4
 /dev/sdb1 instance4
...
 /dev/sdb1 instance3
 /dev/sdb1 instance2
 /dev/sdb1 instance2
-node1#
+$

 You can see that all instances on node2 have logical volumes on
 ``/dev/sdb1``. Let's simulate a disk failure on that disk::

-node1# ssh node2
-node2# echo offline > /sys/block/sdb/device/state
-node2# vgs
+$ ssh node2
+# on node2
+$ echo offline > /sys/block/sdb/device/state
+$ vgs
 /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
 /dev/sdb1: read failed after 0 of 4096 at 750153695232: Input/output error
 /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
...
 Couldn't find device with uuid '954bJA-mNL0-7ydj-sdpW-nc2C-ZrCi-zFp91c'.
 Couldn't find all physical volumes for volume group xenvg.
 Volume group xenvg not found
-node2#
+$

 At this point, the node is broken and if we are to examine
 instance2 we get (simplified output shown)::

-node1# gnt-instance info instance2
+$ gnt-instance info %instance2%
 Instance name: instance2
 State: configured to be up, actual state is up
 Nodes:
...
 This instance has a secondary only on node2. Let's verify a primary
 instance of node2::

-node1# gnt-instance info instance1
+$ gnt-instance info %instance1%
 Instance name: instance1
 State: configured to be up, actual state is up
 Nodes:
...
   - disk/0: drbd8, size 256M
     on primary: /dev/drbd0 (147:0) in sync, status *DEGRADED* *MISSING DISK*
     on secondary: /dev/drbd3 (147:3) in sync, status ok
-node1# gnt-instance console instance1
+$ gnt-instance console %instance1%

 Debian GNU/Linux 5.0 instance1 tty1

...

 ::

-node1# gnt-node repair-storage node2 lvm-vg xenvg
+$ gnt-node repair-storage %node2% lvm-vg %xenvg%
 Mon Oct 26 18:14:03 2009 Repairing storage unit 'xenvg' on node2 ...
-node1# ssh node2 vgs
-  VG    #PV #LV #SN Attr   VSize   VFree
-  xenvg   1   8   0 wz--n- 673.84G 673.84G
-node1#
+$ ssh %node2% vgs
+  VG    #PV #LV #SN Attr   VSize   VFree
+  xenvg   1   8   0 wz--n- 673.84G 673.84G
+$

 This has removed the 'bad' disk from the volume group, which is now left
 with only one PV. We can now replace the disks for the involved
 instances::

-node1# for i in instance{1..4}; do gnt-instance replace-disks -a $i; done
+$ for i in %instance{1..4}%; do gnt-instance replace-disks -a $i; done
 Mon Oct 26 18:15:38 2009 Replacing disk(s) 0 for instance1
 Mon Oct 26 18:15:38 2009 STEP 1/6 Check device existence
 Mon Oct 26 18:15:38 2009 - INFO: Checking disk/0 on node1
...
 Mon Oct 26 18:15:40 2009 - INFO: Adding new mirror component on node2
 Mon Oct 26 18:15:41 2009 STEP 5/6 Sync devices
 Mon Oct 26 18:15:41 2009 - INFO: Waiting for instance instance1 to sync disks.
-Mon Oct 26 18:15:41 2009 - INFO: - device disk/0: 12.40% done, 9 estimated seconds remaining
+Mon Oct 26 18:15:41 2009 - INFO: - device disk/0: 12.40\% done, 9 estimated seconds remaining
 Mon Oct 26 18:15:50 2009 - INFO: Instance instance1's disks are in sync.
 Mon Oct 26 18:15:50 2009 STEP 6/6 Removing old storage
 Mon Oct 26 18:15:50 2009 - INFO: Remove logical volumes for disk/0
...
 …
 Mon Oct 26 18:16:18 2009 STEP 6/6 Removing old storage
 Mon Oct 26 18:16:18 2009 - INFO: Remove logical volumes for disk/0
-node1#
+$

 At this point, all instances should be healthy again.

...
 with argument ``-p`` and once secondary instances with argument
 ``-s``, but otherwise the operations are similar::

-node1# gnt-instance replace-disks -p instance1
+$ gnt-instance replace-disks -p instance1
 …
-node1# for i in instance{2..4}; do gnt-instance replace-disks -s $i; done
+$ for i in %instance{2..4}%; do gnt-instance replace-disks -s $i; done

 Common cluster problems
 -----------------------
...
 previously and re-added to the cluster without reinstallation. Running
 cluster verify on the cluster reports::

-node1# gnt-cluster verify
+$ gnt-cluster verify
 Mon Oct 26 18:30:08 2009 * Verifying global settings
 Mon Oct 26 18:30:08 2009 * Gathering data (3 nodes)
 Mon Oct 26 18:30:10 2009 * Verifying node status
...
 Mon Oct 26 18:30:10 2009 * Verifying N+1 Memory redundancy
 Mon Oct 26 18:30:10 2009 * Other Notes
 Mon Oct 26 18:30:10 2009 * Hooks Results
-node1#
+$

 Instance status
 +++++++++++++++
...
 Ganeti doesn't directly handle this case. It is recommended to log on to
 node3 and run::

-node3# xm destroy instance4
+$ xm destroy %instance4%

 Unallocated DRBD minors
 +++++++++++++++++++++++
...
 There are still unallocated DRBD minors on node3. Again, these are not
 handled by Ganeti directly and need to be cleaned up via DRBD commands::

-node3# drbdsetup /dev/drbd0 down
-node3# drbdsetup /dev/drbd1 down
-node3#
+$ ssh %node3%
+# on node 3
+$ drbdsetup /dev/drbd%0% down
+$ drbdsetup /dev/drbd%1% down
+$
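
Before downing minors it can help to confirm which ones are configured;
a minimal check, assuming DRBD 8's procfs interface is available on the
node::

  $ cat /proc/drbd

Comparing the minors listed there with the instances' disks (as shown by
``gnt-instance info``) identifies the stray ones.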

 Orphan volumes
 ++++++++++++++
...
 disk-replace, or similar situation where Ganeti was not able to recover
 automatically. Here you need to remove them manually via LVM commands::

-node3# lvremove xenvg
-Do you really want to remove active logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_data"? [y/n]: y
+$ ssh %node3%
+# on node3
+$ lvremove %xenvg%
+Do you really want to remove active logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_data"? [y/n]: %y%
 Logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_data" successfully removed
-Do you really want to remove active logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta"? [y/n]: y
+Do you really want to remove active logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta"? [y/n]: %y%
 Logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta" successfully removed
-Do you really want to remove active logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data"? [y/n]: y
+Do you really want to remove active logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data"? [y/n]: %y%
 Logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data" successfully removed
-Do you really want to remove active logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta"? [y/n]: y
+Do you really want to remove active logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta"? [y/n]: %y%
 Logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta" successfully removed
 node3#

 At this point cluster verify shouldn't complain anymore::

-node1# gnt-cluster verify
+$ gnt-cluster verify
 Mon Oct 26 18:37:51 2009 * Verifying global settings
 Mon Oct 26 18:37:51 2009 * Gathering data (3 nodes)
 Mon Oct 26 18:37:53 2009 * Verifying node status
...
 Mon Oct 26 18:37:53 2009 * Verifying N+1 Memory redundancy
 Mon Oct 26 18:37:53 2009 * Other Notes
 Mon Oct 26 18:37:53 2009 * Hooks Results
-node1#
+$

 N+1 errors
 ++++++++++
...
 instances, two on node2:node3 with 8GB of RAM and one on node1:node2,
 with 12GB of RAM (numbers chosen so that we run out of memory)::

-node1# gnt-instance modify -B memory=4G instance1
+$ gnt-instance modify -B memory=%4G% %instance1%
 Modified instance instance1
  - be/maxmem -> 4096
  - be/minmem -> 4096
 Please don't forget that these parameters take effect only at the next start of the instance.
-node1# gnt-instance modify …
+$ gnt-instance modify …

-node1# gnt-instance add -t drbd -n node2:node3 -s 512m -B memory=8G -o debootstrap instance5
+$ gnt-instance add -t drbd -n %node2%:%node3% -s %512m% -B memory=%8G% -o %debootstrap% %instance5%
 …
-node1# gnt-instance add -t drbd -n node2:node3 -s 512m -B memory=8G -o debootstrap instance6
+$ gnt-instance add -t drbd -n %node2%:%node3% -s %512m% -B memory=%8G% -o %debootstrap% %instance6%
 …
-node1# gnt-instance add -t drbd -n node1:node2 -s 512m -B memory=8G -o debootstrap instance7
-node1# gnt-instance reboot --all
+$ gnt-instance add -t drbd -n %node1%:%node2% -s %512m% -B memory=%8G% -o %debootstrap% %instance7%
+$ gnt-instance reboot --all
 The reboot will operate on 7 instances.
 Do you want to continue?
 Affected instances:
...
 instance5
 instance6
 instance7
-y/[n]/?: y
+y/[n]/?: %y%
 Submitted jobs 677, 678, 679, 680, 681, 682, 683
 Waiting for job 677 for instance1...
 Waiting for job 678 for instance2...
...
 Waiting for job 681 for instance5...
 Waiting for job 682 for instance6...
 Waiting for job 683 for instance7...
-node1#
+$

-We rebooted instances for the memory changes to have effect. Now the
+We rebooted the instances for the memory changes to have effect. Now the
 cluster looks like::

-node1# gnt-node list
+$ gnt-node list
 Node  DTotal DFree MTotal MNode MFree Pinst Sinst
 node1   1.3T  1.3T  32.0G  1.0G  6.5G     4     1
 node2   1.3T  1.3T  32.0G  1.0G 10.5G     3     4
 node3   1.3T  1.3T  32.0G  1.0G 30.5G     0     2
-node1# gnt-cluster verify
+$ gnt-cluster verify
 Mon Oct 26 18:59:36 2009 * Verifying global settings
 Mon Oct 26 18:59:36 2009 * Gathering data (3 nodes)
 Mon Oct 26 18:59:37 2009 * Verifying node status
...
 Mon Oct 26 18:59:37 2009 - ERROR: node node2: not enough memory to accommodate instance failovers should node node1 fail
 Mon Oct 26 18:59:37 2009 * Other Notes
 Mon Oct 26 18:59:37 2009 * Hooks Results
-node1#
+$

 The cluster verify error above shows that if node1 fails, node2 will not
 have enough memory to failover all primary instances on node1 to it. To
...
 network, as problems with the primary network will render the node
 unusable for ganeti commands), it will show up in cluster verify as::

-node1# gnt-cluster verify
+$ gnt-cluster verify
 Mon Oct 26 19:07:19 2009 * Verifying global settings
 Mon Oct 26 19:07:19 2009 * Gathering data (3 nodes)
 Mon Oct 26 19:07:23 2009 * Verifying node status
...
 Mon Oct 26 19:07:23 2009 * Verifying N+1 Memory redundancy
 Mon Oct 26 19:07:23 2009 * Other Notes
 Mon Oct 26 19:07:23 2009 * Hooks Results
-node1#
+$

 This shows that both node1 and node2 have problems contacting node3 over
 the secondary network, and node3 has problems contacting them. From this
...
 data on its primary node (i.e. not showing as degraded). If so, you can
 simply run::

-node1# gnt-instance migrate --cleanup instance1
+$ gnt-instance migrate --cleanup %instance1%
 Instance instance1 will be recovered from a failed migration. Note
 that the migration procedure (including cleanup) is **experimental**
 in this version. This might impact the instance if anything goes
 wrong. Continue?
-y/[n]/?: y
+y/[n]/?: %y%
 Mon Oct 26 19:13:49 2009 Migrating instance instance1
 Mon Oct 26 19:13:49 2009 * checking where the instance actually runs (if this hangs, the hypervisor might be in a bad state)
 Mon Oct 26 19:13:49 2009 * instance confirmed to be running on its primary node (node2)
...
 Mon Oct 26 19:13:50 2009 * changing disks into single-master mode
 Mon Oct 26 19:13:50 2009 * wait until resync is done
 Mon Oct 26 19:13:51 2009 * done
-node1#
+$

 In use disks at instance shutdown
 +++++++++++++++++++++++++++++++++
...
 If you see something like the following when trying to shutdown or
 deactivate disks for an instance::

-node1# gnt-instance shutdown instance1
+$ gnt-instance shutdown %instance1%
 Mon Oct 26 19:16:23 2009 - WARNING: Could not shutdown block device disk/0 on node node2: drbd0: can't shutdown drbd device: /dev/drbd0: State change failed: (-12) Device is held open by someone\n

 It most likely means something is holding open the underlying DRBD
...

 For Xen, check if it's not using the disks itself::

-node1# xenstore-ls /local/domain/0/backend/vbd|grep -e "domain =" -e physical-device
+$ xenstore-ls /local/domain/%0%/backend/vbd|grep -e "domain =" -e physical-device
 domain = "instance2"
 physical-device = "93:0"
 domain = "instance3"
 physical-device = "93:1"
 domain = "instance4"
 physical-device = "93:2"
-node1#
+$

 You can see in the above output that the node exports three disks, to
 three instances. The ``physical-device`` key is in major:minor format in
-hexadecimal, and 0x93 represents DRBD's major number. Thus we can see
-from the above that instance2 has /dev/drbd0, instance3 /dev/drbd1, and
-instance4 /dev/drbd2.
+hexadecimal, and ``0x93`` represents DRBD's major number. Thus we can
+see from the above that instance2 has /dev/drbd0, instance3 /dev/drbd1,
+and instance4 /dev/drbd2.
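
As a sanity check of that reading (plain shell arithmetic, nothing
Ganeti-specific): ``0x93`` is 147 decimal, which matches the
``/dev/drbd0 (147:0)`` device numbers shown by ``gnt-instance info``
earlier in the walkthrough::

  $ printf '%d\n' 0x93
  147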

 LUXI version mismatch
 +++++++++++++++++++++
...
 master daemon. Starting in Ganeti 2.3, the peers exchange their version
 in each message. When they don't match, an error is raised::

-$ gnt-node modify -O yes node3
+$ gnt-node modify -O yes %node3%
 Unhandled Ganeti error: LUXI version mismatch, server 2020000, request 2030000

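The two numbers are packed Ganeti versions, assuming the usual
``major*1000000 + minor*10000 + revision`` encoding, so ``2020000`` is a
2.2 server and ``2030000`` a 2.3 client. An illustrative one-liner to
decode them (not a Ganeti tool)::

  $ python -c 'v = 2030000; print("%d.%d.%d" % (v // 1000000, v % 1000000 // 10000, v % 10000))'
  2.3.0
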
 Usually this means that server and client are from different Ganeti