Ganeti walk-through
===================

Documents Ganeti version |version|

.. contents::

.. highlight:: shell-example

Introduction
------------

This document serves as a more example-oriented guide to Ganeti; while
the administration guide shows a conceptual approach, here you will find
a step-by-step example of managing instances and the cluster.

Our simulated example cluster will have three machines, named
``node1``, ``node2``, ``node3``. Note that in real life machines will
usually have FQDNs but here we use short names for brevity. We will use
a secondary network for replication data, ``192.0.2.0/24``, with each
node's last octet matching its index. The cluster name will be
``example-cluster``. All nodes have the same simulated hardware
configuration: two disks of 750GB, 32GB of memory and 4 CPUs.

On this cluster, we will create up to seven instances, named
``instance1`` to ``instance7``.

Cluster creation
----------------

Follow the :doc:`install` document and prepare the nodes. Then it's time
to initialise the cluster::

  $ gnt-cluster init -s %192.0.2.1% --enabled-hypervisors=xen-pvm %example-cluster%
  $

The creation went fine. Let's check that the one node we have is
functioning correctly::

  $ gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  $ gnt-cluster verify
  Mon Oct 26 02:08:51 2009 * Verifying global settings
  Mon Oct 26 02:08:51 2009 * Gathering data (1 nodes)
  Mon Oct 26 02:08:52 2009 * Verifying node status
  Mon Oct 26 02:08:52 2009 * Verifying instance status
  Mon Oct 26 02:08:52 2009 * Verifying orphan volumes
  Mon Oct 26 02:08:52 2009 * Verifying remaining instances
  Mon Oct 26 02:08:52 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 02:08:52 2009 * Other Notes
  Mon Oct 26 02:08:52 2009 * Hooks Results
  $

Since this proceeded correctly, let's add the other two nodes::

  $ gnt-node add -s %192.0.2.2% %node2%
  -- WARNING --
  Performing this operation is going to replace the ssh daemon keypair
  on the target machine (node2) with the ones of the current one
  and grant full intra-cluster ssh root access to/from it

  Unable to verify hostkey of host xen-devi-5.fra.corp.google.com:
  f7:…. Do you want to accept it?
  y/[n]/?: %y%
  Mon Oct 26 02:11:53 2009 Authentication to node2 via public key failed, trying password
  root password:
  Mon Oct 26 02:11:54 2009 - INFO: Node will be a master candidate
  $ gnt-node add -s %192.0.2.3% %node3%
  -- WARNING --
  Performing this operation is going to replace the ssh daemon keypair
  on the target machine (node3) with the ones of the current one
  and grant full intra-cluster ssh root access to/from it

  …
  Mon Oct 26 02:12:43 2009 - INFO: Node will be a master candidate

Checking the cluster status again::

  $ gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  node2   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  node3   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  $ gnt-cluster verify
  Mon Oct 26 02:15:14 2009 * Verifying global settings
  Mon Oct 26 02:15:14 2009 * Gathering data (3 nodes)
  Mon Oct 26 02:15:16 2009 * Verifying node status
  Mon Oct 26 02:15:16 2009 * Verifying instance status
  Mon Oct 26 02:15:16 2009 * Verifying orphan volumes
  Mon Oct 26 02:15:16 2009 * Verifying remaining instances
  Mon Oct 26 02:15:16 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 02:15:16 2009 * Other Notes
  Mon Oct 26 02:15:16 2009 * Hooks Results
  $

And let's check that we have a valid OS::

  $ gnt-os list
  Name
  debootstrap
  $

Running a burn-in
-----------------

Now that the cluster is created, it is time to check that the hardware
works correctly, that the hypervisor can actually create instances,
etc. This is done via the burnin tool (here using the ``debootstrap``
OS), as described in the admin guide. Similar output lines are replaced
with ``…`` in the log below::

  $ /usr/lib/ganeti/tools/burnin -o debootstrap -p instance{1..5}
  - Testing global parameters
  - Creating instances
    * instance instance1
      on node1, node2
    * instance instance2
      on node2, node3
    …
    * instance instance5
      on node2, node3
    * Submitted job ID(s) 157, 158, 159, 160, 161
      waiting for job 157 for instance1
      …
      waiting for job 161 for instance5
  - Replacing disks on the same nodes
    * instance instance1
      run replace_on_secondary
      run replace_on_primary
    …
    * instance instance5
      run replace_on_secondary
      run replace_on_primary
    * Submitted job ID(s) 162, 163, 164, 165, 166
      waiting for job 162 for instance1
      …
  - Changing the secondary node
    * instance instance1
      run replace_new_secondary node3
    * instance instance2
      run replace_new_secondary node1
    …
    * instance instance5
      run replace_new_secondary node1
    * Submitted job ID(s) 167, 168, 169, 170, 171
      waiting for job 167 for instance1
      …
  - Growing disks
    * instance instance1
      increase disk/0 by 128 MB
    …
    * instance instance5
      increase disk/0 by 128 MB
    * Submitted job ID(s) 173, 174, 175, 176, 177
      waiting for job 173 for instance1
      …
  - Failing over instances
    * instance instance1
    …
    * instance instance5
    * Submitted job ID(s) 179, 180, 181, 182, 183
      waiting for job 179 for instance1
      …
  - Migrating instances
    * instance instance1
      migration and migration cleanup
    …
    * instance instance5
      migration and migration cleanup
    * Submitted job ID(s) 184, 185, 186, 187, 188
      waiting for job 184 for instance1
      …
  - Exporting and re-importing instances
    * instance instance1
      export to node node3
      remove instance
      import from node3 to node1, node2
      remove export
    …
    * instance instance5
      export to node node1
      remove instance
      import from node1 to node2, node3
      remove export
    * Submitted job ID(s) 196, 197, 198, 199, 200
      waiting for job 196 for instance1
      …
  - Reinstalling instances
    * instance instance1
      reinstall without passing the OS
      reinstall specifying the OS
    …
    * instance instance5
      reinstall without passing the OS
      reinstall specifying the OS
    * Submitted job ID(s) 203, 204, 205, 206, 207
      waiting for job 203 for instance1
      …
  - Rebooting instances
    * instance instance1
      reboot with type 'hard'
      reboot with type 'soft'
      reboot with type 'full'
    …
    * instance instance5
      reboot with type 'hard'
      reboot with type 'soft'
      reboot with type 'full'
    * Submitted job ID(s) 208, 209, 210, 211, 212
      waiting for job 208 for instance1
      …
  - Adding and removing disks
    * instance instance1
      adding a disk
      removing last disk
    …
    * instance instance5
      adding a disk
      removing last disk
    * Submitted job ID(s) 213, 214, 215, 216, 217
      waiting for job 213 for instance1
      …
  - Adding and removing NICs
    * instance instance1
      adding a NIC
      removing last NIC
    …
    * instance instance5
      adding a NIC
      removing last NIC
    * Submitted job ID(s) 218, 219, 220, 221, 222
      waiting for job 218 for instance1
      …
  - Activating/deactivating disks
    * instance instance1
      activate disks when online
      activate disks when offline
      deactivate disks (when offline)
    …
    * instance instance5
      activate disks when online
      activate disks when offline
      deactivate disks (when offline)
    * Submitted job ID(s) 223, 224, 225, 226, 227
      waiting for job 223 for instance1
      …
  - Stopping and starting instances
    * instance instance1
    …
    * instance instance5
    * Submitted job ID(s) 230, 231, 232, 233, 234
      waiting for job 230 for instance1
      …
  - Removing instances
    * instance instance1
    …
    * instance instance5
    * Submitted job ID(s) 235, 236, 237, 238, 239
      waiting for job 235 for instance1
      …
  $

You can see from the above what operations the burn-in performs.
Ideally, the burn-in log would proceed successfully through all the
steps and end cleanly, without throwing errors.

Instance operations
-------------------

Creation
++++++++

At this point, Ganeti and the hardware seem to be functioning
correctly, so we'll follow up with creating the instances manually::

  $ gnt-instance add -t drbd -o debootstrap -s %256m% %instance1%
  Mon Oct 26 04:06:52 2009 - INFO: Selected nodes for instance instance1 via iallocator hail: node2, node3
  Mon Oct 26 04:06:53 2009 * creating instance disks...
  Mon Oct 26 04:06:57 2009 adding instance instance1 to cluster config
  Mon Oct 26 04:06:57 2009 - INFO: Waiting for instance instance1 to sync disks.
  Mon Oct 26 04:06:57 2009 - INFO: - device disk/0: 20.00\% done, 4 estimated seconds remaining
  Mon Oct 26 04:07:01 2009 - INFO: Instance instance1's disks are in sync.
  Mon Oct 26 04:07:01 2009 creating os for instance instance1 on node node2
  Mon Oct 26 04:07:01 2009 * running the instance OS create scripts...
  Mon Oct 26 04:07:14 2009 * starting instance...
  $ gnt-instance add -t drbd -o debootstrap -s %256m% -n %node1%:%node2% %instance2%
  Mon Oct 26 04:11:37 2009 * creating instance disks...
  Mon Oct 26 04:11:40 2009 adding instance instance2 to cluster config
  Mon Oct 26 04:11:41 2009 - INFO: Waiting for instance instance2 to sync disks.
  Mon Oct 26 04:11:41 2009 - INFO: - device disk/0: 35.40\% done, 1 estimated seconds remaining
  Mon Oct 26 04:11:42 2009 - INFO: - device disk/0: 58.50\% done, 1 estimated seconds remaining
  Mon Oct 26 04:11:43 2009 - INFO: - device disk/0: 86.20\% done, 0 estimated seconds remaining
  Mon Oct 26 04:11:44 2009 - INFO: - device disk/0: 92.40\% done, 0 estimated seconds remaining
  Mon Oct 26 04:11:44 2009 - INFO: - device disk/0: 97.00\% done, 0 estimated seconds remaining
  Mon Oct 26 04:11:44 2009 - INFO: Instance instance2's disks are in sync.
  Mon Oct 26 04:11:44 2009 creating os for instance instance2 on node node1
  Mon Oct 26 04:11:44 2009 * running the instance OS create scripts...
  Mon Oct 26 04:11:57 2009 * starting instance...
  $

The above shows one instance created via an iallocator script, and one
being created with manual node assignment. The other three instances
were also created and now it's time to check them::

  $ gnt-instance list
  Instance  Hypervisor OS          Primary_node Status  Memory
  instance1 xen-pvm    debootstrap node2        running   128M
  instance2 xen-pvm    debootstrap node1        running   128M
  instance3 xen-pvm    debootstrap node1        running   128M
  instance4 xen-pvm    debootstrap node3        running   128M
  instance5 xen-pvm    debootstrap node2        running   128M
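
The three remaining instances were created the same way; a minimal
sketch, relying on the default iallocator for placement as in the first
example above::

  $ for i in 3 4 5; do gnt-instance add -t drbd -o debootstrap -s %256m% instance$i; done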

Accessing instances
+++++++++++++++++++

Accessing an instance's console is easy::

  $ gnt-instance console %instance2%
  [    0.000000] Bootdata ok (command line is root=/dev/sda1 ro)
  [    0.000000] Linux version 2.6…
  [    0.000000] BIOS-provided physical RAM map:
  [    0.000000] Xen: 0000000000000000 - 0000000008800000 (usable)
  [13138176.018071] Built 1 zonelists.  Total pages: 34816
  [13138176.018074] Kernel command line: root=/dev/sda1 ro
  [13138176.018694] Initializing CPU#0
  …
  Checking file systems...fsck 1.41.3 (12-Oct-2008)
  done.
  Setting kernel variables (/etc/sysctl.conf)...done.
  Mounting local filesystems...done.
  Activating swapfile swap...done.
  Setting up networking....
  Configuring network interfaces...done.
  Setting console screen modes and fonts.
  INIT: Entering runlevel: 2
  Starting enhanced syslogd: rsyslogd.
  Starting periodic command scheduler: crond.

  Debian GNU/Linux 5.0 instance2 tty1

  instance2 login:

At this moment you can log in to the instance and, after configuring the
network (and doing this on all instances), check their connectivity::

  $ fping %instance{1..5}%
  instance1 is alive
  instance2 is alive
  instance3 is alive
  instance4 is alive
  instance5 is alive
  $

Removal
+++++++

Removing unwanted instances is also easy::

  $ gnt-instance remove %instance5%
  This will remove the volumes of the instance instance5 (including
  mirrors), thus removing all the data of the instance. Continue?
  y/[n]/?: %y%
  $

Recovering from hardware failures
---------------------------------

Recovering from node failure
++++++++++++++++++++++++++++

We are now left with four instances. Assume that at this point, node3,
which has one primary and one secondary instance, crashes::

  $ gnt-node info %node3%
  Node name: node3
    primary ip: 198.51.100.1
    secondary ip: 192.0.2.3
    master candidate: True
    drained: False
    offline: False
    primary for instances:
      - instance4
    secondary for instances:
      - instance1
  $ fping %node3%
  node3 is unreachable

At this point, the primary instance of that node (instance4) is down,
but the secondary instance (instance1) is not affected except that it
has lost disk redundancy::

  $ fping %instance{1,4}%
  instance1 is alive
  instance4 is unreachable
  $

If we try to check the status of instance4 via the instance info
command, it fails because it tries to contact node3, which is down::

  $ gnt-instance info %instance4%
  Failure: command execution error:
  Error checking node node3: Connection failed (113: No route to host)
  $

So we need to mark node3 as being *offline*, and thus Ganeti won't talk
to it anymore::

  $ gnt-node modify -O yes -f %node3%
  Mon Oct 26 04:34:12 2009 - WARNING: Not enough master candidates (desired 10, new value will be 2)
  Mon Oct 26 04:34:15 2009 - WARNING: Communication failure to node node3: Connection failed (113: No route to host)
  Modified node node3
   - offline -> True
   - master_candidate -> auto-demotion due to offline
  $

And now we can fail over the instance::

  $ gnt-instance failover %instance4%
  Failover will happen to image instance4. This requires a shutdown of
  the instance. Continue?
  y/[n]/?: %y%
  Mon Oct 26 04:35:34 2009 * checking disk consistency between source and target
  Failure: command execution error:
  Disk disk/0 is degraded on target node, aborting failover.
  $ gnt-instance failover --ignore-consistency %instance4%
  Failover will happen to image instance4. This requires a shutdown of
  the instance. Continue?
  y/[n]/?: %y%
  Mon Oct 26 04:35:47 2009 * checking disk consistency between source and target
  Mon Oct 26 04:35:47 2009 * shutting down instance on source node
  Mon Oct 26 04:35:47 2009 - WARNING: Could not shutdown instance instance4 on node node3. Proceeding anyway. Please make sure node node3 is down. Error details: Node is marked offline
  Mon Oct 26 04:35:47 2009 * deactivating the instance's disks on source node
  Mon Oct 26 04:35:47 2009 - WARNING: Could not shutdown block device disk/0 on node node3: Node is marked offline
  Mon Oct 26 04:35:47 2009 * activating the instance's disks on target node
  Mon Oct 26 04:35:47 2009 - WARNING: Could not prepare block device disk/0 on node node3 (is_primary=False, pass=1): Node is marked offline
  Mon Oct 26 04:35:48 2009 * starting the instance on the target node
  $

Note that in our first attempt, Ganeti refused to perform the failover
since it wasn't sure about the status of the instance's disks. Passing
the ``--ignore-consistency`` flag lets the failover proceed::

  $ gnt-instance list
  Instance  Hypervisor OS          Primary_node Status  Memory
  instance1 xen-pvm    debootstrap node2        running   128M
  instance2 xen-pvm    debootstrap node1        running   128M
  instance3 xen-pvm    debootstrap node1        running   128M
  instance4 xen-pvm    debootstrap node1        running   128M
  $

But at this point, both instance1 and instance4 are without disk
redundancy::

  $ gnt-instance info %instance1%
  Instance name: instance1
  UUID: 45173e82-d1fa-417c-8758-7d582ab7eef4
  Serial number: 2
  Creation time: 2009-10-26 04:06:57
  Modification time: 2009-10-26 04:07:14
  State: configured to be up, actual state is up
  Nodes:
    - primary: node2
    - secondaries: node3
  Operating system: debootstrap
  Allocated network port: None
  Hypervisor: xen-pvm
    - root_path: default (/dev/sda1)
    - kernel_args: default (ro)
    - use_bootloader: default (False)
    - bootloader_args: default ()
    - bootloader_path: default ()
    - kernel_path: default (/boot/vmlinuz-2.6-xenU)
    - initrd_path: default ()
  Hardware:
    - VCPUs: 1
    - maxmem: 512MiB
    - minmem: 256MiB
    - NICs:
      - nic/0: MAC: aa:00:00:78:da:63, IP: None, mode: bridged, link: xen-br0
  Disks:
    - disk/0: drbd8, size 256M
      access mode: rw
      nodeA: node2, minor=0
      nodeB: node3, minor=0
      port: 11035
      auth key: 8e950e3cec6854b0181fbc3a6058657701f2d458
      on primary: /dev/drbd0 (147:0) in sync, status *DEGRADED*
      child devices:
        - child 0: lvm, size 256M
          logical_id: xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_data
          on primary: /dev/xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_data (254:0)
        - child 1: lvm, size 128M
          logical_id: xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta
          on primary: /dev/xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta (254:1)

The output is similar for instance4. In order to recover from this, we
need to run the node evacuate command, which will change the current
secondary node to a new one (in this case, we only have two working
nodes, so all instances will end up on nodes one and two)::

  $ gnt-node evacuate -I hail %node3%
  Relocate instance(s) 'instance1','instance4' from node
  node3 using iallocator hail?
  y/[n]/?: %y%
  Mon Oct 26 05:05:39 2009 - INFO: Selected new secondary for instance 'instance1': node1
  Mon Oct 26 05:05:40 2009 - INFO: Selected new secondary for instance 'instance4': node2
  Mon Oct 26 05:05:40 2009 Replacing disk(s) 0 for instance1
  Mon Oct 26 05:05:40 2009 STEP 1/6 Check device existence
  Mon Oct 26 05:05:40 2009 - INFO: Checking disk/0 on node2
  Mon Oct 26 05:05:40 2009 - INFO: Checking volume groups
  Mon Oct 26 05:05:40 2009 STEP 2/6 Check peer consistency
  Mon Oct 26 05:05:40 2009 - INFO: Checking disk/0 consistency on node node2
  Mon Oct 26 05:05:40 2009 STEP 3/6 Allocate new storage
  Mon Oct 26 05:05:40 2009 - INFO: Adding new local storage on node1 for disk/0
  Mon Oct 26 05:05:41 2009 STEP 4/6 Changing drbd configuration
  Mon Oct 26 05:05:41 2009 - INFO: activating a new drbd on node1 for disk/0
  Mon Oct 26 05:05:42 2009 - INFO: Shutting down drbd for disk/0 on old node
  Mon Oct 26 05:05:42 2009 - WARNING: Failed to shutdown drbd for disk/0 on oldnode: Node is marked offline
  Mon Oct 26 05:05:42 2009 Hint: Please cleanup this device manually as soon as possible
  Mon Oct 26 05:05:42 2009 - INFO: Detaching primary drbds from the network (=> standalone)
  Mon Oct 26 05:05:42 2009 - INFO: Updating instance configuration
  Mon Oct 26 05:05:45 2009 - INFO: Attaching primary drbds to new secondary (standalone => connected)
  Mon Oct 26 05:05:46 2009 STEP 5/6 Sync devices
  Mon Oct 26 05:05:46 2009 - INFO: Waiting for instance instance1 to sync disks.
  Mon Oct 26 05:05:46 2009 - INFO: - device disk/0: 13.90\% done, 7 estimated seconds remaining
  Mon Oct 26 05:05:53 2009 - INFO: Instance instance1's disks are in sync.
  Mon Oct 26 05:05:53 2009 STEP 6/6 Removing old storage
  Mon Oct 26 05:05:53 2009 - INFO: Remove logical volumes for 0
  Mon Oct 26 05:05:53 2009 - WARNING: Can't remove old LV: Node is marked offline
  Mon Oct 26 05:05:53 2009 Hint: remove unused LVs manually
  Mon Oct 26 05:05:53 2009 - WARNING: Can't remove old LV: Node is marked offline
  Mon Oct 26 05:05:53 2009 Hint: remove unused LVs manually
  Mon Oct 26 05:05:53 2009 Replacing disk(s) 0 for instance4
  Mon Oct 26 05:05:53 2009 STEP 1/6 Check device existence
  Mon Oct 26 05:05:53 2009 - INFO: Checking disk/0 on node1
  Mon Oct 26 05:05:53 2009 - INFO: Checking volume groups
  Mon Oct 26 05:05:53 2009 STEP 2/6 Check peer consistency
  Mon Oct 26 05:05:53 2009 - INFO: Checking disk/0 consistency on node node1
  Mon Oct 26 05:05:54 2009 STEP 3/6 Allocate new storage
  Mon Oct 26 05:05:54 2009 - INFO: Adding new local storage on node2 for disk/0
  Mon Oct 26 05:05:54 2009 STEP 4/6 Changing drbd configuration
  Mon Oct 26 05:05:54 2009 - INFO: activating a new drbd on node2 for disk/0
  Mon Oct 26 05:05:55 2009 - INFO: Shutting down drbd for disk/0 on old node
  Mon Oct 26 05:05:55 2009 - WARNING: Failed to shutdown drbd for disk/0 on oldnode: Node is marked offline
  Mon Oct 26 05:05:55 2009 Hint: Please cleanup this device manually as soon as possible
  Mon Oct 26 05:05:55 2009 - INFO: Detaching primary drbds from the network (=> standalone)
  Mon Oct 26 05:05:55 2009 - INFO: Updating instance configuration
  Mon Oct 26 05:05:55 2009 - INFO: Attaching primary drbds to new secondary (standalone => connected)
  Mon Oct 26 05:05:56 2009 STEP 5/6 Sync devices
  Mon Oct 26 05:05:56 2009 - INFO: Waiting for instance instance4 to sync disks.
  Mon Oct 26 05:05:56 2009 - INFO: - device disk/0: 12.40\% done, 8 estimated seconds remaining
  Mon Oct 26 05:06:04 2009 - INFO: Instance instance4's disks are in sync.
  Mon Oct 26 05:06:04 2009 STEP 6/6 Removing old storage
  Mon Oct 26 05:06:04 2009 - INFO: Remove logical volumes for 0
  Mon Oct 26 05:06:04 2009 - WARNING: Can't remove old LV: Node is marked offline
  Mon Oct 26 05:06:04 2009 Hint: remove unused LVs manually
  Mon Oct 26 05:06:04 2009 - WARNING: Can't remove old LV: Node is marked offline
  Mon Oct 26 05:06:04 2009 Hint: remove unused LVs manually
  $

And now node3 is completely free of instances and can be repaired::

  $ gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G 30.2G     3     1
  node2   1.3T  1.3T  32.0G  1.0G 30.4G     1     3
  node3      ?     ?      ?     ?     ?     0     0

Re-adding a node to the cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's say node3 has been repaired and is now ready to be
reused. Re-adding it is simple::

  $ gnt-node add --readd %node3%
  The authenticity of host 'node3 (198.51.100.1)' can't be established.
  RSA key fingerprint is 9f:2e:5a:2e:e0:bd:00:09:e4:5c:32:f2:27:57:7a:f4.
  Are you sure you want to continue connecting (yes/no)? yes
  Mon Oct 26 05:27:39 2009 - INFO: Readding a node, the offline/drained flags were reset
  Mon Oct 26 05:27:39 2009 - INFO: Node will be a master candidate

And it is now working again::

  $ gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G 30.2G     3     1
  node2   1.3T  1.3T  32.0G  1.0G 30.4G     1     3
  node3   1.3T  1.3T  32.0G  1.0G 30.4G     0     0

.. note:: If Ganeti has been built with the htools component enabled,
   you can shuffle the instances around to make better use of the
   nodes.

Disk failures
+++++++++++++

A disk failure is simpler than a full node failure. First, a single disk
failure should not cause data loss for any redundant instance; only the
performance of some instances might be reduced due to more network
traffic.

Let's take the cluster status from the above listing and check which
volumes are in use::

  $ gnt-node volumes -o phys,instance %node2%
  PhysDev   Instance
  /dev/sdb1 instance4
  /dev/sdb1 instance4
  /dev/sdb1 instance1
  /dev/sdb1 instance1
  /dev/sdb1 instance3
  /dev/sdb1 instance3
  /dev/sdb1 instance2
  /dev/sdb1 instance2
  $

You can see that all instances on node2 have logical volumes on
``/dev/sdb1``. Let's simulate a disk failure on that disk::

  $ ssh node2
  # on node2
  $ echo offline > /sys/block/sdb/device/state
  $ vgs
  /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdb1: read failed after 0 of 4096 at 750153695232: Input/output error
  /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
  Couldn't find device with uuid '954bJA-mNL0-7ydj-sdpW-nc2C-ZrCi-zFp91c'.
  Couldn't find all physical volumes for volume group xenvg.
  /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
  Couldn't find device with uuid '954bJA-mNL0-7ydj-sdpW-nc2C-ZrCi-zFp91c'.
  Couldn't find all physical volumes for volume group xenvg.
  Volume group xenvg not found
  $

At this point, the node is broken; if we examine instance2 we get
(simplified output shown)::

  $ gnt-instance info %instance2%
  Instance name: instance2
  State: configured to be up, actual state is up
  Nodes:
    - primary: node1
    - secondaries: node2
  Disks:
    - disk/0: drbd8, size 256M
      on primary:   /dev/drbd0 (147:0) in sync, status ok
      on secondary: /dev/drbd1 (147:1) in sync, status *DEGRADED* *MISSING DISK*

This instance only uses node2 as a secondary. Let's also check a
primary instance of node2::

  $ gnt-instance info %instance1%
  Instance name: instance1
  State: configured to be up, actual state is up
  Nodes:
    - primary: node2
    - secondaries: node1
  Disks:
    - disk/0: drbd8, size 256M
      on primary:   /dev/drbd0 (147:0) in sync, status *DEGRADED* *MISSING DISK*
      on secondary: /dev/drbd3 (147:3) in sync, status ok
  $ gnt-instance console %instance1%

  Debian GNU/Linux 5.0 instance1 tty1

  instance1 login: root
  Last login: Tue Oct 27 01:24:09 UTC 2009 on tty1
  instance1:~# date > test
  instance1:~# sync
  instance1:~# cat test
  Tue Oct 27 01:25:20 UTC 2009
  instance1:~# dmesg|tail
  [5439785.235448] NET: Registered protocol family 15
  [5439785.235489] 802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
  [5439785.235495] All bugs added by David S. Miller <davem@redhat.com>
  [5439785.235517] XENBUS: Device with no driver: device/console/0
  [5439785.236576] kjournald starting.  Commit interval 5 seconds
  [5439785.236588] EXT3-fs: mounted filesystem with ordered data mode.
  [5439785.236625] VFS: Mounted root (ext3 filesystem) readonly.
  [5439785.236663] Freeing unused kernel memory: 172k freed
  [5439787.533779] EXT3 FS on sda1, internal journal
  [5440655.065431] eth0: no IPv6 routers present
  instance1:~#

As you can see, the instance is running fine and doesn't see any disk
issues. It is now time to fix node2 and re-establish redundancy for the
involved instances.

.. note:: For Ganeti 2.0 we need to manually fix the volume group on
   node2 by running ``vgreduce --removemissing xenvg``

::

  $ gnt-node repair-storage %node2% lvm-vg %xenvg%
  Mon Oct 26 18:14:03 2009 Repairing storage unit 'xenvg' on node2 ...
  $ ssh %node2% vgs
  VG    #PV #LV #SN Attr   VSize   VFree
  xenvg   1   8   0 wz--n- 673.84G 673.84G
  $

This has removed the 'bad' disk from the volume group, which is now left
with only one PV. We can now replace the disks for the involved
instances::

  $ for i in %instance{1..4}%; do gnt-instance replace-disks -a $i; done
  Mon Oct 26 18:15:38 2009 Replacing disk(s) 0 for instance1
  Mon Oct 26 18:15:38 2009 STEP 1/6 Check device existence
  Mon Oct 26 18:15:38 2009 - INFO: Checking disk/0 on node1
  Mon Oct 26 18:15:38 2009 - INFO: Checking disk/0 on node2
  Mon Oct 26 18:15:38 2009 - INFO: Checking volume groups
  Mon Oct 26 18:15:38 2009 STEP 2/6 Check peer consistency
  Mon Oct 26 18:15:38 2009 - INFO: Checking disk/0 consistency on node node1
  Mon Oct 26 18:15:39 2009 STEP 3/6 Allocate new storage
  Mon Oct 26 18:15:39 2009 - INFO: Adding storage on node2 for disk/0
  Mon Oct 26 18:15:39 2009 STEP 4/6 Changing drbd configuration
  Mon Oct 26 18:15:39 2009 - INFO: Detaching disk/0 drbd from local storage
  Mon Oct 26 18:15:40 2009 - INFO: Renaming the old LVs on the target node
  Mon Oct 26 18:15:40 2009 - INFO: Renaming the new LVs on the target node
  Mon Oct 26 18:15:40 2009 - INFO: Adding new mirror component on node2
  Mon Oct 26 18:15:41 2009 STEP 5/6 Sync devices
  Mon Oct 26 18:15:41 2009 - INFO: Waiting for instance instance1 to sync disks.
  Mon Oct 26 18:15:41 2009 - INFO: - device disk/0: 12.40\% done, 9 estimated seconds remaining
  Mon Oct 26 18:15:50 2009 - INFO: Instance instance1's disks are in sync.
  Mon Oct 26 18:15:50 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:15:50 2009 - INFO: Remove logical volumes for disk/0
  Mon Oct 26 18:15:52 2009 Replacing disk(s) 0 for instance2
  Mon Oct 26 18:15:52 2009 STEP 1/6 Check device existence
  …
  Mon Oct 26 18:16:01 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:16:01 2009 - INFO: Remove logical volumes for disk/0
  Mon Oct 26 18:16:02 2009 Replacing disk(s) 0 for instance3
  Mon Oct 26 18:16:02 2009 STEP 1/6 Check device existence
  …
  Mon Oct 26 18:16:09 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:16:09 2009 - INFO: Remove logical volumes for disk/0
  Mon Oct 26 18:16:10 2009 Replacing disk(s) 0 for instance4
  Mon Oct 26 18:16:10 2009 STEP 1/6 Check device existence
  …
  Mon Oct 26 18:16:18 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:16:18 2009 - INFO: Remove logical volumes for disk/0
  $

At this point, all instances should be healthy again.

.. note:: Ganeti 2.0 doesn't have the ``-a`` option to replace-disks, so
   with it you have to run the loop twice, once over primary instances
   with argument ``-p`` and once over secondary instances with argument
   ``-s``, but otherwise the operations are similar::

     $ gnt-instance replace-disks -p instance1
     …
     $ for i in %instance{2..4}%; do gnt-instance replace-disks -s $i; done

Common cluster problems
-----------------------

There are a number of small issues that might appear on a cluster, and
they can be solved easily as long as the issue is properly identified.
For this exercise we will consider the case of node3, which was broken
previously and re-added to the cluster without reinstallation. Running
cluster verify on the cluster reports::

  $ gnt-cluster verify
  Mon Oct 26 18:30:08 2009 * Verifying global settings
  Mon Oct 26 18:30:08 2009 * Gathering data (3 nodes)
  Mon Oct 26 18:30:10 2009 * Verifying node status
  Mon Oct 26 18:30:10 2009 - ERROR: node node3: unallocated drbd minor 0 is in use
  Mon Oct 26 18:30:10 2009 - ERROR: node node3: unallocated drbd minor 1 is in use
  Mon Oct 26 18:30:10 2009 * Verifying instance status
  Mon Oct 26 18:30:10 2009 - ERROR: instance instance4: instance should not run on node node3
  Mon Oct 26 18:30:10 2009 * Verifying orphan volumes
  Mon Oct 26 18:30:10 2009 - ERROR: node node3: volume 22459cf8-117d-4bea-a1aa-791667d07800.disk0_data is unknown
  Mon Oct 26 18:30:10 2009 - ERROR: node node3: volume 1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data is unknown
  Mon Oct 26 18:30:10 2009 - ERROR: node node3: volume 1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta is unknown
  Mon Oct 26 18:30:10 2009 - ERROR: node node3: volume 22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta is unknown
  Mon Oct 26 18:30:10 2009 * Verifying remaining instances
  Mon Oct 26 18:30:10 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 18:30:10 2009 * Other Notes
  Mon Oct 26 18:30:10 2009 * Hooks Results
  $

Instance status
+++++++++++++++

As you can see, *instance4* has a copy running on node3, because we
forced the failover when node3 failed. This case is dangerous as the
instance will have the same IP and MAC address, wreaking havoc on the
network environment and on anyone who tries to use it.

Ganeti doesn't directly handle this case. It is recommended to log on to
node3 and run::

  $ xm destroy %instance4%
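
If in doubt about which stale domains exist, ``xm list`` on the node
shows what is actually running; this is standard Xen tooling, not a
Ganeti command::

  # on node3
  $ xm list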

Unallocated DRBD minors
+++++++++++++++++++++++

There are still unallocated DRBD minors on node3. Again, these are not
handled by Ganeti directly and need to be cleaned up via DRBD commands::

  $ ssh %node3%
  # on node3
  $ drbdsetup /dev/drbd%0% down
  $ drbdsetup /dev/drbd%1% down
  $
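
If you are unsure which minors are stale, DRBD's status file lists all
devices and their connection state; this is a standard DRBD interface,
not Ganeti-specific::

  # still on node3
  $ cat /proc/drbd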

Orphan volumes
++++++++++++++

At this point, the only remaining problem should be the so-called
*orphan* volumes. This can also happen in the case of an aborted
disk-replace, or a similar situation where Ganeti was not able to
recover automatically. Here you need to remove them manually via LVM
commands::

  $ ssh %node3%
  # on node3
  $ lvremove %xenvg%
  Do you really want to remove active logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_data"? [y/n]: %y%
  Logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_data" successfully removed
  Do you really want to remove active logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta"? [y/n]: %y%
  Logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta" successfully removed
  Do you really want to remove active logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data"? [y/n]: %y%
  Logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data" successfully removed
  Do you really want to remove active logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta"? [y/n]: %y%
  Logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta" successfully removed
  $
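
Note that a bare ``lvremove %xenvg%`` iterates over every volume in the
group, which is fine here because node3 holds nothing else. On a node
that also hosts healthy volumes, pass the exact names reported by
cluster verify instead, e.g.::

  $ lvremove %xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_data%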

At this point cluster verify shouldn't complain anymore::

  $ gnt-cluster verify
  Mon Oct 26 18:37:51 2009 * Verifying global settings
  Mon Oct 26 18:37:51 2009 * Gathering data (3 nodes)
  Mon Oct 26 18:37:53 2009 * Verifying node status
  Mon Oct 26 18:37:53 2009 * Verifying instance status
  Mon Oct 26 18:37:53 2009 * Verifying orphan volumes
  Mon Oct 26 18:37:53 2009 * Verifying remaining instances
  Mon Oct 26 18:37:53 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 18:37:53 2009 * Other Notes
  Mon Oct 26 18:37:53 2009 * Hooks Results
  $

N+1 errors
++++++++++

Since redundant instances in Ganeti have a primary/secondary model, each
node needs to set aside enough memory so that if one of its peer nodes
fails, all the instances that have the failed node as primary and this
node as secondary can be failed over to it. More specifically, if
instance2 has node1 as primary and node2 as secondary (and node1 and
node2 do not have any other instances in this layout), then node2 must
have enough free memory so that if node1 fails, we can fail over
instance2 without any other operations (to reduce the downtime window).
Let's increase the memory of the current instances to 4G, and add three
new instances, two on node2:node3 with 8GB of RAM and one on
node1:node2, with 12GB of RAM (numbers chosen so that we run out of
memory)::

  $ gnt-instance modify -B memory=%4G% %instance1%
  Modified instance instance1
   - be/maxmem -> 4096
   - be/minmem -> 4096
  Please don't forget that these parameters take effect only at the next start of the instance.
  $ gnt-instance modify …

  $ gnt-instance add -t drbd -n %node2%:%node3% -s %512m% -B memory=%8G% -o %debootstrap% %instance5%
  …
  $ gnt-instance add -t drbd -n %node2%:%node3% -s %512m% -B memory=%8G% -o %debootstrap% %instance6%
  …
  $ gnt-instance add -t drbd -n %node1%:%node2% -s %512m% -B memory=%12G% -o %debootstrap% %instance7%
  $ gnt-instance reboot --all
  The reboot will operate on 7 instances.
  Do you want to continue?
  Affected instances:
    instance1
    instance2
    instance3
    instance4
    instance5
    instance6
    instance7
  y/[n]/?: %y%
  Submitted jobs 677, 678, 679, 680, 681, 682, 683
  Waiting for job 677 for instance1...
  Waiting for job 678 for instance2...
  Waiting for job 679 for instance3...
  Waiting for job 680 for instance4...
  Waiting for job 681 for instance5...
  Waiting for job 682 for instance6...
  Waiting for job 683 for instance7...
  $

We rebooted the instances for the memory changes to take effect. Now the
cluster looks like::

  $ gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G  6.5G     4     1
  node2   1.3T  1.3T  32.0G  1.0G 10.5G     3     4
  node3   1.3T  1.3T  32.0G  1.0G 30.5G     0     2
  $ gnt-cluster verify
  Mon Oct 26 18:59:36 2009 * Verifying global settings
  Mon Oct 26 18:59:36 2009 * Gathering data (3 nodes)
  Mon Oct 26 18:59:37 2009 * Verifying node status
  Mon Oct 26 18:59:37 2009 * Verifying instance status
  Mon Oct 26 18:59:37 2009 * Verifying orphan volumes
  Mon Oct 26 18:59:37 2009 * Verifying remaining instances
  Mon Oct 26 18:59:37 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 18:59:37 2009 - ERROR: node node2: not enough memory to accommodate instance failovers should node node1 fail
  Mon Oct 26 18:59:37 2009 * Other Notes
  Mon Oct 26 18:59:37 2009 * Hooks Results
  $
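
The arithmetic behind this error can be checked by hand. Of node1's
four primaries, at least instance2, instance4 and instance7 have node2
as their secondary (instance3's secondary placement is not shown above),
so a failure of node1 would require on node2::

  instance2:  4G
  instance4:  4G
  instance7: 12G
  total:     20G needed, while node2 has only 10.5G free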

The cluster verify error above shows that if node1 fails, node2 will not
have enough memory to fail over all the primary instances from node1 to
it. To solve this, you have a number of options:

- try to manually move instances around (but this can become complicated
  for any non-trivial cluster)
- try to reduce the minimum memory of some instances on the source node
  of the N+1 failure (in the example above ``node1``): this will allow
  them to start and be failed over/migrated with less than their maximum
  memory
- try to reduce the runtime/maximum memory of some instances on the
  destination node of the N+1 failure (in the example above ``node2``)
  to create additional available node memory (check the :doc:`admin`
  guide for what Ganeti will and won't automatically do in regards to
  instance runtime memory modification)
- if Ganeti has been built with the htools package enabled, you can run
  the ``hbal`` tool, which will try to compute an automated cluster
  solution that complies with the N+1 rule (a sketch follows this list)
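
As a minimal sketch of that last option: ``hbal -L`` reads the cluster
state from the locally running master via LUXI and prints the moves it
proposes, and adding ``-X`` also executes them::

  $ hbal -L -X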

Network issues
++++++++++++++

In case a node has problems with the network (usually the secondary
network, as problems with the primary network will render the node
unusable for Ganeti commands), it will show up in cluster verify as::

  $ gnt-cluster verify
  Mon Oct 26 19:07:19 2009 * Verifying global settings
  Mon Oct 26 19:07:19 2009 * Gathering data (3 nodes)
  Mon Oct 26 19:07:23 2009 * Verifying node status
  Mon Oct 26 19:07:23 2009 - ERROR: node node1: tcp communication with node 'node3': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009 - ERROR: node node2: tcp communication with node 'node3': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009 - ERROR: node node3: tcp communication with node 'node1': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009 - ERROR: node node3: tcp communication with node 'node2': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009 - ERROR: node node3: tcp communication with node 'node3': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009 * Verifying instance status
  Mon Oct 26 19:07:23 2009 * Verifying orphan volumes
  Mon Oct 26 19:07:23 2009 * Verifying remaining instances
  Mon Oct 26 19:07:23 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 19:07:23 2009 * Other Notes
  Mon Oct 26 19:07:23 2009 * Hooks Results
  $

This shows that both node1 and node2 have problems contacting node3 over
the secondary network, and node3 has problems contacting them. From this
output it can be deduced that since node1 and node2 can communicate
between themselves, node3 is the one having problems, and you need to
investigate its network settings/connection.
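
A quick way to confirm this from one of the healthy nodes is to ping the
secondary addresses directly, using the ``192.0.2.0/24`` addressing from
our example (the output shown is what you would expect in this
scenario)::

  $ fping %192.0.2.{1..3}%
  192.0.2.1 is alive
  192.0.2.2 is alive
  192.0.2.3 is unreachable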

Migration problems
++++++++++++++++++

Since live migration can sometimes fail and leave the instance in an
inconsistent state, Ganeti provides a ``--cleanup`` argument to the
migrate command that does:

- check on which node the instance is actually running (has the
  command failed before or after the actual migration?)
- reconfigure the DRBD disks accordingly

It is always safe to run this command as long as the instance has good
data on its primary node (i.e. not showing as degraded). If so, you can
simply run::

  $ gnt-instance migrate --cleanup %instance1%
  Instance instance1 will be recovered from a failed migration. Note
  that the migration procedure (including cleanup) is **experimental**
  in this version. This might impact the instance if anything goes
  wrong. Continue?
  y/[n]/?: %y%
  Mon Oct 26 19:13:49 2009 Migrating instance instance1
  Mon Oct 26 19:13:49 2009 * checking where the instance actually runs (if this hangs, the hypervisor might be in a bad state)
  Mon Oct 26 19:13:49 2009 * instance confirmed to be running on its primary node (node2)
  Mon Oct 26 19:13:49 2009 * switching node node1 to secondary mode
  Mon Oct 26 19:13:50 2009 * wait until resync is done
  Mon Oct 26 19:13:50 2009 * changing into standalone mode
  Mon Oct 26 19:13:50 2009 * changing disks into single-master mode
  Mon Oct 26 19:13:50 2009 * wait until resync is done
  Mon Oct 26 19:13:51 2009 * done
  $

In use disks at instance shutdown
+++++++++++++++++++++++++++++++++

If you see something like the following when trying to shut down or
deactivate disks for an instance::

  $ gnt-instance shutdown %instance1%
  Mon Oct 26 19:16:23 2009 - WARNING: Could not shutdown block device disk/0 on node node2: drbd0: can't shutdown drbd device: /dev/drbd0: State change failed: (-12) Device is held open by someone\n

It most likely means something is holding open the underlying DRBD
device. This can be bad if the instance is not running, as it might mean
that there was concurrent access from both the node and the instance to
the disks, but not always (e.g. you could only have had the partitions
activated via ``kpartx``).

To troubleshoot this issue you need to follow standard Linux practices,
and pay attention to the hypervisor being used:

- check if (in the above example) ``/dev/drbd0`` on node2 is being
  mounted somewhere (``cat /proc/mounts``)
- check if the device is not being used by device mapper itself:
  ``dmsetup ls`` and look for entries of the form ``drbd0pX``, and if so
  remove them with either ``kpartx -d`` or ``dmsetup remove`` (a sketch
  of both checks follows this list)
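
A minimal sketch of those two checks, using the device from the example
above (the ``dmsetup`` output line is illustrative)::

  # on node2
  $ cat /proc/mounts | grep drbd0
  $ dmsetup ls | grep drbd0
  drbd0p1 (254:3)
  $ kpartx -d /dev/drbd0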

For Xen, check if it's not using the disks itself::

  $ xenstore-ls /local/domain/%0%/backend/vbd|grep -e "domain =" -e physical-device
  domain = "instance2"
  physical-device = "93:0"
  domain = "instance3"
  physical-device = "93:1"
  domain = "instance4"
  physical-device = "93:2"
  $

You can see in the above output that the node exports three disks, to
three instances. The ``physical-device`` key is in major:minor format in
hexadecimal, and ``0x93`` represents DRBD's major number. Thus we can
see from the above that instance2 has /dev/drbd0, instance3 /dev/drbd1,
and instance4 /dev/drbd2.
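
To do the conversion by hand: ``0x93`` is 147 in decimal, which matches
the DRBD major number seen earlier in the device listings (e.g.
``/dev/drbd0 (147:0)``). The shell can compute this directly::

  $ echo $((0x93))
  147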

LUXI version mismatch
+++++++++++++++++++++

LUXI is the protocol used for communication between clients and the
master daemon. Starting in Ganeti 2.3, the peers exchange their version
in each message. When they don't match, an error is raised::

  $ gnt-node modify -O yes %node3%
  Unhandled Ganeti error: LUXI version mismatch, server 2020000, request 2030000

Usually this means that server and client are from different Ganeti
versions, or import their libraries from different paths (e.g. an older
version installed in another place). You can print the import path for
Ganeti's modules using the following command (note that depending on
your setup you might have to use an explicit version in the Python
command, e.g. ``python2.6``)::

  python -c 'import ganeti; print ganeti.__file__'
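
Running the same one-liner on the master node and comparing the two
paths usually pinpoints the stale copy; a sketch, with the node name
taken from our example::

  $ python -c 'import ganeti; print ganeti.__file__'
  $ ssh %node1% "python -c 'import ganeti; print ganeti.__file__'"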

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: