Ganeti walk-through
===================

Documents Ganeti version |version|

.. contents::

.. highlight:: text

Introduction
------------

This document serves as a more example-oriented guide to Ganeti; while
the administration guide takes a conceptual approach, here you will find
a step-by-step example of managing instances and the cluster.

Our simulated example cluster will have three machines, named
``node1``, ``node2`` and ``node3``. Note that in real life machines will
usually have FQDNs, but here we use short names for brevity. We will use
a secondary network for replication data, ``192.0.2.0/24``, with each
node having the last octet of its address equal to its index. The
cluster name will be ``example-cluster``. All nodes have the same
simulated hardware configuration: two 750GB disks, 32GB of memory and 4
CPUs.

On this cluster, we will create up to seven instances, named
``instance1`` to ``instance7``.


Cluster creation
----------------

Follow the :doc:`install` document and prepare the nodes. Then it's time
to initialise the cluster::

  node1# gnt-cluster init -s 192.0.2.1 --enabled-hypervisors=xen-pvm example-cluster
  node1#

The creation went fine. Let's check that the one node we have is
functioning correctly::

  node1# gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  node1# gnt-cluster verify
  Mon Oct 26 02:08:51 2009 * Verifying global settings
  Mon Oct 26 02:08:51 2009 * Gathering data (1 nodes)
  Mon Oct 26 02:08:52 2009 * Verifying node status
  Mon Oct 26 02:08:52 2009 * Verifying instance status
  Mon Oct 26 02:08:52 2009 * Verifying orphan volumes
  Mon Oct 26 02:08:52 2009 * Verifying remaining instances
  Mon Oct 26 02:08:52 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 02:08:52 2009 * Other Notes
  Mon Oct 26 02:08:52 2009 * Hooks Results
  node1#

Since this proceeded correctly, let's add the other two nodes::

  node1# gnt-node add -s 192.0.2.2 node2
  -- WARNING --
  Performing this operation is going to replace the ssh daemon keypair
  on the target machine (node2) with the ones of the current one
  and grant full intra-cluster ssh root access to/from it

  The authenticity of host 'node2 (192.0.2.2)' can't be established.
  RSA key fingerprint is 9f:…
  Are you sure you want to continue connecting (yes/no)? yes
  root@node2's password:
  Mon Oct 26 02:11:54 2009 - INFO: Node will be a master candidate
  node1# gnt-node add -s 192.0.2.3 node3
  -- WARNING --
  Performing this operation is going to replace the ssh daemon keypair
  on the target machine (node3) with the ones of the current one
  and grant full intra-cluster ssh root access to/from it

  The authenticity of host 'node3 (192.0.2.3)' can't be established.
  RSA key fingerprint is 9f:…
  Are you sure you want to continue connecting (yes/no)? yes
  root@node3's password:
  Mon Oct 26 02:11:54 2009 - INFO: Node will be a master candidate

Checking the cluster status again::

  node1# gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  node2   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  node3   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  node1# gnt-cluster verify
  Mon Oct 26 02:15:14 2009 * Verifying global settings
  Mon Oct 26 02:15:14 2009 * Gathering data (3 nodes)
  Mon Oct 26 02:15:16 2009 * Verifying node status
  Mon Oct 26 02:15:16 2009 * Verifying instance status
  Mon Oct 26 02:15:16 2009 * Verifying orphan volumes
  Mon Oct 26 02:15:16 2009 * Verifying remaining instances
  Mon Oct 26 02:15:16 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 02:15:16 2009 * Other Notes
  Mon Oct 26 02:15:16 2009 * Hooks Results
  node1#

And let's check that we have a valid OS::

  node1# gnt-os list
  Name
  debootstrap
  node1#
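
Before moving on, it can also be useful to review the cluster-wide
parameters that ``gnt-cluster init`` chose for us. A minimal sketch (the
output is long and configuration-dependent, so it is not reproduced
here)::

  node1# gnt-cluster info
  node1# gnt-cluster getmaster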

Running a burnin
----------------

Now that the cluster is created, it is time to check that the hardware
works correctly and that the hypervisor can actually create instances.
This is done with the *burnin* tool (using the ``debootstrap`` OS), as
described in the admin guide. Similar output lines are replaced with
``…`` in the log below::

  node1# /usr/lib/ganeti/tools/burnin -o debootstrap -p instance{1..5}
  - Testing global parameters
  - Creating instances
    * instance instance1
      on node1, node2
    * instance instance2
      on node2, node3
    …
    * instance instance5
      on node2, node3
    * Submitted job ID(s) 157, 158, 159, 160, 161
      waiting for job 157 for instance1
      …
      waiting for job 161 for instance5
  - Replacing disks on the same nodes
    * instance instance1
      run replace_on_secondary
      run replace_on_primary
    …
    * instance instance5
      run replace_on_secondary
      run replace_on_primary
    * Submitted job ID(s) 162, 163, 164, 165, 166
      waiting for job 162 for instance1
      …
  - Changing the secondary node
    * instance instance1
      run replace_new_secondary node3
    * instance instance2
      run replace_new_secondary node1
    …
    * instance instance5
      run replace_new_secondary node1
    * Submitted job ID(s) 167, 168, 169, 170, 171
      waiting for job 167 for instance1
      …
  - Growing disks
    * instance instance1
      increase disk/0 by 128 MB
    …
    * instance instance5
      increase disk/0 by 128 MB
    * Submitted job ID(s) 173, 174, 175, 176, 177
      waiting for job 173 for instance1
      …
  - Failing over instances
    * instance instance1
    …
    * instance instance5
    * Submitted job ID(s) 179, 180, 181, 182, 183
      waiting for job 179 for instance1
      …
  - Migrating instances
    * instance instance1
      migration and migration cleanup
    …
    * instance instance5
      migration and migration cleanup
    * Submitted job ID(s) 184, 185, 186, 187, 188
      waiting for job 184 for instance1
      …
  - Exporting and re-importing instances
    * instance instance1
      export to node node3
      remove instance
      import from node3 to node1, node2
      remove export
    …
    * instance instance5
      export to node node1
      remove instance
      import from node1 to node2, node3
      remove export
    * Submitted job ID(s) 196, 197, 198, 199, 200
      waiting for job 196 for instance1
      …
  - Reinstalling instances
    * instance instance1
      reinstall without passing the OS
      reinstall specifying the OS
    …
    * instance instance5
      reinstall without passing the OS
      reinstall specifying the OS
    * Submitted job ID(s) 203, 204, 205, 206, 207
      waiting for job 203 for instance1
      …
  - Rebooting instances
    * instance instance1
      reboot with type 'hard'
      reboot with type 'soft'
      reboot with type 'full'
    …
    * instance instance5
      reboot with type 'hard'
      reboot with type 'soft'
      reboot with type 'full'
    * Submitted job ID(s) 208, 209, 210, 211, 212
      waiting for job 208 for instance1
      …
  - Adding and removing disks
    * instance instance1
      adding a disk
      removing last disk
    …
    * instance instance5
      adding a disk
      removing last disk
    * Submitted job ID(s) 213, 214, 215, 216, 217
      waiting for job 213 for instance1
      …
  - Adding and removing NICs
    * instance instance1
      adding a NIC
      removing last NIC
    …
    * instance instance5
      adding a NIC
      removing last NIC
    * Submitted job ID(s) 218, 219, 220, 221, 222
      waiting for job 218 for instance1
      …
  - Activating/deactivating disks
    * instance instance1
      activate disks when online
      activate disks when offline
      deactivate disks (when offline)
    …
    * instance instance5
      activate disks when online
      activate disks when offline
      deactivate disks (when offline)
    * Submitted job ID(s) 223, 224, 225, 226, 227
      waiting for job 223 for instance1
      …
  - Stopping and starting instances
    * instance instance1
    …
    * instance instance5
    * Submitted job ID(s) 230, 231, 232, 233, 234
      waiting for job 230 for instance1
      …
  - Removing instances
    * instance instance1
    …
    * instance instance5
    * Submitted job ID(s) 235, 236, 237, 238, 239
      waiting for job 235 for instance1
      …
  node1#

You can see from the above which operations the burnin performs.
Ideally, the burnin log should proceed successfully through all the
steps and end cleanly, without throwing errors.
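
While burnin (or any other long-running operation) is submitting jobs,
you can follow them from another terminal using the job queue commands;
a brief sketch, with job 157 taken from the log above::

  node1# gnt-job list
  node1# gnt-job info 157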

Instance operations
-------------------

Creation
++++++++

At this point, Ganeti and the hardware seem to be functioning correctly,
so we'll follow up with creating the instances manually::

  node1# gnt-instance add -t drbd -o debootstrap -s 256m -I hail instance1
  Mon Oct 26 04:06:52 2009 - INFO: Selected nodes for instance instance1 via iallocator hail: node2, node3
  Mon Oct 26 04:06:53 2009 * creating instance disks...
  Mon Oct 26 04:06:57 2009 adding instance instance1 to cluster config
  Mon Oct 26 04:06:57 2009 - INFO: Waiting for instance instance1 to sync disks.
  Mon Oct 26 04:06:57 2009 - INFO: - device disk/0: 20.00% done, 4 estimated seconds remaining
  Mon Oct 26 04:07:01 2009 - INFO: Instance instance1's disks are in sync.
  Mon Oct 26 04:07:01 2009 creating os for instance instance1 on node node2
  Mon Oct 26 04:07:01 2009 * running the instance OS create scripts...
  Mon Oct 26 04:07:14 2009 * starting instance...
  node1# gnt-instance add -t drbd -o debootstrap -s 256m -n node1:node2 instance2
  Mon Oct 26 04:11:37 2009 * creating instance disks...
  Mon Oct 26 04:11:40 2009 adding instance instance2 to cluster config
  Mon Oct 26 04:11:41 2009 - INFO: Waiting for instance instance2 to sync disks.
  Mon Oct 26 04:11:41 2009 - INFO: - device disk/0: 35.40% done, 1 estimated seconds remaining
  Mon Oct 26 04:11:42 2009 - INFO: - device disk/0: 58.50% done, 1 estimated seconds remaining
  Mon Oct 26 04:11:43 2009 - INFO: - device disk/0: 86.20% done, 0 estimated seconds remaining
  Mon Oct 26 04:11:44 2009 - INFO: - device disk/0: 92.40% done, 0 estimated seconds remaining
  Mon Oct 26 04:11:44 2009 - INFO: - device disk/0: 97.00% done, 0 estimated seconds remaining
  Mon Oct 26 04:11:44 2009 - INFO: Instance instance2's disks are in sync.
  Mon Oct 26 04:11:44 2009 creating os for instance instance2 on node node1
  Mon Oct 26 04:11:44 2009 * running the instance OS create scripts...
  Mon Oct 26 04:11:57 2009 * starting instance...
  node1#

The above shows one instance created via an iallocator script, and one
created with manual node assignment. The other three instances have been
created similarly, and now it's time to check them::

  node1# gnt-instance list
  Instance  Hypervisor OS          Primary_node Status  Memory
  instance1 xen-pvm    debootstrap node2        running 128M
  instance2 xen-pvm    debootstrap node1        running 128M
  instance3 xen-pvm    debootstrap node1        running 128M
  instance4 xen-pvm    debootstrap node3        running 128M
  instance5 xen-pvm    debootstrap node2        running 128M

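Day-to-day control of these instances is done with a handful of
subcommands; a short sketch (output omitted), using instance3 as an
example::

  node1# gnt-instance info instance3
  node1# gnt-instance shutdown instance3
  node1# gnt-instance startup instance3
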
Accessing instances
+++++++++++++++++++

Accessing an instance's console is easy::

  node1# gnt-instance console instance2
  [    0.000000] Bootdata ok (command line is root=/dev/sda1 ro)
  [    0.000000] Linux version 2.6…
  [    0.000000] BIOS-provided physical RAM map:
  [    0.000000] Xen: 0000000000000000 - 0000000008800000 (usable)
  [13138176.018071] Built 1 zonelists. Total pages: 34816
  [13138176.018074] Kernel command line: root=/dev/sda1 ro
  [13138176.018694] Initializing CPU#0
  …
  Checking file systems...fsck 1.41.3 (12-Oct-2008)
  done.
  Setting kernel variables (/etc/sysctl.conf)...done.
  Mounting local filesystems...done.
  Activating swapfile swap...done.
  Setting up networking....
  Configuring network interfaces...done.
  Setting console screen modes and fonts.
  INIT: Entering runlevel: 2
  Starting enhanced syslogd: rsyslogd.
  Starting periodic command scheduler: crond.

  Debian GNU/Linux 5.0 instance2 tty1

  instance2 login:

At this point you can log in to the instance and, after configuring the
network (and doing this on all instances), we can check their
connectivity::

  node1# fping instance{1..5}
  instance1 is alive
  instance2 is alive
  instance3 is alive
  instance4 is alive
  instance5 is alive
  node1#

Removal
+++++++

Removing unwanted instances is also easy::

  node1# gnt-instance remove instance5
  This will remove the volumes of the instance instance5 (including
  mirrors), thus removing all the data of the instance. Continue?
  y/[n]/?: y
  node1#
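
If you want to keep a copy of an instance before removing it, you can
first export it to one of the nodes; a hedged sketch (exports typically
end up under Ganeti's export directory on the chosen node)::

  node1# gnt-backup export -n node1 instance5
  node1# gnt-backup list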


Recovering from hardware failures
---------------------------------

Recovering from node failure
++++++++++++++++++++++++++++

We are now left with four instances. Assume that at this point, node3,
which has one primary and one secondary instance, crashes::

  node1# gnt-node info node3
  Node name: node3
    primary ip: 198.51.100.1
    secondary ip: 192.0.2.3
    master candidate: True
    drained: False
    offline: False
    primary for instances:
      - instance4
    secondary for instances:
      - instance1
  node1# fping node3
  node3 is unreachable

At this point, the primary instance of that node (instance4) is down,
but the secondary instance (instance1) is not affected except that it
has lost disk redundancy::

  node1# fping instance{1,4}
  instance1 is alive
  instance4 is unreachable
  node1#

If we try to check the status of instance4 via the instance info
command, it fails because it tries to contact node3, which is down::

  node1# gnt-instance info instance4
  Failure: command execution error:
  Error checking node node3: Connection failed (113: No route to host)
  node1#

So we need to mark node3 as *offline*, so that Ganeti won't talk to it
anymore::

  node1# gnt-node modify -O yes -f node3
  Mon Oct 26 04:34:12 2009 - WARNING: Not enough master candidates (desired 10, new value will be 2)
  Mon Oct 26 04:34:15 2009 - WARNING: Communication failure to node node3: Connection failed (113: No route to host)
  Modified node node3
   - offline -> True
   - master_candidate -> auto-demotion due to offline
  node1#
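
To confirm the new state, you can query the node list for the offline
flag; a small sketch, assuming the ``offline`` output field is available
in your Ganeti version::

  node1# gnt-node list -o name,offline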

And now we can failover the instance::

  node1# gnt-instance failover instance4
  Failover will happen to image instance4. This requires a shutdown of
  the instance. Continue?
  y/[n]/?: y
  Mon Oct 26 04:35:34 2009 * checking disk consistency between source and target
  Failure: command execution error:
  Disk disk/0 is degraded on target node, aborting failover.
  node1# gnt-instance failover --ignore-consistency instance4
  Failover will happen to image instance4. This requires a shutdown of
  the instance. Continue?
  y/[n]/?: y
  Mon Oct 26 04:35:47 2009 * checking disk consistency between source and target
  Mon Oct 26 04:35:47 2009 * shutting down instance on source node
  Mon Oct 26 04:35:47 2009 - WARNING: Could not shutdown instance instance4 on node node3. Proceeding anyway. Please make sure node node3 is down. Error details: Node is marked offline
  Mon Oct 26 04:35:47 2009 * deactivating the instance's disks on source node
  Mon Oct 26 04:35:47 2009 - WARNING: Could not shutdown block device disk/0 on node node3: Node is marked offline
  Mon Oct 26 04:35:47 2009 * activating the instance's disks on target node
  Mon Oct 26 04:35:47 2009 - WARNING: Could not prepare block device disk/0 on node node3 (is_primary=False, pass=1): Node is marked offline
  Mon Oct 26 04:35:48 2009 * starting the instance on the target node
  node1#

Note that in our first attempt, Ganeti refused to do the failover since
it wasn't sure what the status of the instance's disks was. Once we pass
the ``--ignore-consistency`` flag, the failover proceeds::

  node1# gnt-instance list
  Instance  Hypervisor OS          Primary_node Status  Memory
  instance1 xen-pvm    debootstrap node2        running 128M
  instance2 xen-pvm    debootstrap node1        running 128M
  instance3 xen-pvm    debootstrap node1        running 128M
  instance4 xen-pvm    debootstrap node1        running 128M
  node1#

But at this point, both instance1 and instance4 are without disk
redundancy::

  node1# gnt-instance info instance1
  Instance name: instance1
  UUID: 45173e82-d1fa-417c-8758-7d582ab7eef4
  Serial number: 2
  Creation time: 2009-10-26 04:06:57
  Modification time: 2009-10-26 04:07:14
  State: configured to be up, actual state is up
    Nodes:
      - primary: node2
      - secondaries: node3
    Operating system: debootstrap
    Allocated network port: None
    Hypervisor: xen-pvm
      - root_path: default (/dev/sda1)
      - kernel_args: default (ro)
      - use_bootloader: default (False)
      - bootloader_args: default ()
      - bootloader_path: default ()
      - kernel_path: default (/boot/vmlinuz-2.6-xenU)
      - initrd_path: default ()
    Hardware:
      - VCPUs: 1
      - memory: 128MiB
      - NICs:
        - nic/0: MAC: aa:00:00:78:da:63, IP: None, mode: bridged, link: xen-br0
    Disks:
      - disk/0: drbd8, size 256M
        access mode: rw
        nodeA: node2, minor=0
        nodeB: node3, minor=0
        port: 11035
        auth key: 8e950e3cec6854b0181fbc3a6058657701f2d458
        on primary: /dev/drbd0 (147:0) in sync, status *DEGRADED*
        child devices:
          - child 0: lvm, size 256M
            logical_id: xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_data
            on primary: /dev/xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_data (254:0)
          - child 1: lvm, size 128M
            logical_id: xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta
            on primary: /dev/xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta (254:1)

The output is similar for instance4. In order to recover, we need to run
the node evacuate command, which will move each instance's secondary
from the failed node to a new one (in this case, with only two working
nodes left, all instances will end up on nodes one and two)::

  node1# gnt-node evacuate -I hail node3
  Relocate instance(s) 'instance1','instance4' from node
   node3 using iallocator hail?
  y/[n]/?: y
  Mon Oct 26 05:05:39 2009 - INFO: Selected new secondary for instance 'instance1': node1
  Mon Oct 26 05:05:40 2009 - INFO: Selected new secondary for instance 'instance4': node2
  Mon Oct 26 05:05:40 2009 Replacing disk(s) 0 for instance1
  Mon Oct 26 05:05:40 2009 STEP 1/6 Check device existence
  Mon Oct 26 05:05:40 2009 - INFO: Checking disk/0 on node2
  Mon Oct 26 05:05:40 2009 - INFO: Checking volume groups
  Mon Oct 26 05:05:40 2009 STEP 2/6 Check peer consistency
  Mon Oct 26 05:05:40 2009 - INFO: Checking disk/0 consistency on node node2
  Mon Oct 26 05:05:40 2009 STEP 3/6 Allocate new storage
  Mon Oct 26 05:05:40 2009 - INFO: Adding new local storage on node1 for disk/0
  Mon Oct 26 05:05:41 2009 STEP 4/6 Changing drbd configuration
  Mon Oct 26 05:05:41 2009 - INFO: activating a new drbd on node1 for disk/0
  Mon Oct 26 05:05:42 2009 - INFO: Shutting down drbd for disk/0 on old node
  Mon Oct 26 05:05:42 2009 - WARNING: Failed to shutdown drbd for disk/0 on oldnode: Node is marked offline
  Mon Oct 26 05:05:42 2009 Hint: Please cleanup this device manually as soon as possible
  Mon Oct 26 05:05:42 2009 - INFO: Detaching primary drbds from the network (=> standalone)
  Mon Oct 26 05:05:42 2009 - INFO: Updating instance configuration
  Mon Oct 26 05:05:45 2009 - INFO: Attaching primary drbds to new secondary (standalone => connected)
  Mon Oct 26 05:05:46 2009 STEP 5/6 Sync devices
  Mon Oct 26 05:05:46 2009 - INFO: Waiting for instance instance1 to sync disks.
  Mon Oct 26 05:05:46 2009 - INFO: - device disk/0: 13.90% done, 7 estimated seconds remaining
  Mon Oct 26 05:05:53 2009 - INFO: Instance instance1's disks are in sync.
  Mon Oct 26 05:05:53 2009 STEP 6/6 Removing old storage
  Mon Oct 26 05:05:53 2009 - INFO: Remove logical volumes for 0
  Mon Oct 26 05:05:53 2009 - WARNING: Can't remove old LV: Node is marked offline
  Mon Oct 26 05:05:53 2009 Hint: remove unused LVs manually
  Mon Oct 26 05:05:53 2009 - WARNING: Can't remove old LV: Node is marked offline
  Mon Oct 26 05:05:53 2009 Hint: remove unused LVs manually
  Mon Oct 26 05:05:53 2009 Replacing disk(s) 0 for instance4
  Mon Oct 26 05:05:53 2009 STEP 1/6 Check device existence
  Mon Oct 26 05:05:53 2009 - INFO: Checking disk/0 on node1
  Mon Oct 26 05:05:53 2009 - INFO: Checking volume groups
  Mon Oct 26 05:05:53 2009 STEP 2/6 Check peer consistency
  Mon Oct 26 05:05:53 2009 - INFO: Checking disk/0 consistency on node node1
  Mon Oct 26 05:05:54 2009 STEP 3/6 Allocate new storage
  Mon Oct 26 05:05:54 2009 - INFO: Adding new local storage on node2 for disk/0
  Mon Oct 26 05:05:54 2009 STEP 4/6 Changing drbd configuration
  Mon Oct 26 05:05:54 2009 - INFO: activating a new drbd on node2 for disk/0
  Mon Oct 26 05:05:55 2009 - INFO: Shutting down drbd for disk/0 on old node
  Mon Oct 26 05:05:55 2009 - WARNING: Failed to shutdown drbd for disk/0 on oldnode: Node is marked offline
  Mon Oct 26 05:05:55 2009 Hint: Please cleanup this device manually as soon as possible
  Mon Oct 26 05:05:55 2009 - INFO: Detaching primary drbds from the network (=> standalone)
  Mon Oct 26 05:05:55 2009 - INFO: Updating instance configuration
  Mon Oct 26 05:05:55 2009 - INFO: Attaching primary drbds to new secondary (standalone => connected)
  Mon Oct 26 05:05:56 2009 STEP 5/6 Sync devices
  Mon Oct 26 05:05:56 2009 - INFO: Waiting for instance instance4 to sync disks.
  Mon Oct 26 05:05:56 2009 - INFO: - device disk/0: 12.40% done, 8 estimated seconds remaining
  Mon Oct 26 05:06:04 2009 - INFO: Instance instance4's disks are in sync.
  Mon Oct 26 05:06:04 2009 STEP 6/6 Removing old storage
  Mon Oct 26 05:06:04 2009 - INFO: Remove logical volumes for 0
  Mon Oct 26 05:06:04 2009 - WARNING: Can't remove old LV: Node is marked offline
  Mon Oct 26 05:06:04 2009 Hint: remove unused LVs manually
  Mon Oct 26 05:06:04 2009 - WARNING: Can't remove old LV: Node is marked offline
  Mon Oct 26 05:06:04 2009 Hint: remove unused LVs manually
  node1#

And now node3 is completely free of instances and can be repaired::

  node1# gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G 30.2G     3     1
  node2   1.3T  1.3T  32.0G  1.0G 30.4G     1     3
  node3      ?     ?      ?     ?     ?     0     0

Re-adding a node to the cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's say node3 has been repaired and is now ready to be reused.
Re-adding it is simple::

  node1# gnt-node add --readd node3
  The authenticity of host 'node3 (198.51.100.1)' can't be established.
  RSA key fingerprint is 9f:2e:5a:2e:e0:bd:00:09:e4:5c:32:f2:27:57:7a:f4.
  Are you sure you want to continue connecting (yes/no)? yes
  Mon Oct 26 05:27:39 2009 - INFO: Readding a node, the offline/drained flags were reset
  Mon Oct 26 05:27:39 2009 - INFO: Node will be a master candidate

And it is now working again::

  node1# gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G 30.2G     3     1
  node2   1.3T  1.3T  32.0G  1.0G 30.4G     1     3
  node3   1.3T  1.3T  32.0G  1.0G 30.4G     0     0

.. note:: If Ganeti has been built with the htools component enabled,
   you can shuffle the instances around to make better use of the nodes.

Disk failures
+++++++++++++

A disk failure is simpler than a full node failure. First, a single disk
failure should not cause data loss for any redundant instance; only the
performance of some instances might be reduced due to more network
traffic.

Let's take the cluster status from the above listing and check what
volumes are in use::

  node1# gnt-node volumes -o phys,instance node2
  PhysDev   Instance
  /dev/sdb1 instance4
  /dev/sdb1 instance4
  /dev/sdb1 instance1
  /dev/sdb1 instance1
  /dev/sdb1 instance3
  /dev/sdb1 instance3
  /dev/sdb1 instance2
  /dev/sdb1 instance2
  node1#

You can see that all instances on node2 have logical volumes on
``/dev/sdb1``. Let's simulate a disk failure on that disk::

  node1# ssh node2
  node2# echo offline > /sys/block/sdb/device/state
  node2# vgs
    /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
    /dev/sdb1: read failed after 0 of 4096 at 750153695232: Input/output error
    /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
    Couldn't find device with uuid '954bJA-mNL0-7ydj-sdpW-nc2C-ZrCi-zFp91c'.
    Couldn't find all physical volumes for volume group xenvg.
    /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
    /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
    Couldn't find device with uuid '954bJA-mNL0-7ydj-sdpW-nc2C-ZrCi-zFp91c'.
    Couldn't find all physical volumes for volume group xenvg.
    Volume group xenvg not found
  node2#

At this point the node is broken; if we examine instance2 we get
(simplified output shown)::

  node1# gnt-instance info instance2
  Instance name: instance2
  State: configured to be up, actual state is up
    Nodes:
      - primary: node1
      - secondaries: node2
    Disks:
      - disk/0: drbd8, size 256M
        on primary:   /dev/drbd0 (147:0) in sync, status ok
        on secondary: /dev/drbd1 (147:1) in sync, status *DEGRADED* *MISSING DISK*

This instance only has a secondary disk on node2. Let's also check an
instance whose primary is node2::

  node1# gnt-instance info instance1
  Instance name: instance1
  State: configured to be up, actual state is up
    Nodes:
      - primary: node2
      - secondaries: node1
    Disks:
      - disk/0: drbd8, size 256M
        on primary:   /dev/drbd0 (147:0) in sync, status *DEGRADED* *MISSING DISK*
        on secondary: /dev/drbd3 (147:3) in sync, status ok
  node1# gnt-instance console instance1

  Debian GNU/Linux 5.0 instance1 tty1

  instance1 login: root
  Last login: Tue Oct 27 01:24:09 UTC 2009 on tty1
  instance1:~# date > test
  instance1:~# sync
  instance1:~# cat test
  Tue Oct 27 01:25:20 UTC 2009
  instance1:~# dmesg|tail
  [5439785.235448] NET: Registered protocol family 15
  [5439785.235489] 802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
  [5439785.235495] All bugs added by David S. Miller <davem@redhat.com>
  [5439785.235517] XENBUS: Device with no driver: device/console/0
  [5439785.236576] kjournald starting. Commit interval 5 seconds
  [5439785.236588] EXT3-fs: mounted filesystem with ordered data mode.
  [5439785.236625] VFS: Mounted root (ext3 filesystem) readonly.
  [5439785.236663] Freeing unused kernel memory: 172k freed
  [5439787.533779] EXT3 FS on sda1, internal journal
  [5440655.065431] eth0: no IPv6 routers present
  instance1:~#

As you can see, the instance is running fine and doesn't see any disk
issues. It is now time to fix node2 and re-establish redundancy for the
involved instances.

.. note:: For Ganeti 2.0 we need to fix the volume group on node2
   manually, by running ``vgreduce --removemissing xenvg``

::

  node1# gnt-node repair-storage node2 lvm-vg xenvg
  Mon Oct 26 18:14:03 2009 Repairing storage unit 'xenvg' on node2 ...
  node1# ssh node2 vgs
    VG    #PV #LV #SN Attr   VSize   VFree
    xenvg   1   8   0 wz--n- 673.84G 673.84G
  node1#

This has removed the 'bad' disk from the volume group, which is now left
with only one PV. We can now replace the disks for the involved
instances::

  node1# for i in instance{1..4}; do gnt-instance replace-disks -a $i; done
  Mon Oct 26 18:15:38 2009 Replacing disk(s) 0 for instance1
  Mon Oct 26 18:15:38 2009 STEP 1/6 Check device existence
  Mon Oct 26 18:15:38 2009 - INFO: Checking disk/0 on node1
  Mon Oct 26 18:15:38 2009 - INFO: Checking disk/0 on node2
  Mon Oct 26 18:15:38 2009 - INFO: Checking volume groups
  Mon Oct 26 18:15:38 2009 STEP 2/6 Check peer consistency
  Mon Oct 26 18:15:38 2009 - INFO: Checking disk/0 consistency on node node1
  Mon Oct 26 18:15:39 2009 STEP 3/6 Allocate new storage
  Mon Oct 26 18:15:39 2009 - INFO: Adding storage on node2 for disk/0
  Mon Oct 26 18:15:39 2009 STEP 4/6 Changing drbd configuration
  Mon Oct 26 18:15:39 2009 - INFO: Detaching disk/0 drbd from local storage
  Mon Oct 26 18:15:40 2009 - INFO: Renaming the old LVs on the target node
  Mon Oct 26 18:15:40 2009 - INFO: Renaming the new LVs on the target node
  Mon Oct 26 18:15:40 2009 - INFO: Adding new mirror component on node2
  Mon Oct 26 18:15:41 2009 STEP 5/6 Sync devices
  Mon Oct 26 18:15:41 2009 - INFO: Waiting for instance instance1 to sync disks.
  Mon Oct 26 18:15:41 2009 - INFO: - device disk/0: 12.40% done, 9 estimated seconds remaining
  Mon Oct 26 18:15:50 2009 - INFO: Instance instance1's disks are in sync.
  Mon Oct 26 18:15:50 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:15:50 2009 - INFO: Remove logical volumes for disk/0
  Mon Oct 26 18:15:52 2009 Replacing disk(s) 0 for instance2
  Mon Oct 26 18:15:52 2009 STEP 1/6 Check device existence
  …
  Mon Oct 26 18:16:01 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:16:01 2009 - INFO: Remove logical volumes for disk/0
  Mon Oct 26 18:16:02 2009 Replacing disk(s) 0 for instance3
  Mon Oct 26 18:16:02 2009 STEP 1/6 Check device existence
  …
  Mon Oct 26 18:16:09 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:16:09 2009 - INFO: Remove logical volumes for disk/0
  Mon Oct 26 18:16:10 2009 Replacing disk(s) 0 for instance4
  Mon Oct 26 18:16:10 2009 STEP 1/6 Check device existence
  …
  Mon Oct 26 18:16:18 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:16:18 2009 - INFO: Remove logical volumes for disk/0
  node1#

At this point, all instances should be healthy again.

.. note:: Ganeti 2.0 doesn't have the ``-a`` option to replace-disks, so
   there you have to run the loop twice: once for the instances that
   have the affected node as primary, using ``-p``, and once for those
   that have it as secondary, using ``-s``; otherwise the operations are
   similar::

     node1# gnt-instance replace-disks -p instance1
     …
     node1# for i in instance{2..4}; do gnt-instance replace-disks -s $i; done

Common cluster problems
-----------------------

There are a number of small issues that might appear on a cluster, and
that can be solved easily once the issue is properly identified. For
this exercise we will consider the case of node3, which was broken
previously and re-added to the cluster without reinstallation. Running
cluster verify on the cluster reports::

  node1# gnt-cluster verify
  Mon Oct 26 18:30:08 2009 * Verifying global settings
  Mon Oct 26 18:30:08 2009 * Gathering data (3 nodes)
  Mon Oct 26 18:30:10 2009 * Verifying node status
  Mon Oct 26 18:30:10 2009 - ERROR: node node3: unallocated drbd minor 0 is in use
  Mon Oct 26 18:30:10 2009 - ERROR: node node3: unallocated drbd minor 1 is in use
  Mon Oct 26 18:30:10 2009 * Verifying instance status
  Mon Oct 26 18:30:10 2009 - ERROR: instance instance4: instance should not run on node node3
  Mon Oct 26 18:30:10 2009 * Verifying orphan volumes
  Mon Oct 26 18:30:10 2009 - ERROR: node node3: volume 22459cf8-117d-4bea-a1aa-791667d07800.disk0_data is unknown
  Mon Oct 26 18:30:10 2009 - ERROR: node node3: volume 1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data is unknown
  Mon Oct 26 18:30:10 2009 - ERROR: node node3: volume 1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta is unknown
  Mon Oct 26 18:30:10 2009 - ERROR: node node3: volume 22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta is unknown
  Mon Oct 26 18:30:10 2009 * Verifying remaining instances
  Mon Oct 26 18:30:10 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 18:30:10 2009 * Other Notes
  Mon Oct 26 18:30:10 2009 * Hooks Results
  node1#

Instance status
+++++++++++++++

As you can see, *instance4* has a copy running on node3, because we
forced the failover when node3 failed. This case is dangerous, as the
two copies of the instance will have the same IP and MAC address,
wreaking havoc on the network environment and on anyone who tries to use
it.

Ganeti doesn't directly handle this case. It is recommended to log on to
node3 and run::

  node3# xm destroy instance4
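
To make sure no stale copy is left behind, you can check the
hypervisor's domain list afterwards::

  node3# xm list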

Unallocated DRBD minors
+++++++++++++++++++++++

There are still unallocated DRBD minors on node3. Again, these are not
handled by Ganeti directly and need to be cleaned up via DRBD commands::

  node3# drbdsetup /dev/drbd0 down
  node3# drbdsetup /dev/drbd1 down
  node3#
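
You can confirm that the minors are gone by looking at DRBD's status
file (the exact output format depends on the DRBD version)::

  node3# cat /proc/drbd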

Orphan volumes
++++++++++++++

At this point, the only remaining problem should be the so-called
*orphan* volumes. This can also happen after an aborted disk replacement
or a similar situation where Ganeti was not able to recover
automatically. Here you need to remove them manually via LVM commands::

  node3# lvremove xenvg
  Do you really want to remove active logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_data"? [y/n]: y
    Logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_data" successfully removed
  Do you really want to remove active logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta"? [y/n]: y
    Logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta" successfully removed
  Do you really want to remove active logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data"? [y/n]: y
    Logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data" successfully removed
  Do you really want to remove active logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta"? [y/n]: y
    Logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta" successfully removed
  node3#
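
A final LVM listing on node3 should now show only volumes belonging to
current instances (none at all, in this case)::

  node3# lvs xenvg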

At this point cluster verify shouldn't complain anymore::

  node1# gnt-cluster verify
  Mon Oct 26 18:37:51 2009 * Verifying global settings
  Mon Oct 26 18:37:51 2009 * Gathering data (3 nodes)
  Mon Oct 26 18:37:53 2009 * Verifying node status
  Mon Oct 26 18:37:53 2009 * Verifying instance status
  Mon Oct 26 18:37:53 2009 * Verifying orphan volumes
  Mon Oct 26 18:37:53 2009 * Verifying remaining instances
  Mon Oct 26 18:37:53 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 18:37:53 2009 * Other Notes
  Mon Oct 26 18:37:53 2009 * Hooks Results
  node1#

N+1 errors
++++++++++

Since redundant instances in Ganeti have a primary/secondary model, each
node needs to set aside enough memory so that, if one of its peer nodes
fails, all the instances that have the failed node as primary and this
node as secondary can be failed over to it. More specifically, if
instance2 has node1 as primary and node2 as secondary (and node1 and
node2 do not have any other instances in this layout), then node2 must
have enough free memory so that if node1 fails, we can fail over
instance2 without any other operations (to reduce the downtime window).
Let's increase the memory of the current instances to 4G, and add three
new instances: two on node2:node3 with 8GB of RAM each and one on
node1:node2 with 12GB of RAM (numbers chosen so that we run out of
memory)::

  node1# gnt-instance modify -B memory=4G instance1
  Modified instance instance1
   - be/memory -> 4096
  Please don't forget that these parameters take effect only at the next start of the instance.
  node1# gnt-instance modify …

  node1# gnt-instance add -t drbd -n node2:node3 -s 512m -B memory=8G -o debootstrap instance5
  …
  node1# gnt-instance add -t drbd -n node2:node3 -s 512m -B memory=8G -o debootstrap instance6
  …
  node1# gnt-instance add -t drbd -n node1:node2 -s 512m -B memory=12G -o debootstrap instance7
  node1# gnt-instance reboot --all
  The reboot will operate on 7 instances.
  Do you want to continue?
  Affected instances:
    instance1
    instance2
    instance3
    instance4
    instance5
    instance6
    instance7
  y/[n]/?: y
  Submitted jobs 677, 678, 679, 680, 681, 682, 683
  Waiting for job 677 for instance1...
  Waiting for job 678 for instance2...
  Waiting for job 679 for instance3...
  Waiting for job 680 for instance4...
  Waiting for job 681 for instance5...
  Waiting for job 682 for instance6...
  Waiting for job 683 for instance7...
  node1#

We rebooted the instances for the memory changes to take effect. Now the
cluster looks like::

  node1# gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G  6.5G     4     1
  node2   1.3T  1.3T  32.0G  1.0G 10.5G     3     4
  node3   1.3T  1.3T  32.0G  1.0G 30.5G     0     2
  node1# gnt-cluster verify
  Mon Oct 26 18:59:36 2009 * Verifying global settings
  Mon Oct 26 18:59:36 2009 * Gathering data (3 nodes)
  Mon Oct 26 18:59:37 2009 * Verifying node status
  Mon Oct 26 18:59:37 2009 * Verifying instance status
  Mon Oct 26 18:59:37 2009 * Verifying orphan volumes
  Mon Oct 26 18:59:37 2009 * Verifying remaining instances
  Mon Oct 26 18:59:37 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 18:59:37 2009 - ERROR: node node2: not enough memory to accommodate instance failovers should node node1 fail
  Mon Oct 26 18:59:37 2009 * Other Notes
  Mon Oct 26 18:59:37 2009 * Hooks Results
  node1#

The cluster verify error above shows that if node1 fails, node2 will not
have enough memory to fail over all of node1's primary instances onto
it. To solve this, you have a number of options:

- try to manually move instances around (but this can become complicated
  for any non-trivial cluster)
- try to reduce the memory of some instances to accommodate the
  available node memory
- if Ganeti has been built with the htools package enabled, you can run
  the ``hbal`` tool, which will try to compute an automated cluster
  solution that complies with the N+1 rule (see the example after this
  list)
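
For the last option, a typical invocation looks like the sketch below;
this assumes the htools binaries are installed, with ``-L`` using the
local Luxi socket and ``-C`` only printing the suggested commands rather
than executing anything::

  node1# hbal -L -C

Review the proposed moves before applying any of them.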

Network issues
++++++++++++++

In case a node has problems with the network (usually the secondary
network, as problems with the primary network will render the node
unusable for Ganeti commands), it will show up in cluster verify as::

  node1# gnt-cluster verify
  Mon Oct 26 19:07:19 2009 * Verifying global settings
  Mon Oct 26 19:07:19 2009 * Gathering data (3 nodes)
  Mon Oct 26 19:07:23 2009 * Verifying node status
  Mon Oct 26 19:07:23 2009 - ERROR: node node1: tcp communication with node 'node3': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009 - ERROR: node node2: tcp communication with node 'node3': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009 - ERROR: node node3: tcp communication with node 'node1': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009 - ERROR: node node3: tcp communication with node 'node2': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009 - ERROR: node node3: tcp communication with node 'node3': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009 * Verifying instance status
  Mon Oct 26 19:07:23 2009 * Verifying orphan volumes
  Mon Oct 26 19:07:23 2009 * Verifying remaining instances
  Mon Oct 26 19:07:23 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 19:07:23 2009 * Other Notes
  Mon Oct 26 19:07:23 2009 * Hooks Results
  node1#

This shows that both node1 and node2 have problems contacting node3 over
the secondary network, and node3 has problems contacting them. From this
output it can be deduced that, since node1 and node2 can communicate
between themselves, node3 is the one having problems, and you need to
investigate its network settings/connection.
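
A quick way to narrow the problem down is to test the secondary network
directly, forcing the test traffic onto the secondary addresses; for
example, with plain ``ping``::

  node1# ping -c 3 -I 192.0.2.1 192.0.2.3
  node3# ping -c 3 -I 192.0.2.3 192.0.2.1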

Migration problems
++++++++++++++++++

Since live migration can sometimes fail and leave the instance in an
inconsistent state, Ganeti provides a ``--cleanup`` argument to the
migrate command, which does the following:

- check on which node the instance is actually running (did the command
  fail before or after the actual migration?)
- reconfigure the DRBD disks accordingly

It is always safe to run this command as long as the instance has good
data on its primary node (i.e. not showing as degraded). If so, you can
simply run::

  node1# gnt-instance migrate --cleanup instance1
  Instance instance1 will be recovered from a failed migration. Note
  that the migration procedure (including cleanup) is **experimental**
  in this version. This might impact the instance if anything goes
  wrong. Continue?
  y/[n]/?: y
  Mon Oct 26 19:13:49 2009 Migrating instance instance1
  Mon Oct 26 19:13:49 2009 * checking where the instance actually runs (if this hangs, the hypervisor might be in a bad state)
  Mon Oct 26 19:13:49 2009 * instance confirmed to be running on its primary node (node2)
  Mon Oct 26 19:13:49 2009 * switching node node1 to secondary mode
  Mon Oct 26 19:13:50 2009 * wait until resync is done
  Mon Oct 26 19:13:50 2009 * changing into standalone mode
  Mon Oct 26 19:13:50 2009 * changing disks into single-master mode
  Mon Oct 26 19:13:50 2009 * wait until resync is done
  Mon Oct 26 19:13:51 2009 * done
  node1#

In use disks at instance shutdown
+++++++++++++++++++++++++++++++++

If you see something like the following when trying to shut down or
deactivate disks for an instance::

  node1# gnt-instance shutdown instance1
  Mon Oct 26 19:16:23 2009 - WARNING: Could not shutdown block device disk/0 on node node2: drbd0: can't shutdown drbd device: /dev/drbd0: State change failed: (-12) Device is held open by someone\n

it most likely means something is holding the underlying DRBD device
open. This can be bad if the instance is not running, as it might mean
that there was concurrent access to the disks from both the node and the
instance, but not always (e.g. you might just have had the partitions
activated via ``kpartx``).

To troubleshoot this issue you need to follow standard Linux practices,
and pay attention to the hypervisor being used (see also the generic
example after this list):

- check if (in the above example) ``/dev/drbd0`` on node2 is being
  mounted somewhere (``cat /proc/mounts``)
- check if the device is not being used by device mapper itself: run
  ``dmsetup ls`` and look for entries of the form ``drbd0pX``, and if so
  remove them with either ``kpartx -d`` or ``dmsetup remove``
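
As a generic first step, you can also ask directly which process, if
any, holds the device open, using standard tools::

  node2# fuser -v /dev/drbd0
  node2# lsof /dev/drbd0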

For Xen, check whether the hypervisor itself is not using the disks::

  node1# xenstore-ls /local/domain/0/backend/vbd|grep -e "domain =" -e physical-device
  domain = "instance2"
  physical-device = "93:0"
  domain = "instance3"
  physical-device = "93:1"
  domain = "instance4"
  physical-device = "93:2"
  node1#

You can see in the above output that the node exports three disks to
three instances. The ``physical-device`` key is in major:minor format,
in hexadecimal, and 0x93 (147 decimal) is DRBD's major number. Thus we
can see from the above that instance2 has /dev/drbd0, instance3 has
/dev/drbd1, and instance4 has /dev/drbd2.
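
If you want to double-check the conversion, the shell can do the
hexadecimal arithmetic for you (0x93 is 147, DRBD's device major)::

  node1# printf "%d:%d\n" 0x93 0x2
  147:2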

LUXI version mismatch
+++++++++++++++++++++

LUXI is the protocol used for communication between clients and the
master daemon. Starting in Ganeti 2.3, the peers exchange their version
in each message. When they don't match, an error is raised::

  $ gnt-node modify -O yes node3
  Unhandled Ganeti error: LUXI version mismatch, server 2020000, request 2030000

Usually this means that server and client are from different Ganeti
versions, or import their libraries from different, inconsistent paths
(e.g. an older version installed in another place). You can print the
import path for Ganeti's modules using the following command (note that
depending on your setup you might have to use an explicit version in the
Python command, e.g. ``python2.6``)::

  python -c 'import ganeti; print ganeti.__file__'
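
A practical way to find the stale installation is to run the one-liner
above on both the master and the affected node and compare the reported
paths::

  node1# python -c 'import ganeti; print ganeti.__file__'
  node3# python -c 'import ganeti; print ganeti.__file__'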

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: