<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
]>
<article class="specification">
  <articleinfo>
    <title>Ganeti automatic instance allocation</title>
  </articleinfo>
  <para>Documents Ganeti version 1.2</para>
  <sect1>
    <title>Introduction</title>

    <para>Currently in Ganeti the admin has to specify the exact
    locations for an instance's node(s). This prevents a completely
    automatic node evacuation, and is in general a nuisance.</para>

    <para>The <acronym>iallocator</acronym> framework will enable
    automatic placement via external scripts, which allows
    customization of the cluster layout per the site's
    requirements.</para>

  </sect1>

  <sect1>
    <title>User-visible changes</title>

    <para>Two parts of Ganeti's operation are affected by
    auto-allocation: how the cluster knows which allocator
    algorithms are available, and how the admin uses them when
    creating instances.</para>

    <para>An allocation algorithm is just the filename of a program
    installed in a defined list of directories.</para>

    <sect2>
      <title>Cluster configuration</title>

      <para>At configure time, the list of directories can be
      selected via the
      <option>--with-iallocator-search-path=LIST</option> option,
      where <userinput>LIST</userinput> is a comma-separated list of
      directories. If not given, this defaults to
      <constant>$libdir/ganeti/iallocators</constant>, i.e. for an
      installation under <filename class="directory">/usr</filename>,
      this will be <filename
      class="directory">/usr/lib/ganeti/iallocators</filename>.</para>

      <para>Ganeti will then search for the allocator script in the
      configured list, using the first one whose filename matches the
      one given by the user.</para>
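      <para>The lookup described above can be sketched as follows.
      This is only an illustration, not Ganeti's own code: the function
      name <literal>find_allocator</literal> is hypothetical, and the
      sketch assumes that a match simply means a regular file with the
      requested name exists in the directory.</para>

```python
import os


def find_allocator(name, search_path):
    """Return the path of the first file called `name`, or None.

    `search_path` is the ordered list of directories, e.g. the value
    given to --with-iallocator-search-path split on commas.
    """
    for directory in search_path:
        candidate = os.path.join(directory, name)
        # Assumption: a regular file with the right name is a match.
        if os.path.isfile(candidate):
            return candidate
    return None
```

      <para>For example,
      <literal>find_allocator("dumb-allocator", "/usr/lib/ganeti/iallocators".split(","))</literal>
      returns the full path if the script is installed there, and
      <literal>None</literal> otherwise.</para>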

    </sect2>

    <sect2>
      <title>Command line interface changes</title>

      <para>The node selection options in instance add and instance
      replace disks can be replaced by the new <option>--iallocator
      <replaceable>NAME</replaceable></option> option, which will
      cause the nodes to be assigned automatically. The selected
      node(s) will be shown as part of the command output.</para>

    </sect2>

  </sect1>

  <sect1>
    <title>IAllocator API</title>

    <para>The protocol for communication between Ganeti and an
    allocator script will be the following:</para>

    <orderedlist>
      <listitem>
        <simpara>Ganeti launches the program with a single argument, a
        filename that contains a JSON-encoded structure (the input
        message);</simpara>
      </listitem>
      <listitem>
        <simpara>if the script finishes with an exit code different from
        zero, it is considered a general failure and the full output
        will be reported to the users; this can be the case when the
        allocator can't parse the input message;</simpara>
      </listitem>
      <listitem>
        <simpara>if the allocator finishes with exit code zero, it is
        expected to output (on its stdout) a JSON-encoded structure
        (the response).</simpara>
      </listitem>
    </orderedlist>
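    <para>Seen from the allocator's side, the three steps above could
    look like the sketch below. The names
    <literal>run_allocator</literal> and
    <literal>refuse_everything</literal> are hypothetical, and writing
    the error text to stderr assumes that stderr counts as part of the
    output Ganeti reports on failure.</para>

```python
import json
import sys


def run_allocator(argv, decide):
    """Read the input message named by argv[1] and print the response.

    `decide` maps the decoded input message to a response dictionary.
    The return value is meant to be used as the process exit code.
    """
    if len(argv) != 2:
        sys.stderr.write("usage: %s INPUT_FILE\n" % argv[0])
        return 1
    try:
        with open(argv[1]) as handle:
            message = json.load(handle)
    except (OSError, ValueError) as err:
        # Non-zero exit code: treated as a general failure.
        sys.stderr.write("cannot parse input message: %s\n" % err)
        return 1
    # Exit code zero: the JSON response must be on stdout.
    json.dump(decide(message), sys.stdout)
    return 0


def refuse_everything(message):
    """Placeholder policy that always reports a failed allocation."""
    return {"success": False,
            "info": "no placement logic implemented",
            "nodes": []}
```

    <para>A script built this way would end with
    <literal>sys.exit(run_allocator(sys.argv, refuse_everything))</literal>,
    replacing the placeholder policy with a real one.</para>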

    <sect2>
      <title>Input message</title>

      <para>The input message will be the JSON encoding of a
      dictionary containing the following:</para>

      <variablelist>
        <varlistentry>
          <term>version</term>
          <listitem>
            <simpara>the version of the protocol; this document
            specifies version 1</simpara>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>cluster_name</term>
          <listitem>
            <simpara>the cluster name</simpara>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>cluster_tags</term>
          <listitem>
            <simpara>the list of cluster tags</simpara>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>request</term>
          <listitem>
            <simpara>a dictionary containing the request data:</simpara>
            <variablelist>
              <varlistentry>
                <term>type</term>
                <listitem>
                  <simpara>the request type; this can be either
                  <literal>allocate</literal> or
                  <literal>relocate</literal>; the
                  <literal>allocate</literal> request is used when a
                  new instance needs to be placed on the cluster,
                  while the <literal>relocate</literal> request is
                  used when an existing instance needs to be moved
                  within the cluster</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>name</term>
                <listitem>
                  <simpara>the name of the instance; if the request is
                  a relocation, then this name will be found in the
                  list of instances (see below), otherwise it is the
                  <acronym>FQDN</acronym> of the new
                  instance</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>required_nodes</term>
                <listitem>
                  <simpara>how many nodes the algorithm should return;
                  while this information can be deduced from the
                  instance's disk template, it's better if this
                  computation is left to Ganeti, as allocator
                  scripts are then less sensitive to changes to the
                  disk templates</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>disk_space_total</term>
                <listitem>
                  <simpara>the total disk space that will be used by
                  this instance on the (new) nodes; again, this
                  information can be computed from the list of
                  instance disks and its template type, but Ganeti is
                  better suited to compute it</simpara>
                </listitem>
              </varlistentry>
            </variablelist>
            <simpara>If the request is an allocation, then there are
            extra fields in the request dictionary:</simpara>
            <variablelist>
              <varlistentry>
                <term>disks</term>
                <listitem>
                  <simpara>list of dictionaries holding the disk
                  definitions for this instance (in the order they are
                  exported to the hypervisor):</simpara>
                  <variablelist>
                    <varlistentry>
                      <term>mode</term>
                      <listitem>
                        <simpara>either <literal>r</literal> or
                        <literal>w</literal> denoting if the disk is
                        read-only or writable; for Ganeti 1.2, this
                        will always be <literal>w</literal></simpara>
                      </listitem>
                    </varlistentry>
                    <varlistentry>
                      <term>size</term>
                      <listitem>
                        <simpara>the size of this disk in mebibytes</simpara>
                      </listitem>
                    </varlistentry>
                  </variablelist>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>nics</term>
                <listitem>
                  <simpara>a list of dictionaries holding the network
                  interfaces for this instance, containing:</simpara>
                  <variablelist>
                    <varlistentry>
                      <term>ip</term>
                      <listitem>
                        <simpara>the IP address that Ganeti knows for
                        this instance, or null</simpara>
                      </listitem>
                    </varlistentry>
                    <varlistentry>
                      <term>mac</term>
                      <listitem>
                        <simpara>the MAC address for this interface</simpara>
                      </listitem>
                    </varlistentry>
                    <varlistentry>
                      <term>bridge</term>
                      <listitem>
                        <simpara>the bridge to which this interface
                        will be connected</simpara>
                      </listitem>
                    </varlistentry>
                  </variablelist>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>vcpus</term>
                <listitem>
                  <simpara>the number of VCPUs for the instance</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>disk_template</term>
                <listitem>
                  <simpara>the disk template for the instance</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>memory</term>
                <listitem>
                  <simpara>the memory size for the instance</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>os</term>
                <listitem>
                  <simpara>the OS type for the instance</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>tags</term>
                <listitem>
                  <simpara>the list of the instance's tags</simpara>
                </listitem>
              </varlistentry>
            </variablelist>
            <simpara>If the request is of type relocate, then there is
            one more entry in the request dictionary, named
            <varname>relocate_from</varname>, and it contains a list
            of nodes to move the instance away from; note that with
            Ganeti 1.2, this list will always contain a single node,
            the current secondary of the instance.</simpara>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>instances</term>
          <listitem>
            <simpara>a dictionary with the data for the instances
            currently existing on the cluster, indexed by instance
            name; the contents are similar to the instance definitions
            for the allocate mode, with the addition of:</simpara>
            <variablelist>
              <varlistentry>
                <term>should_run</term>
                <listitem>
                  <simpara>whether this instance is set to run (this is
                  not the actual status of the instance)</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>nodes</term>
                <listitem>
                  <simpara>list of nodes on which this instance is
                  placed; the primary node of the instance is always
                  the first one</simpara>
                </listitem>
              </varlistentry>
            </variablelist>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>nodes</term>
          <listitem>
            <simpara>dictionary with the data for the nodes in the
            cluster, indexed by the node name; the dictionary
            contains:</simpara>
            <variablelist>
              <varlistentry>
                <term>total_disk</term>
                <listitem>
                  <simpara>the total disk size of this node
                  (mebibytes)</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>free_disk</term>
                <listitem>
                  <simpara>the free disk space on the node</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>total_memory</term>
                <listitem>
                  <simpara>the total memory size</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>free_memory</term>
                <listitem>
                  <simpara>free memory on the node; note that
                  currently this does not take into account the
                  instances which are down on the node</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>total_cpus</term>
                <listitem>
                  <simpara>the physical number of CPUs present on the
                  machine; depending on the hypervisor, this might or
                  might not be equal to how many CPUs the node
                  operating system sees</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>primary_ip</term>
                <listitem>
                  <simpara>the primary IP address of the
                  node</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>secondary_ip</term>
                <listitem>
                  <simpara>the secondary IP address of the node (the
                  one used for the DRBD replication); note that this
                  can be the same as the primary one</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>tags</term>
                <listitem>
                  <simpara>list with the tags of the node</simpara>
                </listitem>
              </varlistentry>
            </variablelist>
          </listitem>
        </varlistentry>
      </variablelist>
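      <para>To show how an allocator might consume these fields, here
      is a minimal sketch of a placement policy for
      <literal>allocate</literal> requests. The function name
      <literal>pick_nodes</literal> is hypothetical, and the policy
      itself is an assumption made for illustration: it requires every
      returned node to have at least <literal>memory</literal> free
      memory and <literal>disk_space_total</literal> free disk, which
      a real allocator may well refine.</para>

```python
def pick_nodes(message):
    """Answer an allocate request with the first nodes that have room."""
    request = message["request"]
    wanted = request["required_nodes"]
    # Assumption: each chosen node must fit the whole memory and disk
    # footprint; names are sorted only to make the result predictable.
    candidates = [
        name
        for name, node in sorted(message["nodes"].items())
        if node["free_memory"] >= request["memory"]
        and node["free_disk"] >= request["disk_space_total"]
    ]
    if len(candidates) < wanted:
        info = "Only %d of %d required nodes have enough free resources" % (
            len(candidates), wanted)
        return {"success": False, "info": info, "nodes": []}
    return {"success": True,
            "info": "Allocation successful",
            "nodes": candidates[:wanted]}
```

      <para>The return value is a response dictionary with the
      <literal>success</literal>, <literal>info</literal> and
      <literal>nodes</literal> keys, ready to be JSON-encoded and
      written to stdout.</para>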

    </sect2>

    <sect2>
      <title>Response message</title>

      <para>The response message is much simpler than the input
      one. It is also a dictionary, with three keys:</para>
      <variablelist>
        <varlistentry>
          <term>success</term>
          <listitem>
            <simpara>a boolean value denoting whether the allocation was
            successful or not</simpara>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>info</term>
          <listitem>
            <simpara>a string with information from the script; if
            the allocation fails, this will be shown to the
            user</simpara>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>nodes</term>
          <listitem>
            <simpara>the list of nodes computed by the algorithm; even
            if the algorithm failed (i.e. success is false), this must
            be returned as an empty list; also note that the length of
            this list must equal the
            <varname>required_nodes</varname> entry in the input
            message, otherwise Ganeti will consider the result as
            failed</simpara>
          </listitem>
        </varlistentry>
      </variablelist>
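      <para>The rules for the three keys can be checked mechanically.
      The sketch below is not Ganeti's actual validation code; the
      function name <literal>response_problem</literal> and its error
      strings are hypothetical. It compares the length of
      <literal>nodes</literal> against the
      <literal>required_nodes</literal> value of the request.</para>

```python
def response_problem(response, required_nodes):
    """Return None for a usable response, else a description of the problem."""
    for key in ("success", "info", "nodes"):
        if key not in response:
            return "response lacks the '%s' key" % key
    if not isinstance(response["nodes"], list):
        return "'nodes' is not a list"
    if not response["success"]:
        # A failed allocation carries its explanation in 'info'.
        return "allocator reported failure: %s" % response["info"]
    if len(response["nodes"]) != required_nodes:
        return "expected %d node(s), got %d" % (
            required_nodes, len(response["nodes"]))
    return None
```

      <para>A caller would treat any non-<literal>None</literal>
      return value as a failed allocation and show the string to the
      user.</para>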
    </sect2>
  </sect1>

  <sect1>
    <title>Examples</title>
    <sect2>
      <title>Input messages to scripts</title>
      <simpara>Input message, new instance allocation</simpara>
      <screen>
{
  "cluster_tags": [],
  "request": {
    "required_nodes": 2,
    "name": "instance3.example.com",
    "tags": [
      "type:test",
      "owner:foo"
    ],
    "type": "allocate",
    "disks": [
      {
        "mode": "w",
        "size": 1024
      },
      {
        "mode": "w",
        "size": 2048
      }
    ],
    "nics": [
      {
        "ip": null,
        "mac": "00:11:22:33:44:55",
        "bridge": null
      }
    ],
    "vcpus": 1,
    "disk_template": "drbd",
    "memory": 2048,
    "disk_space_total": 3328,
    "os": "etch-image"
  },
  "cluster_name": "cluster1.example.com",
  "instances": {
    "instance1.example.com": {
      "tags": [],
      "should_run": false,
      "disks": [
        {
          "mode": "w",
          "size": 64
        },
        {
          "mode": "w",
          "size": 512
        }
      ],
      "nics": [
        {
          "ip": null,
          "mac": "aa:00:00:00:60:bf",
          "bridge": "xen-br0"
        }
      ],
      "vcpus": 1,
      "disk_template": "plain",
      "memory": 128,
      "nodes": [
        "node1.example.com"
      ],
      "os": "etch-image"
    },
    "instance2.example.com": {
      "tags": [],
      "should_run": false,
      "disks": [
        {
          "mode": "w",
          "size": 512
        },
        {
          "mode": "w",
          "size": 256
        }
      ],
      "nics": [
        {
          "ip": null,
          "mac": "aa:00:00:55:f8:38",
          "bridge": "xen-br0"
        }
      ],
      "vcpus": 1,
      "disk_template": "drbd",
      "memory": 512,
      "nodes": [
        "node2.example.com",
        "node3.example.com"
      ],
      "os": "etch-image"
    }
  },
  "version": 1,
  "nodes": {
    "node1.example.com": {
      "total_disk": 858276,
      "primary_ip": "192.168.1.1",
      "secondary_ip": "192.168.2.1",
      "tags": [],
      "free_memory": 3505,
      "free_disk": 856740,
      "total_memory": 4095
    },
    "node2.example.com": {
      "total_disk": 858240,
      "primary_ip": "192.168.1.3",
      "secondary_ip": "192.168.2.3",
      "tags": ["test"],
      "free_memory": 3505,
      "free_disk": 848320,
      "total_memory": 4095
    },
    "node3.example.com": {
      "total_disk": 572184,
      "primary_ip": "192.168.1.3",
      "secondary_ip": "192.168.2.3",
      "tags": [],
      "free_memory": 3505,
      "free_disk": 570648,
      "total_memory": 4095
    }
  }
}
      </screen>
      <simpara>Input message, relocation. Since only the request
      entry in the input message is changed, the following shows only
      this entry:</simpara>
      <screen>
  "request": {
    "relocate_from": [
      "node3.example.com"
    ],
    "required_nodes": 1,
    "type": "relocate",
    "name": "instance2.example.com",
    "disk_space_total": 832
  },
      </screen>
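      <para>For a relocation, an allocator first has to work out which
      nodes are eligible at all. The sketch below uses the hypothetical
      name <literal>relocation_candidates</literal> and assumes that
      neither the nodes in <varname>relocate_from</varname> nor the
      nodes the instance already occupies may be returned; resource
      checks would still have to be applied to the result.</para>

```python
def relocation_candidates(message):
    """List the nodes a relocate request could move the instance to."""
    request = message["request"]
    # For a relocation the instance is found in the 'instances' entry.
    instance = message["instances"][request["name"]]
    # Assumption: skip the nodes being evacuated and the current nodes.
    excluded = set(request["relocate_from"]) | set(instance["nodes"])
    return sorted(name for name in message["nodes"] if name not in excluded)
```

      <para>With the example messages in this section, where
      <literal>instance2.example.com</literal> lives on
      <literal>node2.example.com</literal> and
      <literal>node3.example.com</literal>, the only remaining
      candidate is <literal>node1.example.com</literal>.</para>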

    </sect2>
    <sect2>
      <title>Response messages</title>
      <simpara>Successful response message:</simpara>
      <screen>
{
  "info": "Allocation successful",
  "nodes": [
    "node2.example.com",
    "node1.example.com"
  ],
  "success": true
}
      </screen>
      <simpara>Failed response message:</simpara>
      <screen>
{
  "info": "Can't find a suitable node for position 2 (already selected: node2.example.com)",
  "nodes": [],
  "success": false
}
      </screen>
    </sect2>

    <sect2>
      <title>Command line messages</title>
      <screen>
# gnt-instance add -t plain -m 2g --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance3
Selected nodes for the instance: node1.example.com
* creating instance disks...
[...]

# gnt-instance add -t plain -m 3400m --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance4
Failure: prerequisites not met for this operation:
Can't compute nodes using iallocator 'dumb-allocator': Can't find a suitable node for position 1 (already selected: )

# gnt-instance add -t drbd -m 1400m --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance5
Failure: prerequisites not met for this operation:
Can't compute nodes using iallocator 'dumb-allocator': Can't find a suitable node for position 2 (already selected: node1.example.com)
      </screen>
    </sect2>
  </sect1>

</article>