<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
]>
  <article class="specification">
  <articleinfo>
    <title>Ganeti automatic instance allocation</title>
  </articleinfo>
  <para>Documents Ganeti version 1.2</para>
  <sect1>
    <title>Introduction</title>

    <para>Currently in Ganeti the admin has to specify the exact
    locations for an instance's node(s). This prevents a completely
    automatic node evacuation, and is in general a nuisance.</para>

    <para>The <acronym>iallocator</acronym> framework will enable
    automatic placement via external scripts, which allows
    customization of the cluster layout per the site's
    requirements.</para>

  </sect1>

  <sect1>
    <title>User-visible changes</title>

    <para>There are two parts of the Ganeti operation that are
    impacted by the auto-allocation: how the cluster knows what the
    allocator algorithms are and how the admin uses these in creating
    instances.</para>

    <para>An allocation algorithm is just the filename of a program
    installed in a defined list of directories.</para>

    <sect2>
      <title>Cluster configuration</title>

      <para>At configure time, the list of the directories can be
      selected via the
      <option>--with-iallocator-search-path=LIST</option> option,
      where <userinput>LIST</userinput> is a comma-separated list of
      directories. If not given, this defaults to
      <constant>$libdir/ganeti/iallocators</constant>, i.e. for an
      installation under <filename class="directory">/usr</filename>,
      this will be <filename
      class="directory">/usr/lib/ganeti/iallocators</filename>.</para>

      <para>Ganeti will then search for the allocator script in the
      configured list, using the first one whose filename matches the
      one given by the user.</para>
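
      <para>The lookup described above can be sketched as follows.
      This is only an illustration, not Ganeti's actual code: the
      function name <function>find_allocator</function> is
      hypothetical, and the requirement that the file be executable
      is an assumption that this document does not state.</para>

```python
import os


def find_allocator(name, search_path):
    """Return the full path of the first matching allocator, or None.

    `search_path` is a list of directories, tried in the given order.
    """
    for directory in search_path:
        candidate = os.path.join(directory, name)
        # Assumption: a usable allocator is a regular, executable file.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

      <para>With the default search path, a call such as
      <literal>find_allocator("dumb-allocator",
      ["/usr/lib/ganeti/iallocators"])</literal> would return the full
      path of the script if it is installed there, and
      <literal>None</literal> otherwise.</para>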

    </sect2>

    <sect2>
      <title>Command line interface changes</title>

      <para>The node selection options in instance add and instance
      replace disks can be replaced by the new <option>--iallocator
      <replaceable>NAME</replaceable></option> option, which will
      cause the nodes to be assigned automatically. The selected
      node(s) will be shown as part of the command output.</para>

    </sect2>

  </sect1>

  <sect1>
    <title>IAllocator API</title>

    <para>The protocol for communication between Ganeti and an
    allocator script will be the following:</para>

    <orderedlist>
      <listitem>
        <simpara>Ganeti launches the program with a single argument, a
        filename that contains a JSON-encoded structure (the input
        message);</simpara>
      </listitem>
      <listitem>
        <simpara>if the script finishes with a non-zero exit code, it
        is considered a general failure and the full output will be
        reported to the users; this can be the case when the
        allocator can't parse the input message;</simpara>
      </listitem>
      <listitem>
        <simpara>if the allocator finishes with exit code zero, it is
        expected to output (on its stdout) a JSON-encoded structure
        (the response)</simpara>
      </listitem>
    </orderedlist>
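
    <para>The three steps above can be sketched from the caller's
    side as follows. This is a minimal illustration under stated
    assumptions, not Ganeti's implementation: the function name
    <function>run_allocator</function>, the use of a temporary file
    and the choice of <literal>RuntimeError</literal> for failures
    are all hypothetical.</para>

```python
import json
import os
import subprocess
import tempfile


def run_allocator(script_path, input_message):
    """Run an allocator script and return its decoded response.

    Raises RuntimeError on a non-zero exit code, including the
    script's full output in the error message.
    """
    # Step 1: store the JSON-encoded input message in a file.
    fd, input_file = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(input_message, handle)
        # The file name is the single argument given to the script.
        result = subprocess.run(
            [script_path, input_file],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    finally:
        os.unlink(input_file)
    # Step 2: a non-zero exit code is a general failure; report
    # everything the script printed.
    if result.returncode != 0:
        raise RuntimeError(
            "allocator failed with code %d: %s%s"
            % (result.returncode, result.stdout, result.stderr)
        )
    # Step 3: on exit code zero, stdout holds the JSON response.
    return json.loads(result.stdout)
```

    <para>Keeping stdout and stderr separate is a design choice of
    this sketch: it prevents diagnostic output on stderr from
    corrupting the JSON response of a successful run.</para>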

    <sect2>
      <title>Input message</title>

      <para>The input message will be the JSON encoding of a
      dictionary containing the following:</para>

      <variablelist>
        <varlistentry>
          <term>version</term>
          <listitem>
            <simpara>the version of the protocol; this document
            specifies version 1</simpara>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>cluster_name</term>
          <listitem>
            <simpara>the cluster name</simpara>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>cluster_tags</term>
          <listitem>
            <simpara>the list of cluster tags</simpara>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>request</term>
          <listitem>
            <simpara>a dictionary containing the request data:</simpara>
            <variablelist>
              <varlistentry>
                <term>type</term>
                <listitem>
                  <simpara>the request type; this can be either
                  <literal>allocate</literal> or
                  <literal>relocate</literal>; the
                  <literal>allocate</literal> request is used when a
                  new instance needs to be placed on the cluster,
                  while the <literal>relocate</literal> request is
                  used when an existing instance needs to be moved
                  within the cluster</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>name</term>
                <listitem>
                  <simpara>the name of the instance; if the request is
                  a relocation, then this name will be found in the
                  list of instances (see below), otherwise it is the
                  <acronym>FQDN</acronym> of the new
                  instance</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>required_nodes</term>
                <listitem>
                  <simpara>how many nodes the algorithm should return;
                  while this information can be deduced from the
                  instance's disk template, it's better if this
                  computation is left to Ganeti as then allocator
                  scripts are less sensitive to changes to the disk
                  templates</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>disk_space_total</term>
                <listitem>
                  <simpara>the total disk space that will be used by
                  this instance on the (new) nodes; again, this
                  information can be computed from the list of
                  instance disks and its template type, but Ganeti is
                  better suited to compute it</simpara>
                </listitem>
              </varlistentry>
            </variablelist>
            <simpara>If the request is an allocation, then there are
            extra fields in the request dictionary:</simpara>
            <variablelist>
              <varlistentry>
                <term>disks</term>
                <listitem>
                  <simpara>list of dictionaries holding the disk
                  definitions for this instance (in the order they are
                  exported to the hypervisor):</simpara>
                  <variablelist>
                    <varlistentry>
                      <term>mode</term>
                      <listitem>
                        <simpara>either <literal>r</literal> or
                        <literal>w</literal> denoting if the disk is
                        read-only or writable; for Ganeti 1.2, this
                        will always be <literal>w</literal></simpara>
                      </listitem>
                    </varlistentry>
                    <varlistentry>
                      <term>size</term>
                      <listitem>
                        <simpara>the size of this disk in mebibytes</simpara>
                      </listitem>
                    </varlistentry>
                  </variablelist>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>nics</term>
                <listitem>
                  <simpara>a list of dictionaries holding the network
                  interfaces for this instance, containing:</simpara>
                  <variablelist>
                    <varlistentry>
                      <term>ip</term>
                      <listitem>
                        <simpara>the IP address that Ganeti knows for
                        this instance, or null</simpara>
                      </listitem>
                    </varlistentry>
                    <varlistentry>
                      <term>mac</term>
                      <listitem>
                        <simpara>the MAC address for this interface</simpara>
                      </listitem>
                    </varlistentry>
                    <varlistentry>
                      <term>bridge</term>
                      <listitem>
                        <simpara>the bridge to which this interface
                        will be connected</simpara>
                      </listitem>
                    </varlistentry>
                  </variablelist>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>vcpus</term>
                <listitem>
                  <simpara>the number of VCPUs for the instance</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>disk_template</term>
                <listitem>
                  <simpara>the disk template for the instance</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>memory</term>
                <listitem>
                  <simpara>the memory size for the instance</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>os</term>
                <listitem>
                  <simpara>the OS type for the instance</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>tags</term>
                <listitem>
                  <simpara>the list of the instance's tags</simpara>
                </listitem>
              </varlistentry>
            </variablelist>
            <simpara>If the request is of type relocate, then there is
            one more entry in the request dictionary, named
            <varname>relocate_from</varname>, and it contains a list
            of nodes to move the instance away from; note that with
            Ganeti 1.2, this list will always contain a single node,
            the current secondary of the instance.</simpara>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>instances</term>
          <listitem>
            <simpara>a dictionary with the data for the currently
            existing instances on the cluster, indexed by instance
            name; the contents are similar to the instance definitions
            for the allocate mode, with the addition of:</simpara>
            <variablelist>
              <varlistentry>
                <term>should_run</term>
                <listitem>
                  <simpara>whether this instance is set to run (this
                  is not the actual status of the instance)</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>nodes</term>
                <listitem>
                  <simpara>list of nodes on which this instance is
                  placed; the primary node of the instance is always
                  the first one</simpara>
                </listitem>
              </varlistentry>
            </variablelist>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>nodes</term>
          <listitem>
            <simpara>dictionary with the data for the nodes in the
            cluster, indexed by the node name; the dict
            contains:</simpara>
            <variablelist>
              <varlistentry>
                <term>total_disk</term>
                <listitem>
                  <simpara>the total disk size of this node
                  (mebibytes)</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>free_disk</term>
                <listitem>
                  <simpara>the free disk space on the node</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>total_memory</term>
                <listitem>
                  <simpara>the total memory size</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>free_memory</term>
                <listitem>
                  <simpara>free memory on the node; note that
                  currently this does not take into account the
                  instances which are down on the node</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>total_cpus</term>
                <listitem>
                  <simpara>the physical number of CPUs present on the
                  machine; depending on the hypervisor, this might or
                  might not be equal to how many CPUs the node
                  operating system sees</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>primary_ip</term>
                <listitem>
                  <simpara>the primary IP address of the
                  node</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>secondary_ip</term>
                <listitem>
                  <simpara>the secondary IP address of the node (the
                  one used for the DRBD replication); note that this
                  can be the same as the primary one</simpara>
                </listitem>
              </varlistentry>
              <varlistentry>
                <term>tags</term>
                <listitem>
                  <simpara>list with the tags of the node</simpara>
                </listitem>
              </varlistentry>
            </variablelist>
          </listitem>
        </varlistentry>
      </variablelist>

    </sect2>

    <sect2>
      <title>Response message</title>

      <para>The response message is much simpler than the input
      one. It is also a dict having three keys:</para>
      <variablelist>
        <varlistentry>
          <term>success</term>
          <listitem>
            <simpara>a boolean value denoting if the allocation was
            successful or not</simpara>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>info</term>
          <listitem>
            <simpara>a string with information from the scripts; if
            the allocation fails, this will be shown to the
            user</simpara>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>nodes</term>
          <listitem>
            <simpara>the list of nodes computed by the algorithm; even
            if the algorithm failed (i.e. success is false), this must
            be returned as an empty list; also note that the length of
            this list must equal the
            <varname>required_nodes</varname> entry in the input
            message, otherwise Ganeti will consider the result as
            failed</simpara>
          </listitem>
        </varlistentry>
      </variablelist>
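
      <para>The rules above can be expressed as a small validation
      routine. This is a hedged sketch, not the check Ganeti itself
      performs: the name <function>validate_response</function> and
      the use of <literal>ValueError</literal> are hypothetical, and
      it assumes that the length rule applies only to successful
      responses, since a failed one must carry an empty list.</para>

```python
def validate_response(response, required_nodes):
    """Return the node list of a well-formed, successful response.

    Raises ValueError when the response is malformed, reports a
    failure, or does not contain the required number of nodes.
    """
    if not isinstance(response, dict):
        raise ValueError("response is not a dictionary")
    for key in ("success", "info", "nodes"):
        if key not in response:
            raise ValueError("missing key: %s" % key)
    nodes = response["nodes"]
    if not isinstance(nodes, list):
        raise ValueError("'nodes' is not a list")
    if not response["success"]:
        # The info string is what would be shown to the user.
        raise ValueError("allocation failed: %s" % response["info"])
    if len(nodes) != required_nodes:
        raise ValueError(
            "expected %d nodes, got %d" % (required_nodes, len(nodes))
        )
    return nodes
```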
    </sect2>
  </sect1>

  <sect1>
    <title>Examples</title>
    <sect2>
      <title>Input messages to scripts</title>
      <simpara>Input message, new instance allocation:</simpara>
      <screen>
{
  "cluster_tags": [],
  "request": {
    "required_nodes": 2,
    "name": "instance3.example.com",
    "tags": [
      "type:test",
      "owner:foo"
    ],
    "type": "allocate",
    "disks": [
      {
        "mode": "w",
        "size": 1024
      },
      {
        "mode": "w",
        "size": 2048
      }
    ],
    "nics": [
      {
        "ip": null,
        "mac": "00:11:22:33:44:55",
        "bridge": null
      }
    ],
    "vcpus": 1,
    "disk_template": "drbd",
    "memory": 2048,
    "disk_space_total": 3328,
    "os": "etch-image"
  },
  "cluster_name": "cluster1.example.com",
  "instances": {
    "instance1.example.com": {
      "tags": [],
      "should_run": false,
      "disks": [
        {
          "mode": "w",
          "size": 64
        },
        {
          "mode": "w",
          "size": 512
        }
      ],
      "nics": [
        {
          "ip": null,
          "mac": "aa:00:00:00:60:bf",
          "bridge": "xen-br0"
        }
      ],
      "vcpus": 1,
      "disk_template": "plain",
      "memory": 128,
      "nodes": [
        "node1.example.com"
      ],
      "os": "etch-image"
    },
    "instance2.example.com": {
      "tags": [],
      "should_run": false,
      "disks": [
        {
          "mode": "w",
          "size": 512
        },
        {
          "mode": "w",
          "size": 256
        }
      ],
      "nics": [
        {
          "ip": null,
          "mac": "aa:00:00:55:f8:38",
          "bridge": "xen-br0"
        }
      ],
      "vcpus": 1,
      "disk_template": "drbd",
      "memory": 512,
      "nodes": [
        "node2.example.com",
        "node3.example.com"
      ],
      "os": "etch-image"
    }
  },
  "version": 1,
  "nodes": {
    "node1.example.com": {
      "total_disk": 858276,
      "primary_ip": "192.168.1.1",
      "secondary_ip": "192.168.2.1",
      "tags": [],
      "free_memory": 3505,
      "free_disk": 856740,
      "total_memory": 4095
    },
    "node2.example.com": {
      "total_disk": 858240,
      "primary_ip": "192.168.1.3",
      "secondary_ip": "192.168.2.3",
      "tags": ["test"],
      "free_memory": 3505,
      "free_disk": 848320,
      "total_memory": 4095
    },
    "node3.example.com": {
      "total_disk": 572184,
      "primary_ip": "192.168.1.3",
      "secondary_ip": "192.168.2.3",
      "tags": [],
      "free_memory": 3505,
      "free_disk": 570648,
      "total_memory": 4095
    }
  }
}
</screen>
      <simpara>Input message, relocation. Since only the request
      entry in the input message is changed, the following shows only
      this entry:</simpara>
      <screen>
  "request": {
    "relocate_from": [
      "node3.example.com"
    ],
    "required_nodes": 1,
    "type": "relocate",
    "name": "instance2.example.com",
    "disk_space_total": 832
  },
</screen>

    </sect2>
    <sect2>
      <title>Response messages</title>
      <simpara>Successful response message:</simpara>
      <screen>
{
  "info": "Allocation successful",
  "nodes": [
    "node2.example.com",
    "node1.example.com"
  ],
  "success": true
}
</screen>
      <simpara>Failed response message:</simpara>
      <screen>
{
  "info": "Can't find a suitable node for position 2 (already selected: node2.example.com)",
  "nodes": [],
  "success": false
}
</screen>
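
      <simpara>To tie the input and response examples together, the
      following is a minimal sketch of an allocator script. It is not
      an allocator shipped with Ganeti, and the selection policy
      (enough free memory and disk, most free memory first) and the
      name <function>choose_nodes</function> are assumptions made
      for illustration only.</simpara>

```python
import json
import sys


def choose_nodes(message):
    """Build a response dictionary for the given input message."""
    request = message["request"]
    nodes = message["nodes"]
    excluded = set()
    if request["type"] == "relocate":
        # A relocate request carries no memory size; take it from
        # the existing instance and skip the nodes it already uses.
        instance = message["instances"][request["name"]]
        memory = instance["memory"]
        excluded.update(instance["nodes"])
        excluded.update(request["relocate_from"])
    else:
        memory = request["memory"]
    disk = request["disk_space_total"]
    candidates = [
        name for name, node in nodes.items()
        if name not in excluded
        and node["free_memory"] >= memory
        and node["free_disk"] >= disk
    ]
    # Hypothetical policy: most free memory first, then by name so
    # that the result is deterministic.
    candidates.sort(key=lambda name: (-nodes[name]["free_memory"], name))
    wanted = request["required_nodes"]
    if wanted > len(candidates):
        # A failed response must still contain an empty node list.
        return {"success": False,
                "info": "Not enough suitable nodes",
                "nodes": []}
    return {"success": True,
            "info": "Allocation successful",
            "nodes": candidates[:wanted]}


def main(argv):
    # The single argument is the file holding the input message.
    with open(argv[1]) as handle:
        message = json.load(handle)
    json.dump(choose_nodes(message), sys.stdout)
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

      <simpara>Installed as an executable file in one of the
      configured directories, such a script would be selected by
      passing its filename to the <option>--iallocator</option>
      option.</simpara>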
    </sect2>

    <sect2>
      <title>Command line messages</title>
      <screen>
# gnt-instance add -t plain -m 2g --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance3
Selected nodes for the instance: node1.example.com
* creating instance disks...
[...]

# gnt-instance add -t plain -m 3400m --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance4
Failure: prerequisites not met for this operation:
Can't compute nodes using iallocator 'dumb-allocator': Can't find a suitable node for position 1 (already selected: )

# gnt-instance add -t drbd -m 1400m --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance5
Failure: prerequisites not met for this operation:
Can't compute nodes using iallocator 'dumb-allocator': Can't find a suitable node for position 2 (already selected: node1.example.com)
</screen>
    </sect2>
  </sect1>

  </article>