Revision c71a1a3d doc/admin.rst

b/doc/admin.rst
5 5

  
6 6
.. contents::
7 7

  
8
.. highlight:: text
9

  
8 10
Introduction
9 11
------------
10 12

  
11
Ganeti is a virtualization cluster management software. You are
12
expected to be a system administrator familiar with your Linux
13
distribution and the Xen or KVM virtualization environments before
14
using it.
15

  
13
Ganeti is virtualization cluster management software. You are expected
14
to be a system administrator familiar with your Linux distribution and
15
the Xen or KVM virtualization environments before using it.
16 16

  
17 17
The various components of Ganeti all have man pages and interactive
18
help. This manual though will help you getting familiar with the
19
system by explaining the most common operations, grouped by related
20
use.
18
help. This manual, though, will help you get familiar with the system
19
by explaining the most common operations, grouped by related use.
21 20

  
22 21
After a terminology glossary and a section on the prerequisites needed
23
to use this manual, the rest of this document is divided in three main
24
sections, which group different features of Ganeti:
22
to use this manual, the rest of this document is divided into sections
23
for the different targets that a command affects: instances, nodes, etc.
25 24

  
26
- Instance Management
27
- High Availability Features
28
- Debugging Features
25
.. _terminology-label:
29 26

  
30 27
Ganeti terminology
31
~~~~~~~~~~~~~~~~~~
28
++++++++++++++++++
32 29

  
33
This section provides a small introduction to Ganeti terminology,
34
which might be useful to read the rest of the document.
30
This section provides a small introduction to Ganeti terminology, which
31
might be useful when reading the rest of the document.
35 32

  
36 33
Cluster
37
  A set of machines (nodes) that cooperate to offer a coherent
38
  highly available virtualization service.
34
~~~~~~~
39 35

  
40
Node
41
  A physical machine which is member of a cluster.
42
  Nodes are the basic cluster infrastructure, and are
43
  not fault tolerant.
36
A set of machines (nodes) that cooperate to offer a coherent, highly
37
available virtualization service under a single administration domain.
44 38

  
45
Master node
46
  The node which controls the Cluster, from which all
47
  Ganeti commands must be given.
39
Node
40
~~~~
41

  
42
A physical machine which is a member of a cluster. Nodes are the basic
43
cluster infrastructure, and they don't need to be fault tolerant in
44
order to achieve high availability for instances.
45

  
46
Nodes can be added and removed (if they host no instances) at will from
47
the cluster. In an HA cluster and only with HA instances, the loss of any
48
single node will not cause disk data loss for any instance; of course,
49
a node crash will cause the crash of its primary instances.
50

  
51
A node belonging to a cluster can be in one of the following roles at a
52
given time:
53

  
54
- *master* node, which is the node from which the cluster is controlled
55
- *master candidate* node, only nodes in this role have the full cluster
56
  configuration and knowledge, and only master candidates can become the
57
  master node
58
- *regular* node, which is the state in which most nodes will be on
59
  bigger clusters (>20 nodes)
60
- *drained* node, nodes in this state are functioning normally but they
61
  cannot receive new instances; the intention is that nodes in this role
62
  have some issue and they are being evacuated for hardware repairs
63
- *offline* node, in which there is a record in the cluster
64
  configuration about the node, but the daemons on the master node will
65
  not talk to this node; any instances declared as having an offline
66
  node as either primary or secondary will be flagged as an error in the
67
  cluster verify operation
68

  
69
Depending on the role, each node will run a set of daemons:
70

  
71
- the :command:`ganeti-noded` daemon, which controls the manipulation of
72
  this node's hardware resources; it runs on all nodes which are in a
73
  cluster
74
- the :command:`ganeti-confd` daemon (Ganeti 2.1+) which runs on all
75
  nodes, but is only functional on master candidate nodes
76
- the :command:`ganeti-rapi` daemon which runs on the master node and
77
  offers an HTTP-based API for the cluster
78
- the :command:`ganeti-masterd` daemon which runs on the master node and
79
  allows control of the cluster
48 80

  
49 81
Instance
50
  A virtual machine which runs on a cluster. It can be a
51
  fault tolerant highly available entity.
82
~~~~~~~~
52 83

  
53
Pool
54
  A pool is a set of clusters sharing the same network.
84
A virtual machine which runs on a cluster. It can be a fault tolerant,
85
highly available entity.
55 86

  
56
Meta-Cluster
57
  Anything that concerns more than one cluster.
87
An instance has various parameters, which are classified into three
88
categories: hypervisor-related parameters (called ``hvparams``), general
89
parameters (called ``beparams``) and per network-card parameters (called
90
``nicparams``). All these parameters can be modified either at instance
91
level or via defaults at cluster level.
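
As a quick sketch of the two levels (the values used here are arbitrary
examples), the same ``memory`` backend parameter could be changed for a
single instance or as a cluster-wide default::

  # change the parameter for one instance only
  gnt-instance modify -B memory=512 instance1
  # change the cluster-wide default
  gnt-cluster modify -B memory=256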
58 92

  
59
Prerequisites
93
Disk template
60 94
~~~~~~~~~~~~~
61 95

  
62
You need to have your Ganeti cluster installed and configured before
63
you try any of the commands in this document. Please follow the
64
*Ganeti installation tutorial* for instructions on how to do that.
96
There are multiple options for the storage provided to an instance; while
97
the instance sees the same virtual drive in all cases, the node-level
98
configuration varies between them.
65 99

  
66
Managing Instances
67
------------------
100
There are four disk templates you can choose from:
68 101

  
69
Adding/Removing an instance
70
~~~~~~~~~~~~~~~~~~~~~~~~~~~
102
diskless
103
  The instance has no disks. Only used for special purpose operating
104
  systems or for testing.
105

  
106
file
107
  The instance will use plain files as backend for its disks. No
108
  redundancy is provided, and this is somewhat more difficult to
109
  configure for high performance.
110

  
111
plain
112
  The instance will use LVM devices as backend for its disks. No
113
  redundancy is provided.
114

  
115
drbd
116
  .. note:: This is only valid for multi-node clusters using DRBD 8.0+
117

  
118
  A mirror is set between the local node and a remote one, which must be
119
  specified with the second value of the --node option. Use this option
120
  to obtain a highly available instance that can be failed over to a
121
  remote node should the primary one fail.
122

  
123
IAllocator
124
~~~~~~~~~~
125

  
126
A framework for using external (user-provided) scripts to compute the
127
placement of instances on the cluster nodes. This eliminates the need to
128
manually specify nodes in instance add, instance moves, node evacuate,
129
etc.
130

  
131
In order for Ganeti to be able to use these scripts, they must be placed
132
in the iallocator directory (usually ``lib/ganeti/iallocators`` under
133
the installation prefix, e.g. ``/usr/local``).
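
For example, instead of naming nodes explicitly with ``-n``, an instance
add can delegate the placement decision to such a script; the sketch
below assumes the ``hail`` iallocator (from the ganeti-htools project,
described later in this document) is installed::

  gnt-instance add -I hail -o debootstrap -t drbd -s 10G instance2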
134

  
135
“Primary” and “secondary” concepts
136
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
137

  
138
An instance has a primary and, depending on the disk configuration, might
139
also have a secondary node. The instance always runs on the primary node
140
and only uses its secondary node for disk replication.
141

  
142
Similarly, the terms primary and secondary instances, when talking
143
about a node, refer to the set of instances having the given node as
144
primary, respectively secondary.
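
A quick way to see these two sets for each node is the node list output;
this is only a sketch and assumes the ``pinst_cnt`` and ``sinst_cnt``
field names are available::

  gnt-node list -o name,pinst_cnt,sinst_cnt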
145

  
146
Tags
147
~~~~
148

  
149
Tags are short strings that can be attached either to the cluster itself,
150
or to nodes or instances. They are useful as a very simplistic
151
information store for helping with cluster administration, for example
152
by attaching owner information to each instance after it's created::
71 153

  
72
Adding a new virtual instance to your Ganeti cluster is really easy.
73
The command is::
154
  gnt-instance add … instance1
155
  gnt-instance add-tags instance1 owner:user2
156

  
157
And then by listing each instance and its tags, this information could
158
be used for contacting the users of each instance.
159

  
160
Jobs and OpCodes
161
~~~~~~~~~~~~~~~~
162

  
163
While not directly visible by an end-user, it's useful to know that a
164
basic cluster operation (e.g. starting an instance) is represented
165
internally by Ganeti as an *OpCode* (abbreviation of operation
166
code). These OpCodes are executed as part of a *Job*. The OpCodes in a
167
single Job are processed serially by Ganeti, but different Jobs will be
168
processed (depending on resource availability) in parallel.
169

  
170
For example, shutting down the entire cluster can be done by running the
171
command ``gnt-instance shutdown --all``, which will submit for each
172
instance a separate job containing the “shutdown instance” OpCode.
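
A minimal illustration of this relationship, using commands described
later in this document::

  # each instance shutdown is submitted as a separate job
  gnt-instance shutdown --all
  # the resulting jobs and their status can then be inspected
  gnt-job list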
173

  
174

  
175
Prerequisites
176
+++++++++++++
177

  
178
You need to have your Ganeti cluster installed and configured before you
179
try any of the commands in this document. Please follow the
180
:doc:`install` for instructions on how to do that.
181

  
182
Instance management
183
-------------------
184

  
185
Adding an instance
186
++++++++++++++++++
187

  
188
The add operation might seem complex due to the many parameters it
189
accepts, but once you have understood the (few) required parameters and
190
the customisation capabilities you will see it is an easy operation.
191

  
192
The add operation requires at minimum five parameters:
193

  
194
- the OS for the instance
195
- the disk template
196
- the disk count and size
197
- the node specification or alternatively the iallocator to use
198
- and finally the instance name
199

  
200
The OS for the instance must be visible in the output of the command
201
``gnt-os list`` and specifies which guest OS to install on the instance.
202

  
203
The disk template specifies what kind of storage to use as backend for
204
the (virtual) disks presented to the instance; note that for instances
205
with multiple virtual disks, they all must be of the same type.
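
As a sketch (disk sizes and names here are arbitrary examples), multiple
disks are specified with one ``--disk`` option per disk instead of the
single ``-s`` size option::

  gnt-instance add -n node1 -o debootstrap -t plain \
    --disk 0:size=10G --disk 1:size=20G instance3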
206

  
207
The node(s) on which the instance will run can be given either manually,
208
via the ``-n`` option, or computed automatically by Ganeti, if you have
209
installed any iallocator script.
210

  
211
With the above parameters in mind, the command is::
74 212

  
75 213
  gnt-instance add \
76
    -n TARGET_NODE:SECONDARY_NODE -o OS_TYPE -t DISK_TEMPLATE \
214
    -n TARGET_NODE:SECONDARY_NODE \
215
    -o OS_TYPE \
216
    -t DISK_TEMPLATE -s DISK_SIZE \
77 217
    INSTANCE_NAME
78 218

  
79 219
The instance name must be resolvable (e.g. exist in DNS) and usually
80
to an address in the same subnet as the cluster itself. Options you
81
can give to this command include:
220
points to an address in the same subnet as the cluster itself.
82 221

  
83
- The disk size (``-s``) for a single-disk instance, or multiple
84
  ``--disk N:size=SIZE`` options for multi-instance disks
222
The above command has the minimum required options; other options you
223
can give include, among others:
85 224

  
86 225
- The memory size (``-B memory``)
87 226

  
......
91 230
  instance is created. The IP and/or bridge of the NIC can be changed
92 231
  via ``--nic 0:ip=IP,bridge=BRIDGE``
93 232

  
233
See the manpage for gnt-instance for the detailed option list.
94 234

  
95
There are four types of disk template you can choose from:
96

  
97
diskless
98
  The instance has no disks. Only used for special purpouse operating
99
  systems or for testing.
100

  
101
file
102
  The instance will use plain files as backend for its disks. No
103
  redundancy is provided, and this is somewhat more difficult to
104
  configure for high performance.
105

  
106
plain
107
  The instance will use LVM devices as backend for its disks. No
108
  redundancy is provided.
109

  
110
drbd
111
  .. note:: This is only valid for multi-node clusters using DRBD 8.0.x
112

  
113
  A mirror is set between the local node and a remote one, which must
114
  be specified with the second value of the --node option. Use this
115
  option to obtain a highly available instance that can be failed over
116
  to a remote node should the primary one fail.
235
For example, if you want to create a highly available instance, with a
236
single disk of 50GB and the default memory size, having primary node
237
``node1`` and secondary node ``node3``, use the following command::
117 238

  
118
For example if you want to create an highly available instance use the
119
drbd disk templates::
239
  gnt-instance add -n node1:node3 -o debootstrap -t drbd -s 50G \
240
    instance1
120 241

  
121
  gnt-instance add -n TARGET_NODE:SECONDARY_NODE -o OS_TYPE -t drbd \
122
    INSTANCE_NAME
242
There is also a command for batch instance creation from a
243
specification file, see the ``batch-create`` operation in the
244
gnt-instance manual page.
123 245

  
124
To know which operating systems your cluster supports you can use
125
the command::
246
Regular instance operations
247
+++++++++++++++++++++++++++
126 248

  
127
  gnt-os list
249
Removal
250
~~~~~~~
128 251

  
129
Removing an instance is even easier than creating one. This operation
130
is irrereversible and destroys all the contents of your instance. Use
131
with care::
252
Removing an instance is even easier than creating one. This operation is
253
irreversible and destroys all the contents of your instance. Use with
254
care::
132 255

  
133 256
  gnt-instance remove INSTANCE_NAME
134 257

  
135
Starting/Stopping an instance
136
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
258
Startup/shutdown
259
~~~~~~~~~~~~~~~~
137 260

  
138 261
Instances are automatically started at instance creation time. To
139 262
manually start one which is currently stopped you can run::
......
144 267

  
145 268
  gnt-instance shutdown INSTANCE_NAME
146 269

  
270
.. warning:: Do not use the Xen or KVM commands directly to stop
271
   instances. If you run for example ``xm shutdown`` or ``xm destroy``
272
   on an instance Ganeti will automatically restart it (via the
273
   :command:`ganeti-watcher` command which is launched via cron).
274

  
275
Querying instances
276
~~~~~~~~~~~~~~~~~~
277

  
278
There are two ways to get information about instances: listing
279
instances, which produces a tabular output containing a given set of fields
280
about each instance, and querying detailed information about a set of
281
instances.
282

  
147 283
The command to see all the instances configured and their status is::
148 284

  
149 285
  gnt-instance list
150 286

  
151
Do not use the Xen commands to stop instances. If you run for example
152
xm shutdown or xm destroy on an instance Ganeti will automatically
153
restart it (via the ``ganeti-watcher``).
287
The command can return a custom set of information when using the ``-o``
288
option (as always, check the manpage for a detailed specification). Each
289
instance will be represented on a line, thus making it easy to parse
290
this output via the usual shell utilities (grep, sed, etc.).
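
For example, a custom field selection might look like the following
sketch (the exact field names are documented in the manpage; the ones
used here are assumptions)::

  gnt-instance list -o name,os,pnode,snodes,status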
291

  
292
To get more detailed information about an instance, you can run::
293

  
294
  gnt-instance info INSTANCE
295

  
296
which will give a multi-line block of information about the instance,
297
its hardware resources (especially its disks and their redundancy
298
status), etc. This is harder to parse and is more expensive than the
299
list operation, but returns much more detailed information.
154 300

  
155
Exporting/Importing an instance
156
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
157 301

  
158
You can create a snapshot of an instance disk and Ganeti
302
Export/Import
303
+++++++++++++
304

  
305
You can create a snapshot of an instance disk and its Ganeti
159 306
configuration, which you can then back up, or import into another
160 307
cluster. The way to export an instance is::
161 308

  
162 309
  gnt-backup export -n TARGET_NODE INSTANCE_NAME
163 310

  
311

  
164 312
The target node can be any node in the cluster with enough space under
165
``/srv/ganeti`` to hold the instance image. Use the *--noshutdown*
166
option to snapshot an instance without rebooting it. Any previous
167
snapshot of the same instance existing cluster-wide under
168
``/srv/ganeti`` will be removed by this operation: if you want to keep
169
them move them out of the Ganeti exports directory.
313
``/srv/ganeti`` to hold the instance image. Use the ``--noshutdown``
314
option to snapshot an instance without rebooting it. Note that Ganeti
315
only keeps one snapshot for an instance - any previous snapshot of the
316
same instance existing cluster-wide under ``/srv/ganeti`` will be
317
removed by this operation: if you want to keep them, you need to move
318
them out of the Ganeti exports directory.
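
To check which exports currently exist, there is also a listing command;
shown here only as a sketch (the ``--node`` restriction is optional)::

  gnt-backup list --node TARGET_NODE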
170 319

  
171
Importing an instance is similar to creating a new one. The command is::
320
Importing an instance is similar to creating a new one, but additionally
321
one must specify the location of the snapshot. The command is::
172 322

  
173 323
  gnt-backup import -n TARGET_NODE -t DISK_TEMPLATE \
174 324
    --src-node=NODE --src-dir=DIR INSTANCE_NAME
......
176 326
Most of the options available for the command :command:`gnt-instance
177 327
add` are supported here too.
178 328

  
179
High availability features
180
--------------------------
329
Instance HA features
330
--------------------
181 331

  
182 332
.. note:: This section only applies to multi-node clusters
183 333

  
334
.. _instance-change-primary-label:
335

  
336
Changing the primary node
337
+++++++++++++++++++++++++
338

  
339
There are three ways to exchange an instance's primary and secondary
340
nodes; the right one to choose depends on how the instance has been
341
created and the status of its current primary node. See
342
:ref:`rest-redundancy-label` for information on changing the secondary
343
node. Note that it's only possible to change the primary node to the
344
secondary and vice-versa; a direct change of the primary node to a
345
third node, while keeping the current secondary, is not possible in a
346
single step, only via multiple operations as detailed in
347
:ref:`instance-relocation-label`.
348

  
184 349
Failing over an instance
185 350
~~~~~~~~~~~~~~~~~~~~~~~~
186 351

  
......
192 357
  gnt-instance failover INSTANCE_NAME
193 358

  
194 359
That's it. After the command completes the secondary node is now the
195
primary, and vice versa.
360
primary, and vice-versa.
196 361

  
197 362
Live migrating an instance
198 363
~~~~~~~~~~~~~~~~~~~~~~~~~~
199 364

  
200
If an instance is built in highly available mode, it currently runs
201
and both its nodes are running fine, you can at migrate it over to its
202
secondary node, without dowtime. On the master node you need to run::
365
If an instance is built in highly available mode, it currently runs and
366
both its nodes are running fine, you can migrate it over to its
367
secondary node, without downtime. On the master node you need to run::
203 368

  
204 369
  gnt-instance migrate INSTANCE_NAME
205 370

  
206
Replacing an instance disks
207
~~~~~~~~~~~~~~~~~~~~~~~~~~~
371
The current load on the instance and its memory size will influence how
372
long the migration will take. In any case, for both KVM and Xen
373
hypervisors, the migration will be transparent to the instance.
208 374

  
209
So what if instead the secondary node for an instance has failed, or
210
you plan to remove a node from your cluster, and you failed over all
211
its instances, but it's still secondary for some? The solution here is
212
to replace the instance disks, changing the secondary node::
375
Moving an instance (offline)
376
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
213 377

  
214
  gnt-instance replace-disks -n NODE INSTANCE_NAME
378
If an instance has not been created as mirrored, then the only way to
379
change its primary node is to execute the move command::
215 380

  
216
This process is a bit long, but involves no instance downtime, and at
217
the end of it the instance has changed its secondary node, to which it
218
can if necessary be failed over.
381
  gnt-instance move -n NEW_NODE INSTANCE
219 382

  
220
Failing over the master node
221
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
383
This has a few prerequisites:
222 384

  
223
This is all good as long as the Ganeti Master Node is up. Should it go
224
down, or should you wish to decommission it, just run on any other
225
node the command::
385
- the instance must be stopped
386
- its current primary node must be on-line and healthy
387
- the disks of the instance must not have any errors
226 388

  
227
  gnt-cluster masterfailover
389
Since this operation actually copies the data from the old node to the
390
new node, expect it to take time proportional to the size of the instance's
391
disks and the speed of both the nodes' I/O system and their networking.
228 392

  
229
and the node you ran it on is now the new master.
393
Disk operations
394
+++++++++++++++
230 395

  
231
Adding/Removing nodes
232
~~~~~~~~~~~~~~~~~~~~~
396
Disk failures are a common cause of errors in any server
397
deployment. Ganeti offers protection from single-node failure if your
398
instances were created in HA mode, and it also offers ways to restore
399
redundancy after a failure.
233 400

  
234
And of course, now that you know how to move instances around, it's
235
easy to free up a node, and then you can remove it from the cluster::
401
Preparing for disk operations
402
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
236 403

  
237
  gnt-node remove NODE_NAME
404
It is important to note that for Ganeti to be able to do any disk
405
operation, the Linux machines on top of which Ganeti runs must be consistent;
406
for LVM, this means that the LVM commands must not return failures; it
407
is common that after a complete disk failure, any LVM command aborts
408
with an error similar to::
238 409

  
239
and maybe add a new one::
410
  # vgs
411
  /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
412
  /dev/sdb1: read failed after 0 of 4096 at 750153695232: Input/output
413
  error
414
  /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
415
  Couldn't find device with uuid
416
  't30jmN-4Rcf-Fr5e-CURS-pawt-z0jU-m1TgeJ'.
417
  Couldn't find all physical volumes for volume group xenvg.
240 418

  
241
  gnt-node add --secondary-ip=ADDRESS NODE_NAME
419
Before restoring an instance's disks to healthy status, you need to
420
fix the volume group used by Ganeti so that you can actually create and
421
manage the logical volumes. This is usually done in a multi-step
422
process:
242 423

  
243
Debugging Features
244
------------------
424
#. first, if the disk is completely gone and LVM commands exit with
425
   “Couldn't find device with uuid…” then you need to run the command::
426

  
427
    vgreduce --removemissing VOLUME_GROUP
428

  
429
#. after the above command, the LVM commands should be executing
430
   normally (warnings are normal, but the commands will not fail
431
   completely).
432

  
433
#. if the failed disk is still visible in the output of the ``pvs``
434
   command, you need to deactivate it from allocations by running::
435

  
436
    pvchange -x n /dev/DISK
245 437

  
246
At some point you might need to do some debugging operations on your
247
cluster or on your instances. This section will help you with the most
248
used debugging functionalities.
438
At this point, the volume group should be consistent and any bad
439
physical volumes should no longer be available for allocation.
440

  
441
Note that since version 2.1 Ganeti provides some commands to automate
442
these two operations, see :ref:`storage-units-label`.
443

  
444
.. _rest-redundancy-label:
445

  
446
Restoring redundancy for DRBD-based instances
447
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
448

  
449
A DRBD instance has two nodes, and the storage on one of them has
450
failed. Depending on which node (primary or secondary) has failed, you
451
have three options at hand:
452

  
453
- if the storage on the primary node has failed, you need to re-create
454
  the disks on it
455
- if the storage on the secondary node has failed, you can either
456
  re-create the disks on it or change the secondary and recreate
457
  redundancy on the new secondary node
458

  
459
Of course, at any point it's possible to force re-creation of disks even
460
though everything is already fine.
461

  
462
For all three cases, the ``replace-disks`` operation can be used::
463

  
464
  # re-create disks on the primary node
465
  gnt-instance replace-disks -p INSTANCE_NAME
466
  # re-create disks on the current secondary
467
  gnt-instance replace-disks -s INSTANCE_NAME
468
  # change the secondary node, via manual specification
469
  gnt-instance replace-disks -n NODE INSTANCE_NAME
470
  # change the secondary node, via an iallocator script
471
  gnt-instance replace-disks -I SCRIPT INSTANCE_NAME
472
  # since Ganeti 2.1: automatically fix the primary or secondary node
473
  gnt-instance replace-disks -a INSTANCE_NAME
474

  
475
Since the process involves copying all data from the working node to the
476
target node, it will take a while, depending on the instance's disk
477
size, node I/O system and network speed. But it is (barring any network
478
interruption) completely transparent for the instance.
479

  
480
Re-creating disks for non-redundant instances
481
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
482

  
483
.. versionadded:: 2.1
484

  
485
For non-redundant instances, there isn't a copy (except backups) to
486
re-create the disks. But it's possible to at least re-create empty
487
disks, after which a reinstall can be run, via the ``recreate-disks``
488
command::
489

  
490
  gnt-instance recreate-disks INSTANCE
491

  
492
Note that this will fail if the disks already exist.
493

  
494
Debugging instances
495
+++++++++++++++++++
249 496

  
250 497
Accessing an instance's disks
251 498
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
252 499

  
253
From an instance's primary node you have access to its disks. Never
500
From an instance's primary node you can have access to its disks. Never
254 501
ever mount the underlying logical volume manually on a fault tolerant
255
instance, or you risk breaking replication. The correct way to access
256
them is to run the command::
502
instance, or you will break replication and your data will be
503
inconsistent. The correct way to access an instance's disks is to run
504
(on the master node, as usual) the command::
505

  
506
  gnt-instance activate-disks INSTANCE
507

  
508
And then, *on the primary node of the instance*, access the device that
509
gets created. For example, you could mount the given disks, then edit
510
files on the filesystem, etc.
511

  
512
Note that with partitioned disks (as opposed to whole-disk filesystems),
513
you will need to use a tool like :manpage:`kpartx(8)`::
257 514

  
258
  gnt-instance activate-disks INSTANCE_NAME
515
  node1# gnt-instance activate-disks instance1
516
517
  node1# ssh node3
518
  node3# kpartx -l /dev/…
519
  node3# kpartx -a /dev/…
520
  node3# mount /dev/mapper/… /mnt/
521
  # edit files under mnt as desired
522
  node3# umount /mnt/
523
  node3# kpartx -d /dev/…
524
  node3# exit
525
  node1#
259 526

  
260
And then access the device that gets created.  After you've finished
261
you can deactivate them with the deactivate-disks command, which works
262
in the same way.
527
After you've finished you can deactivate them with the deactivate-disks
528
command, which works in the same way::
529

  
530
  gnt-instance deactivate-disks INSTANCE
531

  
532
Note that if any process started by you is still using the disks, the
533
above command will error out, and you **must** clean up and ensure that
534
the above command runs successfully before you start the instance,
535
otherwise the instance will suffer corruption.
263 536

  
264 537
Accessing an instance's console
265 538
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......
268 541

  
269 542
  gnt-instance console INSTANCE_NAME
270 543

  
271
Use the console normally and then type ``^]`` when
272
done, to exit.
544
Use the console normally and then type ``^]`` when done, to exit.
545

  
546
Other instance operations
547
+++++++++++++++++++++++++
548

  
549
Reboot
550
~~~~~~
273 551

  
274
Instance OS definitions Debugging
275
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
552
There is a wrapper command for rebooting instances::
276 553

  
277
Should you have any problems with operating systems support the
278
command to ran to see a complete status for all your nodes is::
554
  gnt-instance reboot instance2
555

  
556
By default, this does the equivalent of shutting down and then starting
557
the instance, but it accepts parameters to perform a soft-reboot (via
558
the hypervisor), a hard reboot (hypervisor shutdown and then startup) or
559
a full one (the default, which also de-configures and then re-configures
560
the disks of the instance).
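
A sketch of selecting the reboot type explicitly (assuming the ``--type``
option and the value names used below)::

  # soft reboot, handled entirely by the hypervisor
  gnt-instance reboot --type=soft instance2
  # hard reboot: hypervisor shutdown and then startup
  gnt-instance reboot --type=hard instance2
  # full reboot (the default)
  gnt-instance reboot --type=full instance2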
561

  
562
Instance OS definitions debugging
563
+++++++++++++++++++++++++++++++++
564

  
565
Should you have any problems with instance operating systems, the command
566
to see a complete status for all your nodes is::
279 567

  
280 568
   gnt-os diagnose
281 569

  
282
Cluster-wide debugging
283
~~~~~~~~~~~~~~~~~~~~~~
570
.. _instance-relocation-label:
284 571

  
285
The :command:`gnt-cluster` command offers several options to run tests
286
or execute cluster-wide operations. For example::
572
Instance relocation
573
~~~~~~~~~~~~~~~~~~~
287 574

  
288
  gnt-cluster command
289
  gnt-cluster copyfile
290
  gnt-cluster verify
291
  gnt-cluster verify-disks
292
  gnt-cluster getmaster
293
  gnt-cluster version
575
While it is not possible to move an instance from nodes ``(A, B)`` to
576
nodes ``(C, D)`` in a single move, it is possible to do so in a few
577
steps::
294 578

  
295
See the man page :manpage:`gnt-cluster` to know more about their usage.
579
  # instance is located on A, B
580
  node1# gnt-instance replace-disks -n nodeC instance1
581
  # instance has moved from (A, B) to (A, C)
582
  # we now flip the primary/secondary nodes
583
  node1# gnt-instance migrate instance1
584
  # instance lives on (C, A)
585
  # we can then change A to D via:
586
  node1# gnt-instance replace -n nodeD instance1
296 587

  
297
Removing a cluster entirely
588
Which brings it into the final configuration of ``(C, D)``. Note that we
589
needed to do two replace-disks operations (two copies of the instance
590
disks), because we needed to get rid of both the original nodes (A and
591
B).
592

  
593
Node operations
594
---------------
595

  
596
There are far fewer node operations available than for instances, but
597
they are equally important for maintaining a healthy cluster.
598

  
599
Add/readd
600
+++++++++
601

  
602
It is at any time possible to extend the cluster with one more node, by
603
using the node add operation::
604

  
605
  gnt-node add NEW_NODE
606

  
607
If the cluster has a replication network defined, then you need to pass
608
the ``-s REPLICATION_IP`` parameter to this command.
609

  
610
A variation of this command can be used to re-configure a node if its
611
Ganeti configuration is broken, for example if it has been reinstalled
612
by mistake::
613

  
614
  gnt-node add --readd EXISTING_NODE
615

  
616
This will reinitialise the node as if it had been newly added, while
617
keeping its existing configuration in the cluster (primary/secondary IP,
618
etc.); in other words, you won't need to use ``-s`` here.
619

  
620
Changing the node role
621
++++++++++++++++++++++
622

  
623
A node can be in different roles, as explained in the
624
:ref:`terminology-label` section. Promoting a node to the master role is
625
special, while all the other roles are handled via a single command.
626

  
627
Failing over the master node
628
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
629

  
630
If you want to promote a different node to the master role (for whatever
631
reason), run on any other master-candidate node the command::
632

  
633
  gnt-cluster masterfailover
634

  
635
and the node you ran it on is now the new master. In case you try to run
636
this on a non master-candidate node, you will get an error telling you
637
which nodes are valid.
638

  
639
Changing between the other roles
640
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
641

  
642
The ``gnt-node modify`` command can be used to select a new role::
643

  
644
  # change to master candidate
645
  gnt-node modify -C yes NODE
646
  # change to drained status
647
  gnt-node modify -D yes NODE
648
  # change to offline status
649
  gnt-node modify -O yes NODE
650
  # change to regular mode (reset all flags)
651
  gnt-node modify -O no -D no -C no NODE
652

  
653
Note that the cluster requires that at any point in time, a certain
654
number of nodes are master candidates, so changing from master candidate
655
to other roles might fail. It is recommended to either force the
656
operation (via the ``--force`` option) or first change the number of
657
master candidates in the cluster - see :ref:`cluster-config-label`.
658

  
659
Evacuating nodes
660
++++++++++++++++
661

  
662
There are two steps of moving instances off a node:
663

  
664
- moving the primary instances (actually converting them into secondary
665
  instances)
666
- moving the secondary instances (including any instances converted in
667
  the step above)
668

  
669
Primary instance conversion
670
~~~~~~~~~~~~~~~~~~~~~~~~~~~
671

  
672
For this step, you can use either individual instance move
673
commands (as seen in :ref:`instance-change-primary-label`) or the bulk
674
per-node versions; these are::
675

  
676
  gnt-node migrate NODE
677
  gnt-node evacuate NODE
678

  
679
Note that the instance “move” command doesn't currently have a node
680
equivalent.
681

  
682
Both these commands, or the equivalent per-instance command, will make
683
this node the secondary node for the respective instances, whereas their
684
current secondary node will become primary. Note that it is not possible
685
to change, in one step, the primary node to a third node while
686
keeping the same secondary node.
687

  
688
Secondary instance evacuation
689
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
690

  
691
For the evacuation of secondary instances, a command called
692
:command:`gnt-node evacuate` is provided and its syntax is::
693

  
694
  gnt-node evacuate -I IALLOCATOR_SCRIPT NODE
695
  gnt-node evacuate -n DESTINATION_NODE NODE
696

  
697
The first version will compute the new secondary for each instance in
698
turn using the given iallocator script, whereas the second one will
699
simply move all instances to DESTINATION_NODE.
700

  
701
Removal
702
+++++++
703

  
704
Once a node no longer has any instances (neither primary nor secondary),
705
it's easy to remove it from the cluster::
706

  
707
  gnt-node remove NODE_NAME
708

  
709
This will deconfigure the node, stop the ganeti daemons on it and
710
hopefully leave it as it was before it joined the cluster.
711

  
712
Storage handling
713
++++++++++++++++
714

  
715
When using LVM (either standalone or with DRBD), it can become tedious
716
to debug and fix it in case of errors. Furthermore, even file-based
717
storage can become complicated to handle manually on many hosts. Ganeti
718
provides a couple of commands to help with automation.
719

  
720
Logical volumes
721
~~~~~~~~~~~~~~~
722

  
723
This is a command specific to LVM handling. It allows listing the
724
logical volumes on a given node or on all nodes and their association to
725
instances via the ``volumes`` command::
726

  
727
  node1# gnt-node volumes
728
  Node  PhysDev   VG    Name             Size Instance
729
  node1 /dev/sdb1 xenvg e61fbc97-….disk0 512M instance17
730
  node1 /dev/sdb1 xenvg ebd1a7d1-….disk0 512M instance19
731
  node2 /dev/sdb1 xenvg 0af08a3d-….disk0 512M instance20
732
  node2 /dev/sdb1 xenvg cc012285-….disk0 512M instance16
733
  node2 /dev/sdb1 xenvg f0fac192-….disk0 512M instance18
734

  
735
The above command maps each logical volume to a volume group and
736
underlying physical volume and (possibly) to an instance.
737

  
738
.. _storage-units-label:
739

  
740
Generalized storage handling
741
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
742

  
743
.. versionadded:: 2.1
744

  
745
Starting with Ganeti 2.1, a new storage framework has been implemented
746
that tries to abstract the handling of the storage type the cluster
747
uses.
748

  
749
First is listing the backend storage units and their space situation::
750

  
751
  node1# gnt-node list-storage
752
  Node  Name        Size Used   Free
753
  node1 /dev/sda7 673.8G   0M 673.8G
754
  node1 /dev/sdb1 698.6G 1.5G 697.1G
755
  node2 /dev/sda7 673.8G   0M 673.8G
756
  node2 /dev/sdb1 698.6G 1.0G 697.6G
757

  
758
The default is to list LVM physical volumes. It's also possible to list
759
the LVM volume groups::
760

  
761
  node1# gnt-node list-storage -t lvm-vg
762
  Node  Name  Size
763
  node1 xenvg 1.3T
764
  node2 xenvg 1.3T
765

  
766
Next is repairing storage units, which is currently only implemented for
767
volume groups and does the equivalent of ``vgreduce --removemissing``::
768

  
769
  node1# gnt-node repair-storage node2 lvm-vg xenvg
770
  Sun Oct 25 22:21:45 2009 Repairing storage unit 'xenvg' on node2 ...
771

  
772
Last is the modification of volume properties, which is (again) only
773
implemented for LVM physical volumes and allows toggling the
774
``allocatable`` value::
775

  
776
  node1# gnt-node modify-storage --allocatable=no node2 lvm-pv /dev/sdb1
777

  
778
Use of the storage commands
298 779
~~~~~~~~~~~~~~~~~~~~~~~~~~~
299 780

  
300
The usual method to cleanup a cluster is to run ``gnt-cluster
301
destroy`` however if the Ganeti installation is broken in any way then
302
this will not run.
781
All these commands are needed when recovering a node from a disk
782
failure:
783

  
784
- first, we need to recover from complete LVM failure (due to missing
785
  disk), by running the ``repair-storage`` command
786
- second, we need to change allocation on any partially-broken disk
787
  (i.e. LVM still sees it, but it has bad blocks) by running
788
  ``modify-storage``
789
- then we can evacuate the instances as needed
303 790

  
304
It is possible in such a case to cleanup manually most if not all
305
traces of a cluster installation by following these steps on all of
306
the nodes:
307 791

  
308
1. Shutdown all instances. This depends on the virtualisation
309
   method used (Xen, KVM, etc.):
792
Cluster operations
793
------------------
794

  
795
Besides the cluster initialisation command (which is detailed in the
796
:doc:`install` document) and the master failover command which is
797
explained under node handling, there are a couple of other cluster
798
operations available.
799

  
800
.. _cluster-config-label:
801

  
802
Standard operations
803
+++++++++++++++++++
804

  
805
One of the few commands that can be run on any node (not only the
806
master) is the ``getmaster`` command::
807

  
808
  node2# gnt-cluster getmaster
809
  node1.example.com
810
  node2#
811

  
812
It is possible to query and change global cluster parameters via the
813
``info`` and ``modify`` commands::
814

  
815
  node1# gnt-cluster info
816
  Cluster name: cluster.example.com
817
  Cluster UUID: 07805e6f-f0af-4310-95f1-572862ee939c
818
  Creation time: 2009-09-25 05:04:15
819
  Modification time: 2009-10-18 22:11:47
820
  Master node: node1.example.com
821
  Architecture (this node): 64bit (x86_64)
822
823
  Tags: foo
824
  Default hypervisor: xen-pvm
825
  Enabled hypervisors: xen-pvm
826
  Hypervisor parameters:
827
    - xen-pvm:
828
        root_path: /dev/sda1
829
830
  Cluster parameters:
831
    - candidate pool size: 10
832
833
  Default instance parameters:
834
    - default:
835
        memory: 128
836
837
  Default nic parameters:
838
    - default:
839
        link: xen-br0
840
841

  
842
The various parameters above can be changed via the ``modify``
843
command as follows:
844

  
845
- the hypervisor parameters can be changed via ``modify -H
846
  xen-pvm:root_path=…``, and so on for other hypervisors/key/values
847
- the "default instance parameters" are changeable via ``modify -B
848
  parameter=value…`` syntax
849
- the cluster parameters are changeable via separate options to the
850
  modify command (e.g. ``--candidate-pool-size``, etc.)
851

  
852
For a detailed option list, see the :manpage:`gnt-cluster(8)` man page.
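
A short sketch combining the three kinds of changes above (the values
are arbitrary examples)::

  # change a hypervisor parameter
  gnt-cluster modify -H xen-pvm:root_path=/dev/sda1
  # change a default instance (backend) parameter
  gnt-cluster modify -B memory=256
  # change a cluster parameter
  gnt-cluster modify --candidate-pool-size=12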
853

  
854
The cluster version can be obtained via the ``version`` command::

855
  node1# gnt-cluster version
856
  Software version: 2.1.0
857
  Internode protocol: 20
858
  Configuration format: 2010000
859
  OS api version: 15
860
  Export interface: 0
861

  
862
This is not very useful except when debugging Ganeti.
863

  
864
Global node commands
865
++++++++++++++++++++
866

  
867
There are two commands provided for replicating files to all nodes of a
868
cluster and for running commands on all the nodes::
869

  
870
  node1# gnt-cluster copyfile /path/to/file
871
  node1# gnt-cluster command ls -l /path/to/file
872

  
873
These are simple wrappers over scp/ssh and more advanced usage can be
874
obtained using :manpage:`dsh(1)` and similar commands. But they are
875
useful to update an OS script from the master node, for example.
876

  
877
Cluster verification
878
++++++++++++++++++++
879

  
880
There are three commands that relate to global cluster checks. The first
881
one is ``verify``, which gives an overview of the cluster state,
882
highlighting any issues. In normal operation, this command should return
883
no ``ERROR`` messages::
884

  
885
  node1# gnt-cluster verify
886
  Sun Oct 25 23:08:58 2009 * Verifying global settings
887
  Sun Oct 25 23:08:58 2009 * Gathering data (2 nodes)
888
  Sun Oct 25 23:09:00 2009 * Verifying node status
889
  Sun Oct 25 23:09:00 2009 * Verifying instance status
890
  Sun Oct 25 23:09:00 2009 * Verifying orphan volumes
891
  Sun Oct 25 23:09:00 2009 * Verifying remaining instances
892
  Sun Oct 25 23:09:00 2009 * Verifying N+1 Memory redundancy
893
  Sun Oct 25 23:09:00 2009 * Other Notes
894
  Sun Oct 25 23:09:00 2009   - NOTICE: 5 non-redundant instance(s) found.
895
  Sun Oct 25 23:09:00 2009 * Hooks Results
896

  
897
The second command is ``verify-disks``, which checks that the instances'
898
disks have the correct status based on the desired instance state
899
(up/down)::
900

  
901
  node1# gnt-cluster verify-disks
902

  
903
Note that this command will show no output when disks are healthy.
904

  
905
The last command is used to repair any discrepancies between Ganeti's
906
recorded disk sizes and the actual disk sizes (disk size information is
907
needed for proper activation and growth of DRBD-based disks)::
908

  
909
  node1# gnt-cluster repair-disk-sizes
910
  Sun Oct 25 23:13:16 2009  - INFO: Disk 0 of instance instance1 has mismatched size, correcting: recorded 512, actual 2048
911
  Sun Oct 25 23:13:17 2009  - WARNING: Invalid result from node node4, ignoring node results
912

  
913
The above shows one instance having wrong disk size, and a node which
914
returned invalid data, and thus we ignored all primary instances of that
915
node.
916

  
917
Configuration redistribution
918
++++++++++++++++++++++++++++
919

  
920
If the verify command complains about file mismatches between the master
921
and other nodes, due to some node problems or if you manually modified
922
configuration files, you can force a push of the master configuration
923
to all other nodes via the ``redist-conf`` command::
924

  
925
  node1# gnt-cluster redist-conf
926
  node1#
927

  
928
This command will be silent unless there are problems sending updates to
929
the other nodes.
930

  
931

  
932
Cluster renaming
933
++++++++++++++++
934

  
935
It is possible to rename a cluster, or to change its IP address, via the
936
``rename`` command. If only the IP has changed, you need to pass the
937
current name and Ganeti will realise its IP has changed::
938

  
939
  node1# gnt-cluster rename cluster.example.com
940
  This will rename the cluster to 'cluster.example.com'. If
941
  you are connected over the network to the cluster name, the operation
942
  is very dangerous as the IP address will be removed from the node and
943
  the change may not go through. Continue?
944
  y/[n]/?: y
945
  Failure: prerequisites not met for this operation:
946
  Neither the name nor the IP address of the cluster has changed
947

  
948
In the above output, neither value has changed since the cluster
949
initialisation so the operation is not completed.
950

  
951
Queue operations
952
++++++++++++++++
953

  
954
The job queue execution in Ganeti 2.0 and higher can be inspected,
955
suspended and resumed via the ``queue`` command::
956

  
957
  node1~# gnt-cluster queue info
958
  The drain flag is unset
959
  node1~# gnt-cluster queue drain
960
  node1~# gnt-instance stop instance1
961
  Failed to submit job for instance1: Job queue is drained, refusing job
962
  node1~# gnt-cluster queue info
963
  The drain flag is set
964
  node1~# gnt-cluster queue undrain
965

  
966
This is most useful if you have an active cluster and you need to
967
upgrade the Ganeti software, or simply restart the software on any node:
968

  
969
#. suspend the queue via ``queue drain``
970
#. wait until there are no more running jobs via ``gnt-job list``
971
#. restart the master or another node, or upgrade the software
972
#. resume the queue via ``queue undrain``
973

  
974
.. note:: this command only stores a local flag file, and if you
975
   fail over the master, it will have no effect on the new master.
976

  
977

  
978
Watcher control
979
+++++++++++++++
980

  
981
The :manpage:`ganeti-watcher` is a program, usually scheduled via
982
``cron``, that takes care of cluster maintenance operations (restarting
983
downed instances, activating down DRBD disks, etc.). However, during
984
maintenance and troubleshooting, this can get in your way; disabling it
985
by commenting out the cron job is not ideal, as this can be
986
forgotten. Thus there are some commands for automated control of the
987
watcher: ``pause``, ``info`` and ``continue``::
988

  
989
  node1~# gnt-cluster watcher info
990
  The watcher is not paused.
991
  node1~# gnt-cluster watcher pause 1h
992
  The watcher is paused until Mon Oct 26 00:30:37 2009.
993
  node1~# gnt-cluster watcher info
994
  The watcher is paused until Mon Oct 26 00:30:37 2009.
995
  node1~# ganeti-watcher -d
996
  2009-10-25 23:30:47,984:  pid=28867 ganeti-watcher:486 DEBUG Pause has been set, exiting
997
  node1~# gnt-cluster watcher continue
998
  The watcher is no longer paused.
999
  node1~# ganeti-watcher -d
1000
  2009-10-25 23:31:04,789:  pid=28976 ganeti-watcher:345 DEBUG Archived 0 jobs, left 0
1001
  2009-10-25 23:31:05,884:  pid=28976 ganeti-watcher:280 DEBUG Got data from cluster, writing instance status file
1002
  2009-10-25 23:31:06,061:  pid=28976 ganeti-watcher:150 DEBUG Data didn't change, just touching status file
1003
  node1~# gnt-cluster watcher info
1004
  The watcher is not paused.
1005
  node1~#
1006

  
1007
The exact details of the argument to the ``pause`` command are available
1008
in the manpage.
1009

  
1010
.. note:: this command only stores a local flag file, and if you
1011
   fail over the master, it will have no effect on the new master.
1012

  
1013
Removing a cluster entirely
1014
+++++++++++++++++++++++++++
1015

  
1016
The usual method to clean up a cluster is to run ``gnt-cluster destroy``;
1017
however, if the Ganeti installation is broken in any way then this will
1018
not run.
1019

  
1020
It is possible in such a case to manually clean up most if not all traces
1021
of a cluster installation by following these steps on all of the nodes:
1022

  
1023
1. Shutdown all instances. This depends on the virtualisation method
1024
   used (Xen, KVM, etc.):
310 1025

  
311 1026
  - Xen: run ``xm list`` and ``xm destroy`` on all the non-Domain-0
312 1027
    instances
313 1028
  - KVM: kill all the KVM processes
314 1029
  - chroot: kill all processes under the chroot mountpoints
315 1030

  
316
2. If using DRBD, shutdown all DRBD minors (which should by at this
317
   time no-longer in use by instances); on each node, run ``drbdsetup
1031
2. If using DRBD, shut down all DRBD minors (which should by this time
1032
   no longer be in use by instances); on each node, run ``drbdsetup
318 1033
   /dev/drbdN down`` for each active DRBD minor.
319 1034

  
320
3. If using LVM, cleanup the Ganeti volume group; if only Ganeti
321
   created logical volumes (and you are not sharing the volume group
322
   with the OS, for example), then simply running ``lvremove -f
323
   xenvg`` (replace 'xenvg' with your volume group name) should do the
324
   required cleanup.
1035
3. If using LVM, clean up the Ganeti volume group; if only Ganeti created
1036
   logical volumes (and you are not sharing the volume group with the
1037
   OS, for example), then simply running ``lvremove -f xenvg`` (replace
1038
   'xenvg' with your volume group name) should do the required cleanup.
325 1039

  
326 1040
4. If using file-based storage, remove recursively all files and
327 1041
   directories under your file-storage directory: ``rm -rf
328
   /srv/ganeti/file-storage/*`` replacing the path with the correct
329
   path for your cluster.
1042
   /srv/ganeti/file-storage/*`` replacing the path with the correct path
1043
   for your cluster.
330 1044

  
331 1045
5. Stop the ganeti daemons (``/etc/init.d/ganeti stop``) and kill any
332 1046
   that remain alive (``pgrep ganeti`` and ``pkill ganeti``).
......
335 1049
   replacing the path with the correct path for your installation.
336 1050

  
337 1051
On the master node, remove the cluster from the master-netdev (usually
338
``xen-br0`` for bridged mode, otherwise ``eth0`` or similar), by
339
running ``ip a del $clusterip/32 dev xen-br0`` (use the correct
340
cluster ip and network device name).
1052
``xen-br0`` for bridged mode, otherwise ``eth0`` or similar), by running
1053
``ip a del $clusterip/32 dev xen-br0`` (use the correct cluster IP and
1054
network device name).
341 1055

  
342 1056
At this point, the machines are ready for a cluster creation; in case
343
you want to remove Ganeti completely, you need to also undo some of
344
the SSH changes and log directories:
1057
you want to remove Ganeti completely, you need to also undo some of the
1058
SSH changes and remove the log directories:
345 1059

  
346 1060
- ``rm -rf /var/log/ganeti /srv/ganeti`` (replace with the correct
347 1061
  paths)
348
- remove from ``/root/.ssh`` the keys that Ganeti added (check
349
  the ``authorized_keys`` and ``id_dsa`` files)
1062
- remove from ``/root/.ssh`` the keys that Ganeti added (check the
1063
  ``authorized_keys`` and ``id_dsa`` files)
350 1064
- regenerate the host's SSH keys (check the OpenSSH startup scripts)
351 1065
- uninstall Ganeti
352 1066

  
353 1067
Otherwise, if you plan to re-create the cluster, you can just go ahead
354 1068
and rerun ``gnt-cluster init``.
355 1069

  
1070
Tags handling
1071
-------------
1072

  
1073
The tags handling (addition, removal, listing) is similar for all the
1074
objects that support it (instances, nodes, and the cluster).
1075

  
1076
Limitations
1077
+++++++++++
1078

  
1079
Note that the set of characters present in a tag and the maximum tag
1080
length are restricted. Currently the maximum length is 128 characters,
1081
there can be at most 4096 tags per object, and the set of characters is
1082
comprised of alphanumeric characters and additionally ``.+*/:-``.
1083

  
1084
Operations
1085
++++++++++
1086

  
1087
Tags can be added via ``add-tags``::
1088

  
1089
  gnt-instance add-tags INSTANCE a b c
1090
  gnt-node add-tags NODE a b c
1091
  gnt-cluster add-tags a b c
1092

  
1093

  
1094
The above commands add three tags to an instance, to a node and to the
1095
cluster. Note that the cluster command only takes tags as arguments,
1096
whereas the node and instance commands first require the node and
1097
instance name.
1098

  
1099
Tags can also be added from a file, via the ``--from=FILENAME``
1100
argument. The file is expected to contain one tag per line.
1101

  
1102
Tags can also be removed via a syntax very similar to the add one::
1103

  
1104
  gnt-instance remove-tags INSTANCE a b c
1105

  
1106
And listed via::
1107

  
1108
  gnt-instance list-tags
1109
  gnt-node list-tags
1110
  gnt-cluster list-tags
1111

  
1112
Global tag search
1113
+++++++++++++++++
1114

  
1115
It is also possible to execute a global search on all the tags defined
1116
in the cluster configuration, via a cluster command::
1117

  
1118
  gnt-cluster search-tags REGEXP
1119

  
1120
The parameter expected is a regular expression (see
1121
:manpage:`regex(7)`). This will return all tags that match the search,
1122
together with the object they are defined in (the names being shown in a
1123
hierarchical kind of way)::
1124

  
1125
  node1# gnt-cluster search-tags o
1126
  /cluster foo
1127
  /instances/instance1 owner:bar
1128

  
1129

  
1130
Job operations
1131
--------------
1132

  
1133
The various jobs submitted by the instance/node/cluster commands can be
1134
examined, canceled and archived by various invocations of the
1135
``gnt-job`` command.
1136

  
1137
First is the job list command::
1138

  
1139
  node1# gnt-job list
1140
  17771 success INSTANCE_QUERY_DATA
1141
  17773 success CLUSTER_VERIFY_DISKS
1142
  17775 success CLUSTER_REPAIR_DISK_SIZES
1143
  17776 error   CLUSTER_RENAME(cluster.example.com)
1144
  17780 success CLUSTER_REDIST_CONF
1145
  17792 success INSTANCE_REBOOT(instance1.example.com)
1146

  
1147
More detailed information about a job can be found via the ``info``
1148
command::
1149

  
1150
  node1# gnt-job info 17776
1151
  Job ID: 17776
1152
    Status: error
1153
    Received:         2009-10-25 23:18:02.180569
1154
    Processing start: 2009-10-25 23:18:02.200335 (delta 0.019766s)
1155
    Processing end:   2009-10-25 23:18:02.279743 (delta 0.079408s)
1156
    Total processing time: 0.099174 seconds
1157
    Opcodes:
1158
      OP_CLUSTER_RENAME
1159
        Status: error
1160
        Processing start: 2009-10-25 23:18:02.200335
1161
        Processing end:   2009-10-25 23:18:02.252282
1162
        Input fields:
1163
          name: cluster.example.com
1164
        Result:
1165
          OpPrereqError
1166
          [Neither the name nor the IP address of the cluster has changed]
1167
        Execution log:
1168

  
1169
During the execution of a job, it's possible to follow the output of a
1170
job, similar to the log that one gets from the ``gnt-`` commands, via the
1171
watch command::
1172

  
1173
  node1# gnt-instance add --submit … instance1
1174
  JobID: 17818
1175
  node1# gnt-job watch 17818
1176
  Output from job 17818 follows
1177
  -----------------------------
1178
  Mon Oct 26 00:22:48 2009  - INFO: Selected nodes for instance instance1 via iallocator dumb: node1, node2
1179
  Mon Oct 26 00:22:49 2009 * creating instance disks...
1180
  Mon Oct 26 00:22:52 2009 adding instance instance1 to cluster config
1181
  Mon Oct 26 00:22:52 2009  - INFO: Waiting for instance instance1 to sync disks.
1182
1183
  Mon Oct 26 00:23:03 2009 creating os for instance instance1 on node node1
1184
  Mon Oct 26 00:23:03 2009 * running the instance OS create scripts...
1185
  Mon Oct 26 00:23:13 2009 * starting instance...
1186
  node1#
1187

  
1188
This is useful if you need to follow a job's progress from multiple
1189
terminals.
1190

  
1191
A job that has not yet started to run can be canceled::
1192

  
1193
  node1# gnt-job cancel 17810
1194

  
1195
But not one that has already started execution::
1196

  
1197
  node1# gnt-job cancel 17805
1198
  Job 17805 is no longer waiting in the queue
1199

  
1200
There are two queues for jobs: the *current* and the *archive*
1201
queue. Jobs are initially submitted to the current queue, and they stay
1202
in that queue until they have finished execution (either successfully or
1203
not). At that point, they can be moved into the archive queue, and the
1204
ganeti-watcher script will do this automatically after 6 hours. The
1205
ganeti-cleaner script will remove the jobs from the archive directory
1206
after three weeks.
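
Finished jobs can also be archived manually, ahead of the watcher; a
minimal sketch::

  # archive a specific finished job
  gnt-job archive 17776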
1207

  
1208
Note that only jobs in the current queue can be viewed via the list and
1209
info commands; Ganeti itself doesn't examine the archive directory. If
1210
you need to see an older job, either move the file manually back into the
1211
top-level queue directory, or look at its contents (it's a
1212
JSON-formatted file).
1213

  
1214
Ganeti tools
1215
------------
1216

  
1217
Besides the usual ``gnt-`` and ``ganeti-`` commands which are provided
1218
and installed in ``$prefix/sbin`` at install time, there are a couple of
1219
other tools installed which are seldom used but can be helpful in some
1220
cases.
1221

  
1222
lvmstrap
1223
++++++++
1224

  
1225
The ``lvmstrap`` tool, introduced in the :ref:`configure-lvm-label` section,
1226
has two modes of operation:
1227

  
1228
- ``diskinfo`` shows the discovered disks on the system and their status
1229
- ``create`` takes all not-in-use disks and creates a volume group out
1230
  of them
1231

  
1232
.. warning:: The ``create`` argument to this command causes data-loss!
1233

  
1234
cfgupgrade
1235
++++++++++
1236

  
1237
The ``cfgupgrade`` tool is used to upgrade between major (and minor)
1238
Ganeti versions. Point-releases are usually transparent for the admin.
1239

  
1240
More information about the upgrade procedure is listed on the wiki at
1241
http://code.google.com/p/ganeti/wiki/UpgradeNotes.
1242

  
1243
cfgshell
1244
++++++++
1245

  
1246
.. note:: This command is not actively maintained; make sure you back up
1247
   your configuration before using it
1248

  
1249
This can be used as an alternative to direct editing of the
1250
main configuration file if Ganeti has a bug and prevents you, for
1251
example, from removing an instance or a node from the configuration
1252
file.
1253

  
1254
.. _burnin-label:
1255

  
1256
burnin
1257
++++++
1258

  
1259
.. warning:: This command will erase existing instances if given as
1260
   arguments!
1261

  
1262
This tool is used to exercise either the hardware of machines or
1263
alternatively the Ganeti software. It is safe to run on an existing
1264
cluster **as long as you don't pass it existing instance names**.
1265

  
1266
The command will, by default, execute a comprehensive set of operations
1267
against a list of instances, these being:
1268

  
1269
- creation
1270
- disk replacement (for redundant instances)
1271
- failover and migration (for redundant instances)
1272
- move (for non-redundant instances)
1273
- disk growth
1274
- add disks, remove disk
1275
- add NICs, remove NICs
1276
- export and then import
1277
- rename
1278
- reboot
1279
- shutdown/startup
1280
- and finally removal of the test instances
1281

  
1282
Executing all these operations will test that the hardware performs
1283
well: the creation, disk replace, disk add and disk growth will exercise
1284
the storage and network; the migrate command will test the memory of the
1285
systems. Depending on the passed options, it can also test that the
1286
instance OS definitions properly execute the rename, import and
1287
export operations.
1288

  
1289
Other Ganeti projects
1290
---------------------
1291

  
1292
There are two other Ganeti-related projects that can be useful in a
1293
Ganeti deployment. These can be downloaded from the project site
1294
(http://code.google.com/p/ganeti/) and the repositories are also on the
1295
project git site (http://git.ganeti.org).
1296

  
1297
NBMA tools
1298
++++++++++
1299

  
1300
The ``ganeti-nbma`` software is designed to allow instances to live on a
1301
separate, virtual network from the nodes, and in an environment where
1302
nodes are not guaranteed to be able to reach each other via multicasting
1303
or broadcasting. For more information see the README in the source
1304
archive.
1305

  
1306
ganeti-htools
1307
+++++++++++++
1308

  
1309
The ``ganeti-htools`` software consists of a set of tools:
1310

  
1311
- ``hail``: an advanced iallocator script compared to Ganeti's builtin
1312
  one
1313
- ``hbal``: a tool for rebalancing the cluster, i.e. moving instances
... This diff was truncated because it exceeds the maximum size that can be displayed.
