<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
]>
  <article class="specification">
  <articleinfo>
    <title>Ganeti administrator's guide</title>
  </articleinfo>
  <para>Documents Ganeti version 1.2</para>
  <sect1>
    <title>Introduction</title>

    <para>Ganeti is virtualization cluster management software. You are
    expected to be a system administrator familiar with your Linux distribution
    and the Xen virtualization environment before using it.
    </para>

    <para>The various components of Ganeti all have man pages and interactive
    help. This manual, though, will help you get familiar with the system by
    explaining the most common operations, grouped by related use.
    </para>

    <para>After a terminology glossary and a section on the prerequisites
    needed to use this manual, the rest of this document is divided into three
    main sections, which group different features of Ganeti:
      <itemizedlist>
        <listitem>
          <simpara>Instance Management</simpara>
        </listitem>
        <listitem>
          <simpara>High Availability Features</simpara>
        </listitem>
        <listitem>
          <simpara>Debugging Features</simpara>
        </listitem>
      </itemizedlist>
    </para>

    <sect2>
      <title>Ganeti terminology</title>

      <para>
        This section provides a short introduction to Ganeti terminology,
        which may be useful when reading the rest of the document.

        <glosslist>
          <glossentry>
            <glossterm>Cluster</glossterm>
            <glossdef>
              <simpara>
                A set of machines (nodes) that cooperate to offer a
                coherent, highly available virtualization service.
              </simpara>
            </glossdef>
          </glossentry>
          <glossentry>
            <glossterm>Node</glossterm>
            <glossdef>
              <simpara>
                A physical machine which is a member of a cluster.
                Nodes are the basic cluster infrastructure, and are
                not fault tolerant.
              </simpara>
            </glossdef>
          </glossentry>
          <glossentry>
            <glossterm>Master node</glossterm>
            <glossdef>
              <simpara>
                The node which controls the cluster, and from which all
                Ganeti commands must be given.
              </simpara>
            </glossdef>
          </glossentry>
          <glossentry>
            <glossterm>Instance</glossterm>
            <glossdef>
              <simpara>
                A virtual machine which runs on a cluster. It can be a
                fault tolerant, highly available entity.
              </simpara>
            </glossdef>
          </glossentry>
          <glossentry>
            <glossterm>Pool</glossterm>
            <glossdef>
              <simpara>
                A set of clusters sharing the same network.
              </simpara>
            </glossdef>
          </glossentry>
          <glossentry>
            <glossterm>Meta-Cluster</glossterm>
            <glossdef>
              <simpara>
                Anything that concerns more than one cluster.
              </simpara>
            </glossdef>
          </glossentry>
        </glosslist>
      </para>
    </sect2>

    <sect2>
      <title>Prerequisites</title>

      <para>
        You need to have your Ganeti cluster installed and configured before
        you try any of the commands in this document. Please follow the
        <emphasis>Ganeti installation tutorial</emphasis> for instructions on
        how to do that.
      </para>
    </sect2>

  </sect1>

  <sect1>
    <title>Managing Instances</title>

    <sect2>
      <title>Adding/Removing an instance</title>

      <para>
        Adding a new virtual instance to your Ganeti cluster is easy.
        The command is:

        <synopsis>gnt-instance add -n <replaceable>TARGET_NODE</replaceable> -o <replaceable>OS_TYPE</replaceable> -t <replaceable>DISK_TEMPLATE</replaceable> <replaceable>INSTANCE_NAME</replaceable></synopsis>

        The instance name must be resolvable (e.g. exist in DNS) and
        must map to an address in the same subnet as the cluster
        itself. Options you can give to this command include:

      <itemizedlist>
        <listitem>
          <simpara>The disk size (<option>-s</option>)</simpara>
        </listitem>
        <listitem>
          <simpara>The swap size (<option>--swap-size</option>)</simpara>
        </listitem>
        <listitem>
          <simpara>The memory size (<option>-m</option>)</simpara>
        </listitem>
        <listitem>
          <simpara>The number of virtual CPUs (<option>-p</option>)</simpara>
        </listitem>
        <listitem>
          <simpara>The instance IP address (<option>-i</option>); use the value
            <literal>auto</literal> to make Ganeti record the address from
            DNS</simpara>
        </listitem>
        <listitem>
          <simpara>The bridge to connect the instance to (<option>-b</option>),
            if you don't want to use the default one</simpara>
        </listitem>
      </itemizedlist>
      </para>
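
      <para>
        As an illustration, the hypothetical helper below (not part of
        Ganeti) spells out one complete <command>gnt-instance add</command>
        invocation using the options listed above. The OS type should be
        one of the names <command>gnt-os list</command> reports on your
        cluster, and the sizes are placeholders, assumed here to be in MiB;
        check the <command>gnt-instance</command> man page for the units
        your version expects. With <envar>DRY_RUN</envar> set to
        <literal>1</literal> the command is only printed.
      </para>

      <programlisting>#!/bin/sh
# Hypothetical helper: create a non-redundant ("plain") instance with
# every resource given explicitly. Sizes are placeholders, assumed to
# be in MiB. With DRY_RUN=1 the command is printed instead of run.

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Usage: create_instance INSTANCE_NAME TARGET_NODE OS_TYPE
create_instance() {
  name=$1   # must resolve (e.g. in DNS) to an address in the cluster subnet
  node=$2   # node the instance will run on
  os=$3     # one of the names printed by "gnt-os list"
  run gnt-instance add -n "$node" -o "$os" -t plain \
    -s 10240 --swap-size 1024 -m 512 -p 1 -i auto "$name"
}
</programlisting>

      <para>
        For example, <userinput>DRY_RUN=1 create_instance
        instance1.example.com node1.example.com debian-etch</userinput>
        prints the full command line without creating anything; all three
        names are hypothetical.
      </para>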

      <para>There are five types of disk template you can choose from:</para>

      <variablelist>
        <varlistentry>
          <term>diskless</term>
          <listitem>
            <para>The instance has no disks. Only used for special-purpose
              operating systems or for testing.</para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>plain</term>
          <listitem>
            <para>The instance will use LVM devices as backend for its disks.
              No redundancy is provided.</para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>local_raid1</term>
          <listitem>
            <para>A local mirror is set up between LVM devices to back the
              instance. This provides some redundancy for the instance's
              data.</para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>remote_raid1</term>
          <listitem>
            <simpara><emphasis role="strong">Note:</emphasis> This is only
              valid for multi-node clusters using drbd 0.7.</simpara>
            <simpara>
              A mirror is set up between the local node and a remote one,
              which must be specified as the second value of the
              <option>--node</option> option
              (<replaceable>TARGET_NODE</replaceable>:<replaceable>SECONDARY_NODE</replaceable>).
              Use this template to obtain a highly available instance that
              can be failed over to the remote node should the primary one
              fail.
            </simpara>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>drbd</term>
          <listitem>
            <simpara><emphasis role="strong">Note:</emphasis> This is only
              valid for multi-node clusters using drbd 8.0.</simpara>
            <simpara>
              This is similar to the
              <literal>remote_raid1</literal> template, but uses
              new features in drbd 8 to simplify the device
              stack. From a user's point of view, this will improve
              the speed of the <command>replace-disks</command>
              command and (in future versions) provide more
              functionality.
            </simpara>
          </listitem>
        </varlistentry>

      </variablelist>

      <para>
        For example, to create a highly available instance, use the
        <literal>remote_raid1</literal> or <literal>drbd</literal> disk
        template, giving both the primary and the secondary node:
        <synopsis>gnt-instance add -n <replaceable>TARGET_NODE</replaceable>:<replaceable>SECONDARY_NODE</replaceable> -o <replaceable>OS_TYPE</replaceable> -t remote_raid1 \
  <replaceable>INSTANCE_NAME</replaceable></synopsis>
      </para>

      <para>
        To see which operating systems your cluster supports, run:
        <synopsis>gnt-os list</synopsis>
      </para>

      <para>
        Removing an instance is even easier than creating one. This operation
        is irreversible and destroys all the contents of your instance, so use
        it with care:

        <synopsis>gnt-instance remove <replaceable>INSTANCE_NAME</replaceable></synopsis>
      </para>
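
      <para>
        Because removal cannot be undone, one cautious pattern is to export
        the instance first and remove it only if the export succeeded. The
        sketch below is a hypothetical helper, not a Ganeti command; it
        combines <command>gnt-backup export</command> (described under
        exporting and importing) with <command>gnt-instance remove</command>.
        Keep in mind that the export replaces any earlier snapshot of the
        same instance.
      </para>

      <programlisting>#!/bin/sh
# Hypothetical helper: keep a snapshot of an instance before removing it.
# With DRY_RUN=1 the commands are printed instead of run.

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Usage: remove_with_backup INSTANCE_NAME EXPORT_NODE
remove_with_backup() {
  name=$1
  node=$2   # node with enough space under /srv/ganeti for the image
  if ! run gnt-backup export -n "$node" "$name"; then
    echo "export of $name failed; keeping the instance"
    return 1
  fi
  # Only reached when the export command exited successfully.
  run gnt-instance remove "$name"
}
</programlisting>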
    </sect2>

    <sect2>
      <title>Starting/Stopping an instance</title>

      <para>
        Instances are automatically started at instance creation time. To
        manually start one which is currently stopped, run:

        <synopsis>gnt-instance startup <replaceable>INSTANCE_NAME</replaceable></synopsis>

        The command to stop one is:

        <synopsis>gnt-instance shutdown <replaceable>INSTANCE_NAME</replaceable></synopsis>

        To see all the configured instances and their status, run:

        <synopsis>gnt-instance list</synopsis>

      </para>
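
      <para>
        The two commands above can be chained into a simple restart. This
        hypothetical helper stops and then starts an instance, going through
        Ganeti for both steps rather than through the Xen tools, and does
        not attempt the startup if the shutdown failed.
      </para>

      <programlisting>#!/bin/sh
# Hypothetical helper: restart an instance by stopping and starting it
# through Ganeti. With DRY_RUN=1 the commands are printed instead of run.

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Usage: restart_instance INSTANCE_NAME
restart_instance() {
  name=$1
  # Do not try to start the instance again if the shutdown failed.
  run gnt-instance shutdown "$name" || return 1
  run gnt-instance startup "$name"
}
</programlisting>

      <para>
        Afterwards, <command>gnt-instance list</command> should show the
        instance as running again.
      </para>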

      <para>
        Do not use the Xen commands to stop instances. If you run, for
        example, <command>xm shutdown</command> or <command>xm destroy</command>
        on an instance, Ganeti will automatically restart it (via the
        <citerefentry><refentrytitle>ganeti-watcher</refentrytitle>
        <manvolnum>8</manvolnum></citerefentry>).
      </para>

    </sect2>

    <sect2>
      <title>Exporting/Importing an instance</title>

      <para>
        You can create a snapshot of an instance's disks and Ganeti
        configuration, which you can then back up or import into
        another cluster. To export an instance, run:

        <synopsis>gnt-backup export -n <replaceable>TARGET_NODE</replaceable> <replaceable>INSTANCE_NAME</replaceable></synopsis>

        The target node can be any node in the cluster with enough
        space under <filename class="directory">/srv/ganeti</filename>
        to hold the instance image. Use the
        <option>--noshutdown</option> option to snapshot an instance
        without rebooting it. This operation removes any previous snapshot
        of the same instance that exists anywhere in the cluster under
        <filename class="directory">/srv/ganeti</filename>: if you want to
        keep old snapshots, move them out of the Ganeti exports directory
        first.
      </para>
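
      <para>
        To keep older snapshots as suggested above, you can move any existing
        export of the instance aside on every node before exporting again.
        The sketch below is hypothetical: it assumes the exports directory is
        <filename class="directory">/srv/ganeti/export</filename> (override
        <envar>EXPORT_DIR</envar> if yours differs), that
        <filename class="directory">/srv/saved-exports</filename> is an
        acceptable place for old copies (<envar>SAVE_DIR</envar>), and that
        <command>gnt-cluster command</command> hands the given string to a
        shell on each node. It uses <option>--noshutdown</option> so the
        instance keeps running.
      </para>

      <programlisting>#!/bin/sh
# Hypothetical helper: save previous snapshots of an instance on all
# nodes, then take a new one without shutting the instance down.
# EXPORT_DIR and SAVE_DIR defaults are assumptions; adjust them.
# With DRY_RUN=1 the commands are printed instead of run.

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Usage: preserve_and_export INSTANCE_NAME TARGET_NODE
preserve_and_export() {
  name=$1
  node=$2
  export_dir=${EXPORT_DIR:-/srv/ganeti/export}
  save_dir=${SAVE_DIR:-/srv/saved-exports}
  stamp=${STAMP:-$(date +%Y%m%d%H%M%S)}
  src=$export_dir/$name
  # Runs on every node: move an old snapshot out of the exports directory.
  run gnt-cluster command \
    "if [ -d $src ]; then mkdir -p $save_dir; mv $src $save_dir/$name.$stamp; fi" \
    || return 1
  run gnt-backup export --noshutdown -n "$node" "$name"
}
</programlisting>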

      <para>
        Importing an instance is similar to creating a new one. The command is:

        <synopsis>gnt-backup import -n <replaceable>TARGET_NODE</replaceable> -t <replaceable>DISK_TEMPLATE</replaceable> --src-node=<replaceable>NODE</replaceable> --src-dir=<replaceable>DIR</replaceable> <replaceable>INSTANCE_NAME</replaceable></synopsis>

        Most of the options available for the command
        <command>gnt-instance add</command> are supported here too.

      </para>
    </sect2>

  </sect1>

  <sect1>
    <title>High availability features</title>

    <note>
      <simpara>This section only applies to multi-node clusters.</simpara>
    </note>

    <sect2>
      <title>Failing over an instance</title>

      <para>
        If an instance is built in highly available mode, you can
        fail it over to its secondary node at any time, even if the
        primary has failed and is no longer up. To do so, run on the
        master node:

        <synopsis>gnt-instance failover <replaceable>INSTANCE_NAME</replaceable></synopsis>

        After the command completes, the former secondary node is the
        primary, and vice versa.
      </para>
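
      <para>
        When you want to free a node that is still up, you can fail over
        every highly available instance whose primary it is. This
        hypothetical sketch assumes that <command>gnt-instance list</command>
        accepts the <option>--no-headers</option>,
        <option>--separator</option> and <option>-o</option> options with
        the <literal>name</literal> and <literal>pnode</literal> fields;
        check the <command>gnt-instance</command> man page for your version.
        Instances using a non-mirrored template cannot be failed over, and
        the loop stops at the first failure.
      </para>

      <programlisting>#!/bin/sh
# Hypothetical helper: fail over all instances whose primary is NODE.
# Run it on the master node. With DRY_RUN=1 the failover commands are
# printed instead of run (the listing itself always runs).

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Print "name:primary_node", one instance per line (assumed options).
list_instances() {
  gnt-instance list --no-headers --separator=: -o name,pnode
}

# Usage: evacuate_primaries NODE
evacuate_primaries() {
  node=$1
  # Collect names first so that failover cannot consume the listing.
  names=$(list_instances | awk -F: -v n="$node" '$2 == n { print $1 }')
  for name in $names; do
    run gnt-instance failover "$name" || return 1
  done
}
</programlisting>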
    </sect2>

    <sect2>
      <title>Replacing an instance's disks</title>

      <para>
        What if, instead, the secondary node of an instance has
        failed, or you plan to remove a node from your cluster and
        have failed over all its instances, but it is still the
        secondary for some of them? The solution is to replace the
        instance's disks, changing the secondary node. The command
        depends on the disk template. For <literal>remote_raid1</literal>:

        <synopsis>gnt-instance replace-disks <option>-n <replaceable>NEW_SECONDARY</replaceable></option> <replaceable>INSTANCE_NAME</replaceable></synopsis>

        and for <literal>drbd</literal>:

        <synopsis>gnt-instance replace-disks <option>-s</option> <option>-n <replaceable>NEW_SECONDARY</replaceable></option> <replaceable>INSTANCE_NAME</replaceable></synopsis>

        This process takes longer than a failover, but involves no
        instance downtime, and at the end of it the instance has a new
        secondary node, to which it can be failed over if necessary.
      </para>
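
      <para>
        When a node is to be retired, every instance using it as secondary
        needs its disks replaced. The hypothetical sketch below chooses
        between the two command forms shown above based on each instance's
        disk template. It assumes <command>gnt-instance list</command>
        accepts <option>--no-headers</option>, <option>--separator</option>
        and <option>-o</option> with the fields <literal>name</literal>,
        <literal>disk_template</literal> and <literal>snodes</literal>
        (secondary nodes, comma-separated); verify these names against the
        man page for your version.
      </para>

      <programlisting>#!/bin/sh
# Hypothetical helper: give every instance that uses OLD_NODE as its
# secondary the new secondary NEW_NODE. Run it on the master node.
# With DRY_RUN=1 the replace-disks commands are printed, not run.

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Print "name:disk_template:secondary_nodes" per instance (assumed fields).
list_instances() {
  gnt-instance list --no-headers --separator=: -o name,disk_template,snodes
}

# Usage: move_secondaries OLD_NODE NEW_NODE
move_secondaries() {
  old=$1
  new=$2
  # Keep "name:template" for instances whose secondary list contains OLD_NODE.
  pairs=$(list_instances | awk -F: -v n="$old" '{
    split($3, s, ",")
    for (i in s) if (s[i] == n) print $1 ":" $2
  }')
  for pair in $pairs; do
    name=${pair%%:*}
    template=${pair#*:}
    case $template in
      drbd) run gnt-instance replace-disks -s -n "$new" "$name" || return 1 ;;
      remote_raid1) run gnt-instance replace-disks -n "$new" "$name" || return 1 ;;
      *) echo "skipping $name: unexpected template $template" ;;
    esac
  done
}
</programlisting>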
    </sect2>
    <sect2>
      <title>Failing over the master node</title>

      <para>
        All of the above requires the Ganeti master node to be
        up. Should it go down, or should you wish to decommission it,
        run the following command on any other node:

        <synopsis>gnt-cluster masterfailover</synopsis>

        The node you ran it on becomes the new master.
      </para>
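
      <para>
        Before running it, you may want to confirm that the node you are on
        is not already the master. This hypothetical sketch compares the
        output of <command>gnt-cluster getmaster</command> with the local
        fully qualified host name. It assumes both spell the node name the
        same way, and treats a failed query as meaning this node is not the
        master, since the old master may be unreachable.
      </para>

      <programlisting>#!/bin/sh
# Hypothetical helper: make this node the master unless it already is.
# With DRY_RUN=1 the failover command is printed instead of run.

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

current_master() { gnt-cluster getmaster; }
# Assumes the node's Ganeti name is its fully qualified host name.
this_node() { hostname --fqdn; }

take_over_master() {
  # An empty answer (e.g. the query failed) is treated as "not us".
  master=$(current_master) || master=""
  me=$(this_node)
  if [ "$master" = "$me" ]; then
    echo "$me is already the master"
    return 0
  fi
  run gnt-cluster masterfailover
}
</programlisting>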
    </sect2>
    <sect2>
      <title>Adding/Removing nodes</title>

      <para>
        Now that you know how to move instances around, it is easy to
        free up a node, after which you can remove it from the cluster:

        <synopsis>gnt-node remove <replaceable>NODE_NAME</replaceable></synopsis>

        and perhaps add a new one:

        <synopsis>gnt-node add <optional><option>--secondary-ip=<replaceable>ADDRESS</replaceable></option></optional> <replaceable>NODE_NAME</replaceable></synopsis>
      </para>
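
      <para>
        Before removing a node, it can help to list the instances that still
        reference it, so you know which ones to fail over or move with
        <command>replace-disks</command> first. Ganeti may perform its own
        check as well; this hypothetical sketch just reports the blockers up
        front. It assumes <command>gnt-instance list</command> accepts
        <option>--no-headers</option>, <option>--separator</option> and
        <option>-o</option> with the fields <literal>name</literal>,
        <literal>pnode</literal> and <literal>snodes</literal>.
      </para>

      <programlisting>#!/bin/sh
# Hypothetical helper: refuse to remove a node that any instance still
# uses as primary or secondary. With DRY_RUN=1 the removal is printed.

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Print "name:primary_node:secondary_nodes" per instance (assumed fields).
list_instances() {
  gnt-instance list --no-headers --separator=: -o name,pnode,snodes
}

# Usage: remove_node_if_unused NODE_NAME
remove_node_if_unused() {
  node=$1
  users=$(list_instances | awk -F: -v n="$node" '
    $2 == n { print $1; next }
    { split($3, s, ","); for (i in s) if (s[i] == n) { print $1; break } }')
  if [ -n "$users" ]; then
    echo "$node is still used by:" $users
    return 1
  fi
  run gnt-node remove "$node"
}
</programlisting>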
    </sect2>
  </sect1>

  <sect1>
    <title>Debugging Features</title>

    <para>
      At some point you might need to do some debugging operations on
      your cluster or on your instances. This section will help you
      with the most commonly used debugging features.
    </para>

    <sect2>
      <title>Accessing an instance's disks</title>

      <para>
        You can access an instance's disks from its primary node. Never
        mount the underlying logical volume manually on a fault tolerant
        instance, or you risk breaking replication. The correct way to
        access the disks is to run:

        <synopsis>gnt-instance activate-disks <replaceable>INSTANCE_NAME</replaceable></synopsis>

        and then access the devices that get created. When you have
        finished, deactivate them with the
        <command>gnt-instance deactivate-disks</command> command, which
        takes the same arguments.
      </para>
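
      <para>
        For example, to check an instance's filesystem you can activate the
        disks, run a read-only check, and always deactivate them again. In
        this hypothetical sketch the device path is passed in by hand,
        because it is whatever <command>activate-disks</command> reports for
        your instance. It assumes the instance is shut down, that the device
        holds a filesystem directly, and that the master node is also the
        instance's primary node (otherwise run the check on the primary
        node yourself).
      </para>

      <programlisting>#!/bin/sh
# Hypothetical helper: check an instance's filesystem read-only and always
# deactivate its disks afterwards. Assumes the instance is shut down and
# this node is both master and the instance's primary. DEVICE is the path
# reported by activate-disks. With DRY_RUN=1 commands are only printed.

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Usage: check_instance_disk INSTANCE_NAME DEVICE
check_instance_disk() {
  name=$1
  device=$2
  run gnt-instance activate-disks "$name" || return 1
  # -n: report problems but never modify the filesystem.
  run fsck -n "$device"
  status=$?
  # Deactivate even if fsck reported errors.
  run gnt-instance deactivate-disks "$name"
  return $status
}
</programlisting>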
    </sect2>

    <sect2>
      <title>Accessing an instance's console</title>

      <para>
        The command to access a running instance's console is:

        <synopsis>gnt-instance console <replaceable>INSTANCE_NAME</replaceable></synopsis>

        Use the console normally, then type
        <userinput>^]</userinput> to exit when you are done.
      </para>
    </sect2>

    <sect2>
      <title>Instance OS definitions debugging</title>

      <para>
        Should you have any problems with operating system support,
        run the following command to see a complete status for all
        your nodes:

        <synopsis>gnt-os diagnose</synopsis>

      </para>

    </sect2>

    <sect2>
      <title>Cluster-wide debugging</title>

      <para>
        The <command>gnt-cluster</command> command offers several
        options to run tests or execute cluster-wide operations. For
        example:

      <screen>
gnt-cluster command
gnt-cluster copyfile
gnt-cluster verify
gnt-cluster getmaster
gnt-cluster version
      </screen>

        See the <citerefentry>
        <refentrytitle>gnt-cluster</refentrytitle>
        <manvolnum>8</manvolnum></citerefentry> man page for more
        details on their usage.
      </para>
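
      <para>
        As an example combining the first two operations, the hypothetical
        helper below copies a file from the master to the other nodes with
        <command>gnt-cluster copyfile</command>, then runs
        <command>md5sum</command> on every node with
        <command>gnt-cluster command</command>, so that a node holding a
        different copy stands out. Run it on the master node; it assumes
        both subcommands take the file path and the command string as their
        arguments.
      </para>

      <programlisting>#!/bin/sh
# Hypothetical helper: push FILE from the master to every node, then
# print its checksum on each node so a stale copy stands out.
# With DRY_RUN=1 the commands are printed instead of run.

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Usage: sync_file /absolute/path/to/file
sync_file() {
  file=$1
  run gnt-cluster copyfile "$file" || return 1
  run gnt-cluster command "md5sum $file"
}
</programlisting>

      <para>
        For example, <userinput>DRY_RUN=1 sync_file /etc/hosts</userinput>
        shows the two commands it would run without touching any node.
      </para>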
    </sect2>

  </sect1>

  </article>