Ganeti administrator's guide
============================

Documents Ganeti version |version|

.. contents::

Introduction
------------

Ganeti is virtualization cluster management software. You are
expected to be a system administrator familiar with your Linux
distribution and the Xen or KVM virtualization environments before
using it.

The various components of Ganeti all have man pages and interactive
help. This manual will help you become familiar with the system by
explaining the most common operations, grouped by related use.

After a terminology glossary and a section on the prerequisites needed
to use this manual, the rest of this document is divided into three
main sections, which group different features of Ganeti:

- Instance Management
- High Availability Features
- Debugging Features


Ganeti terminology
~~~~~~~~~~~~~~~~~~

This section provides a small introduction to Ganeti terminology,
which might be useful when reading the rest of the document.

Cluster
  A set of machines (nodes) that cooperate to offer a coherent,
  highly available virtualization service.

Node
  A physical machine which is a member of a cluster.
  Nodes are the basic cluster infrastructure, and they are
  not fault tolerant.

Master node
  The node which controls the cluster, from which all
  Ganeti commands must be given.

Instance
  A virtual machine which runs on a cluster. It can be a
  fault tolerant, highly available entity.

Pool
  A pool is a set of clusters sharing the same network.

Meta-Cluster
  Anything that concerns more than one cluster.


Prerequisites
~~~~~~~~~~~~~

You need to have your Ganeti cluster installed and configured before
you try any of the commands in this document. Please follow the
*Ganeti installation tutorial* for instructions on how to do that.


Managing Instances
------------------

Adding/Removing an instance
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Adding a new virtual instance to your Ganeti cluster is really easy.
The command is::

  gnt-instance add \
    -n TARGET_NODE:SECONDARY_NODE -o OS_TYPE -t DISK_TEMPLATE \
    INSTANCE_NAME


The instance name must be resolvable (e.g. exist in DNS) and usually
resolve to an address in the same subnet as the cluster itself. Options
you can give to this command include the following (a combined example
is shown after this list):

- The disk size (``-s``) for a single-disk instance, or multiple
  ``--disk N:size=SIZE`` options for multi-disk instances

- The memory size (``-B memory``)

- The number of virtual CPUs (``-B vcpus``)

- Arguments for the NICs of the instance; by default, a single-NIC
  instance is created. The IP and/or bridge of the NIC can be changed
  via ``--nic 0:ip=IP,bridge=BRIDGE``

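
As an illustration, here is how the options above can be combined into a
single command. All names, sizes and the bridge are placeholders, and the
comma-separated ``key=value`` form of ``-B`` is an assumption to verify
against the :manpage:`gnt-instance` man page of your version::

  gnt-instance add -n node1.example.com -o debootstrap -t plain \
    -s 10g -B memory=512,vcpus=2 --nic 0:bridge=xen-br0 \
    instance1.example.com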

There are four types of disk template you can choose from:

diskless
  The instance has no disks. Only used for special-purpose operating
  systems or for testing.

file
  The instance will use plain files as backend for its disks. No
  redundancy is provided, and this is somewhat more difficult to
  configure for high performance.

plain
  The instance will use LVM devices as backend for its disks. No
  redundancy is provided.

drbd
  .. note:: This is only valid for multi-node clusters using DRBD 8.0.x

  A mirror is set between the local node and a remote one, which must
  be specified with the second value of the ``--node`` option. Use this
  option to obtain a highly available instance that can be failed over
  to a remote node should the primary one fail.


For example, if you want to create a highly available instance, use the
drbd disk template::

  gnt-instance add -n TARGET_NODE:SECONDARY_NODE -o OS_TYPE -t drbd \
    INSTANCE_NAME

To know which operating systems your cluster supports you can use
the command::

  gnt-os list

Removing an instance is even easier than creating one. This operation
is irreversible and destroys all the contents of your instance. Use
with care::

  gnt-instance remove INSTANCE_NAME


Starting/Stopping an instance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Instances are automatically started at instance creation time. To
manually start one which is currently stopped you can run::

  gnt-instance startup INSTANCE_NAME

while the command to stop one is::

  gnt-instance shutdown INSTANCE_NAME

The command to see all the instances configured and their status is::

  gnt-instance list

Do not use the Xen commands to stop instances. If you run, for example,
``xm shutdown`` or ``xm destroy`` on an instance, Ganeti will
automatically restart it (via the ``ganeti-watcher``).


Exporting/Importing an instance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can create a snapshot of an instance disk and Ganeti
configuration, which you can then back up, or import into another
cluster. The way to export an instance is::

  gnt-backup export -n TARGET_NODE INSTANCE_NAME

The target node can be any node in the cluster with enough space under
``/srv/ganeti`` to hold the instance image. Use the *--noshutdown*
option to snapshot an instance without rebooting it. Any previous
snapshots of the same instance existing cluster-wide under
``/srv/ganeti`` will be removed by this operation: if you want to keep
them, move them out of the Ganeti exports directory.

Importing an instance is similar to creating a new one. The command is::

  gnt-backup import -n TARGET_NODE -t DISK_TEMPLATE \
    --src-node=NODE --src-dir=DIR INSTANCE_NAME

Most of the options available for the command :command:`gnt-instance
add` are supported here too.

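
For example, moving an instance to a different cluster could look like the
sketch below. All names are placeholders, and the export location is
assumed to be the default ``/srv/ganeti/export`` directory; check the
:manpage:`gnt-backup` man page for the exact meaning of ``--src-dir``::

  # on the source cluster's master node: snapshot the instance
  gnt-backup export -n node1.example.com instance1.example.com

  # copy the export from node1 to a node of the destination cluster
  # (e.g. with scp), then on the destination cluster's master node:
  gnt-backup import -n node4.example.com -t plain \
    --src-node=node4.example.com --src-dir=/srv/ganeti/export \
    instance1.example.com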

High availability features
--------------------------

.. note:: This section only applies to multi-node clusters

Failing over an instance
~~~~~~~~~~~~~~~~~~~~~~~~

If an instance is built in highly available mode, you can at any time
fail it over to its secondary node, even if the primary has somehow
failed and is not up anymore. Doing so is really easy: on the master
node you can just run::

  gnt-instance failover INSTANCE_NAME

That's it. After the command completes, the secondary node is now the
primary, and vice versa.


Live migrating an instance
~~~~~~~~~~~~~~~~~~~~~~~~~~

If an instance is built in highly available mode, is currently running
and both its nodes are running fine, you can migrate it over to its
secondary node, without downtime. On the master node you need to run::

  gnt-instance migrate INSTANCE_NAME


Replacing an instance's disks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So what if instead the secondary node for an instance has failed, or
you plan to remove a node from your cluster, and you have failed over
all its instances, but it is still the secondary for some? The solution
here is to replace the instance disks, changing the secondary node::

  gnt-instance replace-disks -n NODE INSTANCE_NAME

This process is a bit long, but involves no instance downtime, and at
the end of it the instance has changed its secondary node, to which it
can, if necessary, be failed over.


Failing over the master node
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is all good as long as the Ganeti Master Node is up. Should it go
down, or should you wish to decommission it, just run on any other
node the command::

  gnt-cluster masterfailover

and the node you ran it on is now the new master.


Adding/Removing nodes
~~~~~~~~~~~~~~~~~~~~~

And of course, now that you know how to move instances around, it's
easy to free up a node, and then you can remove it from the cluster::

  gnt-node remove NODE_NAME

and maybe add a new one::

  gnt-node add --secondary-ip=ADDRESS NODE_NAME

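
Putting the pieces of this chapter together, evacuating a node before
removing it might look like the following sketch; all names are
placeholders, and the first two commands are repeated for every affected
instance::

  # fail over each instance whose primary node is the one being removed
  gnt-instance failover instance1.example.com
  # re-home each instance that still uses the node as its secondary
  gnt-instance replace-disks -n node2.example.com instance1.example.com
  # once the node hosts no instances, remove it from the cluster
  gnt-node remove node3.example.com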

Debugging Features
------------------

At some point you might need to do some debugging operations on your
cluster or on your instances. This section will help you with the most
commonly used debugging features.

Accessing an instance's disks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From an instance's primary node you have access to its disks. Never
ever mount the underlying logical volume manually on a fault tolerant
instance, or you risk breaking replication. The correct way to access
them is to run the command::

  gnt-instance activate-disks INSTANCE_NAME

and then access the device that gets created. After you have finished,
you can deactivate the disks with the ``deactivate-disks`` command,
which works in the same way.

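
A minimal session might look like the sketch below; the instance name and
the device path are placeholders, since the actual path reported by
``activate-disks`` depends on the disk template in use::

  gnt-instance activate-disks instance1.example.com
  # inspect the device the previous command reported (hypothetical path)
  fdisk -l /dev/drbd0
  gnt-instance deactivate-disks instance1.example.com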

Accessing an instance's console
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The command to access a running instance's console is::

  gnt-instance console INSTANCE_NAME

Use the console normally and then type ``^]`` when
done, to exit.

Instance OS definitions debugging
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Should you have any problems with operating system support, the
command to run to see a complete status for all your nodes is::

  gnt-os diagnose


Cluster-wide debugging
~~~~~~~~~~~~~~~~~~~~~~

The :command:`gnt-cluster` command offers several options to run tests
or execute cluster-wide operations. For example::

  gnt-cluster command
  gnt-cluster copyfile
  gnt-cluster verify
  gnt-cluster verify-disks
  gnt-cluster getmaster
  gnt-cluster version

See the :manpage:`gnt-cluster` man page to learn more about their usage.

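
For instance, a quick cluster-wide sanity check could combine a few of
these commands; the shell command and file path below are only examples::

  gnt-cluster verify
  gnt-cluster command uptime
  gnt-cluster copyfile /etc/resolv.conf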

Removing a cluster entirely
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The usual method to clean up a cluster is to run ``gnt-cluster
destroy``; however, if the Ganeti installation is broken in any way,
this will not run.

In such a case it is possible to manually clean up most, if not all,
traces of a cluster installation by following these steps on all of
the nodes (a consolidated command sketch is shown after the list):

1. Shut down all instances. This depends on the virtualisation
   method used (Xen, KVM, etc.):

   - Xen: run ``xm list`` and ``xm destroy`` on all the non-Domain-0
     instances
   - KVM: kill all the KVM processes
   - chroot: kill all processes under the chroot mountpoints

2. If using DRBD, shut down all DRBD minors (which should by this
   time no longer be in use by instances); on each node, run
   ``drbdsetup /dev/drbdN down`` for each active DRBD minor.

3. If using LVM, clean up the Ganeti volume group; if only Ganeti
   created logical volumes in it (and you are not sharing the volume
   group with the OS, for example), then simply running ``lvremove -f
   xenvg`` (replace 'xenvg' with your volume group name) should do the
   required cleanup.

4. If using file-based storage, remove recursively all files and
   directories under your file-storage directory: ``rm -rf
   /srv/ganeti/file-storage/*``, replacing the path with the correct
   path for your cluster.

5. Stop the Ganeti daemons (``/etc/init.d/ganeti stop``) and kill any
   that remain alive (``pgrep ganeti`` and ``pkill ganeti``).

6. Remove the Ganeti state directory (``rm -rf /var/lib/ganeti/*``),
   replacing the path with the correct path for your installation.

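
For a node using Xen, DRBD and LVM with a volume group named ``xenvg``
and the default paths, the steps above roughly translate into the
following commands; adapt them to your setup before running, as they are
destructive::

  # 1. shut down all instances (Xen example; repeat for each domain)
  xm list
  xm destroy INSTANCE_NAME
  # 2. bring down every active DRBD minor (repeat for each /dev/drbdN)
  drbdsetup /dev/drbd0 down
  # 3. remove the Ganeti logical volumes
  lvremove -f xenvg
  # 4. remove file-based storage
  rm -rf /srv/ganeti/file-storage/*
  # 5. stop the Ganeti daemons and kill leftovers
  /etc/init.d/ganeti stop
  pkill ganeti
  # 6. remove the Ganeti state directory
  rm -rf /var/lib/ganeti/*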

On the master node, remove the cluster from the master-netdev (usually
``xen-br0`` for bridged mode, otherwise ``eth0`` or similar), by
running ``ip a del $clusterip/32 dev xen-br0`` (use the correct
cluster IP and network device name).

At this point, the machines are ready for a cluster creation; in case
you want to remove Ganeti completely, you also need to undo some of
the SSH changes and log directories:

- ``rm -rf /var/log/ganeti /srv/ganeti`` (replace with the correct paths)
- remove from ``/root/.ssh`` the keys that Ganeti added (check
  the ``authorized_keys`` and ``id_dsa`` files)
- regenerate the host's SSH keys (check the OpenSSH startup scripts)
- uninstall Ganeti

Otherwise, if you plan to re-create the cluster, you can just go ahead
and rerun ``gnt-cluster init``.