====================
Linux HA integration
====================

.. contents:: :depth: 4

This is a design document detailing the integration of Ganeti and Linux HA.


Current state and shortcomings
==============================

Ganeti doesn't currently support any self-healing or self-monitoring.

We are working to improve the situation in this regard:

- The :doc:`autorepair system <design-autorepair>` will take care
  of self-repairing a cluster in the presence of offline nodes.
- The :doc:`monitoring agent <design-monitoring-agent>` will take care
  of exporting data to monitoring systems.

What is still missing is a way to rapidly self-detect "obvious" failures
and to:

- Keep the master role active.
- Offline resources that are obviously faulty so that the autorepair
  system can perform its work.


Proposed changes
================

Linux-HA provides software that can be used to provide high availability
of services through automatic failover of resources. In particular,
Pacemaker can be used together with Heartbeat or Corosync to make sure a
resource is kept active on a self-monitoring cluster.

Ganeti OCF agents
-----------------

The Ganeti agents will be slightly special in the HA world. The
following will apply:

- The agents will be configurable cluster-wide by tags (which will be
  read on the nodes via ``ssconf_cluster_tags``) and locally by files on
  the filesystem that allow them to "simulate" a particular condition
  (e.g. simulate a failure even if none is detected).
- The agents will be able to run in "full" or "partial" mode: in
  "partial" mode they will always succeed, and thus never fail a
  resource as long as a node is online, is running the Linux-HA software
  and is responding to the network. In "full" mode they will also check
  resources like the cluster master IP or master daemon, and act if they
  are missing.

Note that OCF agents are required for what Ganeti does: simply relying
on the LSB scripts will not work for the Ganeti service.
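
The "partial"/"full" behavior and the local "simulation" override
described above can be sketched in Python as follows. This is a minimal
illustration under stated assumptions, not the code Ganeti ships: the
simulation directory, the ``<agent>.fail`` file name and the helper
names are hypothetical, and the "full" checks are passed in as plain
callables. Only the exit code values come from the OCF resource agent
API.

.. code-block:: python

   import os

   # Standard OCF exit codes, as defined by the OCF resource agent API.
   OCF_SUCCESS = 0
   OCF_ERR_GENERIC = 1

   # Hypothetical directory holding the local "simulation" files.
   SIMULATE_DIR = "/etc/ganeti/ocf-simulate"


   def simulated_failure(agent, simulate_dir=SIMULATE_DIR):
       """Return True if a local file asks this agent to fake a failure."""
       # Hypothetical naming scheme: "<agent>.fail" inside simulate_dir.
       return os.path.exists(os.path.join(simulate_dir, agent + ".fail"))


   def monitor(agent, mode, full_checks, simulate_dir=SIMULATE_DIR):
       """Compute the OCF monitor result of a Ganeti agent (sketch).

       ``mode`` is "partial" or "full"; ``full_checks`` is a list of
       callables returning True when the resource they inspect (for
       example the master IP) is present.
       """
       if mode not in ("partial", "full"):
           raise ValueError("unknown agent mode: %r" % mode)
       # A simulated failure overrides everything, even in partial mode.
       if simulated_failure(agent, simulate_dir):
           return OCF_ERR_GENERIC
       if mode == "partial":
           # Partial mode always succeeds; node, HA stack and network
           # failures are detected by the HA layer, not by this agent.
           return OCF_SUCCESS
       # Full mode: every configured check must pass.
       if all(check() for check in full_checks):
           return OCF_SUCCESS
       return OCF_ERR_GENERIC

A real agent would pass the returned value to ``sys.exit()`` from its
``monitor`` action. Checking the simulation file before the mode is a
design choice of this sketch, so that a simulated failure is honored
even in "partial" mode.

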


Master role agent
-----------------

This agent will manage the Ganeti master role. It needs to be configured
as a sticky resource (you don't want to flap the master role around, do
you?) that is active on only one node. You can require quorum or fencing
to protect your cluster from multiple masters.

The agent will implement a stateless resource that considers itself
"started" only on the master node, "stopped" on all master candidates
and in error mode on all other nodes.
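
The mapping from Ganeti node roles to resource states can be sketched
as below. This is a hedged Python illustration, not Ganeti's agent: it
assumes the master and the master candidates are read from the
``ssconf_master_node`` and ``ssconf_master_candidates`` files (one name
per line), the function names are hypothetical, and returning
``OCF_ERR_INSTALLED`` as the "error" result is an assumption of this
sketch, chosen so that Pacemaker treats the node as unable to run the
resource.

.. code-block:: python

   # Standard OCF exit codes, as defined by the OCF resource agent API.
   OCF_SUCCESS = 0
   OCF_ERR_INSTALLED = 5
   OCF_NOT_RUNNING = 7


   def read_ssconf_list(path):
       """Read a newline-separated ssconf file into a list of names."""
       with open(path) as handle:
           return [line.strip() for line in handle if line.strip()]


   def master_role_monitor(node, master_node, master_candidates):
       """Return the OCF monitor result of the master role on ``node``."""
       if node == master_node:
           # "started": this node is the Ganeti master.
           return OCF_SUCCESS
       if node in master_candidates:
           # "stopped": the node could be promoted to master if needed.
           return OCF_NOT_RUNNING
       # Error on every other node (the exact code is an assumption).
       return OCF_ERR_INSTALLED

On a node, ``node`` would be the local node name and ``master_node``
the single entry that ``read_ssconf_list`` returns for the master node
file.
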

Note that if not all your nodes are master candidates, this resource
might have problems:

- If all nodes are configured to run the resource, Heartbeat may decide
  to "fence" (aka STONITH) all your non-master-candidate nodes if told
  to do so. This might not be what you want.
- If only master candidates are configured as nodes for the resource,
  beware of promotions and demotions, as nothing will automatically
  update Pacemaker should a change happen at the Ganeti level.

Other solutions, such as also reporting the resource as "stopped" on
non-master-candidate nodes, might mean that Pacemaker would choose the
"wrong" node to promote to master, which is also a bad idea.

Future improvements
+++++++++++++++++++

- Ability to work better with non-master-candidate nodes.
- Stateful resource that can "safely" transfer the master role between
  online nodes (with queue drain and such).
- Implement "full" mode, with detection of the cluster IP and the master
  node daemon.


Node role agent
---------------

This agent will manage the Ganeti node role. It needs to be configured
as a cloned resource that is active on all nodes.

In partial mode it will always return success (and thus trigger a
failure only upon an HA-level or network failure). Full mode, which
will not be implemented initially, could also check for the node daemon
being unresponsive or other local conditions (TBD).

When a failure happens, the HA notification system will trigger on all
other nodes, including the master. The master will then be able to
offline the node. Any other work to restore instance availability should
then be done by the autorepair system.

The following cluster tags are supported:

- ``ocf:node-offline:use-powercycle``: Try to powercycle a node using
  ``gnt-node powercycle`` when offlining.
- ``ocf:node-offline:use-poweroff``: Try to power off a node using
  ``gnt-node power off`` when offlining (requires OOB support).
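
The master's reaction to a node-role failure, including the two tags
above, can be sketched in Python. This is a hedged illustration rather
than the shipped agent: the helper names are hypothetical, it only
builds the ``gnt-node`` command lines instead of running them, it
assumes ``ssconf_cluster_tags`` holds one tag per line, and the order of
the commands (powercycle before offlining, power off after it) is an
assumption. Check the ``gnt-node`` man page for any extra flags needed
to run these commands non-interactively.

.. code-block:: python

   # The two cluster tags documented above.
   TAG_POWERCYCLE = "ocf:node-offline:use-powercycle"
   TAG_POWEROFF = "ocf:node-offline:use-poweroff"


   def parse_cluster_tags(ssconf_text):
       """Return the set of tags in ssconf_cluster_tags contents.

       Assumes one tag per line, like the other ssconf list files.
       """
       return {line.strip() for line in ssconf_text.splitlines()
               if line.strip()}


   def offline_commands(failed_node, local_node, master_node, tags):
       """Return the gnt-node commands to run for a failed node.

       Only the master acts on the HA notification, so every other node
       gets an empty list; this sketch never tries to offline the master.
       """
       if local_node != master_node or failed_node == master_node:
           return []
       commands = []
       if TAG_POWERCYCLE in tags:
           commands.append(["gnt-node", "powercycle", failed_node])
       commands.append(["gnt-node", "modify", "--offline=yes", failed_node])
       if TAG_POWEROFF in tags:
           commands.append(["gnt-node", "power", "off", failed_node])
       return commands

A notification handler on the master could then run each command in
turn, e.g. with ``subprocess.run(cmd, check=True)``, stopping at the
first failure.
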

Future improvements
+++++++++++++++++++

- Handle draining differently from offlining.
- Handle different modes of "stopping" the service.
- Implement "full" mode.


Risks
-----

Running Ganeti with Pacemaker increases the risk of instability for your
Ganeti cluster. Events like:

- stopping Heartbeat or Corosync on a node
- Corosync or Heartbeat being killed for any reason
- a temporary failure in a node's networking

will trigger potentially dangerous operations such as node offlining or
master role failover. Moreover, if the autorepair system is running,
these events will also be able to trigger instance failovers or
migrations, and disk replacements.

Also note that operations like ``gnt-cluster master-failover`` or a
manual ``gnt-node modify`` might interact badly with this setup,
depending on the way your HA system is configured (see below).

This of course is an inherent problem with any Linux-HA installation,
but is probably more visible with Ganeti, given that our resources tend
to be more heavyweight than many others managed in HA clusters (e.g. an
IP address).

Code status
-----------

This code is heavily experimental, and Linux-HA is a very complex
subsystem. *We might not be able to help you* if you decide to run this
code: please make sure you fully understand high availability on your
production machines. Ganeti only ships this code as an example, and it
might need customization or complex configuration on your side to run
properly.

*Ganeti does not automate HA configuration for your cluster*. You need
to do this job by hand. Good luck, don't get it wrong.


Future work
===========

- Integrate the agents better with Ganeti monitoring.
- Add hooks for managing HA at node add/remove/modify/master-failover
  operations.
- Provide a STONITH system through Ganeti's OOB system.
- Provide an OOB system that does "shunning" of offline nodes, for
  emulating a real OOB, at least on all nodes.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: