Revision d55b80b0

b/Makefile.am
@@ -353,6 +353,7 @@
 	doc/design-http-server.rst \
 	doc/design-impexp2.rst \
 	doc/design-lu-generated-jobs.rst \
+	doc/design-linuxha.rst \
 	doc/design-multi-reloc.rst \
 	doc/design-network.rst \
 	doc/design-node-state-cache.rst \
b/doc/design-draft.rst
@@ -18,6 +18,7 @@
    design-ssh-setup.rst
    design-monitoring-agent.rst
    design-remote-commands.rst
+   design-linuxha.rst
 
 .. vim: set textwidth=72 :
 .. Local Variables:
b/doc/design-linuxha.rst
====================
Linux HA integration
====================

.. contents:: :depth: 4

This is a design document detailing the integration of Ganeti and Linux HA.
  
  
Current state and shortcomings
==============================

Ganeti doesn't currently support any self-healing or self-monitoring.

We are now working on improving the situation in this regard:

- The :doc:`autorepair system <design-autorepair>` will take care
  of self-repairing a cluster in the presence of offline nodes.
- The :doc:`monitoring agent <design-monitoring-agent>` will take care
  of exporting data to monitoring systems.

What is still missing is a way to rapidly self-detect "obvious" failures
and to:

- Keep the master role active.
- Offline resources that are obviously faulty so that the autorepair
  system can perform its work.
  
  
Proposed changes
================

Linux-HA provides software that can be used to provide high availability
of services through automatic failover of resources. In particular,
Pacemaker can be used together with Heartbeat or Corosync to make sure a
resource is kept active on a self-monitoring cluster.
  
Ganeti OCF agents
-----------------

The Ganeti agents will be slightly special in the HA world. The
following will apply:

- The agents will be able to be configured cluster-wide by tags (which
  will be read on the nodes via ssconf_cluster_tags) and locally by
  files on the filesystem that will allow them to "simulate" a
  particular condition (e.g. simulate a failure even if none is
  detected).
- The agents will be able to run in "full" or "partial" mode: in
  "partial" mode they will always succeed, and thus never fail a
  resource as long as a node is online, is running the Linux HA software
  and is responding to the network. In "full" mode they will also check
  resources like the cluster master IP or master daemon, and act if they
  are missing.
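
As a rough illustration, the configuration lookup described above could
be sketched in Python as follows. The sketch assumes that the ssconf
files live in ``/var/lib/ganeti`` and that ``ssconf_cluster_tags`` holds
one tag per line. The ``/var/run/ganeti-ocf`` simulation directory, the
``ocf:mode:full`` tag and all function names are hypothetical and are
not part of Ganeti.

.. code-block:: python

   import os

   # Assumed location of the ssconf files; adjust to your installation.
   SSCONF_DIR = "/var/lib/ganeti"
   # Hypothetical directory where an operator drops "simulation" files.
   SIMULATE_DIR = "/var/run/ganeti-ocf"
   # Hypothetical tag selecting "full" mode; "partial" is the default here.
   FULL_MODE_TAG = "ocf:mode:full"


   def read_cluster_tags(ssconf_dir=SSCONF_DIR):
       """Return the cluster tags listed one per line in ssconf_cluster_tags."""
       path = os.path.join(ssconf_dir, "ssconf_cluster_tags")
       try:
           with open(path) as handle:
               return {line.strip() for line in handle if line.strip()}
       except FileNotFoundError:
           # No ssconf file (e.g. not a cluster node): behave as if untagged.
           return set()


   def simulating(condition, simulate_dir=SIMULATE_DIR):
       """True if a local file asks the agent to pretend `condition` holds."""
       return os.path.exists(os.path.join(simulate_dir, condition))


   def agent_mode(tags):
       """Pick "full" or "partial" mode from the cluster tags."""
       return "full" if FULL_MODE_TAG in tags else "partial"

With this layout, an operator could create an empty file named
``failure`` in the simulation directory. ``simulating("failure")`` would
then return ``True``, so the failure path can be exercised without a
real fault.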
  
Note that OCF agents are needed for what Ganeti does: simply relying on
the LSB scripts will not work for the Ganeti service.
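
For reference, an OCF resource agent is invoked with the action
(``start``, ``stop``, ``monitor``, ``meta-data`` and so on) as its only
argument. It receives its parameters as ``OCF_RESKEY_<name>`` environment
variables and reports its result through the OCF exit codes. The
following minimal Python dispatcher is only a sketch of that calling
convention. ``run_agent`` and the handler mapping are hypothetical names,
and a real agent would also need to print its XML metadata.

.. code-block:: python

   import os

   # Standard OCF exit codes (subset).
   OCF_SUCCESS = 0
   OCF_ERR_GENERIC = 1
   OCF_ERR_ARGS = 2
   OCF_ERR_UNIMPLEMENTED = 3
   OCF_NOT_RUNNING = 7


   def run_agent(argv, handlers, environ=os.environ):
       """Dispatch an OCF action to a handler and return its exit code.

       `handlers` maps action names to callables that receive the resource
       parameters (OCF_RESKEY_* variables with the prefix stripped).
       """
       if len(argv) != 2:
           return OCF_ERR_ARGS
       handler = handlers.get(argv[1])
       if handler is None:
           return OCF_ERR_UNIMPLEMENTED
       params = {key[len("OCF_RESKEY_"):]: value
                 for key, value in environ.items()
                 if key.startswith("OCF_RESKEY_")}
       return handler(params)

An agent script built this way would end with something like
``sys.exit(run_agent(sys.argv, handlers))``.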
  
  
Master role agent
-----------------

This agent will manage the Ganeti master role. It needs to be configured
as a sticky resource (you don't want to flap the master role around, do
you?) that is active on only one node. You can require quorum or fencing
to protect your cluster from multiple masters.

The agent will implement a stateless resource that considers itself
"started" only on the master node, "stopped" on all master candidates
and in error mode on all other nodes.
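
The state mapping above could be sketched as follows. The sketch assumes
that the master name and the master candidate list are read from the
``ssconf_master_node`` and ``ssconf_master_candidates`` files under
``/var/lib/ganeti``. The function names are hypothetical. Because the
master node is itself a master candidate, it has to be checked first.

.. code-block:: python

   import os

   # Standard OCF exit codes returned by the monitor action.
   OCF_SUCCESS = 0
   OCF_ERR_GENERIC = 1
   OCF_NOT_RUNNING = 7


   def read_ssconf(name, ssconf_dir="/var/lib/ganeti"):
       """Return the non-empty lines of ssconf_<name> (assumed layout)."""
       with open(os.path.join(ssconf_dir, "ssconf_" + name)) as handle:
           return [line.strip() for line in handle if line.strip()]


   def master_role_monitor(node, master, candidates):
       """Map a node's Ganeti role to the master role agent's monitor result."""
       if node == master:
           return OCF_SUCCESS  # "started": this node holds the master role
       if node in candidates:
           return OCF_NOT_RUNNING  # "stopped": may become master later
       return OCF_ERR_GENERIC  # error: this node can never be master

On a node, the inputs would then be ``read_ssconf("master_node")[0]``
and ``set(read_ssconf("master_candidates"))``.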
  
Note that if not all your nodes are master candidates, this resource
might have problems:

- if all nodes are configured to run the resource, Heartbeat may decide
  to "fence" (a.k.a. STONITH) all your non-master-candidate nodes if
  told to do so. This might not be what you want.
- if only master candidates are configured as nodes for the resource,
  beware of promotions and demotions, as nothing will automatically
  update Pacemaker should a change happen at the Ganeti level.

Other solutions, such as also reporting the resource as "stopped" on
non-master-candidate nodes, might mean that Pacemaker would choose the
"wrong" node to promote to master, which is also a bad idea.

Future improvements
+++++++++++++++++++

- Ability to work better with non-master-candidate nodes
- Stateful resource that can "safely" transfer the master role between
  online nodes (with queue drain and such)
- Implement "full" mode, with detection of the cluster IP and the master
  node daemon.
  
  
Node role agent
---------------

This agent will manage the Ganeti node role. It needs to be configured
as a cloned resource that is active on all nodes.

In partial mode it will always return success (and thus trigger a
failure only upon an HA-level or network failure). Full mode, which
initially will not be implemented, could also check for the node daemon
being unresponsive or other local conditions (TBD).
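
A monitor action that follows this description might look like the
sketch below. The ``node_daemon_ok`` callable stands in for the
full-mode checks, which are not yet defined, and is purely hypothetical.
Returning ``OCF_ERR_CONFIGURED`` when no such check is available, or when
the mode is unknown, is also an assumption.

.. code-block:: python

   # Standard OCF exit codes used below.
   OCF_SUCCESS = 0
   OCF_ERR_GENERIC = 1
   OCF_ERR_CONFIGURED = 6


   def node_role_monitor(mode, node_daemon_ok=None):
       """Return the node role agent's monitor result for the given mode."""
       if mode == "partial":
           # Always succeed: only an HA-level or network failure (the node
           # no longer answering at all) can make this resource fail.
           return OCF_SUCCESS
       if mode == "full":
           if node_daemon_ok is None:
               # No full-mode check supplied: treat it as a config error.
               return OCF_ERR_CONFIGURED
           return OCF_SUCCESS if node_daemon_ok() else OCF_ERR_GENERIC
       return OCF_ERR_CONFIGURED  # unknown mode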
  
When a failure happens, the HA notification system will trigger on all
other nodes, including the master. The master will then be able to
offline the node. Any other work to restore instance availability should
then be done by the autorepair system.
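
For background on the notification step: Pacemaker notifies the
instances of a clone configured with ``notify=true`` by running the
agent's ``notify`` action. It passes the details in environment variables
such as ``OCF_RESKEY_CRM_meta_notify_type`` (``pre`` or ``post``),
``OCF_RESKEY_CRM_meta_notify_operation`` and
``OCF_RESKEY_CRM_meta_notify_stop_uname`` (a space-separated list of
node names). The sketch below is hypothetical. It assumes that a failed
node shows up in the stop list of a ``post``/``stop`` notification and
that only the master acts on it.

.. code-block:: python

   def nodes_to_offline(environ, local_node, master):
       """Pick the nodes the master should offline after a clone notification.

       Assumes a failed node appears in a post-stop notification's stop
       list; nodes other than the master ignore notifications entirely.
       """
       if local_node != master:
           return []
       if environ.get("OCF_RESKEY_CRM_meta_notify_type") != "post":
           return []
       if environ.get("OCF_RESKEY_CRM_meta_notify_operation") != "stop":
           return []
       stopped = environ.get("OCF_RESKEY_CRM_meta_notify_stop_uname", "")
       # Space-separated node names; never offline the master itself.
       return [node for node in stopped.split() if node != master]

When called with ``os.environ`` from the agent's ``notify`` action, the
returned names could then be passed to the offlining step.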
  
The following cluster tags are supported:

- ``ocf:node-offline:use-powercycle``: Try to powercycle a node using
  ``gnt-node powercycle`` when offlining.
- ``ocf:node-offline:use-poweroff``: Try to power off a node using
  ``gnt-node power off`` when offlining (requires OOB support).
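
The tag handling above could be sketched as a function that only builds
the command lines and leaves their execution (for example with
``subprocess.run``) to the caller. The function name is hypothetical, and
so are a few of its choices:

- ``use-poweroff`` wins if both tags are set.
- The power operation is issued before the node is marked offline.
- Any confirmation flags are omitted; check the gnt-node(8) man page for
  the ones your version needs.

.. code-block:: python

   POWERCYCLE_TAG = "ocf:node-offline:use-powercycle"
   POWEROFF_TAG = "ocf:node-offline:use-poweroff"


   def offline_commands(node, tags):
       """Return the gnt-node command lines used to offline `node`."""
       commands = []
       if POWEROFF_TAG in tags:
           # Requires out-of-band (OOB) support for this node.
           commands.append(["gnt-node", "power", "off", node])
       elif POWERCYCLE_TAG in tags:
           commands.append(["gnt-node", "powercycle", node])
       # Finally mark the node offline in the Ganeti configuration.
       commands.append(["gnt-node", "modify", "--offline", "yes", node])
       return commands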
  
Future improvements
+++++++++++++++++++

- Handle draining differently than offlining
- Handle different modes of "stopping" the service
- Implement "full" mode
  
  
Risks
-----

Running Ganeti with Pacemaker increases the risk of instability for your
Ganeti cluster. Events like:

- stopping heartbeat or corosync on a node
- corosync or heartbeat being killed for any reason
- temporary failure in a node's networking

will trigger potentially dangerous operations such as node offlining or
master role failover. Moreover, if the autorepair system is active,
these events can also trigger instance failovers or migrations, and disk
replacements.

Also note that operations like master-failover or a manual node-modify
might interact badly with this setup, depending on the way your HA
system is configured (see below).

This of course is an inherent problem with any Linux-HA installation,
but is probably more visible with Ganeti given that our resources tend
to be more heavyweight than many others managed in HA clusters (e.g. an
IP address).

Code status
-----------

This code is heavily experimental, and Linux-HA is a very complex
subsystem. *We might not be able to help you* if you decide to run this
code: please make sure you fully understand high availability on your
production machines. Ganeti only ships this code as an example, and it
might need customization or complex configuration on your side to run
properly.

*Ganeti does not automate HA configuration for your cluster*. You need
to do this job by hand. Good luck, don't get it wrong.
  
  
Future work
===========

- Integrate the agents better with the Ganeti monitoring
- Add hooks for managing HA at node add/remove/modify/master-failover
  operations
- Provide a stonith system through Ganeti's OOB system
- Provide an OOB system that does "shunning" of offline nodes, for
  emulating a real OOB, at least on all nodes

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: