====================
Linux HA integration
====================

.. contents:: :depth: 4

This is a design document detailing the integration of Ganeti and Linux HA.


Current state and shortcomings
==============================

Ganeti doesn't currently support any self-healing or self-monitoring.

We are now working to improve the situation in this regard:

- The :doc:`autorepair system <design-autorepair>` will take care
  of self-repairing a cluster in the presence of offline nodes.
- The :doc:`monitoring agent <design-monitoring-agent>` will take care
  of exporting data to monitoring.

What is still missing is a way to self-detect "obvious" failures rapidly
and to:

- Keep the master role active.
- Offline resources that are obviously faulty so that the autorepair
  system can perform its work.


Proposed changes
================

Linux-HA provides software that can be used to provide high availability
of services through automatic failover of resources. In particular
Pacemaker can be used together with Heartbeat or Corosync to make sure a
resource is kept active on a self-monitoring cluster.

Ganeti OCF agents
-----------------

The Ganeti agents will be slightly special in the HA world. The
following will apply:

- The agents will be configurable cluster-wide through tags (which
  will be read on the nodes via ``ssconf_cluster_tags``) and locally
  through files on the filesystem that will allow them to "simulate" a
  particular condition (e.g. simulate a failure even if none is
  detected).
- The agents will be able to run in "full" or "partial" mode. In
  "partial" mode they will always succeed, and thus never fail a
  resource as long as a node is online, is running the Linux HA software
  and is responding to the network. In "full" mode they will also check
  resources like the cluster master IP or the master daemon, and act if
  they are missing.

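As an illustration of the two configuration sources described above, the
following Python sketch reads cluster tags from an ssconf file and
checks for a local flag file. It is not the agents' actual code: the
default file path, the one-tag-per-line layout and the
``.simulate-failure`` file name are assumptions made for this example.

.. code-block:: python

  import os

  # Assumed location of the ssconf file holding the cluster tags.
  DEFAULT_TAGS_FILE = "/var/lib/ganeti/ssconf_cluster_tags"


  def read_cluster_tags(path=DEFAULT_TAGS_FILE):
      """Return the set of cluster tags found in the file at `path`.

      The file is assumed to hold one tag per line.
      """
      try:
          with open(path) as handle:
              return {line.strip() for line in handle if line.strip()}
      except OSError:
          # Missing or unreadable file: behave as if no tags were set.
          return set()


  def simulated_failure(flag_dir, agent_name):
      """Tell whether a local flag file asks the agent to report a failure.

      The "<agent_name>.simulate-failure" naming is hypothetical.
      """
      flag_file = os.path.join(flag_dir, agent_name + ".simulate-failure")
      return os.path.exists(flag_file)
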
Note that Ganeti needs OCF agents for this purpose: simply relying on
the LSB scripts will not work for the Ganeti service.


Master role agent
-----------------

This agent will manage the Ganeti master role. It needs to be configured
as a sticky resource (you don't want to flap the master role around, do
you?) that is active on only one node. You can require quorum or fencing
to protect your cluster from multiple masters.

The agent will implement a stateless resource that considers itself
"started" only on the master node, "stopped" on all master candidates
and in error mode on all other nodes.

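The three states above can be sketched as the result of an OCF
"monitor" action. The exit code values are the standard OCF ones, but
the function name, its arguments and the choice of ``OCF_ERR_GENERIC``
for the error state are assumptions made for this example, not a
description of the shipped agent.

.. code-block:: python

  # Exit codes defined by the OCF resource agent API.
  OCF_SUCCESS = 0
  OCF_ERR_GENERIC = 1
  OCF_NOT_RUNNING = 7


  def master_role_monitor(node, master_node, master_candidates):
      """Map a node's Ganeti role to the exit code of a monitor action."""
      if node == master_node:
          # "started": this node currently holds the master role.
          return OCF_SUCCESS
      if node in master_candidates:
          # "stopped": cleanly inactive here.
          return OCF_NOT_RUNNING
      # Any other node: report an error (the exact code is an assumption).
      return OCF_ERR_GENERIC
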
Note that if not all your nodes are master candidates this resource
might have problems:

- If all nodes are configured to run the resource, Heartbeat may decide
  to "fence" (aka STONITH) all your non-master-candidate nodes if told
  to do so. This might not be what you want.
- If only master candidates are configured as nodes for the resource,
  beware of promotions and demotions, as nothing will automatically
  update Pacemaker should a change happen at the Ganeti level.

Other solutions, such as reporting the resource as just "stopped" on
non-master-candidates as well, might mean that Pacemaker would choose
the "wrong" node to promote to master, which is also a bad idea.

Future improvements
+++++++++++++++++++

- Ability to work better with non-master-candidate nodes
- Stateful resource that can "safely" transfer the master role between
  online nodes (with queue drain and such)
- Implement "full" mode, with detection of the cluster IP and the master
  node daemon


Node role agent
---------------

This agent will manage the Ganeti node role. It needs to be configured
as a cloned resource that is active on all nodes.

In partial mode it will always return success (and thus trigger a
failure only upon an HA-level or network failure). Full mode, which
will not be implemented initially, could also check for the node daemon
being unresponsive or other local conditions (TBD).

When a failure happens the HA notification system will trigger on all
other nodes, including the master. The master will then be able to
offline the node. Any other work to restore instance availability should
then be done by the autorepair system.

The following cluster tags are supported:

- ``ocf:node-offline:use-powercycle``: Try to powercycle a node using
  ``gnt-node powercycle`` when offlining.
- ``ocf:node-offline:use-poweroff``: Try to power off a node using
  ``gnt-node power off`` when offlining (requires OOB support).

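A minimal sketch of how the master could turn these tags into commands
is shown below. It only builds argument lists and runs nothing. The
function names, the use of ``gnt-node modify --offline=yes`` for the
offlining step and the precedence of power off over powercycle when both
tags are set are assumptions of this example. The order in which the
commands would be run is deliberately left open.

.. code-block:: python

  USE_POWERCYCLE_TAG = "ocf:node-offline:use-powercycle"
  USE_POWEROFF_TAG = "ocf:node-offline:use-poweroff"


  def offline_command(node):
      """Return the command that marks `node` offline (assumed form)."""
      return ["gnt-node", "modify", "--offline=yes", node]


  def power_command(node, cluster_tags):
      """Return the power command selected by the cluster tags, or None."""
      if USE_POWEROFF_TAG in cluster_tags:
          # Assumption: power off wins when both tags are present.
          return ["gnt-node", "power", "off", node]
      if USE_POWERCYCLE_TAG in cluster_tags:
          return ["gnt-node", "powercycle", node]
      # No tag set: the node is only offlined.
      return None
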
Future improvements
+++++++++++++++++++

- Handle draining differently than offlining
- Handle different modes of "stopping" the service
- Implement "full" mode


Risks
-----

Running Ganeti with Pacemaker increases the risk to the stability of
your Ganeti cluster. Events like:

- stopping heartbeat or corosync on a node
- corosync or heartbeat being killed for any reason
- temporary failure in a node's networking

will trigger potentially dangerous operations such as node offlining or
master role failover. Moreover, if the autorepair system is running,
these events will also be able to trigger instance failovers or
migrations, and disk replacements.

Also note that operations like master-failover or a manual node-modify
might interact badly with this setup, depending on the way your HA
system is configured (see below).

This of course is an inherent problem with any Linux-HA installation,
but is probably more visible with Ganeti given that our resources tend
to be more heavyweight than many others managed in HA clusters (e.g. an
IP address).

Code status
-----------

This code is heavily experimental, and Linux-HA is a very complex
subsystem. *We might not be able to help you* if you decide to run this
code: please make sure you fully understand high availability on your
production machines. Ganeti only ships this code as an example, and it
might need customization or complex configuration on your side to run
properly.

*Ganeti does not automate HA configuration for your cluster*. You need
to do this job by hand. Good luck, don't get it wrong.


Future work
===========

- Integrate the agents better with the Ganeti monitoring
- Add hooks for managing HA at node add/remove/modify/master-failover
  operations
- Provide a stonith system through Ganeti's OOB system
- Provide an OOB system that does "shunning" of offline nodes, for
  emulating a real OOB, at least on all nodes

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: