Design Doc: Ganeti Node OOB Management Framework
authorMarc Schmitt <mschmitt@google.com>
Thu, 28 Oct 2010 11:42:36 +0000 (13:42 +0200)
committerRené Nussbaumer <rn@google.com>
Thu, 4 Nov 2010 10:00:19 +0000 (11:00 +0100)
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Makefile.am
doc/design-oob.rst [new file with mode: 0644]
doc/index.rst

index 56a8500..70444b1 100644 (file)
@@ -208,6 +208,7 @@ docrst = \
        doc/design-2.1.rst \
        doc/design-2.2.rst \
        doc/design-2.3.rst \
+       doc/design-oob.rst \
        doc/cluster-merge.rst \
        doc/devnotes.rst \
        doc/glossary.rst \
diff --git a/doc/design-oob.rst b/doc/design-oob.rst
new file mode 100644 (file)
index 0000000..1e78c1a
--- /dev/null
@@ -0,0 +1,362 @@
+Ganeti Node OOB Management Framework
+====================================
+
+Objective
+---------
+
+Extend Ganeti with Out of Band Cluster Node Management Capabilities.
+
+Background
+----------
+
+Ganeti currently has no support for Out of Band management of the nodes in a
+cluster. It relies on the OS running on the nodes and has therefore limited
+possibilities when the OS is not responding. The command ``gnt-node powercycle``
+can be issued to attempt a reboot of a node that crashed but there are no means
+to power a node off and power it back on. Supporting this is very handy in the
+following situations:
+
+  * **Emergency Power Off**: During emergencies, time is critical and manual
+    tasks just add latency which can be avoided through automation. If a server
+    room overheats, halting the OS on the nodes is not enough. The nodes need
+    to be powered off cleanly to prevent damage to equipment.
+  * **Repairs**: In most cases, repairing a node means that the node has to be
+    powered off.
+  * **Crashes**: Software bugs may crash a node. Having an OS independent way to
+    power-cycle a node helps to recover the node without human intervention.
+
+Overview
+--------
+
+Ganeti will be extended with OOB capabilities through adding a new **Cluster
+Parameter** (``--oob-program``), a new **Node Property** (``--oob-program``), a
+new **Node State (powered)** and support in ``gnt-node`` for invoking an
+**External Helper Command** which executes the actual OOB command (``gnt-node
+<command> nodename ...``). The supported commands are: ``power on``,
+``power off``, ``power cycle``, ``power status`` and ``health``.
+
+.. note::
+  The new **Node State (powered)** is a **State of Record
+  (SoR)**, not a **State of World (SoW)**.  The maximum execution time of the
+  **External Helper Command** will be limited to 60s to prevent the cluster from
+  getting locked for an undefined amount of time.
+
+Detailed Design
+---------------
+
+New ``gnt-cluster`` Parameter
++++++++++++++++++++++++++++++
+
+| Program: ``gnt-cluster``
+| Command: ``modify|init``
+| Parameters: ``--oob-program``
+| Options: ``--oob-program``: executable OOB program (absolute path)
+
+New ``gnt-node`` Property
++++++++++++++++++++++++++
+
+| Program: ``gnt-node``
+| Command: ``modify|add``
+| Parameters: ``--oob-program``
+| Options: ``--oob-program``: executable OOB program (absolute path)
+
+.. note::
+  If ``--oob-program`` is set to ``!`` then the node has no OOB capabilities.
+  Otherwise, we will inherit the node group respectively the cluster wide
+  value. I.e. the nodes have to opt out from OOB capabilities.
+
+Addition to ``gnt-cluster verify``
+++++++++++++++++++++++++++++++++++
+
+| Program: ``gnt-cluster``
+| Command: ``verify``
+| Parameter: None
+| Option: None
+| Additional Checks:
+
+  1. existence and execution flag of OOB program on all Master Candidates if
+     the cluster parameter ``--oob-program`` is set or at least one node has
+     the property ``--oob-program`` set. The OOB helper is just invoked on the
+     master
+  2. check if node state powered matches actual power state of the machine for
+     those nodes where ``--oob-program`` is set
+
+New Node State
+++++++++++++++
+
+Ganeti supports the following two boolean states related to the nodes:
+
+**drained**
+  The cluster still communicates with drained nodes but excludes them from
+  allocation operations
+
+**offline**
+  if offline, the cluster does not communicate with offline nodes; useful for
+  nodes that are not reachable in order to avoid delays
+
+And will extend this list with the following boolean state:
+
+**powered**
+  if not powered, the cluster does not communicate with not powered nodes if
+  the node property ``--oob-program`` is not set, the state powered is not
+  displayed
+
+Additionally modify the meaning of the offline state as follows:
+
+**offline**
+  if offline, the cluster does not communicate with offline nodes (**with the
+  exception of OOB commands for nodes where** ``--oob-program`` **is set**);
+  useful for nodes that are not reachable in order to avoid delays
+
+The corresponding command extensions are:
+
+| Program: ``gnt-node``
+| Command: ``info``
+| Parameter:  [ ``nodename`` ... ]
+| Option: None
+
+Additional Output (SoR, ommited if node property ``--oob-program`` is not set):
+powered: ``[True|False]``
+
+| Program: ``gnt-node``
+| Command: ``modify``
+| Parameter: nodename
+| Option: [ ``--powered=yes|no`` ]
+| Reasoning: sometimes you will need to sync the SoR with the SoW manually
+| Caveat: ``--powered`` can only be modified if ``--oob-program`` is set for
+|         the node in question
+
+New ``gnt-node`` commands: ``power [on|off|cycle|status]``
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+| Program: ``gnt-node``
+| Command: ``power [on|off|cycle|status]``
+| Parameters: [ ``nodename`` ... ]
+| Options: None
+| Caveats:
+
+  * If no nodenames are passed to ``power [on|off|cycle]``, the user will be
+    prompted with ``"Do you really want to power [on|off|cycle] the following
+    nodes: <display list of OOB capable nodes in the cluster)? (y/n)"``
+  * For ``power-status``, nodename is optional, if omitted, we list the
+    power-status of all OOB capable nodes in the cluster (SoW)
+  * User should be warned and needs to confirm with yes if s/he tries to
+    ``power [off|cycle]`` a node with running instances.
+
+Error Handling
+^^^^^^^^^^^^^^
+
++------------------------------+-----------------------------------------------+
+| Exception                    | Error Message                                 |
++==============================+===============================================+
+| OOB program return code != 0 | OOB program execution failed ($ERROR_MSG)     |
++------------------------------+-----------------------------------------------+
+| OOB program execution time   | OOB program execution timeout exceeded, OOB   |
+| exceeds 60s                  | program execution aborted                     |
++------------------------------+-----------------------------------------------+
+
+Node State Changes
+^^^^^^^^^^^^^^^^^^
+
++----------------+-----------------+----------------+--------------------------+
+| State before   | Command         | State after    | Comment                  |
+| execution      |                 | execution      |                          |
++================+=================+================+==========================+
+| powered: False | ``power off``   | powered: False | FYI: IPMI will complain  |
+|                |                 |                | if you try to power off  |
+|                |                 |                | a machine that is already|
+|                |                 |                | powered off              |
++----------------+-----------------+----------------+--------------------------+
+| powered: False | ``power cycle`` | powered: False | FYI: IPMI will complain  |
+|                |                 |                | if you try to cycle a    |
+|                |                 |                | machine that is already  |
+|                |                 |                | powered off              |
++----------------+-----------------+----------------+--------------------------+
+| powered: False | ``power on``    | powered: True  |                          |
++----------------+-----------------+----------------+--------------------------+
+| powered: True  | ``power off``   | powered: False |                          |
++----------------+-----------------+----------------+--------------------------+
+| powered: True  | ``power cycle`` | powered: True  |                          |
++----------------+-----------------+----------------+--------------------------+
+| powered: True  | ``power on``    | powered: True  | FYI: IPMI will complain  |
+|                |                 |                | if you try to power on   |
+|                |                 |                | a machine that is already|
+|                |                 |                | powered on               |
++----------------+-----------------+----------------+--------------------------+
+
+.. note::
+
+  * If the command fails, the Node State remains unchanged.
+  * We will not prevent the user from trying to power off a node that is
+    already powered off since the powered state represents the **SoR** only and
+    not the **SoW**. This can however create problems when the cluster
+    administrator wants to bring the **SoR** in sync with the **SoW** without
+    actually having to mess with the node(s). For this case, we allow direct
+    modification of the powered state through the gnt-node modify
+    ``--powered=[yes|no]`` command as long as the node has OOB capabilities
+    (i.e. ``--oob-program`` is set).
+  * All node power state changes will be logged
+
+Node Power Status Listing (SoW)
++++++++++++++++++++++++++++++++
+
+| Program: ``gnt-node``
+| Command: ``power-status``
+| Parameters: [ ``nodename`` ... ]
+
+Example output (represents **SoW**)::
+
+  gnt-node oob power-status
+  Node                      Power Status
+  node1.example.com         on
+  node2.example.com         off
+  node3.example.com         on
+  node4.example.com         unknown
+
+.. note::
+
+  * We use ``unknown`` in case the Helper Program could not determine the power
+    state.
+  * If no nodenames are provided, we will list the power state of all nodes
+    which are not opted out from OOB management.
+  * Only nodes which are not opted out from OOB management will be listed.
+    Invoking the command on a node that does not meet this condition will
+    result in an error message "Node X does not support OOB commands".
+
+Node Power Status Listing (SoR)
++++++++++++++++++++++++++++++++
+
+| Program: ``gnt-node``
+| Command: ``info``
+| Parameter:  [ ``nodename`` ... ]
+| Option: None
+
+Example output (represents **SoR**)::
+
+  gnt-node info node1.example.com
+  Node name: node1.example.com
+    primary ip: 192.168.1.1
+    secondary ip: 192.168.2.1
+    master candidate: True
+    drained: False
+    offline: False
+    powered: True
+    primary for instances:
+      - inst1.example.com
+      - inst2.example.com
+      - inst3.example.com
+    secondary for instances:
+      - inst4.example.com
+      - inst5.example.com
+      - inst6.example.com
+      - inst7.example.com
+
+.. note::
+  Only nodes which are not opted out from OOB management will
+  report the powered state.
+
+New ``gnt-node`` oob subcommand: ``health``
++++++++++++++++++++++++++++++++++++++++++++
+
+| Program: ``gnt-node``
+| Command: ``health``
+| Parameters: [ ``nodename`` ... ]
+| Options: None
+| Example: ``/usr/bin/oob health node5.example.com``
+
+Caveats:
+
+  * If no nodename(s) are provided, we will report the health of all nodes in
+    the cluster which have ``--oob-program`` set.
+  * Only nodes which are not opted out from OOB management will report their
+    health. Invoking the command on a node that does not meet this condition
+    will result in an error message "Node does not support OOB commands".
+
+For error handling see `Error Handling`_
+
+OOB Program (Helper Program) Parameters, Return Codes and Data Format
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+| Program: executable OOB program (absolute path)
+| Parameters: command nodename
+| Command: [power-{on|off|cycle|status}|health]
+| Options: None
+| Example: ``/usr/bin/oob power-on node1.example.com``
+| Caveat: maximum runtime is limited to 60s
+
+Return Codes
+^^^^^^^^^^^^
+
++---------------+--------------------------+
+| Return code   | Meaning                  |
++===============+==========================+
+| 0             | Command succeeded        |
++---------------+--------------------------+
+| 1             | Command failed           |
++---------------+--------------------------+
+| others        | Unsupported/undefined    |
++---------------+--------------------------+
+
+Error messages are passed from the helper program to Ganeti through StdErr
+(return code == 1).  On StdOut, the helper program will send data back to
+Ganeti (return code == 0). The format of the data is JSON.
+
++------------------+-------------------------------+
+| Command          | Expected output               |
++==================+===============================+
+| ``power-on``     | None                          |
++------------------+-------------------------------+
+| ``power-off``    | None                          |
++------------------+-------------------------------+
+| ``power-cycle``  | None                          |
++------------------+-------------------------------+
+| ``power-status`` | ``{ "powered": true|false }`` |
++------------------+-------------------------------+
+| ``health``       | ::                            |
+|                  |                               |
+|                  |   [[item, status],            |
+|                  |    [item, status],            |
+|                  |    ...]                       |
++------------------+-------------------------------+
+
+Data Format
+^^^^^^^^^^^
+
+For the health output, the fields are:
+
++--------+--------------------------------------------------------------------+
+| Field  | Meaning                                                            |
++========+====================================================================+
+| item   | String identifier of the item we are querying the health of,       |
+|        | examples:                                                          |
+|        |                                                                    |
+|        |   * Ambient Temp                                                   |
+|        |   * PS Redundancy                                                  |
+|        |   * FAN 1 RPM                                                      |
++--------+--------------------------------------------------------------------+
+| status | String; Can take one of the following four values:                 |
+|        |                                                                    |
+|        |   * OK                                                             |
+|        |   * WARNING                                                        |
+|        |   * CRITICAL                                                       |
+|        |   * UNKNOWN                                                        |
++--------+--------------------------------------------------------------------+
+
+.. note::
+
+  * The item output list is defined by the Helper Program. It is up to the
+    author of the Helper Program to decide which items should be monitored and
+    what each corresponding return status is.
+  * Ganeti will currently not take any actions based on the item status. It
+    will however create log entries for items with status WARNING or CRITICAL
+    for each run of the ``gnt-node oob health nodename`` command. Automatic
+    actions (regular monitoring of the item status) is considered a new service
+    and will be treated in a separate design document.
+
+Logging
+-------
+
+The ``gnt-node power-[on|off]`` (power state changes) commands will create log
+entries following current Ganeti logging practices. In addition, health items
+with status WARNING or CRITICAL will be logged for each run of ``gnt-node
+health``.
index 80ae4c0..4c54d3c 100644 (file)
@@ -18,6 +18,7 @@ Contents:
    design-2.1.rst
    design-2.2.rst
    design-2.3.rst
+   design-oob.rst
    cluster-merge.rst
    locking.rst
    hooks.rst