Revision 3301805f

b/doc/design-monitoring-agent.rst
monitoring system. Such status page will be exported on a network port
and will be encoded in JSON (simple text) over HTTP.

The choice of JSON is obvious as we already depend on it in Ganeti and
thus we don't need to add extra libraries to use it, as opposed to what
would happen for XML or some other markup format.

......
- Node OS resources report (memory, CPU, network interfaces)
- Information from a plugin system

Format of the query
-------------------

The query will be an HTTP GET request on a particular port. At the
beginning it will only be possible to query the full status report.

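As an illustration of the intended usage (not part of the specification:
the agent's port and URL path below are placeholders, since neither is
fixed by this design yet), a client could fetch and decode the full report
like this::

  import json
  import urllib.request

  # Placeholder values: neither the port nor the path is fixed by this design.
  MONITORING_PORT = 1815
  url = "http://localhost:%d/" % MONITORING_PORT

  with urllib.request.urlopen(url) as response:
      # The full status report is a JSON array of report objects.
      report = json.load(response)

  for obj in report:
      print(obj["name"], obj["version"], obj["timestamp"])
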
Format of the report
--------------------

The report will be in JSON format, and it will present an array
of report objects.
Each report object will be produced by a specific data collector.
Each report object includes some mandatory fields, to be provided by all
the data collectors:

``name``
  The name of the data collector that produced this part of the report.
  It is supposed to be unique inside a report.

``version``
  The version of the data collector that produces this part of the
  report. Built-in data collectors (as opposed to those implemented as
  plugins) should have "B" as the version number.

``formatVersion``
  The format of what is represented in the "data" field for each data
  collector might change over time. Every time this happens, the
  ``formatVersion`` should be changed, so that whoever reads the report
  knows what format to expect, and how to correctly interpret it.

``timestamp``
  The time when the reported data were gathered. It has to be expressed
  in nanoseconds since the unix epoch (0:00:00 January 01, 1970). If not
  enough precision is available (or needed) it can be padded with
  zeroes. If a report object needs multiple timestamps, it can add more
  and/or override this one inside its own "data" section.

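  As an illustration only (not mandated by this design), a Python collector
  could produce such a timestamp, padding with zeroes when only second-level
  precision is available::

    import time

    # Nanosecond precision, where the platform provides it (Python >= 3.7).
    timestamp = time.time_ns()

    # Only second precision available: pad to nanoseconds with zeroes.
    timestamp = int(time.time()) * 10**9
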
``category``
  A collector can belong to a given category of collectors (e.g.: storage
  collectors, daemon collectors). This means that it will have to provide a
  minimum set of prescribed fields, as documented for each category.
  This field will contain the name of the category the collector belongs to,
  if any, or just the ``null`` value.

``kind``
  Two kinds of collectors are possible:
  `Performance reporting collectors`_ and `Status reporting collectors`_.
  The respective paragraphs will describe them and the value of this field.

``data``
  This field contains all the data generated by the specific data collector,
  in its own independently defined format. The monitoring agent could check
  this syntactically (according to the JSON specifications) but not
  semantically.

Here follows a minimal example of a report::

  [
  {
      "name" : "TheCollectorIdentifier",
      "version" : "1.2",
      "formatVersion" : 1,
      "timestamp" : 1351607182000000000,
      "category" : null,
      "kind" : 0,
      "data" : { "plugin_specific_data" : "go_here" }
  },
  {
      "name" : "AnotherDataCollector",
      "version" : "B",
      "formatVersion" : 7,
      "timestamp" : 1351609526123854000,
      "category" : "storage",
      "kind" : 1,
      "data" : { "status" : { "code" : 1,
                              "message" : "Error on disk 2"
                            },
                 "plugin_specific" : "data",
                 "some_late_data" : { "timestamp" : 1351609526123942720,
                                      ...
                                    }
               }
  }
  ]

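A consumer of the report can verify that each object carries the mandatory
fields before dispatching on ``kind``; a minimal sketch of such a check (the
helper is ours, not part of the design)::

  MANDATORY_FIELDS = ("name", "version", "formatVersion", "timestamp",
                      "category", "kind", "data")

  def check_report_object(obj):
      """Verify that a report object carries all the mandatory fields."""
      missing = [f for f in MANDATORY_FIELDS if f not in obj]
      if missing:
          raise ValueError("report object missing fields: %s" % missing)
      if obj["kind"] not in (0, 1):
          raise ValueError("unknown collector kind: %r" % obj["kind"])
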
Performance reporting collectors
++++++++++++++++++++++++++++++++

These collectors only provide data about some component of the system,
without giving any interpretation of their meaning.

The value of the ``kind`` field of the report will be ``0``.

Status reporting collectors
+++++++++++++++++++++++++++

These collectors will provide information about the status of some
component of Ganeti, or of some component managed by Ganeti.

The value of their ``kind`` field will be ``1``.

The rationale behind this kind of collector is that there are some situations
where exporting data about the underlying subsystems would expose potential
issues. But if Ganeti itself is able to (and going to) fix the problem,
conflicts might arise between Ganeti and something or somebody else trying
to fix the same problem.
Also, some external monitoring systems might not be aware of the internals of a
particular subsystem (e.g.: DRBD) and might only exploit the high level
response of its data collector, alerting an administrator if anything is wrong.
Still, completely hiding the underlying data is not a good idea, as they might
still be of use in some cases. So status reporting collectors will provide two
output modes: one exporting only high level information about the status,
and one also exporting all the data they gathered.
The default output mode will be the status-only one. The verbose output mode,
providing all the data, can be selected through a command line parameter (for
stand-alone data collectors) or through the HTTP request to the monitoring
agent (when collectors are executed as part of it).

When exporting just the status, each status reporting collector will provide,
in its ``data`` section, at least the following field:

``status``
  Summarizes the status of the component being monitored and consists of two
  subfields:

  ``code``
    It assumes a numeric value, encoded in such a way as to allow using a
    bitset to easily distinguish which states are currently present in the
    whole cluster. If the bitwise OR of all the ``code`` values is 0, the
    cluster is completely healthy.
    The status codes are as follows:

    ``0``
      The collector can determine that everything is working as
      intended.

    ``1``
      Something is temporarily wrong but it is being automatically fixed by
      Ganeti.
      There is no need of external intervention.

    ``2``
      The collector can determine that something is wrong and Ganeti has no
      way to fix it autonomously. External intervention is required.

    ``4``
      The collector has failed to understand whether the status is good or
      bad. Further analysis is required. Interpret this status as a
      potentially dangerous situation.

  ``message``
    A message to better explain the reason of the status.
    The exact format of the message string is data collector dependent.

    The field is mandatory, but the content can be ``null`` if the code is
    ``0`` (working as intended) or ``1`` (being fixed automatically).

    If the status code is ``2``, the message should specify what has gone
    wrong.
    If the status code is ``4``, the message should explain why it was not
    possible to determine a proper status.

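Since the codes are powers of two, an aggregator can OR them together and
then test individual bits. A small illustrative sketch (the function and the
``all_reports`` variable are ours, not part of the design)::

  def cluster_status_code(reports):
      """Bitwise OR of the status codes of all status reporting collectors."""
      combined = 0
      for obj in reports:
          if obj["kind"] == 1:  # status reporting collector
              combined |= obj["data"]["status"]["code"]
      return combined

  combined = cluster_status_code(all_reports)
  if combined == 0:
      print("cluster completely healthy")
  if combined & 2:
      print("external intervention required somewhere")
  if combined & 4:
      print("some collector could not determine a proper status")
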
The ``data`` section will also contain all the fields describing the gathered
data, according to a collector-specific format.

Instance status
+++++++++++++++

......
system, which then will export those themselves.

Monitoring and auditing systems can then use the reason to understand
the cause of an instance status, and they can use the timestamp to
understand the freshness of their data even in the absence of an atomic
cross-node reporting: for example if they see an instance "up" on a node
after seeing it running on a previous one, they can compare these values
......
upon.

The instance status will be on each node, for the instances it is
primary for, and its ``data`` section of the report will contain a list
of instances, with at least the following fields for each instance:

``name``
  The name of the instance.

``uuid``
  The UUID of the instance (stable on name change).

``admin_state``
  The status of the instance (up/down/offline) as requested by the admin.

``actual_state``
  The actual status of the instance. It can be ``up``, ``down``, or
  ``hung`` if the instance is up but it appears to be completely stuck.

``uptime``
  The uptime of the instance (if it is up, "null" otherwise).

``mtime``
  The timestamp of the last known change to the instance state.

``state_reason``
  The last known reason for state change, described according to the
  following subfields:

  ``text``
    Either a user-provided reason (if any), or the name of the command that
    triggered the state change, as a fallback.

  ``jobID``
    The ID of the job that caused the state change.

  ``source``
    Where the state change was triggered (RAPI, CLI).

``status``
  It represents the status of the instance, and its format is the same as that
  of the ``status`` field of `Status reporting collectors`_.

Each hypervisor should provide its own instance status data collector, possibly
with the addition of more specific fields.
The ``category`` field of all of them will be ``instance``.
The ``kind`` field will be ``1``.

Note that as soon as a node knows it's not the primary anymore for an
instance it will stop reporting status for it: this means the instance
will either disappear, if it has been deleted, or appear on another
node, if it's been moved.

The ``code`` of the ``status`` field of the report of the Instance status data
collector will be:

``0``
  if ``status`` is ``0`` for all the instances it is reporting about.

``1``
  otherwise.

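In other words, the collector-level code can be derived mechanically from the
per-instance statuses; an illustrative helper (the name is ours)::

  def instance_collector_code(instances):
      """0 if every reported instance has status code 0, 1 otherwise."""
      return 0 if all(i["status"]["code"] == 0 for i in instances) else 1
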
Storage status
++++++++++++++

The storage status collectors will be a series of data collectors
(drbd, rbd, plain, file) that will gather data about all the storage types
for the current node (this is right now hardcoded to the enabled storage
types, and in the future tied to the enabled storage pools for the nodegroup).

The ``name`` of each of these collectors will reflect what storage type each of
them refers to.

The ``category`` field of these collectors will be ``storage``.

The ``kind`` field will be ``1`` (`Status reporting collectors`_).

The ``data`` section of the report will provide at least the following fields:

``free``
  The amount of free space (in KBytes).

``used``
  The amount of used space (in KBytes).

``total``
  The total visible space (in KBytes).

Each specific storage type might provide more type-specific fields.

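For a file-based backend, for instance, these three fields could be derived
from filesystem statistics; a sketch under that assumption (the storage path
is a placeholder)::

  import os

  def file_storage_space(path):
      """Return free/used/total space of the filesystem behind path, in KBytes."""
      st = os.statvfs(path)
      total = st.f_blocks * st.f_frsize // 1024
      free = st.f_bavail * st.f_frsize // 1024
      return {"free": free, "used": total - free, "total": total}

  data = file_storage_space("/srv/ganeti/file-storage")  # placeholder path
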
In case of error, the ``message`` subfield of the ``status`` field of the
report of the storage status collector will disclose the nature of the error
as type-specific information. Examples of these are "backend pv unavailable"
for lvm storage, "unreachable" for network based storage or "filesystem error"
for filesystem based implementations.

DRBD status
***********

This data collector will run only on nodes where DRBD is actually
present and it will gather information about DRBD devices.

Its ``kind`` in the report will be ``1`` (`Status reporting collectors`_).

Its ``category`` field in the report will contain the value ``storage``.

When executed in verbose mode, the ``data`` section of the report of this
collector will provide the following fields:

``versionInfo``
  Information about the DRBD version number, given by a combination of
  any (but at least one) of the following fields:

  ``version``
    The DRBD driver version.

  ``api``
    The API version number.

  ``proto``
    The protocol version.

  ``srcversion``
    The version of the source files.

  ``gitHash``
    Git hash of the source files.

  ``buildBy``
    Who built the binary, and, optionally, when.

``device``
  A list of structures, each describing a DRBD device (a minor) and containing
  the following fields:

  ``minor``
    The device minor number.

  ``connectionState``
    The state of the connection. If it is "Unconfigured", all the following
    fields are not present.

  ``localRole``
    The role of the local resource.

  ``remoteRole``
    The role of the remote resource.

  ``localState``
    The status of the local disk.

  ``remoteState``
    The status of the remote disk.

  ``replicationProtocol``
    The replication protocol being used.

  ``ioFlags``
    The input/output flags.

  ``perfIndicators``
    The performance indicators. This field will contain the following
    sub-fields:

    ``networkSend``
      KiB of data sent on the network.

    ``networkReceive``
      KiB of data received from the network.

    ``diskWrite``
      KiB of data written on local disk.

    ``diskRead``
      KiB of data read from the local disk.

    ``activityLog``
      Number of updates of the activity log.

    ``bitMap``
      Number of updates to the bitmap area of the metadata.

    ``localCount``
      Number of open requests to the local I/O subsystem.

    ``pending``
      Number of requests sent to the partner but not yet answered.

    ``unacknowledged``
      Number of requests received by the partner but still to be answered.

    ``applicationPending``
      Number of block input/output requests forwarded to DRBD but that have
      not yet been answered.

    ``epochs``
      (Optional) Number of epoch objects. Not provided by all DRBD versions.

    ``writeOrder``
      (Optional) Currently used write ordering method. Not provided by all DRBD
      versions.

    ``outOfSync``
      (Optional) KiB of storage currently out of sync. Not provided by all DRBD
      versions.

  ``syncStatus``
    (Optional) The status of the synchronization of the disk. This is present
    only if the disk is being synchronized, and includes the following fields:

    ``percentage``
      The percentage of synchronized data.

    ``progress``
      How far the synchronization is. Written as "x/y", where x and y are
      integer numbers expressed in the measurement unit stated in
      ``progressUnit``.

    ``progressUnit``
      The measurement unit for the progress indicator.

    ``timeToFinish``
      The expected time before finishing the synchronization.

    ``speed``
      The speed of the synchronization.

    ``want``
      The desired speed of the synchronization.

    ``speedUnit``
      The measurement unit of the ``speed`` and ``want`` values. Expressed
      as "size/time".

  ``instance``
    The name of the Ganeti instance this disk is associated with.

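Most of this information is traditionally exposed by the kernel through
``/proc/drbd``; purely as an illustration (this design does not prescribe the
parsing code), the ``versionInfo`` fields could be extracted roughly like
this::

  import re

  def parse_drbd_version(line):
      """Parse the first line of /proc/drbd, which typically looks like
      'version: 8.3.11 (api:88/proto:86-96)'."""
      m = re.match(r"version: (\S+) \(api:(\d+)/proto:([\d-]+)\)", line)
      if not m:
          return {}
      return {"version": m.group(1),
              "api": int(m.group(2)),
              "proto": m.group(3)}

  with open("/proc/drbd") as f:
      version_info = parse_drbd_version(f.readline())
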
Ganeti daemons status
+++++++++++++++++++++

Ganeti will report what information it has about its own daemons.
This should allow identifying possible problems with the Ganeti system itself:
for example memory leaks, crashes and high resource utilization should be
evident by analyzing this information.

The ``kind`` field will be ``1`` (`Status reporting collectors`_).

Each daemon will have its own data collector, and each of them will have
a ``category`` field valued ``daemon``.

When executed in verbose mode, their ``data`` section will include at least:

``memory``
  The amount of used memory.

``size_unit``
  The measurement unit used for the memory.

``uptime``
  The uptime of the daemon.

``CPU usage``
  How much CPU the daemon is using (percentage).

Any other daemon-specific information can be included as well in the ``data``
section.

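As a sketch of where such numbers could come from on Linux (illustrative
only; the design does not prescribe a mechanism), the used memory of a
daemon can be read from procfs::

  def daemon_memory(pid):
      """Return the resident set size (VmRSS) of a process, in kB."""
      with open("/proc/%d/status" % pid) as f:
          for line in f:
              if line.startswith("VmRSS:"):
                  return int(line.split()[1])  # /proc reports this in kB
      return None

  data = {"memory": daemon_memory(1234), "size_unit": "kB"}  # example pid
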
Hypervisor resources report
+++++++++++++++++++++++++++
......
specific" way. Each hypervisor can then add extra specific information
that is not generic enough to be abstracted.

The ``kind`` field will be ``0`` (`Performance reporting collectors`_).

Each of the hypervisor data collectors will have ``category``: ``hypervisor``.

Node OS resources report
++++++++++++++++++++++++

Since Ganeti assumes it's running on Linux, it's useful to export some
basic information as seen by the host system.

The ``category`` field of the report will be ``null``.

The ``kind`` field will be ``0`` (`Performance reporting collectors`_).

The ``data`` section will include:

``cpu_number``
  The number of available cpus.

``cpus``
  A list with one element per cpu, showing its average load.

``memory``
  The current view of memory (free, used, cached, etc.)

``filesystem``
  A list with one element per filesystem, showing a summary of the
  total/available space.

``NICs``
  A list with one element per network interface, showing the amount of
  sent/received data, error rate, IP address of the interface, etc.

``versions``
  A map using the name of a component Ganeti interacts with (Linux, drbd,
  hypervisor, etc) as the key and its version number as the value.

Note that we won't go into any hardware specific details (e.g. querying a
node RAID is outside the scope of this, and can be implemented as a
plugin) but we can easily just report the information above, since it's
standard enough across all systems.

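As an illustration of how part of this ``data`` section could be assembled
from standard Linux interfaces (the exact sources are our assumption, not a
prescription of this design)::

  import os

  def node_os_data():
      """Build a partial Node OS resources data section."""
      data = {"cpu_number": os.cpu_count()}
      # Memory view from /proc/meminfo (values are reported in kB).
      meminfo = {}
      with open("/proc/meminfo") as f:
          for line in f:
              key, value = line.split(":", 1)
              meminfo[key] = int(value.split()[0])
      data["memory"] = {"total": meminfo["MemTotal"],
                        "free": meminfo["MemFree"],
                        "cached": meminfo["Cached"]}
      return data
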
Instance disk status propagation
--------------------------------

As for the instance status, Ganeti has now only partial information about
its instance disks: in particular, each node is unaware of the disk to
instance mapping, which exists only on the master.

For this design doc we plan to fix this by changing all RPCs that create
a backend storage or that put an already existing one in use and passing
the relevant instance to the node. The node can then export these to the
status reporting tool.

While we haven't implemented these RPC changes yet, we'll use Confd to
fetch this information in the data collectors.

version
243
  the version of the data collector that produces this part of the
244
  report. Built-in data collectors (as opposed to those implemented as
245
  plugins) should have "B" as the version number.
587
Plugin system
588
-------------
246 589

  
247
format_version
248
  the format of what is represented in the "data" field for each data
249
  collector might change over time. Every time this happens, the
250
  format_version should be changed, so that who reads the report knows
251
  what format to expect, and how to correctly interpret it.
590
The monitoring system will be equipped with a plugin system that can
591
export specific local information through it.
252 592

  
253
timestamp
254
  the time when the reported data were gathered. Is has to be expressed
255
  in nanoseconds since the unix epoch (0:00:00 January 01, 1970). If not
256
  enough precision is available (or needed) it can be padded with
257
  zeroes. If a report object needs multiple timestamps, it can add more
258
  and/or override this one inside its own "data" section.
593
The plugin system is expected to be used by local installations to
594
export any installation specific information that they want to be
595
monitored, about either hardware or software on their systems.
259 596

  
260
data
261
  this field contains all the data generated by the data collector, in
262
  its own independently defined format. The monitoring agent could check
263
  this syntactically (according to the JSON specifications) but not
264
  semantically.
597
The plugin system will be in the form of either scripts or binaries whose output
598
will be inserted in the report.
265 599

  
600
Eventually support for other kinds of plugins might be added as well, such as
601
plain text files which will be inserted into the report, or local unix or
602
network sockets from which the information has to be read.  This should allow
603
most flexibility for implementing an efficient system, while being able to keep
604
it as simple as possible.
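A plugin could then be as small as a script that prints a single report
object on stdout; a hypothetical example (the plugin name and its payload
are invented for illustration)::

  #!/usr/bin/python
  # Hypothetical plugin: emits one report object on stdout.
  import json
  import time

  report = {
      "name": "example-raid-plugin",       # invented name
      "version": "1.0",
      "formatVersion": 1,
      "timestamp": time.time_ns(),
      "category": None,                    # serialized as null
      "kind": 0,
      "data": {"raid_status": "optimal"},  # plugin-specific payload
  }
  print(json.dumps(report))
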
Data collectors
---------------

In order to ease testing as well as to make it simple to reuse this
subsystem it will be possible to run just the "data collectors" on each
node without passing through the agent daemon.

If a data collector is run independently, it should print on stdout its
report, according to the format corresponding to a single data collector
report object, as described in the previous paragraphs.

Mode of operation
-----------------
......
can use adaptively to query a certain resource faster or slower
depending on those two parameters.

When run as stand-alone binaries, the data collectors will not use any
caching system, and will just fetch and return the data immediately.

Implementation place
--------------------
......

We will implement the agent system in this order:

- initial example data collectors (eg. for drbd and instance status).
- initial daemon for exporting data, integrating the existing collectors
- plugin system
- RPC updates for instance status reasons and disk to instance mapping
- cache layer for the daemon
- more data collectors


Future work