From 6d2e1c124fc5246a3b365901198bdc3dd0057034 Mon Sep 17 00:00:00 2001
From: Michele Tartara
Date: Mon, 11 Mar 2013 15:43:57 +0100
Subject: [PATCH] Add design document for the "reason trail"

This commit adds the design document for introducing "reason trails",
tracing the reason why opcodes are executed, step by step.

Signed-off-by: Michele Tartara
Reviewed-by: Guido Trotter
---
 Makefile.am                 |  1 +
 doc/design-draft.rst        |  1 +
 doc/design-reason-trail.rst | 98 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 100 insertions(+)
 create mode 100644 doc/design-reason-trail.rst

diff --git a/Makefile.am b/Makefile.am
index e3d6742..47cca9f 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -398,6 +398,7 @@ docinput = \
 	doc/design-partitioned.rst \
 	doc/design-query-splitting.rst \
 	doc/design-query2.rst \
+	doc/design-reason-trail.rst \
 	doc/design-resource-model.rst \
 	doc/design-restricted-commands.rst \
 	doc/design-shared-storage.rst \
diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index 9dd2dfc..be04bd8 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -17,6 +17,7 @@ Design document drafts
    design-monitoring-agent.rst
    design-hroller.rst
    design-storagespace.rst
+   design-reason-trail.rst
 
 .. vim: set textwidth=72 :
 .. Local Variables:
diff --git a/doc/design-reason-trail.rst b/doc/design-reason-trail.rst
new file mode 100644
index 0000000..860fa88
--- /dev/null
+++ b/doc/design-reason-trail.rst
@@ -0,0 +1,98 @@
+===================
+Ganeti reason trail
+===================
+
+.. contents:: :depth: 2
+
+This is a design document detailing the implementation of a way for Ganeti to
+track the origin of and the reason for every executed command, from its starting
+point (command line, remote API, some htool, etc.) to its actual execution
+time.
+
+Current state and shortcomings
+==============================
+
+There is currently no way to track why a job and all the operations that are part
+of it were executed, and who or what triggered the execution.
+This is an inconvenience in general, and it also makes it impossible to obtain
+certain information, such as the reason why an instance last changed its
+status (e.g. why it was started, stopped or rebooted), or to distinguish
+an admin request from scheduled maintenance or an automated tool's work.
+
+Proposed changes
+================
+
+We propose to introduce a new piece of information, called the "reason
+trail", to track the path from the issuing of a command to its execution.
+
+The reason trail will be a list of 3-tuples ``(source, reason, timestamp)``,
+with:
+
+``source``
+  The entity deciding to perform (or forward) a command.
+  It is represented by an arbitrary string, but strings prefixed with "gnt:"
+  are reserved for Ganeti components, and they will be refused by the
+  interfaces towards the external world.
+
+``reason``
+  The reason why the entity decided to perform the operation.
+  It is represented by an arbitrary string. The string may be empty,
+  because certain components of the system might just "pass on" the operation
+  (and still want to be recorded in the trail) without having an explicit
+  reason.
+
+``timestamp``
+  The time when the element was added to the reason trail. It has to be
+  expressed in nanoseconds since the Unix epoch (00:00:00 UTC, January 1, 1970).
+  If not enough precision is available (or needed) it can be padded with
+  zeroes.
+
+The reason trail will be attached at the OpCode level. When it has to be
+serialized externally (such as on the RAPI interface), it will be serialized in
+JSON format. Specifically, it will be serialized as a list of elements.
+Each element will be a list with two strings (for ``source`` and ``reason``)
+and one integer number (the ``timestamp``).
+
+Any component the operation goes through is allowed (but not required) to append
+its own reason to the list. Other than this, the list should not be modified.
+
+As an example, here is the reason trail for a shutdown operation invoked from
+the command line through the gnt-instance tool::
+
+  [("user", "Cleanup of unused instances", 1363088484000000000),
+   ("gnt:client:gnt-instance", "stop", 1363088484020000000),
+   ("gnt:opcode:shutdown", "job=1234;index=0", 1363088484026000000),
+   ("gnt:daemon:noded:shutdown", "", 1363088484135000000)]
+
+where the first 3-tuple is determined by a user-specified message, passed to
+gnt-instance through a command-line parameter.
+
+The same operation, launched by an external GUI tool, and executed through the
+remote API, would have a reason trail like::
+
+  [("user", "Cleanup of unused instances", 1363088484000000000),
+   ("other-app:tool-name", "gui:stop", 1363088484000300000),
+   ("gnt:client:rapi:shutdown", "", 1363088484020000000),
+   ("gnt:library:rlib2:shutdown", "", 1363088484023000000),
+   ("gnt:opcode:shutdown", "job=1234;index=0", 1363088484026000000),
+   ("gnt:daemon:noded:shutdown", "", 1363088484135000000)]
+
+Implementation
+==============
+
+The OpCode base class will be modified to include a new field, OP_REASON.
+This will receive the reason trail as built by all the previous steps.
+
+When an OpCode is added to a job (in jqueue.py) the job number and the opcode
+index will be recorded as the reason for the existence of that opcode.
+
+The implementation of this design will start from the operations that affect the
+instance status. They will be changed so that the "reason" is passed to them.
+They will then export the new expected instance status, together
+with the associated reason for the monitoring daemon.
+
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End:
-- 
1.7.10.4
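
Illustration only, not part of the patch: the reason trail format described
in the design document above (a list of ``(source, reason, timestamp)``
entries, serialized to JSON as a list of three-element lists) could be
sketched roughly as follows. The names ``append_reason``, ``serialize_trail``,
``seconds_to_ns`` and the ``external`` flag are hypothetical and do not come
from the Ganeti code base; the sketch only assumes the rules stated in the
design (append-only trail, "gnt:" reserved for Ganeti components, nanosecond
timestamps padded with zeroes when less precision is available).

```python
import json
import time

# Prefix reserved for Ganeti components, as stated in the design document.
RESERVED_PREFIX = "gnt:"


def seconds_to_ns(seconds):
    """Pad a whole-second timestamp with zeroes to get nanoseconds."""
    return int(seconds) * 10**9


def append_reason(trail, source, reason="", timestamp=None, external=False):
    """Return a new trail with one (source, reason, timestamp) entry appended.

    Hypothetical helper: 'external' marks entries arriving through an
    interface towards the external world, where "gnt:" sources are refused.
    Existing entries are never modified; a new list is returned.
    """
    if external and source.startswith(RESERVED_PREFIX):
        raise ValueError("source prefix %r is reserved" % RESERVED_PREFIX)
    if timestamp is None:
        timestamp = time.time_ns()  # nanoseconds since the Unix epoch
    return trail + [(source, reason, timestamp)]


def serialize_trail(trail):
    """Serialize the trail as a JSON list of [source, reason, timestamp]."""
    return json.dumps([[src, rsn, ts] for (src, rsn, ts) in trail])


if __name__ == "__main__":
    # Rebuild the first two entries of the command-line example.
    trail = append_reason([], "user", "Cleanup of unused instances",
                          seconds_to_ns(1363088484), external=True)
    trail = append_reason(trail, "gnt:client:gnt-instance", "stop",
                          1363088484020000000)
    print(serialize_trail(trail))
```

Returning a new list instead of mutating the argument is one way to honour
the rule that components may append to the trail but should not otherwise
modify it; the actual implementation may well choose a different approach.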