================
Node State Cache
================

.. contents:: :depth: 4

This design doc describes how to optimize the retrieval of machine
information.


Current State
=============

Currently every RPC call is quite expensive, as it requires a TCP
handshake as well as SSL negotiation. This is especially visible when
getting node and instance info over and over again.

This data, however, is quite easy to cache. Doing so needs some changes
to how we retrieve the data, as the retrieval is currently spread over
several RPC calls that are hard to unify.


Proposed changes
================

To overcome this situation with multiple information retrieval calls,
we introduce a single RPC call that gets all the info in an organized
manner, for easy storage in the cache.

As of now we have 3 different information RPC calls:

- ``call_node_info``: To retrieve disk and hypervisor information
- ``call_instance_info``: To retrieve hypervisor information for one
  instance
- ``call_all_instance_info``: To retrieve hypervisor information for
  all instances

In addition, ``call_all_instance_info`` and ``call_instance_info``
return different information in their result dicts.

To unify and organize the data we introduce a new RPC call,
``call_node_snapshot``, which does all of the above in one go. The data
we want to retrieve is specified by a dict of request types:
``CACHE_REQ_HV``, ``CACHE_REQ_DISKINFO`` and ``CACHE_REQ_BOOTID``.
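
A minimal sketch of how such a request could be built is shown below.
It assumes that the dict maps each request type to its parameters; the
constant values, the parameter layout and the helper
``build_snapshot_request`` are illustrative assumptions and not part of
this design::

  # Stand-ins for the request type constants; the real values would be
  # defined in ganeti.constants and are assumed here.
  CACHE_REQ_HV = "hv"
  CACHE_REQ_DISKINFO = "diskinfo"
  CACHE_REQ_BOOTID = "bootid"


  def build_snapshot_request(hypervisors=(), vg_names=(), boot_id=False):
    """Build a request dict for the proposed call_node_snapshot.

    Only the request types that are actually wanted end up as keys.

    """
    request = {}
    if hypervisors:
      request[CACHE_REQ_HV] = list(hypervisors)
    if vg_names:
      request[CACHE_REQ_DISKINFO] = list(vg_names)
    if boot_id:
      # The boot ID is assumed to need no parameters
      request[CACHE_REQ_BOOTID] = None
    return request


  # Ask for the Xen PVM hypervisor data and the boot ID only
  request = build_snapshot_request(hypervisors=["xen-pvm"], boot_id=True)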

As this cache represents the state of a given node, we use the name of
the node as the key to retrieve the data from the cache. A namespace
separation of node and instance data is not possible at this point,
because some of the node hypervisor information, such as free memory,
correlates with the instances running on the node.

An example of what the data for a node looks like in the cache::

  {
    constants.CACHE_REQ_HV: {
      constants.HT_XEN_PVM: {
        _NODE_DATA: {
          "memory_total": 32763,
          "memory_free": 9159,
          "memory_dom0": 1024,
          "cpu_total": 4,
          "cpu_sockets": 2
        },
        _INSTANCES_DATA: {
          "inst1": {
            "memory": 4096,
            "state": "-b----",
            "time": 102399.3,
            "vcpus": 1
          },
          "inst2": {
            "memory": 4096,
            "state": "-b----",
            "time": 12280.0,
            "vcpus": 3
          }
        }
      }
    },
    constants.CACHE_REQ_DISKINFO: {
      "xenvg": {
        "vg_size": 1048576,
        "vg_free": 491520
      },
    },
    constants.CACHE_REQ_BOOTID: "0dd0983c-913d-4ce6-ad94-0eceb77b69f9"
  }

This way we get well-organized information that can simply be stored in
the cache.

The 3 RPC calls mentioned above will remain for compatibility reasons
but will be simple wrappers around this RPC call.
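
The following sketch illustrates how the hypervisor part of the three
existing calls could be derived from one snapshot. The function names
and the values of the key constants are hypothetical, and the real RPC
calls take different arguments; disk information is left out for
brevity::

  # Assumed values for the keys used in the example above
  CACHE_REQ_HV = "hv"
  _NODE_DATA = "node"
  _INSTANCES_DATA = "instances"


  def node_info_from_snapshot(snapshot, hypervisor):
    """Return the node-level hypervisor data of a snapshot."""
    return snapshot[CACHE_REQ_HV][hypervisor][_NODE_DATA]


  def all_instance_info_from_snapshot(snapshot, hypervisor):
    """Return the data of all instances, keyed by instance name."""
    return snapshot[CACHE_REQ_HV][hypervisor][_INSTANCES_DATA]


  def instance_info_from_snapshot(snapshot, hypervisor, instance):
    """Return the data of one instance, or None if it is not listed."""
    instances = all_instance_info_from_snapshot(snapshot, hypervisor)
    return instances.get(instance)


  # A shortened version of the example data above
  snapshot = {
    CACHE_REQ_HV: {
      "xen-pvm": {
        _NODE_DATA: {"memory_total": 32763, "memory_free": 9159},
        _INSTANCES_DATA: {"inst1": {"memory": 4096, "vcpus": 1}},
      },
    },
  }

  inst1 = instance_info_from_snapshot(snapshot, "xen-pvm", "inst1")

Since all three helpers read from the same structure, one-instance and
all-instances queries return the same per-instance fields.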


Cache invalidation
------------------

The cache is invalidated at every RPC call that is not proven to leave
the state of a given node unmodified. This avoids inconsistencies
between the cache and the actual node state.

There are some corner cases that invalidate the whole cache at once, as
they usually affect the state of other nodes too:

- migrate/failover
- import/export
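
A sketch of this invalidation policy, assuming the cache is a plain dict
keyed by node name and that ordinary state-changing calls only drop the
entry of their target node. The call names are placeholders and not
actual Ganeti RPC names::

  # Hypothetical classification of RPC calls
  READ_ONLY_CALLS = frozenset(["node_snapshot", "version"])
  WHOLE_CACHE_CALLS = frozenset(["migrate", "failover",
                                 "import", "export"])


  def invalidate_for_call(cache, call_name, node_name):
    """Drop cache entries that a given RPC call may have made stale."""
    if call_name in READ_ONLY_CALLS:
      # Proven not to modify node state: keep everything
      return
    if call_name in WHOLE_CACHE_CALLS:
      # May affect the state of other nodes too: drop the whole cache
      cache.clear()
    else:
      # Anything else is assumed to modify the target node only
      cache.pop(node_name, None)

Treating unknown calls as state-changing keeps the policy on the safe
side: a call has to be listed explicitly before it may leave the cache
untouched.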

A request will be served from the cache if and only if it can be
fulfilled entirely from it (i.e. all the requested ``CACHE_REQ_*``
entries are already present). Otherwise, we will invalidate the cache
and actually do the remote call.
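
The all-or-nothing rule can be sketched as follows. The function name
``lookup`` is hypothetical, the cache is assumed to be a dict keyed by
node name, and the sketch assumes that only the entry of the affected
node is invalidated on a miss::

  def lookup(cache, node_name, requested):
    """Serve a request from the cache only if it is fully covered.

    Returns the requested entries on a hit. On a miss the entry of the
    node is invalidated and None is returned, so that the caller does
    the remote call.

    """
    entry = cache.get(node_name)
    if entry is not None and all(req in entry for req in requested):
      return dict((req, entry[req]) for req in requested)
    cache.pop(node_name, None)
    return None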

In addition, every cache entry will have a TTL of about 10 minutes,
which should be enough to accommodate most use cases.
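
One way to implement the TTL is to store the time of creation with each
entry and to compare it against the current time on lookup. The class
``CacheEntry`` and the constant ``CACHE_TTL`` are illustrative names::

  import time

  CACHE_TTL = 10 * 60  # seconds, i.e. the 10 minutes mentioned above


  class CacheEntry(object):
    """A cached node snapshot together with its time of creation."""

    def __init__(self, data, now=None):
      self.data = data
      self.created = time.time() if now is None else now

    def IsExpired(self, now=None):
      """Tell whether the entry is older than CACHE_TTL."""
      if now is None:
        now = time.time()
      return now - self.created > CACHE_TTL

The optional ``now`` argument exists only to make the expiry logic
testable without waiting for real time to pass.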

We also allow an option to the calls to bypass the cache completely and
force a remote call. However, this will invalidate the present entries
and populate the cache with the newly retrieved values.
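
A sketch of the bypass option, assuming a dict-based cache keyed by node
name. The names ``get_node_data`` and ``fetch_fn`` are hypothetical;
``fetch_fn`` stands in for the actual remote call::

  def get_node_data(cache, node_name, fetch_fn, force=False):
    """Return the data for a node, optionally bypassing the cache.

    fetch_fn stands in for the remote call and takes the node name.

    """
    if not force and node_name in cache:
      return cache[node_name]
    # Forced call or cache miss: drop the old entry, then store the
    # freshly retrieved data
    cache.pop(node_name, None)
    data = fetch_fn(node_name)
    cache[node_name] = data
    return data

A forced call thus still refreshes the cache, so later non-forced calls
benefit from the newly retrieved values.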


Additional cache population
---------------------------

Besides the commands that issue the above RPC calls, a full cache
population can also be done by a separate new op-code run periodically
by ``ganeti-watcher``. This op-code will be used instead of the old
ones.


Possible regressions
====================

As we change from "get one piece of hypervisor information" to "get all
we know about this hypervisor", we have a regression in execution time:
process execution time increases by a factor of about 1.8. However,
this does not include the latency and negotiation time needed for each
separate RPC call. Also, if we hit the cache, all 3 costs will be 0; the
only time taken is the lookup of the info in the cache and the
deserialization of the data. This brings the time down from today's
~300ms to ~100ms.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: