================
Node State Cache
================

.. contents:: :depth: 4

This is a design doc about the optimization of machine info retrieval.


Current State
=============

Currently every RPC call is quite expensive, as a TCP handshake has to
be made as well as an SSL negotiation. This is especially visible when
getting node and instance info over and over again.

This data, however, is quite easy to cache, but caching needs some
changes to how we retrieve data over RPC, as the data is spread over
several RPC calls which are hard to unify.


Proposed changes
================

To overcome this situation with multiple information retrieval calls,
we introduce one single RPC call to get all the info in an organized
manner, for easy storage in the cache.

As of now we have 3 different information RPC calls:

- ``call_node_info``: To retrieve disk and hypervisor information
- ``call_instance_info``: To retrieve hypervisor information for one
  instance
- ``call_all_instance_info``: To retrieve hypervisor information for
  all instances

Moreover, ``call_all_instance_info`` and ``call_instance_info`` return
different information in the dict they return.

To unify and organize the data we introduce a new RPC call,
``call_node_snapshot``, doing all of the above in one go. Which data
we want to retrieve is specified via a dict of request types:
``CACHE_REQ_HV``, ``CACHE_REQ_DISKINFO``, ``CACHE_REQ_BOOTID``.

As this cache represents the state of a given node, we use the name of
the node as the key to retrieve the data from the cache. A namespace
separation of node and instance data is not possible at this point,
because some of the node hypervisor information, such as free memory,
correlates with the instances running on the node.

An example of what the data for a node in the cache looks like::

  {
    constants.CACHE_REQ_HV: {
      constants.HT_XEN_PVM: {
        _NODE_DATA: {
          "memory_total": 32763,
          "memory_free": 9159,
          "memory_dom0": 1024,
          "cpu_total": 4,
          "cpu_sockets": 2
        },
        _INSTANCES_DATA: {
          "inst1": {
            "memory": 4096,
            "state": "-b----",
            "time": 102399.3,
            "vcpus": 1
          },
          "inst2": {
            "memory": 4096,
            "state": "-b----",
            "time": 12280.0,
            "vcpus": 3
          }
        }
      }
    },
    constants.CACHE_REQ_DISKINFO: {
      "xenvg": {
        "vg_size": 1048576,
        "vg_free": 491520
      },
    },
    constants.CACHE_REQ_BOOTID: "0dd0983c-913d-4ce6-ad94-0eceb77b69f9"
  }

This way we get easy-to-organize information which can simply be
arranged in the cache.

The 3 RPC calls mentioned above will remain for compatibility reasons
but will be simple wrappers around this RPC call.
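
As a rough illustration of how such wrappers could pick their results
out of a node snapshot, here is a minimal sketch. It is not the actual
implementation: the constant values, the function names and the shape
of the ``node_info`` result are assumptions made for this example, and
the snapshot is a shortened copy of the example data above.

```python
# Hypothetical stand-ins for the constants used in the example data;
# their real values are not given in this document.
CACHE_REQ_HV = "hv"
CACHE_REQ_DISKINFO = "diskinfo"
CACHE_REQ_BOOTID = "bootid"
HT_XEN_PVM = "xen-pvm"
_NODE_DATA = "node"
_INSTANCES_DATA = "instances"

# Shortened copy of the example snapshot for a single node.
SNAPSHOT = {
    CACHE_REQ_HV: {
        HT_XEN_PVM: {
            _NODE_DATA: {"memory_total": 32763, "memory_free": 9159},
            _INSTANCES_DATA: {
                "inst1": {"memory": 4096, "vcpus": 1},
                "inst2": {"memory": 4096, "vcpus": 3},
            },
        },
    },
    CACHE_REQ_DISKINFO: {"xenvg": {"vg_size": 1048576, "vg_free": 491520}},
    CACHE_REQ_BOOTID: "0dd0983c-913d-4ce6-ad94-0eceb77b69f9",
}


def node_info(snapshot, hv_name):
    """Disk and hypervisor data of the node (result shape is assumed)."""
    return {
        "hv": snapshot[CACHE_REQ_HV][hv_name][_NODE_DATA],
        "disk": snapshot[CACHE_REQ_DISKINFO],
    }


def all_instance_info(snapshot, hv_name):
    """Hypervisor data of all instances, keyed by instance name."""
    return snapshot[CACHE_REQ_HV][hv_name][_INSTANCES_DATA]


def instance_info(snapshot, hv_name, instance):
    """Hypervisor data of one instance, or None if it is not listed."""
    return all_instance_info(snapshot, hv_name).get(instance)
```

With this layout both instance wrappers read from the same
sub-dictionary, so under these assumptions they would return the same
fields for a given instance.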


Cache invalidation
------------------

The cache is invalidated on every RPC call which is not proven to
leave the state of the given node unmodified. This is to avoid
inconsistency between the cache and the actual node state.

There are some corner cases which invalidate the whole cache at once,
as they usually affect the state of other nodes too:

- migrate/failover
- import/export

A request will be served from the cache if and only if it can be
fulfilled entirely from it (i.e. all the CACHE_REQ_* entries are already
present). Otherwise, we will invalidate the cache and actually do the
remote call.

In addition, every cache entry will have a TTL of about 10 minutes,
which should be enough to accommodate most use cases.

The calls also accept an option to bypass the cache completely and
force a remote call. However, this will invalidate the present entries
and populate the cache with the newly retrieved values.
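
The rules in this section can be sketched roughly as follows. This is
only an illustration under assumptions, not the proposed
implementation: the class name, the ``fetch`` callable standing in for
the remote ``call_node_snapshot`` call, the ``force`` flag and the
injectable ``clock`` are all hypothetical, and ``fetch`` is assumed to
return one entry per requested type.

```python
import time

CACHE_TTL = 10 * 60  # seconds; "about 10 minutes" as stated above


class NodeStateCache:
    """Per-node cache with TTL and all-or-nothing lookups (sketch)."""

    def __init__(self, fetch, ttl=CACHE_TTL, clock=time.monotonic):
        self._fetch = fetch    # hypothetical: fetch(node, requests) -> dict
        self._ttl = ttl
        self._clock = clock
        self._entries = {}     # node name -> (expiry time, data dict)

    def invalidate(self, node):
        """To be called on any RPC that may modify this node's state."""
        self._entries.pop(node, None)

    def invalidate_all(self):
        """For operations touching several nodes, e.g. migrate/failover."""
        self._entries.clear()

    def get(self, node, requests, force=False):
        entry = self._entries.get(node)
        if not force and entry is not None:
            expiry, data = entry
            # Serve from the cache only if the entry is still fresh and
            # holds every requested type.
            if self._clock() < expiry and all(r in data for r in requests):
                return {r: data[r] for r in requests}
        # Miss, expired, incomplete or forced: drop the old entry, do
        # the remote call and store the newly retrieved values.
        self.invalidate(node)
        data = self._fetch(node, requests)
        self._entries[node] = (self._clock() + self._ttl, data)
        return {r: data[r] for r in requests}
```

Passing the clock as a parameter is a choice made for this sketch so
that TTL expiry can be exercised without waiting; callers would
normally rely on the default.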


Additional cache population
---------------------------

Besides the commands which invoke the RPC calls above, a full cache
population can also be done by a separate new op-code run periodically
by ``ganeti-watcher``. This op-code will be used instead of the old
ones.


Possible regressions
====================

As we change from "get information about one hypervisor" to "get all
we know about this hypervisor", we have a regression in execution
time: the process execution time increases by about 1.8x. However,
this does not include the latency and negotiation time needed for each
separate RPC call. Also, if we hit the cache, all 3 costs will be 0.
The only time taken is then the lookup of the info in the cache and
the deserialization of the data, which brings the time down from
today's ~300ms to ~100ms.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: