Revision cd93a0ec

b/Makefile.am
	doc/design-linuxha.rst \
	doc/design-multi-reloc.rst \
	doc/design-network.rst \
	doc/design-node-state-cache.rst \
	doc/design-oob.rst \
	doc/design-ovf-support.rst \
	doc/design-partitioned.rst \
b/doc/design-draft.rst
   design-http-server.rst
   design-impexp2.rst
   design-network.rst
   design-node-state-cache.rst
   design-resource-model.rst
   design-virtual-clusters.rst
   design-query-splitting.rst
/dev/null
================
Node State Cache
================

.. contents:: :depth: 4

This is a design document about optimizing the retrieval of machine
information.


Current State
=============

Currently every RPC call is quite expensive, as a TCP handshake as well
as an SSL negotiation has to be performed for each one. This is
especially visible when node and instance information is fetched over
and over again.

This data, however, is quite easy to cache, but caching it needs some
changes to how we retrieve data over RPC, as the retrieval is currently
spread over several RPC calls that are hard to unify.


Proposed changes
================

To avoid this situation with multiple information-retrieval calls we
introduce one single RPC call that returns all the information in an
organized manner, so it can easily be stored in the cache.

As of now we have 3 different information RPC calls:

- ``call_node_info``: To retrieve disk and hypervisor information
- ``call_instance_info``: To retrieve hypervisor information for one
  instance
- ``call_all_instance_info``: To retrieve hypervisor information for
  all instances

Note also that ``call_all_instance_info`` and ``call_instance_info``
return differently structured information in their result dicts.

To unify and organize the data we introduce a new RPC call,
``call_node_snapshot``, which does all of the above in one go. Which
data we want is specified via a dict of request types: CACHE_REQ_HV,
CACHE_REQ_DISKINFO, CACHE_REQ_BOOTID.
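
One possible shape of such a request, shown only as a sketch (the exact
request format and the call signature are not fixed by this document)::

  request = {
    constants.CACHE_REQ_HV: [constants.HT_XEN_PVM],
    constants.CACHE_REQ_DISKINFO: ["xenvg"],
    constants.CACHE_REQ_BOOTID: True,
  }
  result = rpc.call_node_snapshot([node_name], request)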

As this cache represents the state of a given node, we use the name of
a node as the key to retrieve the data from the cache. A namespace
separation of node and instance data is not possible at the current
point, because some of the node's hypervisor information, such as free
memory, correlates with the instances running on it.

An example of how the data for a node looks in the cache::

  {
    constants.CACHE_REQ_HV: {
      constants.HT_XEN_PVM: {
        _NODE_DATA: {
          "memory_total": 32763,
          "memory_free": 9159,
          "memory_dom0": 1024,
          "cpu_total": 4,
          "cpu_sockets": 2
        },
        _INSTANCES_DATA: {
          "inst1": {
            "memory": 4096,
            "state": "-b----",
            "time": 102399.3,
            "vcpus": 1
          },
          "inst2": {
            "memory": 4096,
            "state": "-b----",
            "time": 12280.0,
            "vcpus": 3
          }
        }
      }
    },
    constants.CACHE_REQ_DISKINFO: {
      "xenvg": {
        "vg_size": 1048576,
        "vg_free": 491520
      },
    },
    constants.CACHE_REQ_BOOTID: "0dd0983c-913d-4ce6-ad94-0eceb77b69f9"
  }

This way we get well-organized information which can simply be arranged
in the cache.

The 3 RPC calls mentioned above will remain for compatibility reasons,
but they will become simple wrappers around this new RPC call.
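
As a rough sketch of the wrapper idea (the simplified signature and the
``_ReduceToNodeInfo`` helper are assumptions of this example, not part
of the design)::

  def call_node_info(self, node_names, vg_name, hypervisor_type):
    request = {
      constants.CACHE_REQ_HV: [hypervisor_type],
      constants.CACHE_REQ_DISKINFO: [vg_name],
    }
    snapshot = self.call_node_snapshot(node_names, request)
    # Reduce the snapshot to the dict layout the old call returned.
    return _ReduceToNodeInfo(snapshot, vg_name, hypervisor_type)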


Cache invalidation
------------------

The cache is invalidated by every RPC call that is not known to leave
the state of the given node unmodified. This avoids inconsistencies
between the cache and the actual node state.

There are some corner cases which invalidate the whole cache at once,
as they usually affect other nodes' states too:

 - migrate/failover
 - import/export

A request will be served from the cache if and only if it can be
fulfilled entirely from it (i.e. all the CACHE_REQ_* entries are already
present). Otherwise, we will invalidate the cache and actually do the
remote call.
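
A minimal sketch of this rule, using the cache interface introduced
below (the helper name is an assumption of this example)::

  def _ServeFromCache(cache_obj, node_name, requested_keys):
    entry = cache_obj.Get(node_name)
    if entry is not None and set(requested_keys).issubset(entry):
      # All requested CACHE_REQ_* entries are present: serve from cache
      return entry
    # Missing or incomplete: invalidate and fall back to the remote call
    cache_obj.Invalidate([node_name])
    return None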

In addition, every cache entry will have a TTL of about 10 minutes,
which should be enough to accommodate most use cases.

We also allow an option to the calls to bypass the cache completely and
force a remote call. However, this will invalidate the existing entries
and repopulate the cache with the newly retrieved values.
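
How the TTL and the forced remote call could fit together, as a sketch
under the same assumptions (``bypass_cache`` and ``_RemoteSnapshot``
are illustrative names only)::

  def GetNodeSnapshot(cache_obj, node_name, requested_keys,
                      bypass_cache=False):
    if not bypass_cache:
      cached = _ServeFromCache(cache_obj, node_name, requested_keys)
      if cached is not None:
        return cached
    # Forced or missed: do the remote call and repopulate the cache
    data = _RemoteSnapshot(node_name, requested_keys)
    cache_obj.Invalidate([node_name])
    cache_obj.Store(node_name, data, ttl=600)  # TTL of about 10 minutes
    return data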


Additional cache population
---------------------------

Besides the commands which use the above RPC calls, a full cache
population can also be done by a separate new op-code run periodically
by ``ganeti-watcher``. This op-code will be used instead of the old
ones.


Possible regressions
====================

As we change from "get one piece of hypervisor information" to "get
everything we know about this hypervisor", there is a regression in
execution time: the process execution time is about 1.8x higher.
However, this does not include the latency and negotiation time needed
for each separate RPC call, and if we hit the cache all three of these
costs drop to zero. The only remaining cost is looking up the
information in the cache and deserializing the data, which brings the
time down from today's ~300ms to ~100ms.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
/dev/null
#
#

# Copyright (C) 2011 Google Inc.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
# 02110-1301, USA.


"""This module implements caching."""


import time

from ganeti import locking
from ganeti import serializer


TIMESTAMP = "timestamp"
TTL = "ttl"
VALUE = "value"


class CacheBase:
  """This is the base class for all caches.

  """
  def __init__(self):
    """Base init method.

    """

  def Store(self, key, value, ttl=0):
    """Stores key with value in the cache.

    @param key: The key to associate with this cached value
    @param value: The value to cache
    @param ttl: TTL in seconds after which this entry is considered outdated
    @returns: L{True} on success, L{False} on failure

    """
    raise NotImplementedError

  def GetMulti(self, keys):
    """Retrieve multiple values from the cache.

    @param keys: The keys to retrieve
    @returns: The list of values

    """
    raise NotImplementedError

  def Get(self, key):
    """Retrieve the value from the cache.

    @param key: The key to retrieve
    @returns: The value or L{None} if not found

    """
    raise NotImplementedError

  def Invalidate(self, keys):
    """Invalidate given keys.

    @param keys: The list of keys to invalidate
    @returns: L{True} on success, L{False} otherwise

    """
    raise NotImplementedError

  def Flush(self):
    """Invalidates all of the keys and flushes the cache.

    """
    raise NotImplementedError

  def ResetState(self):
    """Used to reset the state of the cache.

    This can be used to re-establish connections or perform any other
    kind of state refresh.

    """

  def Cleanup(self):
    """Cleanup the cache from expired entries.

    """


class SimpleCache(CacheBase):
  """Implements a very simple, dict-based cache.

  """
  CLEANUP_ROUND = 1800
  _LOCK = "lock"

  def __init__(self, _time_fn=time.time):
    """Initialize this class.

    @param _time_fn: Function used to return time (unittest only)

    """
    CacheBase.__init__(self)

    self._time_fn = _time_fn

    self.cache = {}
    self.lock = locking.SharedLock("SimpleCache")
    self.last_cleanup = self._time_fn()

  def _UnlockedCleanup(self):
    """Does cleanup of the cache.

    """
    check_time = self._time_fn()
    if (self.last_cleanup + self.CLEANUP_ROUND) <= check_time:
      keys = []
      for key, value in self.cache.items():
        if not value[TTL]:
          # Entries with a TTL of 0 never expire
          continue

        expired = value[TIMESTAMP] + value[TTL]
        if expired < check_time:
          keys.append(key)
      self._UnlockedInvalidate(keys)
      self.last_cleanup = check_time

  @locking.ssynchronized(_LOCK)
  def Cleanup(self):
    """Cleanup our cache.

    """
    self._UnlockedCleanup()

  @locking.ssynchronized(_LOCK)
  def Store(self, key, value, ttl=0):
    """Stores a value at key in the cache.

    See L{CacheBase.Store} for parameter description

    """
    assert ttl >= 0
    self._UnlockedCleanup()
    val = serializer.Dump(value)
    cache_val = {
      TIMESTAMP: self._time_fn(),
      TTL: ttl,
      VALUE: val
      }
    self.cache[key] = cache_val
    return True

  @locking.ssynchronized(_LOCK, shared=1)
  def GetMulti(self, keys):
    """Retrieve the values of keys from cache.

    See L{CacheBase.GetMulti} for parameter description

    """
    return [self._ExtractValue(key) for key in keys]

  @locking.ssynchronized(_LOCK, shared=1)
  def Get(self, key):
    """Retrieve the value of key from cache.

    See L{CacheBase.Get} for parameter description

    """
    return self._ExtractValue(key)

  @locking.ssynchronized(_LOCK)
  def Invalidate(self, keys):
    """Invalidates value for keys in cache.

    See L{CacheBase.Invalidate} for parameter description

    """
    return self._UnlockedInvalidate(keys)

  @locking.ssynchronized(_LOCK)
  def Flush(self):
    """Invalidates all keys and values in cache.

    See L{CacheBase.Flush} for parameter description

    """
    self.cache.clear()
    self.last_cleanup = self._time_fn()

  def _UnlockedInvalidate(self, keys):
    """Invalidate keys in cache.

    This is the unlocked version, see L{Invalidate} for parameter description

    """
    for key in keys:
      self.cache.pop(key, None)

    return True

  def _ExtractValue(self, key):
    """Extracts just the value for a key.

    This method checks whether the value has expired and returns it only
    if it has not.

    @param key: The key to look for
    @returns: The value if key is not expired, L{None} otherwise

    """
    try:
      cache_val = self.cache[key]
    except KeyError:
      return None
    else:
      if cache_val[TTL] == 0:
        # A TTL of 0 means the entry never expires
        return serializer.Load(cache_val[VALUE])
      else:
        expired = cache_val[TIMESTAMP] + cache_val[TTL]

        if self._time_fn() <= expired:
          return serializer.Load(cache_val[VALUE])
        else:
          return None
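

# Illustrative usage of SimpleCache (example only, not part of the
# module above; key and values are made up for demonstration):
#
#   cache_obj = SimpleCache()
#   cache_obj.Store("node1.example.com", {"memory_free": 9159}, ttl=600)
#   cache_obj.Get("node1.example.com")   # the stored dict, until the TTL expires
#   cache_obj.Invalidate(["node1.example.com"])
#   cache_obj.Store("forever", 42)       # ttl=0 means the entry never expires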
/dev/null
#!/usr/bin/python
#

# Copyright (C) 2011 Google Inc.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
# 02110-1301, USA.

"""Script for testing ganeti.cache"""

import testutils
import unittest

from ganeti import cache


class ReturnStub:
  def __init__(self, values):
    self.values = values

  def __call__(self):
    assert self.values
    return self.values.pop(0)


class SimpleCacheTest(unittest.TestCase):
  def setUp(self):
    self.cache = cache.SimpleCache()

  def testNoKey(self):
    self.assertEqual(self.cache.GetMulti(["i-dont-exist", "neither-do-i", "no"]),
                     [None, None, None])

  def testCache(self):
    value = 0xc0ffee
    self.assert_(self.cache.Store("i-exist", value))
    self.assertEqual(self.cache.GetMulti(["i-exist"]), [value])

  def testMixed(self):
    value = 0xb4dc0de
    self.assert_(self.cache.Store("i-exist", value))
    self.assertEqual(self.cache.GetMulti(["i-exist", "i-dont"]), [value, None])

  def testTtl(self):
    my_times = ReturnStub([0, 1, 1, 2, 3, 5])
    ttl_cache = cache.SimpleCache(_time_fn=my_times)
    self.assert_(ttl_cache.Store("test-expire", 0xdeadbeef, ttl=2))

    # At this point time will return 2, 1 (start) + 2 (ttl) = 3, still valid
    self.assertEqual(ttl_cache.Get("test-expire"), 0xdeadbeef)

    # At this point time will return 3, 1 (start) + 2 (ttl) = 3, still valid
    self.assertEqual(ttl_cache.Get("test-expire"), 0xdeadbeef)

    # We are at 5, which is past 3, so the entry has expired
    self.assertEqual(ttl_cache.Get("test-expire"), None)
    self.assertFalse(my_times.values)

  def testCleanup(self):
    my_times = ReturnStub([0, 1, 1, 2, 2, 3, 3, 5, 5,
                           21 + cache.SimpleCache.CLEANUP_ROUND,
                           34 + cache.SimpleCache.CLEANUP_ROUND,
                           55 + cache.SimpleCache.CLEANUP_ROUND * 2,
                           89 + cache.SimpleCache.CLEANUP_ROUND * 3])
    # Index 0
    ttl_cache = cache.SimpleCache(_time_fn=my_times)
    # Index 1, 2
    self.assert_(ttl_cache.Store("foobar", 0x1dea, ttl=6))
    # Index 3, 4
    self.assert_(ttl_cache.Store("baz", 0xc0dea55, ttl=11))
    # Index 5, 6
    self.assert_(ttl_cache.Store("long-foobar", "pretty long",
                                 ttl=(22 + cache.SimpleCache.CLEANUP_ROUND)))
    # Index 7, 8
    self.assert_(ttl_cache.Store("foobazbar", "alive forever"))

    self.assertEqual(set(ttl_cache.cache.keys()),
                     set(["foobar", "baz", "long-foobar", "foobazbar"]))
    ttl_cache.Cleanup()
    self.assertEqual(set(ttl_cache.cache.keys()),
                     set(["long-foobar", "foobazbar"]))
    ttl_cache.Cleanup()
    self.assertEqual(set(ttl_cache.cache.keys()),
                     set(["long-foobar", "foobazbar"]))
    ttl_cache.Cleanup()
    self.assertEqual(set(ttl_cache.cache.keys()), set(["foobazbar"]))
    ttl_cache.Cleanup()
    self.assertEqual(set(ttl_cache.cache.keys()), set(["foobazbar"]))


if __name__ == "__main__":
  testutils.GanetiTestProgram()
