Revision 1b9c867c doc/design-2.3.rst

b/doc/design-2.3.rst
between machines connected to the same switch might be bigger than the
bandwidth for inter-switch connections.

-Moreover some operations inside a cluster require all nodes to be locked
+Moreover, some operations inside a cluster require all nodes to be locked
together for inter-node consistency, and won't scale if we increase the
number of nodes to a few hundred.

......
~~~~~~~~~~~~~~~~

With this change we'll divide Ganeti nodes into groups. Nothing will
-change for clusters with only one node group, the default one. Bigger
-cluster instead will be able to have more than one group, and each node
-will belong to exactly one.
+change for clusters with only one node group. Bigger clusters will be
+able to have more than one group, and each node will belong to exactly
+one.

Node group management
+++++++++++++++++++++

To manage node groups and the nodes belonging to them, the following new
-commands/flags will be introduced::
+commands and flags will be introduced::

  gnt-node group-add <group> # add a new node group
  gnt-node group-del <group> # delete an empty group
......
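
To make the "each node belongs to exactly one group" rule above concrete, here
is a minimal, hypothetical Python sketch of such a partition; the class and
method names are illustrative only and are not part of the Ganeti code base::

  class NodeGroupMap:
      """Toy model: every node is in exactly one group; a default group exists."""

      DEFAULT_GROUP = "default"

      def __init__(self):
          self._groups = {self.DEFAULT_GROUP: set()}   # group name -> node names
          self._node_to_group = {}                     # node name -> group name

      def add_group(self, group):
          if group in self._groups:
              raise ValueError("group %r already exists" % group)
          self._groups[group] = set()

      def del_group(self, group):
          if group == self.DEFAULT_GROUP:
              raise ValueError("cannot delete the default group")
          if self._groups.get(group):
              raise ValueError("group %r is not empty" % group)
          self._groups.pop(group)

      def assign_node(self, node, group=None):
          group = group or self.DEFAULT_GROUP
          if group not in self._groups:
              raise KeyError("unknown group %r" % group)
          old = self._node_to_group.get(node)
          if old is not None:
              self._groups[old].discard(node)          # a node can't be in two groups
          self._groups[group].add(node)
          self._node_to_group[node] = group

  gm = NodeGroupMap()
  gm.add_group("rack1")
  gm.assign_node("node1.example.com", "rack1")
  gm.assign_node("node2.example.com")                  # lands in the default group
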
  - Moving an instance between groups can only happen via an explicit
    operation, which for example in the case of DRBD will work by
    performing internally a replace-disks, a migration, and a second
-    replace-disks. It will be possible to cleanup an interrupted
+    replace-disks. It will be possible to clean up an interrupted
    group-move operation.
  - Cluster verify will signal an error if an instance has been left
    mid-transition between groups.
-  - Intra-group instance migration/failover will check that the target
+  - Inter-group instance migration/failover will check that the target
    group will be able to accept the instance network/storage wise, and
    fail otherwise. In the future we may be able to allow some parameters
    to be changed during the move, but in the first version we expect an
......
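
The three-step DRBD group move described above (replace-disks towards the
target group, migration, second replace-disks) can be pictured with the
following hypothetical Python sketch; the step names and the callback are
illustrative stand-ins and do not correspond to real Ganeti opcodes::

  GROUP_MOVE_STEPS = (
      "replace-disks (add a secondary in the target group)",
      "migrate (move the primary to the target group)",
      "replace-disks (drop the secondary left in the source group)",
  )

  def run_group_move(instance, execute, already_done=()):
      """Run or resume a group move, returning the list of executed steps.

      'execute' performs one step; 'already_done' lists the steps completed
      by an earlier, interrupted attempt, so that cleaning up or resuming
      does not redo finished work.
      """
      done = list(already_done)
      for step in GROUP_MOVE_STEPS:
          if step in done:
              continue                      # finished before the interruption
          execute(instance, step)
          done.append(step)
      return done

  log = []
  run_group_move("instance1.example.com", lambda inst, step: log.append(step))
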
We expect the following changes for cluster management:

  - Frequent multinode operations, such as os-diagnose or cluster-verify
-    will act one group at a time. The default group will be used if none
+    will act on one group at a time. The default group will be used if none
    is passed. Command line tools will have a way to easily target all
    groups, by generating one job per group.
  - Groups will have a human-readable name, but will internally always
......
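
The "one job per group" expansion mentioned above could be done on the client
side roughly as in this hypothetical Python sketch; the job and opcode
dictionaries are made-up stand-ins, not Ganeti's real job format::

  def jobs_per_group(op_name, all_groups, target_groups=None):
      """Return one single-opcode job description per targeted group.

      If 'target_groups' is None, all groups are targeted, which is how a
      command line tool could easily cover the whole cluster.
      """
      if target_groups is None:
          selected = list(all_groups)
      else:
          wanted = set(target_groups)
          selected = [g for g in all_groups if g in wanted]
      return [{"ops": [{"op": op_name, "group_name": g}]} for g in selected]

  for job in jobs_per_group("group-verify", ["default", "rack1", "rack2"]):
      print(job)
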
should we see it's useful.

We envision groups as a good place to enhance cluster scalability. In
-the future we may want to use them ad units for configuration diffusion,
+the future we may want to use them as units for configuration diffusion,
to allow better master scalability. For example, it could be possible
to change some all-nodes RPCs to contact each group once, from the
master, and make one node in the group perform internal diffusion. We
......
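
The per-group RPC diffusion idea above can be summarised by a hypothetical
sketch in which the master contacts a single relay node per group and asks it
to forward the call to its peers; the callback signature is an assumption made
for illustration::

  def fanout_rpc(groups, rpc_call):
      """groups: dict mapping group name -> list of node names.

      'rpc_call(relay, forward_to=...)' stands in for a master-to-node RPC
      that the relay node then diffuses to the rest of its own group.
      """
      results = {}
      for group, nodes in groups.items():
          if not nodes:
              continue                      # nothing to contact in an empty group
          relay, peers = nodes[0], nodes[1:]
          results[group] = rpc_call(relay, forward_to=peers)
      return results

  groups = {"default": ["n1", "n2", "n3"], "rack1": ["n4", "n5"]}
  print(fanout_rpc(groups, lambda relay, forward_to: (relay, forward_to)))
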
- the total node memory and CPU count are very seldom changing; the total
  node disk space is also slow changing, but can change at runtime; the
  free memory and free disk will change significantly for some jobs, but
-  on a short timescale; in general, these values will mostly “constant”
+  on a short timescale; in general, these values will be mostly “constant”
  during the lifetime of a job
- we already have a periodic set of jobs that query the node and
  instance state, driven by the :command:`ganeti-watcher` command, and
  we're just discarding the results after acting on them

-Given the above, it makes sense to cache inside the master daemon the
-results of node and instance state (with a focus on the node state).
+Given the above, it makes sense to cache the results of node and instance
+state (with a focus on the node state) inside the master daemon.

The cache will not be serialised to disk, and will be for the most part
transparent to the outside of the master daemon.
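
As a rough picture of such a master-side cache, the following hypothetical
Python sketch keeps per-node results purely in memory with a timestamp, and
hands back nothing once an entry is older than an acceptable age; the names
and the freshness policy are illustrative assumptions, not the actual design::

  import time

  class NodeStateCache:
      """In-memory only: never serialised to disk, rebuilt from live queries."""

      def __init__(self, max_age=60.0):
          self._data = {}                   # node name -> (timestamp, state dict)
          self.max_age = max_age            # seconds a cached entry stays usable

      def update_full(self, results):
          """Store a complete, consistent set of per-node query results."""
          now = time.time()
          for node, state in results.items():
              self._data[node] = (now, state)

      def get(self, node):
          """Return the cached state if fresh enough, otherwise None."""
          entry = self._data.get(node)
          if entry is None:
              return None
          timestamp, state = entry
          if time.time() - timestamp > self.max_age:
              return None                   # stale: the caller should re-query
          return state

  cache = NodeStateCache(max_age=30.0)
  cache.update_full({"node1": {"free_memory": 2048, "free_disk": 10240}})
  print(cache.get("node1"))
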
......
consistent). Partial results will not update the cache (see next
paragraph).

-Since the there will be no way to feed the cache from outside, and we
+Since there will be no way to feed the cache from outside, and we
would like to have a consistent cache view when driven by the watcher,
we'll introduce a new OpCode/LU for the watcher to run, instead of the
current separate opcodes (see below in the watcher section).
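
A single watcher-driven operation of that kind might, very roughly, look like
the hypothetical sketch below: it gathers node and instance state in one pass
and refreshes the cache only from a complete answer, discarding partial
results as described above. The callbacks and the plain-dict cache are
illustrative stand-ins, not the real OpCode/LU::

  def run_watcher_update(cache, query_nodes, query_instances):
      """Gather node and instance state in one pass and refresh the cache."""
      node_results = query_nodes()          # e.g. {"node1": {...}, "node2": None}
      instance_results = query_instances()
      # Only a complete, consistent answer refreshes the cache; if any node
      # failed to reply (None), the previously cached data is kept instead.
      if all(state is not None for state in node_results.values()):
          cache.clear()
          cache.update(node_results)
      return node_results, instance_results

  cache = {}
  run_watcher_update(cache,
                     lambda: {"node1": {"free_memory": 2048}},
                     lambda: {"instance1": {"status": "running"}})
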
......
allocation on one group from exclusive blocking jobs on other node
groups.

-The capacity calculations will also use the cache—this is detailed in
+The capacity calculations will also use the cache. This is detailed in
the respective sections.

Watcher operation
......

This method will feed the cluster state (for the complete set of node
groups, or alternatively just a subset) to the iallocator plugin (either
-the specified one, or the default is none is specified), and return the
+the specified one, or the default if none is specified), and return the
new capacity in the format currently exported by the htools suite and
known as the “tiered specs” (see :manpage:`hspace(1)`).
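
The capacity entry point described above can be pictured with this
hypothetical Python sketch: restrict the cluster state to the requested node
groups, fall back to the cluster's default iallocator when none is named, and
return whatever tiered-specs style answer the plugin produces. The request
layout and the 'run_plugin' callback are assumptions for illustration, not the
real iallocator protocol::

  def compute_capacity(cluster_state, run_plugin, allocator=None, groups=None):
      """cluster_state: dict with 'default_iallocator' and 'groups' keys."""
      name = allocator or cluster_state["default_iallocator"]
      if groups is None:
          selected = cluster_state["groups"]          # the complete set of groups
      else:
          selected = {g: cluster_state["groups"][g] for g in groups}
      request = {"type": "capacity", "groups": selected}
      # 'run_plugin' is assumed to invoke the named iallocator script and to
      # return the tiered-specs style structure produced by the htools suite.
      return run_plugin(name, request)

  state = {"default_iallocator": "hail",
           "groups": {"default": {"nodes": 5}, "rack1": {"nodes": 3}}}
  print(compute_capacity(state,
                         lambda name, request: (name, sorted(request["groups"])),
                         groups=["rack1"]))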