HSPACE(1) Ganeti | Version @GANETI_VERSION@
===========================================

NAME
----

hspace - Cluster space analyzer for Ganeti

SYNOPSIS
--------

**hspace** {backend options...} [algorithm options...] [request options...]
[ -p [*fields*] ] [-v... | -q]

**hspace** --version

Backend options:

{ **-m** *cluster* | **-L[** *path* **] [-X]** | **-t** *data-file* |
**--simulate** *spec* }

Algorithm options:

**[ --max-cpu *cpu-ratio* ]**
**[ --min-disk *disk-ratio* ]**
**[ -O *name...* ]**

Request options:

**[--memory** *mem* **]**
**[--disk** *disk* **]**
**[--disk-template** *template* **]**
**[--vcpus** *vcpus* **]**
**[--tiered-alloc** *spec* **]**

DESCRIPTION
-----------

hspace computes how many additional instances can be fit on a cluster,
while maintaining N+1 status.

The program will try to place instances, all of the same size, on the
cluster, until the point where we don't have any N+1 possible
allocation. It uses the exact same allocation algorithm as the hail
iallocator plugin in *allocate* mode.

The output of the program is designed to be interpreted as a shell
fragment (or parsed as a *key=value* file). Options which extend the
output (e.g. -p, -v) will output the additional information on stderr
(such that the stdout is still parseable).
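
For instance, a monitoring script could capture the two streams
separately and then source the *key=value* part (a minimal sketch;
the file names are only illustrative)::

    hspace -L -p > hspace.vars 2> hspace.nodes  # keys on stdout, node list on stderr
    . ./hspace.vars                             # defines the HTS_* variables
    echo "Additional instances that fit: $HTS_ALLOC_COUNT"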

The following keys are available in the output of the script (all
prefixed with *HTS_*):

SPEC_MEM, SPEC_DSK, SPEC_CPU, SPEC_RQN, SPEC_DISK_TEMPLATE
  These represent the specifications of the instance model used for
  allocation (the memory, disk, cpu, requested nodes, disk template).

TSPEC_INI_MEM, TSPEC_INI_DSK, TSPEC_INI_CPU
  Only defined when the tiered mode allocation is enabled, these are
  similar to the above specifications but show the initial starting spec
  for tiered allocation.

CLUSTER_MEM, CLUSTER_DSK, CLUSTER_CPU, CLUSTER_NODES
  These represent the total memory, disk, CPU count and total nodes in
  the cluster.

INI_SCORE, FIN_SCORE
  These are the initial (current) and final cluster score (see the hbal
  man page for details about the scoring algorithm).

INI_INST_CNT, FIN_INST_CNT
  The initial and final instance count.

INI_MEM_FREE, FIN_MEM_FREE
  The initial and final total free memory in the cluster (but this
  doesn't necessarily mean available for use).

INI_MEM_AVAIL, FIN_MEM_AVAIL
  The initial and final total available memory for allocation in the
  cluster. If allocating redundant instances, new instances could
  increase the reserved memory so it doesn't necessarily mean the
  entirety of this memory can be used for new instance allocations.

INI_MEM_RESVD, FIN_MEM_RESVD
  The initial and final reserved memory (for redundancy/N+1 purposes).

INI_MEM_INST, FIN_MEM_INST
  The initial and final memory used for instances (actual runtime used
  RAM).

INI_MEM_OVERHEAD, FIN_MEM_OVERHEAD
  The initial and final memory overhead--memory used for the node
  itself and unaccounted memory (e.g. due to hypervisor overhead).

INI_MEM_EFF, FIN_MEM_EFF
  The initial and final memory efficiency, represented as instance
  memory divided by total memory.

INI_DSK_FREE, INI_DSK_AVAIL, INI_DSK_RESVD, INI_DSK_INST, INI_DSK_EFF
  Initial disk stats, similar to the memory ones.

FIN_DSK_FREE, FIN_DSK_AVAIL, FIN_DSK_RESVD, FIN_DSK_INST, FIN_DSK_EFF
  Final disk stats, similar to the memory ones.

INI_CPU_INST, FIN_CPU_INST
  Initial and final number of virtual CPUs used by instances.

INI_CPU_EFF, FIN_CPU_EFF
  The initial and final CPU efficiency, represented as the count of
  virtual instance CPUs divided by the total physical CPU count.

INI_MNODE_MEM_AVAIL, FIN_MNODE_MEM_AVAIL
  The initial and final maximum per-node available memory. This is not
  very useful as a metric but can give an impression of the status of
  the nodes; as an example, this value restricts the maximum instance
  size that can still be created on the cluster.

INI_MNODE_DSK_AVAIL, FIN_MNODE_DSK_AVAIL
  Like the above but for disk.

TSPEC
  If the tiered allocation mode has been enabled, this parameter holds
  the pairs of specifications and counts of instances that can be
  created in this mode. The value of the key is a space-separated list
  of values; each value is of the form *memory,disk,vcpu=count* where
  the memory, disk and vcpu are the values for the current spec, and
  count is how many instances of this spec can be created. A complete
  value for this variable could be: **4096,102400,2=225
  2560,102400,2=20 512,102400,2=21**.
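
  A minimal sketch of how such a value could be consumed from a shell
  script (reusing the example value above)::

      HTS_TSPEC="4096,102400,2=225 2560,102400,2=20 512,102400,2=21"
      for pair in $HTS_TSPEC; do
          spec=${pair%=*}     # e.g. 4096,102400,2 (memory,disk,vcpu)
          count=${pair#*=}    # e.g. 225
          echo "spec $spec: $count instances"
      done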

KM_USED_CPU, KM_USED_NPU, KM_USED_MEM, KM_USED_DSK
  These represent the metrics of used resources at the start of the
  computation (only for tiered allocation mode). The NPU value is
  "normalized" CPU count, i.e. the number of virtual CPUs divided by
  the maximum ratio of the virtual to physical CPUs.

KM_POOL_CPU, KM_POOL_NPU, KM_POOL_MEM, KM_POOL_DSK
  These represent the total resources allocated during the tiered
  allocation process. In effect, they represent how much is readily
  available for allocation.

KM_UNAV_CPU, KM_UNAV_NPU, KM_UNAV_MEM, KM_UNAV_DSK
  These represent the resources left over (either free as in
  unallocable, or allocable on their own) after the tiered allocation
  has completed. They better reflect the actually unallocable
  resources: a resource can remain free yet be unusable because some
  other resource has been exhausted. For example, the cluster might
  still have 100GiB disk free, but with no memory left for instances
  we cannot allocate another instance, so in effect the disk space is
  unallocable. Note that the CPUs here represent instance virtual
  CPUs, and in case the *--max-cpu* option hasn't been specified this
  will be -1.

ALLOC_USAGE
  The current usage represented as the initial number of instances
  divided by the final number of instances.

ALLOC_COUNT
  The number of instances allocated (delta between FIN_INST_CNT and
  INI_INST_CNT).

ALLOC_FAIL*_CNT
  For the last attempt at allocation (which would have increased
  FIN_INST_CNT by one, had it succeeded), this is the count of the
  failure reasons per failure type; currently defined are FAILMEM,
  FAILDISK and FAILCPU, which represent errors due to not enough
  memory, disk and CPUs, and FAILN1, which represents a cluster that
  is not N+1 compliant and on which we can't allocate instances at
  all.

ALLOC_FAIL_REASON
  The reason for most of the failures, being one of the above FAIL*
  strings.

OK
  A marker representing the successful end of the computation, and
  having value "1". If this key is not present in the output it means
  that the computation failed and any values present should not be
  relied upon.
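
  Consumers should therefore check this key before trusting anything
  else; a minimal sketch::

      if [ "${HTS_OK:-0}" = "1" ]; then
          echo "computation succeeded: $HTS_ALLOC_COUNT more instances fit"
      else
          echo "hspace computation failed" >&2
          exit 1
      fi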

If the tiered allocation mode is enabled, then many of the INI_/FIN_
metrics will be also displayed with a TRL_ prefix, and denote the
cluster status at the end of the tiered allocation run.

OPTIONS
-------

The options that can be passed to the program are as follows:

--memory *mem*
  The memory size of the instances to be placed (defaults to
  4GiB). Units can be used (see below for more details).

--disk *disk*
  The disk size of the instances to be placed (defaults to
  100GiB). Units can be used.

--disk-template *template*
  The disk template for the instance; one of the Ganeti disk templates
  (e.g. plain, drbd, and so on) should be passed in.

--vcpus *vcpus*
  The number of VCPUs of the instances to be placed (defaults to 1).

--max-cpu=*cpu-ratio*
  The maximum virtual to physical cpu ratio, as a floating point number
  greater than or equal to one. For example, specifying *cpu-ratio* as
  **2.5** means that, for a 4-cpu machine, a maximum of 10 virtual cpus
  should be allowed to be in use for primary instances. A value of
  exactly one means there will be no over-subscription of CPU (except
  for the CPU time used by the node itself), and values below one do not
  make sense, as that means other resources (e.g. disk) won't be fully
  utilised due to CPU restrictions.

--min-disk=*disk-ratio*
  The minimum amount of free disk space remaining, as a floating point
  number. For example, specifying *disk-ratio* as **0.25** means that
  at least one quarter of disk space should be left free on nodes.

-p, --print-nodes
  Prints the before and after node status, in a format designed to
  allow the user to understand the node's most important parameters.

  It is possible to customise the listed information by passing a
  comma-separated list of field names to this option (the field list
  is currently undocumented), or to extend the default field list by
  prefixing the additional field list with a plus sign. By default,
  the node list will contain the following information:

  F
    a character denoting the status of the node, with '-' meaning an
    offline node, '*' meaning N+1 failure and blank meaning a good
    node

  Name
    the node name

  t_mem
    the total node memory

  n_mem
    the memory used by the node itself

  i_mem
    the memory used by instances

  x_mem
    the amount of memory which seems to be in use, but for which it
    cannot be determined why or by which instance; usually this means
    that the hypervisor has some overhead or that there are other
    reporting errors

  f_mem
    the free node memory

  r_mem
    the reserved node memory, which is the amount of free memory
    needed for N+1 compliance

  t_dsk
    total disk

  f_dsk
    free disk

  pcpu
    the number of physical cpus on the node

  vcpu
    the number of virtual cpus allocated to primary instances

  pcnt
    number of primary instances

  scnt
    number of secondary instances

  p_fmem
    percent of free memory

  p_fdsk
    percent of free disk

  r_cpu
    ratio of virtual to physical cpus

  lCpu
    the dynamic CPU load (if the information is available)

  lMem
    the dynamic memory load (if the information is available)

  lDsk
    the dynamic disk load (if the information is available)

  lNet
    the dynamic net load (if the information is available)

-O *name*
  This option (which can be given multiple times) will mark nodes as
  being *offline*. This means a couple of things:

  - instances won't be placed on these nodes, not even temporarily;
    e.g. the *replace primary* move is not available if the secondary
    node is offline, since this move requires a failover.
  - these nodes will not be included in the score calculation (except
    for the percentage of instances on offline nodes)

  Note that the algorithm will also mark as offline any nodes which
  are reported by RAPI as such, or that have "?" in file-based input
  in any numeric fields.

-t *datafile*, --text-data=*datafile*
  The name of the file holding node and instance information (if not
  collecting via RAPI or LUXI). This or one of the other backends must
  be selected.

-S *filename*, --save-cluster=*filename*
  If given, the state of the cluster at the end of the allocation is
  saved to a file named *filename.alloc*, and if tiered allocation is
  enabled, the state after tiered allocation will be saved to
  *filename.tiered*. This allows re-feeding the cluster state to
  either hspace itself (with different parameters) or for example
  hbal.

-m *cluster*
  Collect data directly from the *cluster* given as an argument via
  RAPI. If the argument doesn't contain a colon (:), then it is
  converted into a fully-built URL by prepending ``https://`` and
  appending the default RAPI port; otherwise it's considered a
  fully-specified URL and is used as-is.
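
  For example (a sketch; the host name is illustrative, and the port
  shown assumes the default RAPI port of 5080)::

      hspace -m my.cluster.example.com
      # is treated the same as:
      hspace -m https://my.cluster.example.com:5080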

-L [*path*]
  Collect data directly from the master daemon, which is to be
  contacted via LUXI (an internal Ganeti protocol). An optional
  *path* argument is interpreted as the path to the unix socket on
  which the master daemon listens; otherwise, the default path used by
  Ganeti when installed with *--localstatedir=/var* is used.

--simulate *description*
  Instead of using actual data, build an empty cluster given a node
  description. The *description* parameter must be a comma-separated
  list of five elements, describing in order:

  - the allocation policy for this node group
  - the number of nodes in the cluster
  - the disk size of the nodes (default in mebibytes, units can be used)
  - the memory size of the nodes (default in mebibytes, units can be used)
  - the cpu core count for the nodes

  An example description would be **preferred,20,100G,16g,4**
  describing a 20-node cluster where each node has 100GB of disk
  space, 16GiB of memory and 4 CPU cores. Note that currently all
  nodes must have the same specs.

  This option can be given multiple times, and each new use defines a
  new node group. Hence different node groups can have different
  allocation policies and node count/specifications.
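
  As a sketch, simulating two differently-sized node groups in one run
  (the group specs here are purely illustrative)::

      hspace --simulate=preferred,20,100G,16g,4 \
             --simulate=preferred,5,400G,64g,16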

--tiered-alloc *spec*
  Besides the standard, fixed-size allocation, also do a tiered
  allocation scheme where the algorithm starts from the given
  specification and allocates until there is no more space; then it
  decreases the specification and tries the allocation again. The
  decrease is done on the metric that last failed during
  allocation. The specification given is similar to the *--simulate*
  option and it holds:

  - the disk size of the instance (units can be used)
  - the memory size of the instance (units can be used)
  - the vcpu count for the instance

  An example description would be *100G,4g,2* describing an initial
  starting specification of 100GB of disk space, 4GiB of memory and 2
  VCPUs.

  Also note that the normal allocation and the tiered allocation are
  independent, and both start from the initial cluster state; as such,
  the instance counts for these two modes are not related to one
  another.

-v, --verbose
  Increase the output verbosity. Each usage of this option will
  increase the verbosity (currently more than 2 doesn't make sense)
  from the default of one.

-q, --quiet
  Decrease the output verbosity. Each usage of this option will
  decrease the verbosity (less than zero doesn't make sense) from the
  default of one.

-V, --version
  Just show the program version and exit.

UNITS
~~~~~

By default, all unit-accepting options use mebibytes. Using the
lower-case letters of *m*, *g* and *t* (or their longer equivalents of
*mib*, *gib*, *tib*, for which case doesn't matter) explicit binary
units can be selected. Units in the SI system can be selected using the
upper-case letters of *M*, *G* and *T* (or their longer equivalents of
*MB*, *GB*, *TB*, for which case doesn't matter).
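
For example, the same numeric value can request noticeably different
sizes depending on the suffix::

    --disk 100g   # 100 GiB = 102400 MiB (binary units)
    --disk 100G   # 100 GB  = roughly 95367 MiB (SI units)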

More details about the difference between the SI and binary systems can
be read in the *units(7)* man page.

EXIT STATUS
-----------

The exit status of the command will be zero, unless for some reason
the algorithm fatally failed (e.g. wrong node or instance data).

BUGS
----

The algorithm is highly dependent on the number of nodes; its runtime
grows exponentially with this number, and as such is impractical for
really big clusters.

The algorithm doesn't rebalance the cluster or try to get the optimal
fit; it just allocates in the best place for the current step, without
taking into consideration the impact on future placements.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: