HSPACE(1) Ganeti | Version @GANETI_VERSION@
===========================================

NAME
----

hspace - Cluster space analyzer for Ganeti

SYNOPSIS
--------

**hspace** {backend options...} [algorithm options...] [request options...]
[ -p [*fields*] ] [-v... | -q]

**hspace** --version

Backend options:

{ **-m** *cluster* | **-L[** *path* **] [-X]** | **-t** *data-file* |
**--simulate** *spec* }

Algorithm options:

**[ --max-cpu *cpu-ratio* ]**
**[ --min-disk *disk-ratio* ]**
**[ -O *name...* ]**

Request options:

**[--memory** *mem* **]**
**[--disk** *disk* **]**
**[--disk-template** *template* **]**
**[--vcpus** *vcpus* **]**
**[--tiered-alloc** *spec* **]**

DESCRIPTION
-----------

hspace computes how many additional instances can fit on a cluster,
while maintaining N+1 status.

The program will try to place instances, all of the same size, on the
cluster, until no further N+1-compliant allocation is possible. It
uses the exact same allocation algorithm as the hail iallocator
plugin in *allocate* mode.

The output of the program is designed to be interpreted as a shell
fragment (or parsed as a *key=value* file). Options which extend the
output (e.g. -p, -v) will print the additional information on stderr
(so that the stdout remains parseable).
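
For example, assuming a cluster reachable over LUXI, the output can
be captured and evaluated directly by a POSIX shell (a minimal
sketch; the file name is illustrative)::

  hspace -L > hspace.out
  . ./hspace.out
  if [ "$HTS_OK" = "1" ]; then
    echo "can allocate $HTS_ALLOC_COUNT more instances"
  fi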

The following keys are available in the output of the script (all
prefixed with *HTS_*):

SPEC_MEM, SPEC_DSK, SPEC_CPU, SPEC_RQN, SPEC_DISK_TEMPLATE
  These represent the specifications of the instance model used for
  allocation (the memory, disk, cpu, requested nodes, disk template).

TSPEC_INI_MEM, TSPEC_INI_DSK, TSPEC_INI_CPU
  Only defined when tiered allocation mode is enabled, these are
  similar to the above specifications but show the initial starting
  spec for tiered allocation.

CLUSTER_MEM, CLUSTER_DSK, CLUSTER_CPU, CLUSTER_NODES
  These represent the total memory, disk, CPU count and total nodes in
  the cluster.

INI_SCORE, FIN_SCORE
  These are the initial (current) and final cluster score (see the hbal
  man page for details about the scoring algorithm).

INI_INST_CNT, FIN_INST_CNT
  The initial and final instance count.

INI_MEM_FREE, FIN_MEM_FREE
  The initial and final total free memory in the cluster (not all of
  which is necessarily available for use).

INI_MEM_AVAIL, FIN_MEM_AVAIL
  The initial and final total available memory for allocation in the
  cluster. If allocating redundant instances, new instances could
  increase the reserved memory, so not necessarily all of this memory
  can be used for new instance allocations.

INI_MEM_RESVD, FIN_MEM_RESVD
  The initial and final reserved memory (for redundancy/N+1 purposes).

INI_MEM_INST, FIN_MEM_INST
  The initial and final memory used for instances (actual runtime used
  RAM).

INI_MEM_OVERHEAD, FIN_MEM_OVERHEAD
  The initial and final memory overhead, i.e. memory used by the node
  itself and unaccounted memory (e.g. due to hypervisor overhead).

INI_MEM_EFF, FIN_MEM_EFF
  The initial and final memory efficiency, represented as instance
  memory divided by total memory.

INI_DSK_FREE, INI_DSK_AVAIL, INI_DSK_RESVD, INI_DSK_INST, INI_DSK_EFF
  Initial disk stats, similar to the memory ones.

FIN_DSK_FREE, FIN_DSK_AVAIL, FIN_DSK_RESVD, FIN_DSK_INST, FIN_DSK_EFF
  Final disk stats, similar to the memory ones.

INI_CPU_INST, FIN_CPU_INST
  Initial and final number of virtual CPUs used by instances.

INI_CPU_EFF, FIN_CPU_EFF
  The initial and final CPU efficiency, represented as the count of
  virtual instance CPUs divided by the total physical CPU count.

INI_MNODE_MEM_AVAIL, FIN_MNODE_MEM_AVAIL
  The initial and final maximum per-node available memory. This is not
  very useful as a metric but can give an impression of the status of
  the nodes; as an example, this value restricts the maximum instance
  size that can still be created on the cluster.

INI_MNODE_DSK_AVAIL, FIN_MNODE_DSK_AVAIL
  Like the above but for disk.

TSPEC
  If the tiered allocation mode has been enabled, this parameter holds
  the pairs of specifications and counts of instances that can be
  created in this mode. The value of the key is a space-separated list
  of values; each value is of the form *memory,disk,vcpu=count* where
  the memory, disk and vcpu are the values for the current spec, and
  count is how many instances of this spec can be created. A complete
  value for this variable could be: **4096,102400,2=225
  2560,102400,2=20 512,102400,2=21**.
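
  Such a list splits naturally on whitespace; a minimal POSIX shell
  sketch (assuming the output was evaluated as in the earlier
  example)::

    for pair in $HTS_TSPEC; do
      spec=${pair%=*}; count=${pair#*=}
      echo "spec $spec: $count instances"
    done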

KM_USED_CPU, KM_USED_NPU, KM_USED_MEM, KM_USED_DSK
  These represent the metrics of used resources at the start of the
  computation (only for tiered allocation mode). The NPU value is the
  "normalized" CPU count, i.e. the number of virtual CPUs divided by
  the maximum ratio of virtual to physical CPUs.

KM_POOL_CPU, KM_POOL_NPU, KM_POOL_MEM, KM_POOL_DSK
  These represent the total resources allocated during the tiered
  allocation process. In effect, they represent how much is readily
  available for allocation.

KM_UNAV_CPU, KM_UNAV_NPU, KM_UNAV_MEM, KM_UNAV_DSK
  These represent the resources left over (either free as in
  unallocable, or allocable on their own) after the tiered allocation
  has been completed. They are a better measure of the actually
  unallocable resources, since some other resource has been
  exhausted. For example, the cluster might still have 100GiB disk
  free, but with no memory left for instances we cannot allocate
  another instance, so in effect the disk space is unallocable. Note
  that the CPUs here represent instance virtual CPUs, and if the
  *--max-cpu* option hasn't been specified this will be -1.

ALLOC_USAGE
  The current usage, represented as the initial number of instances
  divided by the final number of instances.

ALLOC_COUNT
  The number of instances allocated (delta between FIN_INST_CNT and
  INI_INST_CNT).

ALLOC_FAIL*_CNT
  For the last attempt at allocation (which would have increased
  FIN_INST_CNT by one, had it succeeded), this is the count of the
  failure reasons per failure type; currently defined are FAILMEM,
  FAILDISK and FAILCPU, which represent errors due to not enough
  memory, disk and CPUs, and FAILN1, which represents a cluster that
  is not N+1 compliant and on which we can't allocate instances at
  all.

ALLOC_FAIL_REASON
  The reason for most of the failures, being one of the above FAIL*
  strings.

OK
  A marker representing the successful end of the computation, having
  the value "1". If this key is not present in the output, it means
  that the computation failed and any values present should not be
  relied upon.

If the tiered allocation mode is enabled, then many of the INI_/FIN_
metrics will also be displayed with a TRL_ prefix, denoting the
cluster status at the end of the tiered allocation run.

OPTIONS
-------

The options that can be passed to the program are as follows:

--memory *mem*
  The memory size of the instances to be placed (defaults to
  4GiB). Units can be used (see below for more details).

--disk *disk*
  The disk size of the instances to be placed (defaults to
  100GiB). Units can be used.

--disk-template *template*
  The disk template for the instance; one of the Ganeti disk templates
  (e.g. plain, drbd, and so on) should be passed in.

--vcpus *vcpus*
  The number of VCPUs of the instances to be placed (defaults to 1).

--max-cpu=*cpu-ratio*
  The maximum virtual to physical cpu ratio, as a floating point number
  greater than or equal to one. For example, specifying *cpu-ratio* as
  **2.5** means that, for a 4-cpu machine, a maximum of 10 virtual cpus
  should be allowed to be in use for primary instances. A value of
  exactly one means there will be no over-subscription of CPU (except
  for the CPU time used by the node itself), and values below one do not
  make sense, as that means other resources (e.g. disk) won't be fully
  utilised due to CPU restrictions.

--min-disk=*disk-ratio*
  The minimum amount of free disk space remaining, as a floating point
  number. For example, specifying *disk-ratio* as **0.25** means that
  at least one quarter of disk space should be left free on nodes.

-p, --print-nodes
  Prints the before and after node status, in a format designed to allow
  the user to understand the node's most important parameters. See the
  man page **htools**(1) for more details about this option.

-O *name*
  This option (which can be given multiple times) will mark nodes as
  being *offline*. This means a couple of things:

  - instances won't be placed on these nodes, not even temporarily;
    e.g. the *replace primary* move is not available if the secondary
    node is offline, since this move requires a failover.
  - these nodes will not be included in the score calculation (except
    for the percentage of instances on offline nodes)

  Note that the algorithm will also mark as offline any nodes which
  are reported by RAPI as such, or that have "?" in file-based input
  in any numeric fields.

-t *datafile*, --text-data=*datafile*
  The name of the file holding node and instance information (if not
  collecting via RAPI or LUXI). This or one of the other backends must
  be selected.

-S *filename*, --save-cluster=*filename*
  If given, the state of the cluster at the end of the allocation is
  saved to a file named *filename.alloc*, and if tiered allocation is
  enabled, the state after tiered allocation will be saved to
  *filename.tiered*. This allows re-feeding the cluster state to
  either hspace itself (with different parameters) or for example
  hbal.
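
  For example (a sketch; file names and the spec are illustrative)::

    hspace -L --tiered-alloc 100G,4g,2 -S /tmp/state
    hbal -t /tmp/state.tiered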

-m *cluster*
  Collect data directly from the *cluster* given as an argument via
  RAPI. If the argument doesn't contain a colon (:), then it is
  converted into a fully-built URL by prepending ``https://`` and
  appending the default RAPI port; otherwise it's considered a
  fully-specified URL and is used as-is.
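
  For example, the following two invocations should be equivalent,
  assuming the default RAPI port of 5080 (host name illustrative)::

    hspace -m cluster.example.com
    hspace -m https://cluster.example.com:5080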

-L [*path*]
  Collect data directly from the master daemon, which is to be
  contacted via LUXI (an internal Ganeti protocol). An optional
  *path* argument is interpreted as the path to the unix socket on
  which the master daemon listens; otherwise, the default path used by
  ganeti when installed with *--localstatedir=/var* is used.

--simulate *description*
  Instead of using actual data, build an empty cluster given a node
  description. The *description* parameter must be a comma-separated
  list of five elements, describing in order:

  - the allocation policy for this node group
  - the number of nodes in the cluster
  - the disk size of the nodes (default in mebibytes, units can be used)
  - the memory size of the nodes (default in mebibytes, units can be used)
  - the cpu core count for the nodes

  An example description would be **preferred,20,100G,16g,4**,
  describing a 20-node cluster where each node has 100GB of disk
  space, 16GiB of memory and 4 CPU cores. Note that all nodes must
  currently have the same specs.

  This option can be given multiple times, and each new use defines a
  new node group. Hence different node groups can have different
  allocation policies and node count/specifications.
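
  For example, the following sketches a simulated cluster made of two
  node groups with different policies (all values illustrative)::

    hspace --simulate preferred,20,100G,16g,4 \
           --simulate last_resort,10,50G,8g,2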

--tiered-alloc *spec*
  Besides the standard, fixed-size allocation, also do a tiered
  allocation scheme where the algorithm starts from the given
  specification and allocates until there is no more space; then it
  decreases the specification and tries the allocation again. The
  decrease is done on the metric that last failed during
  allocation. The specification given is similar to the *--simulate*
  option and it holds:

  - the disk size of the instance (units can be used)
  - the memory size of the instance (units can be used)
  - the vcpu count for the instance

  An example specification would be *100G,4g,2*, describing an initial
  starting specification of 100GB of disk space, 4GiB of memory and 2
  VCPUs.

  Also note that the normal allocation and the tiered allocation are
  independent, and both start from the initial cluster state; as such,
  the instance counts for these two modes are not related to one
  another.
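
  For example, a tiered-allocation run against a live cluster could
  look like this (a sketch; the spec is illustrative)::

    hspace -L --tiered-alloc 100G,4g,2

  Besides the standard keys, the output will then also contain the
  TSPEC and TRL_-prefixed keys described above.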

-v, --verbose
  Increase the output verbosity. Each usage of this option will
  increase the verbosity (currently more than 2 doesn't make sense)
  from the default of one.

-q, --quiet
  Decrease the output verbosity. Each usage of this option will
  decrease the verbosity (less than zero doesn't make sense) from the
  default of one.

-V, --version
  Just show the program version and exit.

UNITS
~~~~~

By default, all unit-accepting options use mebibytes. Using the
lower-case letters of *m*, *g* and *t* (or their longer equivalents of
*mib*, *gib*, *tib*, for which case doesn't matter) explicit binary
units can be selected. Units in the SI system can be selected using the
upper-case letters of *M*, *G* and *T* (or their longer equivalents of
*MB*, *GB*, *TB*, for which case doesn't matter).
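
For example (the values are illustrative)::

  hspace -L --memory 8g --disk 512g   # 8 GiB RAM, 512 GiB disk (binary)
  hspace -L --memory 8G --disk 512G   # 8 GB RAM, 512 GB disk (SI)
  hspace -L --memory 8192             # bare numbers are mebibytes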

More details about the difference between the SI and binary systems can
be read in the *units(7)* man page.

EXIT STATUS
-----------

The exit status of the command will be zero, unless for some reason
the algorithm failed fatally (e.g. wrong node or instance data).

BUGS
----

The algorithm is highly dependent on the number of nodes; its runtime
grows exponentially with this number, and as such is impractical for
really big clusters.

The algorithm doesn't rebalance the cluster or try to get the optimal
fit; it just allocates in the best place for the current step, without
taking into consideration the impact on future placements.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: