.TH HSPACE 1 2009-06-01 htools "Ganeti H-tools"
.SH NAME
hspace \- Cluster space analyzer for Ganeti

.SH SYNOPSIS
.B hspace
.B "[backend options...]"
.B "[algorithm options...]"
.B "[request options...]"
.BI "[ -p[" fields "] ]"
.B "[-v... | -q]"

.B hspace
.B --version

.TP
Backend options:
.BI " -m " cluster
|
.BI " -L[" path "]"
|
.BI " -t " data-file
|
.BI " --simulate " spec

.TP
Algorithm options:
.BI "[ --max-cpu " cpu-ratio " ]"
.BI "[ --min-disk " disk-ratio " ]"
.BI "[ -O " name... " ]"

.TP
Request options:
.BI "[--memory " mem "]"
.BI "[--disk " disk "]"
.BI "[--req-nodes " req-nodes "]"
.BI "[--vcpus " vcpus "]"
.BI "[--tiered-alloc " spec "]"


.SH DESCRIPTION
hspace computes how many additional instances can fit on a cluster
while maintaining N+1 status.

The program tries to place instances, all of the same size, on the
cluster until no further N+1 compliant allocation is possible. It uses
exactly the same allocation algorithm as the hail iallocator plugin.

The output of the program is designed to be interpreted as a shell
fragment (or parsed as a \fIkey=value\fR file). Options which extend
the output (e.g. \-p, \-v) will output the additional information on
stderr (such that the stdout is still parseable).

The following keys are available in the output of the script (all
prefixed with \fIHTS_\fR):
.TP
.I SPEC_MEM, SPEC_DSK, SPEC_CPU, SPEC_RQN
These represent the specifications of the instance model used for
allocation (the memory, disk, cpu, requested nodes).

.TP
.I CLUSTER_MEM, CLUSTER_DSK, CLUSTER_CPU, CLUSTER_NODES
These represent the total memory, disk, CPU count and total nodes in
the cluster.

.TP
.I INI_SCORE, FIN_SCORE
These are the initial (current) and final cluster score (see the hbal
man page for details about the scoring algorithm).

.TP
.I INI_INST_CNT, FIN_INST_CNT
The initial and final instance count.

.TP
.I INI_MEM_FREE, FIN_MEM_FREE
The initial and final total free memory in the cluster (but this
doesn't necessarily mean available for use).

.TP
.I INI_MEM_AVAIL, FIN_MEM_AVAIL
The initial and final total available memory for allocation in the
cluster. If allocating redundant instances, new instances could
increase the reserved memory so it doesn't necessarily mean the
entirety of this memory can be used for new instance allocations.

.TP
.I INI_MEM_RESVD, FIN_MEM_RESVD
The initial and final reserved memory (for redundancy/N+1 purposes).

.TP
.I INI_MEM_INST, FIN_MEM_INST
The initial and final memory used for instances (actual runtime used
RAM).

.TP
.I INI_MEM_OVERHEAD, FIN_MEM_OVERHEAD
The initial and final memory overhead \(em memory used for the node
itself and unaccounted memory (e.g. due to hypervisor overhead).

.TP
.I INI_MEM_EFF, FIN_MEM_EFF
The initial and final memory efficiency, represented as instance
memory divided by total memory.

.TP
.I INI_DSK_FREE, INI_DSK_AVAIL, INI_DSK_RESVD, INI_DSK_INST, INI_DSK_EFF
Initial disk stats, similar to the memory ones.

.TP
.I FIN_DSK_FREE, FIN_DSK_AVAIL, FIN_DSK_RESVD, FIN_DSK_INST, FIN_DSK_EFF
Final disk stats, similar to the memory ones.

.TP
.I INI_CPU_INST, FIN_CPU_INST
Initial and final number of virtual CPUs used by instances.

.TP
.I INI_CPU_EFF, FIN_CPU_EFF
The initial and final CPU efficiency, represented as the count of
virtual instance CPUs divided by the total physical CPU count.

.TP
.I INI_MNODE_MEM_AVAIL, FIN_MNODE_MEM_AVAIL
The initial and final maximum per\(hynode available memory. This is not
very useful as a metric but can give an impression of the status of
the nodes; as an example, this value restricts the maximum instance
size that can still be created on the cluster.

.TP
.I INI_MNODE_DSK_AVAIL, FIN_MNODE_DSK_AVAIL
Like the above but for disk.

.TP
.I TSPEC
If the tiered allocation mode has been enabled, this parameter holds
the pairs of specifications and counts of instances that can be
created in this mode. The value of the key is a space\(hyseparated list
of values; each value is of the form \fImemory,disk,vcpu=count\fR
where the memory, disk and vcpu are the values for the current spec,
and count is how many instances of this spec can be created. A
complete value for this variable could be: \fB4096,102400,2=225
2560,102400,2=20 512,102400,2=21\fR.

.TP
.I KM_USED_CPU, KM_USED_NPU, KM_USED_MEM, KM_USED_DSK
These represent the metrics of used resources at the start of the
computation (only for tiered allocation mode). The NPU value is the
"normalized" CPU count, i.e. the number of virtual CPUs divided by the
maximum ratio of the virtual to physical CPUs.

.TP
.I KM_POOL_CPU, KM_POOL_NPU, KM_POOL_MEM, KM_POOL_DSK
These represent the total resources allocated during the tiered
allocation process. In effect, they represent how much is readily
available for allocation.

.TP
.I KM_UNAV_CPU, KM_UNAV_NPU, KM_UNAV_MEM, KM_UNAV_DSK
These represent the resources left over (either free as in
unallocable or allocable on their own) after the tiered allocation has
been completed. They better represent the actual unallocable
resources, because some other resource has been exhausted. For
example, the cluster might still have 100GiB disk free, but with no
memory left for instances, we cannot allocate another instance, so in
effect the disk space is unallocable. Note that the CPUs here
represent instance virtual CPUs, and in case the \fI--max-cpu\fR
option hasn't been specified this will be \-1.

.TP
.I ALLOC_USAGE
The current usage, represented as the initial number of instances
divided by the final number of instances.

.TP
.I ALLOC_COUNT
The number of instances allocated (delta between FIN_INST_CNT and
INI_INST_CNT).

.TP
.I ALLOC_FAIL*_CNT
For the last attempt at allocation (which would have increased
FIN_INST_CNT by one, if it had succeeded), this is the count of the
failure reasons per failure type; currently defined are FAILMEM,
FAILDISK and FAILCPU, which represent errors due to not enough memory,
disk and CPUs, and FAILN1, which represents a non N+1 compliant cluster
on which we can't allocate instances at all.

.TP
.I ALLOC_FAIL_REASON
The reason for most of the failures, being one of the above FAIL*
strings.

.TP
.I OK
A marker representing the successful end of the computation, and
having value "1". If this key is not present in the output it means
that the computation failed and any values present should not be
relied upon.

.PP

If the tiered allocation mode is enabled, then many of the INI_/FIN_
metrics will also be displayed with a TRL_ prefix, and denote the
cluster status at the end of the tiered allocation run.
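.PP
The key=value output described above can be consumed with a few lines
of code. The following is a minimal Python sketch, not part of hspace
itself. The sample text and every value in it are made up for
illustration, and the single quotes around the TSPEC value are an
assumption about how a space\(hyseparated value appears in a shell
fragment. The helper names parse_output and parse_tspec are
hypothetical.
.PP
.nf
```python
import shlex

# Hypothetical sample; real hspace output has many more keys and other values.
SAMPLE = """HTS_SPEC_MEM=4096
HTS_INI_INST_CNT=10
HTS_FIN_INST_CNT=35
HTS_TSPEC='4096,102400,2=225 2560,102400,2=20 512,102400,2=21'
HTS_OK=1
"""


def parse_output(text):
    """Return a dict of KEY -> value from key=value lines (shell quotes removed)."""
    values = {}
    for line in text.splitlines():
        key, sep, raw = line.strip().partition("=")
        if not sep or not key:
            continue  # skip blank lines and anything that is not key=value
        values[key] = " ".join(shlex.split(raw))
    return values


def parse_tspec(value):
    """Turn 'memory,disk,vcpu=count ...' into (memory, disk, vcpu, count) tuples."""
    specs = []
    for item in value.split():
        spec, count = item.split("=")
        memory, disk, vcpu = (int(field) for field in spec.split(","))
        specs.append((memory, disk, vcpu, int(count)))
    return specs


result = parse_output(SAMPLE)
# Without the OK marker the other values should not be relied upon.
if result.get("HTS_OK") != "1":
    raise SystemExit("hspace computation failed; ignoring values")
allocated = int(result["HTS_FIN_INST_CNT"]) - int(result["HTS_INI_INST_CNT"])
tiers = parse_tspec(result["HTS_TSPEC"])
print("allocated:", allocated)
print("tiers:", tiers)
```
.fi
.PP
Checking the OK key before reading anything else mirrors the rule
stated for that key: if it is missing, the remaining values are
discarded rather than interpreted.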

.SH OPTIONS
The options that can be passed to the program are as follows:

.TP
.BI "--memory " mem
The memory size of the instances to be placed (defaults to 4GiB).

.TP
.BI "--disk " disk
The disk size of the instances to be placed (defaults to 100GiB).

.TP
.BI "--req-nodes " num-nodes
The number of nodes for the instances; the default of two means
mirrored instances, while passing one means plain type instances.

.TP
.BI "--vcpus " vcpus
The number of VCPUs of the instances to be placed (defaults to 1).

.TP
.BI "--max-cpu " cpu-ratio
The maximum virtual\(hyto\(hyphysical cpu ratio, as a floating point
number. For example, specifying \fIcpu-ratio\fR as \fB2.5\fR means
that, for a 4\(hycpu machine, a maximum of 10 virtual cpus should be
allowed to be in use for primary instances.

.TP
.BI "--min-disk " disk-ratio
The minimum amount of free disk space remaining, as a floating point
number between zero and one. For example, specifying \fIdisk-ratio\fR
as \fB0.25\fR means that at least one quarter of disk space should be
left free on nodes. A value of one doesn't make sense though, as that
means no disk space can be used on the nodes.

.TP
.B -p, --print-nodes
Prints the before and after node status, in a format designed to allow
the user to understand the node's most important parameters.

It is possible to customise the listed information by passing a
comma\(hyseparated list of field names to this option (the field list
is currently undocumented), or to extend the default field list by
prefixing the additional field list with a plus sign. By default, the
node list will contain the following information:
.RS
.TP
.B F
a character denoting the status of the node, with '\-' meaning an
offline node, '*' meaning N+1 failure and blank meaning a good node
.TP
.B Name
the node name
.TP
.B t_mem
the total node memory
.TP
.B n_mem
the memory used by the node itself
.TP
.B i_mem
the memory used by instances
.TP
.B x_mem
the amount of memory which seems to be in use but whose user cannot be
determined; usually this means that the hypervisor has some overhead
or that there are other reporting errors
.TP
.B f_mem
the free node memory
.TP
.B r_mem
the reserved node memory, which is the amount of free memory needed
for N+1 compliance
.TP
.B t_dsk
total disk
.TP
.B f_dsk
free disk
.TP
.B pcpu
the number of physical cpus on the node
.TP
.B vcpu
the number of virtual cpus allocated to primary instances
.TP
.B pcnt
number of primary instances
.TP
.B scnt
number of secondary instances
.TP
.B p_fmem
percent of free memory
.TP
.B p_fdsk
percent of free disk
.TP
.B r_cpu
ratio of virtual to physical cpus
.TP
.B lCpu
the dynamic CPU load (if the information is available)
.TP
.B lMem
the dynamic memory load (if the information is available)
.TP
.B lDsk
the dynamic disk load (if the information is available)
.TP
.B lNet
the dynamic net load (if the information is available)
.RE

.TP
.BI "-O " name
This option (which can be given multiple times) will mark nodes as
being \fIoffline\fR, and instances won't be placed on these nodes.

Note that hspace will also mark as offline any nodes which are
reported by RAPI as such, or that have "?" in file\(hybased input in any
numeric fields.
.RE

.TP
.BI "-t" datafile ", --text-data=" datafile
The name of the file holding node and instance information (if not
collecting via RAPI or LUXI). This or one of the other backends must
be selected.

.TP
.BI "-S" filename ", --save-cluster=" filename
If given, the state of the cluster at the end of the allocation is
saved to a file named \fIfilename.alloc\fR, and if tiered allocation
is enabled, the state after tiered allocation will be saved to
\fIfilename.tiered\fR. This allows re-feeding the cluster state to
either hspace itself (with different parameters) or for example hbal.

.TP
.BI "-m" cluster
Collect data directly from the
.I cluster
given as an argument via RAPI. If the argument doesn't contain a colon
(:), then it is converted into a fully\(hybuilt URL via prepending
https:// and appending the default RAPI port, otherwise it's
considered a fully\(hyspecified URL and is used as\(hyis.

.TP
.BI "-L[" path "]"
Collect data directly from the master daemon, which is to be contacted
via luxi (an internal Ganeti protocol). An optional \fIpath\fR
argument is interpreted as the path to the unix socket on which the
master daemon listens; otherwise, the default path used by ganeti when
installed with \fI--localstatedir=/var\fR is used.

.TP
.BI "--simulate " description
Instead of using actual data, build an empty cluster given a node
description. The \fIdescription\fR parameter must be a
comma\(hyseparated list of four elements, describing in order:

.RS

.RS
.TP
the number of nodes in the cluster

.TP
the disk size of the nodes, in mebibytes

.TP
the memory size of the nodes, in mebibytes

.TP
the cpu core count for the nodes

.RE

An example description would be \fB20,102400,16384,4\fR describing a
20\(hynode cluster where each node has 100GiB of disk space, 16GiB of
memory and 4 CPU cores. Note that currently all nodes must have the
same specs.

.RE

.TP
.BI "--tiered-alloc " spec
Besides the standard, fixed\(hysize allocation, also do a tiered
allocation scheme where the algorithm starts from the given
specification and allocates until there is no more space; then it
decreases the specification and tries the allocation again. The
decrease is done on the metric that last failed during allocation. The
specification given is similar to the \fI--simulate\fR option and it
holds:

.RS

.RS

.TP
the disk size of the instance

.TP
the memory size of the instance

.TP
the vcpu count for the instance

.RE

An example description would be \fB10240,8192,2\fR describing a
starting specification of 10GiB of disk space, 8GiB of memory
and 2 VCPUs.

Also note that the normal allocation and the tiered allocation are
independent, and both start from the initial cluster state; as such,
the instance counts for these two modes are not related to one another.

.RE

.TP
.B -v, --verbose
Increase the output verbosity. Each usage of this option will increase
the verbosity (currently more than 2 doesn't make sense) from the
default of one. At verbosity 2 the location of the new instances is
shown in the standard error.

.TP
.B -q, --quiet
Decrease the output verbosity. Each usage of this option will decrease
the verbosity (less than zero doesn't make sense) from the default of
one.

.TP
.B -V, --version
Just show the program version and exit.
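.PP
The arithmetic behind the \fI--simulate\fR description and the
\fI--max-cpu\fR and \fI--min-disk\fR ratios can be worked through in a
short Python sketch. It only reproduces the examples given in the
option descriptions above; the function names are hypothetical, and
rounding the virtual cpu limit down to an integer is an assumption,
not a statement about how hspace rounds internally.
.PP
.nf
```python
def parse_simulate(description):
    """Split 'nodes,disk,memory,cpus' (sizes in mebibytes) into a dict."""
    nodes, disk, memory, cpus = (int(field) for field in description.split(","))
    return {"nodes": nodes, "disk": disk, "memory": memory, "cpus": cpus}


def max_vcpus(physical_cpus, cpu_ratio):
    """Virtual cpus allowed for primary instances on one node (rounded down)."""
    return int(physical_cpus * cpu_ratio)


def min_free_disk(total_disk, disk_ratio):
    """Disk, in the same unit as total_disk, that must stay free on one node."""
    return total_disk * disk_ratio


# The example description from the --simulate option.
node = parse_simulate("20,102400,16384,4")
print(node["nodes"], "nodes with", node["disk"] // 1024, "GiB disk,",
      node["memory"] // 1024, "GiB memory and", node["cpus"], "cpus each")

# The examples from --max-cpu (2.5 on a 4-cpu node) and --min-disk (0.25).
print(max_vcpus(node["cpus"], 2.5))
print(min_free_disk(node["disk"], 0.25))
```
.fi
.PP
With these inputs the sketch gives a limit of 10 virtual cpus per node
and 25600 mebibytes of disk kept free per node.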

.SH EXIT STATUS

The exit status of the command will be zero, unless for some reason
the algorithm fatally failed (e.g. wrong node or instance data).
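.PP
A caller may want to combine the exit status with the OK key from the
output before trusting any values. The following Python sketch shows
one way to do that. The function name is hypothetical, the sample
output is made up, and the check assumes the marker is printed
unquoted as HTS_OK=1.
.PP
.nf
```python
def result_is_usable(exit_status, stdout_text):
    """True only if the exit status is zero and the HTS_OK=1 marker is present."""
    has_ok = any(line.strip() == "HTS_OK=1" for line in stdout_text.splitlines())
    return exit_status == 0 and has_ok


# Hypothetical captured output, for illustration only.
SAMPLE_STDOUT = """HTS_FIN_INST_CNT=35
HTS_OK=1
"""

print(result_is_usable(0, SAMPLE_STDOUT))
print(result_is_usable(1, SAMPLE_STDOUT))
print(result_is_usable(0, "HTS_FIN_INST_CNT=35"))
```
.fi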

.SH BUGS

The algorithm is highly dependent on the number of nodes; its runtime
grows exponentially with this number, and as such is impractical for
really big clusters.

The algorithm doesn't rebalance the cluster or try to get the optimal
fit; it just allocates in the best place for the current step, without
taking into consideration the impact on future placements.

.SH ENVIRONMENT

If the variables \fBHTOOLS_NODES\fR and \fBHTOOLS_INSTANCES\fR are
present in the environment, they will override the default names for
the nodes and instances files. These will of course have no effect
when the RAPI or Luxi backends are used.

.SH SEE ALSO
.BR hbal "(1), " hscan "(1), " ganeti "(7), " gnt-instance "(8), "
.BR gnt-node "(8)"

.SH "COPYRIGHT"
.PP
Copyright (C) 2009 Google Inc. Permission is granted to copy,
distribute and/or modify under the terms of the GNU General Public
License as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
.PP
On Debian systems, the complete text of the GNU General Public License
can be found in /usr/share/common-licenses/GPL.