.TH HSPACE 1 2009-06-01 htools "Ganeti H-tools"
.SH NAME
hspace \- Cluster space analyzer for Ganeti

.SH SYNOPSIS
.B hspace
.B "[backend options...]"
.B "[algorithm options...]"
.B "[request options...]"
.BI "[ -p[" fields "] ]"
.B "[-v... | -q]"

.B hspace
.B --version

.TP
Backend options:
.BI " -m " cluster
|
.BI " -L[" path "]"
|
.BI " -t " data-file
|
.BI " --simulate " spec

.TP
Algorithm options:
.BI "[ --max-cpu " cpu-ratio " ]"
.BI "[ --min-disk " disk-ratio " ]"
.BI "[ -O " name... " ]"

.TP
Request options:
.BI "[--memory " mem "]"
.BI "[--disk " disk "]"
.BI "[--req-nodes " req-nodes "]"
.BI "[--vcpus " vcpus "]"
.BI "[--tiered-alloc " spec "]"


.SH DESCRIPTION
hspace computes how many additional instances can be fit on a cluster,
while maintaining N+1 status.

The program will try to place instances, all of the same size, on the
cluster, until no further N+1 compliant allocation is possible. It uses
the exact same allocation algorithm as the hail iallocator plugin.

The output of the program is designed to be interpreted as a shell
fragment (or parsed as a \fIkey=value\fR file). Options which extend
the output (e.g. \-p, \-v) will output the additional information on
stderr (such that the stdout is still parseable).
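.PP
As a rough illustration of the \fIkey=value\fR parsing mentioned above,
the following Python sketch reads such output into a dictionary. The
function name and the sample text are hypothetical (the sample values
are invented, not real hspace output), and the sketch assumes that any
quoting in the values follows shell rules.
.PP
.nf
```python
import shlex


def parse_hspace_output(text):
    """Parse key=value lines (as printed on stdout) into a dict of strings."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not key=value
        key, _, raw = line.partition("=")
        # Assumption: values may be shell-quoted; shlex removes the quoting.
        result[key] = " ".join(shlex.split(raw))
    return result


# Hypothetical sample; the values are made up for illustration only.
sample = """HTS_SPEC_MEM=4096
HTS_ALLOC_COUNT=12
HTS_OK=1
"""
values = parse_hspace_output(sample)
```
.fi
.PP
All values are kept as strings; a caller would convert the numeric
ones as needed.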
.PP
The following keys are available in the output of the script (all
prefixed with \fIHTS_\fR):
.TP
.I SPEC_MEM, SPEC_DSK, SPEC_CPU, SPEC_RQN
These represent the specifications of the instance model used for
allocation (the memory, disk, cpu, requested nodes).

.TP
.I CLUSTER_MEM, CLUSTER_DSK, CLUSTER_CPU, CLUSTER_NODES
These represent the total memory, disk, CPU count and total nodes in
the cluster.

.TP
.I INI_SCORE, FIN_SCORE
These are the initial (current) and final cluster score (see the hbal
man page for details about the scoring algorithm).

.TP
.I INI_INST_CNT, FIN_INST_CNT
The initial and final instance count.

.TP
.I INI_MEM_FREE, FIN_MEM_FREE
The initial and final total free memory in the cluster (which is not
necessarily all available for use).

.TP
.I INI_MEM_AVAIL, FIN_MEM_AVAIL
The initial and final total available memory for allocation in the
cluster. If allocating redundant instances, new instances could
increase the reserved memory, so it doesn't necessarily mean the
entirety of this memory can be used for new instance allocations.

.TP
.I INI_MEM_RESVD, FIN_MEM_RESVD
The initial and final reserved memory (for redundancy/N+1 purposes).

.TP
.I INI_MEM_INST, FIN_MEM_INST
The initial and final memory used for instances (actual RAM used at
runtime).

.TP
.I INI_MEM_OVERHEAD, FIN_MEM_OVERHEAD
The initial and final memory overhead, i.e. memory used for the node
itself and unaccounted memory (e.g. due to hypervisor overhead).

.TP
.I INI_MEM_EFF, FIN_MEM_EFF
The initial and final memory efficiency, represented as instance
memory divided by total memory.

.TP
.I INI_DSK_FREE, INI_DSK_AVAIL, INI_DSK_RESVD, INI_DSK_INST, INI_DSK_EFF
Initial disk stats, similar to the memory ones.

.TP
.I FIN_DSK_FREE, FIN_DSK_AVAIL, FIN_DSK_RESVD, FIN_DSK_INST, FIN_DSK_EFF
Final disk stats, similar to the memory ones.

.TP
.I INI_CPU_INST, FIN_CPU_INST
Initial and final number of virtual CPUs used by instances.

.TP
.I INI_CPU_EFF, FIN_CPU_EFF
The initial and final CPU efficiency, represented as the count of
virtual instance CPUs divided by the total physical CPU count.

.TP
.I INI_MNODE_MEM_AVAIL, FIN_MNODE_MEM_AVAIL
The initial and final maximum per\(hynode available memory. This is not
very useful as a metric but can give an impression of the status of
the nodes; as an example, this value restricts the maximum instance
size that can still be created on the cluster.

.TP
.I INI_MNODE_DSK_AVAIL, FIN_MNODE_DSK_AVAIL
Like the above but for disk.

.TP
.I TSPEC
If the tiered allocation mode has been enabled, this parameter holds
the pairs of specifications and counts of instances that can be
created in this mode. The value of the key is a space\(hyseparated list
of values; each value is of the form \fImemory,disk,vcpu=count\fR
where the memory, disk and vcpu are the values for the current spec,
and count is how many instances of this spec can be created. A
complete value for this variable could be: \fB4096,102400,2=225
2560,102400,2=20 512,102400,2=21\fR.
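.IP
The \fImemory,disk,vcpu=count\fR format can be split apart with a few
lines of Python. This is only a sketch of one way to consume the
value; the function name is hypothetical and the input is the example
value quoted above.
.IP
.nf
```python
def parse_tspec(value):
    """Split a TSPEC value into (memory, disk, vcpu, count) tuples."""
    entries = []
    for item in value.split():
        spec, _, count = item.partition("=")
        memory, disk, vcpu = (int(field) for field in spec.split(","))
        entries.append((memory, disk, vcpu, int(count)))
    return entries


# The example value from the TSPEC description.
tiers = parse_tspec("4096,102400,2=225 2560,102400,2=20 512,102400,2=21")

# Total number of instances that fit across all tiers.
total_instances = sum(count for _, _, _, count in tiers)
```
.fi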
.TP
.I KM_USED_CPU, KM_USED_MEM, KM_USED_DSK
These represent the metrics of used resources at the start of the
computation (only for tiered allocation mode).

.TP
.I KM_POOL_CPU, KM_POOL_MEM, KM_POOL_DSK
These represent the total resources allocated during the tiered
allocation process. In effect, they represent how much is readily
available for allocation.

.TP
.I KM_UNAV_CPU, KM_UNAV_MEM, KM_UNAV_DSK
These represent the resources left over (either free as in
unallocable or allocable on their own) after the tiered allocation has
been completed. They better represent the resources that are actually
unallocable, because some other resource has been exhausted. For
example, the cluster might still have 100GiB disk free, but with no
memory left for instances, we cannot allocate another instance, so in
effect the disk space is unallocable. Note that the CPUs here
represent instance virtual CPUs, and in case the \fI--max-cpu\fR
option hasn't been specified this will be -1.

.TP
.I ALLOC_USAGE
The current usage, represented as the initial number of instances
divided by the final number of instances.

.TP
.I ALLOC_COUNT
The number of instances allocated (delta between FIN_INST_CNT and
INI_INST_CNT).

.TP
.I ALLOC_FAIL*_CNT
For the last attempt at allocation (which would have increased
FIN_INST_CNT by one, if it had succeeded), this is the count of the
failure reasons per failure type; currently defined are FAILMEM,
FAILDISK and FAILCPU, which represent errors due to not enough memory,
disk and CPUs, and FAILN1, which represents a non N+1 compliant cluster
on which we can't allocate instances at all.

.TP
.I ALLOC_FAIL_REASON
The reason for most of the failures, being one of the above FAIL*
strings.

.TP
.I OK
A marker representing the successful end of the computation, and
having value "1". If this key is not present in the output it means
that the computation failed and any values present should not be
relied upon.

.PP

If the tiered allocation mode is enabled, then many of the INI_/FIN_
metrics will be also displayed with a TRL_ prefix, and denote the
cluster status at the end of the tiered allocation run.
.SH OPTIONS
The options that can be passed to the program are as follows:

.TP
.BI "--memory " mem
The memory size of the instances to be placed (defaults to 4GiB).

.TP
.BI "--disk " disk
The disk size of the instances to be placed (defaults to 100GiB).

.TP
.BI "--req-nodes " num-nodes
The number of nodes for the instances; the default of two means
mirrored instances, while passing one means plain type instances.

.TP
.BI "--vcpus " vcpus
The number of VCPUs of the instances to be placed (defaults to 1).

.TP
.BI "--max-cpu " cpu-ratio
The maximum virtual\(hyto\(hyphysical cpu ratio, as a floating point
number. For example, specifying \fIcpu-ratio\fR as \fB2.5\fR means
that, for a 4\(hycpu machine, a maximum of 10 virtual cpus should be
allowed to be in use for primary instances.

.TP
.BI "--min-disk " disk-ratio
The minimum amount of free disk space remaining, as a floating point
number between zero and one. For example, specifying \fIdisk-ratio\fR
as \fB0.25\fR means that at least one quarter of disk space should be
left free on nodes. A value of one doesn't make sense though, as that
means no disk space can be used on the node.
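.IP
The arithmetic behind the two ratio examples (2.5 for the cpu ratio,
0.25 for the disk ratio) can be written out as below. This is only an
illustration of the numbers, not the check hspace itself performs; the
function names and the 102400 MiB node disk size are assumptions made
for the example.
.IP
.nf
```python
def max_vcpus(physical_cpus, cpu_ratio):
    """Upper bound on virtual CPUs in use for primary instances on a node."""
    return physical_cpus * cpu_ratio


def min_free_disk(total_disk, disk_ratio):
    """Amount of disk that has to stay free on a node."""
    return total_disk * disk_ratio


# A 4-cpu machine with a cpu ratio of 2.5 allows 10 virtual cpus.
vcpu_limit = max_vcpus(4, 2.5)

# Hypothetical node with 102400 MiB of disk and a disk ratio of 0.25:
# one quarter of the disk has to remain free.
disk_floor = min_free_disk(102400, 0.25)
```
.fi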
.TP
.B -p, --print-nodes
Prints the before and after node status, in a format designed to allow
the user to understand the node's most important parameters.

It is possible to customise the listed information by passing a
comma\(hyseparated list of field names to this option (the field list is
currently undocumented). By default, the node list will contain the
following information:
.RS
.TP
.B F
a character denoting the status of the node, with '\-' meaning an
offline node, '*' meaning N+1 failure and blank meaning a good node
.TP
.B Name
the node name
.TP
.B t_mem
the total node memory
.TP
.B n_mem
the memory used by the node itself
.TP
.B i_mem
the memory used by instances
.TP
.B x_mem
amount of memory which seems to be in use but cannot be determined why
or by which instance; usually this means that the hypervisor has some
overhead or that there are other reporting errors
.TP
.B f_mem
the free node memory
.TP
.B r_mem
the reserved node memory, which is the amount of free memory needed
for N+1 compliance
.TP
.B t_dsk
total disk
.TP
.B f_dsk
free disk
.TP
.B pcpu
the number of physical cpus on the node
.TP
.B vcpu
the number of virtual cpus allocated to primary instances
.TP
.B pcnt
number of primary instances
.TP
.B scnt
number of secondary instances
.TP
.B p_fmem
percent of free memory
.TP
.B p_fdsk
percent of free disk
.TP
.B r_cpu
ratio of virtual to physical cpus
.TP
.B lCpu
the dynamic CPU load (if the information is available)
.TP
.B lMem
the dynamic memory load (if the information is available)
.TP
.B lDsk
the dynamic disk load (if the information is available)
.TP
.B lNet
the dynamic net load (if the information is available)
.RE

.TP
.BI "-O " name
This option (which can be given multiple times) will mark nodes as
being \fIoffline\fR, and instances won't be placed on these nodes.

Note that hspace will also mark as offline any nodes which are
reported by RAPI as such, or that have "?" in file\(hybased input in any
numeric fields.
.TP
.BI "-t " datafile ", --text-data=" datafile
The name of the file holding node and instance information (if not
collecting via RAPI or LUXI). This or one of the other backends must
be selected.

.TP
.BI "-m " cluster
Collect data directly from the
.I cluster
given as an argument via RAPI. If the argument doesn't contain a colon
(:), then it is converted into a fully\(hybuilt URL via prepending
https:// and appending the default RAPI port, otherwise it's
considered a fully\(hyspecified URL and is used as\(hyis.
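.IP
The conversion rule can be sketched in Python as follows. The
function name is hypothetical, and the port number 5080 is an
assumption (it is the port Ganeti's RAPI daemon conventionally
listens on); this page itself only says "the default RAPI port".
.IP
.nf
```python
# Assumption: 5080 is the default RAPI port; adjust if yours differs.
DEFAULT_RAPI_PORT = 5080


def rapi_url(cluster):
    """Expand a bare cluster name into a URL; pass full URLs through."""
    if ":" in cluster:
        # Contains a colon: treated as a fully specified URL, used as-is.
        return cluster
    return "https://%s:%d" % (cluster, DEFAULT_RAPI_PORT)


bare = rapi_url("cluster.example.com")
full = rapi_url("https://cluster.example.com:1234")
```
.fi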
.TP
.BI "-L[" path "]"
Collect data directly from the master daemon, which is to be contacted
via luxi (an internal Ganeti protocol). An optional \fIpath\fR
argument is interpreted as the path to the unix socket on which the
master daemon listens; otherwise, the default path used by ganeti when
installed with \fI--localstatedir=/var\fR is used.

.TP
.BI "--simulate " description
Instead of using actual data, build an empty cluster given a node
description. The \fIdescription\fR parameter must be a
comma\(hyseparated list of four elements, describing in order:

.RS

.RS
.TP
the number of nodes in the cluster

.TP
the disk size of the nodes, in mebibytes

.TP
the memory size of the nodes, in mebibytes

.TP
the cpu core count for the nodes

.RE

An example description would be \fB20,102400,16384,4\fR describing a
20\(hynode cluster where each node has 100GiB of disk space, 16GiB of
memory and 4 CPU cores. Note that all nodes must currently have the
same specs.

.RE
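.IP
To make the field order concrete, the following Python sketch splits
a description into named fields and derives the cluster totals for the
example \fB20,102400,16384,4\fR. The type and function names are
hypothetical; hspace's own parsing may differ in its details.
.IP
.nf
```python
from collections import namedtuple

# Field order as listed above: nodes, disk (MiB), memory (MiB), cpu cores.
SimSpec = namedtuple("SimSpec", "nodes disk_mib mem_mib cpus")


def parse_simulate(description):
    """Parse a comma-separated four-element node description."""
    fields = description.split(",")
    if len(fields) != 4:
        raise ValueError("expected four comma-separated elements")
    return SimSpec(*(int(field) for field in fields))


spec = parse_simulate("20,102400,16384,4")

# Cluster-wide totals, converted from MiB to GiB.
total_disk_gib = spec.nodes * spec.disk_mib / 1024
total_mem_gib = spec.nodes * spec.mem_mib / 1024
total_cpus = spec.nodes * spec.cpus
```
.fi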
.TP
.BI "--tiered-alloc " spec
Besides the standard, fixed\(hysize allocation, also do a tiered
allocation scheme where the algorithm starts from the given
specification and allocates until there is no more space; then it
decreases the specification and tries the allocation again. The
decrease is done on the metric that last failed during allocation. The
specification given is similar to the \fI--simulate\fR option and it
holds:

.RS

.RS

.TP
the disk size of the instance

.TP
the memory size of the instance

.TP
the vcpu count for the instance

.RE

An example description would be \fB10240,8192,2\fR describing an
initial starting specification of 10GiB of disk space, 8GiB of memory
and 2 VCPUs.

Also note that the normal allocation and the tiered allocation are
independent, and both start from the initial cluster state; as such,
the instance counts for these two modes are not related to one another.

.RE

.TP
.B -v, --verbose
Increase the output verbosity. Each usage of this option will increase
the verbosity (currently more than 2 doesn't make sense) from the
default of one. At verbosity 2 the location of the new instances is
shown on standard error.

.TP
.B -q, --quiet
Decrease the output verbosity. Each usage of this option will decrease
the verbosity (less than zero doesn't make sense) from the default of
one.

.TP
.B -V, --version
Just show the program version and exit.
.SH EXIT STATUS

The exit status of the command will be zero, unless for some reason
the algorithm fatally failed (e.g. wrong node or instance data).

.SH BUGS

The algorithm is highly dependent on the number of nodes; its runtime
grows exponentially with this number, and as such is impractical for
really big clusters.

The algorithm doesn't rebalance the cluster or try to get the optimal
fit; it just allocates in the best place for the current step, without
taking into consideration the impact on future placements.

.SH ENVIRONMENT

If the variables \fBHTOOLS_NODES\fR and \fBHTOOLS_INSTANCES\fR are
present in the environment, they will override the default names for
the nodes and instances files. These will of course have no effect
when the RAPI or Luxi backends are used.

.SH SEE ALSO
.BR hbal "(1), " hscan "(1), " ganeti "(7), " gnt-instance "(8), "
.BR gnt-node "(8)"

.SH "COPYRIGHT"
.PP
Copyright (C) 2009 Google Inc. Permission is granted to copy,
distribute and/or modify under the terms of the GNU General Public
License as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
.PP
On Debian systems, the complete text of the GNU General Public License
can be found in /usr/share/common-licenses/GPL.