Revision cc6d2673

/dev/null
Ganeti-htools release notes
===========================

  
Version 0.3.1 (Fri, 11 Mar 2011)
--------------------------------

Minor bugfix release:

- Fixed source archive generation: the hscolour.css file was an invalid
  symlink, and the man pages were not correctly timestamped (leading to
  unneeded build-time rebuilds)
- Improved the Luxi backend to show which attribute fails parsing
- Small improvements to the man pages, and also ship the HTML version of
  man pages in the source archive

  
Version 0.3.0 (Fri, 04 Feb 2011)
--------------------------------

A significant release that breaks compatibility with Ganeti versions
below 2.4 due to the node group changes. Only the RAPI backend can talk
to older clusters, but it is recommended to use this version only with
Ganeti 2.4.

All commands are now multi-group aware (but to various degrees), so
allocation, balancing and capacity calculation respect the group layout
and will not create “broken” instances by using nodes from different
groups.

For a regular, single-group cluster, no changes should be directly
visible to the users. A multi-group cluster however will change some
things slightly:

- hbal will require a target group to operate on (no cluster-wide
  balancing yet)
- evacuation of (DRBD) instances from a node will be restricted to nodes
  in the same group, as inter-group moves are not implemented yet
- capacity, while showing correct data, will not give per-group details
  yet

There are other changes in this release:

- fixed a long-standing bug in hscan related to node memory data
- changed the text backend format, which unfortunately invalidates old
  files
- error handling improvements, so that invalid input data reports better
  where the error is
- the simulation backend changed its syntax: it now takes the allocation
  policy too, and can generate multiple groups
- (internal) man page generation moved to pandoc from hand-written,
  which is helpful as it can also generate HTML versions
- the balancing algorithm has been changed to work in parallel, if the
  code is linked against the multi-threaded runtime; this gives a very
  good speedup (~80% on 4 cores, ~60-70% on 12 cores)

  
Version 0.2.8 (Thu, 23 Dec 2010)
--------------------------------

A bug fix release:

- fixed balancing function for big clusters, which will improve corner
  cases where hbal didn't see any solution even though the cluster was
  obviously not well balanced
- fixed exit code of hbal in case of (Luxi) job errors
- changed the signal handling in hbal in order to make hbal control
  easier: instead of synchronising on the count of signals, make SIGINT
  cause graceful termination, and SIGTERM an immediate one
- increased the tag exclusion weight so that it has greater importance
  during the balancing
- slight improvement to the speed of balancing via algorithm tweaks

  
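The signal handling described for 0.2.8 (SIGINT stops gracefully,
SIGTERM stops immediately) can be pictured with a small sketch. This is
not hbal's code; it is a Python illustration only, and the ``JobRunner``
class, its method names and the exit code are hypothetical.

```python
import signal
import sys


class JobRunner:
    """Toy job loop: SIGINT requests a graceful stop, SIGTERM exits at once."""

    def __init__(self, jobs):
        self.jobs = list(jobs)
        self.done = []
        self.stop_requested = False

    def request_stop(self, signum=None, frame=None):
        # Graceful: only set a flag; the loop checks it between jobs.
        self.stop_requested = True

    def terminate_now(self, signum=None, frame=None):
        # Immediate: leave without waiting (exit code 1 is an assumption).
        sys.exit(1)

    def install_handlers(self):
        signal.signal(signal.SIGINT, self.request_stop)
        signal.signal(signal.SIGTERM, self.terminate_now)

    def run(self):
        for job in self.jobs:
            if self.stop_requested:
                break
            job(self)  # the job that is already running always completes
            self.done.append(job.__name__)
        return self.done
```

The handler for SIGINT never interrupts work in progress; it only
prevents the next job from starting, which is the behaviour the entry
calls graceful termination.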

  
Version 0.2.7 (Thu, 07 Oct 2010)
--------------------------------

Bug fixes:

- fixed the error message for hail multi-evacuation mode
- improve evacuation mode for offline secondary nodes (ignore available
  memory)

New features:

- add a new option ``-S`` to hbal and hspace that saves the cluster
  state at the end of the processing in the text format used by the
  ``-t`` option, for later re-processing
- two new options to hbal, -g and --min-gain-limit, that should help
  in limiting the number of balancing steps with a low gain in the final
  stages
- hbal, when executing jobs, will now wait for the current jobs to
  finish at the first stop (e.g. ^C); if the user wants immediate exit,
  another signal should be sent
- added “normalized” physical CPU units in hspace output (NPU), which
  represents units of physical CPUs free/used, based on the max-cpu
  ratio

  
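As a rough illustration of the NPU idea from the 0.2.7 notes, the sketch
below converts allocated virtual CPUs into physical-CPU units using the
max-cpu ratio. The exact formula hspace uses is not given in these
notes, so the calculation, the function name and the parameter names are
all assumptions.

```python
def normalized_cpu_units(physical_cpus, allocated_vcpus, max_cpu_ratio):
    """Return (used, free) physical-CPU units under a VCPU-to-PCPU limit.

    Assumption: one VCPU counts as 1/max_cpu_ratio of a physical CPU.
    """
    if max_cpu_ratio <= 0:
        raise ValueError("max_cpu_ratio must be positive")
    used = allocated_vcpus / max_cpu_ratio
    free = physical_cpus - used
    return used, free
```

For example, under this assumption a node with 8 physical CPUs, 16
allocated VCPUs and a ratio of 4 would have 4 units used and 4 free.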

  
Version 0.2.6 (Mon, 26 Jul 2010)
--------------------------------

Exactly three months since the last release. Many internal changes, plus
a couple of important changes in the balancing algorithm.

First, the balancing may now introduce N+1 errors, if this solves other,
more critical problems. For the moment, this means that moving instances
away from offline nodes is allowed even if it creates N+1 errors, and
that means evacuation can be done in more cases.

Second, the scoring for N+1 has changed. In previous versions, it simply
counted the number of failing N+1 nodes, which means moving an instance
away from an N+1 failed node (but without the node 'clearing' the N+1
status) was not reflected in the cluster score. As such, the balancing
algorithm managed to clear N+1 errors only sometimes, since usually it
takes more than one move for this, and the first prerequisite move was
not 'rewarded' appropriately and thus it was not selected. Now, it is
possible to fix many more error cases than before: on a simulated 40
node cluster full of instances (symmetrically allocated on all nodes),
around five nodes can be evacuated before N+1 errors can be solved,
whereas 0.2.5 could evacuate at best one node.

  
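The weakness of a pure count of failing nodes can be shown with a toy
example. The first function mirrors the older behaviour described above;
the second is a hypothetical finer-grained metric (total memory
shortfall) used only to show how a prerequisite move could be rewarded.
The notes do not say which metric replaced the count, so it should not
be read as the actual new scoring.

```python
def failing_node_count(nodes):
    """Older-style metric: number of nodes that fail the N+1 check."""
    return sum(1 for free, needed in nodes if free < needed)


def total_shortfall(nodes):
    """Hypothetical finer metric: memory missing for N+1 across all nodes."""
    return sum(max(0, needed - free) for free, needed in nodes)


# Each node is (free_memory, memory_needed_for_failover), in MiB.
before = [(1024, 3072), (4096, 1024)]
# One instance moved away from the first node: it still fails N+1,
# but it is closer to passing.
after = [(2048, 3072), (3072, 1024)]
```

With these numbers the count stays at one failing node before and after
the move, so a count-based score sees no improvement, while the
shortfall drops from 2048 to 1024.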
There were some other internal changes to the scoring algorithm, such
that now the metrics have associated weights, and they are not all of
the same importance anymore. As of now, the only change is that offline
instances have a higher weight, which should favour proper node
evacuations.

Among the other changes:

- fixed the hspace KM_POOL_* metrics, which were returned as the final
  state and not as the delta between the initial and final states
- fixed hspace handling of N+1 failing clusters: before, it used to
  generate a 'fake' response, and the structure of this response was not
  always in sync with the real responses, leading to missing items;
  currently it proceeds correctly through the code (skipping the
  computation), and uses the same display mechanisms as the normal case
- fixed hscan exit code for RAPI failures: previously it finished with
  success even if all the clusters failed, which was creating issues
  with the live-test script; now it exits with exit code 2 for RAPI
  failures (unfortunately this is still not optimal as LUXI failures
  will use exit code 1, the same as the command line)
- changed the limit values for CPU/disk, which previously were used
  optionally, whereas now they are always used; the default cpu ratio
  limit is now 64 VCPUs per PCPU
- changed the internal handling of the short name vs. original
  (Ganeti-provided) name; now internally we always use the full name,
  and only in display routines we show the shortened (called 'alias')
  name; as a result, the -O and --excluded-instances options now accept
  both the full name and the shortened name
- changed internal handling of JSON conversions and errors, such that
  now we show a better context for failure messages, which should help
  with diagnosing the malformed message
- changed the names for a few node fields, and added some more nodes;
  this is most likely to help with debugging, and not with regular
  operation though
- changed the node fields option to allow the '+' prefix to mean 'extend
  the default fields list' rather than start from fresh (similar to
  Ganeti's implementation)
- a few internal changes related to the LUXI protocol implementation,
  which should make it safer against potential bugs, one
  optimization that should help with large messages, and some patches
  in preparation for potential expansion of the LUXI backend functionality

  
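The '+' prefix for the node fields option, mentioned in the list above,
could be parsed along these lines. This is a sketch only: the default
field names are invented for the example, and ``select_fields`` is a
hypothetical helper, not part of htools.

```python
DEFAULT_FIELDS = ["name", "status", "free_mem"]  # hypothetical defaults


def select_fields(option, defaults=DEFAULT_FIELDS):
    """Parse a comma-separated field option; a leading '+' extends the defaults."""
    if option.startswith("+"):
        extra = [f for f in option[1:].split(",") if f]
        return list(defaults) + extra
    # No prefix: the given list replaces the defaults entirely.
    return [f for f in option.split(",") if f]
```

Under these assumptions ``select_fields("+a,b")`` returns the defaults
followed by ``a`` and ``b``, while ``select_fields("a,b")`` returns only
``a`` and ``b``.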
And finally, many improvements on unittests and the live-test
script. Test coverage is much enhanced, and the test infrastructure has
better error reporting; this should lead down-the-road to better code
and fewer bugs…

  
Version 0.2.5 (Mon, 26 Apr 2010)
--------------------------------

Some internal cleanup plus a few user-visible changes:

- new option for marking instances as 'do-not-move' during rebalancing
- allow ``hscan`` to scan the local cluster via Luxi
- add more metrics to ``hspace`` which show the delta between original
  state and final state better (only valid for tiered allocation)

  
Version 0.2.4 (Mon, 22 Feb 2010)
--------------------------------

Two improvements for node evacuation:

- hbal takes a new parameter ``--evac-mode`` that restricts the
  instances to be moved to the ones on offline/drained nodes, which
  should reduce the work done
- hail supports the new ``multi-evacuate`` mode of the IAllocator
  protocol, that will be released in a minor release on the Ganeti 2.1
  branch

  
Version 0.2.3 (Thu,  4 Feb 2010)
--------------------------------

A small release:

- Fixes selection of secondary node: previously, if the cluster had
  many N+1 failures, an N+1 failed node could be selected as secondary
  even if it did not have enough memory to allow the instance to be
  migrated/failed over to it; this is bad for automated tools, since
  we can get the cluster in an unhealthy state
- Switch the text backend to a single input file, that is generated
  now by hscan and shouldn't be generated manually via
  gnt-node/instance list anymore; this allows richer information to be
  kept in the file, and slightly simplifies the internals of the text
  backend

  
Version 0.2.2 (Tue, 29 Dec 2009)
--------------------------------

Small release, 0.2.1 was broken and thus this was released earlier:

- Release 0.2.1 broke the LUXI backend due to a typo, fixed
- Added a live-test script that should catch errors like the above one
  in the future (needs a working, non-empty cluster)
- Changed RAPI and LUXI backends to treat drained nodes as offline,
  similar to the IAllocator backend change in 0.2.0 (which was wrongly
  marked as affecting all backends)
- Changed the metrics for offline instances and N1 score from percent to
  count, in order to increase the priority of evacuations
- Added a new metric (offline primary instances) which should fix the
  evacuation of an offline node in a 2-node cluster

  
Version 0.2.1 (Wed,  2 Dec 2009)
--------------------------------

- Added instance exclusion defined via instance tags
- Fixed the output of hspace to be again parseable from the shell

  
Version 0.2.0 (Tue, 10 Nov 2009)
--------------------------------

A significant release, with a few new major features:

- Added direct execution of the hbal solution when using the Luxi
  backend; the steps for each instance move are submitted as a single
  job, and the different jobs are submitted as groups in order to
  parallelise the execution of moves
- Added support for balancing based on dynamic utilisation data for
  instances, fed in via a text file; by default, all instances are
  considered equal and this change also improves the equalisation of
  secondary instances per node
- Added support for tiered capacity calculation in hspace, where we
  start from a maximum instance spec and decrease the spec when we run
  out of resources; this should give a better measure of available
  capacity on 'fragmented' clusters; this is done separately from the
  current fixed-mode computation

  
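The tiered capacity idea from the list above (start from a maximum
instance spec and shrink it when resources run out) can be sketched as
follows. This is a deliberately simplified model with a single resource
pool rather than per-node placement, and the function name, the
dictionary layout and the shrinking rule are assumptions made for the
example, not hspace's implementation.

```python
def tiered_capacity(free, max_spec, min_spec, step):
    """Count instances per spec, shrinking the spec as resources run out.

    All arguments are non-empty dicts keyed by resource name (for example
    "mem" and "disk"); every value in `step` must be positive.
    """
    if not max_spec or any(step[r] <= 0 for r in max_spec):
        raise ValueError("need a non-empty spec and positive steps")
    free = dict(free)
    spec = dict(max_spec)
    placed = []  # list of (spec, count) tiers, largest spec first
    while True:
        count = 0
        # Allocate as many instances of the current spec as still fit.
        while all(free[r] >= spec[r] for r in spec):
            for r in spec:
                free[r] -= spec[r]
            count += 1
        if count:
            placed.append((dict(spec), count))
        # Shrink only the resources that no longer fit.
        short = [r for r in spec if free[r] < spec[r]]
        if any(spec[r] - step[r] < min_spec[r] for r in short):
            return placed
        for r in short:
            spec[r] -= step[r]
```

Each returned tier pairs a spec with the number of instances allocated
at that size, which is how a fragmented pool can still report usable
capacity after the largest spec stops fitting.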
Also there have been many minor improvements:

- Added option for showing instances (“--print-instances”), similar to
  the print nodes option
- Added support for customising the node list via an argument to the
  print nodes option in the form of a comma-separated list of field
  names; currently the field names are not documented, expecting further
  changes in a future release
- Enhanced the error reporting in the Luxi and Rapi backends
- Changed the handling of drained nodes, now being treated the same as
  offline nodes, for Ganeti 2.0.4+ compatibility
- A number of internal changes, simplifying code and merging some
  disparate functions
- Simplify the build system in relation to creation of archives

  
Version 0.1.8 (Tue, 29 Sep 2009)
--------------------------------

- Brown-paper-bag release fixing haddock issues

  
Version 0.1.7 (Mon, 28 Sep 2009)
--------------------------------

- Fixed a bug in the Luxi backend for big responses
- Fixed test suite exit code in presence of test failures
- Changed the migrate operation to run failover instead for instances
  which were marked as not running in the input data (this could have
  been changed since then, but it's better than today's always migrate)
- Added support for 'cheap' moves only (only migrate/failover) in
  balancing
- Added support for building without curl (thus no RAPI backend)

  
Version 0.1.6 (Wed, 19 Aug 2009)
--------------------------------

- Added support for Luxi (the native Ganeti protocol)
- Added support for simulated clusters (for hspace only)
- Added timeouts for the RAPI backend
- Fixed a few inconsistencies in the command line handling
- Fixed handling of errors while loading data
- The 'network' is a new dependency due to the Luxi addition

  
Version 0.1.5 (Thu, 09 Jul 2009)
--------------------------------

- Removed obsolete hn1 program; this allowed removal of a lot of
  supporting code
- Lots of changes in hspace: the output now is a shell fragment so that
  scripts can source or parse it more easily; added failure reasons;
  optimised to use less memory for large clusters
- Optimized the scoring algorithm (used by all tools) so that now
  computations should be faster

  
Version 0.1.4 (Tue, 16 Jun 2009)
--------------------------------

- Added CPU count/ratio of virtual-to-physical CPUs to the cluster
  scoring methods; this means that now the balancer, the iallocator
  plugin and so on will try to keep the VCPU-to-PCPU ratio equal across
  the cluster
- Fixed some hscan bugs
- Fixed the way iallocator reads the total disk size (was broken and it
  was always falling back to summing the disk sizes)
- Internals: fixed most compile-time warnings

  
Version 0.1.3 (Fri, 05 Jun 2009)
--------------------------------

- Fix a bug in the ReplacePrimary instance moves, affecting most of the
  tools

  
Version 0.1.2 (Tue, 02 Jun 2009)
--------------------------------

- Add a new program, “hspace”, which computes the free space on a
  cluster (based on a given instance spec)
- Improvements in API docs and partially in the user docs
- Started adding unittests

  
Version 0.1.1 (Tue, 26 May 2009)
--------------------------------

- Add a new program, “hail”, which is an iallocator plugin and can
  allocate/relocate instances
- Experimental support for non-mirrored instances (hail supports them,
  hbal should no longer abort when it finds such instances and simply
  ignore them)
- The RAPI port and/or scheme can be overridden now, and even “file://”
  schemes can be used if the message body has been saved under the
  appropriate name
- Lots of code reorganization, esp. rewritten loading pipeline
- Better data checking and better error messages in case validation
  fails; tools now consider nodes with error in input data (‘?’ returned
  by Ganeti) as offline
- Small enhancement to the makefile for simpler packaging

  
Version 0.1.0 (Tue, 19 May 2009)
--------------------------------

- Drop compatibility with Ganeti 1.2
- Add a new minimum score option (with a very low default), should help
  with very good clusters (but is still not optimal)
- Add a --quiet option to hbal
- Add support for reading offline nodes directly from the cluster

  
Version 0.0.8 (Tue, 21 Apr 2009)
--------------------------------

- hbal: prevent mismatches in wrong node names being passed to -O, by
  aborting in this case
- add the ability to write the commands (-C) to a script via (-C<file>),
  so that it can be later executed directly; this has also changed the
  commands to include the necessary -f flags to skip confirmations
- add checks for extra arguments in hbal and hn1, so that unintended
  errors are caught
- raise the accepted “missing” memory limit to 512MB, to cover usual Xen
  reservations

  
Version 0.0.7 (Mon, 23 Mar 2009)
--------------------------------

- added support for offline nodes, which are not used as targets for
  instance relocation and if they hold instances the hbal algorithm will
  attempt to relocate these away
- added support for offline instances, which now will no longer skew the
  free memory estimation of nodes; the algorithm will no longer create
  conditions for N+1 failures when such instances are later started
- implemented a complete model of node resources, in order to prevent an
  unintended re-occurrence of cases like the offline instance where we
  miscalculate some node resource; this now gives a warning in case the
  node reported free disk or free memory deviates by more than a set
  amount from the expected value
- a new tool *hscan* that can generate the input text-file for the other
  tools by collecting data via RAPI
- some small changes to the build system to make it more friendly; also
  included the generated documentation in the source archive

  
Version 0.0.6 (Mon, 16 Mar 2009)
--------------------------------

- re-factored the hbal algorithm to make it stable in the sense that it
  gives the same solution when restarted from the middle; barring
  rounding of disk/memory and incomplete reporting from Ganeti (for
  1.2), it should be now feasible to rely on its output without
  generating moves ad infinitum
- the hbal algorithm now uses two more variables: the node N+1 failures
  and the amount of reserved memory; the first of which tries to ‘fix’
  the N+1 status, the latter tries to distribute secondaries more
  equally
- the hbal algorithm now uses two more moves at each step:
  replace+failover and failover+replace (besides the original failover,
  replace, and failover+replace+failover)
- slightly changed the build system to embed GIT version/tags into the
  binaries so that we know from which tree a binary was built,
  either via ‘--version’ or via “strings hbal|grep version”
- changed the solution list and in general the hbal output to be
  clearer by default, and changed “gnt-instance failover” to “gnt-instance
  migrate”
- added man pages for the two binaries
  
426

  
427
Version 0.0.5 (Mon, 09 Mar 2009)
428
--------------------------------
429

  
430
- a few small improvements for hbal (possibly undone by later changes),
431
  hbal is now quite faster
432
- fix documentation building
433
- allow hbal to work on non N+1 compliant clusters, but without
434
  guarantees that the end cluster will be compliant; in any case, this
435
  should give a smaller number of nodes that are not compliant if the
436
  cluster state permits it
437
- strip common domain suffix from nodes and instances, so that output is
438
  shorter and hopefully clearer
439

  
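Stripping a common domain suffix, as described in the last item above,
might look like the following sketch. It is an illustration under
assumptions (names are compared label by label from the right, and the
first label is always kept); ``strip_common_domain`` is a hypothetical
helper, not the htools implementation.

```python
def strip_common_domain(names):
    """Remove the longest dot-separated suffix shared by all names."""
    if len(names) < 2:
        return list(names)
    split = [n.split(".") for n in names]
    common = 0
    # Compare labels from the right; always keep at least the first label.
    limit = min(len(parts) for parts in split) - 1
    while common < limit and len({parts[-1 - common] for parts in split}) == 1:
        common += 1
    return [".".join(parts[:len(parts) - common]) for parts in split]
```

With this sketch, ``node1.example.com`` and ``node2.example.com`` would
be shown as ``node1`` and ``node2``, while names that share no suffix
are left unchanged.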

  
Version 0.0.4 (Sun, 15 Feb 2009)
--------------------------------

- better balancing algorithm in hbal
- implemented an RAPI collector, now the cluster data can be gathered
  automatically via RAPI and doesn't need manual export of node and
  instance list

  
Version 0.0.3 (Wed, 28 Jan 2009)
--------------------------------

- initial release of hbal, a cluster rebalancing tool
- input data format changed due to hbal requirements

  
Version 0.0.2 (Tue, 06 Jan 2009)
--------------------------------

- fix handling of some common cases (cluster N+1 compliant from the
  start, too big depth given, failure to compute solution)
- add option to print the needed command list for reaching the proposed
  solution

  
Version 0.0.1 (Tue, 06 Jan 2009)
--------------------------------

- initial release of hn1 tool

  
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
b/htools/OLD-NEWS
Ganeti-htools release notes
===========================

  
**Note**: After version 0.3.1, the htools sources have been integrated
into the Ganeti core repository, and are released together with the
Ganeti releases. Thus this NEWS file is obsolete.

  
Version 0.3.1 (Fri, 11 Mar 2011)
--------------------------------

Minor bugfix release:

- Fixed source archive generation: the hscolour.css file was an invalid
  symlink, and the man pages were not correctly timestamped (leading to
  unneeded build-time rebuilds)
- Improved the Luxi backend to show which attribute fails parsing
- Small improvements to the man pages, and also ship the HTML version of
  man pages in the source archive

  
Version 0.3.0 (Fri, 04 Feb 2011)
--------------------------------

A significant release that breaks compatibility with Ganeti versions
below 2.4 due to the node group changes. Only the RAPI backend can talk
to older clusters, but it is recommended to use this version only with
Ganeti 2.4.

All commands are now multi-group aware (but to various degrees), so
allocation, balancing and capacity calculation respect the group layout
and will not create “broken” instances by using nodes from different
groups.

For a regular, single-group cluster, no changes should be directly
visible to the users. A multi-group cluster however will change some
things slightly:

- hbal will require a target group to operate on (no cluster-wide
  balancing yet)
- evacuation of (DRBD) instances from a node will be restricted to nodes
  in the same group, as inter-group moves are not implemented yet
- capacity, while showing correct data, will not give per-group details
  yet

There are other changes in this release:

- fixed a long-standing bug in hscan related to node memory data
- changed the text backend format, which unfortunately invalidates old
  files
- error handling improvements, so that invalid input data reports better
  where the error is
- the simulation backend changed its syntax: it now takes the allocation
  policy too, and can generate multiple groups
- (internal) man page generation moved to pandoc from hand-written,
  which is helpful as it can also generate HTML versions
- the balancing algorithm has been changed to work in parallel, if the
  code is linked against the multi-threaded runtime; this gives a very
  good speedup (~80% on 4 cores, ~60-70% on 12 cores)

  
Version 0.2.8 (Thu, 23 Dec 2010)
--------------------------------

A bug fix release:

- fixed balancing function for big clusters, which will improve corner
  cases where hbal didn't see any solution even though the cluster was
  obviously not well balanced
- fixed exit code of hbal in case of (Luxi) job errors
- changed the signal handling in hbal in order to make hbal control
  easier: instead of synchronising on the count of signals, make SIGINT
  cause graceful termination, and SIGTERM an immediate one
- increased the tag exclusion weight so that it has greater importance
  during the balancing
- slight improvement to the speed of balancing via algorithm tweaks

  
Version 0.2.7 (Thu, 07 Oct 2010)
--------------------------------

Bug fixes:

- fixed the error message for hail multi-evacuation mode
- improve evacuation mode for offline secondary nodes (ignore available
  memory)

New features:

- add a new option ``-S`` to hbal and hspace that saves the cluster
  state at the end of the processing in the text format used by the
  ``-t`` option, for later re-processing
- two new options to hbal, -g and --min-gain-limit, that should help
  in limiting the number of balancing steps with a low gain in the final
  stages
- hbal, when executing jobs, will now wait for the current jobs to
  finish at the first stop (e.g. ^C); if the user wants immediate exit,
  another signal should be sent
- added “normalized” physical CPU units in hspace output (NPU), which
  represents units of physical CPUs free/used, based on the max-cpu
  ratio

  
Version 0.2.6 (Mon, 26 Jul 2010)
--------------------------------

Exactly three months since the last release. Many internal changes, plus
a couple of important changes in the balancing algorithm.

First, the balancing may now introduce N+1 errors, if this solves other,
more critical problems. For the moment, this means that moving instances
away from offline nodes is allowed even if it creates N+1 errors, and
that means evacuation can be done in more cases.

Second, the scoring for N+1 has changed. In previous versions, it simply
counted the number of failing N+1 nodes, which means moving an instance
away from an N+1 failed node (but without the node 'clearing' the N+1
status) was not reflected in the cluster score. As such, the balancing
algorithm managed to clear N+1 errors only sometimes, since usually it
takes more than one move for this, and the first prerequisite move was
not 'rewarded' appropriately and thus it was not selected. Now, it is
possible to fix many more error cases than before: on a simulated 40
node cluster full of instances (symmetrically allocated on all nodes),
around five nodes can be evacuated before N+1 errors can be solved,
whereas 0.2.5 could evacuate at best one node.

There were some other internal changes to the scoring algorithm, such
that now the metrics have associated weights, and they are not all of
the same importance anymore. As of now, the only change is that offline
instances have a higher weight, which should favour proper node
evacuations.

Among the other changes:

- fixed the hspace KM_POOL_* metrics, which were returned as the final
  state and not as the delta between the initial and final states
- fixed hspace handling of N+1 failing clusters: before, it used to
  generate a 'fake' response, and the structure of this response was not
  always in sync with the real responses, leading to missing items;
  currently it proceeds correctly through the code (skipping the
  computation), and uses the same display mechanisms as the normal case
- fixed hscan exit code for RAPI failures: previously it finished with
  success even if all the clusters failed, which was creating issues
  with the live-test script; now it exits with exit code 2 for RAPI
  failures (unfortunately this is still not optimal as LUXI failures
  will use exit code 1, the same as the command line)
- changed the limit values for CPU/disk, which previously were used
  optionally, whereas now they are always used; the default cpu ratio
  limit is now 64 VCPUs per PCPU
- changed the internal handling of the short name vs. original
  (Ganeti-provided) name; now internally we always use the full name,
  and only in display routines we show the shortened (called 'alias')
  name; as a result, the -O and --excluded-instances options now accept
  both the full name and the shortened name
- changed internal handling of JSON conversions and errors, such that
  now we show a better context for failure messages, which should help
  with diagnosing the malformed message
- changed the names for a few node fields, and added some more nodes;
  this is most likely to help with debugging, and not with regular
  operation though
- changed the node fields option to allow the '+' prefix to mean 'extend
  the default fields list' rather than start from fresh (similar to
  Ganeti's implementation)
- a few internal changes related to the LUXI protocol implementation,
  which should make it safer against potential bugs, one
  optimization that should help with large messages, and some patches
  in preparation for potential expansion of the LUXI backend functionality

And finally, many improvements on unittests and the live-test
script. Test coverage is much enhanced, and the test infrastructure has
better error reporting; this should lead down-the-road to better code
and fewer bugs…

  
Version 0.2.5 (Mon, 26 Apr 2010)
--------------------------------

Some internal cleanup plus a few user-visible changes:

- new option for marking instances as 'do-not-move' during rebalancing
- allow ``hscan`` to scan the local cluster via Luxi
- add more metrics to ``hspace`` which show the delta between original
  state and final state better (only valid for tiered allocation)

  
Version 0.2.4 (Mon, 22 Feb 2010)
--------------------------------

Two improvements for node evacuation:

- hbal takes a new parameter ``--evac-mode`` that restricts the
  instances to be moved to the ones on offline/drained nodes, which
  should reduce the work done
- hail supports the new ``multi-evacuate`` mode of the IAllocator
  protocol, that will be released in a minor release on the Ganeti 2.1
  branch

  
Version 0.2.3 (Thu,  4 Feb 2010)
--------------------------------

A small release:

- Fixes selection of secondary node: previously, if the cluster had
  many N+1 failures, an N+1 failed node could be selected as secondary
  even if it did not have enough memory to allow the instance to be
  migrated/failed over to it; this is bad for automated tools, since
  we can get the cluster in an unhealthy state
- Switch the text backend to a single input file, that is generated
  now by hscan and shouldn't be generated manually via
  gnt-node/instance list anymore; this allows richer information to be
  kept in the file, and slightly simplifies the internals of the text
  backend

  
Version 0.2.2 (Tue, 29 Dec 2009)
--------------------------------

Small release, 0.2.1 was broken and thus this was released earlier:

- Release 0.2.1 broke the LUXI backend due to a typo, fixed
- Added a live-test script that should catch errors like the above one
  in the future (needs a working, non-empty cluster)
- Changed RAPI and LUXI backends to treat drained nodes as offline,
  similar to the IAllocator backend change in 0.2.0 (which was wrongly
  marked as affecting all backends)
- Changed the metrics for offline instances and N1 score from percent to
  count, in order to increase the priority of evacuations
- Added a new metric (offline primary instances) which should fix the
  evacuation of an offline node in a 2-node cluster

  
231

  
Version 0.2.1 (Wed,  2 Dec 2009)
--------------------------------

- Added instance exclusion defined via instance tags
- Fixed the output of hspace to be again parseable from the shell

  
Version 0.2.0 (Tue, 10 Nov 2009)
--------------------------------

A significant release, with a few new major features:

- Added direct execution of the hbal solution when using the Luxi
  backend; the steps for each instance move are submitted as a single
  job, and the different jobs are submitted as groups in order to
  parallelise the execution of moves
- Added support for balancing based on dynamic utilisation data for
  instances, fed in via a text file; by default, all instances are
  considered equal and this change also improves the equalisation of
  secondary instances per node
- Added support for tiered capacity calculation in hspace, where we
  start from a maximum instance spec and decrease the spec when we run
  out of resources; this should give a better measure of available
  capacity on 'fragmented' clusters; this is done separately from the
  current fixed-mode computation

Also there have been many minor improvements:

- Added option for showing instances (``--print-instances``), similar
  to the print nodes option
- Added support for customising the node list via an argument to the
  print nodes option in the form of a comma-separated list of field
  names; currently the field names are not documented, expecting further
  changes in a future release
- Enhanced the error reporting in the Luxi and RAPI backends
- Changed the handling of drained nodes, now being treated the same as
  offline nodes, for Ganeti 2.0.4+ compatibility
- A number of internal changes, simplifying code and merging some
  disparate functions
- Simplified the build system in relation to creation of archives

  
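The tiered capacity calculation introduced in 0.2.0 (start from a
maximum instance spec and shrink it when resources run out) can be
sketched as follows. This is not the hspace implementation: the single
shared resource pool, the tuple layout of a spec, and the shrink rule
(halving every resource, bounded by a minimum spec) are all assumptions
made for illustration.

.. code-block:: python

   def fits(spec, free):
       """True if every resource in spec is available in the free pool."""
       return all(s <= f for s, f in zip(spec, free))


   def tiered_capacity(free, max_spec, min_spec):
       """Allocate from max_spec downwards; return [(spec, count), ...]."""
       if any(v <= 0 for v in tuple(max_spec) + tuple(min_spec)):
           raise ValueError("specs must be strictly positive")
       free = list(free)
       spec = tuple(max_spec)
       floor = tuple(min_spec)
       tiers = []
       while True:
           count = 0
           # Place as many instances of the current spec as still fit.
           while fits(spec, free):
               free = [f - s for f, s in zip(free, spec)]
               count += 1
           if count:
               tiers.append((spec, count))
           if spec == floor:
               break
           # Assumed shrink rule: halve each resource, never below floor.
           spec = tuple(max(s // 2, m) for s, m in zip(spec, floor))
       return tiers

On a fragmented pool the result lists one entry per spec that could
still be placed, which is the kind of per-tier measure the release note
describes.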

  
Version 0.1.8 (Tue, 29 Sep 2009)
--------------------------------

- Brown-paper-bag release fixing haddock issues

  
Version 0.1.7 (Mon, 28 Sep 2009)
--------------------------------

- Fixed a bug in the Luxi backend for big responses
- Fixed test suite exit code in presence of test failures
- Changed the migrate operation to run failover instead for instances
  which were marked as not running in the input data (this could have
  been changed since then, but it's better than today's always migrate)
- Added support for 'cheap' moves only (only migrate/failover) in
  balancing
- Added support for building without curl (thus no RAPI backend)

  
Version 0.1.6 (Wed, 19 Aug 2009)
--------------------------------

- Added support for Luxi (the native Ganeti protocol)
- Added support for simulated clusters (for hspace only)
- Added timeouts for the RAPI backend
- Fixed a few inconsistencies in the command line handling
- Fixed handling of errors while loading data
- The 'network' is a new dependency due to the Luxi addition

  
Version 0.1.5 (Thu, 09 Jul 2009)
--------------------------------

- Removed obsolete hn1 program; this allowed removal of a lot of
  supporting code
- Lots of changes in hspace: the output is now a shell fragment so that
  scripts can source or parse it more easily; added failure reasons;
  optimised to use less memory for large clusters
- Optimized the scoring algorithm (used by all tools) so that now
  computations should be faster

  
Version 0.1.4 (Tue, 16 Jun 2009)
--------------------------------

- Added CPU count/ratio of virtual-to-physical CPUs to the cluster
  scoring methods; this means that now the balancer, the iallocator
  plugin and so on will try to keep the VCPU-to-PCPU ratio equal across
  the cluster
- Fixed some hscan bugs
- Fixed the way iallocator reads the total disk size (was broken and it
  was always falling back to summing the disk sizes)
- Internals: fixed most compile-time warnings

  
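As an illustration of what keeping the VCPU-to-PCPU ratio equal across
the cluster can mean, the sketch below measures the spread of per-node
ratios with a population standard deviation; a lower value indicates a
more even cluster. Using the standard deviation is an assumption made
here for illustration, as are the function and parameter names; the
release note itself does not give the formula.

.. code-block:: python

   from statistics import pstdev


   def cpu_ratio_spread(nodes):
       """nodes: iterable of (vcpus_in_use, physical_cpus) pairs."""
       ratios = [vcpus / pcpus for vcpus, pcpus in nodes]
       # Zero means every node has the same virtual-to-physical ratio.
       return pstdev(ratios)

A balancer built on such a measure would prefer moves that lower the
returned value.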

  
Version 0.1.3 (Fri, 05 Jun 2009)
--------------------------------

- Fix a bug in the ReplacePrimary instance moves, affecting most of the
  tools

  
Version 0.1.2 (Tue, 02 Jun 2009)
--------------------------------

- Add a new program, “hspace”, which computes the free space on a
  cluster (based on a given instance spec)
- Improvements in API docs and partially in the user docs
- Started adding unittests

  
Version 0.1.1 (Tue, 26 May 2009)
--------------------------------

- Add a new program, “hail”, which is an iallocator plugin and can
  allocate/relocate instances
- Experimental support for non-mirrored instances (hail supports them,
  hbal should no longer abort when it finds such instances and simply
  ignore them)
- The RAPI port and/or scheme can be overridden now, and even
  “file://” schemes can be used if the message body has been saved
  under the appropriate name
- Lots of code reorganization, esp. rewritten loading pipeline
- Better data checking and better error messages in case validation
  fails; tools now consider nodes with errors in input data (‘?’
  returned by ganeti) as offline
- Small enhancement to the makefile for simpler packaging

  
Version 0.1.0 (Tue, 19 May 2009)
--------------------------------

- Drop compatibility with Ganeti 1.2
- Add a new minimum score option (with a very low default), should help
  with very good clusters (but is still not optimal)
- Add a ``--quiet`` option to hbal
- Add support for reading offline nodes directly from the cluster

  
Version 0.0.8 (Tue, 21 Apr 2009)
--------------------------------

- hbal: prevent mismatches in wrong node names being passed to -O, by
  aborting in this case
- add the ability to write the commands (-C) to a script via (-C<file>),
  so that it can be later executed directly; this has also changed the
  commands to include the necessary -f flags to skip confirmations
- add checks for extra arguments in hbal and hn1, so that unintended
  errors are caught
- raise the accepted “missing” memory limit to 512MB, to cover usual Xen
  reservations

  
Version 0.0.7 (Mon, 23 Mar 2009)
--------------------------------

- added support for offline nodes, which are not used as targets for
  instance relocation and if they hold instances the hbal algorithm will
  attempt to relocate these away
- added support for offline instances, which now will no longer skew the
  free memory estimation of nodes; the algorithm will no longer create
  conditions for N+1 failures when such instances are later started
- implemented a complete model of node resources, in order to prevent an
  unintended re-occurrence of cases like the offline instance where we
  miscalculate some node resource; this now gives a warning in case the
  node-reported free disk or free memory deviates by more than a set
  amount from the expected value
- a new tool *hscan* that can generate the input text-file for the other
  tools by collection via RAPI
- some small changes to the build system to make it more friendly; also
  included the generated documentation in the source archive

  
Version 0.0.6 (Mon, 16 Mar 2009)
--------------------------------

- re-factored the hbal algorithm to make it stable in the sense that it
  gives the same solution when restarted from the middle; barring
  rounding of disk/memory and incomplete reporting from Ganeti (for
  1.2), it should be now feasible to rely on its output without
  generating moves ad infinitum
- the hbal algorithm now uses two more variables: the node N+1 failures
  and the amount of reserved memory; the first of which tries to ‘fix’
  the N+1 status, the latter tries to distribute secondaries more
  equally
- the hbal algorithm now uses two more moves at each step:
  replace+failover and failover+replace (besides the original failover,
  replace, and failover+replace+failover)
- slightly changed the build system to embed GIT version/tags into the
  binaries so that we know from which tree a binary was built, either
  via ``--version`` or via ``strings hbal|grep version``
- changed the solution list and in general the hbal output to be more
  clear by default, and changed “gnt-instance failover” to “gnt-instance
  migrate”
- added man pages for the two binaries

  
Version 0.0.5 (Mon, 09 Mar 2009)
--------------------------------

- a few small improvements for hbal (possibly undone by later changes),
  hbal is now quite a bit faster
- fix documentation building
- allow hbal to work on non N+1 compliant clusters, but without
  guarantees that the end cluster will be compliant; in any case, this
  should give a smaller number of nodes that are not compliant if the
  cluster state permits it
- strip common domain suffix from nodes and instances, so that output is
  shorter and hopefully clearer

  
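Stripping a common domain suffix, as introduced in 0.0.5, can be
sketched like this. The function names are hypothetical and the rule
used (only whole dot-separated labels are removed, and the first label
of each name is always kept) is an assumption, not a description of the
actual htools code.

.. code-block:: python

   def common_domain_suffix(names):
       """Longest suffix of whole dot-separated labels shared by all."""
       if not names:
           return ""
       split = [n.split(".")[::-1] for n in names]
       common = []
       for labels in zip(*split):
           if len(set(labels)) != 1:
               break
           common.append(labels[0])
       # Keep at least the first label of every name.
       common = common[:min(len(s) for s in split) - 1]
       return "." + ".".join(reversed(common)) if common else ""


   def strip_suffix(names):
       """Return the names with the shared domain suffix removed."""
       suffix = common_domain_suffix(names)
       return [n[:len(n) - len(suffix)] if suffix else n for n in names]

Passing node and instance names together yields one suffix for both,
so that shortened names stay consistent in the output.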

  
Version 0.0.4 (Sun, 15 Feb 2009)
--------------------------------

- better balancing algorithm in hbal
- implemented an RAPI collector; the cluster data can now be gathered
  automatically via RAPI and doesn't need manual export of the node and
  instance lists

  
Version 0.0.3 (Wed, 28 Jan 2009)
--------------------------------

- initial release of hbal, a cluster rebalancing tool
- input data format changed due to hbal requirements

  
Version 0.0.2 (Tue, 06 Jan 2009)
--------------------------------

- fix handling of some common cases (cluster N+1 compliant from the
  start, too big depth given, failure to compute solution)
- add option to print the needed command list for reaching the proposed
  solution

  
Version 0.0.1 (Tue, 06 Jan 2009)
--------------------------------

- initial release of the hn1 tool

  
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
