Revision cc6d2673
/dev/null | ||
---|---|---|
1 |
Ganeti-htools release notes |
|
2 |
=========================== |
|
3 |
|
|
4 |
|
|
5 |
Version 0.3.1 (Fri, 11 Mar 2011) |
|
6 |
-------------------------------- |
|
7 |
|
|
8 |
Minor bugfix release: |
|
9 |
|
|
10 |
- Fixed source archive generation: the hscolour.css file was an invalid |
|
11 |
symlink, and the man pages were not correctly timestamped (leading to |
|
12 |
unneeded build-time rebuilds) |
|
13 |
- Improved the Luxi backend to show which attribute fails parsing |
|
14 |
- Small improvements to the man pages, and also ship the HTML version of |
|
15 |
man pages in the source archive |
|
16 |
|
|
17 |
|
|
18 |
Version 0.3.0 (Fri, 04 Feb 2011) |
|
19 |
-------------------------------- |
|
20 |
|
|
21 |
A significant release that breaks compatibility with Ganeti versions |
|
22 |
below 2.4 due to the node group changes. Only the RAPI backend can talk |
|
23 |
to older clusters, but it is recommended to use this version only with |
|
24 |
Ganeti 2.4. |
|
25 |
|
|
26 |
All commands are now multi-group aware (but to various degrees), so |
|
27 |
allocation, balancing and capacity calculation respect the group layout |
|
28 |
and will not create “broken” instances by using nodes from different |
|
29 |
groups. |
|
30 |
|
|
31 |
For a regular, single-group cluster, no changes should be directly |
|
32 |
visible to the users. A multi-group cluster however will change some |
|
33 |
things slightly: |
|
34 |
|
|
35 |
- hbal will require a target group to operate on (no cluster-wide |
|
36 |
balancing yet) |
|
37 |
- evacuation of (DRBD) instances from a node will be restricted to nodes |
|
38 |
in the same group, as inter-group moves are not implemented yet |
|
39 |
- capacity, while showing correct data, will not give per-group details |
|
40 |
yet |
|
41 |
|
|
42 |
There are other changes in this release: |
|
43 |
|
|
44 |
- fixed a long-standing bug in hscan related to node memory data |
|
45 |
- changed the text backend format, which unfortunately invalidates old |
|
46 |
files |
|
47 |
- error handling improvements, so that invalid input data reports better |
|
48 |
where the error is |
|
49 |
- the simulation backend changes its syntax, now it takes the allocation |
|
50 |
policy too, and can generate multiple groups |
|
51 |
- (internal) man page generation moved to pandoc from hand-written, |
|
52 |
which is helpful as it can also generate HTML versions |
|
53 |
- the balancing algorithm has been changed to work in parallel, if the |
|
54 |
code is linked against the multi-threaded runtime; this gives a very |
|
55 |
good speedup (~80% on 4 cores, ~60-70% on 12 cores) |
|
56 |
|
|
57 |
Version 0.2.8 (Thu, 23 Dec 2010) |
|
58 |
-------------------------------- |
|
59 |
|
|
60 |
A bug fix release: |
|
61 |
|
|
62 |
- fixed balancing function for big clusters, which will improve corner |
|
63 |
cases where hbal didn't see any solution even though the cluster was |
|
64 |
obviously not well balanced |
|
65 |
- fixed exit code of hbal in case of (Luxi) job errors |
|
66 |
- changed the signal handling in hbal in order to make hbal control |
|
67 |
easier: instead of synchronising on the count of signals, make SIGINT |
|
68 |
cause graceful termination, and SIGTERM an immediate one |
|
69 |
- increased the tag exclusion weight so that it has greater importance |
|
70 |
during the balancing |
|
71 |
- slight improvement to the speed of balancing via algorithm tweaks |
|
72 |
|
|
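The SIGINT/SIGTERM split described in the 0.2.8 notes can be sketched as follows. This is only an illustration in Python, not the hbal code itself; `JobRunner`, `install_handlers`, the `execute` callback and the exit status `1` are hypothetical names and values chosen for the example.

```python
import signal
import sys


class JobRunner:
    """Runs job sets in order and stops between sets when asked to."""

    def __init__(self):
        self.stop_requested = False

    def request_stop(self, signum=None, frame=None):
        # Graceful stop: only record the request; the job set that is
        # currently executing is allowed to finish.
        self.stop_requested = True

    def run(self, job_sets, execute):
        done = []
        for job_set in job_sets:
            if self.stop_requested:
                break
            execute(job_set)
            done.append(job_set)
        return done


def install_handlers(runner):
    # SIGINT (e.g. ^C): graceful termination after the current job set.
    signal.signal(signal.SIGINT, runner.request_stop)
    # SIGTERM: immediate exit, without waiting for running jobs.
    # The exit status 1 is an assumption made for this sketch.
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(1))
```

With this split, a controlling script no longer has to count how many signals it has sent: the signal type alone selects the behaviour.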
73 |
|
|
74 |
Version 0.2.7 (Thu, 07 Oct 2010) |
|
75 |
-------------------------------- |
|
76 |
|
|
77 |
Bug fixes: |
|
78 |
|
|
79 |
- fixed the error message for hail multi-evacuation mode |
|
80 |
- improve evacuation mode for offline secondary nodes (ignore available |
|
81 |
memory) |
|
82 |
|
|
83 |
New features: |
|
84 |
|
|
85 |
- add a new option ``-S`` to hbal and hspace that saves the cluster |
|
86 |
state at the end of the processing in the text format used by the |
|
87 |
``-t`` option, for later re-processing |
|
88 |
- two new options to hbal, -g and --min-gain-limit, that should help |
|
89 |
in limiting the number of balancing steps with a low gain in the final |
|
90 |
stages |
|
91 |
- hbal, when executing jobs, will now wait for the current jobs to |
|
92 |
finish at the first stop (e.g. ^C); if the user wants immediate exit, |
|
93 |
another signal should be sent |
|
94 |
- added “normalized” physical CPU units in hspace output (NPU), which |
|
95 |
represents units of physical CPUs free/used, based on the max-cpu |
|
96 |
ratio |
|
97 |
|
|
98 |
|
|
99 |
Version 0.2.6 (Mon, 26 Jul 2010) |
|
100 |
-------------------------------- |
|
101 |
|
|
102 |
Exactly three months since the last release. Many internal changes, plus |
|
103 |
a couple of important changes in the balancing algorithm. |
|
104 |
|
|
105 |
First, the balancing may now introduce N+1 errors, if this solves other, |
|
106 |
more critical problems. For the moment, this means that moving instances |
|
107 |
away from offline nodes is allowed even if it creates N+1 errors, and |
|
108 |
that means evacuation can be done in more cases. |
|
109 |
|
|
110 |
Second, the scoring for N+1 has changed. In previous versions, it simply |
|
111 |
counted the number of failing N+1 nodes, which means moving an instance |
|
112 |
away from an N+1 failed node (but without the node 'clearing' the N+1 |
|
113 |
status) was not reflected in the cluster score. As such, the balancing |
|
114 |
algorithm managed to clear N+1 errors only sometimes, since usually it |
|
115 |
takes more than one move for this, and the first prerequisite move was |
|
116 |
not 'rewarded' appropriately and thus it was not selected. Now, it is |
|
117 |
possible to fix many more error cases than before: on a simulated 40 |
|
118 |
node cluster full with instances (symmetrically allocated on all nodes), |
|
119 |
around five nodes can be evacuated before N+1 errors can be solved, |
|
120 |
whereas 0.2.5 could evacuate at best one node. |
|
121 |
|
|
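The following sketch shows why a plain count of failing nodes cannot reward a prerequisite move. The finer-grained metric used here (total missing memory) is an assumption chosen for illustration; these notes do not state which metric replaced the count. The field names `free_mem` and `needed_mem` are hypothetical.

```python
def failing_node_count(nodes):
    """Old-style metric: number of nodes that fail the N+1 check."""
    return sum(1 for n in nodes if n["needed_mem"] > n["free_mem"])


def total_shortfall(nodes):
    """Illustrative finer-grained metric: total missing memory (MiB)."""
    return sum(max(0, n["needed_mem"] - n["free_mem"]) for n in nodes)


# Node A fails N+1; node B is healthy.
before = [
    {"free_mem": 1024, "needed_mem": 3072},
    {"free_mem": 4096, "needed_mem": 1024},
]
# After one move away from node A: A still fails, but by less.
after_first_move = [
    {"free_mem": 1024, "needed_mem": 2048},
    {"free_mem": 4096, "needed_mem": 2048},
]

# The count does not change, so the first move looks useless.
count_delta = (failing_node_count(before)
               - failing_node_count(after_first_move))
# The shortfall drops, so the first move is visibly an improvement.
shortfall_delta = total_shortfall(before) - total_shortfall(after_first_move)
```

A balancer that only sees `count_delta` has no reason to pick the first move, which matches the behaviour described above for versions before 0.2.6.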
122 |
There were some other internal changes to the scoring algorithm, such |
|
123 |
that now the metrics have associated weights, and they are not all of |
|
124 |
the same importance anymore. As of now, the only change is that offline |
|
125 |
instances have a higher weight, which should favour proper node |
|
126 |
evacuations. |
|
127 |
|
|
128 |
Among the other changes: |
|
129 |
|
|
130 |
- fixed the hspace KM_POOL_* metrics, which were returned as the final |
|
131 |
state and not as the delta between the initial and final states |
|
132 |
- fixed hspace handling of N+1 failing clusters: before, it used to |
|
133 |
generate a 'fake' response, and the structure of this response was not |
|
134 |
always in sync with the real responses, leading to missing items; |
|
135 |
currently it proceeds correctly through the code (skipping the |
|
136 |
computation), and uses the same display mechanisms as the normal case |
|
137 |
- fixed hscan exit code for RAPI failures: previously it finished with |
|
138 |
success even if all the clusters failed, which was creating issues |
|
139 |
with the live-test script; now it exits with exit code 2 for RAPI |
|
140 |
failures (unfortunately this is still not optimal as LUXI failures |
|
141 |
will use exit code 1, the same as the command line) |
|
142 |
- changed the limit values for CPU/disk, which previously were used |
|
143 |
optionally, whereas now they are always used; the default cpu ratio |
|
144 |
limit is now 64 VCPUs per PCPU |
|
145 |
- changed the internal handling of the short name vs. original |
|
146 |
(Ganeti-provided) name; now internally we always use the full name, |
|
147 |
and only in display routines we show the shortened (called 'alias') |
|
148 |
name; as a result, the -O and --excluded-instances options now accept |
|
149 |
both the full name and the shortened name |
|
150 |
- changed internal handling of JSON conversions and errors, such that |
|
151 |
now we show a better context for failure messages, which should help |
|
152 |
with diagnosing the malformed message |
|
153 |
- changed the names for a few node fields, and added some more fields; |
|
154 |
this is most likely to help with debugging, and not with regular |
|
155 |
operation though |
|
156 |
- changed the node fields option to allow the '+' prefix to mean 'extend |
|
157 |
the default fields list' rather than start from fresh (similar to |
|
158 |
Ganeti's implementation) |
|
159 |
- a few internal changes related to the LUXI protocol implementation, |
|
160 |
which should make it safer against potential bugs, one |
|
161 |
optimization that should help with large messages, and some patches |
|
162 |
in preparation for potential expansion of the LUXI backend functionality |
|
163 |
|
|
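The '+' prefix rule for the node fields option can be sketched like this. The function name `select_fields` and the default field names are hypothetical; only the rule itself (a leading '+' extends the default list, anything else replaces it) comes from the notes above.

```python
# Hypothetical default field list, for illustration only.
DEFAULT_FIELDS = ["name", "status", "free_mem"]


def select_fields(arg, defaults=DEFAULT_FIELDS):
    """Turn a comma-separated field argument into a list of field names.

    A leading '+' extends the default list; otherwise the argument
    replaces the defaults entirely.
    """
    if arg.startswith("+"):
        extra = [f for f in arg[1:].split(",") if f]
        return list(defaults) + extra
    return [f for f in arg.split(",") if f]
```

For example, `select_fields("+a,b")` keeps the three defaults and appends `a` and `b`, while `select_fields("a,b")` returns only `a` and `b`.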
164 |
And finally, many improvements on unittests and the live-test |
|
165 |
script. Test coverage is much enhanced, and the test infrastructure has |
|
166 |
better error reporting; this should lead down-the-road to better code |
|
167 |
and fewer bugs… |
|
168 |
|
|
169 |
|
|
170 |
Version 0.2.5 (Mon, 26 Apr 2010) |
|
171 |
-------------------------------- |
|
172 |
|
|
173 |
Some internal cleanup plus a few user-visible changes: |
|
174 |
|
|
175 |
- new option for marking instances as 'do-not-move' during rebalancing |
|
176 |
- allow ``hscan`` to scan the local cluster via Luxi |
|
177 |
- add more metrics to ``hspace`` which show the delta between original |
|
178 |
state and final state better (only valid for tiered allocation) |
|
179 |
|
|
180 |
|
|
181 |
Version 0.2.4 (Mon, 22 Feb 2010) |
|
182 |
-------------------------------- |
|
183 |
|
|
184 |
Two improvements for node evacuation: |
|
185 |
|
|
186 |
- hbal takes a new parameter ``--evac-mode`` that restricts the |
|
187 |
instances to be moved to the ones on offline/drained nodes, which |
|
188 |
should reduce the work done |
|
189 |
- hail supports the new ``multi-evacuate`` mode of the IAllocator |
|
190 |
protocol, that will be released in a minor release on the Ganeti 2.1 |
|
191 |
branch |
|
192 |
|
|
193 |
|
|
194 |
Version 0.2.3 (Thu, 4 Feb 2010) |
|
195 |
-------------------------------- |
|
196 |
|
|
197 |
A small release: |
|
198 |
|
|
199 |
- Fixes selection of secondary node: previously, if the cluster had |
|
200 |
many N+1 failures, an N+1 failed node could be selected as secondary |
|
201 |
even if it did not have enough memory to allow the instance to be |
|
202 |
migrated/failed over to it; this is bad for automated tools, since |
|
203 |
we can get the cluster in an unhealthy state |
|
204 |
- Switch the text backend to a single input file, that is generated |
|
205 |
now by hscan and shouldn't be generated manually via |
|
206 |
gnt-node/instance list anymore; this allows richer information to be |
|
207 |
kept in the file, and simplifies a little the internals of the text |
|
208 |
backend |
|
209 |
|
|
210 |
|
|
211 |
Version 0.2.2 (Tue, 29 Dec 2009) |
|
212 |
-------------------------------- |
|
213 |
|
|
214 |
Small release, 0.2.1 was broken and thus this was released earlier: |
|
215 |
|
|
216 |
- Release 0.2.1 broke the LUXI backend due to a typo, fixed |
|
217 |
- Added a live-test script that should catch errors like the above one |
|
218 |
in the future (needs a working, non-empty cluster) |
|
219 |
- Changed RAPI and LUXI backends to treat drained nodes as offline, |
|
220 |
similar to the IAllocator backend change in 0.2.0 (which was wrongly |
|
221 |
marked as affecting all backends) |
|
222 |
- Changed the metrics for offline instances and N1 score from percent to |
|
223 |
count, in order to increase the priority of evacuations |
|
224 |
- Added a new metric (offline primary instances) which should fix the |
|
225 |
evacuation of an offline node in a 2-node cluster |
|
226 |
|
|
227 |
|
|
228 |
Version 0.2.1 (Wed, 2 Dec 2009) |
|
229 |
-------------------------------- |
|
230 |
|
|
231 |
- Added instance exclusion defined via instance tags |
|
232 |
- Fixed the output of hspace to be again parseable from the shell |
|
233 |
|
|
234 |
|
|
235 |
Version 0.2.0 (Tue, 10 Nov 2009) |
|
236 |
-------------------------------- |
|
237 |
|
|
238 |
A significant release, with a few new major features: |
|
239 |
|
|
240 |
- Added direct execution of the hbal solution when using the Luxi |
|
241 |
backend; the steps for each instance move are submitted as a single |
|
242 |
job, and the different jobs are submitted as groups in order to |
|
243 |
parallelise the execution of moves |
|
244 |
- Added support for balancing based on dynamic utilisation data for |
|
245 |
instances, fed in via a text file; by default, all instances are |
|
246 |
considered equal and this change also improves the equalisation of |
|
247 |
secondary instances per node |
|
248 |
- Added support for tiered capacity calculation in hspace, where we |
|
249 |
start from a maximum instance spec and decrease the spec when we run |
|
250 |
out of resources; this should give a better measure of available |
|
251 |
capacity on 'fragmented' clusters; this is done separately from the |
|
252 |
current fixed-mode computation |
|
253 |
|
|
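The tiered capacity idea (start from a maximum instance spec and shrink it when nothing fits) can be sketched as below. This is a deliberately simplified model that assumes a single resource (memory in MiB), first-fit placement and a fixed step; the function name `tiered_capacity` and its parameters are hypothetical and do not describe the actual hspace implementation.

```python
def tiered_capacity(node_free, max_spec, min_spec, step):
    """Count how many instances of each size fit, largest spec first."""
    if step <= 0:
        raise ValueError("step must be positive")
    free = list(node_free)  # work on a copy of the per-node free memory
    counts = {}
    spec = max_spec
    while spec >= min_spec:
        # First node that can still hold an instance of the current spec.
        target = next((i for i, f in enumerate(free) if f >= spec), None)
        if target is None:
            # Out of resources at this tier: decrease the spec.
            spec -= step
            continue
        free[target] -= spec
        counts[spec] = counts.get(spec, 0) + 1
    return counts


# Two 'fragmented' nodes with 2560 and 1536 MiB free.
result = tiered_capacity([2560, 1536], max_spec=2048, min_spec=512, step=512)
```

In this example a fixed 2048 MiB spec would fit only once, whereas the tiered run also places one 1536 MiB and one 512 MiB instance, which is the better measure of leftover capacity that the note refers to.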
254 |
Also there have been many minor improvements: |
|
255 |
|
|
256 |
- Added option for showing instances (“--print-instances”), similar to |
|
257 |
the print nodes option |
|
258 |
- Added support for customising the node list via an argument to the |
|
259 |
print nodes option in the form of a comma-separated list of field |
|
260 |
names; currently the field names are not documented, expecting further |
|
261 |
changes in a next release |
|
262 |
- Enhanced the error reporting in the Luxi and Rapi backends |
|
263 |
- Changed the handling of drained nodes, now being treated the same as |
|
264 |
offline nodes, for Ganeti 2.0.4+ compatibility |
|
265 |
- A number of internal changes, simplifying code and merging some |
|
266 |
disparate functions |
|
267 |
- Simplify the build system in relation to creation of archives |
|
268 |
|
|
269 |
|
|
270 |
Version 0.1.8 (Tue, 29 Sep 2009) |
|
271 |
-------------------------------- |
|
272 |
|
|
273 |
- Brown-paper-bag release fixing haddock issues |
|
274 |
|
|
275 |
|
|
276 |
Version 0.1.7 (Mon, 28 Sep 2009) |
|
277 |
-------------------------------- |
|
278 |
|
|
279 |
- Fixed a bug in the Luxi backend for big responses |
|
280 |
- Fixed test suite exit code in presence of test failures |
|
281 |
- Changed the migrate operation to run failover instead for instances |
|
282 |
which were marked as not running in the input data (this could have |
|
283 |
been changed since then, but it's better than today's always migrate) |
|
284 |
- Added support for 'cheap' moves only (only migrate/failover) in |
|
285 |
balancing |
|
286 |
- Added support for building without curl (thus no RAPI backend) |
|
287 |
|
|
288 |
|
|
289 |
Version 0.1.6 (Wed, 19 Aug 2009) |
|
290 |
-------------------------------- |
|
291 |
|
|
292 |
- Added support for Luxi (the native Ganeti protocol) |
|
293 |
- Added support for simulated clusters (for hspace only) |
|
294 |
- Added timeouts for the RAPI backend |
|
295 |
- Fixed a few inconsistencies in the command line handling |
|
296 |
- Fixed handling of errors while loading data |
|
297 |
- The 'network' is a new dependency due to the Luxi addition |
|
298 |
|
|
299 |
|
|
300 |
Version 0.1.5 (Thu, 09 Jul 2009) |
|
301 |
-------------------------------- |
|
302 |
|
|
303 |
- Removed obsolete hn1 program; this allowed removal of a lot of |
|
304 |
supporting code |
|
305 |
- Lots of changes in hspace: the output now is a shell fragment in order |
|
306 |
for scripts to source it or parse it more easily; added failure reasons; |
|
307 |
optimised to use less memory for large clusters |
|
308 |
- Optimized the scoring algorithm (used by all tools) so that now |
|
309 |
computations should be faster |
|
310 |
|
|
311 |
|
|
312 |
Version 0.1.4 (Tue, 16 Jun 2009) |
|
313 |
-------------------------------- |
|
314 |
|
|
315 |
- Added CPU count/ratio of virtual-to-physical CPUs to the cluster |
|
316 |
scoring methods; this means that now the balancer, the iallocator |
|
317 |
plugin and so on will try to keep the VCPU-to-PCPU ratio equal across |
|
318 |
the cluster |
|
319 |
- Fixed some hscan bugs |
|
320 |
- Fixed the way iallocator reads the total disk size (was broken and it |
|
321 |
was always falling back to summing the disk sizes) |
|
322 |
- Internals: fixed most compile-time warnings |
|
323 |
|
|
324 |
|
|
325 |
Version 0.1.3 (Fri, 05 Jun 2009) |
|
326 |
-------------------------------- |
|
327 |
|
|
328 |
- Fix a bug in the ReplacePrimary instance moves, affecting most of the |
|
329 |
tools |
|
330 |
|
|
331 |
|
|
332 |
Version 0.1.2 (Tue, 02 Jun 2009) |
|
333 |
-------------------------------- |
|
334 |
|
|
335 |
- Add a new program, “hspace”, which computes the free space on a |
|
336 |
cluster (based on a given instance spec) |
|
337 |
- Improvements in API docs and partially in the user docs |
|
338 |
- Started adding unittests |
|
339 |
|
|
340 |
|
|
341 |
Version 0.1.1 (Tue, 26 May 2009) |
|
342 |
-------------------------------- |
|
343 |
|
|
344 |
- Add a new program, “hail”, which is an iallocator plugin and can |
|
345 |
allocate/relocate instances |
|
346 |
- Experimental support for non-mirrored instances (hail supports them, |
|
347 |
hbal should no longer abort when it finds such instances and simply |
|
348 |
ignore them) |
|
349 |
- The RAPI port and/or scheme can be overridden now, and even “file://” |
|
350 |
schemes can be used if the message body has been saved under the |
|
351 |
appropriate name |
|
352 |
- Lots of code reorganization, esp. rewritten loading pipeline |
|
353 |
- Better data checking and better error messages in case validation |
|
354 |
fails; tools now consider nodes with error in input data (‘?’ returned |
|
355 |
by ganeti) as offline |
|
356 |
- Small enhancement to the makefile for simpler packaging |
|
357 |
|
|
358 |
|
|
359 |
Version 0.1.0 (Tue, 19 May 2009) |
|
360 |
-------------------------------- |
|
361 |
|
|
362 |
- Drop compatibility with Ganeti 1.2 |
|
363 |
- Add a new minimum score option (with a very low default), should help |
|
364 |
with very good clusters (but is still not optimal) |
|
365 |
- Add a --quiet option to hbal |
|
366 |
- Add support for reading offline nodes directly from the cluster |
|
367 |
|
|
368 |
|
|
369 |
Version 0.0.8 (Tue, 21 Apr 2009) |
|
370 |
-------------------------------- |
|
371 |
|
|
372 |
- hbal: prevent mismatches in wrong node names being passed to -O, by |
|
373 |
aborting in this case |
|
374 |
- add the ability to write the commands (-C) to a script via (-C<file>), |
|
375 |
so that it can be later executed directly; this has also changed the |
|
376 |
commands to include the necessary -f flags to skip confirmations |
|
377 |
- add checks for extra arguments in hbal and hn1, so that unintended |
|
378 |
errors are caught |
|
379 |
- raise the accepted “missing” memory limit to 512MB, to cover usual Xen |
|
380 |
reservations |
|
381 |
|
|
382 |
|
|
383 |
Version 0.0.7 (Mon, 23 Mar 2009) |
|
384 |
-------------------------------- |
|
385 |
|
|
386 |
- added support for offline nodes, which are not used as targets for |
|
387 |
instance relocation and if they hold instances the hbal algorithm will |
|
388 |
attempt to relocate these away |
|
389 |
- added support for offline instances, which now will no longer skew the |
|
390 |
free memory estimation of nodes; the algorithm will no longer create |
|
391 |
conditions for N+1 failures when such instances are later started |
|
392 |
- implemented a complete model of node resources, in order to prevent an |
|
393 |
unintended re-occurrence of cases like the offline instance where we |
|
394 |
miscalculate some node resource; this now gives a warning in case the |
|
395 |
node reported free disk or free memory deviates by more than a set |
|
396 |
amount from the expected value |
|
397 |
- a new tool *hscan* that can generate the input text-file for the other |
|
398 |
tools by collection via RAPI |
|
399 |
- some small changes to the build system to make it more friendly; also |
|
400 |
included the generated documentation in the source archive |
|
401 |
|
|
402 |
|
|
403 |
Version 0.0.6 (Mon, 16 Mar 2009) |
|
404 |
-------------------------------- |
|
405 |
|
|
406 |
- re-factored the hbal algorithm to make it stable in the sense that it |
|
407 |
gives the same solution when restarted from the middle; barring |
|
408 |
rounding of disk/memory and incomplete reporting from Ganeti (for |
|
409 |
1.2), it should be now feasible to rely on its output without |
|
410 |
generating moves ad infinitum |
|
411 |
- the hbal algorithm now uses two more variables: the node N+1 failures |
|
412 |
and the amount of reserved memory; the first of which tries to ‘fix’ |
|
413 |
the N+1 status, the latter tries to distribute secondaries more |
|
414 |
equally |
|
415 |
- the hbal algorithm now uses two more moves at each step: |
|
416 |
replace+failover and failover+replace (besides the original failover, |
|
417 |
replace, and failover+replace+failover) |
|
418 |
- slightly changed the build system to embed GIT version/tags into the |
|
419 |
binaries so that we know for a binary from which tree it was done, |
|
420 |
either via ‘--version’ or via “strings hbal|grep version” |
|
421 |
- changed the solution list and in general the hbal output to be more |
|
422 |
clear by default, and changed “gnt-instance failover” to “gnt-instance |
|
423 |
migrate” |
|
424 |
- added man pages for the two binaries |
|
425 |
|
|
426 |
|
|
427 |
Version 0.0.5 (Mon, 09 Mar 2009) |
|
428 |
-------------------------------- |
|
429 |
|
|
430 |
- a few small improvements for hbal (possibly undone by later changes), |
|
431 |
hbal is now considerably faster |
|
432 |
- fix documentation building |
|
433 |
- allow hbal to work on non N+1 compliant clusters, but without |
|
434 |
guarantees that the end cluster will be compliant; in any case, this |
|
435 |
should give a smaller number of nodes that are not compliant if the |
|
436 |
cluster state permits it |
|
437 |
- strip common domain suffix from nodes and instances, so that output is |
|
438 |
shorter and hopefully clearer |
|
439 |
|
|
440 |
|
|
441 |
Version 0.0.4 (Sun, 15 Feb 2009) |
|
442 |
-------------------------------- |
|
443 |
|
|
444 |
- better balancing algorithm in hbal |
|
445 |
- implemented an RAPI collector, now the cluster data can be gathered |
|
446 |
automatically via RAPI and doesn't need manual export of node and |
|
447 |
instance list |
|
448 |
|
|
449 |
|
|
450 |
Version 0.0.3 (Wed, 28 Jan 2009) |
|
451 |
-------------------------------- |
|
452 |
|
|
453 |
- initial release of hbal, a cluster rebalancing tool |
|
454 |
- input data format changed due to hbal requirements |
|
455 |
|
|
456 |
|
|
457 |
Version 0.0.2 (Tue, 06 Jan 2009) |
|
458 |
-------------------------------- |
|
459 |
|
|
460 |
- fix handling of some common cases (cluster N+1 compliant from the |
|
461 |
start, too big depth given, failure to compute solution) |
|
462 |
- add option to print the needed command list for reaching the proposed |
|
463 |
solution |
|
464 |
|
|
465 |
|
|
466 |
Version 0.0.1 (Tue, 06 Jan 2009) |
|
467 |
-------------------------------- |
|
468 |
|
|
469 |
- initial release of hn1 tool |
|
470 |
|
|
471 |
.. vim: set textwidth=72 : |
|
472 |
.. Local Variables: |
|
473 |
.. mode: rst |
|
474 |
.. fill-column: 72 |
|
475 |
.. End: |
b/htools/OLD-NEWS | ||
---|---|---|
1 |
Ganeti-htools release notes |
|
2 |
=========================== |
|
3 |
|
|
4 |
|
|
5 |
**Note**: After version 0.3.1, the htools sources have been integrated |
|
6 |
into the ganeti core repository, and released together with the ganeti |
|
7 |
releases. Thus this NEWS file is obsolete. |
|
8 |
|
|
9 |
Version 0.3.1 (Fri, 11 Mar 2011) |
|
10 |
-------------------------------- |
|
11 |
|
|
12 |
Minor bugfix release: |
|
13 |
|
|
14 |
- Fixed source archive generation: the hscolour.css file was an invalid |
|
15 |
symlink, and the man pages were not correctly timestamped (leading to |
|
16 |
unneeded build-time rebuilds) |
|
17 |
- Improved the Luxi backend to show which attribute fails parsing |
|
18 |
- Small improvements to the man pages, and also ship the HTML version of |
|
19 |
man pages in the source archive |
|
20 |
|
|
21 |
|
|
22 |
Version 0.3.0 (Fri, 04 Feb 2011) |
|
23 |
-------------------------------- |
|
24 |
|
|
25 |
A significant release that breaks compatibility with Ganeti versions |
|
26 |
below 2.4 due to the node group changes. Only the RAPI backend can talk |
|
27 |
to older clusters, but it is recommended to use this version only with |
|
28 |
Ganeti 2.4. |
|
29 |
|
|
30 |
All commands are now multi-group aware (but to various degrees), so |
|
31 |
allocation, balancing and capacity calculation respect the group layout |
|
32 |
and will not create “broken” instances by using nodes from different |
|
33 |
groups. |
|
34 |
|
|
35 |
For a regular, single-group cluster, no changes should be directly |
|
36 |
visible to the users. A multi-group cluster however will change some |
|
37 |
things slightly: |
|
38 |
|
|
39 |
- hbal will require a target group to operate on (no cluster-wide |
|
40 |
balancing yet) |
|
41 |
- evacuation of (DRBD) instances from a node will be restricted to nodes |
|
42 |
in the same group, as inter-group moves are not implemented yet |
|
43 |
- capacity, while showing correct data, will not give per-group details |
|
44 |
yet |
|
45 |
|
|
46 |
There are other changes in this release: |
|
47 |
|
|
48 |
- fixed a long-standing bug in hscan related to node memory data |
|
49 |
- changed the text backend format, which unfortunately invalidates old |
|
50 |
files |
|
51 |
- error handling improvements, so that invalid input data reports better |
|
52 |
where the error is |
|
53 |
- the simulation backend changes its syntax, now it takes the allocation |
|
54 |
policy too, and can generate multiple groups |
|
55 |
- (internal) man page generation moved to pandoc from hand-written, |
|
56 |
which is helpful as it can also generate HTML versions |
|
57 |
- the balancing algorithm has been changed to work in parallel, if the |
|
58 |
code is linked against the multi-threaded runtime; this gives a very |
|
59 |
good speedup (~80% on 4 cores, ~60-70% on 12 cores) |
|
60 |
|
|
61 |
Version 0.2.8 (Thu, 23 Dec 2010) |
|
62 |
-------------------------------- |
|
63 |
|
|
64 |
A bug fix release: |
|
65 |
|
|
66 |
- fixed balancing function for big clusters, which will improve corner |
|
67 |
cases where hbal didn't see any solution even though the cluster was |
|
68 |
obviously not well balanced |
|
69 |
- fixed exit code of hbal in case of (Luxi) job errors |
|
70 |
- changed the signal handling in hbal in order to make hbal control |
|
71 |
easier: instead of synchronising on the count of signals, make SIGINT |
|
72 |
cause graceful termination, and SIGTERM an immediate one |
|
73 |
- increased the tag exclusion weight so that it has greater importance |
|
74 |
during the balancing |
|
75 |
- slight improvement to the speed of balancing via algorithm tweaks |
|
76 |
|
|
77 |
|
|
78 |
Version 0.2.7 (Thu, 07 Oct 2010) |
|
79 |
-------------------------------- |
|
80 |
|
|
81 |
Bug fixes: |
|
82 |
|
|
83 |
- fixed the error message for hail multi-evacuation mode |
|
84 |
- improve evacuation mode for offline secondary nodes (ignore available |
|
85 |
memory) |
|
86 |
|
|
87 |
New features: |
|
88 |
|
|
89 |
- add a new option ``-S`` to hbal and hspace that saves the cluster |
|
90 |
state at the end of the processing in the text format used by the |
|
91 |
``-t`` option, for later re-processing |
|
92 |
- two new options to hbal, -g and --min-gain-limit, that should help |
|
93 |
in limiting the number of balancing steps with a low gain in the final |
|
94 |
stages |
|
95 |
- hbal, when executing jobs, will now wait for the current jobs to |
|
96 |
finish at the first stop (e.g. ^C); if the user wants immediate exit, |
|
97 |
another signal should be sent |
|
98 |
- added “normalized” physical CPU units in hspace output (NPU), which |
|
99 |
represents units of physical CPUs free/used, based on the max-cpu |
|
100 |
ratio |
|
101 |
|
|
102 |
|
|
103 |
Version 0.2.6 (Mon, 26 Jul 2010) |
|
104 |
-------------------------------- |
|
105 |
|
|
106 |
Exactly three months since the last release. Many internal changes, plus |
|
107 |
a couple of important changes in the balancing algorithm. |
|
108 |
|
|
109 |
First, the balancing may now introduce N+1 errors, if this solves other, |
|
110 |
more critical problems. For the moment, this means that moving instances |
|
111 |
away from offline nodes is allowed even if it creates N+1 errors, and |
|
112 |
that means evacuation can be done in more cases. |
|
113 |
|
|
114 |
Second, the scoring for N+1 has changed. In previous versions, it simply |
|
115 |
counted the number of failing N+1 nodes, which means moving an instance |
|
116 |
away from an N+1 failed node (but without the node 'clearing' the N+1 |
|
117 |
status) was not reflected in the cluster score. As such, the balancing |
|
118 |
algorithm managed to clear N+1 errors only sometimes, since usually it |
|
119 |
takes more than one move for this, and the first prerequisite move was |
|
120 |
not 'rewarded' appropriately and thus it was not selected. Now, it is |
|
121 |
possible to fix many more error cases than before: on a simulated 40 |
|
122 |
node cluster full with instances (symmetrically allocated on all nodes), |
|
123 |
around five nodes can be evacuated before N+1 errors can be solved, |
|
124 |
whereas 0.2.5 could evacuate at best one node. |
|
125 |
|
|
126 |
There were some other internal changes to the scoring algorithm, such |
|
127 |
that now the metrics have associated weights, and they are not all of |
|
128 |
the same importance anymore. As of now, the only change is that offline |
|
129 |
instances have a higher weight, which should favour proper node |
|
130 |
evacuations. |
|
131 |
|
|
132 |
Among the other changes: |
|
133 |
|
|
134 |
- fixed the hspace KM_POOL_* metrics, which were returned as the final |
|
135 |
state and not as the delta between the initial and final states |
|
136 |
- fixed hspace handling of N+1 failing clusters: before, it used to |
|
137 |
generate a 'fake' response, and the structure of this response was not |
|
138 |
always in sync with the real responses, leading to missing items; |
|
139 |
currently it proceeds correctly through the code (skipping the |
|
140 |
computation), and uses the same display mechanisms as the normal case |
|
141 |
- fixed hscan exit code for RAPI failures: previously it finished with |
|
142 |
success even if all the clusters failed, which was creating issues |
|
143 |
with the live-test script; now it exits with exit code 2 for RAPI |
|
144 |
failures (unfortunately this is still not optimal as LUXI failures |
|
145 |
will use exit code 1, the same as the command line) |
|
146 |
- changed the limit values for CPU/disk, which previously were used |
|
147 |
optionally, whereas now they are always used; the default cpu ratio |
|
148 |
limit is now 64 VCPUs per PCPU |
|
149 |
- changed the internal handling of the short name vs. original |
|
150 |
(Ganeti-provided) name; now internally we always use the full name, |
|
151 |
and only in display routines we show the shortened (called 'alias') |
|
152 |
name; as a result, the -O and --excluded-instances options now accept |
|
153 |
both the full name and the shortened name |
|
154 |
- changed internal handling of JSON conversions and errors, such that |
|
155 |
now we show a better context for failure messages, which should help |
|
156 |
with diagnosing the malformed message |
|
157 |
- changed the names for a few node fields, and added some more fields; |
|
158 |
this is most likely to help with debugging, and not with regular |
|
159 |
operation though |
|
160 |
- changed the node fields option to allow the '+' prefix to mean 'extend |
|
161 |
the default fields list' rather than start from fresh (similar to |
|
162 |
Ganeti's implementation) |
|
163 |
- a few internal changes related to the LUXI protocol implementation, |
|
164 |
which should make it more safe against potential bugs, one |
|
165 |
optiomization that should help with large messages, and some patches |
|
166 |
in preparation for potential expansion of the LUXI backend functionality |
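
As an illustration of the '+' prefix behaviour described in the list
above, here is a minimal Python sketch. The default field names and the
``parse_fields`` helper are hypothetical and chosen only for this
example; the actual htools field names and implementation may differ.

```python
# Hypothetical default field list; the real htools defaults may differ.
DEFAULT_FIELDS = ["name", "pmem", "pdsk"]


def parse_fields(arg, defaults=DEFAULT_FIELDS):
    """Return the fields selected by a comma-separated argument.

    A leading '+' extends the default list; without it, the argument
    replaces the defaults entirely.
    """
    if arg.startswith("+"):
        extra = [f for f in arg[1:].split(",") if f]
        return list(defaults) + extra
    return [f for f in arg.split(",") if f]


print(parse_fields("+status,pcpu"))  # defaults followed by the extras
print(parse_fields("name,status"))   # only the named fields
```

With the prefix, the user only names the additional columns; without
it, the list starts from scratch.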

And finally, many improvements to the unittests and the live-test
script. Test coverage is much enhanced, and the test infrastructure has
better error reporting; this should lead down the road to better code
and fewer bugs…


Version 0.2.5 (Mon, 26 Apr 2010)
--------------------------------

Some internal cleanup plus a few user-visible changes:

- new option for marking instances as 'do-not-move' during rebalancing
- allow ``hscan`` to scan the local cluster via Luxi
- add more metrics to ``hspace`` which show the delta between original
  state and final state better (only valid for tiered allocation)


Version 0.2.4 (Mon, 22 Feb 2010)
--------------------------------

Two improvements for node evacuation:

- hbal takes a new parameter ``--evac-mode`` that restricts the
  instances to be moved to the ones on offline/drained nodes, which
  should reduce the work done
- hail supports the new ``multi-evacuate`` mode of the IAllocator
  protocol, which will be released in a minor release on the Ganeti 2.1
  branch


Version 0.2.3 (Thu, 4 Feb 2010)
--------------------------------

A small release:

- Fixes selection of secondary node: previously, if the cluster had
  many N+1 failures, an N+1 failed node could be selected as secondary
  even if it did not have enough memory to allow the instance to be
  migrated/failed over to it; this is bad for automated tools, since
  we can leave the cluster in an unhealthy state
- Switch the text backend to a single input file, which is now
  generated by hscan and shouldn't be generated manually via
  gnt-node/instance list anymore; this allows richer information to be
  kept in the file, and slightly simplifies the internals of the text
  backend


Version 0.2.2 (Tue, 29 Dec 2009)
--------------------------------

Small release; 0.2.1 was broken, so this was released earlier than
planned:

- Release 0.2.1 broke the LUXI backend due to a typo, fixed
- Added a live-test script that should catch errors like the above one
  in the future (needs a working, non-empty cluster)
- Changed RAPI and LUXI backends to treat drained nodes as offline,
  similar to the IAllocator backend change in 0.2.0 (which was wrongly
  marked as affecting all backends)
- Changed the metrics for offline instances and N1 score from percent to
  count, in order to increase the priority of evacuations
- Added a new metric (offline primary instances) which should fix the
  evacuation of an offline node in a 2-node cluster


Version 0.2.1 (Wed, 2 Dec 2009)
--------------------------------

- Added instance exclusion defined via instance tags
- Fixed the output of hspace to be parseable from the shell again


Version 0.2.0 (Tue, 10 Nov 2009)
--------------------------------

A significant release, with a few new major features:

- Added direct execution of the hbal solution when using the Luxi
  backend; the steps for each instance move are submitted as a single
  job, and the different jobs are submitted as groups in order to
  parallelise the execution of moves
- Added support for balancing based on dynamic utilisation data for
  instances, fed in via a text file; by default, all instances are
  considered equal and this change also improves the equalisation of
  secondary instances per node
- Added support for tiered capacity calculation in hspace, where we
  start from a maximum instance spec and decrease the spec when we run
  out of resources; this should give a better measure of available
  capacity on 'fragmented' clusters; this is done separately from the
  current fixed-mode computation
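
The tiered capacity idea (start from a maximum spec and shrink it when
resources run out) can be sketched in Python as follows. This is only
an illustration under strong assumptions: the cluster is modelled as a
single pool of free resources, and the ``Spec`` type, the
``tiered_capacity`` function and the shrinking rule are hypothetical,
not the actual hspace algorithm (which works per node and honours N+1
constraints).

```python
from dataclasses import dataclass, replace


@dataclass
class Spec:
    mem: int   # memory units
    disk: int  # disk units
    cpu: int   # virtual CPUs


DIMS = ("mem", "disk", "cpu")


def tiered_capacity(free, start, step, minimum):
    """Place instances greedily, shrinking the spec when it no longer fits.

    Returns the list of specs that were placed, largest first.
    """
    for s in (start, step, minimum):
        if min(s.mem, s.disk, s.cpu) <= 0:
            raise ValueError("all spec values must be positive")
    free, spec = replace(free), replace(start)  # work on copies
    placed = []
    while True:
        short = [d for d in DIMS if getattr(spec, d) > getattr(free, d)]
        if not short:
            # The current spec fits: consume the resources and retry.
            for d in DIMS:
                setattr(free, d, getattr(free, d) - getattr(spec, d))
            placed.append(replace(spec))
            continue
        # Shrink only the exhausted dimensions that are above the minimum.
        shrinkable = [d for d in short
                      if getattr(spec, d) > getattr(minimum, d)]
        if not shrinkable:
            return placed
        for d in shrinkable:
            smaller = getattr(spec, d) - getattr(step, d)
            setattr(spec, d, max(getattr(minimum, d), smaller))


result = tiered_capacity(free=Spec(10, 100, 100), start=Spec(4, 10, 1),
                         step=Spec(2, 5, 1), minimum=Spec(2, 5, 1))
print([s.mem for s in result])
```

In this example memory is the scarce resource, so the memory size of
the placed instances decreases once the larger spec no longer fits,
while disk and CPU stay at their starting values.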

Also there have been many minor improvements:

- Added option for showing instances (“--print-instances”), similar to
  the print nodes option
- Added support for customising the node list via an argument to the
  print nodes option in the form of a comma-separated list of field
  names; currently the field names are not documented, expecting further
  changes in a future release
- Enhanced the error reporting in the Luxi and Rapi backends
- Changed the handling of drained nodes, now being treated the same as
  offline nodes, for Ganeti 2.0.4+ compatibility
- A number of internal changes, simplifying code and merging some
  disparate functions
- Simplified the build system in relation to creation of archives


Version 0.1.8 (Tue, 29 Sep 2009)
--------------------------------

- Brown-paper-bag release fixing haddock issues


Version 0.1.7 (Mon, 28 Sep 2009)
--------------------------------

- Fixed a bug in the Luxi backend for big responses
- Fixed test suite exit code in presence of test failures
- Changed the migrate operation to run a failover instead for instances
  which were marked as not running in the input data (the state could
  have changed since then, but this is better than the previous
  behaviour of always migrating)
- Added support for 'cheap' moves only (only migrate/failover) in
  balancing
- Added support for building without curl (thus no RAPI backend)


Version 0.1.6 (Wed, 19 Aug 2009)
--------------------------------

- Added support for Luxi (the native Ganeti protocol)
- Added support for simulated clusters (for hspace only)
- Added timeouts for the RAPI backend
- Fixed a few inconsistencies in the command line handling
- Fixed handling of errors while loading data
- The 'network' is a new dependency due to the Luxi addition


Version 0.1.5 (Thu, 09 Jul 2009)
--------------------------------

- Removed obsolete hn1 program; this allowed removal of a lot of
  supporting code
- Lots of changes in hspace: the output is now a shell fragment so that
  scripts can source or parse it more easily; added failure reasons;
  optimised to use less memory for large clusters
- Optimised the scoring algorithm (used by all tools) so that now
  computations should be faster


Version 0.1.4 (Tue, 16 Jun 2009)
--------------------------------

- Added CPU count/ratio of virtual-to-physical CPUs to the cluster
  scoring methods; this means that now the balancer, the iallocator
  plugin and so on will try to keep the VCPU-to-PCPU ratio equal across
  the cluster
- Fixed some hscan bugs
- Fixed the way iallocator reads the total disk size (was broken and it
  was always falling back to summing the disk sizes)
- Internals: fixed most compile-time warnings
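
To make the notion of an 'equal VCPU-to-PCPU ratio' concrete, the
following Python sketch measures how far the per-node ratios spread
apart. Using the population standard deviation as the spread measure is
an assumption made for this example, and the ``vcpu_ratio_spread``
helper is hypothetical; these notes do not spell out the exact formula
used by the scoring code.

```python
from statistics import pstdev


def vcpu_ratio_spread(nodes):
    """Return the spread of VCPU-to-PCPU ratios across nodes.

    `nodes` is a list of (vcpus, pcpus) pairs; a value of 0.0 means all
    nodes have exactly the same ratio.
    """
    ratios = [vcpus / pcpus for vcpus, pcpus in nodes]
    return pstdev(ratios)


balanced = [(8, 4), (16, 8)]   # both nodes run 2 VCPUs per PCPU
skewed = [(16, 4), (8, 8)]     # 4 VCPUs per PCPU vs. 1 VCPU per PCPU

print(vcpu_ratio_spread(balanced))
print(vcpu_ratio_spread(skewed))
```

A balancer that includes such a term in its score prefers the first
layout, since a lower spread means the CPU load is shared more evenly.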


Version 0.1.3 (Fri, 05 Jun 2009)
--------------------------------

- Fix a bug in the ReplacePrimary instance moves, affecting most of the
  tools


Version 0.1.2 (Tue, 02 Jun 2009)
--------------------------------

- Add a new program, “hspace”, which computes the free space on a
  cluster (based on a given instance spec)
- Improvements in API docs and partially in the user docs
- Started adding unittests


Version 0.1.1 (Tue, 26 May 2009)
--------------------------------

- Add a new program, “hail”, which is an iallocator plugin and can
  allocate/relocate instances
- Experimental support for non-mirrored instances (hail supports them,
  hbal should no longer abort when it finds such instances and should
  simply ignore them)
- The RAPI port and/or scheme can be overridden now, and even “file://”
  schemes can be used if the message body has been saved under the
  appropriate name
- Lots of code reorganization, especially the rewritten loading pipeline
- Better data checking and better error messages in case validation
  fails; tools now consider nodes with errors in the input data (‘?’
  returned by Ganeti) as offline
- Small enhancement to the makefile for simpler packaging


Version 0.1.0 (Tue, 19 May 2009)
--------------------------------

- Drop compatibility with Ganeti 1.2
- Add a new minimum score option (with a very low default), which should
  help with very good clusters (but is still not optimal)
- Add a --quiet option to hbal
- Add support for reading offline nodes directly from the cluster


Version 0.0.8 (Tue, 21 Apr 2009)
--------------------------------

- hbal: abort when a wrong node name is passed to -O, in order to
  prevent mismatches
- add the ability to write the commands (-C) to a script via (-C<file>),
  so that it can be later executed directly; this has also changed the
  commands to include the necessary -f flags to skip confirmations
- add checks for extra arguments in hbal and hn1, so that unintended
  errors are caught
- raise the accepted “missing” memory limit to 512MB, to cover usual Xen
  reservations


Version 0.0.7 (Mon, 23 Mar 2009)
--------------------------------

- added support for offline nodes, which are not used as targets for
  instance relocation; if they hold instances, the hbal algorithm will
  attempt to relocate these away
- added support for offline instances, which now will no longer skew the
  free memory estimation of nodes; the algorithm will no longer create
  conditions for N+1 failures when such instances are later started
- implemented a complete model of node resources, in order to prevent an
  unintended re-occurrence of cases like the offline instance where we
  miscalculated some node resource; this now gives a warning in case the
  node-reported free disk or free memory deviates by more than a set
  amount from the expected value
- a new tool *hscan* that can generate the input text-file for the other
  tools by collecting the data via RAPI
- some small changes to the build system to make it more friendly; also
  included the generated documentation in the source archive


Version 0.0.6 (Mon, 16 Mar 2009)
--------------------------------

- re-factored the hbal algorithm to make it stable in the sense that it
  gives the same solution when restarted from the middle; barring
  rounding of disk/memory and incomplete reporting from Ganeti (for
  1.2), it should be now feasible to rely on its output without
  generating moves ad infinitum
- the hbal algorithm now uses two more variables: the node N+1 failures
  and the amount of reserved memory; the former tries to ‘fix’
  the N+1 status, the latter tries to distribute secondaries more
  equally
- the hbal algorithm now uses two more moves at each step:
  replace+failover and failover+replace (besides the original failover,
  replace, and failover+replace+failover)
- slightly changed the build system to embed GIT version/tags into the
  binaries so that we know from which tree a binary was built,
  either via ‘--version’ or via “strings hbal|grep version”
- changed the solution list and in general the hbal output to be
  clearer by default, and changed “gnt-instance failover” to
  “gnt-instance migrate”
- added man pages for the two binaries


Version 0.0.5 (Mon, 09 Mar 2009)
--------------------------------

- a few small improvements for hbal (possibly undone by later changes),
  hbal is now quite a bit faster
- fix documentation building
- allow hbal to work on non N+1 compliant clusters, but without
  guarantees that the end cluster will be compliant; in any case, this
  should give a smaller number of nodes that are not compliant if the
  cluster state permits it
- strip common domain suffix from nodes and instances, so that output is
  shorter and hopefully clearer
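
The suffix stripping mentioned in the last item can be illustrated with
a short Python sketch. The ``common_suffix`` and ``strip_suffix``
helpers are hypothetical names for this example, and the rule of always
keeping at least the first label of each name is an assumption, not
necessarily what the tools do.

```python
def common_suffix(names):
    """Return the longest shared domain suffix (with leading dot)."""
    if not names:
        return ""
    # Split each name into labels, last label first.
    split = [name.split(".")[::-1] for name in names]
    common = []
    for labels in zip(*split):
        if len(set(labels)) != 1:
            break
        common.append(labels[0])
    # Assumption: never strip a whole name, keep at least one label.
    keep_limit = min(len(labels) for labels in split) - 1
    common = common[:keep_limit]
    return "." + ".".join(reversed(common)) if common else ""


def strip_suffix(names):
    """Return the names with the common domain suffix removed."""
    suffix = common_suffix(names)
    if not suffix:
        return list(names)
    return [name[:len(name) - len(suffix)] for name in names]


print(strip_suffix(["node1.example.com", "node2.example.com"]))
print(strip_suffix(["a.example.com", "b.other.com"]))
```

Comparing whole labels rather than single characters avoids cutting a
name in the middle of a label when two hosts merely end in the same
letters.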


Version 0.0.4 (Sun, 15 Feb 2009)
--------------------------------

- better balancing algorithm in hbal
- implemented an RAPI collector; now the cluster data can be gathered
  automatically via RAPI and doesn't need manual export of node and
  instance list


Version 0.0.3 (Wed, 28 Jan 2009)
--------------------------------

- initial release of hbal, a cluster rebalancing tool
- input data format changed due to hbal requirements


Version 0.0.2 (Tue, 06 Jan 2009)
--------------------------------

- fix handling of some common cases (cluster N+1 compliant from the
  start, too large a depth given, failure to compute solution)
- add option to print the needed command list for reaching the proposed
  solution


Version 0.0.1 (Tue, 06 Jan 2009)
--------------------------------

- initial release of the hn1 tool

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: