========================================
Automatized Upgrade Procedure for Ganeti
========================================

.. contents:: :depth: 4

This is a design document detailing the proposed changes to the
upgrade process, in order to allow it to be more automatic.


Current state and shortcomings
==============================

Ganeti requires the same version of Ganeti to run on all nodes of a
cluster and this requirement is unlikely to go away in the
foreseeable future. Also, the configuration may change between minor
versions (and in the past has proven to do so). This requires a quite
involved manual upgrade process of draining the queue, stopping
ganeti, changing the binaries, upgrading the configuration, starting
ganeti, distributing the configuration, and undraining the queue.


Proposed changes
================

While we will not remove the requirement of the same Ganeti
version running on all nodes, the transition from one version
to the other will be made more automatic. It will be possible
to install new binaries ahead of time, and the actual switch
between versions will be a single command.

While changing the file layout anyway, we install the python
code, which is architecture independent, under ``${prefix}/share``,
in a way that properly separates the Ganeti libraries of the
various versions.

Path changes to allow multiple versions installed
-------------------------------------------------

Currently, Ganeti installs to ``${PREFIX}/bin``, ``${PREFIX}/sbin``,
and so on, as well as to ``${pythondir}/ganeti``.

These paths will be changed in the following way.

- The python package will be installed to
  ``${PREFIX}/share/ganeti/${VERSION}/ganeti``.
  Here ``${VERSION}`` is, depending on configure options, either the fully
  qualified version number, consisting of major, minor, revision, and
  suffix, or just a major.minor pair. All python executables will be
  installed under ``${PREFIX}/share/ganeti/${VERSION}`` so that they see
  their respective Ganeti library. ``${PREFIX}/share/ganeti/default`` is a
  symbolic link to ``${sysconfdir}/ganeti/share`` which, in turn, is a
  symbolic link to ``${PREFIX}/share/ganeti/${VERSION}``. For all python
  executables (like ``gnt-cluster``, ``gnt-node``, etc) symbolic links
  going through ``${PREFIX}/share/ganeti/default`` are added under
  ``${PREFIX}/sbin``.

- All other files will be installed to the corresponding path under
  ``${libdir}/ganeti/${VERSION}`` instead of under ``${PREFIX}``
  directly, where ``${libdir}`` defaults to ``${PREFIX}/lib``.
  ``${libdir}/ganeti/default`` will be a symlink to
  ``${sysconfdir}/ganeti/lib`` which, in turn, is a symlink to
  ``${libdir}/ganeti/${VERSION}``.
  Symbolic links to the files installed under ``${libdir}/ganeti/${VERSION}``
  will be added under ``${PREFIX}/bin``, ``${PREFIX}/sbin``, and so on. These
  symbolic links will go through ``${libdir}/ganeti/default`` so that the
  version can easily be changed by updating the symbolic link in
  ``${sysconfdir}``.
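
The reason the co-installed python executables see their respective
Ganeti library is simply Python's lookup order: when running a script,
the interpreter puts the script's own directory first on ``sys.path``,
so ``import ganeti`` resolves to the package installed next to the
executable. A small sketch of the idea; the helper name is an
assumption, not actual Ganeti startup code:

```python
import os
import sys

def setup_version_path(argv0):
    """Ensure the executable's own directory is searched first, so
    ``import ganeti`` finds the co-installed library of that version.

    Python already places the script directory at ``sys.path[0]`` when
    running a script directly, so an explicit step like this is only
    needed for unusual invocation modes. Illustrative sketch only.
    """
    here = os.path.dirname(os.path.abspath(argv0))
    if here not in sys.path:
        sys.path.insert(0, here)
    return here
```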

The set of links for ganeti binaries might change between the versions.
However, as the file structure under ``${libdir}/ganeti/${VERSION}`` reflects
that of ``/``, two links of different versions will never conflict. Similarly,
the symbolic links for the python executables will never conflict, as they
always point to a file with the same basename directly under
``${PREFIX}/share/ganeti/default``. Therefore, each version will make sure that
enough symbolic links are present in ``${PREFIX}/bin``, ``${PREFIX}/sbin`` and
so on, even though some might be dangling, if a different version of ganeti is
currently active.

The extra indirection through ``${sysconfdir}`` allows installations
that choose to have ``${sysconfdir}`` and ``${localstatedir}`` outside
``${PREFIX}`` to mount ``${PREFIX}`` read-only. The latter is important
for systems that choose ``/usr`` as ``${PREFIX}`` and are following the
Filesystem Hierarchy Standard. For example, choosing ``/usr`` as
``${PREFIX}`` and ``/etc`` as ``${sysconfdir}``, the layout for version
2.10 will look as follows.

::

  /
  |
  +-- etc
  |   |
  |   +-- ganeti
  |       |
  |       +-- lib -> /usr/lib/ganeti/2.10
  |       |
  |       +-- share -> /usr/share/ganeti/2.10
  +-- usr
      |
      +-- bin
      |   |
      |   +-- harep -> /usr/lib/ganeti/default/usr/bin/harep
      |   |
      |   ...
      |
      +-- sbin
      |   |
      |   +-- gnt-cluster -> /usr/share/ganeti/default/gnt-cluster
      |   |
      |   ...
      |
      +-- ...
      |
      +-- lib
      |   |
      |   +-- ganeti
      |       |
      |       +-- default -> /etc/ganeti/lib
      |       |
      |       +-- 2.10
      |           |
      |           +-- usr
      |               |
      |               +-- bin
      |               |   |
      |               |   +-- htools
      |               |   |
      |               |   +-- harep -> htools
      |               |   |
      |               |   ...
      |               ...
      |
      +-- share
          |
          +-- ganeti
              |
              +-- default -> /etc/ganeti/share
              |
              +-- 2.10
                  |
                  +-- gnt-cluster
                  |
                  +-- gnt-node
                  |
                  +-- ...
                  |
                  +-- ganeti
                      |
                      +-- backend.py
                      |
                      +-- ...
                      |
                      +-- cmdlib
                      |   |
                      |   ...
                      ...
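
With this layout, activating another installed version amounts to
repointing the two symbolic links under ``${sysconfdir}/ganeti``. A
minimal sketch of such an atomic repointing in Python; the helper name
and its arguments are illustrative assumptions, not the actual Ganeti
code:

```python
import os

def switch_version(version, sysconfdir="/etc/ganeti",
                   prefix="/usr", libdir="/usr/lib"):
    """Repoint the ``lib`` and ``share`` links to another version.

    Hypothetical helper; paths follow the example layout above.
    """
    for link, target in [
            (os.path.join(sysconfdir, "lib"),
             os.path.join(libdir, "ganeti", version)),
            (os.path.join(sysconfdir, "share"),
             os.path.join(prefix, "share", "ganeti", version)),
    ]:
        # Create the new link under a temporary name and rename it
        # over the old one; rename is atomic on POSIX filesystems, so
        # the link never appears half-updated.
        tmp = link + ".new"
        if os.path.lexists(tmp):
            os.remove(tmp)
        os.symlink(target, tmp)
        os.rename(tmp, link)
```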


gnt-cluster upgrade
-------------------

The actual upgrade process will be done by a new command ``upgrade`` to
``gnt-cluster``. It is called with the option ``--to``, which takes
precisely one argument: the version to upgrade (or downgrade) to, given
as a full string with major, minor, revision, and suffix. To be
compatible with current configuration upgrade and downgrade procedures,
the new version must be of the same major version and either an equal
or higher minor version, or precisely the previous minor version.
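
The version constraint above can be sketched as a small predicate; the
function name and the (major, minor) pair representation are
assumptions for illustration:

```python
def upgrade_allowed(current, target):
    """Check whether ``gnt-cluster upgrade --to`` may switch versions.

    ``current`` and ``target`` are (major, minor) pairs; revision and
    suffix do not matter for this check. Hypothetical helper only.
    """
    cur_major, cur_minor = current
    new_major, new_minor = target
    if new_major != cur_major:
        # The major version must stay the same.
        return False
    # Allowed: an equal or higher minor version, or precisely the
    # previous minor version (for downgrades).
    return new_minor >= cur_minor or new_minor == cur_minor - 1
```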

When executed, ``gnt-cluster upgrade --to=<version>`` will perform the
following actions.

- It verifies that the version to change to is installed on all nodes
  of the cluster that are not marked as offline. If this is not the
  case it aborts with an error. This initial testing is an
  optimization to allow for early feedback.

- An intent-to-upgrade file is created that contains the current
  version of ganeti, the version to change to, and the process ID of
  the ``gnt-cluster upgrade`` process. The latter is not used
  automatically, but allows manual detection if the upgrade process
  died unintentionally. The intent-to-upgrade file is persisted to
  disk before continuing.

- The Ganeti job queue is drained, and the executable waits till there
  are no more jobs in the queue. Once :doc:`design-optables` is
  implemented, for upgrades, and only for upgrades, all jobs are paused
  instead (in the sense that the currently running opcode continues,
  but the next opcode is not started) and it is continued once all
  jobs are fully paused.

- All ganeti daemons on the master node are stopped.

- It is verified again that all nodes at this moment not marked as
  offline have the new version installed. If this is not the case,
  then all changes so far (stopping ganeti daemons and draining the
  queue) are undone and failure is reported. This second verification
  is necessary, as the set of online nodes might have changed during
  the draining period.

- All ganeti daemons on all remaining (non-offline) nodes are stopped.

- A backup of all Ganeti-related status information is created for
  manual rollbacks. While the normal way of rolling back after an
  upgrade should be calling ``gnt-cluster upgrade`` from the newer
  version with the older version as argument, a full backup provides
  an additional safety net, especially for jump-upgrades (skipping
  intermediate minor versions).

- If the action is a downgrade to the previous minor version, the
  configuration is downgraded now, using ``cfgupgrade --downgrade``.

- The ``${sysconfdir}/ganeti/lib`` and ``${sysconfdir}/ganeti/share``
  symbolic links are updated.

- If the action is an upgrade to a higher minor version, the
  configuration is upgraded now, using ``cfgupgrade``.

- All daemons are started on all nodes.

- ``ensure-dirs --full-run`` is run on all nodes.

- ``gnt-cluster redist-conf`` is run on the master node.

- All daemons are restarted on all nodes.

- The Ganeti job queue is undrained.

- The intent-to-upgrade file is removed.

- ``gnt-cluster verify`` is run and the result reported.
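
The intent-to-upgrade file written early in the sequence above must
survive a crash, so it should reach the disk before the upgrade
proceeds. A sketch of how such a record might be persisted; the JSON
layout, the path handling, and the write-then-rename idiom are
illustrative assumptions, not the actual Ganeti format:

```python
import json
import os

def write_intent_file(path, current_version, target_version):
    """Persist the intent-to-upgrade record before touching the cluster.

    Illustrative sketch: the field layout is an assumption. What
    matters is that the record is fsync'ed to disk before the upgrade
    continues, so a crashed upgrade can later be detected and resumed.
    """
    record = {
        "current": current_version,
        "target": target_version,
        "pid": os.getpid(),  # for manual inspection only
    }
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(record, f)
        f.flush()
        os.fsync(f.fileno())  # force the data onto the disk
    os.rename(tmp, path)  # atomically publish the complete record
```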


Considerations on unintended reboots of the master node
=======================================================

During the upgrade procedure, the only ganeti process still running is
the one instance of ``gnt-cluster upgrade``. This process is also
responsible for eventually removing the queue drain. Therefore, we have
to provide means to resume this process, if it dies unintentionally.
The process itself will handle SIGTERM gracefully by either undoing all
changes done so far, or by ignoring the signal altogether and
continuing to the end; the choice between these behaviors depends on
whether the change of the configuration has already started (in which
case it goes through to the end), or not (in which case the actions
done so far are rolled back).

To achieve this, ``gnt-cluster upgrade`` will support a ``--resume``
option. It is recommended to have ``gnt-cluster upgrade --resume`` as
an at-reboot task in the crontab. The ``gnt-cluster upgrade --resume``
command first verifies that it is running on the master node, using the
same requirement as for starting the master daemon, i.e., confirmed by
a majority of all nodes. If it is not the master node, it will remove
any possibly existing intent-to-upgrade file and exit. If it is running
on the master node, it will check for the existence of an
intent-to-upgrade file. If no such file is found, it will simply exit.
If found, it will resume at the appropriate stage.

- If the configuration file still is at the initial version,
  ``gnt-cluster upgrade`` is resumed at the step immediately following
  the writing of the intent-to-upgrade file. It should be noted that
  all steps before changing the configuration are idempotent, so
  redoing them does not do any harm.

- If the configuration is already at the new version, all daemons on
  all nodes are stopped (as they might have been started again due
  to a reboot) and then it is resumed at the step immediately
  following the configuration change. All actions following the
  configuration change can be repeated without bringing the cluster
  into a worse state.
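
The resume decision amounts to a small dispatch on two pieces of
state: where the command runs, and whether the configuration has
already been converted. The function and the symbolic action names
below are illustrative assumptions, not the real implementation:

```python
def resume_action(is_master, intent_exists, config_version,
                  initial_version):
    """Decide what ``gnt-cluster upgrade --resume`` should do.

    Returns a symbolic action name; hypothetical helper sketching the
    logic described above.
    """
    if not is_master:
        # A non-master node only cleans up any stale intent file.
        return "remove-intent-and-exit"
    if not intent_exists:
        return "exit"  # nothing to resume
    if config_version == initial_version:
        # Configuration untouched: redo the idempotent early steps.
        return "resume-before-config-change"
    # Configuration already converted: stop daemons everywhere, then
    # continue with the steps after the configuration change.
    return "resume-after-config-change"
```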


Caveats
=======

Since ``gnt-cluster upgrade`` drains the queue and undrains it later,
any information about a previous drain gets lost. This problem will
disappear once :doc:`design-optables` is implemented, as the undrain
will then be restricted to the filters added by ``gnt-cluster
upgrade``.


Requirement of job queue update
===============================

Since for upgrades we only pause jobs and do not fully drain the
queue, we need to be able to transform the job queue into a queue for
the new version. The preferred way to achieve this is to keep the
serialization format backwards compatible, i.e., only adding new
opcodes and new optional fields.
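
Backwards compatibility of this kind is commonly achieved by giving
every optional field a default and tolerating unknown keys when
deserializing. A minimal sketch of the idea; the function and field
names are assumptions, not the actual Ganeti opcode code:

```python
def load_opcode(data, known_fields):
    """Deserialize an opcode dict in a version-tolerant way.

    ``known_fields`` maps field names to default values. Missing
    optional fields (written by an older version) get their defaults;
    unknown keys (written by a newer version) are kept verbatim.
    Illustrative sketch only.
    """
    opcode = dict(known_fields)  # start from the defaults
    for key, value in data.items():
        opcode[key] = value      # serialized values win over defaults
    return opcode
```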

However, even with soft drain, no job is running at the moment
`cfgupgrade` is running. So, if we change the queue representation,
including the representation of individual opcodes in any way,
`cfgupgrade` will also modify the queue accordingly. In a
jobs-as-processes world, pausing a job will be implemented in such a
way that the corresponding process stops after finishing the current
opcode, and a new process is created if and when the job is unpaused
again.