========================================
Automatized Upgrade Procedure for Ganeti
========================================

.. contents:: :depth: 4

This is a design document detailing the proposed changes to the
upgrade process, in order to allow it to be more automatic.


Current state and shortcomings
==============================

Ganeti requires the same version of Ganeti to be run on all
nodes of a cluster, and this requirement is unlikely to go away in the
foreseeable future. Also, the configuration may change between minor
versions (and in the past has proven to do so). This requires a quite
involved manual upgrade process of draining the queue, stopping
Ganeti, changing the binaries, upgrading the configuration, starting
Ganeti, distributing the configuration, and undraining the queue.


Proposed changes
================

While we will not remove the requirement of the same Ganeti
version running on all nodes, the transition from one version
to the other will be made more automatic. It will be possible
to install new binaries ahead of time, and the actual switch
between versions will be a single command.

While changing the file layout anyway, we install the python
code, which is architecture independent, under ``${PREFIX}/share``,
in a way that properly separates the Ganeti libraries of the
various versions.

Path changes to allow multiple versions installed
-------------------------------------------------

Currently, Ganeti installs to ``${PREFIX}/bin``, ``${PREFIX}/sbin``,
and so on, as well as to ``${pythondir}/ganeti``.

These paths will be changed in the following way.

- The python package will be installed to
  ``${PREFIX}/share/ganeti/${VERSION}/ganeti``.
  Here ``${VERSION}`` is, depending on configure options, either the fully
  qualified version number, consisting of major, minor, revision, and suffix,
  or it is just a major.minor pair. All python executables will be installed
  under ``${PREFIX}/share/ganeti/${VERSION}`` so that they see their respective
  Ganeti library. ``${PREFIX}/share/ganeti/default`` is a symbolic link to
  ``${sysconfdir}/ganeti/share`` which, in turn, is a symbolic link to
  ``${PREFIX}/share/ganeti/${VERSION}``. For all python executables (like
  ``gnt-cluster``, ``gnt-node``, etc.), symbolic links going through
  ``${PREFIX}/share/ganeti/default`` are added under ``${PREFIX}/sbin``.

- All other files will be installed to the corresponding path under
  ``${libdir}/ganeti/${VERSION}`` instead of under ``${PREFIX}``
  directly, where ``${libdir}`` defaults to ``${PREFIX}/lib``.
  ``${libdir}/ganeti/default`` will be a symlink to ``${sysconfdir}/ganeti/lib``
  which, in turn, is a symlink to ``${libdir}/ganeti/${VERSION}``.
  Symbolic links to the files installed under ``${libdir}/ganeti/${VERSION}``
  will be added under ``${PREFIX}/bin``, ``${PREFIX}/sbin``, and so on. These
  symbolic links will go through ``${libdir}/ganeti/default`` so that the
  version can easily be changed by updating the symbolic link in
  ``${sysconfdir}``.

The set of links for Ganeti binaries might change between versions.
However, as the file structure under ``${libdir}/ganeti/${VERSION}`` reflects
that of ``/``, two links of different versions will never conflict. Similarly,
the symbolic links for the python executables will never conflict, as they
always point to a file with the same basename directly under
``${PREFIX}/share/ganeti/default``. Therefore, each version will make sure that
enough symbolic links are present in ``${PREFIX}/bin``, ``${PREFIX}/sbin``, and
so on, even though some might be dangling if a different version of Ganeti is
currently active.
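
As an illustration of this argument, the Python sketch below computes the
target of each link purely from the link's own path. Because ``${VERSION}``
never enters the computation, every version that ships a given link agrees on
where it points. The function ``link_target`` and its parameters are
hypothetical, and the sketch assumes the default ``${libdir}`` of
``${PREFIX}/lib``.

```python
import posixpath


def link_target(installed_path: str, prefix: str = "/usr",
                python_executable: bool = False) -> str:
    """Return the target of the symlink installed at ``installed_path``.

    Hypothetical helper; assumes ${libdir} = ${PREFIX}/lib.
    """
    if python_executable:
        # Python executables point to a file with the same basename directly
        # under ${PREFIX}/share/ganeti/default.
        return posixpath.join(prefix, "share", "ganeti", "default",
                              posixpath.basename(installed_path))
    # Everything else points into ${libdir}/ganeti/default, whose tree
    # mirrors "/", so the absolute path is appended without its leading slash.
    libdir = posixpath.join(prefix, "lib")
    return posixpath.join(libdir, "ganeti", "default",
                          installed_path.lstrip("/"))


print(link_target("/usr/bin/harep"))
# /usr/lib/ganeti/default/usr/bin/harep
print(link_target("/usr/sbin/gnt-cluster", python_executable=True))
# /usr/share/ganeti/default/gnt-cluster
```

With ``/usr`` as ``${PREFIX}``, the two calls reproduce the ``harep`` and
``gnt-cluster`` links of the example layout for version 2.10.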

The extra indirection through ``${sysconfdir}`` allows installations that choose
to have ``${sysconfdir}`` and ``${localstatedir}`` outside ``${PREFIX}`` to
mount ``${PREFIX}`` read-only. Being able to mount ``${PREFIX}`` read-only is
important for systems that choose ``/usr`` as ``${PREFIX}`` and are following
the Filesystem Hierarchy Standard. For example, with ``/usr`` as ``${PREFIX}``
and ``/etc`` as ``${sysconfdir}``, the layout for version 2.10 will look as
follows.
::

   /
   |
   +-- etc
   |   |
   |   +-- ganeti
   |       |
   |       +-- lib -> /usr/lib/ganeti/2.10
   |       |
   |       +-- share -> /usr/share/ganeti/2.10
   |
   +-- usr
       |
       +-- bin
       |   |
       |   +-- harep -> /usr/lib/ganeti/default/usr/bin/harep
       |   |
       |   ...
       |
       +-- sbin
       |   |
       |   +-- gnt-cluster -> /usr/share/ganeti/default/gnt-cluster
       |   |
       |   ...
       |
       +-- ...
       |
       +-- lib
       |   |
       |   +-- ganeti
       |       |
       |       +-- default -> /etc/ganeti/lib
       |       |
       |       +-- 2.10
       |           |
       |           +-- usr
       |               |
       |               +-- bin
       |               |   |
       |               |   +-- htools
       |               |   |
       |               |   +-- harep -> htools
       |               |   |
       |               |   ...
       |               |
       |               ...
       |
       +-- share
           |
           +-- ganeti
               |
               +-- default -> /etc/ganeti/share
               |
               +-- 2.10
                   |
                   +-- gnt-cluster
                   |
                   +-- gnt-node
                   |
                   +-- ...
                   |
                   +-- ganeti
                       |
                       +-- backend.py
                       |
                       +-- ...
                       |
                       +-- cmdlib
                       |   |
                       |   ...
                       |
                       ...
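
To show how the indirection through ``${sysconfdir}`` turns a version switch
into the replacement of a single link, the following Python sketch builds a
miniature version of this layout inside a temporary directory and switches
``gnt-cluster`` from 2.10 to 2.11. The directory names mirror the example
above; version 2.11, the placeholder file contents, and the helper names
``build_layout`` and ``switch_version`` are made up for illustration.
Replacing the link through a temporary name and ``os.replace``, so that the
switch is atomic, is a choice of this sketch and not prescribed by the design.

```python
import os
import tempfile


def switch_version(sysconfdir: str, kind: str, target: str) -> None:
    """Atomically point ${sysconfdir}/ganeti/<kind> at ``target``."""
    link = os.path.join(sysconfdir, "ganeti", kind)
    tmp = link + ".new"
    if os.path.lexists(tmp):
        os.unlink(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, link)  # rename(2) over the old link is atomic on POSIX


def build_layout(root: str) -> str:
    """Create share/ganeti/{2.10,2.11}, the default links and one sbin link."""
    usr, etc = os.path.join(root, "usr"), os.path.join(root, "etc")
    share = os.path.join(usr, "share", "ganeti")
    for version in ("2.10", "2.11"):
        os.makedirs(os.path.join(share, version))
        with open(os.path.join(share, version, "gnt-cluster"), "w") as f:
            f.write("gnt-cluster from %s\n" % version)
    os.makedirs(os.path.join(etc, "ganeti"))
    os.makedirs(os.path.join(usr, "sbin"))
    # ${PREFIX}/share/ganeti/default -> ${sysconfdir}/ganeti/share
    os.symlink(os.path.join(etc, "ganeti", "share"),
               os.path.join(share, "default"))
    # ${PREFIX}/sbin/gnt-cluster -> ${PREFIX}/share/ganeti/default/gnt-cluster
    sbin_link = os.path.join(usr, "sbin", "gnt-cluster")
    os.symlink(os.path.join(share, "default", "gnt-cluster"), sbin_link)
    # Activate version 2.10 initially.
    switch_version(etc, "share", os.path.join(share, "2.10"))
    return sbin_link


with tempfile.TemporaryDirectory() as root:
    gnt_cluster = build_layout(root)
    with open(gnt_cluster) as f:
        print(f.read().strip())  # gnt-cluster from 2.10
    # Only the link in ${sysconfdir} changes; ${PREFIX} is left untouched.
    switch_version(os.path.join(root, "etc"), "share",
                   os.path.join(root, "usr", "share", "ganeti", "2.11"))
    with open(gnt_cluster) as f:
        print(f.read().strip())  # gnt-cluster from 2.11
```

Nothing under the simulated ``/usr`` is modified by the switch, which is what
allows ``${PREFIX}`` to stay read-only.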


gnt-cluster upgrade
-------------------

The actual upgrade process will be done by a new command ``upgrade`` to
``gnt-cluster``. Its option ``--to`` takes precisely one argument: the
version to upgrade (or downgrade) to, given as a full string with major,
minor, revision, and suffix. To be compatible with current configuration
upgrade and downgrade procedures, the new version must be of the same
major version and either an equal or higher minor version, or precisely
the previous minor version.
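
The version rule above can be sketched in Python as follows. The sketch
assumes that a full version string looks like ``2.10.1``, optionally followed
by a suffix such as ``~rc1``; the names ``Version``, ``parse_version``, and
``is_allowed_transition`` are hypothetical. "Equal or higher minor version"
also admits jump-upgrades across several minor versions, whereas a downgrade
is only accepted to precisely the previous minor version.

```python
import re
from typing import NamedTuple

# Assumed format: major.minor.revision followed by an optional suffix.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(.*)$")


class Version(NamedTuple):
    major: int
    minor: int
    revision: int
    suffix: str


def parse_version(text: str) -> Version:
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError("not a full version string: %r" % text)
    major, minor, revision, suffix = match.groups()
    return Version(int(major), int(minor), int(revision), suffix)


def is_allowed_transition(current: Version, target: Version) -> bool:
    if target.major != current.major:
        return False
    # Equal or higher minor version (including jump-upgrades).
    if target.minor >= current.minor:
        return True
    # Otherwise only a downgrade to precisely the previous minor version.
    return target.minor == current.minor - 1


print(is_allowed_transition(parse_version("2.10.1"), parse_version("2.9.3")))  # True
print(is_allowed_transition(parse_version("2.10.1"), parse_version("2.8.0")))  # False
```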

When executed, ``gnt-cluster upgrade --to=<version>`` will perform the
following actions.

- It verifies that the version to change to is installed on all nodes
  of the cluster that are not marked as offline. If this is not the
  case, it aborts with an error. This initial check is an
  optimization to allow for early feedback.

- An intent-to-upgrade file is created that contains the current
  version of Ganeti, the version to change to, and the process ID of
  the ``gnt-cluster upgrade`` process. The latter is not used automatically,
  but allows manual detection if the upgrade process died
  unintentionally. The intent-to-upgrade file is persisted to disk
  before continuing.

- The Ganeti job queue is drained, and the executable waits until there
  are no more jobs in the queue. Once :doc:`design-optables` is
  implemented, for upgrades, and only for upgrades, all jobs are paused
  instead (in the sense that the currently running opcode continues,
  but the next opcode is not started), and the upgrade continues once all
  jobs are fully paused.

- All Ganeti daemons on the master node are stopped.

- It is verified again that all nodes not marked as offline at this
  moment have the new version installed. If this is not the case,
  then all changes so far (stopping Ganeti daemons and draining the
  queue) are undone and failure is reported. This second verification
  is necessary, as the set of online nodes might have changed during
  the draining period.

- All Ganeti daemons on all remaining (non-offline) nodes are stopped.

- A backup of all Ganeti-related status information is created for
  manual rollbacks. While the normal way of rolling back after an
  upgrade should be calling ``gnt-cluster upgrade`` from the newer version
  with the older version as argument, a full backup provides an
  additional safety net, especially for jump-upgrades (skipping
  intermediate minor versions).

- If the action is a downgrade to the previous minor version, the
  configuration is downgraded now, using ``cfgupgrade --downgrade``.

- The ``${sysconfdir}/ganeti/lib`` and ``${sysconfdir}/ganeti/share``
  symbolic links are updated.

- If the action is an upgrade to a higher minor version, the configuration
  is upgraded now, using ``cfgupgrade``.

- All daemons are started on all nodes.

- ``ensure-dirs --full-run`` is run on all nodes.

- ``gnt-cluster redist-conf`` is run on the master node.

- All daemons are restarted on all nodes.

- The Ganeti job queue is undrained.

- The intent-to-upgrade file is removed.

- ``gnt-cluster verify`` is run and the result reported.
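
The requirement that the intent-to-upgrade file is persisted to disk before
continuing can be met with the usual write, ``fsync``, rename pattern,
sketched below in Python. The JSON encoding, the field names ``from``, ``to``,
and ``pid``, and the helper names are assumptions made for illustration; this
design does not specify the file format or its location.

```python
import json
import os
import tempfile
from typing import Optional


def write_intent_file(path: str, current_version: str,
                      target_version: str) -> None:
    """Atomically write the intent-to-upgrade file and persist it to disk."""
    record = {"from": current_version, "to": target_version,
              "pid": os.getpid()}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".intent-")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(record, handle)
            handle.flush()
            os.fsync(handle.fileno())  # file contents reach the disk first
        os.replace(tmp_path, path)     # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Also persist the directory entry created by the rename.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def read_intent_file(path: str) -> Optional[dict]:
    """Return the recorded intent, or None if no upgrade is in progress."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return None


def remove_intent_file(path: str) -> None:
    """Remove the intent file; a missing file is not an error."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
```

Because the record only becomes visible under its final name after the
rename, a crash leaves either no intent file or a complete one, never a
truncated file that a later resume would have to interpret.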


Considerations on unintended reboots of the master node
=======================================================

During the upgrade procedure, the only Ganeti process still running is
the one instance of ``gnt-cluster upgrade``. This process is also responsible
for eventually removing the queue drain. Therefore, we have to provide
means to resume this process if it dies unintentionally. The process
itself will handle SIGTERM gracefully by either undoing all changes
done so far, or by ignoring the signal altogether and continuing to
the end; the choice between these behaviors depends on whether the change
of the configuration has already started (in which case it goes
through to the end) or not (in which case the actions done so far are
rolled back).
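
A minimal Python sketch of this SIGTERM policy follows. The class
``UpgradeDriver`` and its split of the procedure into steps before and after
the configuration change are hypothetical; each step is a placeholder
``(name, do, undo)`` tuple rather than an actual Ganeti action. The signal
handler only records the request, and the rollback runs at the next step
boundary, since undoing work from inside a signal handler would interrupt a
step halfway.

```python
import os
import signal
from typing import Callable, List, Optional, Tuple

# (name, do, undo); undo is None for steps that need no rollback.
Step = Tuple[str, Callable[[], None], Optional[Callable[[], None]]]


class UpgradeDriver:
    """Hypothetical driver illustrating only the SIGTERM policy."""

    def __init__(self, pre_config: List[Step], post_config: List[Step]):
        self.pre_config = pre_config    # steps before the configuration changes
        self.post_config = post_config  # configuration change and later steps
        self.config_change_started = False
        self.rollback_requested = False

    def _on_sigterm(self, signum, frame):
        if not self.config_change_started:
            # Only record the request; it is acted upon between steps.
            self.rollback_requested = True
        # Once the configuration change has started, the signal is ignored.

    def _rollback(self, undos):
        for undo in reversed(undos):
            if undo is not None:
                undo()

    def run(self) -> bool:
        """Return True if the upgrade completed, False if it was rolled back."""
        previous = signal.signal(signal.SIGTERM, self._on_sigterm)
        undos = []
        try:
            for _name, do, undo in self.pre_config:
                if self.rollback_requested:
                    self._rollback(undos)
                    return False
                do()
                undos.append(undo)
            if self.rollback_requested:
                self._rollback(undos)
                return False
            self.config_change_started = True  # point of no return
            for _name, do, _undo in self.post_config:
                do()
            return True
        finally:
            signal.signal(signal.SIGTERM, previous)


if __name__ == "__main__":
    log = []

    def drain_then_get_killed():
        log.append("drain")
        os.kill(os.getpid(), signal.SIGTERM)  # simulate a SIGTERM mid-upgrade

    driver = UpgradeDriver(
        pre_config=[("drain", drain_then_get_killed,
                     lambda: log.append("undrain"))],
        post_config=[("cfgupgrade", lambda: log.append("cfgupgrade"), None)])
    print(driver.run(), log)
```

A SIGTERM arriving between the last check and the start of the configuration
change is simply ignored, so the run continues to the end; that is the other
of the two behaviors the design allows.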

To achieve this, ``gnt-cluster upgrade`` will support a ``--resume``
option. It is recommended to have ``gnt-cluster upgrade --resume`` as an
at-reboot task in the crontab. The ``gnt-cluster upgrade --resume`` command
first verifies that it is running on the master node, using the same
requirement as for starting the master daemon, i.e., confirmed by a
majority of all nodes. If it is not the master node, it will remove any
possibly existing intent-to-upgrade file and exit. If it is running on the
master node, it will check for the existence of an intent-to-upgrade
file. If no such file is found, it will simply exit. If found, it will
resume at the appropriate stage.

- If the configuration file is still at the initial version,
  ``gnt-cluster upgrade`` is resumed at the step immediately following the
  writing of the intent-to-upgrade file. It should be noted that
  all steps before changing the configuration are idempotent, so
  redoing them does not do any harm.

- If the configuration is already at the new version, all daemons on
  all nodes are stopped (as they might have been started again due
  to a reboot), and then it is resumed at the step immediately
  following the configuration change. All actions following the
  configuration change can be repeated without bringing the cluster
  into a worse state.
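
The decision made by ``gnt-cluster upgrade --resume`` can be summarized as a
small Python function. It is a sketch under assumptions: the intent record is
a dict with hypothetical ``from`` and ``to`` version strings, the
configuration version is compared as a ``(major, minor)`` pair, and
``ResumeAction``, ``config_version_of``, and ``decide_resume`` are
hypothetical names. Determining mastership (by majority confirmation) is
outside the sketch and passed in as a flag.

```python
from enum import Enum
from typing import Optional, Tuple


class ResumeAction(Enum):
    REMOVE_INTENT_AND_EXIT = "not master: remove intent file and exit"
    EXIT = "no intent file: nothing to resume"
    RESUME_AFTER_INTENT = "redo the idempotent steps after the intent file"
    RESUME_AFTER_CONFIG_CHANGE = "stop all daemons, resume after config change"


def config_version_of(version: str) -> Tuple[int, int]:
    """Map a full version string such as "2.10.1" to its (major, minor) pair."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def decide_resume(is_master: bool, intent: Optional[dict],
                  config_version: Tuple[int, int]) -> ResumeAction:
    if not is_master:
        return ResumeAction.REMOVE_INTENT_AND_EXIT
    if intent is None:
        return ResumeAction.EXIT
    # Checked first (a choice of this sketch), so a transition within one
    # minor version, where both configuration versions are equal, redoes
    # the idempotent early steps.
    if config_version == config_version_of(intent["from"]):
        return ResumeAction.RESUME_AFTER_INTENT
    if config_version == config_version_of(intent["to"]):
        return ResumeAction.RESUME_AFTER_CONFIG_CHANGE
    raise RuntimeError("configuration version %s matches neither %s nor %s"
                       % (config_version, intent["from"], intent["to"]))
```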


Caveats
=======

Since ``gnt-cluster upgrade`` drains the queue and undrains it later, any
information about a previous drain gets lost. This problem will
disappear once :doc:`design-optables` is implemented, as the
undrain will then be restricted to filters set by ``gnt-cluster upgrade``.


Requirement of job queue update
===============================

Since for upgrades we only pause jobs and do not fully drain the
queue, we need to be able to transform the job queue into a queue for
the new version. The preferred way to obtain this is to keep the
serialization format backwards compatible, i.e., only adding new
opcodes and new optional fields.
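
To illustrate what keeping the serialization format backwards compatible
means, the Python sketch below loads an opcode serialized by an older
version: a field added later must have a default, so its absence is not an
error. The opcode class ``OpHypotheticalStart``, its fields, and the helpers
``load_opcode`` and ``dump_opcode`` are hypothetical and do not reflect
Ganeti's actual opcode definitions or serialization code.

```python
import json
from dataclasses import MISSING, asdict, dataclass, fields


@dataclass
class OpHypotheticalStart:
    """Hypothetical opcode; ``paused`` stands for a field added later."""
    instance_name: str
    force: bool = False
    paused: bool = False  # new optional field, so it needs a default


def load_opcode(cls, serialized: str):
    data = json.loads(serialized)
    known = {f.name for f in fields(cls)}
    unknown = set(data) - known
    if unknown:
        # This version cannot know what a newer field means.
        raise ValueError("unknown fields: %s" % sorted(unknown))
    missing = [f.name for f in fields(cls)
               if f.name not in data and f.default is MISSING
               and f.default_factory is MISSING]
    if missing:
        raise ValueError("missing required fields: %s" % missing)
    # Absent optional fields take their defaults.
    return cls(**data)


def dump_opcode(op) -> str:
    return json.dumps(asdict(op), sort_keys=True)


old = json.dumps({"instance_name": "instance1.example.com", "force": True})
op = load_opcode(OpHypotheticalStart, old)
print(dump_opcode(op))
# {"force": true, "instance_name": "instance1.example.com", "paused": false}
```

Rejecting unknown fields is a choice of this sketch: compatibility only has
to hold in the direction of the new version reading the old queue.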

However, even with soft drain, no job is running at the moment ``cfgupgrade``
is running. So, if we change the queue representation, including the
representation of individual opcodes in any way, ``cfgupgrade`` will also
modify the queue accordingly. In a jobs-as-processes world, pausing a job
will be implemented in such a way that the corresponding process stops after
finishing the current opcode, and a new process is created if and when the
job is unpaused again.