========================================
Automatized Upgrade Procedure for Ganeti
========================================

.. contents:: :depth: 4

This is a design document detailing the proposed changes to the
upgrade process, in order to allow it to be more automatic.


Current state and shortcomings
==============================

Ganeti requires the same version of Ganeti to be run on all nodes of a
cluster, and this requirement is unlikely to go away in the
foreseeable future. Also, the configuration may change between minor
versions (and in the past has proven to do so). This requires a quite
involved manual upgrade process: draining the queue, stopping ganeti,
changing the binaries, upgrading the configuration, starting ganeti,
distributing the configuration, and undraining the queue.


Proposed changes
================

While we will not remove the requirement of the same Ganeti
version running on all nodes, the transition from one version
to the other will be made more automatic. It will be possible
to install new binaries ahead of time, and the actual switch
between versions will be a single command.

While changing the file layout anyway, we install the python
code, which is architecture independent, under ``${PREFIX}/share``,
in a way that properly separates the Ganeti libraries of the
various versions.

Path changes to allow multiple versions installed
-------------------------------------------------

Currently, Ganeti installs to ``${PREFIX}/bin``, ``${PREFIX}/sbin``,
and so on, as well as to ``${pythondir}/ganeti``.

These paths will be changed in the following way.

- The python package will be installed to
  ``${PREFIX}/share/ganeti/${VERSION}/ganeti``.
  Here ``${VERSION}`` is, depending on configure options, either the fully
  qualified version number, consisting of major, minor, revision, and suffix,
  or just a major.minor pair. All python executables will be installed under
  ``${PREFIX}/share/ganeti/${VERSION}`` so that they see their respective
  Ganeti library. ``${PREFIX}/share/ganeti/default`` is a symbolic link to
  ``${sysconfdir}/ganeti/share`` which, in turn, is a symbolic link to
  ``${PREFIX}/share/ganeti/${VERSION}``. For all python executables (like
  ``gnt-cluster``, ``gnt-node``, etc.) symbolic links going through
  ``${PREFIX}/share/ganeti/default`` are added under ``${PREFIX}/sbin``.

- All other files will be installed to the corresponding path under
  ``${libdir}/ganeti/${VERSION}`` instead of under ``${PREFIX}``
  directly, where ``${libdir}`` defaults to ``${PREFIX}/lib``.
  ``${libdir}/ganeti/default`` will be a symlink to ``${sysconfdir}/ganeti/lib``
  which, in turn, is a symlink to ``${libdir}/ganeti/${VERSION}``.
  Symbolic links to the files installed under ``${libdir}/ganeti/${VERSION}``
  will be added under ``${PREFIX}/bin``, ``${PREFIX}/sbin``, and so on. These
  symbolic links will go through ``${libdir}/ganeti/default`` so that the
  version can easily be changed by updating the symbolic link in
  ``${sysconfdir}``.

The set of links for ganeti binaries might change between versions.
However, as the file structure under ``${libdir}/ganeti/${VERSION}`` reflects
that of ``/``, two links of different versions will never conflict. Similarly,
the symbolic links for the python executables will never conflict, as they
always point to a file with the same basename directly under
``${PREFIX}/share/ganeti/default``. Therefore, each version will make sure
that enough symbolic links are present in ``${PREFIX}/bin``,
``${PREFIX}/sbin`` and so on, even though some might be dangling if a
different version of ganeti is currently active.

The extra indirection through ``${sysconfdir}`` allows installations that
choose to have ``${sysconfdir}`` and ``${localstatedir}`` outside ``${PREFIX}``
to mount ``${PREFIX}`` read-only. The latter is important for systems that
choose ``/usr`` as ``${PREFIX}`` and follow the Filesystem Hierarchy Standard.
For example, choosing ``/usr`` as ``${PREFIX}`` and ``/etc`` as
``${sysconfdir}``, the layout for version 2.10 will look as follows.
::

   /
   |
   +-- etc
   |   |
   |   +-- ganeti
   |         |
   |         +-- lib -> /usr/lib/ganeti/2.10
   |         |
   |         +-- share -> /usr/share/ganeti/2.10
   +-- usr
        |
        +-- bin
        |   |
        |   +-- harep -> /usr/lib/ganeti/default/usr/bin/harep
        |   |
        |   ...
        |
        +-- sbin
        |   |
        |   +-- gnt-cluster -> /usr/share/ganeti/default/gnt-cluster
        |   |
        |   ...
        |
        +-- ...
        |
        +-- lib
        |   |
        |   +-- ganeti
        |       |
        |       +-- default -> /etc/ganeti/lib
        |       |
        |       +-- 2.10
        |           |
        |           +-- usr
        |               |
        |               +-- bin
        |               |    |
        |               |    +-- htools
        |               |    |
        |               |    +-- harep -> htools
        |               |    |
        |               |    ...
        |               ...
        |
        +-- share
             |
             +-- ganeti
                 |
                 +-- default -> /etc/ganeti/share
                 |
                 +-- 2.10
                     |
                     +-- gnt-cluster
                     |
                     +-- gnt-node
                     |
                     +-- ...
                     |
                     +-- ganeti
                          |
                          +-- backend.py
                          |
                          +-- ...
                          |
                          +-- cmdlib
                          |   |
                          |   ...
                          ...
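With this layout, switching the active version amounts to repointing the two
symbolic links under ``${sysconfdir}/ganeti``. A minimal sketch of that
switch, using a throwaway directory in place of the real ``/etc`` and
``/usr`` (the paths and version numbers are illustrative only):

```shell
#!/bin/sh
set -e

# Throwaway root standing in for the real filesystem.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/etc/ganeti" \
         "$ROOT/usr/lib/ganeti/2.10" "$ROOT/usr/lib/ganeti/2.11" \
         "$ROOT/usr/share/ganeti/2.10" "$ROOT/usr/share/ganeti/2.11"

# Initial state: version 2.10 is active.
ln -s "$ROOT/usr/lib/ganeti/2.10"   "$ROOT/etc/ganeti/lib"
ln -s "$ROOT/usr/share/ganeti/2.10" "$ROOT/etc/ganeti/share"

# Switching to 2.11 only touches the two links in ${sysconfdir};
# ${PREFIX} itself can stay mounted read-only (-n replaces the link
# itself instead of creating a link inside the target directory).
ln -sfn "$ROOT/usr/lib/ganeti/2.11"   "$ROOT/etc/ganeti/lib"
ln -sfn "$ROOT/usr/share/ganeti/2.11" "$ROOT/etc/ganeti/share"

readlink "$ROOT/etc/ganeti/lib"
```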


gnt-cluster upgrade
-------------------

The actual upgrade process will be done by a new ``gnt-cluster``
command, ``upgrade``. It is invoked with the option ``--to``, which
takes precisely one argument: the version to upgrade (or downgrade)
to, given as a full string with major, minor, revision, and suffix.
To be compatible with current configuration upgrade and downgrade
procedures, the new version must be of the same major version and
either an equal or higher minor version, or precisely the previous
minor version.
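The version constraint above can be made precise with a small check; the
following is a sketch only (the function name and the tuple representation of
versions are illustrative, not Ganeti's actual code):

```python
def allowed_transition(current, target):
    """Check the gnt-cluster upgrade version constraint.

    Versions are (major, minor) pairs. The target must share the major
    version and have either an equal or higher minor version (upgrade),
    or precisely the previous minor version (downgrade).
    """
    (cur_major, cur_minor), (tgt_major, tgt_minor) = current, target
    if cur_major != tgt_major:
        return False
    return tgt_minor >= cur_minor or tgt_minor == cur_minor - 1
```

Under this rule, 2.10 may move to 2.10, 2.11, or back to 2.9, but not to 2.8
or to any 3.x version.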

When executed, ``gnt-cluster upgrade --to=<version>`` will perform the
following actions.

- It verifies that the version to change to is installed on all nodes
  of the cluster that are not marked as offline. If this is not the
  case, it aborts with an error. This initial testing is an
  optimization to allow for early feedback.

- An intent-to-upgrade file is created that contains the current
  version of ganeti, the version to change to, and the process ID of
  the ``gnt-cluster upgrade`` process. The latter is not used automatically,
  but allows manual detection of an upgrade process that died
  unintentionally. The intent-to-upgrade file is persisted to disk
  before continuing.
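The intent-to-upgrade file only needs to record those three values and be
flushed to disk before the procedure continues. A sketch of what writing it
could look like (the file format and key names here are assumptions for
illustration, not the actual on-disk format):

```python
import json
import os


def write_intent_to_upgrade(path, current_version, target_version):
    # Record enough information to detect a dead upgrade process later:
    # the two versions involved and the PID of gnt-cluster upgrade itself.
    data = {
        "current_version": current_version,
        "target_version": target_version,
        "pid": os.getpid(),
    }
    with open(path, "w") as fd:
        json.dump(data, fd)
        fd.flush()
        os.fsync(fd.fileno())  # persist to disk before continuing
```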

- The Ganeti job queue is drained, and the executable waits until there
  are no more jobs in the queue. Once :doc:`design-optables` is
  implemented, for upgrades, and only for upgrades, all jobs are paused
  instead (in the sense that the currently running opcode continues,
  but the next opcode is not started), and execution continues once all
  jobs are fully paused.

- All ganeti daemons on the master node are stopped.

- It is verified again that all nodes not marked as offline at this
  moment have the new version installed. If this is not the case,
  then all changes so far (stopping ganeti daemons and draining the
  queue) are undone and failure is reported. This second verification
  is necessary, as the set of online nodes might have changed during
  the draining period.

- All ganeti daemons on all remaining (non-offline) nodes are stopped.

- A backup of all Ganeti-related status information is created for
  manual rollbacks. While the normal way of rolling back after an
  upgrade should be calling ``gnt-cluster upgrade`` from the newer version
  with the older version as argument, a full backup provides an
  additional safety net, especially for jump-upgrades (skipping
  intermediate minor versions).

- If the action is a downgrade to the previous minor version, the
  configuration is downgraded now, using ``cfgupgrade --downgrade``.

- If the action is a downgrade, any version-specific additional downgrade
  actions are carried out.

- The ``${sysconfdir}/ganeti/lib`` and ``${sysconfdir}/ganeti/share``
  symbolic links are updated.

- If the action is an upgrade to a higher minor version, the configuration
  is upgraded now, using ``cfgupgrade``.

- ``ensure-dirs --full-run`` is run on all nodes.

- All daemons are started on all nodes.

- ``gnt-cluster redist-conf`` is run on the master node.

- All daemons are restarted on all nodes.

- The Ganeti job queue is undrained.

- The intent-to-upgrade file is removed.

- ``post-upgrade`` is run with the original version as argument.

- ``gnt-cluster verify`` is run and the result reported.


Considerations on unintended reboots of the master node
=======================================================

During the upgrade procedure, the only ganeti process still running is
the one instance of ``gnt-cluster upgrade``. This process is also responsible
for eventually removing the queue drain. Therefore, we have to provide
means to resume this process if it dies unintentionally. The process
itself will handle SIGTERM gracefully by either undoing all changes
done so far, or by ignoring the signal altogether and continuing to
the end; the choice between these behaviors depends on whether the change
of the configuration has already started (in which case it goes
through to the end), or not (in which case the actions done so far are
rolled back).

To achieve this, ``gnt-cluster upgrade`` will support a ``--resume``
option. It is recommended to have ``gnt-cluster upgrade --resume`` as
an at-reboot task in the crontab.
The ``gnt-cluster upgrade --resume`` command first verifies that
it is running on the master node, using the same requirement as for
starting the master daemon, i.e., confirmed by a majority of all
nodes. If it is not the master node, it will remove any possibly
existing intent-to-upgrade file and exit. If it is running on the
master node, it will check for the existence of an intent-to-upgrade
file. If no such file is found, it will simply exit. If found, it will
resume at the appropriate stage.
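For example, the recommended at-reboot entry could look like the following in
the system crontab (the file location and the path to the binary depend on
the installation and are illustrative here):

```shell
# /etc/cron.d/ganeti-upgrade (illustrative): resume an interrupted
# upgrade after an unintended reboot of the master node.
@reboot root /usr/sbin/gnt-cluster upgrade --resume
```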

- If the configuration file is still at the initial version,
  ``gnt-cluster upgrade`` is resumed at the step immediately following the
  writing of the intent-to-upgrade file. It should be noted that
  all steps before changing the configuration are idempotent, so
  redoing them does not do any harm.

- If the configuration is already at the new version, all daemons on
  all nodes are stopped (as they might have been started again due
  to a reboot), and then the process resumes at the step immediately
  following the configuration change. All actions following the
  configuration change can be repeated without bringing the cluster
  into a worse state.
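The two cases above amount to a dispatch on the configuration version found
on disk; a sketch, with names and return values chosen purely for
illustration:

```python
def resume_stage(config_version, initial_version, target_version):
    """Decide where gnt-cluster upgrade --resume should pick up.

    Compares the configuration version found on disk against the
    versions recorded in the intent-to-upgrade file.
    """
    if config_version == initial_version:
        # Configuration untouched: redo the (idempotent) steps that
        # follow the writing of the intent-to-upgrade file.
        return "after-intent-file"
    if config_version == target_version:
        # Configuration already converted: stop all daemons again
        # (a reboot may have restarted them), then continue with the
        # steps that follow the configuration change.
        return "after-config-change"
    raise ValueError("unexpected configuration version: %s" % config_version)
```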


Caveats
=======

``gnt-cluster upgrade`` drains the queue and undrains it later, so any
information about a previous drain gets lost. This problem will
disappear once :doc:`design-optables` is implemented, as the
undrain will then be restricted to filters added by gnt-upgrade.


Requirement of job queue update
===============================

Since for upgrades we only pause jobs and do not fully drain the
queue, we need to be able to transform the job queue into a queue for
the new version. The preferred way to achieve this is to keep the
serialization format backwards compatible, i.e., only adding new
opcodes and new optional fields.
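As an illustration of why only adding optional fields keeps the queue
readable across versions, consider an opcode serialized by an older version
and deserialized by a newer one that knows an additional optional field (the
JSON shape and the ``reason`` default below are made up for the example, not
Ganeti's actual serialization):

```python
import json

# An opcode as serialized by the older version: no "reason" field yet.
old_serialized = '{"OP_ID": "OP_INSTANCE_REBOOT", "instance_name": "inst1"}'


def load_opcode(raw):
    # The newer version treats the added field as optional, falling
    # back to a default when reading data written by the old version.
    data = json.loads(raw)
    data.setdefault("reason", [])
    return data


op = load_opcode(old_serialized)
```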

However, even with a soft drain, no job is running at the moment
``cfgupgrade`` runs. So, if we change the queue representation, including
the representation of individual opcodes, in any way, ``cfgupgrade`` will
also modify the queue accordingly. In a jobs-as-processes world, pausing a
job will be implemented in such a way that the corresponding process stops
after finishing the current opcode, and a new process is created if and
when the job is unpaused again.