========================================
Automatized Upgrade Procedure for Ganeti
========================================

    
.. contents:: :depth: 4

    
This is a design document detailing the proposed changes to the
upgrade process, in order to allow it to be more automatic.

    
Current state and shortcomings
==============================

Ganeti requires the same version of Ganeti to run on all
nodes of a cluster, and this requirement is unlikely to go away in the
foreseeable future. Also, the configuration may change between minor
versions (and in the past has proven to do so). This requires a quite
involved manual upgrade process of draining the queue, stopping
ganeti, changing the binaries, upgrading the configuration, starting
ganeti, distributing the configuration, and undraining the queue.

    
Proposed changes
================

While we will not remove the requirement of the same Ganeti
version running on all nodes, the transition from one version
to the other will be made more automatic. It will be possible
to install new binaries ahead of time, and the actual switch
between versions will be a single command.

While changing the file layout anyway, we install the python
code, which is architecture independent, under ``${PREFIX}/share``,
in a way that properly separates the Ganeti libraries of the
various versions.

    
Path changes to allow multiple versions installed
-------------------------------------------------

Currently, Ganeti installs to ``${PREFIX}/bin``, ``${PREFIX}/sbin``,
and so on, as well as to ``${pythondir}/ganeti``.

These paths will be changed in the following way.

    
- The python package will be installed to
  ``${PREFIX}/share/ganeti/${VERSION}/ganeti``.
  Here ``${VERSION}`` is, depending on configure options, either the fully
  qualified version number, consisting of major, minor, revision, and suffix,
  or it is just a major.minor pair. All python executables will be installed
  under ``${PREFIX}/share/ganeti/${VERSION}`` so that they see their respective
  Ganeti library. ``${PREFIX}/share/ganeti/default`` is a symbolic link to
  ``${sysconfdir}/ganeti/share`` which, in turn, is a symbolic link to
  ``${PREFIX}/share/ganeti/${VERSION}``. For all python executables (like
  ``gnt-cluster``, ``gnt-node``, etc.) symbolic links going through
  ``${PREFIX}/share/ganeti/default`` are added under ``${PREFIX}/sbin``.

    
- All other files will be installed to the corresponding path under
  ``${libdir}/ganeti/${VERSION}`` instead of under ``${PREFIX}``
  directly, where ``${libdir}`` defaults to ``${PREFIX}/lib``.
  ``${libdir}/ganeti/default`` will be a symlink to ``${sysconfdir}/ganeti/lib``
  which, in turn, is a symlink to ``${libdir}/ganeti/${VERSION}``.
  Symbolic links to the files installed under ``${libdir}/ganeti/${VERSION}``
  will be added under ``${PREFIX}/bin``, ``${PREFIX}/sbin``, and so on. These
  symbolic links will go through ``${libdir}/ganeti/default`` so that the
  version can easily be changed by updating the symbolic link in
  ``${sysconfdir}``.

    
The set of links for ganeti binaries might change between the versions.
However, as the file structure under ``${libdir}/ganeti/${VERSION}`` reflects
that of ``/``, two links of different versions will never conflict. Similarly,
the symbolic links for the python executables will never conflict, as they
always point to a file with the same basename directly under
``${PREFIX}/share/ganeti/default``. Therefore, each version will make sure that
enough symbolic links are present in ``${PREFIX}/bin``, ``${PREFIX}/sbin`` and
so on, even though some might be dangling if a different version of ganeti is
currently active.

    
The extra indirection through ``${sysconfdir}`` allows installations that choose
to have ``${sysconfdir}`` and ``${localstatedir}`` outside ``${PREFIX}`` to
mount ``${PREFIX}`` read-only. The latter is important for systems that choose
``/usr`` as ``${PREFIX}`` and are following the Filesystem Hierarchy Standard.
For example, choosing ``/usr`` as ``${PREFIX}`` and ``/etc`` as ``${sysconfdir}``,
the layout for version 2.10 will look as follows.
::

   /
   |
   +-- etc
   |   |
   |   +-- ganeti
   |       |
   |       +-- lib -> /usr/lib/ganeti/2.10
   |       |
   |       +-- share -> /usr/share/ganeti/2.10
   +-- usr
       |
       +-- bin
       |   |
       |   +-- harep -> /usr/lib/ganeti/default/usr/bin/harep
       |   |
       |   ...
       |
       +-- sbin
       |   |
       |   +-- gnt-cluster -> /usr/share/ganeti/default/gnt-cluster
       |   |
       |   ...
       |
       +-- ...
       |
       +-- lib
       |   |
       |   +-- ganeti
       |       |
       |       +-- default -> /etc/ganeti/lib
       |       |
       |       +-- 2.10
       |           |
       |           +-- usr
       |               |
       |               +-- bin
       |               |   |
       |               |   +-- htools
       |               |   |
       |               |   +-- harep -> htools
       |               |   |
       |               |   ...
       |               ...
       |
       +-- share
           |
           +-- ganeti
               |
               +-- default -> /etc/ganeti/share
               |
               +-- 2.10
                   |
                   +-- gnt-cluster
                   |
                   +-- gnt-node
                   |
                   +-- ...
                   |
                   +-- ganeti
                       |
                       +-- backend.py
                       |
                       +-- ...
                       |
                       +-- cmdlib
                       |   |
                       |   ...
                       ...

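The double indirection can be illustrated with a minimal Python sketch. It
builds a miniature copy of the layout in a temporary directory and switches the
active version by replacing only the two links under the stand-in for
``${sysconfdir}``. The helper names (``atomic_symlink``, ``switch_version``,
``build_demo_layout``, ``active_version``) and the second version ``2.11`` are
assumptions made for illustration and are not part of Ganeti.

```python
import os
import tempfile


def atomic_symlink(target, link_path):
    """Point link_path at target, replacing any existing link in one step."""
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(target, tmp_path)
    # os.replace() renames the temporary link over the old one.
    os.replace(tmp_path, link_path)


def switch_version(sysconfdir, libdir, sharedir, version):
    """Update the two links under sysconfdir that select the active version."""
    atomic_symlink(os.path.join(libdir, "ganeti", version),
                   os.path.join(sysconfdir, "ganeti", "lib"))
    atomic_symlink(os.path.join(sharedir, "ganeti", version),
                   os.path.join(sysconfdir, "ganeti", "share"))


def build_demo_layout(root, versions):
    """Create a miniature version of the layout under root (demo only)."""
    etc = os.path.join(root, "etc")
    lib = os.path.join(root, "usr", "lib")
    share = os.path.join(root, "usr", "share")
    sbin = os.path.join(root, "usr", "sbin")
    os.makedirs(os.path.join(etc, "ganeti"))
    os.makedirs(sbin)
    for version in versions:
        os.makedirs(os.path.join(lib, "ganeti", version))
        os.makedirs(os.path.join(share, "ganeti", version))
        marker = os.path.join(share, "ganeti", version, "gnt-cluster")
        with open(marker, "w") as handle:
            handle.write(version)
    # The "default" links live under the prefix and never change; they
    # only point into sysconfdir, so the prefix can stay read-only.
    os.symlink(os.path.join(etc, "ganeti", "lib"),
               os.path.join(lib, "ganeti", "default"))
    os.symlink(os.path.join(etc, "ganeti", "share"),
               os.path.join(share, "ganeti", "default"))
    os.symlink(os.path.join(share, "ganeti", "default", "gnt-cluster"),
               os.path.join(sbin, "gnt-cluster"))
    return etc, lib, share, sbin


def active_version(sbin):
    """Read the version marker reached through the sbin link chain."""
    with open(os.path.join(sbin, "gnt-cluster")) as handle:
        return handle.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        etc, lib, share, sbin = build_demo_layout(root, ["2.10", "2.11"])
        switch_version(etc, lib, share, "2.10")
        print(active_version(sbin))
        switch_version(etc, lib, share, "2.11")
        print(active_version(sbin))
```

Only the links under the configuration directory are rewritten when the
version changes; everything below the prefix is left untouched.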
    
gnt-cluster upgrade
-------------------

The actual upgrade process will be done by a new command ``upgrade`` to
``gnt-cluster``. It is called with the option ``--to``, which takes precisely
one argument: the version to upgrade (or downgrade) to, given as a full
string with major, minor, revision, and suffix. To be compatible with current
configuration upgrade and downgrade procedures, the new version must be of
the same major version and either an equal or higher minor version, or
precisely the previous minor version.

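The version constraint can be expressed as a small predicate. The following is
a sketch only: the function names and the assumed version syntax
(``MAJOR.MINOR.REVISION`` followed by an optional free-form suffix) are
illustrative assumptions, not the actual Ganeti implementation.

```python
import re

# Assumed syntax: MAJOR.MINOR.REVISION plus an optional suffix such as "~rc1".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(.*)$")


def parse_version(text):
    """Split a full version string into (major, minor, revision, suffix)."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError("not a full version string: %r" % (text,))
    major, minor, revision, suffix = match.groups()
    return int(major), int(minor), int(revision), suffix


def is_allowed_target(current, target):
    """Return True if switching from current to target is permitted."""
    cur_major, cur_minor, _, _ = parse_version(current)
    new_major, new_minor, _, _ = parse_version(target)
    if new_major != cur_major:
        return False
    # Equal or higher minor: upgrade. Exactly one minor lower: downgrade.
    return new_minor >= cur_minor or new_minor == cur_minor - 1
```

Under these assumptions, going from ``2.10.3`` to ``2.12.0`` or back to
``2.9.5`` is accepted, while ``2.8.0`` and ``3.0.0`` are rejected.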
    
When executed, ``gnt-cluster upgrade --to=<version>`` will perform the
following actions.

- It verifies that the version to change to is installed on all nodes
  of the cluster that are not marked as offline. If this is not the
  case it aborts with an error. This initial testing is an
  optimization to allow for early feedback.

    
- An intent-to-upgrade file is created that contains the current
  version of ganeti, the version to change to, and the process ID of
  the ``gnt-cluster upgrade`` process. The latter is not used automatically,
  but allows manual detection if the upgrade process died
  unintentionally. The intent-to-upgrade file is persisted to disk
  before continuing.

    
- The Ganeti job queue is drained, and the executable waits until there
  are no more jobs in the queue. Once :doc:`design-optables` is
  implemented, for upgrades, and only for upgrades, all jobs are paused
  instead (in the sense that the currently running opcode continues,
  but the next opcode is not started) and the upgrade continues once all
  jobs are fully paused.

- All ganeti daemons on the master node are stopped.

- It is verified again that all nodes at this moment not marked as
  offline have the new version installed. If this is not the case,
  then all changes so far (stopping ganeti daemons and draining the
  queue) are undone and failure is reported. This second verification
  is necessary, as the set of online nodes might have changed during
  the draining period.

    
- All ganeti daemons on all remaining (non-offline) nodes are stopped.

- A backup of all Ganeti-related status information is created for
  manual rollbacks. While the normal way of rolling back after an
  upgrade should be calling ``gnt-cluster upgrade`` from the newer version
  with the older version as argument, a full backup provides an
  additional safety net, especially for jump-upgrades (skipping
  intermediate minor versions).

    
- If the action is a downgrade to the previous minor version, the
  configuration is downgraded now, using ``cfgupgrade --downgrade``.

- The ``${sysconfdir}/ganeti/lib`` and ``${sysconfdir}/ganeti/share``
  symbolic links are updated.

- If the action is an upgrade to a higher minor version, the configuration
  is upgraded now, using ``cfgupgrade``.

    
- ``ensure-dirs --full-run`` is run on all nodes.

- All daemons are started on all nodes.

- ``gnt-cluster redist-conf`` is run on the master node.

- All daemons are restarted on all nodes.

- The Ganeti job queue is undrained.

- The intent-to-upgrade file is removed.

- ``post-upgrade`` is run with the original version as argument.

- ``gnt-cluster verify`` is run and the result reported.

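The requirement that the intent-to-upgrade file is persisted to disk before
continuing could be met along the following lines. This is a minimal sketch:
the JSON format, the key names and the function names are assumptions, as the
design does not prescribe a file format.

```python
import json
import os
import tempfile


def write_intent_file(path, current_version, target_version):
    """Durably record the intent to upgrade before any state is changed."""
    record = {
        "from": current_version,
        "to": target_version,
        "pid": os.getpid(),
    }
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(record, handle)
            handle.flush()
            # Force the file contents to stable storage.
            os.fsync(handle.fileno())
        # Rename into place so readers never see a partial file.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # Sync the directory so that the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def read_intent_file(path):
    """Return the recorded intent, or None if no upgrade is in progress."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return None
```

Writing to a temporary file and renaming it means that a crash leaves either no
intent file or a complete one, which keeps the later resume logic simple.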
    
Considerations on unintended reboots of the master node
=======================================================

During the upgrade procedure, the only ganeti process still running is
the one instance of ``gnt-cluster upgrade``. This process is also responsible
for eventually removing the queue drain. Therefore, we have to provide
means to resume this process, if it dies unintentionally. The process
itself will handle SIGTERM gracefully by either undoing all changes
done so far, or by ignoring the signal altogether and continuing to
the end; the choice between these behaviors depends on whether change
of the configuration has already started (in which case it goes
through to the end), or not (in which case the actions done so far are
rolled back).

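One possible shape for this signal handling is sketched below. The class and
function names are hypothetical, and the sketch assumes that the main upgrade
loop checks ``rollback_requested`` between steps and sets
``config_change_started`` right before touching the configuration.

```python
import signal


class UpgradeState:
    """Tracks how far the upgrade has progressed (illustrative only)."""

    def __init__(self):
        self.config_change_started = False
        self.rollback_requested = False


def install_sigterm_handler(state):
    """Install a SIGTERM handler with the two behaviours described above."""

    def handler(signum, frame):
        if state.config_change_started:
            # Past the point of no return: ignore the signal and finish.
            return
        # Nothing irreversible has happened yet: ask the main loop to undo
        # the steps performed so far.
        state.rollback_requested = True

    signal.signal(signal.SIGTERM, handler)
    return handler
```

The handler only records the request; the actual rollback is left to the main
loop so that no cleanup work runs inside the signal handler itself.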
    
To achieve this, ``gnt-cluster upgrade`` will support a ``--resume``
option. It is recommended to have ``gnt-cluster upgrade --resume`` as an
at-reboot task in the crontab.
The ``gnt-cluster upgrade --resume`` command first verifies that
it is running on the master node, using the same requirement as for
starting the master daemon, i.e., confirmed by a majority of all
nodes. If it is not the master node, it will remove any possibly
existing intent-to-upgrade file and exit. If it is running on the
master node, it will check for the existence of an intent-to-upgrade
file. If no such file is found, it will simply exit. If found, it will
resume at the appropriate stage.

    
- If the configuration file still is at the initial version,
  ``gnt-cluster upgrade`` is resumed at the step immediately following the
  writing of the intent-to-upgrade file. It should be noted that
  all steps before changing the configuration are idempotent, so
  redoing them does not do any harm.

- If the configuration is already at the new version, all daemons on
  all nodes are stopped (as they might have been started again due
  to a reboot) and then it is resumed at the step immediately
  following the configuration change. All actions following the
  configuration change can be repeated without bringing the cluster
  into a worse state.

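The decision made by ``--resume`` can be summarized as a pure function. This is
a sketch under assumptions: the action names, the ``from``/``to`` keys of the
intent record, and the error raised when the configuration matches neither
version are illustrative and not specified by this design.

```python
def resume_action(is_master, intent, config_version):
    """Decide what --resume should do and return a short action name.

    intent is None or a dict with "from" and "to" keys (assumed format).
    """
    if not is_master:
        return "remove-intent-file-and-exit"
    if intent is None:
        return "exit"
    if config_version == intent["from"]:
        # All earlier steps are idempotent, so they can simply be redone.
        return "resume-after-intent-file"
    if config_version == intent["to"]:
        # Daemons may have come back after a reboot; stop them first.
        return "stop-daemons-and-resume-after-config-change"
    # Not covered by the design; treated as an error in this sketch.
    raise RuntimeError("configuration version %r matches neither side of "
                       "the recorded upgrade" % (config_version,))
```

Keeping the decision separate from the actions makes each branch easy to test
without touching a real cluster.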
    
Caveats
=======

Since ``gnt-cluster upgrade`` drains the queue and undrains it later, any
information about a previous drain gets lost. This problem will
disappear once :doc:`design-optables` is implemented, as the
undrain will then be restricted to the filters set by
``gnt-cluster upgrade``.

    
Requirement of opcode backwards compatibility
=============================================

Since for upgrades we only pause jobs and do not fully drain the
queue, we need to be able to transform the job queue into a queue for
the new version. The way this is achieved is by keeping the
serialization format backwards compatible. This is in line with
current practice that opcodes do not change between versions, and at
most new fields are added. Whenever we add a new field to an opcode,
we will make sure that the deserialization function will provide a
default value if the field is not present.

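A deserialization function that supplies defaults for missing fields might look
like the following sketch. ``OpExample`` and its fields are hypothetical and do
not correspond to an actual Ganeti opcode class; rejecting unknown fields is
likewise a choice of this sketch.

```python
class OpExample:
    """Hypothetical opcode used only to illustrate default handling."""

    # (field name, default used when the field is absent from the input)
    FIELDS = [
        ("instance_name", None),
        ("force", False),
        # Field added in a newer version; older queues will not contain it.
        ("reason", "unspecified"),
    ]

    def __init__(self, **kwargs):
        for name, default in self.FIELDS:
            setattr(self, name, kwargs.get(name, default))

    @classmethod
    def from_dict(cls, data):
        """Deserialize, rejecting unknown fields and defaulting missing ones."""
        known = {name for name, _ in cls.FIELDS}
        unknown = set(data) - known
        if unknown:
            raise ValueError("unknown fields: %s" % ", ".join(sorted(unknown)))
        return cls(**data)

    def to_dict(self):
        """Serialize all fields, including those filled in by defaults."""
        return {name: getattr(self, name) for name, _ in self.FIELDS}
```

A job serialized by the older version, without the ``reason`` field, then loads
with ``reason`` set to its default and is written back in the new format.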
    