========================================
Automatized Upgrade Procedure for Ganeti
========================================

    
.. contents:: :depth: 4

    
This design document details the proposed changes to the
upgrade process that allow it to be more automatic.

    
Current state and shortcomings
==============================

    
Ganeti requires the same version of Ganeti to run on all
nodes of a cluster, and this requirement is unlikely to go away in the
foreseeable future. Also, the configuration may change between minor
versions (and has done so in the past). This requires a quite
involved manual upgrade process of draining the queue, stopping
Ganeti, changing the binaries, upgrading the configuration, starting
Ganeti, distributing the configuration, and undraining the queue.

    
Proposed changes
================

    
While we will not remove the requirement of the same Ganeti
version running on all nodes, the transition from one version
to the other will be made more automatic. It will be possible
to install new binaries ahead of time, and the actual switch
between versions will be a single command.

    
While changing the file layout anyway, we install the Python
code, which is architecture independent, under ``${PREFIX}/share``,
in a way that properly separates the Ganeti libraries of the
various versions.

    
Path changes to allow multiple versions installed
-------------------------------------------------

    
Currently, Ganeti installs to ``${PREFIX}/bin``, ``${PREFIX}/sbin``,
and so on, as well as to ``${pythondir}/ganeti``.

    
These paths will be changed in the following way.

    
- The Python package will be installed to
  ``${PREFIX}/share/ganeti/${VERSION}/ganeti``.
  Here ``${VERSION}`` is, depending on configure options, either the fully
  qualified version number, consisting of major, minor, revision, and suffix,
  or just a major.minor pair. All Python executables will be installed under
  ``${PREFIX}/share/ganeti/${VERSION}`` so that they see their respective
  Ganeti library. ``${PREFIX}/share/ganeti/default`` is a symbolic link to
  ``${sysconfdir}/ganeti/share`` which, in turn, is a symbolic link to
  ``${PREFIX}/share/ganeti/${VERSION}``. For all Python executables (like
  ``gnt-cluster``, ``gnt-node``, etc.), symbolic links going through
  ``${PREFIX}/share/ganeti/default`` are added under ``${PREFIX}/sbin``.

    
- All other files will be installed to the corresponding path under
  ``${libdir}/ganeti/${VERSION}`` instead of under ``${PREFIX}``
  directly, where ``${libdir}`` defaults to ``${PREFIX}/lib``.
  ``${libdir}/ganeti/default`` will be a symlink to ``${sysconfdir}/ganeti/lib``
  which, in turn, is a symlink to ``${libdir}/ganeti/${VERSION}``.
  Symbolic links to the files installed under ``${libdir}/ganeti/${VERSION}``
  will be added under ``${PREFIX}/bin``, ``${PREFIX}/sbin``, and so on. These
  symbolic links will go through ``${libdir}/ganeti/default`` so that the
  version can easily be changed by updating the symbolic link in
  ``${sysconfdir}``.

    
The set of links for Ganeti binaries might change between versions.
However, as the file structure under ``${libdir}/ganeti/${VERSION}`` reflects
that of ``/``, two links of different versions will never conflict. Similarly,
the symbolic links for the Python executables will never conflict, as they
always point to a file with the same basename directly under
``${PREFIX}/share/ganeti/default``. Therefore, each version will make sure that
enough symbolic links are present in ``${PREFIX}/bin``, ``${PREFIX}/sbin`` and
so on, even though some might be dangling if a different version of Ganeti is
currently active.

    
The extra indirection through ``${sysconfdir}`` allows installations that choose
to have ``${sysconfdir}`` and ``${localstatedir}`` outside ``${PREFIX}`` to
mount ``${PREFIX}`` read-only. The latter is important for systems that choose
``/usr`` as ``${PREFIX}`` and are following the Filesystem Hierarchy Standard.
For example, choosing ``/usr`` as ``${PREFIX}`` and ``/etc`` as ``${sysconfdir}``,
the layout for version 2.10 will look as follows.
::

    
   /
   |
   +-- etc
   |   |
   |   +-- ganeti
   |       |
   |       +-- lib -> /usr/lib/ganeti/2.10
   |       |
   |       +-- share -> /usr/share/ganeti/2.10
   |
   +-- usr
       |
       +-- bin
       |   |
       |   +-- harep -> /usr/lib/ganeti/default/usr/bin/harep
       |   |
       |   ...
       |
       +-- sbin
       |   |
       |   +-- gnt-cluster -> /usr/share/ganeti/default/gnt-cluster
       |   |
       |   ...
       |
       +-- ...
       |
       +-- lib
       |   |
       |   +-- ganeti
       |       |
       |       +-- default -> /etc/ganeti/lib
       |       |
       |       +-- 2.10
       |           |
       |           +-- usr
       |               |
       |               +-- bin
       |               |   |
       |               |   +-- htools
       |               |   |
       |               |   +-- harep -> htools
       |               |   |
       |               |   ...
       |               ...
       |
       +-- share
           |
           +-- ganeti
               |
               +-- default -> /etc/ganeti/share
               |
               +-- 2.10
                   |
                   +-- gnt-cluster
                   |
                   +-- gnt-node
                   |
                   +-- ...
                   |
                   +-- ganeti
                       |
                       +-- backend.py
                       |
                       +-- ...
                       |
                       +-- cmdlib
                       |   |
                       |   ...
                       ...

    
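The effect of the indirection through ``${sysconfdir}`` can be illustrated
with a minimal Python sketch. It only assumes the link names and targets of
the layout described in this section, with ``${libdir}`` at its default of
``${PREFIX}/lib``. The helper name ``switch_version`` and the approach of
creating a temporary link and renaming it are illustrative assumptions, not
part of this design.

.. code-block:: python

   import os


   def switch_version(sysconfdir, prefix, version):
       """Point the two links under sysconfdir at the given version.

       Hypothetical helper: only the link names and targets follow the
       layout described in this document.
       """
       targets = {
           "lib": os.path.join(prefix, "lib", "ganeti", version),
           "share": os.path.join(prefix, "share", "ganeti", version),
       }
       # Refuse to switch to a version that is not installed.
       for target in targets.values():
           if not os.path.isdir(target):
               raise ValueError("not installed: %s" % target)
       for name, target in targets.items():
           link = os.path.join(sysconfdir, "ganeti", name)
           tmp = link + ".new"
           if os.path.lexists(tmp):
               os.remove(tmp)
           os.symlink(target, tmp)
           # Renaming over the old link replaces it in a single step on
           # POSIX systems, so the link is never missing.
           os.replace(tmp, link)

Because everything under ``${PREFIX}`` reaches the versioned directories only
through these two links, nothing below ``${PREFIX}`` has to be written when
the active version changes.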

    
gnt-upgrade
-----------

    
The actual upgrade process will be done by a new binary,
``gnt-upgrade``. It will take precisely one argument, the version to
upgrade (or downgrade) to, given as a full string with major, minor,
revision, and suffix. To be compatible with current configuration upgrade
and downgrade procedures, the new version must be of the same major version
and either an equal or higher minor version, or precisely the previous
minor version.
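
The version rule can be written down as a small check. The following is a
sketch only: the function names are hypothetical, and it assumes version
strings of the form ``MAJOR.MINOR.REVISION`` followed by an optional suffix,
which this document does not specify in detail.

.. code-block:: python

   def parse_major_minor(version):
       """Return (major, minor) from a string such as "2.10.1~rc1".

       Assumes that the first two dot-separated components are integers.
       """
       major, minor = version.split(".")[:2]
       return int(major), int(minor)


   def is_allowed_target(current, target):
       """Check a target version against the rule stated in this section."""
       cur_major, cur_minor = parse_major_minor(current)
       new_major, new_minor = parse_major_minor(target)
       if new_major != cur_major:
           return False
       # Equal or higher minor version, or exactly the previous one.
       return new_minor >= cur_minor - 1

Under these assumptions, going from ``2.10.3`` to ``2.13.0`` or back to
``2.9.5`` is accepted, while ``2.8.0`` and ``3.0.0`` are rejected.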

    
When executed, ``gnt-upgrade`` will perform the following actions.

    
- It verifies that the version to change to is installed on all nodes
  of the cluster that are not marked as offline. If this is not the
  case, it aborts with an error. This initial check is an
  optimization to allow for early feedback.

- An intent-to-upgrade file is created that contains the current
  version of Ganeti, the version to change to, and the process ID of
  the ``gnt-upgrade`` process. The latter is not used automatically,
  but allows manual detection if the upgrade process died
  unintentionally. The intent-to-upgrade file is persisted to disk
  before continuing.

- The Ganeti job queue is drained, and the executable waits until there
  are no more jobs in the queue. Once :doc:`design-optables` is
  implemented, for upgrades, and only for upgrades, all jobs are paused
  instead (in the sense that the currently running opcode continues,
  but the next opcode is not started) and the upgrade continues once all
  jobs are fully paused.

- All Ganeti daemons on the master node are stopped.

- It is verified again that all nodes not marked as offline at this
  moment have the new version installed. If this is not the case,
  then all changes so far (stopping Ganeti daemons and draining the
  queue) are undone and failure is reported. This second verification
  is necessary, as the set of online nodes might have changed during
  the draining period.

- All Ganeti daemons on all remaining (non-offline) nodes are stopped.

- A backup of all Ganeti-related status information is created for
  manual rollbacks. While the normal way of rolling back after an
  upgrade should be calling ``gnt-upgrade`` from the newer version
  with the older version as argument, a full backup provides an
  additional safety net, especially for jump-upgrades (skipping
  intermediate minor versions).

- If the action is a downgrade to the previous minor version, the
  configuration is downgraded now, using ``cfgupgrade --downgrade``.

- The ``${sysconfdir}/ganeti/lib`` and ``${sysconfdir}/ganeti/share``
  symbolic links are updated.

- If the action is an upgrade to a higher minor version, the configuration
  is upgraded now, using ``cfgupgrade``.

- All daemons are started on all nodes.

- ``ensure-dirs --full-run`` is run on all nodes.

- ``gnt-cluster redist-conf`` is run on the master node.

- All daemons are restarted on all nodes.

- The Ganeti job queue is undrained.

- The intent-to-upgrade file is removed.

- ``gnt-cluster verify`` is run and the result reported.

    
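The step that creates the intent-to-upgrade file requires it to be persisted
to disk before the procedure continues. A minimal sketch of what that could
look like is given below. The file format (three whitespace-separated fields)
and the function names are assumptions made for illustration; this document
does not fix either.

.. code-block:: python

   import os


   def write_intent_file(path, current_version, target_version):
       """Write and persist a hypothetical intent-to-upgrade file."""
       content = "%s %s %d\n" % (current_version, target_version, os.getpid())
       tmp = path + ".tmp"
       with open(tmp, "w") as handle:
           handle.write(content)
           handle.flush()
           os.fsync(handle.fileno())  # force the file data to disk
       os.replace(tmp, path)
       # Also sync the directory, so that the rename survives a crash.
       dir_fd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
       try:
           os.fsync(dir_fd)
       finally:
           os.close(dir_fd)


   def read_intent_file(path):
       """Return (current, target, pid), or None if the file is absent."""
       try:
           with open(path) as handle:
               current, target, pid = handle.read().split()
       except FileNotFoundError:
           return None
       return current, target, int(pid)

Writing to a temporary name first means that a reader never sees a partially
written file: it finds either no file at all or a complete one.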

    
Considerations on unintended reboots of the master node
=======================================================
 
During the upgrade procedure, the only Ganeti process still running is
the one instance of ``gnt-upgrade``. This process is also responsible
for eventually removing the queue drain. Therefore, we have to provide
means to resume this process if it dies unintentionally. The process
itself will handle SIGTERM gracefully by either undoing all changes
done so far, or by ignoring the signal altogether and continuing to
the end; the choice between these behaviors depends on whether the change
of the configuration has already started (in which case it goes
through to the end), or not (in which case the actions done so far are
rolled back).

    
To achieve this, ``gnt-upgrade`` will support a ``--resume``
option. It is recommended to have ``gnt-upgrade --resume`` as an
at-reboot task in the crontab. If started with this option,
``gnt-upgrade`` does not accept any arguments. It first verifies that
it is running on the master node, using the same requirement as for
starting the master daemon, i.e., confirmed by a majority of all
nodes. If it is not the master node, it will remove any possibly
existing intent-to-upgrade file and exit. If it is running on the
master node, it will check for the existence of an intent-to-upgrade
file. If no such file is found, it will simply exit. If found, it will
resume at the appropriate stage.

    
- If the configuration file is still at the initial version,
  ``gnt-upgrade`` is resumed at the step immediately following the
  writing of the intent-to-upgrade file. It should be noted that
  all steps before changing the configuration are idempotent, so
  redoing them does not do any harm.

- If the configuration is already at the new version, all daemons on
  all nodes are stopped (as they might have been started again due
  to a reboot) and then it is resumed at the step immediately
  following the configuration change. All actions following the
  configuration change can be repeated without bringing the cluster
  into a worse state.

    
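The decisions taken by ``gnt-upgrade --resume`` can be summarized in a short
sketch. The function name, its parameters, and the returned labels are made
up for this illustration. The final error case, a configuration that matches
neither version recorded in the intent-to-upgrade file, is not covered by this
document and is an assumption of the sketch.

.. code-block:: python

   def plan_resume(is_master, intent, config_version):
       """Decide what ``gnt-upgrade --resume`` should do.

       ``intent`` is None if no intent-to-upgrade file exists, otherwise
       a (current_version, target_version) pair read from that file.
       """
       if not is_master:
           return "remove-intent-file-and-exit"
       if intent is None:
           return "exit"
       old_version, new_version = intent
       if config_version == old_version:
           # Steps before the configuration change are idempotent.
           return "resume-after-writing-intent-file"
       if config_version == new_version:
           # Daemons may have come up again after a reboot.
           return "stop-daemons-then-resume-after-config-change"
       # Assumption: anything else needs manual intervention.
       raise ValueError("unexpected configuration version %s" % config_version)

For example, ``plan_resume(True, ("2.10", "2.11"), "2.10")`` selects the
first resume point, while the same call with ``"2.11"`` as configuration
version selects the second one.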

    
Caveats
=======

    
Since ``gnt-upgrade`` drains the queue and undrains it later, any
information about a previous drain gets lost. This problem will
disappear once :doc:`design-optables` is implemented, as the
undrain will then be restricted to filters by ``gnt-upgrade``.

    
Requirement of opcode backwards compatibility
==============================================

    
Since for upgrades we only pause jobs and do not fully drain the
queue, we need to be able to transform the job queue into a queue for
the new version. The way this is achieved is by keeping the
serialization format backwards compatible. This is in line with
current practice that opcodes do not change between versions, and at
most new fields are added. Whenever we add a new field to an opcode,
we will make sure that the deserialization function will provide a
default value if the field is not present.

    
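As an illustration of deserialization with default values, consider the
following sketch. The class ``OpExample`` and its fields are hypothetical
and do not correspond to an actual Ganeti opcode; the sketch only shows how
a job serialized by an older version, lacking a newer field, can still be
loaded.

.. code-block:: python

   class OpExample:
       """Hypothetical opcode with one field added in a newer version."""

       # (field name, default used when the field is missing)
       FIELDS = [
           ("instance_name", None),
           ("timeout", 60),  # imagined as added in the newer version
       ]

       def __init__(self, **kwargs):
           for name, default in self.FIELDS:
               setattr(self, name, kwargs.get(name, default))

       def to_dict(self):
           """Serialize all known fields."""
           return {name: getattr(self, name) for name, _ in self.FIELDS}

       @classmethod
       def from_dict(cls, data):
           """Deserialize, filling in defaults for absent fields."""
           return cls(**data)

Here ``OpExample.from_dict({"instance_name": "inst1"})`` yields an opcode
whose ``timeout`` is ``60``, so a paused job written by the old version
remains usable after the upgrade.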