Revision b123fb31

b/doc/design-node-security.rst
67 67
in the virtualization stacks to gain access to the host machines as well.
68 68

  
69 69

  
70
Proposal concerning SSH key distribution
71
----------------------------------------
70
Proposal concerning SSH host key distribution
71
---------------------------------------------
72

  
73
We propose the following design regarding the SSH host key handling. The
74
root keys are untouched by this design.
75

  
76
Each node gets its own ssh private/public key pair, but only the public
77
keys of the master candidates get added to the ``authorized_keys`` file
78
of all nodes. This has the following advantages:
79

  
80
- Only master candidates can ssh into other nodes, thus compromised
81
  nodes cannot compromise the cluster further.
82
- One can remove a compromised master candidate from a cluster
83
  (including removing its public key from all nodes' ``authorized_keys``
84
  file) without having to regenerate and distribute new ssh keys for all
85
  master candidates. (Even though it would be good practice to do that anyway,
86
  since the compromising of the other master candidates might have taken
87
  place already.)
88
- If an (uncompromised) master candidate is offlined to be sent for
89
  repair due to a hardware failure before Ganeti can remove any keys
90
  from it (for example when the network adapter of the machine is broken),
91
  we don't have to worry about the keys being on a machine that is
92
  physically accessible.
93

  
94
To ensure security while transferring public key information and
95
updating the ``authorized_keys``, there are several other changes
96
necessary:
97

  
98
- Any distribution of keys (in this case only public keys) is done via
99
  SSH and not via RPC. An attacker who has RPC control should not be
100
  able to get SSH access where he did not have SSH access before
101
  already.
102
- The only RPC calls that are made in this context are from the master
103
  daemon to the node daemon on its own host and noded ensures as much
104
  as possible that the change to be made does not harm the cluster's
105
  security boundary.
106
- The nodes that are potential master candidates keep a list of public
107
  keys of potential master candidates of the cluster in a separate
108
  file called ``ganeti_pub_keys`` to keep track of which keys could
109
  possibly be added to the ``authorized_keys`` files of the nodes. We come
110
  to what "potential" means in this case in the next section. The key
111
  list is only transferred via SSH or written directly by noded. It
112
  is not stored in the cluster config, because the config is
113
  distributed via RPC.
114

  
115
The following sections describe in detail which Ganeti commands are
116
affected by the proposed changes.
117

  
118

  
119
RAPI
120
~~~~
121

  
122
The design goal to limit SSH powers to master candidates conflicts with
123
the current powers a user of the RAPI interface would have. The
124
``master_capable`` flag of nodes can be modified via RAPI.
125
That means an attacker who has access to the RAPI interface can make
126
all non-master-capable nodes master-capable, and then increase the master
127
candidate pool size till all machines are master candidates (or at least
128
a particular machine that he is aiming for). This means that with RAPI
129
access and a compromised normal node, one can make this node a master
130
candidate and then still have the power to compromise the whole cluster.
131

  
132
To mitigate this issue, we propose the following changes:
133
- Add a flag to the cluster configuration
134
  ``master_capability_rapi_modifiable`` which indicates whether or
135
  not it should be possible to modify the ``master_capable`` flag of
136
  nodes via RAPI. The flag is set to ``False`` by default and can
137
  itself only be changed on the commandline. In this design doc, we
138
  refer to the flag as the "rapi flag" from here on.
139
- Only if the ``master_capability_rapi_modifiable`` switch is set to
140
  ``True``, it is possible to modify the master-capability flag of
141
  nodes via RAPI.
142

  
143
With this setup, there are the following definitions of "potential
144
master candidates" depending on the rapi flag:
145
- If the rapi flag is set to ``True``, all cluster nodes are potential
146
  master candidates, because as described above, all of them can
147
  eventually be made master candidates via RAPI and thus security-wise,
148
  we haven't won anything above the current SSH handling.
149
- If the rapi flag is set to ``False``, only the master capable nodes
150
  are considered potential master candidates, as it is not possible to
151
  make any other node a master candidate via RAPI at all.
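
The two definitions above can be sketched as a small helper. This is only an illustration under stated assumptions: the ``Node`` tuple and the function name ``potential_master_candidates`` are hypothetical and not part of Ganeti's code base.

```python
from collections import namedtuple

# Hypothetical, minimal stand-in for a node object in the cluster config.
Node = namedtuple("Node", ["uuid", "master_capable"])


def potential_master_candidates(nodes, rapi_flag):
    """Return the UUIDs of all potential master candidates.

    rapi_flag mirrors ``master_capability_rapi_modifiable``: if it is
    True, every node could be made master-capable via RAPI, so every
    node counts. Otherwise only master-capable nodes count.
    """
    if rapi_flag:
        return [node.uuid for node in nodes]
    return [node.uuid for node in nodes if node.master_capable]
```

With the flag disabled, a node that is not master-capable never shows up in the result, so its key would never be listed in ``ganeti_pub_keys``.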
152

  
153
Note that when the rapi flag is changed, the state of the
154
``ganeti_pub_keys`` file on all nodes has to be updated accordingly.
155
This should be done in the client script ``gnt_cluster`` before the
156
RPC call to update the configuration is made, because this way, if
157
someone would try to perform that RPC call on master to trick it into
158
thinking that the flag is enabled, this would not help as the content of
159
the ``ganeti_pub_keys`` file is a crucial part in the design of the
160
distribution of the SSH keys.
161

  
162
Note: One could think of always allowing to disable the master-capability
163
via RAPI and just restrict the enabling of it, thus making it possible
164
to RAPI-"freeze" the nodes' master-capability state once it is disabled.
165
However, we think these are rather confusing semantics of the involved
166
flags and thus we go with the proposed design.
167

  
168
Note that this change will break RAPI compatibility, at least if the
169
rapi flag is not explicitly set to ``True``. We made this choice to
170
have the more secure option as default, because otherwise it is
171
unlikely to be widely used.
72 172

  
73
We propose two improvements regarding the ssh keys:
74 173

  
75
#. Limit the distribution of the private ssh key to the master candidates.
76

  
77
#. Use different ssh key pairs for each master candidate.
78

  
79
We propose to limit the set of nodes holding the private root user SSH key
80
to the master and the master candidates. This way, the security risk would
81
be limited to a rather small set of nodes even though the cluster could
82
consist of a lot more nodes. The set of master candidates could be protected
83
better than the normal nodes (for example residing in a DMZ) to enhance
84
security even more if the administrator wishes so. The following
85
sections describe in detail which Ganeti commands are affected by this
86
change and in what way.
87

  
88
Security will be even more increased if each master candidate gets
89
its own ssh private/public key pair. This way, one can remove a
90
compromised master candidate from a cluster (including removing its
91
public key from all nodes' ``authorized_keys`` file) without having to
92
regenerate and distribute new ssh keys for all master candidates. (Even
93
though it would be good practice to do that anyway, since the compromising
94
of the other master candidates might have taken place already.) However,
95
this improvement was not part of the original feature request and
96
increases the complexity of node management even more. We therefore
97
consider it as second step in this design and will address
98
this after the other parts of this design are implemented.
174
Cluster initialization
175
~~~~~~~~~~~~~~~~~~~~~~
99 176

  
100
The following sections describe in detail which Ganeti commands are affected
101
by the first part of ssh-related improvements, limiting the key
102
distribution to master candidates only.
177
On cluster initialization, the following steps are taken in
178
bootstrap.py:
179
- A public/private key pair is generated (as before), but only used
180
  by the first (and thus master) node. In particular, the private key
181
  never leaves the node.
182
- A mapping of node UUIDs to public SSH keys is created and stored
183
  as a text file in ``/var/lib/ganeti/ganeti_pub_keys`` only accessible
184
  by root (permissions 0600). The master node's UUID and its public
185
  key are added as the first entry. The format of the file is one
186
  line per node, each line composed as ``node_uuid ssh_key``.
187
- The node's public key is added to its own ``authorized_keys`` file.
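
The file format described above (one ``node_uuid ssh_key`` line per node, readable by root only) could be handled roughly as follows. This is a sketch, not the actual implementation in bootstrap.py; all function names are hypothetical.

```python
import os


def format_pub_keys(entries):
    """Render (node_uuid, ssh_key) pairs, one 'node_uuid ssh_key' line each."""
    return "".join("%s %s\n" % (uuid, key) for uuid, key in entries)


def parse_pub_keys(text):
    """Parse file content back into a list of (node_uuid, ssh_key) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # A public key contains spaces itself ("ssh-rsa AAAA... comment"),
        # so split only once, right after the UUID.
        uuid, key = line.split(None, 1)
        entries.append((uuid, key))
    return entries


def write_pub_keys(path, entries):
    """Write the key list and make sure the file has permissions 0600."""
    # Request mode 0600 at creation time, so a new file is never
    # readable by other users, not even briefly.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(format_pub_keys(entries))
    # If the file already existed, the mode above was not applied.
    os.chmod(path, 0o600)
```

Splitting each line only once keeps the key type, key data, and comment together as a single value.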
103 188

  
104 189

  
105 190
(Re-)Adding nodes to a cluster
......
108 193
According to :doc:`design-node-add`, Ganeti transfers the ssh keys to
109 194
every node that gets added to the cluster.
110 195

  
111
We propose to change this procedure to treat master candidates and normal
112
nodes differently. For master candidates, the procedure would stay as is.
113
For normal nodes, Ganeti would transfer the public and private ssh host
114
keys (as before) and only the public root key.
115

  
116
A normal node would not be able to connect via ssh to other nodes, but
117
the master (and potentially master candidates) can connect to this node.
196
Adding a new node will require the following steps.
197

  
198
In gnt_node.py:
199
- On the new node, a new public/private SSH key pair is generated.
200
- The public key of the new node is fetched (via SSH) to the master
201
  node and if it is a potential master candidate (see definition above),
202
  it is added to the ``ganeti_pub_keys`` list on the master node.
203
- The public keys of all current master candidates are added to the
204
  new node's ``authorized_keys`` file (also via SSH).
205

  
206
In LUNodeAdd in cmdlib/node.py:
207
- The LUNodeAdd determines whether or not the new node is a master
208
  candidate and in any case updates the cluster's configuration with the
209
  new node's information. (This is not changed by the proposed design.)
210
- If the new node is a master candidate, we make an RPC call to the node
211
  daemon of the master node to add the new node's public key to all
212
  nodes' ``authorized_keys`` files. The implementation of this RPC call
213
  has to be extra careful as described in the next steps, because
214
  compromised RPC security should not compromise SSH security.
215

  
216
RPC call execution in noded (on master node):
217
- Check that the public key of the new node is in the
218
  ``ganeti_pub_keys`` file of the master node to make sure that no keys
219
  of nodes outside the Ganeti cluster and no keys that are not potential
220
  master candidates gain SSH access in the cluster.
221
- Via SSH, transfer the new node's public key to all nodes (including
222
  the new node) and add it to their ``authorized_keys`` file.
223
- The ``ganeti_pub_keys`` file is transferred via SSH to all
224
  potential master candidate nodes except the master node
225
  (including the new one).
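
The check that noded performs before granting SSH access might look like the following sketch. It works on in-memory data only (a dict of UUIDs to keys and a list of ``authorized_keys`` lines); the function name and the data layout are assumptions made for illustration, and the SSH transfer itself is left out.

```python
def authorize_new_master_candidate(node_uuid, pub_key, ganeti_pub_keys,
                                   authorized_keys):
    """Return the new authorized_keys lines after adding pub_key.

    ganeti_pub_keys: dict mapping node UUID to public key, as read from
      the master node's ``ganeti_pub_keys`` file (assumed layout).
    authorized_keys: list of key lines currently present on a node.
    """
    # Refuse keys that are unknown or that do not match the recorded key,
    # so that a forged RPC call cannot grant SSH access to arbitrary keys.
    if ganeti_pub_keys.get(node_uuid) != pub_key:
        raise ValueError("key of node %s is not listed in ganeti_pub_keys"
                         % node_uuid)
    if pub_key in authorized_keys:
        # Already authorized, nothing to add.
        return list(authorized_keys)
    return list(authorized_keys) + [pub_key]
```

The design intent is that the decision depends on a file that is never written via RPC, so controlling RPC alone is not enough to pass the check.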
118 226

  
119 227
In case of readding a node that used to be in the cluster before,
120
handling of the ssh keys would basically be the same with the following
121
additional modifications: if the node used to be a master or
122
master-candidate node, but will be a normal node after readding, Ganeti
123
should make sure that the private root key is deleted if it is still
124
present on the node.
228
handling of the SSH keys would basically be the same, in particular also
229
a new SSH key pair is generated for the node, because we cannot be sure
230
that the old key pair has not been compromised while the node was
231
offlined.
125 232

  
126 233

  
127 234
Pro- and demoting a node to/from master candidate
128 235
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
129 236

  
130
If the role of a node is changed from 'normal' to 'master_candidate', the
131
master node should at that point copy the private root ssh key. When demoting
132
a node from master candidate to a normal node, the key that has been copied
133
there on promotion or addition should be removed again.
237
If the role of a node is changed from 'normal' to 'master_candidate',
238
the procedure is the same as for adding nodes from the step "In
239
LUNodeAdd ..." on.
240

  
241
If a node gets demoted to 'normal', the master daemon makes a similar
242
RPC call to the master node's node daemon as for adding a node.
243

  
244
In the RPC call, noded will perform the following steps:
245
- Check that the public key of the node to be demoted is indeed in the
246
  ``ganeti_pub_keys`` file to avoid deleting ssh keys of machines that
247
  don't belong to the cluster (and thus potentially lock out the
248
  administrator).
249
- Via SSH, remove the key from all nodes' ``authorized_keys`` files.
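
The demotion steps can be sketched in the same style. Again, the function name and the in-memory data layout are hypothetical, and the actual removal on remote nodes via SSH is not shown.

```python
def deauthorize_demoted_node(node_uuid, ganeti_pub_keys, authorized_keys):
    """Return the authorized_keys lines without the demoted node's key.

    ganeti_pub_keys: dict mapping node UUID to public key (assumed layout).
    authorized_keys: list of key lines currently present on a node.
    """
    # Only remove keys that Ganeti knows about. Keys of machines outside
    # the cluster (for example the administrator's own) stay untouched.
    if node_uuid not in ganeti_pub_keys:
        raise ValueError("node %s has no entry in ganeti_pub_keys"
                         % node_uuid)
    key = ganeti_pub_keys[node_uuid]
    return [line for line in authorized_keys if line.strip() != key]
```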
134 250

  
135 251
This affects the behavior of the following commands:
136 252

  
......
176 292
Cluster verify
177 293
~~~~~~~~~~~~~~
178 294

  
179
To make sure the private root ssh key was not distributed to a normal
180
node, 'gnt-cluster verify' will be extended by a check for the key
181
on normal nodes. Additionally, it will check if the private key is
182
indeed present on master candidates.
295
So far, 'gnt-cluster verify' checks the SSH connectivity of all nodes to
296
all other nodes. We propose to replace this by the following checks:
297

  
298
- For all master candidates, we check if they can connect to any other node
299
  in the cluster (other master candidates and normal nodes).
300
- We check if the ``ganeti_pub_keys`` file contains keys of nodes that
301
  are no longer in the cluster or that are not potential master
302
  candidates.
303
- For all normal nodes, we check that their key does not appear in other
304
  nodes' ``authorized_keys`` files. For now, we will only emit a warning
305
  rather than an error if this check fails, because Ganeti might be
306
  run in a setup where Ganeti is not the only system manipulating the
307
  SSH keys.
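
The last two checks operate purely on key data and can be sketched as below. The function names, the dict-based inputs, and the message wording are assumptions for illustration; the connectivity check of the first item needs real SSH connections and is not covered.

```python
def find_stale_pub_keys(ganeti_pub_keys, potential_mc_uuids):
    """Return UUIDs listed in ganeti_pub_keys that should not be there.

    This covers both nodes that left the cluster and nodes that are not
    potential master candidates, since neither is in potential_mc_uuids.
    """
    return sorted(set(ganeti_pub_keys) - set(potential_mc_uuids))


def find_normal_node_key_warnings(normal_node_keys, authorized_keys_by_node):
    """Return warnings for normal nodes whose key is authorized elsewhere.

    normal_node_keys: dict mapping the UUID of a normal node to its key.
    authorized_keys_by_node: dict mapping a node UUID to the list of key
      lines in that node's ``authorized_keys`` file.
    """
    warnings = []
    for uuid, key in sorted(normal_node_keys.items()):
        for other, lines in sorted(authorized_keys_by_node.items()):
            if other != uuid and key in lines:
                warnings.append("key of normal node %s is authorized on %s"
                                % (uuid, other))
    return warnings
```

Returning messages instead of raising an error matches the decision to only warn, because another system may manage the same ``authorized_keys`` files.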
308

  
309

  
310
Upgrades
311
~~~~~~~~
312

  
313
When upgrading from a version that has the previous SSH setup to the one
314
proposed in this design, the upgrade procedure has to involve the
315
following steps in the post-upgrade hook:
316
- For all nodes, new SSH key pairs are generated.
317
- All nodes and their public keys are added to the ``ganeti_pub_keys``
318
  file and the file is copied to all nodes.
319
- All keys of master candidate nodes are added to the
320
  ``authorized_keys`` files of all other nodes.
321

  
322
Since this upgrade significantly changes the configuration of the
323
cluster's nodes, we will add a note to the UPGRADE notes to make the
324
administrator aware of this fact (in case he intends to enable access
325
from normal nodes to master candidates for reasons unrelated to
326
Ganeti's use of the machines).
327

  
328
Also, in any operation where Ganeti creates new SSH keys, the old keys
329
will be backed up and not simply overwritten.
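
A backup step of this kind could be as simple as the following sketch. The helper name and the ``.backup-<suffix>`` naming scheme are made up for illustration; the design does not prescribe where or under which name the backups are stored.

```python
import os
import shutil
import time


def backup_key_file(path, suffix=None):
    """Copy an existing key file aside before it gets replaced.

    Returns the path of the backup, or None if there was nothing to
    back up. The naming scheme is an assumption, not Ganeti's.
    """
    if not os.path.exists(path):
        return None
    if suffix is None:
        suffix = time.strftime("%Y%m%d-%H%M%S")
    backup_path = "%s.backup-%s" % (path, suffix)
    # copy2 preserves the permission bits, so a private key stays private.
    shutil.copy2(path, backup_path)
    return backup_path
```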
330

  
331

  
332
Downgrades
333
~~~~~~~~~~
334

  
335
The following steps will be implemented for downgrading from 2.12 to 2.11:
336
- The master node's private/public key pair will be distributed to all
337
  nodes (via SSH) and the individual SSH keys will be backed up.
338
- The obsolete individual ssh keys will be removed from all nodes'
339
  ``authorized_keys`` files.
340

  
341

  
342
Renew-Crypto
343
~~~~~~~~~~~~
183 344

  
345
The ``gnt-cluster renew-crypto`` command is not affected by the proposed
346
changes related to SSH.
184 347

  
185 348

  
186 349
Proposal regarding node daemon certificates
