Revision b123fb31 doc/design-node-security.rst
b/doc/design-node-security.rst | ||
---|---|---|
67 | 67 |
in the virtualization stacks to gain access to the host machines as well. |
68 | 68 |
|
69 | 69 |
|
70 |
Proposal concerning SSH key distribution |
|
71 |
---------------------------------------- |
|
70 |
Proposal concerning SSH host key distribution |
|
71 |
--------------------------------------------- |
|
72 |
|
|
73 |
We propose the following design regarding the SSH host key handling. The |
|
74 |
root keys are untouched by this design. |
|
75 |
|
|
76 |
Each node gets its own ssh private/public key pair, but only the public |
|
77 |
keys of the master candidates get added to the ``authorized_keys`` file |
|
78 |
of all nodes. This has the following advantages: |
|
79 |
|
|
80 |
- Only master candidates can ssh into other nodes, thus compromised |
|
81 |
nodes cannot compromise the cluster further. |
|
82 |
- One can remove a compromised master candidate from a cluster |
|
83 |
(including removing its public key from all nodes' ``authorized_keys`` |
|
84 |
file) without having to regenerate and distribute new ssh keys for all |
|
85 |
master candidates. (Even though it would be good practice to do that anyway, |
|
86 |
since the compromising of the other master candidates might have taken |
|
87 |
place already.) |
|
88 |
- If an (uncompromised) master candidate is offlined to be sent for |
|
89 |
repair due to a hardware failure before Ganeti can remove any keys |
|
90 |
from it (for example when the network adapter of the machine is broken), |
|
91 |
we don't have to worry about the keys being on a machine that is |
|
92 |
physically accessible. |
|
93 |
|
|
94 |
To ensure security while transferring public key information and |
|
95 |
updating the ``authorized_keys``, there are several other changes |
|
96 |
necessary: |
|
97 |
|
|
98 |
- Any distribution of keys (in this case only public keys) is done via |
|
99 |
SSH and not via RPC. An attacker who has RPC control should not be |
|
100 |
able to get SSH access where he did not have SSH access before |
|
101 |
already. |
|
102 |
- The only RPC calls that are made in this context are from the master |
|
103 |
daemon to the node daemon on its own host and noded ensures as much |
|
104 |
as possible that the change to be made does not harm the cluster's |
|
105 |
security boundary. |
|
106 |
- The nodes that are potential master candidates keep a list of public |
|
107 |
keys of potential master candidates of the cluster in a separate |
|
108 |
file called ``ganeti_pub_keys`` to keep track of which keys could |
|
109 |
possibly be added to the ``authorized_keys`` files of the nodes. We come |
|
110 |
to what "potential" means in this case in the next section. The key |
|
111 |
list is only transferred via SSH or written directly by noded. It |
|
112 |
is not stored in the cluster config, because the config is |
|
113 |
distributed via RPC. |
|
114 |
|
|
115 |
The following sections describe in detail which Ganeti commands are |
|
116 |
affected by the proposed changes. |
|
117 |
|
|
118 |
|
|
119 |
RAPI |
|
120 |
~~~~ |
|
121 |
|
|
122 |
The design goal to limit SSH powers to master candidates conflicts with |
|
123 |
the current powers a user of the RAPI interface would have. The |
|
124 |
``master_capable`` flag of nodes can be modified via RAPI. |
|
125 |
That means an attacker who has access to the RAPI interface can make |
|
126 |
all non-master-capable nodes master-capable, and then increase the master |
|
127 |
candidate pool size till all machines are master candidates (or at least |
|
128 |
a particular machine that he is aiming for). This means that with RAPI |
|
129 |
access and a compromised normal node, one can make this node a master |
|
130 |
candidate and then still have the power to compromise the whole cluster. |
|
131 |
|
|
132 |
To mitigate this issue, we propose the following changes: |
|
133 |
- Add a flag to the cluster configuration |
|
134 |
``master_capability_rapi_modifiable`` which indicates whether or |
|
135 |
not it should be possible to modify the ``master_capable`` flag of |
|
136 |
nodes via RAPI. The flag is set to ``False`` by default and can |
|
137 |
itself only be changed on the commandline. In this design doc, we |
|
138 |
refer to the flag as the "rapi flag" from here on. |
|
139 |
- Only if the ``master_capability_rapi_modifiable`` switch is set to |
|
140 |
``True``, it is possible to modify the master-capability flag of |
|
141 |
nodes via RAPI. |
|
142 |
|
|
143 |
With this setup, there are the following definitions of "potential |
|
144 |
master candidates" depending on the rapi flag: |
|
145 |
- If the rapi flag is set to ``True``, all cluster nodes are potential |
|
146 |
master candidates, because as described above, all of them can |
|
147 |
eventually be made master candidates via RAPI and thus security-wise, |
|
148 |
we haven't won anything above the current SSH handling. |
|
149 |
- If the rapi flag is set to ``False``, only the master capable nodes |
|
150 |
are considered potential master candidates, as it is not possible to |
|
151 |
make any other node a master candidate via RAPI at all. |
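
The two definitions above can be sketched as a small selection function. This is only an illustration under stated assumptions: the ``Node`` class and the function name are hypothetical and not part of Ganeti; only the ``master_capable`` flag and the rapi flag come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a cluster node (not Ganeti's node object)."""
    uuid: str
    master_capable: bool


def potential_master_candidates(nodes, rapi_flag):
    """Return the UUIDs of nodes whose keys may be listed in ganeti_pub_keys.

    rapi_flag mirrors ``master_capability_rapi_modifiable``.
    """
    if rapi_flag:
        # Any node could be made master-capable via RAPI, so all nodes count.
        return {node.uuid for node in nodes}
    # Otherwise only nodes that are already master-capable qualify.
    return {node.uuid for node in nodes if node.master_capable}
```

With the flag set to ``False``, a node that is not master-capable never enters the set, which is what keeps its key out of ``ganeti_pub_keys``.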
|
152 |
|
|
153 |
Note that when the rapi flag is changed, the state of the |
|
154 |
``ganeti_pub_keys`` file on all nodes has to be updated accordingly. |
|
155 |
This should be done in the client script ``gnt_cluster`` before the |
|
156 |
RPC call to update the configuration is made, because this way, if |
|
157 |
someone would try to perform that RPC call on master to trick it into |
|
158 |
thinking that the flag is enabled, this would not help as the content of |
|
159 |
the ``ganeti_pub_keys`` file is a crucial part in the design of the |
|
160 |
distribution of the SSH keys. |
|
161 |
|
|
162 |
Note: One could think of always allowing to disable the master-capability |
|
163 |
via RAPI and just restrict the enabling of it, thus making it possible |
|
164 |
to RAPI-"freeze" the nodes' master-capability state once it is disabled. |
|
165 |
However, we think these are rather confusing semantics of the involved |
|
166 |
flags and thus we go with the proposed design. |
|
167 |
|
|
168 |
Note that this change will break RAPI compatibility, at least if the |
|
169 |
rapi flag is not explicitly set to ``True``. We made this choice to |
|
170 |
have the more secure option as default, because otherwise it is |
|
171 |
unlikely to be widely used. |
|
72 | 172 |
|
73 |
We propose two improvements regarding the ssh keys: |
|
74 | 173 |
|
75 |
#. Limit the distribution of the private ssh key to the master candidates. |
|
76 |
|
|
77 |
#. Use different ssh key pairs for each master candidate. |
|
78 |
|
|
79 |
We propose to limit the set of nodes holding the private root user SSH key |
|
80 |
to the master and the master candidates. This way, the security risk would |
|
81 |
be limited to a rather small set of nodes even though the cluster could |
|
82 |
consist of a lot more nodes. The set of master candidates could be protected |
|
83 |
better than the normal nodes (for example residing in a DMZ) to enhance |
|
84 |
security even more if the administrator wishes so. The following |
|
85 |
sections describe in detail which Ganeti commands are affected by this |
|
86 |
change and in what way. |
|
87 |
|
|
88 |
Security will be even more increased if each master candidate gets |
|
89 |
its own ssh private/public key pair. This way, one can remove a |
|
90 |
compromised master candidate from a cluster (including removing its |
|
91 |
public key from all nodes' ``authorized_keys`` file) without having to |
|
92 |
regenerate and distribute new ssh keys for all master candidates. (Even |
|
93 |
though it would be good practice to do that anyway, since the compromising |
|
94 |
of the other master candidates might have taken place already.) However, |
|
95 |
this improvement was not part of the original feature request and |
|
96 |
increases the complexity of node management even more. We therefore |
|
97 |
consider it as second step in this design and will address |
|
98 |
this after the other parts of this design are implemented. |
|
174 |
Cluster initialization |
|
175 |
~~~~~~~~~~~~~~~~~~~~~~ |
|
99 | 176 |
|
100 |
The following sections describe in detail which Ganeti commands are affected |
|
101 |
by the first part of ssh-related improvements, limiting the key |
|
102 |
distribution to master candidates only. |
|
177 |
On cluster initialization, the following steps are taken in |
|
178 |
bootstrap.py: |
|
179 |
- A public/private key pair is generated (as before), but only used |
|
180 |
by the first (and thus master) node. In particular, the private key |
|
181 |
never leaves the node. |
|
182 |
- A mapping of node UUIDs to public SSH keys is created and stored |
|
183 |
as a text file in ``/var/lib/ganeti/ganeti_pub_keys`` only accessible |
|
184 |
by root (permissions 0600). The master node's uuid and its public |
|
185 |
key are added as the first entry. The format of the file is one |
|
186 |
line per node, each line composed as ``node_uuid ssh_key``. |
|
187 |
- The node's public key is added to its own ``authorized_keys`` file. |
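
A minimal sketch of reading and writing a file in the ``node_uuid ssh_key`` format described above, with permissions 0600. The function names are hypothetical and not taken from Ganeti, and the sketch assumes one key per node UUID.

```python
import os


def parse_pub_keys(text):
    """Parse ``node_uuid ssh_key`` lines into a dict mapping UUID to key."""
    keys = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only once: the key itself contains spaces (type, data, comment).
        uuid, key = line.split(None, 1)
        keys[uuid] = key
    return keys


def format_pub_keys(keys):
    """Serialize the mapping, one ``node_uuid ssh_key`` line per node."""
    return "".join(f"{uuid} {key}\n" for uuid, key in keys.items())


def write_pub_keys(path, keys):
    """Write the file so that only its owner can read or write it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(format_pub_keys(keys))
    # The mode passed to os.open only applies on creation; enforce it anyway.
    os.chmod(path, 0o600)
```

Because Python dictionaries keep insertion order, adding the master node's entry first also makes it the first line of the file.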
|
103 | 188 |
|
104 | 189 |
|
105 | 190 |
(Re-)Adding nodes to a cluster |
... | ... | |
108 | 193 |
According to :doc:`design-node-add`, Ganeti transfers the ssh keys to |
109 | 194 |
every node that gets added to the cluster. |
110 | 195 |
|
111 |
We propose to change this procedure to treat master candidates and normal |
|
112 |
nodes differently. For master candidates, the procedure would stay as is. |
|
113 |
For normal nodes, Ganeti would transfer the public and private ssh host |
|
114 |
keys (as before) and only the public root key. |
|
115 |
|
|
116 |
A normal node would not be able to connect via ssh to other nodes, but |
|
117 |
the master (and potentially master candidates) can connect to this node. |
|
196 |
Adding a new node will require the following steps. |
|
197 |
|
|
198 |
In gnt_node.py: |
|
199 |
- On the new node, a new public/private SSH key pair is generated. |
|
200 |
- The public key of the new node is fetched (via SSH) to the master |
|
201 |
node and if it is a potential master candidate (see definition above), |
|
202 |
it is added to the ``ganeti_pub_keys`` list on the master node. |
|
203 |
- The public keys of all current master candidates are added to the |
|
204 |
new node's ``authorized_keys`` file (also via SSH). |
|
205 |
|
|
206 |
In LUNodeAdd in cmdlib/node.py: |
|
207 |
- The LUNodeAdd determines whether or not the new node is a master |
|
208 |
candidate and in any case updates the cluster's configuration with the |
|
209 |
new node's information. (This is not changed by the proposed design.) |
|
210 |
- If the new node is a master candidate, we make an RPC call to the node |
|
211 |
daemon of the master node to add the new node's public key to all |
|
212 |
nodes' ``authorized_keys`` files. The implementation of this RPC call |
|
213 |
has to be extra careful as described in the next steps, because |
|
214 |
compromised RPC security should not compromise SSH security. |
|
215 |
|
|
216 |
RPC call execution in noded (on master node): |
|
217 |
- Check that the public key of the new node is in the |
|
218 |
``ganeti_pub_keys`` file of the master node to make sure that no keys |
|
219 |
of nodes outside the Ganeti cluster and no keys that are not potential |
|
220 |
master candidates gain SSH access in the cluster. |
|
221 |
- Via SSH, transfer the new node's public key to all nodes (including |
|
222 |
the new node) and add it to their ``authorized_keys`` file. |
|
223 |
- The ``ganeti_pub_keys`` file is transferred via SSH to all |
|
224 |
potential master candidate nodes except the master node |
|
225 |
(including the new one). |
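
The first noded step above (refuse any key that is not listed in ``ganeti_pub_keys``) might look roughly like the following. This is a sketch under assumptions: the function name is hypothetical, ``pub_keys`` is assumed to be a dict of node UUID to public key, and the SSH transfer itself is left out.

```python
def add_key_to_authorized_keys(authorized_keys_text, pub_keys, node_uuid):
    """Return new authorized_keys content with the node's key added.

    Raises ValueError if the node is not listed in ganeti_pub_keys, so that a
    forged RPC request cannot grant SSH access to an unknown key.
    """
    if node_uuid not in pub_keys:
        raise ValueError(f"node {node_uuid} is not listed in ganeti_pub_keys")
    key = pub_keys[node_uuid]
    lines = authorized_keys_text.splitlines()
    # Adding is idempotent: an already authorized key is not duplicated.
    if key not in lines:
        lines.append(key)
    return "".join(line + "\n" for line in lines)
```

Note that the key to add is looked up in ``ganeti_pub_keys`` rather than taken from the RPC request, so the request can only name a node, not supply key material.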
|
118 | 226 |
|
119 | 227 |
In case of readding a node that used to be in the cluster before, |
120 |
handling of the ssh keys would basically be the same with the following |
|
121 |
additional modifications: if the node used to be a master or |
|
122 |
master-candidate node, but will be a normal node after readding, Ganeti |
|
123 |
should make sure that the private root key is deleted if it is still |
|
124 |
present on the node. |
|
228 |
handling of the SSH keys would basically be the same, in particular also |
|
229 |
a new SSH key pair is generated for the node, because we cannot be sure |
|
230 |
that the old key pair has not been compromised while the node was |
|
231 |
offlined. |
|
125 | 232 |
|
126 | 233 |
|
127 | 234 |
Pro- and demoting a node to/from master candidate |
128 | 235 |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
129 | 236 |
|
130 |
If the role of a node is changed from 'normal' to 'master_candidate', the |
|
131 |
master node should at that point copy the private root ssh key. When demoting |
|
132 |
a node from master candidate to a normal node, the keys that have been copied |
|
133 |
there on promotion or addition should be removed again. |
|
237 |
If the role of a node is changed from 'normal' to 'master_candidate', |
|
238 |
the procedure is the same as for adding nodes from the step "In |
|
239 |
LUNodeAdd ..." on. |
|
240 |
|
|
241 |
If a node gets demoted to 'normal', the master daemon makes a similar |
|
242 |
RPC call to the master node's node daemon as for adding a node. |
|
243 |
|
|
244 |
In the RPC call, noded will perform the following steps: |
|
245 |
- Check that the public key of the node to be demoted is indeed in the |
|
246 |
``ganeti_pub_keys`` file to avoid deleting ssh keys of machines that |
|
247 |
don't belong to the cluster (and thus potentially lock out the |
|
248 |
administrator). |
|
249 |
- Via SSH, remove the key from all nodes' ``authorized_keys`` files. |
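
The demotion steps can be sketched in the same spirit. Again, the function name is hypothetical and ``pub_keys`` is assumed to be a dict of node UUID to public key; the sketch only covers the content change, not the SSH transport.

```python
def remove_key_from_authorized_keys(authorized_keys_text, pub_keys, node_uuid):
    """Return authorized_keys content without the demoted node's key.

    Raises ValueError if the node is not listed in ganeti_pub_keys, so that
    keys of machines outside the cluster are never deleted.
    """
    if node_uuid not in pub_keys:
        raise ValueError(f"node {node_uuid} is not listed in ganeti_pub_keys")
    key = pub_keys[node_uuid]
    kept = [line for line in authorized_keys_text.splitlines()
            if line.strip() != key]
    return "".join(line + "\n" for line in kept)
```

Only lines that match the listed key exactly are dropped, so unrelated entries such as an administrator's own key stay in place.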
|
134 | 250 |
|
135 | 251 |
This affected the behavior of the following commands: |
136 | 252 |
|
... | ... | |
176 | 292 |
Cluster verify |
177 | 293 |
~~~~~~~~~~~~~~ |
178 | 294 |
|
179 |
To make sure the private root ssh key was not distributed to a normal |
|
180 |
node, 'gnt-cluster verify' will be extended by a check for the key |
|
181 |
on normal nodes. Additionally, it will check if the private key is |
|
182 |
indeed present on master candidates. |
|
295 |
So far, 'gnt-cluster verify' checks the SSH connectivity of all nodes to |
|
296 |
all other nodes. We propose to replace this by the following checks: |
|
297 |
|
|
298 |
- For all master candidates, we check if they can connect to any other node |
|
299 |
in the cluster (other master candidates and normal nodes). |
|
300 |
- We check if the ``ganeti_pub_keys`` file contains keys of nodes that |
|
301 |
are no longer in the cluster or that are not potential master |
|
302 |
candidates. |
|
303 |
- For all normal nodes, we check if their key does not appear in other |
|
304 |
nodes' ``authorized_keys`` files. For now, we will only emit a warning |
|
305 |
rather than an error if this check fails, because Ganeti might be |
|
306 |
run in a setup where Ganeti is not the only system manipulating the |
|
307 |
SSH keys. |
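
The two file-based checks above (stale entries in ``ganeti_pub_keys`` and normal nodes' keys showing up in other nodes' ``authorized_keys``) could be expressed roughly as follows. All names and the data layout are assumptions made for this sketch; the SSH connectivity check between nodes is not covered.

```python
def verify_ssh_key_files(pub_keys, cluster_uuids, potential_mc_uuids,
                         normal_node_keys, authorized_keys_by_node):
    """Return a list of human-readable findings (empty if all checks pass).

    pub_keys: dict of UUID to key, as read from ganeti_pub_keys.
    normal_node_keys: dict of UUID to key for nodes that are not master candidates.
    authorized_keys_by_node: dict of UUID to list of authorized_keys lines.
    """
    findings = []
    for uuid in sorted(pub_keys):
        if uuid not in cluster_uuids:
            findings.append(f"ganeti_pub_keys lists {uuid}, "
                            "which is no longer in the cluster")
        elif uuid not in potential_mc_uuids:
            findings.append(f"ganeti_pub_keys lists {uuid}, "
                            "which is not a potential master candidate")
    for uuid, key in sorted(normal_node_keys.items()):
        for other, lines in sorted(authorized_keys_by_node.items()):
            if other != uuid and key in lines:
                # Reported as a warning only: other tools may manage SSH keys.
                findings.append(f"warning: key of normal node {uuid} "
                                f"is authorized on {other}")
    return findings
```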
|
308 |
|
|
309 |
|
|
310 |
Upgrades |
|
311 |
~~~~~~~~ |
|
312 |
|
|
313 |
When upgrading from a version that has the previous SSH setup to the one |
|
314 |
proposed in this design, the upgrade procedure has to involve the |
|
315 |
following steps in the post-upgrade hook: |
|
316 |
- For all nodes, new SSH key pairs are generated. |
|
317 |
- All nodes and their public keys are added to the ``ganeti_pub_keys`` |
|
318 |
file and the file is copied to all nodes. |
|
319 |
- All keys of master candidate nodes are added to the |
|
320 |
``authorized_keys`` files of all other nodes. |
|
321 |
|
|
322 |
Since this upgrade significantly changes the configuration of the |
|
323 |
clusters' nodes, we will add a note to the UPGRADE notes to make the |
|
324 |
administrator aware of this fact (in case he intends to enable access |
|
325 |
from normal nodes to master candidates for other reasons than Ganeti |
|
326 |
uses the machines). |
|
327 |
|
|
328 |
Also, in any operation where Ganeti creates new SSH keys, the old keys |
|
329 |
will be backed up and not simply overwritten. |
|
330 |
|
|
331 |
|
|
332 |
Downgrades |
|
333 |
~~~~~~~~~~ |
|
334 |
|
|
335 |
These downgrading steps will be implemented from 2.12 to 2.11: |
|
336 |
- The master node's private/public key pair will be distributed to all |
|
337 |
nodes (via SSH) and the individual SSH keys will be backed up. |
|
338 |
- The obsolete individual ssh keys will be removed from all nodes' |
|
339 |
``authorized_keys`` file. |
|
340 |
|
|
341 |
|
|
342 |
Renew-Crypto |
|
343 |
~~~~~~~~~~~~ |
|
183 | 344 |
|
345 |
The ``gnt-cluster renew-crypto`` command is not affected by the proposed |
|
346 |
changes related to SSH. |
|
184 | 347 |
|
185 | 348 |
|
186 | 349 |
Proposal regarding node daemon certificates |