=============================
Improvements of Node Security
=============================

This document describes an enhancement of Ganeti's security by restricting
the distribution of security-sensitive data to the master and master
candidates only.

Note: In this document, we will use the term 'normal node' for a node that
is neither master nor master-candidate.

.. contents:: :depth: 4

Objective
=========

Up to version 2.10, Ganeti distributes security-relevant keys to all nodes,
including nodes that are neither master nor master candidates. Those
keys are the private and public SSH keys for node communication and the
SSL certificate and private key for RPC communication. The objective of
this design is to limit the set of nodes that can establish SSH and RPC
connections to the master and master candidates.

As pointed out in
`issue 377 <https://code.google.com/p/ganeti/issues/detail?id=377>`_, this
is a security risk. Since all nodes have these keys, compromising
any of those nodes could give an attacker access to all other
machines in the cluster. Reducing the set of nodes that are able to
make SSH and RPC connections to the master and master candidates would
significantly reduce the risk simply because fewer machines would be a
valuable target for attackers.

Note: For bigger installations of Ganeti, it is advisable to run master
candidate nodes as non-vm-capable nodes. This reduces the attack
surface for hypervisor exploits.


Detailed design
===============


Current state and shortcomings
------------------------------

Currently (as of 2.10), all nodes hold the following information:

- the SSH host keys (public and private)
- the SSH root keys (public and private)
- the node daemon certificate (the SSL client certificate and its
  corresponding private key)

Concerning SSH, this setup has the following security issue: since
all nodes of a cluster can SSH as root into any other cluster node, one
compromised node can harm all other nodes of the cluster.

Regarding the SSL encryption of the RPC communication with the node
daemon, the current setup is as follows. There is only one
certificate, which is used as both client and server certificate. Besides
the SSL client verification, we check whether the client certificate used
is the same as the certificate stored on the server.

This means that any node running a node daemon can also act as an RPC
client and use it to issue RPC calls to other cluster nodes. This in
turn means that any compromised node could be used to make RPC calls to
any node (including itself) to gain full control over VMs. An attacker
could use this, for example, to bring down the VMs or to exploit bugs
in the virtualization stack to gain access to the host machines as well.


Proposal concerning SSH key distribution
----------------------------------------

We propose the following design regarding the SSH key handling. The
SSH host keys are untouched by this design.

Each node gets its own SSH private/public key pair, but only the public
keys of the master candidates get added to the ``authorized_keys`` file
of all nodes. This has the following advantages:

- Only master candidates can SSH into other nodes, thus compromised
  nodes cannot compromise the cluster further.
- One can remove a compromised master candidate from a cluster
  (including removing its public key from all nodes' ``authorized_keys``
  file) without having to regenerate and distribute new SSH keys for all
  master candidates. (Even though it is good practice to do that anyway,
  since the other master candidates might already have been
  compromised.)
- If an (uncompromised) master candidate is offlined to be sent for
  repair due to a hardware failure before Ganeti can remove any keys
  from it (for example when the network adapter of the machine is broken),
  we don't have to worry about the keys being on a machine that is
  physically accessible.

To ensure security while transferring public key information and
updating the ``authorized_keys`` files, several other changes are
necessary:

- Any distribution of keys (in this case only public keys) is done via
  SSH and not via RPC. An attacker who has RPC control should not be
  able to get SSH access where he did not have SSH access before
  already.
- The only RPC calls that are made in this context are from the master
  daemon to the node daemon on its own host, and noded ensures as much
  as possible that the change to be made does not harm the cluster's
  security boundary.
- The nodes that are potential master candidates keep a list of public
  keys of potential master candidates of the cluster in a separate
  file called ``ganeti_pub_keys`` to keep track of which keys could
  possibly be added to the ``authorized_keys`` files of the nodes. What
  "potential" means in this case is defined in the next section. The
  key list is only transferred via SSH or written directly by noded. It
  is not stored in the cluster config, because the config is
  distributed via RPC.

The following sections describe in detail which Ganeti commands are
affected by the proposed changes.



RAPI
~~~~

The design goal to limit SSH powers to master candidates conflicts with
the current powers a user of the RAPI interface would have. The
``master_capable`` flag of nodes can be modified via RAPI.
This means that an attacker who has access to the RAPI interface can make
all non-master-capable nodes master-capable, and then increase the master
candidate pool size until all machines are master candidates (or at least
a particular machine that he is aiming for). Thus, with RAPI
access and a compromised normal node, one can make this node a master
candidate and then still have the power to compromise the whole cluster.

To mitigate this issue, we propose the following changes:

- Add a flag ``master_capability_rapi_modifiable`` to the cluster
  configuration which indicates whether or not it should be possible
  to modify the ``master_capable`` flag of nodes via RAPI. The flag is
  set to ``False`` by default and can itself only be changed on the
  command line. In this design doc, we refer to the flag as the
  "rapi flag" from here on.
- Only if the ``master_capability_rapi_modifiable`` switch is set to
  ``True`` is it possible to modify the master-capability flag of
  nodes via RAPI.

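The rule above amounts to a small guard in the RAPI request path. The following is a minimal sketch, assuming a plain dict as a stand-in for the cluster configuration; ``check_rapi_node_modify`` and ``RapiPermissionError`` are hypothetical names, not part of Ganeti's RAPI code.

```python
from typing import Any, Dict


class RapiPermissionError(Exception):
    """Hypothetical error raised when a RAPI request is refused."""


def check_rapi_node_modify(cluster_config: Dict[str, Any],
                           requested_changes: Dict[str, Any]) -> None:
    """Refuse RAPI changes to master_capable unless the rapi flag is set.

    cluster_config is a dict standing in for the cluster configuration;
    requested_changes maps node parameter names to their new values.
    """
    # The rapi flag defaults to False, as proposed in the design.
    rapi_flag = cluster_config.get("master_capability_rapi_modifiable", False)
    if "master_capable" in requested_changes and not rapi_flag:
        raise RapiPermissionError(
            "master_capable cannot be modified via RAPI while "
            "master_capability_rapi_modifiable is False")
    # Other node parameters are not affected by this check.
```

Since the flag itself can only be changed on the command line, the guard only needs to read it, never to set it.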
With this setup, there are the following definitions of "potential
master candidates" depending on the rapi flag:

- If the rapi flag is set to ``True``, all cluster nodes are potential
  master candidates, because as described above, all of them can
  eventually be made master candidates via RAPI and thus, security-wise,
  nothing is gained over the current SSH handling.
- If the rapi flag is set to ``False``, only the master-capable nodes
  are considered potential master candidates, as it is not possible to
  make any other node master-capable (and thus a master candidate) via
  RAPI at all.

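The two definitions above can be expressed as a single selection function. This is a sketch only; the ``Node`` class is a hypothetical minimal stand-in for a node entry in the cluster configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical minimal stand-in for a configured cluster node."""
    uuid: str
    master_capable: bool


def potential_master_candidates(nodes: List[Node], rapi_flag: bool) -> List[str]:
    """Return the UUIDs of nodes whose keys belong in ganeti_pub_keys."""
    if rapi_flag:
        # Any node can be made master-capable, and then a master
        # candidate, via RAPI, so every node has to be counted.
        return [node.uuid for node in nodes]
    # Only nodes that are already master-capable can become candidates.
    return [node.uuid for node in nodes if node.master_capable]
```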
Note that when the rapi flag is changed, the state of the
``ganeti_pub_keys`` file on all nodes has to be updated accordingly.
This should be done in the client script ``gnt_cluster`` before the
RPC call to update the configuration is made. This way, if someone
tried to perform that RPC call on the master to trick it into thinking
that the flag is enabled, this would not help, because noded only
distributes keys that are listed in the ``ganeti_pub_keys`` file, which
is a crucial part of the design of the SSH key distribution.

Note: One could think of always allowing the master capability to be
disabled via RAPI and only restricting its enabling, thus making it
possible to RAPI-"freeze" the nodes' master-capability state once it is
disabled. However, we think these are rather confusing semantics of the
involved flags and thus we go with the proposed design.

Note that this change will break RAPI compatibility, at least if the
rapi flag is not explicitly set to ``True``. We made this choice to
have the more secure option as the default, because otherwise it would
be unlikely to be widely used.


Cluster initialization
~~~~~~~~~~~~~~~~~~~~~~

On cluster initialization, the following steps are taken in
``bootstrap.py``:

- A public/private key pair is generated (as before), but only used
  by the first (and thus master) node. In particular, the private key
  never leaves the node.
- A mapping of node UUIDs to public SSH keys is created and stored
  as a text file in ``/var/lib/ganeti/ganeti_pub_keys``, accessible only
  by root (permissions 0600). The master node's UUID and its public
  key are added as the first entry. The format of the file is one
  line per node, each line composed as ``node_uuid ssh_key``.
- The node's public key is added to its own ``authorized_keys`` file.

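To make the file format above concrete, the following is a minimal sketch of writing and reading ``ganeti_pub_keys``. It assumes exactly one key per node UUID, as in the mapping described above; ``write_pub_keys`` and ``read_pub_keys`` are hypothetical helper names, not Ganeti's actual implementation.

```python
import os
from typing import Dict


def write_pub_keys(path: str, keys: Dict[str, str]) -> None:
    """Write one 'node_uuid ssh_key' line per node, readable by owner only."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    # The mode given to os.open is ignored for a pre-existing file,
    # so enforce 0600 explicitly.
    os.fchmod(fd, 0o600)
    with os.fdopen(fd, "w") as handle:
        for node_uuid, ssh_key in keys.items():
            handle.write("%s %s\n" % (node_uuid, ssh_key))


def read_pub_keys(path: str) -> Dict[str, str]:
    """Parse the file back into a {node_uuid: ssh_key} mapping."""
    keys = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            # Split off only the first field: the key itself contains spaces.
            node_uuid, ssh_key = line.split(" ", 1)
            keys[node_uuid] = ssh_key
    return keys
```

Splitting only on the first space matters, because an OpenSSH public key line itself consists of several space-separated fields (key type, base64 data, optional comment).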

(Re-)Adding nodes to a cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

According to :doc:`design-node-add`, Ganeti transfers the SSH keys to
every node that gets added to the cluster.

Adding a new node will require the following steps.

In gnt_node.py:

- On the new node, a new public/private SSH key pair is generated.
- The public key of the new node is fetched (via SSH) to the master
  node and, if it is a potential master candidate (see definition above),
  it is added to the ``ganeti_pub_keys`` list on the master node.
- The public keys of all current master candidates are added to the
  new node's ``authorized_keys`` file (also via SSH).

In LUNodeAdd in cmdlib/node.py:

- ``LUNodeAdd`` determines whether or not the new node is a master
  candidate and in any case updates the cluster's configuration with the
  new node's information. (This is not changed by the proposed design.)
- If the new node is a master candidate, we make an RPC call to the node
  daemon of the master node to add the new node's public key to all
  nodes' ``authorized_keys`` files. The implementation of this RPC call
  has to be extra careful as described in the next steps, because
  compromised RPC security should not compromise SSH security.

RPC call execution in noded (on master node):

- Check that the public key of the new node is in the
  ``ganeti_pub_keys`` file of the master node to make sure that no keys
  of nodes outside the Ganeti cluster and no keys that are not potential
  master candidates gain SSH access in the cluster.
- Via SSH, transfer the new node's public key to all nodes (including
  the new node) and add it to their ``authorized_keys`` file.
- The ``ganeti_pub_keys`` file is transferred via SSH to all
  potential master candidate nodes except the master node
  (including the new one).

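The check in the first step and the resulting ``authorized_keys`` update could look like the following sketch. It only shows the local decision and the new file content; the actual transfer via SSH to all nodes is left out. ``add_authorized_key`` is a hypothetical name, and ``pub_keys`` is assumed to be the parsed ``ganeti_pub_keys`` file of the master node.

```python
from typing import Dict


def add_authorized_key(pub_keys: Dict[str, str], node_uuid: str,
                       new_key: str, authorized_keys: str) -> str:
    """Return updated authorized_keys content, refusing unknown keys."""
    # Only a key recorded for this node in ganeti_pub_keys may gain access,
    # so a forged RPC cannot smuggle in a key from outside the cluster.
    if pub_keys.get(node_uuid) != new_key:
        raise ValueError("key of node %s is not in ganeti_pub_keys" % node_uuid)
    lines = authorized_keys.splitlines()
    # Adding the same key twice must not duplicate the entry.
    if new_key not in lines:
        lines.append(new_key)
    return "\n".join(lines) + "\n"
```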
When re-adding a node that was previously part of the cluster, the
handling of the SSH keys is basically the same; in particular,
a new SSH key pair is generated for the node, because we cannot be sure
that the old key pair has not been compromised while the node was
offlined.


Pro- and demoting a node to/from master candidate
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the role of a node is changed from 'normal' to 'master_candidate',
the procedure is the same as for adding nodes from the step "In
LUNodeAdd ..." on.

If a node gets demoted to 'normal', the master daemon makes a similar
RPC call to the master node's node daemon as for adding a node.

In the RPC call, noded will perform the following steps:

- Check that the public key of the node to be demoted is indeed in the
  ``ganeti_pub_keys`` file to avoid deleting SSH keys of machines that
  don't belong to the cluster (and thus potentially locking out the
  administrator).
- Via SSH, remove the key from all nodes' ``authorized_keys`` files.

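The two steps above can be sketched as follows, again showing only the check and the resulting file content and leaving out the SSH transfer. ``remove_authorized_key`` is a hypothetical name; ``pub_keys`` is assumed to be the parsed ``ganeti_pub_keys`` file of the master node.

```python
from typing import Dict


def remove_authorized_key(pub_keys: Dict[str, str], demoted_key: str,
                          authorized_keys: str) -> str:
    """Return authorized_keys content without the demoted node's key."""
    # Refuse keys Ganeti does not know about, so that noded never deletes
    # keys of machines outside the cluster (e.g. the administrator's).
    if demoted_key not in pub_keys.values():
        raise ValueError("key is not listed in ganeti_pub_keys")
    kept = [line for line in authorized_keys.splitlines()
            if line.strip() != demoted_key]
    return "".join(line + "\n" for line in kept)
```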
This affects the behavior of the following commands:

::

  gnt-node modify --master-candidate=yes
  gnt-node modify --master-candidate=no [--auto-promote]

If the node was already a master candidate before the command to promote
it was issued, Ganeti does not do anything.

Note that when you demote a node from master candidate to normal node, another
master-capable normal node will be promoted to master candidate. For this
newly promoted node, the same changes apply as if it was explicitly promoted.

The same behavior should be ensured for the corresponding RAPI command.


Offlining and onlining a node
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When a node is offlined, it immediately loses its role as master or master
candidate as well. When it is onlined again, it will become master
candidate again if it was one before. The handling of the keys should be done
in the same way as when the node is explicitly promoted or demoted to or from
master candidate. See the previous section for details.

This affects the command:

::

  gnt-node modify --offline=yes
  gnt-node modify --offline=no [--auto-promote]

For offlining, the removal of the keys is particularly important, as the
detection of a compromised node might be the very reason for the offlining.
Of course, we cannot guarantee that removal of the key is always successful,
because the node might not be reachable anymore. Even though it is a
best-effort operation, it is still an improvement over the status quo,
because currently Ganeti does not even try to remove any keys.

The same behavior should be ensured for the corresponding RAPI command.


Cluster verify
~~~~~~~~~~~~~~

So far, ``gnt-cluster verify`` checks the SSH connectivity of all nodes to
all other nodes. We propose to replace this by the following checks:

- For all master candidates, we check if they can connect to all other
  nodes in the cluster (other master candidates and normal nodes).
- We check if the ``ganeti_pub_keys`` file contains keys of nodes that
  are no longer in the cluster or that are not potential master
  candidates.
- For all normal nodes, we check that their key does not appear in other
  nodes' ``authorized_keys`` files. For now, we will only emit a warning
  rather than an error if this check fails, because Ganeti might be
  run in a setup where Ganeti is not the only system manipulating the
  SSH keys.

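The last two checks only compare key lists and can be sketched without any network access; the connectivity check of the first item needs real SSH connections and is left out. The function name and the shape of its inputs (UUID sets and per-node key lists gathered beforehand) are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple


def verify_ssh_keys(pub_keys: Dict[str, str],
                    potential_mcs: Set[str],
                    node_keys: Dict[str, str],
                    normal_nodes: Set[str],
                    authorized_keys: Dict[str, List[str]]
                    ) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for the key-file checks (hypothetical).

    pub_keys:        parsed ganeti_pub_keys file {node_uuid: key}
    potential_mcs:   UUIDs of the current potential master candidates
    node_keys:       public key reported by each node {node_uuid: key}
    normal_nodes:    UUIDs of the current normal nodes
    authorized_keys: authorized_keys lines per node {node_uuid: [line]}
    """
    errors: List[str] = []
    warnings: List[str] = []
    # Entries for removed nodes or non-potential candidates are errors;
    # a removed node is simply not in potential_mcs any more.
    for node_uuid in sorted(pub_keys):
        if node_uuid not in potential_mcs:
            errors.append("unexpected entry for node %s in ganeti_pub_keys"
                          % node_uuid)
    # A normal node's key authorized on another node is only a warning,
    # since other tools may manage authorized_keys as well.
    for normal_uuid in sorted(normal_nodes):
        key = node_keys.get(normal_uuid)
        if key is None:
            continue
        for host_uuid in sorted(authorized_keys):
            if host_uuid != normal_uuid and key in authorized_keys[host_uuid]:
                warnings.append("key of normal node %s is authorized on node %s"
                                % (normal_uuid, host_uuid))
    return errors, warnings
```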

Upgrades
~~~~~~~~

When upgrading from a version that has the previous SSH setup to the one
proposed in this design, the upgrade procedure has to involve the
following steps in the post-upgrade hook:

- For all nodes, new SSH key pairs are generated.
- All nodes and their public keys are added to the ``ganeti_pub_keys``
  file and the file is copied to all nodes.
- All keys of master candidate nodes are added to the
  ``authorized_keys`` files of all other nodes.

Since this upgrade significantly changes the configuration of the
cluster's nodes, we will add a note to the UPGRADE notes to make the
administrator aware of this fact (in case he intends to enable access
from normal nodes to master candidates for purposes other than Ganeti's
use of the machines).

Also, in any operation where Ganeti creates new SSH keys, the old keys
will be backed up and not simply overwritten.


Downgrades
~~~~~~~~~~

These downgrade steps will be implemented from 2.12 to 2.11:

- The master node's private/public key pair will be distributed to all
  nodes (via SSH) and the individual SSH keys will be backed up.
- The obsolete individual SSH keys will be removed from all nodes'
  ``authorized_keys`` files.


Renew-Crypto
~~~~~~~~~~~~

The ``gnt-cluster renew-crypto`` command is not affected by the proposed
changes related to SSH.


Proposal regarding node daemon certificates
-------------------------------------------

Regarding the node daemon certificates, we propose the following changes
in the design.

- Instead of using the same certificate for all nodes as both server
  and client certificate, we generate a common server certificate (and
  the corresponding private key) for all nodes and a different client
  certificate (and the corresponding private key) for each node. All
  those certificates will be self-signed for now. The client
  certificates will use the node UUID as serial number to ensure
  uniqueness within the cluster.
- In addition, we store a mapping of
  (node UUID, client certificate digest) in the cluster's configuration
  and ssconf for hosts that are master or master candidate.
  The client certificate digest is a hash of the client certificate.
  We suggest a 'sha1' hash here. We will call this mapping 'candidate map'
  from here on.
- The node daemon will be modified in a way that on an incoming RPC
  request, it first performs a client verification (same as before) to
  ensure that the requesting host is indeed the holder of the
  corresponding private key. Additionally, it compares the digest of
  the certificate of the incoming request to the respective entry of
  the candidate map. If the digest does not match the entry of the host
  in the mapping or is not included in the mapping at all, the SSL
  connection is refused.

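The additional check in the last step can be sketched as follows. The sketch assumes that the digest is the SHA-1 of the DER-encoded certificate, stored as a hex string, and that the requesting node's UUID has already been determined (for example from the certificate's serial number); both are assumptions, and ``cert_digest`` and ``accept_client`` are hypothetical names.

```python
import hashlib
from typing import Dict


def cert_digest(der_cert: bytes) -> str:
    """Hex SHA-1 digest of a DER-encoded client certificate."""
    return hashlib.sha1(der_cert).hexdigest()


def accept_client(der_cert: bytes, node_uuid: str,
                  candidate_map: Dict[str, str]) -> bool:
    """Decide whether an already SSL-verified client may issue RPCs.

    SSL client verification (proof of holding the private key) is
    assumed to have succeeded during the handshake; this is the extra
    candidate map check on top of it.
    """
    expected = candidate_map.get(node_uuid)
    if expected is None:
        # Neither master nor master candidate: refuse the connection.
        return False
    # The node is listed, but it must present the certificate on record.
    return cert_digest(der_cert) == expected
```

In a Python server built on the standard ``ssl`` module, the DER bytes of the peer certificate are available from ``SSLSocket.getpeercert(binary_form=True)`` once the handshake has completed. Because the map stores only digests, leaking it does not let a node impersonate a master candidate.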
This design has the following advantages:

- A compromised normal node cannot issue RPC calls, because it will
  not be in the candidate map. (See the drawbacks below regarding
  an indirect way of achieving this, though.)
- A compromised master candidate would be able to issue RPC requests,
  but on detection of its compromised state, it can be removed from the
  cluster (and thus from the candidate map) without the need for
  redistribution of any certificates, because the other master candidates
  can continue using their own certificates. However, it is best
  practice to issue a complete key renewal even in this case, unless one
  can ensure that no actions compromising other nodes have already been
  carried out.
- A compromised node would not be able to use the other (possibly master
  candidate) nodes' information from the candidate map to issue RPCs,
  because the config just stores the digests and not the certificate
  itself.
- A compromised node would be able to obtain another node's certificate
  by waiting for incoming RPCs from this other node. However, the node
  cannot use the certificate to issue RPC calls, because the SSL client
  verification would require the node to hold the corresponding private
  key as well.

Drawbacks of this design:

- Complexity of node and certificate management will be increased (see
  following sections for details).
- If the candidate map is not distributed fast enough to all nodes after
  an update of the configuration, it might be possible to issue RPC calls
  from a compromised master candidate node that has been removed
  from the Ganeti cluster already. However, this is still a better
  situation than before and an inherent problem when one wants to
  distinguish between master candidates and normal nodes.
- A compromised master candidate would still be able to issue RPC calls
  if it uses SSH to retrieve another master candidate's client
  certificate and the corresponding private SSL key. This is an issue
  even with the first part of the improved handling of SSH keys in this
  design (limiting SSH keys to master candidates), but it will be
  eliminated with the second part of the design (separate SSH keys for
  each master candidate).
- Even though this proposal is an improvement over the previous
  situation in Ganeti, it still does not use the full power of SSL. For
  further improvements, see Section "Related and future work".

Alternative proposals:

- Instead of generating a client certificate per node, one could think
  of just generating two different client certificates, one for normal
  nodes and one for master candidates. Noded could then just check if
  the requesting node has the master candidate certificate. The drawback
  of this proposal is that once one master candidate gets compromised, all
  master candidates would need to get a new certificate even if the
  compromised master candidate had not yet fetched the certificates
  from the other master candidates via SSH.
- In addition to our main proposal, one could think of including a
  piece of data (for example the node's host name or UUID) in the RPC
  call which is encrypted with the requesting node's private key. The
  node daemon could check if the datum can be decrypted using the node's
  certificate. However, this would provide functionality similar to
  SSL's built-in client verification and add significant complexity
  to Ganeti's RPC protocol.

In the following sections, we describe how our design affects various
Ganeti operations.


Cluster initialization
~~~~~~~~~~~~~~~~~~~~~~

On cluster initialization, so far only the node daemon certificate was
created. With our design, two certificates (and corresponding keys)
need to be created: a server certificate to be distributed to all nodes
and a client certificate only to be used by this particular node. In the
following, we use the term node daemon certificate for the server
certificate only.

In the cluster configuration, the candidate map is created. It is
populated with the respective entry for the master node. It is also
written to ssconf.


(Re-)Adding nodes
~~~~~~~~~~~~~~~~~

When a node is added, the server certificate is copied to the node (as
before). Additionally, a new client certificate (and the corresponding
private key) is created on the new node to be used only by the new node
as client certificate.

If the new node is a master candidate, the candidate map is extended by
the new node's data. As before, the updated configuration is distributed
to all nodes (as complete configuration on the master candidates and
ssconf on all nodes). Note that distribution of the configuration after
adding a node is already implemented, since all nodes hold the list of
nodes in the cluster in ssconf anyway.

If the configuration for whatever reason already holds an entry for this
node, it will be overridden.

When re-adding a node, the procedure is the same as for adding a node.


Promotion and demotion of master candidates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When a normal node gets promoted to be master candidate, an entry for the
node has to be added to the candidate map and the updated configuration has
to be distributed to all nodes. If there was already an entry for the node,
we override it.

On demotion of a master candidate, the node's entry in the candidate map
gets removed and the updated configuration gets redistributed.

The same procedure applies to onlining and offlining master candidates.

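The candidate map updates described above reduce to two small operations on a dict; redistributing the configuration afterwards is not shown. This sketch assumes the map is a ``{node_uuid: hex SHA-1 digest}`` dict and that the digest is taken over the DER-encoded client certificate; ``promote_candidate`` and ``demote_candidate`` are hypothetical names.

```python
import hashlib
from typing import Dict


def promote_candidate(candidate_map: Dict[str, str], node_uuid: str,
                      der_cert: bytes) -> None:
    """Add the node's entry, overriding any stale entry for the same node."""
    candidate_map[node_uuid] = hashlib.sha1(der_cert).hexdigest()


def demote_candidate(candidate_map: Dict[str, str], node_uuid: str) -> None:
    """Remove the node's entry; a missing entry is not treated as an error."""
    candidate_map.pop(node_uuid, None)
```

Making both operations idempotent keeps a repeated promotion or demotion (for example after an interrupted command) from failing.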

Cluster verify
~~~~~~~~~~~~~~

Cluster verify will be extended by the following checks:

- Whether each entry in the candidate map indeed corresponds to a master
  candidate.
- Whether the master candidates' certificate digests match their entries
  in the candidate map.
- Whether no node tries to use the certificate of another node. In
  particular, it is important to check that no normal node tries to
  use the certificate of a master candidate.

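The three checks can be sketched as a pure function over data that cluster verify would collect from the nodes. The input shapes, in particular ``used_digests`` (the digest of the client certificate each node actually uses), and the function name are assumptions for illustration.

```python
from typing import Dict, List, Set


def verify_candidate_map(candidate_map: Dict[str, str],
                         master_candidates: Set[str],
                         used_digests: Dict[str, str]) -> List[str]:
    """Return a list of error messages (hypothetical helper)."""
    errors: List[str] = []
    # 1. Every entry must belong to a current master candidate.
    for uuid in sorted(candidate_map):
        if uuid not in master_candidates:
            errors.append("candidate map entry for non-candidate %s" % uuid)
    # 2. Every master candidate's certificate must match its entry.
    for uuid in sorted(master_candidates):
        if candidate_map.get(uuid) != used_digests.get(uuid):
            errors.append("certificate digest mismatch for candidate %s" % uuid)
    # 3a. No two nodes may use the same certificate.
    owners: Dict[str, List[str]] = {}
    for uuid, digest in sorted(used_digests.items()):
        owners.setdefault(digest, []).append(uuid)
    for digest in sorted(owners):
        if len(owners[digest]) > 1:
            errors.append("certificate shared by nodes %s"
                          % ", ".join(owners[digest]))
    # 3b. In particular, no normal node may use a candidate's certificate.
    candidate_digests = set(candidate_map.values())
    for uuid, digest in sorted(used_digests.items()):
        if uuid not in master_candidates and digest in candidate_digests:
            errors.append("normal node %s uses a master candidate certificate"
                          % uuid)
    return errors
```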

Crypto renewal
~~~~~~~~~~~~~~

Currently, when the cluster's cryptographic tokens are renewed using the
``gnt-cluster renew-crypto`` command, the node daemon certificate is
renewed (among others). The option ``--new-cluster-certificate`` renews the
node daemon certificate only.

By adding an option ``--new-node-certificates``, we offer to renew the
client certificates. Whenever the client certificates are renewed, the
candidate map has to be updated and redistributed.

If, for whatever reason, the candidate map becomes inconsistent (for example
due to inconsistent updating after a demotion or offlining), the user can use
this option to renew the client certificates and update the candidate
map.

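Rebuilding the candidate map from scratch after renewal is what makes this option suitable for repairing an inconsistent map: stale entries are simply not carried over. A minimal sketch, assuming the renewed certificates are available as DER bytes per node and the map stores hex SHA-1 digests; ``rebuild_candidate_map`` is a hypothetical name.

```python
import hashlib
from typing import Dict, Set


def rebuild_candidate_map(master_candidates: Set[str],
                          new_client_certs: Dict[str, bytes]) -> Dict[str, str]:
    """Build a fresh candidate map from the renewed client certificates.

    Only current master candidates get an entry; a KeyError signals a
    candidate whose certificate was not renewed.
    """
    return {uuid: hashlib.sha1(new_client_certs[uuid]).hexdigest()
            for uuid in sorted(master_candidates)}
```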

Further considerations
----------------------

Watcher
~~~~~~~

The watcher is a script that is run on all nodes in regular intervals. The
changes proposed in this design will not affect the watcher's implementation,
because it behaves differently on the master than on non-master nodes.

Only on the master does it issue query calls, which require a client
certificate of a node in the candidate map. Since the candidate map always
contains the master node, this keeps working. On non-master nodes, its only
external communication is done via the ConfD protocol, which uses the hmac
key, which is present on all nodes. Besides that, the watcher does not make
any SSH connections, and thus is not affected by the changes in SSH key
handling either.


Other Keys and Daemons
~~~~~~~~~~~~~~~~~~~~~~

Ganeti handles a couple of other keys/certificates that have not been mentioned
in this design so far. Also, daemons other than the ones mentioned so far
perform intra-cluster communication. Neither the keys nor the daemons will
be affected by this design, for the following reasons:

- The hmac key used by ConfD (see :doc:`design-2.1`): the hmac key is still
  distributed to all nodes, because it was designed to be used for
  communicating with ConfD, which should be possible from all nodes.
  For example, the monitoring daemon, which runs on all nodes, uses it to
  retrieve information from ConfD. However, since communication with ConfD
  is read-only, a compromised node holding the hmac key does not enable an
  attacker to change the cluster's state.

- The WConfD daemon writes the configuration to all master candidates
  via RPC. Since it only runs on the master node, its ability to run
  RPC requests is maintained with this design.

- The RAPI SSL key and certificate and the RAPI user/password file
  ``rapi_users`` are already copied only to the master candidates (see
  :doc:`design-2.1`, Section ``Redistribute Config``).

- The SPICE certificates are still distributed to all nodes, since it should
  be possible to use SPICE to access VMs on any cluster node.

- The cluster domain secret is used for inter-cluster instance moves.
  Since instances can be moved from any normal node of the source cluster to
  any normal node of the destination cluster, the presence of this
  secret on all nodes is necessary.


Related and Future Work
~~~~~~~~~~~~~~~~~~~~~~~

There are a couple of suggestions on how to improve the SSL setup even more.
As a trade-off with respect to complexity and implementation effort, we did
not implement them yet (as of version 2.11) but describe them here for
future reference.

- All SSL certificates that Ganeti uses so far are self-signed. It would
  increase the security if they were signed by a common CA. There is
  already a design doc for a Ganeti CA which was suggested in a
  different context (related to import/export). This would also be a
  benefit for the RPC calls. See design doc :doc:`design-impexp2` for
  more information. Implementing a CA is rather complex, because it
  would also mean supporting the renewal of the CA certificate and providing
  and supporting infrastructure to revoke compromised certificates.
- An extension of the previous suggestion would be to even enable the
  system administrator to use an external CA. Especially in bigger
  setups, where an SSL infrastructure already exists, it would be useful
  if Ganeti could simply be integrated with it, rather than forcing the
  user to use the Ganeti CA.
- A lighter version of using a CA would be to use the server certificate
  to sign the client certificate instead of using self-signed
  certificates for both. The problem here is that this would make
  renewing the server certificate rather complicated, because all client
  certificates would need to be re-signed and redistributed as well,
  which leads to interesting chicken-and-egg problems when this is done
  via RPC calls.
- Ganeti RPC calls are currently done without checking if the hostname
  of the node complies with the common name of the certificate. This
  might be a desirable feature, but would increase the effort when a
  node is renamed.
- The typical use case for SSL is to have one certificate per node
  rather than one shared certificate (Ganeti's noded server certificate)
  and a client certificate. One could change the design in a way that
  only one certificate per node is used, but this would require a common
  CA so that the validity of the certificate can be established by every
  node in the cluster.
- With the proposed design, the serial numbers of the client
  certificates are set to the node UUIDs. This is technically also not
  compliant with how SSL is supposed to be used, as the serial numbers
  should reflect the enumeration of certificates created by the CA. Once
  a CA is implemented, it might be reasonable to change this
  accordingly. The implementation of the proposed design also has the
  drawback of the serial number not changing even if the certificate is
  replaced by a new one (for example when calling
  ``gnt-cluster renew-crypto``), which also does not comply with the way
  SSL was designed to be used.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: