Design for adding a node to a cluster
=====================================

.. contents:: :depth: 3


Current state and shortcomings
------------------------------

Before a node can be added to a cluster, its SSH daemon must be
re-configured to use the cluster-wide SSH host key. Ganeti 2.3.0 changed
the way this is done by moving all related code to a separate script,
``tools/setup-ssh``, using Paramiko. Before that, all such configuration
was done from ``lib/bootstrap.py`` using the system's own SSH client and
a shell script given to said client through parameters.

Both solutions controlled all actions on the connecting machine; the
newly added node was merely executing commands. This implies and
requires a tight coupling and equality between nodes (e.g. paths to
files being the same). Most of the logic and error handling is also done
on the connecting machine.

Once a node's SSH daemon has been configured, more than 25 files need to
be copied using ``scp`` before the node daemon can be started. No
verification is done before the files are copied. Once the node daemon
is started, an opcode is submitted to the master daemon, which will then
copy more files, such as the configuration and job queue for master
candidates, using RPC. This process is somewhat fragile and requires
initiating many SSH connections.

Proposed changes
----------------

SSH
~~~

The main goal is to move more logic to the newly added node. Instead of
having a relatively large script executed on the master node, most of it
is moved over to the added node.

A new script named ``prepare-node-join`` is added. It receives a JSON
data structure (defined :ref:`below <prepare-node-join-json>`) on its
standard input. Once the data has been successfully decoded, it proceeds
to configure the local node's SSH daemon and root's SSH settings, after
which the SSH daemon is restarted.

All the master node has to do to add a new node is to gather all
required data, build the data structure, and invoke the script on the
node to be added. This will enable us to once again use the system's own
SSH client and to drop the dependency on Paramiko for Ganeti itself
(``ganeti-listrunner`` is going to continue using Paramiko).

Eventually ``setup-ssh`` can be removed.


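The master-side steps for the SSH setup (gather the data, build the
structure, invoke the script through the system's SSH client) could be
sketched roughly as follows. This is only an illustration under
assumptions, not Ganeti's implementation: the function names, the
remote script path in ``REMOTE_SCRIPT`` and the chosen SSH option are
hypothetical. Only the JSON keys are taken from the structure defined
in this document.

```python
import json
import subprocess

# Hypothetical install location of the script on the node being added.
REMOTE_SCRIPT = "/usr/lib/ganeti/prepare-node-join"


def build_join_data(cluster_name, node_cert_pem=None,
                    ssh_host_key=None, ssh_root_key=None):
    """Build the JSON-serializable object sent to prepare-node-join."""
    data = {"cluster_name": cluster_name}  # the only required key
    optional = {
        "node_daemon_certificate": node_cert_pem,
        "ssh_host_key": ssh_host_key,
        "ssh_root_key": ssh_root_key,
    }
    # Optional entries are left out entirely when no value was gathered.
    data.update((key, value) for key, value in optional.items()
                if value is not None)
    return data


def build_ssh_command(node_name, remote_script=REMOTE_SCRIPT):
    """Return the argument vector for the system's own SSH client."""
    # BatchMode avoids interactive prompts; an assumption for this sketch.
    return ["ssh", "-o", "BatchMode=yes", "root@" + node_name, remote_script]


def run_prepare_node_join(node_name, data, runner=subprocess.run):
    """Feed the data to the remote script's standard input in one SSH call."""
    result = runner(build_ssh_command(node_name),
                    input=json.dumps(data).encode("utf-8"), check=False)
    return result.returncode == 0
```

Passing the ``runner`` as a parameter keeps the sketch testable without
a real SSH connection; a caller would normally leave it at its default.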

Node daemon
~~~~~~~~~~~

Similar to the SSH setup changes, the process of copying files and
starting the node daemon will be moved into a dedicated program. On its
standard input it will receive a standardized JSON structure (defined
:ref:`below <node-daemon-setup-json>`). Once the input data has been
successfully decoded and the received values have been verified for
sanity, the program proceeds to write the values to files and then
starts the node daemon (``ganeti-noded``).

To add a new node to the cluster, the master node will have to gather
all values, build the data structure, and then invoke the newly added
``node-daemon-setup`` program via SSH. In this way only a single SSH
connection is needed and the values can be verified before being written
to files.

If the program exits successfully, the node is ready to be added to the
master daemon's configuration. The node daemon will be running, but
``OpNodeAdd`` needs to be run before it becomes a full node. The opcode
will copy more files, such as the :doc:`RAPI certificate <rapi>`.


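The order of operations on the node side (decode, verify, write files,
start the daemon) could look roughly like the sketch below. It is not
the actual ``node-daemon-setup`` program: the function names, the
``ssconf_<name>`` file naming, the ``local_cluster_name`` argument and
the ``start_daemon`` callback are assumptions made for illustration,
and certificate handling is left out.

```python
import json
import os
import tempfile


def write_ssconf(directory, ssconf):
    """Write each ssconf value to its own file, replacing old files atomically."""
    for name, value in sorted(ssconf.items()):
        # "ssconf_<name>" is an assumed file naming scheme for this sketch.
        target = os.path.join(directory, "ssconf_" + name)
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as handle:
            handle.write(value + "\n")
        os.replace(tmp_path, target)


def setup_node_daemon(stream, directory, start_daemon, local_cluster_name=None):
    """Decode the input, verify it, write files, then optionally start the daemon."""
    data = json.load(stream)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    cluster_name = data.get("cluster_name")
    if not isinstance(cluster_name, str) or not cluster_name:
        raise ValueError("cluster_name is required")
    # Abort before touching any file if the node belongs to another cluster.
    if local_cluster_name not in (None, cluster_name):
        raise ValueError("node already belongs to %s" % local_cluster_name)

    ssconf = data.get("ssconf", {})
    if not (isinstance(ssconf, dict)
            and all(isinstance(value, str) for value in ssconf.values())):
        raise ValueError("ssconf must map names to strings")

    # Nothing is written until every check above has passed.
    write_ssconf(directory, ssconf)
    if data.get("start_node_daemon", False):
        start_daemon()
```

Because all checks run before the first file is written, a rejected
request leaves the node unchanged, which is the property the single-SSH
design is after.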

Data structures
---------------

.. _prepare-node-join-json:

JSON structure for SSH setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The data is given in an object containing the keys described below.
Unless specified otherwise, all entries are optional.

``cluster_name``
  Required string with the cluster name. If a local cluster name is
  found, the join process is aborted unless the passed cluster name
  matches the local name.
``node_daemon_certificate``
  Public part of cluster's node daemon certificate in PEM format. If a
  local node certificate and key are found, the join process is aborted
  unless this passed public part can be verified with the local key.
``ssh_host_key``
  List containing public and private parts of SSH host key. See below
  for definition.
``ssh_root_key``
  List containing public and private parts of root's key for SSH
  authorization. See below for definition.

Lists of SSH keys use a tuple with three values. The first describes the
key variant (``rsa`` or ``dsa``). The second and third are the private
and public part of the key. Example:

.. highlight:: javascript

::

  [
    ("rsa", "-----BEGIN RSA PRIVATE KEY-----...", "ssh-rsa AAAA..."),
    ("dsa", "-----BEGIN DSA PRIVATE KEY-----...", "ssh-dss AAAA..."),
  ]


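A receiving script might check and unpack such a key list as sketched
below. JSON has no tuple type, so the sketch assumes each entry arrives
as a three-element array. The function name ``parse_ssh_keys`` and the
rejection of duplicate variants are assumptions of this example, not
part of the design.

```python
import json

VALID_KEY_VARIANTS = frozenset(["rsa", "dsa"])


def parse_ssh_keys(entries):
    """Turn a list of [variant, private, public] entries into a dict."""
    keys = {}
    for entry in entries:
        if not isinstance(entry, (list, tuple)) or len(entry) != 3:
            raise ValueError("expected three values per key entry")
        if not all(isinstance(value, str) for value in entry):
            raise ValueError("key entries must contain only strings")
        variant, private, public = entry
        if variant not in VALID_KEY_VARIANTS:
            raise ValueError("unknown key variant: %r" % (variant,))
        if variant in keys:
            # Assumption: at most one key per variant is meaningful.
            raise ValueError("duplicate key variant: %r" % (variant,))
        keys[variant] = (private, public)
    return keys


raw = """
{
  "cluster_name": "cluster.example.com",
  "ssh_host_key": [
    ["rsa", "-----BEGIN RSA PRIVATE KEY-----...", "ssh-rsa AAAA..."],
    ["dsa", "-----BEGIN DSA PRIVATE KEY-----...", "ssh-dss AAAA..."]
  ]
}
"""
data = json.loads(raw)
# The key is optional, so a missing entry is treated as an empty list.
host_keys = parse_ssh_keys(data.get("ssh_host_key", []))
print(sorted(host_keys))  # prints ['dsa', 'rsa']
```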

.. _node-daemon-setup-json:

JSON structure for node daemon setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The data is given in an object containing the keys described below.
Unless specified otherwise, all entries are optional.

``cluster_name``
  Required string with the cluster name. If a local cluster name is
  found, the join process is aborted unless the passed cluster name
  matches the local name. The cluster name is also included in the
  dictionary given via the ``ssconf`` entry.
``node_daemon_certificate``
  Public and private parts of cluster's node daemon certificate in PEM
  format. If a local node certificate is found, the process is aborted
  unless it matches.
``ssconf``
  Dictionary with ssconf names and their values. Both are strings.
  Example:

  .. highlight:: javascript

  ::

    {
      "cluster_name": "cluster.example.com",
      "master_ip": "192.168.2.1",
      "master_netdev": "br0",
      # …
    }

``start_node_daemon``
  Boolean denoting whether the node daemon should be started (or
  restarted if it was running for some reason).

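On the master side, assembling this object might look like the sketch
below. The function name ``build_node_daemon_setup_data`` and its
parameters are hypothetical. The sketch only enforces what the key
descriptions above state: string values in ``ssconf``, and a cluster
name that also appears in ``ssconf``.

```python
import json


def build_node_daemon_setup_data(cluster_name, ssconf, certificate_pem=None,
                                 start_node_daemon=True):
    """Assemble the object passed to node-daemon-setup on standard input."""
    if not isinstance(cluster_name, str) or not cluster_name:
        raise ValueError("cluster_name must be a non-empty string")
    for name, value in ssconf.items():
        if not (isinstance(name, str) and isinstance(value, str)):
            raise ValueError("ssconf names and values must be strings")
    # The cluster name is also carried inside ssconf; refuse conflicting input.
    if ssconf.get("cluster_name", cluster_name) != cluster_name:
        raise ValueError("ssconf disagrees with cluster_name")

    data = {
        "cluster_name": cluster_name,
        "ssconf": dict(ssconf, cluster_name=cluster_name),
        "start_node_daemon": bool(start_node_daemon),
    }
    # The certificate is optional and omitted when not provided.
    if certificate_pem is not None:
        data["node_daemon_certificate"] = certificate_pem
    return data


payload = build_node_daemon_setup_data(
    "cluster.example.com",
    {"master_ip": "192.168.2.1", "master_netdev": "br0"},
)
print(json.dumps(payload, indent=2, sort_keys=True))
```

Rejecting a conflicting ``ssconf`` entry on the sending side is a design
choice of this sketch; the receiving program would still have to run its
own checks.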

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: