Statistics
| Branch: | Tag: | Revision:

root / doc / design-daemons.rst @ 06c2fb4a

History | View | Annotate | Download (13.8 kB)

1 ffedf64d Michele Tartara
==========================
2 ffedf64d Michele Tartara
Ganeti daemons refactoring
3 ffedf64d Michele Tartara
==========================
4 ffedf64d Michele Tartara
5 ffedf64d Michele Tartara
.. contents:: :depth: 2
6 ffedf64d Michele Tartara
7 ffedf64d Michele Tartara
This is a design document detailing the plan for refactoring the internal
8 ffedf64d Michele Tartara
structure of Ganeti, and particularly the set of daemons it is divided into.
9 ffedf64d Michele Tartara
10 ffedf64d Michele Tartara
11 ffedf64d Michele Tartara
Current state and shortcomings
12 ffedf64d Michele Tartara
==============================
13 ffedf64d Michele Tartara
14 ffedf64d Michele Tartara
Ganeti is comprised of a growing number of daemons, each dealing with part of
15 ffedf64d Michele Tartara
the tasks the cluster has to face, and communicating with the other daemons
16 ffedf64d Michele Tartara
using a variety of protocols.
17 ffedf64d Michele Tartara
18 ffedf64d Michele Tartara
Specifically, as of Ganeti 2.8, the situation is as follows:
19 ffedf64d Michele Tartara
20 ffedf64d Michele Tartara
``Master daemon (MasterD)``
21 ffedf64d Michele Tartara
  It is responsible for managing the entire cluster, and it's written in Python.
22 ffedf64d Michele Tartara
  It is executed on a single node (the master node). It receives the commands
23 ffedf64d Michele Tartara
  given by the cluster administrator (through the remote API daemon or the
24 ffedf64d Michele Tartara
  command line tools) over the LUXI protocol.  The master daemon is responsible
25 ffedf64d Michele Tartara
  for creating and managing the jobs that will execute such commands, and for
26 ffedf64d Michele Tartara
  managing the locks that ensure the cluster will not incur in race conditions.
27 ffedf64d Michele Tartara
28 ffedf64d Michele Tartara
  Each job is managed by a separate Python thread, that interacts with the node
29 ffedf64d Michele Tartara
  daemons via RPC calls.
30 ffedf64d Michele Tartara
31 ffedf64d Michele Tartara
  The master daemon is also responsible for managing the configuration of the
32 ffedf64d Michele Tartara
  cluster, changing it when required by some job. It is also responsible for
33 ffedf64d Michele Tartara
  copying the configuration to the other master candidates after updating it.
34 ffedf64d Michele Tartara
35 ffedf64d Michele Tartara
``RAPI daemon (RapiD)``
36 ffedf64d Michele Tartara
  It is written in Python and runs on the master node only. It waits for
37 ffedf64d Michele Tartara
  requests issued remotely through the remote API protocol. Then, it forwards
38 ffedf64d Michele Tartara
  them, using the LUXI protocol, to the master daemon (if they are commands) or
39 ffedf64d Michele Tartara
  to the query daemon if they are queries about the configuration (including
40 ffedf64d Michele Tartara
  live status) of the cluster.
41 ffedf64d Michele Tartara
42 ffedf64d Michele Tartara
``Node daemon (NodeD)``
43 ffedf64d Michele Tartara
  It is written in Python. It runs on all the nodes. It is responsible for
44 ffedf64d Michele Tartara
  receiving the master requests over RPC and execute them, using the appropriate
45 ffedf64d Michele Tartara
  backend (hypervisors, DRBD, LVM, etc.). It also receives requests over RPC for
46 ffedf64d Michele Tartara
  the execution of queries gathering live data on behalf of the query daemon.
47 ffedf64d Michele Tartara
48 ffedf64d Michele Tartara
``Configuration daemon (ConfD)``
49 ffedf64d Michele Tartara
  It is written in Haskell. It runs on all the master candidates. Since the
50 ffedf64d Michele Tartara
  configuration is replicated only on the master node, this daemon exists in
51 ffedf64d Michele Tartara
  order to provide information about the configuration to nodes needing them.
52 ffedf64d Michele Tartara
  The requests are done through ConfD's own protocol, HMAC signed,
53 ffedf64d Michele Tartara
  implemented over UDP, and meant to be used by parallely querying all the
54 ffedf64d Michele Tartara
  master candidates (or a subset thereof) and getting the most up to date
55 ffedf64d Michele Tartara
  answer. This is meant as a way to provide a robust service even in case master
56 ffedf64d Michele Tartara
  is temporarily unavailable.
57 ffedf64d Michele Tartara
58 ffedf64d Michele Tartara
``Query daemon (QueryD)``
59 ffedf64d Michele Tartara
  It is written in Haskell. It runs on all the master candidates. It replies
60 ffedf64d Michele Tartara
  to Luxi queries about the current status of the system, including live data it
61 ffedf64d Michele Tartara
  obtains by querying the node daemons through RPCs.
62 ffedf64d Michele Tartara
63 ffedf64d Michele Tartara
``Monitoring daemon (MonD)``
64 ffedf64d Michele Tartara
  It is written in Haskell. It runs on all nodes, including the ones that are
65 ffedf64d Michele Tartara
  not vm-capable. It is meant to provide information on the status of the
66 ffedf64d Michele Tartara
  system. Such information is related only to the specific node the daemon is
67 ffedf64d Michele Tartara
  running on, and it is provided as JSON encoded data over HTTP, to be easily
68 ffedf64d Michele Tartara
  readable by external tools.
69 ffedf64d Michele Tartara
  The monitoring daemon communicates with ConfD to get information about the
70 ffedf64d Michele Tartara
  configuration of the cluster. The choice of communicating with ConfD instead
71 ffedf64d Michele Tartara
  of MasterD allows it to obtain configuration information even when the cluster
72 ffedf64d Michele Tartara
  is heavily degraded (e.g.: when master and some, but not all, of the master
73 ffedf64d Michele Tartara
  candidates are unreachable).
74 ffedf64d Michele Tartara
75 ffedf64d Michele Tartara
The current structure of the Ganeti daemons is inefficient because there are
76 ffedf64d Michele Tartara
many different protocols involved, and each daemon needs to be able to use
77 ffedf64d Michele Tartara
multiple ones, and has to deal with doing different things, thus making
78 ffedf64d Michele Tartara
sometimes unclear which daemon is responsible for performing a specific task.
79 ffedf64d Michele Tartara
80 ffedf64d Michele Tartara
Also, with the current configuration, jobs are managed by the master daemon
81 ffedf64d Michele Tartara
using python threads. This makes terminating a job after it has started a
82 ffedf64d Michele Tartara
difficult operation, and it is the main reason why this is not possible yet.
83 ffedf64d Michele Tartara
84 ffedf64d Michele Tartara
The master daemon currently has too many different tasks, that could be handled
85 ffedf64d Michele Tartara
better if split among different daemons.
86 ffedf64d Michele Tartara
87 ffedf64d Michele Tartara
88 ffedf64d Michele Tartara
Proposed changes
89 ffedf64d Michele Tartara
================
90 ffedf64d Michele Tartara
91 ffedf64d Michele Tartara
In order to improve on the current situation, a new daemon subdivision is
92 ffedf64d Michele Tartara
proposed, and presented hereafter.
93 ffedf64d Michele Tartara
94 ffedf64d Michele Tartara
.. digraph:: "new-daemons-structure"
95 ffedf64d Michele Tartara
96 ffedf64d Michele Tartara
  {rank=same; RConfD LuxiD;}
97 ffedf64d Michele Tartara
  {rank=same; Jobs rconfigdata;}
98 ffedf64d Michele Tartara
  node [shape=box]
99 ffedf64d Michele Tartara
  RapiD [label="RapiD [M]"]
100 ffedf64d Michele Tartara
  LuxiD [label="LuxiD [M]"]
101 ffedf64d Michele Tartara
  WConfD [label="WConfD [M]"]
102 ffedf64d Michele Tartara
  Jobs [label="Jobs [M]"]
103 ffedf64d Michele Tartara
  RConfD [label="RConfD [MC]"]
104 ffedf64d Michele Tartara
  MonD [label="MonD [All]"]
105 ffedf64d Michele Tartara
  NodeD [label="NodeD [All]"]
106 ffedf64d Michele Tartara
  Clients [label="gnt-*\nclients [M]"]
107 ffedf64d Michele Tartara
  p1 [shape=none, label=""]
108 ffedf64d Michele Tartara
  p2 [shape=none, label=""]
109 ffedf64d Michele Tartara
  p3 [shape=none, label=""]
110 ffedf64d Michele Tartara
  p4 [shape=none, label=""]
111 ffedf64d Michele Tartara
  configdata [shape=none, label="config.data"]
112 ffedf64d Michele Tartara
  rconfigdata [shape=none, label="config.data\n[MC copy]"]
113 ffedf64d Michele Tartara
  locksdata [shape=none, label="locks.data"]
114 ffedf64d Michele Tartara
115 ffedf64d Michele Tartara
  RapiD -> LuxiD [label="LUXI"]
116 ffedf64d Michele Tartara
  LuxiD -> WConfD [label="WConfD\nproto"]
117 ffedf64d Michele Tartara
  LuxiD -> Jobs [label="fork/exec"]
118 ffedf64d Michele Tartara
  Jobs -> WConfD [label="WConfD\nproto"]
119 ffedf64d Michele Tartara
  Jobs -> NodeD [label="RPC"]
120 ffedf64d Michele Tartara
  LuxiD -> NodeD [label="RPC"]
121 ffedf64d Michele Tartara
  rconfigdata -> RConfD
122 ffedf64d Michele Tartara
  configdata -> rconfigdata [label="sync via\nNodeD RPC"]
123 ffedf64d Michele Tartara
  WConfD -> NodeD [label="RPC"]
124 ffedf64d Michele Tartara
  WConfD -> configdata
125 ffedf64d Michele Tartara
  WConfD -> locksdata
126 ffedf64d Michele Tartara
  MonD -> RConfD [label="RConfD\nproto"]
127 ffedf64d Michele Tartara
  Clients -> LuxiD [label="LUXI"]
128 ffedf64d Michele Tartara
  p1 -> MonD [label="MonD proto"]
129 ffedf64d Michele Tartara
  p2 -> RapiD [label="RAPI"]
130 ffedf64d Michele Tartara
  p3 -> RConfD [label="RConfD\nproto"]
131 ffedf64d Michele Tartara
  p4 -> Clients [label="CLI"]
132 ffedf64d Michele Tartara
133 ffedf64d Michele Tartara
``LUXI daemon (LuxiD)``
134 ffedf64d Michele Tartara
  It will be written in Haskell. It will run on the master node and it will be
135 ffedf64d Michele Tartara
  the only LUXI server, replying to all the LUXI queries. These includes both
136 ffedf64d Michele Tartara
  the queries about the live configuration of the cluster, previously served by
137 ffedf64d Michele Tartara
  QueryD, and the commands actually changing the status of the cluster by
138 ffedf64d Michele Tartara
  submitting jobs. Therefore, this daemon will also be the one responsible with
139 ffedf64d Michele Tartara
  managing the job queue. When a job needs to be executed, the LuxiD will spawn
140 ffedf64d Michele Tartara
  a separate process tasked with the execution of that specific job, thus making
141 ffedf64d Michele Tartara
  it easier to terminate the job itself, if needeed.  When a job requires locks,
142 ffedf64d Michele Tartara
  LuxiD will request them from WConfD.
143 ffedf64d Michele Tartara
  In order to keep availability of the cluster in case of failure of the master
144 ffedf64d Michele Tartara
  node, LuxiD will replicate the job queue to the other master candidates, by
145 ffedf64d Michele Tartara
  RPCs to the NodeD running there (the choice of RPCs for this task might be
146 ffedf64d Michele Tartara
  reviewed at a second time, after implementing this design).
147 ffedf64d Michele Tartara
148 ffedf64d Michele Tartara
``Configuration management daemon (WConfD)``
149 ffedf64d Michele Tartara
  It will run on the master node and it will be responsible for the management
150 ffedf64d Michele Tartara
  of the authoritative copy of the cluster configuration (that is, it will be
151 ffedf64d Michele Tartara
  the daemon actually modifying the ``config.data`` file). All the requests of
152 ffedf64d Michele Tartara
  configuration changes will have to pass through this daemon, and will be
153 ffedf64d Michele Tartara
  performed using a LUXI-like protocol ("WConfD proto" in the graph. The exact
154 ffedf64d Michele Tartara
  protocol will be defined in the separate design document that will detail the
155 ffedf64d Michele Tartara
  WConfD separation).  Having a single point of configuration management will
156 ffedf64d Michele Tartara
  also allow Ganeti to get rid of possible race conditions due to concurrent
157 ffedf64d Michele Tartara
  modifications of the configuration.  When the configuration is updated, it
158 ffedf64d Michele Tartara
  will have to push the received changes to the other master candidates, via
159 ffedf64d Michele Tartara
  RPCs, so that RConfD daemons and (in case of a failure on the master node)
160 ffedf64d Michele Tartara
  the WConfD daemon on the new master can access an up-to-date version of it
161 ffedf64d Michele Tartara
  (the choice of RPCs for this task might be reviewed at a second time). This
162 ffedf64d Michele Tartara
  daemon will also be the one responsible for managing the locks, granting them
163 ffedf64d Michele Tartara
  to the jobs requesting them, and taking care of freeing them up if the jobs
164 ffedf64d Michele Tartara
  holding them crash or are terminated before releasing them.  In order to do
165 ffedf64d Michele Tartara
  this, each job, after being spawned by LuxiD, will open a local unix socket
166 ffedf64d Michele Tartara
  that will be used to communicate with it, and will be destroyed when the job
167 ffedf64d Michele Tartara
  terminates.  LuxiD will be able to check, after a timeout, whether the job is
168 ffedf64d Michele Tartara
  still running by connecting here, and to ask WConfD to forcefully remove the
169 ffedf64d Michele Tartara
  locks if the socket is closed.
170 ffedf64d Michele Tartara
  Also, WConfD should hold a serialized list of the locks and their owners in a
171 ffedf64d Michele Tartara
  file (``locks.data``), so that it can keep track of their status in case it
172 ffedf64d Michele Tartara
  crashes and needs to be restarted (by asking LuxiD which of them are still
173 ffedf64d Michele Tartara
  running).
174 ffedf64d Michele Tartara
  Interaction with this daemon will be performed using Unix sockets.
175 ffedf64d Michele Tartara
176 ffedf64d Michele Tartara
``Configuration query daemon (RConfD)``
177 ffedf64d Michele Tartara
  It is written in Haskell, and it corresponds to the old ConfD. It will run on
178 ffedf64d Michele Tartara
  all the master candidates and it will serve information about the the static
179 ffedf64d Michele Tartara
  configuration of the cluster (the one contained in ``config.data``). The
180 ffedf64d Michele Tartara
  provided information will be highly available (as in: a response will be
181 ffedf64d Michele Tartara
  available as long as a stable-enough connection between the client and at
182 ffedf64d Michele Tartara
  least one working master candidate is available) and its freshness will be
183 ffedf64d Michele Tartara
  best effort (the most recent reply from any of the master candidates will be
184 ffedf64d Michele Tartara
  returned, but it might still be older than the one available through WConfD).
185 ffedf64d Michele Tartara
  The information will be served through the ConfD protocol.
186 ffedf64d Michele Tartara
187 ffedf64d Michele Tartara
``Rapi daemon (RapiD)``
188 ffedf64d Michele Tartara
  It remains basically unchanged, with the only difference that all of its LUXI
189 ffedf64d Michele Tartara
  query are directed towards LuxiD instead of being split between MasterD and
190 ffedf64d Michele Tartara
  QueryD.
191 ffedf64d Michele Tartara
192 ffedf64d Michele Tartara
``Monitoring daemon (MonD)``
193 ffedf64d Michele Tartara
  It remains unaffected by the changes in this design document. It will just get
194 ffedf64d Michele Tartara
  some of the data it needs from RConfD instead of the old ConfD, but the
195 ffedf64d Michele Tartara
  interfaces of the two are identical.
196 ffedf64d Michele Tartara
197 ffedf64d Michele Tartara
``Node daemon (NodeD)``
198 ffedf64d Michele Tartara
  It remains unaffected by the changes proposed in the design document. The only
199 ffedf64d Michele Tartara
  difference being that it will receive its RPCs from LuxiD (for job queue
200 ffedf64d Michele Tartara
  replication), from WConfD (for configuration replication) and for the
201 ffedf64d Michele Tartara
  processes executing single jobs (for all the operations to be performed by
202 ffedf64d Michele Tartara
  nodes) instead of receiving them just from MasterD.
203 ffedf64d Michele Tartara
204 ffedf64d Michele Tartara
This restructuring will allow us to reorganize and improve the codebase,
205 ffedf64d Michele Tartara
introducing cleaner interfaces and giving well defined and more restricted tasks
206 ffedf64d Michele Tartara
to each daemon.
207 ffedf64d Michele Tartara
208 ffedf64d Michele Tartara
Furthermore, having more well-defined interfaces will allow us to have easier
209 ffedf64d Michele Tartara
upgrade procedures, and to work towards the possibility of upgrading single
210 ffedf64d Michele Tartara
components of a cluster one at a time, without the need for immediately
211 ffedf64d Michele Tartara
upgrading the entire cluster in a single step.
212 ffedf64d Michele Tartara
213 ffedf64d Michele Tartara
214 ffedf64d Michele Tartara
Implementation
215 ffedf64d Michele Tartara
==============
216 ffedf64d Michele Tartara
217 ffedf64d Michele Tartara
While performing this refactoring, we aim to increase the amount of
218 ffedf64d Michele Tartara
Haskell code, thus benefiting from the additional type safety provided by its
219 ffedf64d Michele Tartara
wide compile-time checks. In particular, all the job queue management and the
220 ffedf64d Michele Tartara
configuration management daemon will be written in Haskell, taking over the role
221 ffedf64d Michele Tartara
currently fulfilled by Python code executed as part of MasterD.
222 ffedf64d Michele Tartara
223 ffedf64d Michele Tartara
The changes describe by this design document are quite extensive, therefore they
224 ffedf64d Michele Tartara
will not be implemented all at the same time, but through a sequence of steps,
225 ffedf64d Michele Tartara
leaving the codebase in a consistent and usable state.
226 ffedf64d Michele Tartara
227 ffedf64d Michele Tartara
#. Rename QueryD to LuxiD.
228 ffedf64d Michele Tartara
   A part of LuxiD, the one replying to configuration
229 ffedf64d Michele Tartara
   queries including live information about the system, already exists in the
230 ffedf64d Michele Tartara
   form of QueryD. This is being renamed to LuxiD, and will form the first part
231 ffedf64d Michele Tartara
   of the new daemon. NB: this is happening starting from Ganeti 2.8. At the
232 ffedf64d Michele Tartara
   beginning, only the already existing queries will be replied to by LuxiD.
233 ffedf64d Michele Tartara
   More queries will be implemented in the next versions.
234 ffedf64d Michele Tartara
235 ffedf64d Michele Tartara
#. Let LuxiD be the interface for the queries and MasterD be their executor.
236 ffedf64d Michele Tartara
   Currently, MasterD is the only responsible for receiving and executing LUXI
237 ffedf64d Michele Tartara
   queries, and for managing the jobs they create.
238 ffedf64d Michele Tartara
   Receiving the queries and managing the job queue will be extracted from
239 ffedf64d Michele Tartara
   MasterD into LuxiD.
240 ffedf64d Michele Tartara
   Actually executing jobs will still be done by MasterD, that contains all the
241 ffedf64d Michele Tartara
   logic for doing that and for properly managing locks and the configuration.
242 ffedf64d Michele Tartara
   A separate design document will detail how the system will decide which jobs
243 ffedf64d Michele Tartara
   to send over for execution, and how to rate-limit them.
244 ffedf64d Michele Tartara
245 ffedf64d Michele Tartara
#. Extract WConfD from MasterD.
246 ffedf64d Michele Tartara
   The logic for managing the configuration file is factored out to the
247 ffedf64d Michele Tartara
   dedicated WConfD daemon. All configuration changes, currently executed
248 ffedf64d Michele Tartara
   directly by MasterD, will be changed to be IPC requests sent to the new
249 ffedf64d Michele Tartara
   daemon.
250 ffedf64d Michele Tartara
251 ffedf64d Michele Tartara
#. Extract locking management from MasterD.
252 ffedf64d Michele Tartara
   The logic for managing and granting locks is extracted to WConfD as well.
253 ffedf64d Michele Tartara
   Locks will not be taken directly anymore, but asked via IPC to WConfD.
254 ffedf64d Michele Tartara
   This step can be executed on its own or at the same time as the previous one.
255 ffedf64d Michele Tartara
256 ffedf64d Michele Tartara
#. Jobs are executed as processes.
257 ffedf64d Michele Tartara
   The logic for running jobs is rewritten so that each job can be managed by an
258 ffedf64d Michele Tartara
   independent process. LuxiD will spawn a new (Python) process for every single
259 ffedf64d Michele Tartara
   job. The RPCs will remain unchanged, and the LU code will stay as is as much
260 ffedf64d Michele Tartara
   as possible.
261 ffedf64d Michele Tartara
   MasterD will cease to exist as a deamon on its own at this point, but not
262 ffedf64d Michele Tartara
   before.
263 ffedf64d Michele Tartara
264 ffedf64d Michele Tartara
Further considerations
265 ffedf64d Michele Tartara
======================
266 ffedf64d Michele Tartara
267 ffedf64d Michele Tartara
There is a possibility that a job will finish performing its task while LuxiD
268 ffedf64d Michele Tartara
and/or WConfD will not be available.
269 ffedf64d Michele Tartara
In order to deal with this situation, each job will write the results of its
270 ffedf64d Michele Tartara
execution on a file. The name of this file will be known to LuxiD before
271 ffedf64d Michele Tartara
starting the job, and will be stored together with the job ID, and the
272 ffedf64d Michele Tartara
name of the job-unique socket.
273 ffedf64d Michele Tartara
274 ffedf64d Michele Tartara
The job, upon ending its execution, will signal LuxiD (through the socket), so
275 ffedf64d Michele Tartara
that it can read the result of the execution and release the locks as needed.
276 ffedf64d Michele Tartara
277 ffedf64d Michele Tartara
In case LuxiD is not available at that time, the job will just terminate without
278 ffedf64d Michele Tartara
signalling it, and writing the results on file as usual. When a new LuxiD
279 ffedf64d Michele Tartara
becomes available, it will have the most up-to-date list of running jobs
280 ffedf64d Michele Tartara
(received via replication from the former LuxiD), and go through it, cleaning up
281 ffedf64d Michele Tartara
all the terminated jobs.
282 ffedf64d Michele Tartara
283 ffedf64d Michele Tartara
284 ffedf64d Michele Tartara
.. vim: set textwidth=72 :
285 ffedf64d Michele Tartara
.. Local Variables:
286 ffedf64d Michele Tartara
.. mode: rst
287 ffedf64d Michele Tartara
.. fill-column: 72
288 ffedf64d Michele Tartara
.. End: