==========================
Ganeti daemons refactoring
==========================

.. contents:: :depth: 2

This is a design document detailing the plan for refactoring the internal
structure of Ganeti, and particularly the set of daemons it is divided into.


Current state and shortcomings
==============================

Ganeti consists of a growing number of daemons, each dealing with part of
the tasks the cluster has to face, and communicating with the other daemons
using a variety of protocols.

Specifically, as of Ganeti 2.8, the situation is as follows:

``Master daemon (MasterD)``
  It is responsible for managing the entire cluster, and it is written in
  Python. It is executed on a single node (the master node). It receives the
  commands given by the cluster administrator (through the remote API daemon
  or the command line tools) over the LUXI protocol. The master daemon is
  responsible for creating and managing the jobs that will execute such
  commands, and for managing the locks that protect the cluster from race
  conditions.

  Each job is managed by a separate Python thread that interacts with the node
  daemons via RPC calls.

  The master daemon is also responsible for managing the configuration of the
  cluster, changing it when required by some job. It is also responsible for
  copying the configuration to the other master candidates after updating it.

``RAPI daemon (RapiD)``
  It is written in Python and runs on the master node only. It waits for
  requests issued remotely through the remote API protocol. Then, it forwards
  them, using the LUXI protocol, to the master daemon (if they are commands) or
  to the query daemon if they are queries about the configuration (including
  live status) of the cluster.

``Node daemon (NodeD)``
  It is written in Python. It runs on all the nodes. It is responsible for
  receiving the master requests over RPC and executing them, using the
  appropriate backend (hypervisors, DRBD, LVM, etc.). It also receives requests
  over RPC for the execution of queries gathering live data on behalf of the
  query daemon.

``Configuration daemon (ConfD)``
  It is written in Haskell. It runs on all the master candidates. Since the
  configuration is replicated only on the master candidates, this daemon
  exists in order to provide information about the configuration to the nodes
  needing it. The requests are made through ConfD's own protocol, which is
  HMAC signed and implemented over UDP. It is meant to be used by querying all
  the master candidates (or a subset thereof) in parallel and taking the most
  up-to-date answer. This is meant as a way to provide a robust service even
  when the master is temporarily unavailable.

``Query daemon (QueryD)``
  It is written in Haskell. It runs on all the master candidates. It replies
  to LUXI queries about the current status of the system, including live data
  it obtains by querying the node daemons through RPCs.

``Monitoring daemon (MonD)``
  It is written in Haskell. It runs on all nodes, including the ones that are
  not vm-capable. It is meant to provide information on the status of the
  system. Such information is related only to the specific node the daemon is
  running on, and it is provided as JSON encoded data over HTTP, to be easily
  readable by external tools.
  The monitoring daemon communicates with ConfD to get information about the
  configuration of the cluster. The choice of communicating with ConfD instead
  of MasterD allows it to obtain configuration information even when the
  cluster is heavily degraded (e.g. when the master and some, but not all, of
  the master candidates are unreachable).

The current structure of the Ganeti daemons is inefficient because there are
many different protocols involved, and each daemon needs to be able to use
several of them and has to perform many different tasks. This sometimes makes
it unclear which daemon is responsible for performing a specific task.

Also, with the current configuration, jobs are managed by the master daemon
using Python threads. This makes terminating a job after it has started a
difficult operation, and it is the main reason why this is not possible yet.

The master daemon currently has too many different tasks, which could be
handled better if split among different daemons.


Proposed changes
================

In order to improve on the current situation, a new daemon subdivision is
proposed, and presented hereafter.

.. digraph:: "new-daemons-structure"

  {rank=same; RConfD LuxiD;}
  {rank=same; Jobs rconfigdata;}
  node [shape=box]
  RapiD [label="RapiD [M]"]
  LuxiD [label="LuxiD [M]"]
  WConfD [label="WConfD [M]"]
  Jobs [label="Jobs [M]"]
  RConfD [label="RConfD [MC]"]
  MonD [label="MonD [All]"]
  NodeD [label="NodeD [All]"]
  Clients [label="gnt-*\nclients [M]"]
  p1 [shape=none, label=""]
  p2 [shape=none, label=""]
  p3 [shape=none, label=""]
  p4 [shape=none, label=""]
  configdata [shape=none, label="config.data"]
  rconfigdata [shape=none, label="config.data\n[MC copy]"]
  locksdata [shape=none, label="locks.data"]

  RapiD -> LuxiD [label="LUXI"]
  LuxiD -> WConfD [label="WConfD\nproto"]
  LuxiD -> Jobs [label="fork/exec"]
  Jobs -> WConfD [label="WConfD\nproto"]
  Jobs -> NodeD [label="RPC"]
  LuxiD -> NodeD [label="RPC"]
  rconfigdata -> RConfD
  configdata -> rconfigdata [label="sync via\nNodeD RPC"]
  WConfD -> NodeD [label="RPC"]
  WConfD -> configdata
  WConfD -> locksdata
  MonD -> RConfD [label="RConfD\nproto"]
  Clients -> LuxiD [label="LUXI"]
  p1 -> MonD [label="MonD proto"]
  p2 -> RapiD [label="RAPI"]
  p3 -> RConfD [label="RConfD\nproto"]
  p4 -> Clients [label="CLI"]

``LUXI daemon (LuxiD)``
  It will be written in Haskell. It will run on the master node and it will be
  the only LUXI server, replying to all the LUXI queries. These include both
  the queries about the live configuration of the cluster, previously served by
  QueryD, and the commands actually changing the status of the cluster by
  submitting jobs. Therefore, this daemon will also be the one responsible for
  managing the job queue. When a job needs to be executed, LuxiD will spawn
  a separate process tasked with the execution of that specific job, thus
  making it easier to terminate the job itself, if needed. When a job requires
  locks, LuxiD will request them from WConfD.
  In order to keep the cluster available in case of failure of the master
  node, LuxiD will replicate the job queue to the other master candidates, by
  RPCs to the NodeD running there (the choice of RPCs for this task might be
  revisited later, after implementing this design).

``Configuration management daemon (WConfD)``
  It will run on the master node and it will be responsible for the management
  of the authoritative copy of the cluster configuration (that is, it will be
  the daemon actually modifying the ``config.data`` file). All the requests for
  configuration changes will have to pass through this daemon, and will be
  performed using a LUXI-like protocol ("WConfD proto" in the graph; the exact
  protocol will be defined in the separate design document that will detail the
  WConfD separation). Having a single point of configuration management will
  also allow Ganeti to get rid of possible race conditions due to concurrent
  modifications of the configuration. When the configuration is updated,
  WConfD will have to push the received changes to the other master candidates,
  via RPCs, so that the RConfD daemons and (in case of a failure on the master
  node) the WConfD daemon on the new master can access an up-to-date version of
  it (the choice of RPCs for this task might be revisited later). This
  daemon will also be the one responsible for managing the locks, granting them
  to the jobs requesting them, and taking care of freeing them up if the jobs
  holding them crash or are terminated before releasing them. In order to do
  this, each job, after being spawned by LuxiD, will open a local Unix socket
  that will be used to communicate with it, and will be destroyed when the job
  terminates. LuxiD will be able to check, after a timeout, whether the job is
  still running by connecting to this socket, and to ask WConfD to forcefully
  remove the locks if the socket is closed.
  Also, WConfD should hold a serialized list of the locks and their owners in a
  file (``locks.data``), so that it can keep track of their status in case it
  crashes and needs to be restarted (by asking LuxiD which of the owning jobs
  are still running).
  Interaction with this daemon will be performed using Unix sockets.

``Configuration query daemon (RConfD)``
  It is written in Haskell, and it corresponds to the old ConfD. It will run on
  all the master candidates and it will serve information about the static
  configuration of the cluster (the one contained in ``config.data``). The
  provided information will be highly available (as in: a response will be
  available as long as a stable-enough connection between the client and at
  least one working master candidate is available) and its freshness will be
  best effort (the most recent reply from any of the master candidates will be
  returned, but it might still be older than the one available through WConfD).
  The information will be served through the ConfD protocol.

``Rapi daemon (RapiD)``
  It remains basically unchanged, with the only difference that all of its LUXI
  queries are directed towards LuxiD instead of being split between MasterD and
  QueryD.

``Monitoring daemon (MonD)``
  It remains unaffected by the changes in this design document. It will just
  get some of the data it needs from RConfD instead of the old ConfD, but the
  interfaces of the two are identical.

``Node daemon (NodeD)``
  It remains unaffected by the changes proposed in this design document. The
  only difference is that it will receive its RPCs from LuxiD (for job queue
  replication), from WConfD (for configuration replication) and from the
  processes executing single jobs (for all the operations to be performed by
  nodes) instead of receiving them just from MasterD.

This restructuring will allow us to reorganize and improve the codebase,
introducing cleaner interfaces and giving well-defined and more restricted
tasks to each daemon.

Furthermore, having more well-defined interfaces will allow us to have easier
upgrade procedures, and to work towards the possibility of upgrading single
components of a cluster one at a time, without the need for immediately
upgrading the entire cluster in a single step.


Implementation
==============

While performing this refactoring, we aim to increase the amount of
Haskell code, thus benefiting from the additional type safety provided by its
wide compile-time checks. In particular, all the job queue management and the
configuration management daemon will be written in Haskell, taking over the
role currently fulfilled by Python code executed as part of MasterD.

The changes described by this design document are quite extensive, therefore
they will not be implemented all at the same time, but through a sequence of
steps, each leaving the codebase in a consistent and usable state.

#. Rename QueryD to LuxiD.
   A part of LuxiD, the one replying to configuration
   queries including live information about the system, already exists in the
   form of QueryD. This is being renamed to LuxiD, and will form the first part
   of the new daemon. NB: this is happening starting from Ganeti 2.8. At the
   beginning, only the already existing queries will be replied to by LuxiD.
   More queries will be implemented in the next versions.

#. Let LuxiD be the interface for the queries and MasterD be their executor.
   Currently, MasterD is solely responsible for receiving and executing LUXI
   queries, and for managing the jobs they create.
   Receiving the queries and managing the job queue will be extracted from
   MasterD into LuxiD.
   Actually executing jobs will still be done by MasterD, which contains all
   the logic for doing that and for properly managing locks and the
   configuration.
   At this stage, scheduling will simply consist of starting jobs until a fixed
   maximum number of simultaneously running jobs is reached.

#. Extract WConfD from MasterD.
   The logic for managing the configuration file is factored out to the
   dedicated WConfD daemon. All configuration changes, currently executed
   directly by MasterD, will be changed to be IPC requests sent to the new
   daemon.

#. Extract locking management from MasterD.
   The logic for managing and granting locks is extracted to WConfD as well.
   Locks will no longer be taken directly, but requested from WConfD via IPC.
   This step can be executed on its own or at the same time as the previous one.

#. Jobs are executed as processes.
   The logic for running jobs is rewritten so that each job can be managed by an
   independent process. LuxiD will spawn a new (Python) process for every single
   job. The RPCs will remain unchanged, and the LU code will stay as is as much
   as possible.
   MasterD will cease to exist as a daemon on its own at this point, but not
   before.

#. Improve job scheduling algorithm.
   The simple algorithm for scheduling jobs will be replaced by a more
   intelligent one. Also, the implementation of :doc:`design-optables` can be
   started.

Job death detection
-------------------

**Requirements:**

- It must be possible to reliably detect the death of a process even under
  uncommon conditions such as very heavy system load.
- A daemon must be able to detect the death of a process even if the
  daemon is restarted while the process is running.
- The solution must not rely on being able to communicate with
  a process.
- The solution must work for the current situation where multiple jobs
  run in a single process.
- It must be POSIX compliant.

These conditions rule out simple solutions like checking a process ID
(because the process might eventually be replaced by another process
with the same ID) or keeping an open connection to a process.

**Solution:** As a job process is spawned, before attempting to
communicate with any other process, it will create a designated empty
lock file, open it, acquire an *exclusive* lock on it, and keep it open.
When connecting to a daemon, the job process will provide it with the
path of the file. If the process dies unexpectedly, the operating system
kernel automatically cleans up the lock.

Therefore, daemons can check if a process is dead by trying to acquire
a *shared* lock on the lock file in a non-blocking mode:

- If the locking operation succeeds, it means that the exclusive lock is
  missing, therefore the process has died, but the lock
  file hasn't been cleaned up yet. The daemon should release the lock
  immediately. Optionally, the daemon may delete the lock file.
- If the file is missing, the process has died and the lock file has
  been cleaned up.
- If the locking operation fails due to a lock conflict, it means
  the process is alive.

Using shared locks for querying lock files ensures that the detection
works correctly even if multiple daemons query a file at the same time.

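As an illustration only, the check described above could be sketched in
Python as follows. The function names ``hold_job_lock`` and ``job_is_alive``
are hypothetical and not part of Ganeti, and the sketch assumes POSIX record
locks as exposed by ``fcntl.lockf``; the actual implementation may differ.

```python
import errno
import fcntl
import os


def hold_job_lock(path):
    """Job side: create the lock file and take an exclusive lock on it.

    The returned descriptor must stay open for the whole life of the job;
    the kernel drops the lock when the process exits or dies.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    return fd


def job_is_alive(path):
    """Daemon side: probe the lock file with a non-blocking shared lock."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except FileNotFoundError:
        # The lock file has already been cleaned up: the job is gone.
        return False
    try:
        fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except OSError as err:
        if err.errno in (errno.EACCES, errno.EAGAIN):
            # Conflict with the exclusive lock: the job is still alive.
            return True
        raise
    else:
        # No exclusive lock present: the job died without cleaning up.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

Note that POSIX record locks belong to a process, so the probe only gives a
meaningful answer when it runs in a different process than the job holding
the exclusive lock.
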
A job should close and remove its lock file when it completely finishes.
The WConfD daemon will be responsible for removing stale lock files of
jobs that didn't remove their lock files themselves.

**Statelessness of the protocol:** To keep our protocols stateless,
the job id and the path to the lock file are sent as part of every
request that deals with resources, in particular the Ganeti Locks.
All resources are owned by the pair (job id, lock file). In this way,
several jobs can live in the same process (as will be the case in the
transition period), but owner death detection still only depends on the
owner of the resource. In particular, no additional lookup table is
needed to obtain the lock file for a given owner.

**Considered alternatives:** An alternative to creating a separate lock
file would be to lock the job status file. However, file locks are kept
only as long as the file is open. Therefore any operation followed by
closing the file would cause the process to release the lock. In
particular, with jobs as threads, the master daemon wouldn't be able to
keep locks and operate on job files at the same time.

WConfD details
--------------

WConfD will communicate with its clients through a Unix domain socket for both
configuration management and locking. Clients can issue multiple RPC calls
through one socket. For each such call the client sends a JSON request
document with a remote function name and data for its arguments. The server
replies with a JSON response document containing either the result or
signalling a failure.

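The request/response exchange could look roughly like the following Python
sketch. The exact wire format is not defined in this document: the field names
``method``, ``args``, ``success``, ``result`` and ``error``, the
newline-terminated framing and both function names are assumptions made only
for illustration.

```python
import json


def make_request(method, args):
    """Client side: encode one RPC call as a newline-terminated JSON document."""
    return json.dumps({"method": method, "args": args}).encode("utf-8") + b"\n"


def handle_request(raw, functions):
    """Server side: decode a request, run the named function, encode the reply.

    ``functions`` maps remote function names to Python callables. Any error
    (malformed JSON, unknown function, exception in the function) is reported
    as a failure response instead of a result.
    """
    try:
        request = json.loads(raw.decode("utf-8"))
        result = functions[request["method"]](*request["args"])
        response = {"success": True, "result": result}
    except Exception as err:
        response = {"success": False, "error": str(err)}
    return json.dumps(response).encode("utf-8") + b"\n"
```

Since several calls can go through one socket, a client would simply write
one such request after another and read one response per request.
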
Any state associated with client processes will be mirrored on persistent
storage and linked to the identity of processes so that the WConfD daemon will
be able to resume its operation at any point after a restart or a crash. WConfD
will track each client's process start time along with its process ID to be
able to detect if a process dies and its process ID is reused. WConfD will
clear all locks and other state associated with a client if it detects that
the client's process no longer exists.

Configuration management
++++++++++++++++++++++++

The new configuration management protocol will be implemented in the following
steps:

Step 1:
  #. Implement the following functions in WConfD and export them through
     RPC:

     - Obtain a single internal lock, either in shared or
       exclusive mode. This lock will substitute the current lock
       ``_config_lock`` in config.py.
     - Release the lock.
     - Return the whole configuration data to a client.
     - Receive the whole configuration data from a client and replace the
       current configuration with it. Distribute it to master candidates
       and distribute the corresponding *ssconf*.

     WConfD must detect deaths of its clients (see `Job death
     detection`_) and release locks automatically.

  #. In config.py modify public methods that access configuration:

     - Instead of acquiring a local lock, obtain a lock from WConfD
       using the above functions.
     - Fetch the current configuration from WConfD.
     - Use it to perform the method's task.
     - If the configuration was modified, send it to WConfD at the end.
     - Release the lock held in WConfD.

  This will decouple the configuration management from the master daemon,
  even though the specific configuration tasks will still be performed by
  individual jobs.

  After this step it'll be possible to access the configuration from separate
  processes.

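The access pattern that Step 1 asks of the public methods in config.py
(lock, fetch, work, write back if modified, unlock) can be sketched as a
Python context manager. ``InMemoryWConfD`` and ``config_access`` are
hypothetical names; the class is only an in-process stand-in for the four
RPC functions listed above, not the real daemon or its client API.

```python
import contextlib
import copy


class InMemoryWConfD:
    """Hypothetical stand-in for the Step 1 RPC functions of WConfD."""

    def __init__(self, config):
        self._config = config
        self.locked = False
        self.writes = 0

    def lock(self, shared):
        # A real daemon would block or queue the request instead.
        assert not self.locked
        self.locked = True

    def unlock(self):
        self.locked = False

    def read_config(self):
        return copy.deepcopy(self._config)

    def write_config(self, config):
        self._config = copy.deepcopy(config)
        self.writes += 1


@contextlib.contextmanager
def config_access(wconfd, shared=False):
    """Lock, fetch the configuration, and send it back only if it changed."""
    wconfd.lock(shared)
    try:
        config = wconfd.read_config()
        before = copy.deepcopy(config)
        yield config
        if config != before:
            if shared:
                raise ValueError("configuration modified under a shared lock")
            wconfd.write_config(config)
    finally:
        # The lock is released even if the method's task raised an error.
        wconfd.unlock()
```

A method of ``ConfigWriter`` would then wrap its body in
``with config_access(wconfd) as config:`` and modify ``config`` in place.
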
Step 2:
  #. Reimplement all current methods of ``ConfigWriter`` for reading and
     writing the configuration of a cluster in WConfD.
  #. Expose each of those functions in WConfD as a separate RPC function.
     This will allow easy future extensions or modifications.
  #. Replace ``ConfigWriter`` with a stub (preferably automatically
     generated from the Haskell code) that will contain the same methods
     as the current ``ConfigWriter`` and delegate all calls to its
     methods to WConfD.

Step 3:
  #. Remove WConfD's RPC functions for obtaining/releasing the single
     internal lock from Step 1.
  #. Remove WConfD's RPC functions for sending/receiving the whole
     configuration from Step 1.

Future aims:

- Optionally refactor the RPC calls to reduce their number or improve their
  efficiency (for example by obtaining a larger set of data instead of
  querying items one by one).

Locking
+++++++

The new locking protocol will be implemented as follows:

Re-implement the current locking mechanism in WConfD and expose it for RPC
calls. All current locks will be mapped into a data structure that will
uniquely identify them (storing a lock's level together with its name).

WConfD will impose a linear order on locks. The order will be compatible
with the current ordering of lock levels so that existing code will work
without changes.

WConfD will keep the set of currently held locks for each client. The
protocol will allow the following operations on the set:

*Update:*
  Update the current set of locks according to a given list. The list contains
  locks and their desired level (release / shared / exclusive). To prevent
  deadlocks, WConfD will check that all newly requested locks (or already held
  locks requested to be upgraded to *exclusive*) are greater in the sense of
  the linear order than all currently held locks, and fail the operation if
  not. Only the locks in the list will be updated, other locks already held
  will be left intact. If the operation fails, the client's lock set will be
  left intact.
*Opportunistic union:*
  Add as many locks as possible from a given set to the current set within a
  given timeout. WConfD will again check the proper order of locks and
  acquire only the ones that are allowed with respect to the current set.
  Returns the set of acquired locks, possibly empty. Immediate. Never fails.
  (It would also be possible to extend the operation to try to wait until a
  given number of locks is available, or a given timeout elapses.)
*List:*
  List the current set of held locks. Immediate, never fails.
*Intersection:*
  Retain only a given set of locks in the current one. This function is
  provided for convenience, it's redundant with respect to *list* and
  *update*. Immediate, never fails.

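The order check performed by *update* can be illustrated with a small Python
sketch. It rests on several assumptions that this document does not fix:
locks are modelled as ``(level, name)`` tuples compared lexicographically,
releases in a request are applied before the order check, a lock being
upgraded is not compared against itself, and contention with other clients is
ignored. ``update_locks`` and ``LockOrderError`` are hypothetical names.

```python
class LockOrderError(Exception):
    """Raised when a request would violate the linear lock order."""


def update_locks(held, requests):
    """Return the client's new lock set, or raise and leave ``held`` intact.

    ``held`` maps locks to "shared" or "exclusive"; ``requests`` maps locks
    to "release", "shared" or "exclusive".
    """
    releasing = {lock for lock, mode in requests.items() if mode == "release"}
    # Locks that are new, or held as shared and upgraded to exclusive.
    acquiring = {
        lock for lock, mode in requests.items()
        if mode != "release"
        and (lock not in held
             or (mode == "exclusive" and held[lock] == "shared"))
    }
    kept = [lock for lock in held
            if lock not in releasing and lock not in acquiring]
    if acquiring and kept and min(acquiring) < max(kept):
        raise LockOrderError("%r is not greater than held lock %r"
                             % (min(acquiring), max(kept)))
    # Build a new mapping so the caller's set is untouched on failure.
    result = {lock: mode for lock, mode in held.items()
              if lock not in releasing}
    result.update((lock, mode) for lock, mode in requests.items()
                  if mode != "release")
    return result
```

For example, a client holding a shared lock on ``(1, "node1")`` may add
``(1, "node2")``, but a request for ``(0, "cluster")`` is rejected because it
precedes a lock that is already held.
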
Additional restrictions due to lock implications:
  Ganeti supports locks that act as if a lock on a whole group (like all nodes)
  were held. To avoid deadlocks caused by the additional blockage of those
  group locks, we impose certain restrictions. Whenever `A` is a group lock and
  `B` belongs to `A`, then the following holds.

  - `A` is in lock order before `B`.
  - All locks that are in the lock order between `A` and `B` also belong to `A`.
  - It is considered a lock-order violation to ask for an exclusive lock on `B`
    while holding a shared lock on `A`.

After this step it'll be possible to use locks from jobs as separate processes.

The above set of operations allows the clients to use various work-flows.
In particular:

Pessimistic strategy:
  Lock all potentially relevant resources (for example all nodes), determine
  which will be needed, and release all the others.
Optimistic strategy:
  Determine what locks need to be acquired without holding any. Lock the
  required set of locks. Determine the set of required locks again and check if
  they are all held. If not, release everything and restart.

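The optimistic strategy amounts to a retry loop, sketched below in Python.
The function name, its callback parameters and the bound on the number of
attempts are all assumptions made for illustration; how a client actually
computes and acquires locks is outside the scope of this sketch.

```python
def acquire_optimistically(compute_needed, acquire, release_all,
                           max_attempts=5):
    """Retry until the locks computed before and after locking agree.

    ``compute_needed`` returns the set of locks currently required,
    ``acquire`` locks a given set and ``release_all`` drops every held lock.
    """
    for _ in range(max_attempts):
        # First pass: decide what is needed without holding any lock.
        wanted = set(compute_needed())
        acquire(wanted)
        # Second pass: the answer may have changed while we were unlocked.
        if set(compute_needed()) <= wanted:
            return wanted
        release_all()
    raise RuntimeError("required locks kept changing; giving up")
```

The bound on attempts is one possible way to avoid retrying forever on a busy
cluster; falling back to the pessimistic strategy would be another.
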
.. COMMENTED OUT:
  Start with the smallest set of locks and when determining what more
  relevant resources will be needed, expand the set. If an *union* operation
  fails, release all locks, acquire the desired union and restart the
  operation so that all preconditions and possible concurrent changes are
  checked again.

Future aims:

- Add more fine-grained locks to prevent unnecessary blocking of jobs. This
  could include locks on parameters of entities or locks on their states (so
  that a node remains online, but otherwise can change, etc.). In particular,
  adding, moving and removing instances currently blocks the whole node.
- Add checks that all modified configuration parameters belong to entities
  the client has locked and log violations.
- Make the above checks mandatory.
- Automate optimistic locking and checking the locks in logical units.
  For example, this could be accomplished by allowing some of the initial
  phases of `LogicalUnit` (such as `ExpandNames` and `DeclareLocks`) to be run
  repeatedly, checking if the set of locks requested the second time is
  contained in the set acquired after the first pass.
- Add the possibility for a job to reserve hardware resources such as disk
  space or memory on nodes. Most likely as a new, special kind of instance
  that would only block its resources and could later be converted to a
  regular instance. This would allow long-running jobs such as instance
  creation or move to lock the corresponding nodes, acquire the resources and
  turn the locks into shared ones, keeping an exclusive lock only on the
  instance.
- Use a more sophisticated algorithm for preventing deadlocks such as a
  `wait-for graph`_. This would lead to fewer *union* failures and allow more
  optimistic, scalable acquisition of locks.

.. _`wait-for graph`: http://en.wikipedia.org/wiki/Wait-for_graph


Further considerations
======================

There is a possibility that a job will finish performing its task while LuxiD
and/or WConfD are not available.
In order to deal with this situation, each job will update its job file
in the queue. This is race free, as LuxiD will no longer touch the job file
once the job is started; a corollary of this is that the job also has to
take care of replicating updates to the job file. LuxiD will watch job files
for changes to determine when a job has cleanly finished. To determine jobs
that died without having the chance of updating the job file, the `Job death
detection`_ mechanism will be used.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: