=================
Ganeti 2.2 design
=================

This document describes the major changes in Ganeti 2.2 compared to
the 2.1 version.

The 2.2 version will be a relatively small release. Its main aim is to
avoid changing too much of the core code, while addressing issues and
adding new features and improvements over 2.1, in a timely fashion.

.. contents:: :depth: 4

Objective
=========

Background
==========

Overview
========

Detailed design
===============

As for 2.1, we divide the 2.2 design into three areas:

- core changes, which affect the master daemon/job queue/locking or
  all/most logical units
- logical unit/feature changes
- external interface changes (e.g. command line, OS API, hooks, ...)

Core changes
------------

Remote procedure call timeouts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Current state and shortcomings
++++++++++++++++++++++++++++++

The current RPC protocol used by Ganeti is based on HTTP. Every request
consists of an HTTP PUT request (e.g. ``PUT /hooks_runner HTTP/1.0``)
and doesn't return until the function called has returned. Parameters
and return values are encoded using JSON.

On the server side, ``ganeti-noded`` handles every incoming connection
in a separate process by forking just after accepting the connection.
This process exits after sending the response.

There is one major problem with this design: timeouts cannot be used on
a per-request basis. Neither client nor server knows how long a request
will take. Even if we were able to group requests into different
categories (e.g. fast and slow), this would not be reliable.

If a node has an issue or the network connection fails while a request
is being handled, the master daemon can wait for a long time for the
connection to time out (e.g. due to the operating system's underlying
TCP keep-alive packets or timeouts). While the settings for keep-alive
packets can be changed using Linux-specific socket options, we prefer to
use application-level timeouts because these cover both machine down and
unresponsive node daemon cases.

Proposed changes
++++++++++++++++

RPC glossary
^^^^^^^^^^^^

Function call ID
  Unique identifier returned by ``ganeti-noded`` after invoking a
  function.
Function process
  Process started by ``ganeti-noded`` to call the actual (backend)
  function.

Protocol
^^^^^^^^

Initially we chose HTTP as our RPC protocol because there were existing
libraries. Unfortunately, these turned out to lack important features
(such as SSL certificate authentication), so we had to write our own.

This proposal can easily be implemented using HTTP, though it would
likely be more efficient and less complicated to use the LUXI protocol
already used to communicate between client tools and the Ganeti master
daemon. Switching to another protocol can occur at a later point; for
now, this proposal should be implemented using HTTP as its underlying
protocol.

The LUXI protocol currently contains two functions, ``WaitForJobChange``
and ``AutoArchiveJobs``, which can take a long time. They both support
a parameter to specify the timeout. This timeout is usually chosen as
roughly half of the socket timeout, guaranteeing a response before the
socket times out. After the specified amount of time,
``AutoArchiveJobs`` returns and reports the number of archived jobs.
``WaitForJobChange`` returns and reports a timeout. In both cases, the
functions can be called again.

A similar model can be used for the inter-node RPC protocol. In some
sense, the node daemon will implement a light variant of *"node daemon
jobs"*. When the function call is sent, it specifies an initial timeout.
If the function does not finish within this timeout, a response is sent
with a unique identifier, the function call ID. The client can then
choose to wait again, with a timeout, for the function to finish.
Inter-node RPC calls would no longer block indefinitely and there
would be an implicit ping mechanism.

Request handling
^^^^^^^^^^^^^^^^

To support the protocol changes described above, the way the node daemon
handles requests will have to change. Instead of forking and handling
every connection in a separate process, there should be one child
process per function call, and the master process will handle the
communication with clients and the function processes using asynchronous
I/O.

Function processes communicate with the parent process via stdio and
possibly their exit status. Every function process has a unique
identifier, though it shouldn't be the process ID only (PIDs can be
recycled and are prone to race conditions for this use case). The
proposed format is ``${ppid}:${cpid}:${time}:${random}``, where ``ppid``
is the ``ganeti-noded`` PID, ``cpid`` the child's PID, ``time`` the
current Unix timestamp with decimal places and ``random`` at least 16
random bits.

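As an illustration of the proposed identifier format, the following is a
minimal Python sketch. The helper name ``make_function_call_id``, the
six decimal places for the timestamp and the hexadecimal rendering of
the random part are assumptions made for this example; the design only
fixes the four fields and their order.

```python
import os
import random
import time


def make_function_call_id(child_pid, random_bits=16):
    """Build an ID of the form ppid:cpid:time:random.

    Hypothetical helper; meant to be called in the daemon right after it
    has started the function process with PID ``child_pid``.
    """
    if random_bits < 16:
        raise ValueError("the format asks for at least 16 random bits")
    ppid = os.getpid()   # PID of the daemon itself ("ppid" in the format)
    now = time.time()    # Unix timestamp with decimal places
    rnd = random.SystemRandom().getrandbits(random_bits)
    # Assumption: microsecond precision and hex digits for the random part.
    return "%d:%d:%.6f:%04x" % (ppid, child_pid, now, rnd)
```

Combining both PIDs with a timestamp and random bits means that a
recycled child PID alone cannot produce an identifier that was handed
out earlier.
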
The following operations will be supported:

``StartFunction(fn_name, fn_args, timeout)``
  Starts the function specified by ``fn_name`` with the arguments in
  ``fn_args`` and waits up to ``timeout`` seconds for the function
  to finish. Fire-and-forget calls can be made by specifying a timeout
  of 0 seconds (e.g. for powercycling the node). Returns three values:
  the function call ID (if not finished), whether the function finished
  (or the timeout expired) and the function's return value.
``WaitForFunction(fnc_id, timeout)``
  Waits up to ``timeout`` seconds for the function call to finish. The
  return value is the same as for ``StartFunction``.

In the future, ``StartFunction`` could support an additional parameter
to specify after how long the function process should be aborted.

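The two operations can be sketched on the node daemon side as follows.
This is only an illustration under several assumptions: the class
``NodeDaemonSketch``, the ``FUNCTIONS`` table and the ``slow_sum``
function are invented for the example, arguments are passed to the
child on its command line, the result is read as JSON from the child's
standard output, and the function call ID is simplified. Blocking
waits are used instead of the asynchronous I/O the design calls for.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for backend functions: each entry is Python source
# that reads JSON arguments from argv[1] and prints a JSON result.
FUNCTIONS = {
    "slow_sum": (
        "import json, sys, time\n"
        "args = json.loads(sys.argv[1])\n"
        "time.sleep(args['delay'])\n"
        "print(json.dumps(sum(args['values'])))\n"
    ),
}


class NodeDaemonSketch:
    """Keeps one child process per function call (illustrative only)."""

    def __init__(self):
        self._running = {}  # function call ID -> Popen object
        self._counter = 0

    def start_function(self, fn_name, fn_args, timeout):
        """Start the function process, then wait up to `timeout` seconds."""
        proc = subprocess.Popen(
            [sys.executable, "-c", FUNCTIONS[fn_name], json.dumps(fn_args)],
            stdout=subprocess.PIPE,
        )
        self._counter += 1
        # Simplified ID for the sketch, not the full proposed format.
        fnc_id = "%d:%d" % (proc.pid, self._counter)
        self._running[fnc_id] = proc
        return self.wait_for_function(fnc_id, timeout)

    def wait_for_function(self, fnc_id, timeout):
        """Return (function call ID, finished, return value)."""
        proc = self._running[fnc_id]
        try:
            # communicate() may be retried after a timeout without
            # losing output that was already read.
            stdout, _ = proc.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            return (fnc_id, False, None)
        del self._running[fnc_id]
        # Assumption: the child succeeded and printed valid JSON.
        return (None, True, json.loads(stdout))
```

A call such as ``NodeDaemonSketch().start_function("slow_sum",
{"delay": 1.0, "values": [1, 2, 3]}, 0.05)`` is expected to report the
function as unfinished together with its ID, which can then be passed
to ``wait_for_function`` until the result arrives.
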
Simplified timing diagram::

  Master daemon        Node daemon                      Function process
   |
  Call function
  (timeout 10s) -----> Parse request and fork for ----> Start function
                       calling actual function, then     |
                       wait up to 10s for function to    |
                       finish                            |
                        |                                |
                       ...                              ...
                        |                                |
  Examine return <----  |                                |
  value and wait                                         |
  again -------------> Wait another 10s for function     |
                        |                                |
                       ...                              ...
                        |                                |
  Examine return <----  |                                |
  value and wait                                         |
  again -------------> Wait another 10s for function     |
                        |                                |
                       ...                              ...
                        |                                |
                        |                               Function ends,
                       Get return value and forward <-- process exits
  Process return <---- it to caller
  value and continue
   |

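The exchange in the timing diagram can be sketched from the master
daemon's side as a simple loop. Everything named here is an assumption
for illustration: ``call_function``, ``NodeUnresponsive``, the overall
``max_wait`` deadline and the ``FakeNode`` stand-in, which replaces a
real RPC client and finishes after a fixed number of polls.

```python
import time


class NodeUnresponsive(Exception):
    """Raised when the function did not finish within the overall deadline."""


def call_function(node, fn_name, fn_args, step_timeout=10.0, max_wait=60.0):
    """Start a function and keep waiting in steps of `step_timeout` seconds."""
    deadline = time.monotonic() + max_wait
    fnc_id, finished, value = node.start_function(
        fn_name, fn_args, step_timeout)
    while not finished:
        # Each response doubles as a ping: the node daemon is still alive.
        if time.monotonic() >= deadline:
            raise NodeUnresponsive(fnc_id)
        fnc_id, finished, value = node.wait_for_function(fnc_id, step_timeout)
    return value


class FakeNode:
    """In-memory stand-in that reports 'not finished' `polls_needed` times."""

    def __init__(self, polls_needed, result):
        self.polls_left = polls_needed
        self.result = result
        self.calls = []

    def start_function(self, fn_name, fn_args, timeout):
        self.calls.append("start")
        return self._poll()

    def wait_for_function(self, fnc_id, timeout):
        self.calls.append("wait")
        return self._poll()

    def _poll(self):
        if self.polls_left > 0:
            self.polls_left -= 1
            return ("1:2:3.0:abcd", False, None)
        return (None, True, self.result)
```

With ``FakeNode(2, "ok")`` the loop issues one start and two waits
before returning, which matches the two "wait again" steps in the
diagram. A real client would additionally treat a transport error or
socket timeout as a sign that the node is down.
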
.. TODO: Convert diagram above to graphviz/dot graphic

On process termination (e.g. after having been sent a ``SIGTERM`` or
``SIGINT`` signal), ``ganeti-noded`` should send ``SIGTERM`` to all
function processes and wait for all of them to terminate.

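The shutdown behaviour described above could look roughly like the
following sketch. The function names and the plain list of
``subprocess.Popen`` objects are assumptions for this example; a real
daemon would integrate this with its main loop.

```python
import signal
import subprocess
import sys


def start_function_process(procs, argv):
    """Start a child process and remember it for later shutdown."""
    proc = subprocess.Popen(argv)
    procs.append(proc)
    return proc


def terminate_function_processes(procs):
    """Send SIGTERM to every running child, then wait for all of them."""
    for proc in procs:
        if proc.poll() is None:  # still running
            proc.send_signal(signal.SIGTERM)
    for proc in procs:
        proc.wait()


def install_shutdown_handlers(procs):
    """Make SIGTERM and SIGINT trigger an orderly shutdown (main thread only)."""
    def handler(signum, frame):
        terminate_function_processes(procs)
        sys.exit(0)

    signal.signal(signal.SIGTERM, handler)
    signal.signal(signal.SIGINT, handler)
```

Signalling all children first and waiting afterwards lets them shut
down in parallel instead of one after the other.
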

Feature changes
---------------

External interface changes
--------------------------

.. vim: set textwidth=72 :