===========================================
Splitting the query and job execution paths
===========================================


Introduction
============

Currently, the master daemon fills two main roles:

- execute jobs that change the cluster state
- respond to queries

Due to technical details of the implementation, the job execution and
query paths interact with each other; for example, the "masterd hang"
issue we had late in the 2.5 release cycle was due to the interaction
between job queries and job execution.

Furthermore, also because of technical implementation details (Python
lacking read-only variables being one example), we can't share
internal data structures for jobs; instead, in the query path, we read
them from disk in order not to block job execution due to locks.

All these point to the fact that integrating both queries and job
execution in the same (multi-threaded) process creates more problems
than advantages, and hence we should look into separating them.


Proposed design
===============

In Ganeti 2.7, we will introduce a separate, optional daemon to handle
queries (note: whether this is an actual "new" daemon, or its
functionality is folded into confd, remains to be seen).

This daemon will expose exactly the same Luxi interface as masterd,
except that job submission will be disabled. If so configured (at
build time), clients will be changed to:

- keep sending REQ_SUBMIT_JOB, REQ_SUBMIT_MANY_JOBS, and all other
  non-query requests to the masterd socket; generic queries for
  QR_LOCK also stay on this socket
- redirect all other REQ_QUERY_* requests to the new Luxi socket of
  the new daemon

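The client-side split above could be sketched as follows; note that
the socket paths, the function name, and the method-name checks are
illustrative assumptions, not the actual client code:

```python
# Hypothetical client-side dispatch for Luxi requests; socket paths
# and naming conventions are assumptions for illustration only.
MASTERD_SOCKET = "/var/run/ganeti/socket/ganeti-master"
QUERYD_SOCKET = "/var/run/ganeti/socket/ganeti-query"

def target_socket(method, args=()):
    """Pick the Luxi socket for a request based on its method name.

    The REQ_QUERY_* methods map to names starting with "Query"; the
    generic "Query" call carries the resource name as its first
    argument, and QR_LOCK ("lock") must stay on masterd, since locks
    are internal master daemon state.
    """
    if method.startswith("Query"):
        if method == "Query" and args and args[0] == "lock":
            return MASTERD_SOCKET
        return QUERYD_SOCKET
    # Job submission and every other request keeps going to masterd.
    return MASTERD_SOCKET
```

The decision is purely local and stateless, so it can run in every
client without coordination.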
This new daemon will serve both pure configuration queries (which
confd can already serve) and run-time queries (which currently only
masterd can serve). Since the RPC can be done from any node to any
node, the new daemon can run on all master candidates, not only on the
master node. This means that all gnt-* list options can now be run on
nodes other than the master node. If we implement this as a separate
daemon that talks to confd, then we could actually run it on all
nodes of the cluster (to be decided).

During the 2.7 release, masterd will still respond to queries itself,
but it will log all such queries to identify "misbehaving" clients.

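A minimal sketch of such transition-period logging (the function name
and message format are assumptions; masterd's real request handling
is more involved):

```python
import logging

def log_legacy_query(method, peer):
    """Log a warning for query requests still arriving at masterd.

    Returns the logged message for queries, or None for other
    requests, so callers can inspect what was recorded.
    """
    if not method.startswith("Query"):
        return None
    msg = ("legacy query %s received on masterd from %s;"
           " the client should use the query daemon" % (method, peer))
    logging.warning(msg)
    return msg
```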
Advantages
----------

As far as I can see, this will bring some significant advantages.

First, we remove any interaction between the job execution and cluster
query state. This means that bugs in the locking code (job execution)
will not impact querying the cluster state, nor querying the job
execution itself. Furthermore, we will be able to tune job execution
and queries separately (e.g. 25 threads for job execution, while
queries, being transient, could use a practically unlimited number of
threads).

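Such a tuning split could look roughly like this; the pool sizes and
helper names are hypothetical (25 job workers merely matches the
example above):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tuning: job execution keeps a small, fixed worker pool,
# while queries, being short-lived, get a much larger one.
JOB_WORKERS = 25
QUERY_WORKERS = 200  # effectively unlimited for typical query load

job_pool = ThreadPoolExecutor(max_workers=JOB_WORKERS)
query_pool = ThreadPoolExecutor(max_workers=QUERY_WORKERS)

def submit_job(fn, *args):
    # Long-running, lock-holding work is rate-limited by the small pool.
    return job_pool.submit(fn, *args)

def run_query(fn, *args):
    # Transient, read-only work can be parallelized much more freely.
    return query_pool.submit(fn, *args)
```

Once the two paths live in separate daemons, these two pools no longer
even share a process, so one cannot starve the other.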
As a result of the above split, we move from the current model, where
shutdown of the master daemon practically "breaks" the entire Ganeti
functionality (no job execution nor queries, not even connecting to
the instance console), to a split model:

- if just masterd is stopped, then other cluster functionality remains
  available: listing instances, connecting to the console of an
  instance, etc.
- if just "queryd" is stopped, masterd can still process jobs, and one
  can furthermore run queries from other nodes (master candidates)
- only if both are stopped do we end up with the previous state

This will help, for example, in the case where the master node has
crashed and we haven't failed it over yet: querying and investigating
the cluster state will still be possible from other master candidates
(on small clusters, this will mean from all nodes).

A last advantage is that we will finally be able to reduce the
footprint of masterd; instead of the previously discussed splitting of
individual jobs, which requires duplicating all the base
functionality, this will just split out the queries, a more trivial
piece of code than job execution. This should be a reasonable work
effort, with a much smaller impact in case of failure (we can still
run masterd as before).

Disadvantages
-------------

We might get increased inconsistency during queries, as there will be
a delay between masterd saving an updated configuration and
confd/query loading and parsing it. However, this could be compensated
for by the fact that queries will only look at "snapshots" of the
configuration, whereas before they could also see "in-progress"
modifications (due to the non-atomic updates). I think these effects
will cancel each other out; we will have to see how it works in
practice.

Another disadvantage *might* be that we have a more complex setup, due
to the introduction of a new daemon. However, the query path will be
much simpler, and when we remove the query functionality from masterd
we should have a more robust system.

Finally, there is QR_LOCK, an internal query related to the master
daemon that uses the same infrastructure as the other (cluster state)
queries. This is unfortunate, and will require untangling in order to
keep code duplication low.

Long-term plans
===============

If this works well, the plan is (tentatively) to disable the query
functionality in masterd completely in Ganeti 2.8, in order to remove
the duplication. This might change based on how, and whether, we split
out the configuration/locking daemon.

Once we split this out, there is no technical reason why we can't
execute any query from any node, except maybe practical reasons
(network topology, remote nodes, etc.) or security reasons (if/whether
we want to change the cluster security model). In any case, it should
be possible to do this in a reliable way from all master candidates.

Some implementation details
---------------------------

We will fold this into confd, at least initially, to reduce the
proliferation of daemons. Haskell will (if used properly) limit any
too-deep integration between the old "confd" functionality and the new
query one. As an advantage, we'll have a single daemon that handles
configuration queries.

The redirection of Luxi requests can easily be done based on the
request type, whether we keep both sockets open or open them on
demand.

We don't want the masterd to talk to the queryd itself (hidden
redirection), since we want to be able to run queries while masterd is
down.

During the 2.7 release cycle, we can test all queries against both
masterd and queryd in QA, so we know we have exactly the same
interface and that it is consistent.

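Such a QA cross-check could be as simple as the following sketch;
``connect_luxi`` stands in for whatever client helper QA already uses
and is an assumption, not an existing function:

```python
def cross_check(connect_luxi, masterd_addr, queryd_addr, method, args):
    """Run the same query against both daemons and compare the answers.

    connect_luxi(addr) is assumed to return a client object with a
    call(method, args) method; any mismatch means the two query
    interfaces have drifted apart.
    """
    from_masterd = connect_luxi(masterd_addr).call(method, args)
    from_queryd = connect_luxi(queryd_addr).call(method, args)
    assert from_masterd == from_queryd, \
        "masterd and queryd disagree on %s" % method
    return from_masterd
```

Running this over the full set of query methods in QA gives a direct,
mechanical check that the two implementations stay in sync.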
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: