===========================================
Splitting the query and job execution paths
===========================================


Introduction
============

Currently, the master daemon does two main roles:

- execute jobs that change the cluster state
- respond to queries

Due to the technical details of the implementation, the job execution
and query paths interact with each other; for example, the "masterd
hang" issue that we had late in the 2.5 release cycle was due to the
interaction between job queries and job execution.

Furthermore, also because of technical implementation details (Python
lacking read-only variables being one example), we can't share
internal data structures for jobs; instead, in the query path, we read
them from disk in order to not block job execution due to locks.

All these point to the fact that the integration of both queries and
job execution in the same process (multi-threaded) creates more
problems than advantages, and hence we should look into separating
them.

Proposed design
===============

In Ganeti 2.7, we will introduce a separate, optional daemon to handle
queries (note: whether this is an actual "new" daemon, or whether its
functionality is folded into confd, remains to be seen).

This daemon will expose exactly the same Luxi interface as masterd,
except that job submission will be disabled. If so configured (at
build time), clients will be changed to:

- keep sending REQ_SUBMIT_JOB, REQ_SUBMIT_MANY_JOBS, and all requests
  except REQ_QUERY_* to the masterd socket (but also QR_LOCK queries)
- redirect all REQ_QUERY_* requests to the Luxi socket of the new
  daemon (except generic queries with QR_LOCK)

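For illustration, the client-side dispatch could look roughly like the
following sketch (hypothetical names and constants, not the actual
Ganeti luxi client code)::

  # Route a Luxi request to the right daemon socket.  All names here
  # are illustrative; the real change would live in the luxi client.
  MASTER_SOCKET = "/var/run/ganeti/socket/ganeti-master"
  QUERY_SOCKET = "/var/run/ganeti/socket/ganeti-query"

  # Request types the query daemon is assumed to be able to answer
  QUERY_REQUESTS = frozenset(["Query", "QueryFields"])

  def PickSocket(method, args):
    """Return the socket a given Luxi request should be sent to."""
    if method in QUERY_REQUESTS and args and args[0] != "lock":
      # generic queries go to the query daemon ...
      return QUERY_SOCKET
    # ... but QR_LOCK, job submission and everything else stays with
    # masterd
    return MASTER_SOCKET
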
This new daemon will serve both pure configuration queries (which
confd can already serve) and run-time queries (which currently only
masterd can serve). Since the RPC calls can be done from any node to
any node, the new daemon can run on all master candidates, not only on
the master node. This means that all gnt-* list commands can now be
run on nodes other than the master node. If we implement this as a
separate daemon that talks to confd, then we could actually run it on
all nodes of the cluster (to be decided).

During the 2.7 release, masterd will still respond to queries itself,
but it will log all such queries so that "misbehaving" clients can be
identified.

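A minimal sketch of what that logging could look like in the masterd
request handler (the function name and arguments are assumptions, not
existing Ganeti code)::

  import logging

  def LogDeprecatedQuery(method, peer):
    """Record a query that should have gone to the query daemon."""
    logging.warning("Luxi query %s received on the masterd socket"
                    " from %s; this client should be updated to use"
                    " the query daemon", method, peer)
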
Advantages
----------

As far as I can see, this will bring some significant advantages.

First, we remove any interaction between the job execution and cluster
query state. This means that bugs in the locking code (job execution)
will impact neither queries of the cluster state nor queries of the
job execution itself. Furthermore, we will be able to use different
tuning parameters for job execution (e.g. 25 threads for job
execution) versus queries (since queries are transient, we could
practically have an unlimited number of query threads).

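To make the tuning point concrete, the two daemons could size their
worker pools independently, roughly as in this sketch (the numbers and
names are assumptions, not actual Ganeti configuration)::

  from concurrent.futures import ThreadPoolExecutor

  # Job execution stays bounded, since each job may hold cluster locks
  job_pool = ThreadPoolExecutor(max_workers=25)

  # Queries are cheap and transient, so their pool can be much larger
  # and can be resized without touching the job execution path
  query_pool = ThreadPoolExecutor(max_workers=128)
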
As a result of the above split, we move from the current model, where
shutdown of the master daemon practically "breaks" the entire Ganeti
functionality (no job execution, no queries, not even connecting to
the instance console), to a split model:

- if just masterd is stopped, then other cluster functionality remains
  available: listing instances, connecting to the console of an
  instance, etc.
- if just "queryd" is stopped, masterd can still process jobs, and one
  can furthermore run queries from other nodes (master candidates)
- only if both are stopped do we end up with the previous state

This will help, for example, in the case where the master node has
crashed and we haven't failed it over yet: querying and investigating
the cluster state will still be possible from other master candidates
(on small clusters, this will mean from all nodes).

A last advantage is that we will finally be able to reduce the
footprint of masterd; instead of the previously discussed splitting of
individual jobs, which requires duplicating all the base
functionality, this will split out just the queries, a much simpler
piece of code than job execution. This should be a reasonable work
effort, with a much smaller impact in case of failure (we can still
run masterd as before).

Disadvantages
-------------

We might get increased inconsistency during queries, as there will be
a delay between masterd saving an updated configuration and
confd/queryd loading and parsing it. However, this could be
compensated for by the fact that queries will only look at "snapshots"
of the configuration, whereas before they could also see "in-progress"
modifications (due to the non-atomic updates). I think these effects
will cancel each other out; we will have to see in practice how it
works.

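To illustrate the "snapshot" behaviour, the query daemon could reload
the configuration only when the on-disk file changes, so that every
query sees a complete, previously written version (a sketch with
assumed paths, not the actual confd code)::

  import json
  import os

  class ConfigSnapshot(object):
    """Cache of the last complete configuration written by masterd.

    The file is re-read only when its mtime changes, so queries never
    observe a partially applied modification.
    """
    def __init__(self, path="/var/lib/ganeti/config.data"):
      self._path = path
      self._mtime = None
      self._data = None

    def Get(self):
      """Return the most recent complete configuration."""
      mtime = os.stat(self._path).st_mtime
      if self._data is None or mtime != self._mtime:
        with open(self._path) as fd:
          self._data = json.load(fd)
        self._mtime = mtime
      return self._data
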
Another disadvantage *might* be that we have a more complex setup, due
to the introduction of a new daemon. However, the query path will be
much simpler, and once we remove the query functionality from masterd
we should end up with a more robust system overall.

Finally, we have QR_LOCK, which is an internal query related to the
master daemon but uses the same infrastructure as the other queries
(which relate to cluster state). This is unfortunate, and will require
some untangling in order to keep code duplication low.

Long-term plans
===============

If this works well, the plan would be (tentatively) to disable the
query functionality in masterd completely in Ganeti 2.8, in order to
remove the duplication. This might change depending on whether and how
we split out the configuration/locking daemon.

Once we split this out, there is no technical reason why we can't
execute any query from any node; there may only be practical reasons
(network topology, remote nodes, etc.) or security reasons (if/whether
we want to change the cluster security model). In any case, it should
be possible to do this in a reliable way from all master candidates.

Some implementation details
---------------------------

We will fold this into confd, at least initially, to reduce the
proliferation of daemons. If used properly, Haskell will limit any
overly deep integration between the old "confd" functionality and the
new query functionality. As an advantage, we'll have a single daemon
that handles configuration queries.

The redirection of Luxi requests can easily be done based on the
request type, whether we keep both sockets open or open them on
demand.

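A minimal sketch of the open-on-demand variant, reusing the
PickSocket() helper sketched earlier (again hypothetical, not the real
client code)::

  import socket

  _connections = {}

  def GetConnection(sock_path):
    """Connect to a Luxi socket on first use, then reuse it."""
    if sock_path not in _connections:
      sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
      sock.connect(sock_path)
      _connections[sock_path] = sock
    return _connections[sock_path]
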
We don't want masterd to talk to queryd itself (hidden redirection),
since we want to be able to run queries while masterd is down.

During the 2.7 release cycle, we can test all queries against both
masterd and queryd in QA, so that we know the two expose exactly the
same interface and return consistent results.

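For example, a QA helper could run the same query through both sockets
and fail on any difference (a sketch; how the two callables perform
the query is left to the QA infrastructure)::

  def AssertQueriesAgree(masterd_query, queryd_query, what, fields):
    """Compare one query executed via masterd and via queryd.

    Both arguments are callables performing the query over the
    respective Luxi socket.
    """
    masterd_result = masterd_query(what, fields)
    queryd_result = queryd_query(what, fields)
    if masterd_result != queryd_result:
      raise AssertionError("query %s/%s differs between masterd and"
                           " queryd" % (what, fields))
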
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: