===========================================
Splitting the query and job execution paths
===========================================


Introduction
============

Currently, the master daemon performs two main roles:

- execute jobs that change the cluster state
- respond to queries

Due to the technical details of the implementation, the job execution
and query paths interact with each other. For example, the "masterd
hang" issue that we had late in the 2.5 release cycle was caused by the
interaction between job queries and job execution.


Furthermore, also because of implementation details (Python lacking
read-only variables being one example), we can't share internal data
structures for jobs; instead, in the query path, we read them from
disk so that locks do not block job execution.

All of this points to the fact that integrating both queries and job
execution in the same (multi-threaded) process creates more problems
than advantages, and hence we should look into separating them.


Proposed design
===============

In Ganeti 2.7, we will introduce a separate, optional daemon to handle
queries (note: whether this is an actual "new" daemon, or its
functionality is folded into confd, remains to be seen).

This daemon will expose exactly the same Luxi interface as masterd,
except that job submission will be disabled. If so configured (at
build time), clients will be changed to:

- keep sending REQ_SUBMIT_JOB, REQ_SUBMIT_MANY_JOBS, and all requests
  except REQ_QUERY_* to the masterd socket (generic queries for QR_LOCK
  also stay on this socket)
- redirect all REQ_QUERY_* requests to the new Luxi socket of the new
  daemon (except generic queries for QR_LOCK)

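The routing rule in the list above could be sketched as follows. This
is only an illustration, not the actual client code: request types are
represented by the names of their constants as plain strings, and the
function name ``choose_daemon`` and its ``resource`` argument are
assumptions made for this example.

```python
MASTERD = "masterd"
QUERYD = "queryd"

# Resource name of the lock query; the string value is illustrative only.
QR_LOCK = "QR_LOCK"


def choose_daemon(request, resource=None):
    """Return the daemon that should receive a Luxi request.

    request: name of the request type, e.g. "REQ_QUERY_INSTANCES"
    resource: for a generic query, the kind of object being queried
    """
    if not request.startswith("REQ_QUERY"):
        # Job submission and all other non-query requests stay on masterd.
        return MASTERD
    if resource == QR_LOCK:
        # Lock queries are internal to masterd and are not redirected.
        return MASTERD
    return QUERYD
```

Keeping the decision in one small function means the rest of the
client does not need to know that two daemons exist.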

This new daemon will serve both pure configuration queries (which
confd can already serve), and run-time queries (which currently only
masterd can serve). Since RPC calls can be made from any node to any
node, the new daemon can run on all master candidates, not only on the
master node. This means that all gnt-* list options can now be run on
nodes other than the master node. If we implement this as a separate
daemon that talks to confd, then we could actually run this on all
nodes of the cluster (to be decided).

During the 2.7 release, masterd will still respond to queries itself,
but it will log all such queries so that "misbehaving" clients can be
identified.

Advantages
----------

As far as I can see, this will bring some significant advantages.

First, we remove any interaction between the job execution and cluster
query state. This means that bugs in the locking code (job execution)
will not impact the query of the cluster state, nor the query of the
job execution itself. Furthermore, we will be able to have different
tuning parameters for job execution (e.g. 25 threads) and for queries
(since these are transient, we could have a practically unlimited
number of query threads).


As a result of the above split, we move from the current model, where
shutdown of the master daemon practically "breaks" the entire Ganeti
functionality (no job execution nor queries, not even connecting to
the instance console), to a split model:

- if just masterd is stopped, then other cluster functionality remains
  available: listing instances, connecting to the console of an
  instance, etc.
- if just "queryd" is stopped, masterd can still process jobs, and one
  can furthermore run queries from other nodes (MCs)
- only if both are stopped do we end up with the previous state

This will help, for example, in the case where the master node has
crashed and we haven't failed it over yet: querying and investigating
the cluster state will still be possible from other master candidates
(on small clusters, this will mean from all nodes).

A last advantage is that we will finally be able to reduce the
footprint of masterd; unlike the previously discussed approach of
splitting individual jobs, which requires duplicating all the base
functionality, this will just split out the queries, a much simpler
piece of code than job execution. This should be a reasonable work
effort, with a much smaller impact in case of failure (we can still run
masterd as before).

Disadvantages
-------------

We might get increased inconsistency during queries, as there will be
a delay between masterd saving an updated configuration and
confd/queryd loading and parsing it. However, this could be compensated
by the fact that queries will only look at "snapshots" of the
configuration, whereas before they could also look at "in-progress"
modifications (due to the non-atomic updates). I think these will
cancel each other out, but we will have to see in practice how it
works.

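To make the trade-off concrete, here is a minimal sketch of a reader
that only ever serves complete snapshots. It is not the confd
implementation: the class name ``ConfigSnapshot``, the helper
``write_config``, the JSON file format and the write-then-rename
update are all assumptions made for this example.

```python
import json
import os
import tempfile


class ConfigSnapshot:
    """Serve queries from the last fully parsed copy of a config file."""

    def __init__(self, path):
        self._path = path
        self._stamp = None
        self._data = None
        self.refresh()

    def refresh(self):
        """Reload the file if it changed; return True if a reload happened."""
        st = os.stat(self._path)
        stamp = (st.st_mtime_ns, st.st_size, st.st_ino)
        if stamp == self._stamp:
            return False
        with open(self._path) as handle:
            data = json.load(handle)
        # Published only after parsing succeeded, so queries never see a
        # partially loaded configuration.
        self._data, self._stamp = data, stamp
        return True

    def get(self, key):
        return self._data[key]


def write_config(path, data):
    """Writer side (assumed): write a temporary file, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(data, handle)
    os.replace(tmp, path)
```

Between a ``write_config`` call and the next ``refresh``, queries
still return the old values; this is the delay described above. In
exchange, a query can never observe a half-applied update.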

Another disadvantage *might* be that we have a more complex setup, due
to the introduction of a new daemon. However, the query path will be
much simpler, and when we remove the query functionality from masterd
we should have a more robust system.

Finally, we have QR_LOCK, which is an internal query related to the
master daemon, using the same infrastructure as the other queries
(related to cluster state). This is unfortunate, and will require
untangling in order to keep code duplication low.

Long-term plans
===============

If this works well, the plan would be (tentatively) to disable the
query functionality in masterd completely in Ganeti 2.8, in order to
remove the duplication. This might change depending on whether and how
we split out the configuration/locking daemon.

Once we split this out, there is no technical reason why we can't
execute any query from any node, except maybe for practical reasons
(network topology, remote nodes, etc.) or security reasons (if/whether
we want to change the cluster security model). In any case, it should
be possible to do this in a reliable way from all master candidates.

Some implementation details
---------------------------

We will fold this into confd, at least initially, to reduce the
proliferation of daemons. Haskell will limit (if used properly) any
overly deep integration between the old "confd" functionality and the
new query one. As an advantage, we'll have a single daemon that handles
configuration queries.

The redirection of Luxi requests can be easily done based on the
request type, either by keeping both sockets open or by opening them on
demand.

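The "open on demand" variant could look roughly like the sketch
below. The class ``LazyConnections``, the helper ``unix_connect`` and
the idea of passing socket paths in a dictionary are assumptions for
illustration; the real client code and socket locations are not
specified here.

```python
import socket


def unix_connect(path):
    """Open a stream connection to a Unix domain socket."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock


class LazyConnections:
    """Open at most one connection per daemon, on first use only."""

    def __init__(self, paths, connect=unix_connect):
        # paths: mapping from daemon name to socket path (hypothetical).
        self._paths = paths
        self._connect = connect
        self._open = {}

    def get(self, daemon):
        """Return the connection for a daemon, opening it if needed."""
        if daemon not in self._open:
            self._open[daemon] = self._connect(self._paths[daemon])
        return self._open[daemon]

    def close(self):
        """Close every connection that was actually opened."""
        for conn in self._open.values():
            conn.close()
        self._open.clear()
```

With this approach a client that only runs queries never touches the
masterd socket, so it keeps working while masterd is down.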

We don't want the masterd to talk to the queryd itself (hidden
redirection), since we want to be able to run queries while masterd is
down.

During the 2.7 release cycle, we can test all queries against both
masterd and queryd in QA, so that we know both expose exactly the same
interface and return consistent results.

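Such a QA check could be sketched as below. The function
``compare_daemons`` and the two callables it takes are hypothetical;
in a real QA run they would send the same Luxi query to masterd and to
queryd respectively.

```python
def compare_daemons(queries, ask_masterd, ask_queryd):
    """Return the queries whose answers differ between the two daemons.

    Each mismatch is reported as (query, masterd answer, queryd answer).
    """
    mismatches = []
    for query in queries:
        expected = ask_masterd(query)
        actual = ask_queryd(query)
        if expected != actual:
            mismatches.append((query, expected, actual))
    return mismatches
```

An empty result means both daemons agreed on every query. Run-time
fields that can legitimately change between the two calls would
probably need to be excluded from such a comparison.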

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: