.. Source: doc/design-query-splitting.rst @ 8fa74099 (annotated lines: commit c6d992c5, Iustin Pop)

===========================================
Splitting the query and job execution paths
===========================================


Introduction
============

Currently, the master daemon fulfils two main roles:

- execute jobs that change the cluster state
- respond to queries

Due to the technical details of the implementation, the job execution
and query paths interact with each other. For example, the "masterd
hang" issue that we had late in the 2.5 release cycle was due to the
interaction between job queries and job execution.

Furthermore, also because of implementation details (Python lacking
read-only variables being one example), we can't share internal data
structures for jobs; instead, in the query path, we read them from
disk in order not to block job execution due to locks.

All of this points to the fact that integrating both queries and job
execution in the same (multi-threaded) process creates more problems
than advantages, and hence we should look into separating them.


Proposed design
===============

In Ganeti 2.7, we will introduce a separate, optional daemon to handle
queries (note: whether this is an actual "new" daemon, or its
functionality is folded into confd, remains to be seen).

This daemon will expose exactly the same Luxi interface as masterd,
except that job submission will be disabled. If so configured (at
build time), clients will be changed to:

- keep sending REQ_SUBMIT_JOB, REQ_SUBMIT_MANY_JOBS, and all other
  requests except REQ_QUERY_* to the masterd socket (plus generic
  queries with QR_LOCK)
- redirect all REQ_QUERY_* requests to the Luxi socket of the new
  daemon (except generic queries with QR_LOCK)
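
The routing rule in the list above can be sketched as a small function.
This is only an illustration under stated assumptions: the string
values of the constants, the ``QUERY_REQUESTS`` subset, the
``choose_daemon`` name, and the idea that the queried resource is the
first argument of a generic query are all stand-ins chosen for this
example, not taken from the Ganeti sources.

```python
# Stand-in values for illustration; a real client would take these
# from Ganeti's Luxi definitions instead of redefining them here.
REQ_SUBMIT_JOB = "SubmitJob"
REQ_SUBMIT_MANY_JOBS = "SubmitManyJobs"
REQ_QUERY = "Query"          # generic query; resource assumed in args[0]
REQ_QUERY_JOBS = "QueryJobs"
QR_LOCK = "lock"

# Hypothetical subset of the REQ_QUERY_* family.
QUERY_REQUESTS = frozenset([REQ_QUERY, REQ_QUERY_JOBS])

MASTERD = "masterd"
QUERYD = "queryd"


def choose_daemon(method, args=()):
    """Return the daemon that should receive a Luxi request."""
    if method not in QUERY_REQUESTS:
        # Job submission and every other non-query request.
        return MASTERD
    if method == REQ_QUERY and args and args[0] == QR_LOCK:
        # Lock queries are internal to the master daemon.
        return MASTERD
    return QUERYD
```

With these assumptions, a generic query for instances is routed to the
new daemon, while the same request type asking for ``QR_LOCK`` stays on
the masterd socket.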

This new daemon will serve both pure configuration queries (which
confd can already serve), and run-time queries (which currently only
masterd can serve). Since the RPC can be done from any node to any
node, the new daemon can run on all master candidates, not only on the
master node. This means that all gnt-* list options can now be run on
nodes other than the master node. If we implement this as a separate
daemon that talks to confd, then we could actually run this on all
nodes of the cluster (to be decided).

During the 2.7 release, masterd will still respond to queries itself,
but it will log all such queries so that "misbehaving" clients can be
identified.
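
A minimal sketch of such logging, assuming a request dispatcher that
can tell queries from other requests. The logger name, the
``serve_request`` function, and its parameters are hypothetical and
only serve the illustration; masterd's actual request handling is not
described in this document.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
log = logging.getLogger("masterd.legacy-query")


def serve_request(method, client, is_query, handler):
    """Answer a request, logging it first if it is a query."""
    if is_query(method):
        # The query is still answered; the log entry only identifies
        # clients that have not switched to the new daemon yet.
        log.warning("query %s received on masterd socket from %s",
                    method, client)
    return handler(method)
```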

Advantages
----------

As far as I can see, this will bring some significant advantages.

First, we remove any interaction between the job execution and cluster
query state. This means that bugs in the locking code (job execution)
will impact neither the query of the cluster state nor the query of
the job execution itself. Furthermore, we will be able to use
different tuning parameters for job execution (e.g. 25 threads for job
execution) and for queries (since these are transient, we could
practically have unlimited numbers of query threads).

As a result of the above split, we move from the current model, where
shutdown of the master daemon practically "breaks" the entire Ganeti
functionality (no job execution or queries, not even connecting to
the instance console), to a split model:

- if just masterd is stopped, then other cluster functionality remains
  available: listing instances, connecting to the console of an
  instance, etc.
- if just "queryd" is stopped, masterd can still process jobs, and one
  can furthermore run queries from other nodes (MCs)
- only if both are stopped do we end up with the previous state

This will help, for example, in the case where the master node has
crashed and we haven't failed it over yet: querying and investigating
the cluster state will still be possible from other master candidates
(on small clusters, this will mean from all nodes).

A last advantage is that we will finally be able to reduce the
footprint of masterd; instead of the previously discussed splitting of
individual jobs, which requires duplication of all the base
functionality, this will just split the queries, a more trivial piece
of code than job execution. This should be a reasonable work effort,
with a much smaller impact in case of failure (we can still run
masterd as before).

Disadvantages
-------------

We might get increased inconsistency during queries, as there will be
a delay between masterd saving an updated configuration and
confd/query loading and parsing it. However, this could be compensated
by the fact that queries will only look at "snapshots" of the
configuration, whereas before they could also look at "in-progress"
modifications (due to the non-atomic updates). I think these will
cancel each other out; we will have to see in practice how it works.
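
The "snapshot" behaviour can be illustrated with a small reader that
only ever hands out a fully parsed copy of the configuration. This is
a sketch under assumptions, not how confd is implemented: it assumes
the configuration is a JSON file that the writer replaces as a whole
(write to a temporary file, then rename), and the ``ConfigSnapshot``
class and the polling via ``os.stat`` are choices made for this
example.

```python
import json
import os


class ConfigSnapshot:
    """Serve queries from the last fully parsed copy of a config file."""

    def __init__(self, path):
        self._path = path
        self._stamp = None   # identity of the file we last parsed
        self._data = None    # parsed snapshot handed out to queries

    def get(self):
        """Return the current snapshot, reloading if the file changed."""
        st = os.stat(self._path)
        stamp = (st.st_ino, st.st_size, st.st_mtime_ns)
        if stamp != self._stamp:
            with open(self._path) as handle:
                data = json.load(handle)
            # Swap in the new snapshot only after parsing succeeded.
            self._data, self._stamp = data, stamp
        return self._data
```

A query that started with an older snapshot keeps a consistent view of
it; the cost is the delay until the next reload, which is the
inconsistency discussed above.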

Another disadvantage *might* be that we have a more complex setup, due
to the introduction of a new daemon. However, the query path will be
much simpler, and when we remove the query functionality from masterd
we should have a more robust system.

Finally, we have QR_LOCK, which is an internal query related to the
master daemon, using the same infrastructure as the other queries
(related to cluster state). This is unfortunate, and will require
untangling in order to keep code duplication low.

Long-term plans
===============

If this works well, the plan would be (tentatively) to disable the
query functionality in masterd completely in Ganeti 2.8, in order to
remove the duplication. This might change based on how (or whether) we
split the configuration/locking daemon out.

Once we split this out, there is no technical reason why we can't
execute any query from any node, except maybe practical reasons
(network topology, remote nodes, etc.) or security reasons (if/whether
we want to change the cluster security model). In any case, it should
be possible to do this in a reliable way from all master candidates.

Some implementation details
---------------------------

We will fold this into confd, at least initially, to reduce the
proliferation of daemons. Haskell will limit (if used properly) any
overly deep integration between the old "confd" functionality and the
new query one. As an advantage, we'll have a single daemon that
handles configuration queries.

The redirection of Luxi requests can easily be done based on the
request type, whether we keep both sockets open or open them on
demand.
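
The "open on demand" variant could look roughly like the sketch below.
``LazyConnections`` and the opener callables are hypothetical names for
this example; in a real client each opener would connect to the
corresponding Luxi socket, whose path is not specified here.

```python
class LazyConnections:
    """Open one connection per daemon, but only when first needed."""

    def __init__(self, openers):
        # Maps a daemon name to a zero-argument callable that opens
        # and returns a connection to that daemon.
        self._openers = dict(openers)
        self._connections = {}

    def get(self, daemon):
        """Return the connection for a daemon, opening it if needed."""
        if daemon not in self._connections:
            self._connections[daemon] = self._openers[daemon]()
        return self._connections[daemon]
```

A client that only runs queries then never touches the masterd socket,
which matters when masterd is down.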

We don't want masterd to talk to queryd itself (hidden redirection),
since we want to be able to run queries while masterd is down.

During the 2.7 release cycle, we can test all queries against both
masterd and queryd in QA, so we know we have exactly the same
interface and that it is consistent.
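
Such a QA check could be as simple as the following sketch, which
assumes two callables that send a query to masterd and to queryd
respectively and return the decoded answer. The function and parameter
names are made up for this example and are not part of the existing QA
suite.

```python
def compare_daemons(queries, ask_masterd, ask_queryd):
    """Run each query against both daemons and collect differences."""
    mismatches = []
    for query in queries:
        from_masterd = ask_masterd(query)
        from_queryd = ask_queryd(query)
        if from_masterd != from_queryd:
            mismatches.append((query, from_masterd, from_queryd))
    return mismatches
```

An empty result means both daemons returned identical answers for
every query tried. On a live cluster the state can change between the
two calls, so a real test would have to run on a quiescent cluster or
tolerate such differences.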

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: