=================
Ganeti 2.2 design
=================

This document describes the major changes in Ganeti 2.2 compared to
the 2.1 version.

The 2.2 version will be a relatively small release. Its main aim is to
avoid changing too much of the core code, while addressing issues and
adding new features and improvements over 2.1, in a timely fashion.

.. contents:: :depth: 4

Objective
=========

Background
==========

Overview
========

Detailed design
===============

As for 2.1 we divide the 2.2 design into three areas:

- core changes, which affect the master daemon/job queue/locking or
  all/most logical units
- logical unit/feature changes
- external interface changes (e.g. command line, OS API, hooks, ...)

Core changes
------------

Remote procedure call timeouts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Current state and shortcomings
++++++++++++++++++++++++++++++

The current RPC protocol used by Ganeti is based on HTTP. Every request
consists of an HTTP PUT request (e.g. ``PUT /hooks_runner HTTP/1.0``)
and doesn't return until the function called has returned. Parameters
and return values are encoded using JSON.

On the server side, ``ganeti-noded`` handles every incoming connection
in a separate process by forking just after accepting the connection.
This process exits after sending the response.

There is one major problem with this design: timeouts cannot be used on
a per-request basis. Neither the client nor the server knows how long a
request will take. Even if we were able to group requests into
different categories (e.g. fast and slow), this would not be reliable.

If a node has an issue or the network connection fails while a request
is being handled, the master daemon can wait for a long time for the
connection to time out (e.g. due to the operating system's underlying
TCP keep-alive packets or timeouts). While the settings for keep-alive
packets can be changed using Linux-specific socket options, we prefer to
use application-level timeouts because these cover both the case of a
machine being down and that of an unresponsive node daemon.

Proposed changes
++++++++++++++++

RPC glossary
^^^^^^^^^^^^

Function call ID
  Unique identifier returned by ``ganeti-noded`` after invoking a
  function.
Function process
  Process started by ``ganeti-noded`` to call the actual (backend)
  function.

Protocol
^^^^^^^^

Initially we chose HTTP as our RPC protocol because there were existing
libraries, which, unfortunately, turned out to lack important features
(such as SSL certificate authentication) and we had to write our own.

This proposal can easily be implemented using HTTP, though it would
likely be more efficient and less complicated to use the LUXI protocol
already used to communicate between client tools and the Ganeti master
daemon. Switching to another protocol can occur at a later point; for
now, this proposal should be implemented using HTTP as its underlying
protocol.

The LUXI protocol currently contains two functions, ``WaitForJobChange``
and ``AutoArchiveJobs``, which can take a longer time. They both support
a parameter to specify the timeout. This timeout is usually chosen as
roughly half of the socket timeout, guaranteeing a response before the
socket times out. After the specified amount of time,
``AutoArchiveJobs`` returns and reports the number of archived jobs.
``WaitForJobChange`` returns and reports a timeout. In both cases, the
functions can be called again.

A similar model can be used for the inter-node RPC protocol. In some
sense, the node daemon will implement a light variant of *"node daemon
jobs"*. When the function call is sent, it specifies an initial timeout.
If the function doesn't finish within this timeout, a response is sent
with a unique identifier, the function call ID. The client can then
choose to wait again, with a new timeout, for the function to finish.
Inter-node RPC calls would no longer block indefinitely and there
would be an implicit ping mechanism.

Request handling
^^^^^^^^^^^^^^^^

To support the protocol changes described above, the way the node daemon
handles requests will have to change. Instead of forking and handling
every connection in a separate process, there should be one child
process per function call, and the node daemon's master (parent)
process will handle the communication with clients and the function
processes using asynchronous I/O.
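
The request-handling model described above could be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
actual implementation: the helper names ``start_function_process``,
``wait_for_result`` and ``slow_answer`` are hypothetical, a pipe stands
in for the child's stdio, and a single ``select`` call stands in for a
full asynchronous I/O loop::

  import json
  import os
  import select
  import time


  def start_function_process(fn, args):
      """Fork a function process; return (pid, read_fd)."""
      (read_fd, write_fd) = os.pipe()
      pid = os.fork()
      if pid == 0:
          # Child: call the function, write its JSON-encoded return
          # value to the pipe and exit without returning to the caller
          status = 1
          try:
              os.close(read_fd)
              with os.fdopen(write_fd, "w") as out:
                  json.dump(fn(*args), out)
              status = 0
          finally:
              os._exit(status)
      # Parent: keep only the read end of the pipe
      os.close(write_fd)
      return (pid, read_fd)


  def wait_for_result(pid, read_fd, timeout):
      """Wait up to timeout seconds; return (finished, value)."""
      (readable, _, _) = select.select([read_fd], [], [], timeout)
      if not readable:
          # Timed out; the caller may wait again later
          return (False, None)
      # Readable means data or EOF; the child writes once, then exits
      with os.fdopen(read_fd, "r") as inp:
          data = inp.read()
      os.waitpid(pid, 0)
      if not data:
          raise RuntimeError("function process exited without result")
      return (True, json.loads(data))


  def slow_answer(delay):
      """Hypothetical stand-in for a slow backend function."""
      time.sleep(delay)
      return {"answer": 42}

Calling ``wait_for_result`` with a timeout shorter than the function's
run time reports that the function has not finished, and the same call
can simply be repeated later with a new timeout.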

Function processes communicate with the parent process via stdio and
possibly their exit status. Every function process has a unique
identifier, though it shouldn't be the process ID only (PIDs can be
recycled and are prone to race conditions for this use case). The
proposed format is ``${ppid}:${cpid}:${time}:${random}``, where ``ppid``
is the ``ganeti-noded`` PID, ``cpid`` the child's PID, ``time`` the
current Unix timestamp with decimal places and ``random`` at least 16
random bits.
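
As a minimal sketch of the proposed identifier format, assuming a
hypothetical helper named ``make_function_call_id`` (the name, the
six decimal places and the hexadecimal rendering of the random part are
illustrative choices, not prescribed by this design)::

  import os
  import random
  import time


  def make_function_call_id(ppid, cpid):
      """Build an ID of the form ppid:cpid:time:random."""
      # Current Unix timestamp with decimal places
      timestamp = "%.6f" % time.time()
      # 32 random bits (the design asks for at least 16), as hex
      rand = "%08x" % random.SystemRandom().getrandbits(32)
      return "%d:%d:%s:%s" % (ppid, cpid, timestamp, rand)


  # 4321 is a made-up child PID; in the node daemon it would be the
  # PID of the forked function process
  example_id = make_function_call_id(os.getpid(), 4321)

Because the timestamp and the random part change on every call, a
recycled PID alone would not lead to the same identifier being reused.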

The following operations will be supported:

``StartFunction(fn_name, fn_args, timeout)``
  Starts a function specified by ``fn_name`` with arguments in
  ``fn_args`` and waits up to ``timeout`` seconds for the function
  to finish. Fire-and-forget calls can be made by specifying a timeout
  of 0 seconds (e.g. for powercycling the node). Returns three values:
  the function call ID (if not finished), whether the function finished
  (or the timeout expired) and the function's return value.
``WaitForFunction(fnc_id, timeout)``
  Waits up to ``timeout`` seconds for the function call to finish. The
  return value is the same as for ``StartFunction``.
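
From the caller's point of view, the two operations above could be
combined into a simple wait loop. The sketch below is an assumption-laden
illustration: ``FakeNode`` is a hypothetical in-process stand-in that
runs functions in threads instead of talking to a real node daemon, and
``call_with_polling`` and ``slow_add`` are likewise made-up names. Only
the operation names and the three-value return follow the description
above::

  import threading
  import time


  class FakeNode(object):
      """In-process stand-in for a node daemon (illustration only)."""

      def __init__(self, functions):
          self._functions = functions  # maps fn_name to a callable
          self._calls = {}             # maps fnc_id to (thread, result)
          self._next_id = 0

      def StartFunction(self, fn_name, fn_args, timeout):
          result = {}

          def run():
              result["value"] = self._functions[fn_name](*fn_args)

          thread = threading.Thread(target=run)
          thread.daemon = True
          thread.start()
          self._next_id += 1
          fnc_id = "call-%d" % self._next_id
          self._calls[fnc_id] = (thread, result)
          return self.WaitForFunction(fnc_id, timeout)

      def WaitForFunction(self, fnc_id, timeout):
          (thread, result) = self._calls[fnc_id]
          thread.join(timeout)
          if thread.is_alive():
              # Not finished yet: hand back the ID for a later wait
              return (fnc_id, False, None)
          del self._calls[fnc_id]
          return (None, True, result.get("value"))


  def call_with_polling(node, fn_name, fn_args, timeout):
      """Call a function and keep waiting until it has finished.

      Returns the return value and the number of additional waits.
      """
      (fnc_id, finished, value) = node.StartFunction(fn_name, fn_args,
                                                     timeout)
      waits = 0
      while not finished:
          waits += 1
          (fnc_id, finished, value) = node.WaitForFunction(fnc_id,
                                                           timeout)
      return (value, waits)


  def slow_add(a, b):
      """Hypothetical slow function used to exercise the loop."""
      time.sleep(0.3)
      return a + b

Each response within the timeout doubles as a sign that the node is
still responsive, which is the implicit ping mechanism mentioned
earlier.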

In the future, ``StartFunction`` could support an additional parameter
to specify after how long the function process should be aborted.

Simplified timing diagram::

  Master daemon        Node daemon                      Function process
   |
  Call function
  (timeout 10s) -----> Parse request and fork for ----> Start function
                       calling actual function, then     |
                       wait up to 10s for function to    |
                       finish                            |
                        |                                |
                       ...                              ...
                        |                                |
  Examine return <----  |                                |
  value and wait                                         |
  again -------------> Wait another 10s for function     |
                        |                                |
                       ...                              ...
                        |                                |
  Examine return <----  |                                |
  value and wait                                         |
  again -------------> Wait another 10s for function     |
                        |                                |
                       ...                              ...
                        |                                |
                        |                               Function ends,
                       Get return value and forward <-- process exits
  Process return <---- it to caller
  value and continue
   |

.. TODO: Convert diagram above to graphviz/dot graphic

On process termination (e.g. after having been sent a ``SIGTERM`` or
``SIGINT`` signal), ``ganeti-noded`` should send ``SIGTERM`` to all
function processes and wait for all of them to terminate.
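
The termination behaviour described above might look roughly like the
following sketch. It assumes a POSIX system and that function processes
are tracked as ``subprocess.Popen`` objects; the helper names
``terminate_function_processes``, ``install_termination_handlers`` and
``spawn_sleeper`` are hypothetical and only serve the illustration::

  import signal
  import subprocess
  import sys


  def terminate_function_processes(children):
      """Send SIGTERM to all children, then wait for all of them."""
      for child in children:
          if child.poll() is None:
              # Still running, ask it to terminate
              child.send_signal(signal.SIGTERM)
      # Collect the exit status of every child
      return [child.wait() for child in children]


  def install_termination_handlers(children):
      """Clean up function processes on SIGTERM or SIGINT."""
      def handler(signum, frame):
          terminate_function_processes(children)
          sys.exit(0)

      for signum in (signal.SIGTERM, signal.SIGINT):
          signal.signal(signum, handler)


  def spawn_sleeper():
      """Start a placeholder function process that just sleeps."""
      return subprocess.Popen([sys.executable, "-c",
                               "import time; time.sleep(60)"])

Sending the signal to every child before waiting on any of them lets
the function processes shut down in parallel rather than one after
another.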

Feature changes
---------------

External interface changes
--------------------------

.. vim: set textwidth=72 :