=======================
Ganeti monitoring agent
=======================

.. contents:: :depth: 4

This is a design document detailing the implementation of a Ganeti
monitoring agent report system, that can be queried by a monitoring
system to calculate health information for a Ganeti cluster.

Current state and shortcomings
==============================

There is currently no monitoring support in Ganeti. While we don't want
to build something like Nagios or Pacemaker as part of Ganeti, it would
be useful if such tools could easily extract information from a Ganeti
machine in order to take actions (example actions include logging an
outage for future reporting or alerting a person or system about it).

Proposed changes
================

Each Ganeti node should export a status page that can be queried by a
monitoring system. This status page will be exported on a network port
and will be encoded in JSON (simple text) over HTTP.

JSON is the natural choice because Ganeti already depends on it, so no
extra libraries are needed to use it, as opposed to what would happen for
XML or some other markup format.

Location of agent report
------------------------

The report will be available from all nodes, and will cover all
node-local resources. This allows more real-time information to be
available, at the cost of querying all nodes.

Information reported
--------------------

The monitoring agent system will report on the following basic information:

- Instance status
- Instance disk status
- Status of storage for instances
- Ganeti daemons status, CPU usage, memory footprint
- Hypervisor resources report (memory, CPU, network interfaces)
- Node OS resources report (memory, CPU, network interfaces)
- Information from a plugin system

Format of the report
--------------------

The report will be in JSON format, and it will present an array
of report objects.
Each report object will be produced by a specific data collector.
Each report object includes some mandatory fields, to be provided by all
the data collectors:

``name``
  The name of the data collector that produced this part of the report.
  It is supposed to be unique inside a report.

``version``
  The version of the data collector that produces this part of the
  report. Built-in data collectors (as opposed to those implemented as
  plugins) should have "B" as the version number.

``format_version``
  The format of what is represented in the "data" field for each data
  collector might change over time. Every time this happens, the
  format_version should be changed, so that whoever reads the report knows
  what format to expect, and how to correctly interpret it.

``timestamp``
  The time when the reported data were gathered. It has to be expressed
  in nanoseconds since the unix epoch (0:00:00 January 01, 1970). If not
  enough precision is available (or needed) it can be padded with
  zeroes. If a report object needs multiple timestamps, it can add more
  and/or override this one inside its own "data" section.

``category``
  A collector can belong to a given category of collectors (e.g.: storage
  collectors, daemon collector). This means that it will have to provide a
  minimum set of prescribed fields, as documented for each category.
  This field will contain the name of the category the collector belongs to,
  if any, or just the ``null`` value.

``kind``
  Two kinds of collectors are possible:
  `Performance reporting collectors`_ and `Status reporting collectors`_.
  The respective paragraphs will describe them and the value of this field.

``data``
  This field contains all the data generated by the specific data collector,
  in its own independently defined format. The monitoring agent could check
  this syntactically (according to the JSON specifications) but not
  semantically.

Here follows a minimal example of a report::

  [
  {
      "name" : "TheCollectorIdentifier",
      "version" : "1.2",
      "format_version" : 1,
      "timestamp" : 1351607182000000000,
      "category" : null,
      "kind" : 0,
      "data" : { "plugin_specific_data" : "go_here" }
  },
  {
      "name" : "AnotherDataCollector",
      "version" : "B",
      "format_version" : 7,
      "timestamp" : 1351609526123854000,
      "category" : "storage",
      "kind" : 1,
      "data" : { "status" : { "code" : 1,
                              "message" : "Error on disk 2"
                            },
                 "plugin_specific" : "data",
                 "some_late_data" : { "timestamp" : 1351609526123942720,
                                      ...
                                    }
               }
  }
  ]

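The mandatory fields lend themselves to a simple structural check on the
consumer side. The following Python sketch decodes a report and verifies that
every report object carries all the mandatory fields and that collector names
are unique. The function name ``check_report`` and the choice of raising
``ValueError`` are assumptions made for this example and are not part of the
design.

.. code-block:: python

  import json

  # Mandatory fields every report object must carry, as listed above.
  MANDATORY_FIELDS = ("name", "version", "format_version", "timestamp",
                      "category", "kind", "data")


  def check_report(text):
      """Decode a report and return the names of its collectors.

      Raises ValueError if the text is not a JSON array of objects, if a
      report object lacks a mandatory field, or if a name is duplicated.
      (Hypothetical helper, for illustration only.)
      """
      report = json.loads(text)
      if not isinstance(report, list):
          raise ValueError("the report must be a JSON array")
      names = []
      for obj in report:
          if not isinstance(obj, dict):
              raise ValueError("each report entry must be a JSON object")
          missing = [f for f in MANDATORY_FIELDS if f not in obj]
          if missing:
              raise ValueError("missing fields: %s" % ", ".join(missing))
          if obj["name"] in names:
              raise ValueError("duplicate collector name: %s" % obj["name"])
          names.append(obj["name"])
      return names

Only the presence of the fields is checked here. The content of ``data`` is
left to whoever knows the format of the specific collector.
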
Performance reporting collectors
++++++++++++++++++++++++++++++++

These collectors only provide data about some component of the system, without
giving any interpretation of its meaning.

The value of the ``kind`` field of the report will be ``0``.

Status reporting collectors
+++++++++++++++++++++++++++

These collectors will provide information about the status of some
component of Ganeti, or managed by Ganeti.

The value of their ``kind`` field will be ``1``.

The rationale behind this kind of collector is that there are some situations
where exporting data about the underlying subsystems would expose potential
issues. But if Ganeti itself is able (and going) to fix the problem, conflicts
might arise between Ganeti and something/somebody else trying to fix the same
problem.
Also, some external monitoring systems might not be aware of the internals of a
particular subsystem (e.g.: DRBD) and might only exploit the high level
response of its data collector, alerting an administrator if anything is wrong.
Still, completely hiding the underlying data is not a good idea, as they might
still be of use in some cases. So status reporting collectors will provide two
output modes: one exporting just high level information about the status,
and one also exporting all the data they gathered.
The default output mode will be the status-only one. The verbose output mode
providing all the data can be selected through a command line parameter (for
stand-alone data collectors) or through the HTTP request to the monitoring
agent (when collectors are executed as part of it).

When exporting just the status each status reporting collector will provide,
in its ``data`` section, at least the following field:

``status``
  summarizes the status of the component being monitored and consists of two
  subfields:

  ``code``
    It assumes a numeric value, encoded in such a way as to allow using a
    bitset to easily distinguish which states are currently present in the
    whole cluster. If the bitwise OR of all the ``code`` fields is 0, the
    cluster is completely healthy.
    The status codes are as follows:

    ``0``
      The collector can determine that everything is working as
      intended.

    ``1``
      Something is temporarily wrong but it is being automatically fixed by
      Ganeti.
      There is no need of external intervention.

    ``2``
      The collector has failed to understand whether the status is good or
      bad. Further analysis is required. Interpret this status as a
      potentially dangerous situation.

    ``4``
      The collector can determine that something is wrong and Ganeti has no
      way to fix it autonomously. External intervention is required.

  ``message``
    A message to better explain the reason of the status.
    The exact format of the message string is data collector dependent.

    The field is mandatory, but the content can be an empty string if the
    ``code`` is ``0`` (working as intended) or ``1`` (being fixed
    automatically).

    If the status code is ``2``, the message should explain why it was not
    possible to determine a proper status.
    If the status code is ``4``, the message should specify what has gone
    wrong.

The ``data`` section will also contain all the fields describing the gathered
data, according to a collector-specific format.

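Because every non-zero status code is a distinct bit, a monitoring system can
combine the codes of all status reporting collectors with a bitwise OR. The
sketch below is only illustrative: the constant and function names are
hypothetical, and it assumes the reports have already been decoded from JSON
into Python dictionaries.

.. code-block:: python

  # Status codes defined above; each non-zero value is a distinct bit.
  STATUS_OK = 0
  STATUS_SELF_HEALING = 1  # temporarily wrong, being fixed by Ganeti
  STATUS_UNKNOWN = 2       # the collector could not determine the status
  STATUS_BAD = 4           # external intervention is required


  def aggregate_codes(reports):
      """Return the bitwise OR of the codes of all status collectors."""
      result = STATUS_OK
      for obj in reports:
          if obj["kind"] == 1:  # only status reporting collectors
              result |= obj["data"]["status"]["code"]
      return result


  def needs_attention(aggregate):
      """Tell whether any collector reported code 2 or code 4."""
      return bool(aggregate & (STATUS_UNKNOWN | STATUS_BAD))

For two status collectors reporting codes ``1`` and ``4``, ``aggregate_codes``
yields ``5``, and ``needs_attention(5)`` is true because the bit for code ``4``
is set.
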
Instance status
+++++++++++++++

At the moment each node knows which instances are running on it, which
instances it is primary for, but not the reason why an instance might not
be running. On the other hand we don't want to distribute full instance
"admin" status information to all nodes, because of the performance
impact this would have.

As such we propose that:

- Any operation that can affect instance status will have an optional
  "reason" attached to it (at opcode level). This can be used for
  example to distinguish an admin request, from a scheduled maintenance
  or an automated tool's work. If this reason is not passed, Ganeti will
  just use the information it has about the source of the request.
  This reason information will be structured according to the
  :doc:`Ganeti reason trail <design-reason-trail>` design document.
- RPCs that affect the instance status will be changed so that the
  "reason" and the version of the config object they ran on are passed to
  them. They will then export the new expected instance status, together
  with the associated reason and object version to the status report
  system, which then will export those themselves.

Monitoring and auditing systems can then use the reason to understand
the cause of an instance status, and they can use the timestamp to
understand the freshness of their data even in the absence of an atomic
cross-node reporting: for example if they see an instance "up" on a node
after seeing it running on a previous one, they can compare these values
to understand which data is freshest, and repoll the "older" node. Of
course if they keep seeing this status this represents an error (either
an instance continuously "flapping" between nodes, or an instance
constantly up on more than one), which should be reported and acted
upon.

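The freshness comparison described above can be sketched as follows. The
function name and the shape of its argument (a mapping from node name to the
timestamp that node reported for the instance) are assumptions made for this
example only.

.. code-block:: python

  def node_to_repoll(seen):
      """Pick the node holding the oldest data about one instance.

      ``seen`` maps a node name to the timestamp (nanoseconds since the
      epoch) reported by that node for the instance. Returns None when
      the instance is reported by at most one node, as there is then
      nothing to reconcile.
      """
      if len(seen) < 2:
          return None
      # The smallest timestamp identifies the "older" node.
      return min(seen, key=seen.get)

If the instance is still reported by more than one node after repolling, the
situation should be treated as an error, as explained above.
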
The instance status will be on each node, for the instances it is
primary for, and its ``data`` section of the report will contain a list
of instances, named ``instances``, with at least the following fields for
each instance:

``name``
  The name of the instance.

``uuid``
  The UUID of the instance (stable on name change).

``admin_state``
  The status of the instance (up/down/offline) as requested by the admin.

``actual_state``
  The actual status of the instance. It can be ``up``, ``down``, or
  ``hung`` if the instance is up but it appears to be completely stuck.

``uptime``
  The uptime of the instance (if it is up, "null" otherwise).

``mtime``
  The timestamp of the last known change to the instance state.

``state_reason``
  The last known reason for state change of the instance, described according
  to the JSON representation of a reason trail, as detailed in the :doc:`reason
  trail design document <design-reason-trail>`.

``status``
  It represents the status of the instance, and its format is the same as that
  of the ``status`` field of `Status reporting collectors`_.

Each hypervisor should provide its own instance status data collector, possibly
with the addition of more, specific, fields.
The ``category`` field of all of them will be ``instance``.
The ``kind`` field will be ``1``.

Note that as soon as a node knows it's not the primary anymore for an
instance it will stop reporting status for it: this means the instance
will either disappear, if it has been deleted, or appear on another
node, if it's been moved.

The ``code`` of the ``status`` field of the report of the Instance status data
collector will be:

``0``
  if ``status`` is ``0`` for all the instances it is reporting about.

``1``
  otherwise.

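The rule for the overall ``code`` of the instance status collector can be
sketched in a few lines of Python. The function name is hypothetical, and the
sketch assumes each entry of ``instances`` has a ``status`` field in the
format used by status reporting collectors.

.. code-block:: python

  def instance_collector_code(instances):
      """Return 0 if every instance has status code 0, and 1 otherwise."""
      if all(inst["status"]["code"] == 0 for inst in instances):
          return 0
      return 1

Under this reading, a node that is not primary for any instance reports
``0``, since there is no instance with a non-zero code.
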
Storage collectors
++++++++++++++++++

The storage collectors will be a series of data collectors
that will gather data about storage for the current node. The collection
will be performed at different granularity and abstraction levels, from
the physical disks, to partitions, logical volumes and to the specific
storage types used by Ganeti itself (drbd, rbd, plain, file).

The ``name`` of each of these collectors will reflect what storage type each of
them refers to.

The ``category`` field of these collectors will be ``storage``.

The ``kind`` field will depend on the specific collector.

Each ``storage`` collector's ``data`` section will provide collector-specific
fields.

The various storage collectors will provide keys to join the data they provide
(e.g.: device names or instance names), in order to allow the user to get a
better understanding of the system.

Diskstats collector
*******************

This storage data collector will gather information about the status of the
disks installed in the system, as listed in the ``/proc/diskstats`` file. This
means that not only physical hard drives, but also ramdisks and loopback
devices will be listed.

Its ``kind`` in the report will be ``0`` (`Performance reporting collectors`_).

Its ``category`` field in the report will contain the value ``storage``.

When executed in verbose mode, the ``data`` section of the report of this
collector will be a list of items, each representing one disk, each providing
the following fields:

``major``
  The major number of the device.

``minor``
  The minor number of the device.

``name``
  The name of the device.

``readsNum``
  This is the total number of reads completed successfully.

``mergedReads``
  Reads which are adjacent to each other may be merged for efficiency. Thus
  two 4K reads may become one 8K read before it is ultimately handed to the
  disk, and so it will be counted (and queued) as only one I/O. This field
  specifies how often this was done.

``secRead``
  This is the total number of sectors read successfully.

``timeRead``
  This is the total number of milliseconds spent by all reads.

``writes``
  This is the total number of writes completed successfully.

``mergedWrites``
  Writes which are adjacent to each other may be merged for efficiency. Thus
  two 4K writes may become one 8K write before it is ultimately handed to the
  disk, and so it will be counted (and queued) as only one I/O. This field
  specifies how often this was done.

``secWritten``
  This is the total number of sectors written successfully.

``timeWrite``
  This is the total number of milliseconds spent by all writes.

``ios``
  The number of I/Os currently in progress.
  This is the only field that should go to zero: it is incremented as requests
  are given to the appropriate ``struct request_queue`` and decremented as they
  finish.

``timeIO``
  The number of milliseconds spent doing I/Os. This field increases so long
  as field ``ios`` is nonzero.

``wIOmillis``
  The weighted number of milliseconds spent doing I/Os.
  This field is incremented at each I/O start, I/O completion, I/O merge,
  or read of these stats by the number of I/Os in progress (field ``ios``)
  times the number of milliseconds spent doing I/O since the last update of
  this field. This can provide an easy measure of both I/O completion time
  and the backlog that may be accumulating.

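The fields above follow the column order of ``/proc/diskstats``, so one line
of that file maps directly onto one item of the ``data`` list. The following
Python sketch shows one possible mapping. The function names and the
conversion of the counters to integers are assumptions made for this example,
not a description of the actual collector.

.. code-block:: python

  # Report field names, in the order of the /proc/diskstats columns.
  DISKSTATS_FIELDS = ("major", "minor", "name", "readsNum", "mergedReads",
                      "secRead", "timeRead", "writes", "mergedWrites",
                      "secWritten", "timeWrite", "ios", "timeIO",
                      "wIOmillis")


  def parse_diskstats_line(line):
      """Turn one line of /proc/diskstats into a dictionary of fields."""
      tokens = line.split()
      if len(tokens) < len(DISKSTATS_FIELDS):
          raise ValueError("unexpected diskstats line: %r" % line)
      entry = {}
      # zip() stops at the last known field, so any extra columns
      # appended by newer kernels are ignored.
      for field, value in zip(DISKSTATS_FIELDS, tokens):
          # Every column except the device name is a number.
          entry[field] = value if field == "name" else int(value)
      return entry


  def read_diskstats(path="/proc/diskstats"):
      """Parse every non-empty line of the given diskstats file."""
      with open(path) as stats:
          return [parse_diskstats_line(l) for l in stats if l.strip()]

Since ``name`` is reported for every device, it can serve as one of the keys
for joining this data with that of other storage collectors.
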
394 | 3301805f | Michele Tartara | DRBD status |
395 | 3301805f | Michele Tartara | *********** |
396 | 3301805f | Michele Tartara | |
397 | 3301805f | Michele Tartara | This data collector will run only on nodes where DRBD is actually |
398 | 3301805f | Michele Tartara | present and it will gather information about DRBD devices. |
399 | 3301805f | Michele Tartara | |
400 | 3301805f | Michele Tartara | Its ``kind`` in the report will be ``1`` (`Status reporting collectors`_). |
401 | 3301805f | Michele Tartara | |
402 | 3301805f | Michele Tartara | Its ``category`` field in the report will contain the value ``storage``. |
403 | 3301805f | Michele Tartara | |
404 | 3301805f | Michele Tartara | When executed in verbose mode, the ``data`` section of the report of this |
405 | 3301805f | Michele Tartara | collector will provide the following fields: |
406 | 3301805f | Michele Tartara | |
407 | 3301805f | Michele Tartara | ``versionInfo`` |
408 | 3301805f | Michele Tartara | Information about the DRBD version number, given by a combination of |
409 | 3301805f | Michele Tartara | any (but at least one) of the following fields: |
410 | 3301805f | Michele Tartara | |
411 | 3301805f | Michele Tartara | ``version`` |
412 | 3301805f | Michele Tartara | The DRBD driver version. |
413 | 3301805f | Michele Tartara | |
414 | 3301805f | Michele Tartara | ``api`` |
415 | 3301805f | Michele Tartara | The API version number. |
416 | 3301805f | Michele Tartara | |
417 | 3301805f | Michele Tartara | ``proto`` |
418 | 3301805f | Michele Tartara | The protocol version. |
419 | 3301805f | Michele Tartara | |
420 | 3301805f | Michele Tartara | ``srcversion`` |
421 | 3301805f | Michele Tartara | The version of the source files. |
422 | 3301805f | Michele Tartara | |
423 | 3301805f | Michele Tartara | ``gitHash`` |
424 | 3301805f | Michele Tartara | Git hash of the source files. |
425 | 3301805f | Michele Tartara | |
426 | 3301805f | Michele Tartara | ``buildBy`` |
427 | 3301805f | Michele Tartara | Who built the binary, and, optionally, when. |
428 | 3301805f | Michele Tartara | |
429 | 3301805f | Michele Tartara | ``device`` |
430 | 3301805f | Michele Tartara | A list of structures, each describing a DRBD device (a minor) and containing |
431 | 3301805f | Michele Tartara | the following fields: |
432 | 3301805f | Michele Tartara | |
433 | 3301805f | Michele Tartara | ``minor`` |
434 | 3301805f | Michele Tartara | The device minor number. |
435 | 3301805f | Michele Tartara | |
436 | 3301805f | Michele Tartara | ``connectionState`` |
437 | 3301805f | Michele Tartara | The state of the connection. If it is "Unconfigured", all the following |
438 | 3301805f | Michele Tartara | fields are not present. |
439 | 3301805f | Michele Tartara | |
440 | 3301805f | Michele Tartara | ``localRole`` |
441 | 3301805f | Michele Tartara | The role of the local resource. |
442 | 3301805f | Michele Tartara | |
443 | 3301805f | Michele Tartara | ``remoteRole`` |
444 | 3301805f | Michele Tartara | The role of the remote resource. |
445 | 3301805f | Michele Tartara | |
446 | 3301805f | Michele Tartara | ``localState`` |
447 | 3301805f | Michele Tartara | The status of the local disk. |
448 | 3301805f | Michele Tartara | |
449 | 3301805f | Michele Tartara | ``remoteState`` |
450 | 3301805f | Michele Tartara | The status of the remote disk. |
451 | 3301805f | Michele Tartara | |
452 | 3301805f | Michele Tartara | ``replicationProtocol`` |
453 | 3301805f | Michele Tartara | The replication protocol being used. |
454 | 3301805f | Michele Tartara | |
455 | 3301805f | Michele Tartara | ``ioFlags`` |
456 | 3301805f | Michele Tartara | The input/output flags. |
457 | 3301805f | Michele Tartara | |
458 | 3301805f | Michele Tartara | ``perfIndicators`` |
459 | 3301805f | Michele Tartara | The performance indicators. This field will contain the following |
460 | 3301805f | Michele Tartara | sub-fields: |
461 | 3301805f | Michele Tartara | |
462 | 3301805f | Michele Tartara | ``networkSend`` |
463 | 3301805f | Michele Tartara | KiB of data sent on the network. |
464 | 3301805f | Michele Tartara | |
465 | 3301805f | Michele Tartara | ``networkReceive`` |
466 | 3301805f | Michele Tartara | KiB of data received from the network. |
467 | 3301805f | Michele Tartara | |
468 | 3301805f | Michele Tartara | ``diskWrite`` |
469 | 3301805f | Michele Tartara | KiB of data written on local disk. |
470 | 3301805f | Michele Tartara | |
471 | 3301805f | Michele Tartara | ``diskRead`` |
472 | 3301805f | Michele Tartara | KiB of data read from the local disk. |
473 | 3301805f | Michele Tartara | |
474 | 3301805f | Michele Tartara | ``activityLog`` |
475 | 3301805f | Michele Tartara | Number of updates of the activity log. |
476 | 3301805f | Michele Tartara | |
477 | 3301805f | Michele Tartara | ``bitMap`` |
478 | 3301805f | Michele Tartara | Number of updates to the bitmap area of the metadata. |
479 | 3301805f | Michele Tartara | |
480 | 3301805f | Michele Tartara | ``localCount`` |
481 | 3301805f | Michele Tartara | Number of open requests to the local I/O subsystem. |
482 | 3301805f | Michele Tartara | |
483 | 3301805f | Michele Tartara | ``pending`` |
484 | 3301805f | Michele Tartara | Number of requests sent to the partner but not yet answered. |
485 | 3301805f | Michele Tartara | |
486 | 3301805f | Michele Tartara | ``unacknowledged`` |
487 | 3301805f | Michele Tartara | Number of requests received by the partner but still to be answered. |
488 | 3301805f | Michele Tartara | |
489 | 3301805f | Michele Tartara | ``applicationPending`` |
490 | 3301805f | Michele Tartara | Number of block input/output requests forwarded to DRBD but that have not yet |
491 | 3301805f | Michele Tartara | been answered. |
492 | 3301805f | Michele Tartara | |
493 | 3301805f | Michele Tartara | ``epochs`` |
494 | 3301805f | Michele Tartara | (Optional) Number of epoch objects. Not provided by all DRBD versions. |
495 | 3301805f | Michele Tartara | |
496 | 3301805f | Michele Tartara | ``writeOrder`` |
497 | 3301805f | Michele Tartara | (Optional) Currently used write ordering method. Not provided by all DRBD |
498 | 3301805f | Michele Tartara | versions. |
499 | 3301805f | Michele Tartara | |
500 | 3301805f | Michele Tartara | ``outOfSync`` |
501 | 3301805f | Michele Tartara | (Optional) KiB of storage currently out of sync. Not provided by all DRBD |
502 | 3301805f | Michele Tartara | versions. |
503 | 3301805f | Michele Tartara | |
504 | 3301805f | Michele Tartara | ``syncStatus`` |
505 | 3301805f | Michele Tartara | (Optional) The status of the synchronization of the disk. This is present |
506 | 3301805f | Michele Tartara | only if the disk is being synchronized, and includes the following fields: |
507 | 3301805f | Michele Tartara | |
508 | 3301805f | Michele Tartara | ``percentage`` |
509 | 3301805f | Michele Tartara | The percentage of synchronized data. |
510 | 3301805f | Michele Tartara | |
511 | 3301805f | Michele Tartara | ``progress`` |
512 | 3301805f | Michele Tartara | How far the synchronization is. Written as "x/y", where x and y are |
513 | 3301805f | Michele Tartara | integer numbers expressed in the measurement unit stated in |
514 | 3301805f | Michele Tartara | ``progressUnit``. |
515 | 3301805f | Michele Tartara | |
516 | 3301805f | Michele Tartara | ``progressUnit`` |
517 | 3301805f | Michele Tartara | The measurement unit for the progress indicator. |
518 | 3301805f | Michele Tartara | |
519 | 3301805f | Michele Tartara | ``timeToFinish`` |
520 | 3301805f | Michele Tartara | The expected time before finishing the synchronization. |
521 | 3301805f | Michele Tartara | |
522 | 3301805f | Michele Tartara | ``speed`` |
523 | 3301805f | Michele Tartara | The speed of the synchronization. |
524 | 3301805f | Michele Tartara | |
525 | 3301805f | Michele Tartara | ``want`` |
526 | 3301805f | Michele Tartara | The desired speed of the synchronization. |
527 | 3301805f | Michele Tartara | |
528 | 3301805f | Michele Tartara | ``speedUnit`` |
529 | 3301805f | Michele Tartara | The measurement unit of the ``speed`` and ``want`` values. Expressed |
530 | 3301805f | Michele Tartara | as "size/time". |
531 | 3301805f | Michele Tartara | |
532 | 3301805f | Michele Tartara | ``instance`` |
533 | 3301805f | Michele Tartara | The name of the Ganeti instance this disk is associated with. |
534 | 109e07c2 | Guido Trotter | |
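
As an illustration of how a monitoring system might consume the ``device`` list described above, here is a minimal Python sketch. The field names are the ones listed in this section; the sample values, the state names used in them, and the helper names ``parse_progress`` and ``describe_device`` are assumptions made for the example, not part of the design.

```python
import json

# Hypothetical sample: field names follow the text above, values are invented.
SAMPLE = json.loads("""
{
  "device": [
    {"minor": 0, "connectionState": "Unconfigured"},
    {"minor": 1,
     "connectionState": "SyncSource",
     "localRole": "Primary",
     "remoteRole": "Secondary",
     "localState": "UpToDate",
     "remoteState": "Inconsistent",
     "syncStatus": {"percentage": 42.5,
                    "progress": "425/1000",
                    "progressUnit": "M"},
     "instance": "instance1.example.com"}
  ]
}
""")


def parse_progress(sync):
    """Split the "x/y" progress string into two integers plus the unit."""
    done, total = (int(part) for part in sync["progress"].split("/"))
    return done, total, sync["progressUnit"]


def describe_device(dev):
    """Return a one-line, human-readable summary of one DRBD minor."""
    # An "Unconfigured" device carries none of the fields after connectionState.
    if dev["connectionState"] == "Unconfigured":
        return "minor %d: unconfigured" % dev["minor"]
    line = "minor %d (%s): %s/%s" % (
        dev["minor"], dev.get("instance"), dev["localRole"], dev["remoteRole"])
    # syncStatus is optional: it is present only while the disk is syncing.
    sync = dev.get("syncStatus")
    if sync is not None:
        done, total, unit = parse_progress(sync)
        line += ", syncing %d of %d %s" % (done, total, unit)
    return line
```

Treating ``syncStatus`` (and everything after ``connectionState`` on an unconfigured minor) as optional keeps the consumer robust across DRBD versions that omit some fields.
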
535 | 109e07c2 | Guido Trotter | |
536 | 109e07c2 | Guido Trotter | Ganeti daemons status |
537 | 109e07c2 | Guido Trotter | +++++++++++++++++++++ |
538 | 109e07c2 | Guido Trotter | |
539 | 3301805f | Michele Tartara | Ganeti will report what information it has about its own daemons. |
540 | 3301805f | Michele Tartara | This should allow identifying possible problems with the Ganeti system itself: |
541 | 3301805f | Michele Tartara | for example memory leaks, crashes and high resource utilization should be |
542 | 3301805f | Michele Tartara | evident by analyzing this information. |
543 | 3301805f | Michele Tartara | |
544 | 3301805f | Michele Tartara | The ``kind`` field will be ``1`` (`Status reporting collectors`_). |
545 | 3301805f | Michele Tartara | |
546 | 3301805f | Michele Tartara | Each daemon will have its own data collector, and each of them will have |
547 | 3301805f | Michele Tartara | a ``category`` field valued ``daemon``. |
548 | 3301805f | Michele Tartara | |
549 | 3301805f | Michele Tartara | When executed in verbose mode, their data section will include at least: |
550 | 3301805f | Michele Tartara | |
551 | 3301805f | Michele Tartara | ``memory`` |
552 | 3301805f | Michele Tartara | The amount of used memory. |
553 | 3301805f | Michele Tartara | |
554 | 3301805f | Michele Tartara | ``size_unit`` |
555 | 3301805f | Michele Tartara | The measurement unit used for the memory. |
556 | 109e07c2 | Guido Trotter | |
557 | 3301805f | Michele Tartara | ``uptime`` |
558 | 3301805f | Michele Tartara | The uptime of the daemon. |
559 | 3301805f | Michele Tartara | |
560 | 3301805f | Michele Tartara | ``CPU usage`` |
561 | 3301805f | Michele Tartara | How much CPU the daemon is using (percentage). |
562 | 3301805f | Michele Tartara | |
563 | 3301805f | Michele Tartara | Any other daemon-specific information can be included as well in the ``data`` |
564 | 3301805f | Michele Tartara | section. |
565 | 109e07c2 | Guido Trotter | |
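
A daemon status report with the fields listed above could be assembled roughly as follows. This is only a sketch: the function name ``daemon_report``, the choice of seconds for ``uptime`` and the example values are assumptions, while the ``kind``, ``category`` and ``data`` keys and the four data fields come from the text.

```python
def daemon_report(memory, size_unit, uptime, cpu_usage, extra=None):
    """Build the report object for one daemon (hypothetical helper).

    kind 1 marks a status reporting collector and the category is "daemon",
    as stated in the text. The uptime unit is not specified by the design;
    seconds are assumed here.
    """
    data = {
        "memory": memory,
        "size_unit": size_unit,
        "uptime": uptime,
        "CPU usage": cpu_usage,
    }
    # Daemon-specific information may be added to the data section as well.
    data.update(extra or {})
    return {"kind": 1, "category": "daemon", "data": data}
```

For example, ``daemon_report(120, "MiB", 86400, 1.5, extra={"jobs": 3})`` yields a report whose ``data`` section holds the four common fields plus the hypothetical daemon-specific ``jobs`` entry.
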
566 | 109e07c2 | Guido Trotter | Hypervisor resources report |
567 | 109e07c2 | Guido Trotter | +++++++++++++++++++++++++++ |
568 | 109e07c2 | Guido Trotter | |
569 | 109e07c2 | Guido Trotter | Each hypervisor has a view of system resources that sometimes is |
570 | 109e07c2 | Guido Trotter | different than the one the OS sees (for example in Xen the Node OS, |
571 | 109e07c2 | Guido Trotter | running as Dom0, has access to only part of those resources). In this |
572 | 109e07c2 | Guido Trotter | section we'll report all information we can in a "non hypervisor |
573 | 109e07c2 | Guido Trotter | specific" way. Each hypervisor can then add extra specific information |
574 | 109e07c2 | Guido Trotter | that is not generic enough to be abstracted. |
575 | 109e07c2 | Guido Trotter | |
576 | 3301805f | Michele Tartara | The ``kind`` field will be ``0`` (`Performance reporting collectors`_). |
577 | 3301805f | Michele Tartara | |
578 | 3301805f | Michele Tartara | Each of the hypervisor data collectors will have ``category`` set to ``hypervisor``. |
579 | 3301805f | Michele Tartara | |
580 | 109e07c2 | Guido Trotter | Node OS resources report |
581 | 109e07c2 | Guido Trotter | ++++++++++++++++++++++++ |
582 | 109e07c2 | Guido Trotter | |
583 | 109e07c2 | Guido Trotter | Since Ganeti assumes it's running on Linux, it's useful to export some |
584 | 3301805f | Michele Tartara | basic information as seen by the host system. |
585 | 109e07c2 | Guido Trotter | |
586 | 3301805f | Michele Tartara | The ``category`` field of the report will be ``null``. |
587 | 109e07c2 | Guido Trotter | |
588 | 3301805f | Michele Tartara | The ``kind`` field will be ``0`` (`Performance reporting collectors`_). |
589 | 109e07c2 | Guido Trotter | |
590 | 3301805f | Michele Tartara | The ``data`` section will include: |
591 | 109e07c2 | Guido Trotter | |
592 | 3301805f | Michele Tartara | ``cpu_number`` |
593 | 3301805f | Michele Tartara | The number of available CPUs. |
594 | 109e07c2 | Guido Trotter | |
595 | 3301805f | Michele Tartara | ``cpus`` |
596 | 3301805f | Michele Tartara | A list with one element per CPU, showing its average load. |
597 | 109e07c2 | Guido Trotter | |
598 | 3301805f | Michele Tartara | ``memory`` |
599 | 3301805f | Michele Tartara | The current view of memory (free, used, cached, etc.) |
600 | 109e07c2 | Guido Trotter | |
601 | 3301805f | Michele Tartara | ``filesystem`` |
602 | 3301805f | Michele Tartara | A list with one element per filesystem, showing a summary of the |
603 | 3301805f | Michele Tartara | total/available space. |
604 | 109e07c2 | Guido Trotter | |
605 | 3301805f | Michele Tartara | ``NICs`` |
606 | 3301805f | Michele Tartara | A list with one element per network interface, showing the amount of |
607 | 3301805f | Michele Tartara | sent/received data, error rate, IP address of the interface, etc. |
608 | 109e07c2 | Guido Trotter | |
609 | 3301805f | Michele Tartara | ``versions`` |
610 | 3301805f | Michele Tartara | A map using the name of a component Ganeti interacts with (Linux, drbd, |
611 | 3301805f | Michele Tartara | hypervisor, etc.) as the key and its version number as the value. |
612 | 109e07c2 | Guido Trotter | |
613 | 3301805f | Michele Tartara | Note that we won't go into any hardware specific details (e.g. querying a |
614 | 3301805f | Michele Tartara | node RAID is outside the scope of this, and can be implemented as a |
615 | 3301805f | Michele Tartara | plugin) but we can easily just report the information above, since it's |
616 | 3301805f | Michele Tartara | standard enough across all systems. |
617 | 9ef3e121 | Michele Tartara | |
618 | b166dcfc | Michele Tartara | Format of the query |
619 | b166dcfc | Michele Tartara | ------------------- |
620 | b166dcfc | Michele Tartara | |
621 | 431ff2c1 | Michele Tartara | .. include:: monitoring-query-format.rst |
622 | b166dcfc | Michele Tartara | |
623 | 3301805f | Michele Tartara | Instance disk status propagation |
624 | 3301805f | Michele Tartara | -------------------------------- |
625 | 9ef3e121 | Michele Tartara | |
626 | 3301805f | Michele Tartara | As with the instance status, Ganeti currently has only partial information about |
627 | 3301805f | Michele Tartara | its instance disks: in particular each node is unaware of the disk to |
628 | 3301805f | Michele Tartara | instance mapping, which exists only on the master. |
629 | 9ef3e121 | Michele Tartara | |
630 | 3301805f | Michele Tartara | For this design doc we plan to fix this by changing all RPCs that create |
631 | 3301805f | Michele Tartara | a backend storage or that put an already existing one in use, so that they pass |
632 | 3301805f | Michele Tartara | the relevant instance to the node. The node can then export this mapping to the |
633 | 3301805f | Michele Tartara | status reporting tool. |
634 | 9ef3e121 | Michele Tartara | |
635 | 3301805f | Michele Tartara | Until these RPC changes are implemented, we'll use Confd to |
636 | 3301805f | Michele Tartara | fetch this information in the data collectors. |
637 | 9ef3e121 | Michele Tartara | |
638 | 3301805f | Michele Tartara | Plugin system |
639 | 3301805f | Michele Tartara | ------------- |
640 | 9ef3e121 | Michele Tartara | |
641 | 3301805f | Michele Tartara | The monitoring system will be equipped with a plugin system through which |
642 | 3301805f | Michele Tartara | specific local information can be exported. |
643 | 9ef3e121 | Michele Tartara | |
644 | 3301805f | Michele Tartara | The plugin system is expected to be used by local installations to |
645 | 3301805f | Michele Tartara | export any installation specific information that they want to be |
646 | 3301805f | Michele Tartara | monitored, about either hardware or software on their systems. |
647 | 9ef3e121 | Michele Tartara | |
648 | 3301805f | Michele Tartara | Plugins will take the form of either scripts or binaries whose output |
649 | 3301805f | Michele Tartara | will be inserted in the report. |
650 | 109e07c2 | Guido Trotter | |
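
The script-or-binary plugin model described above can be sketched in Python as follows. How plugin output is placed in the report is not defined by this design; keying it by a plugin name, as well as the names ``run_plugin``, ``collect_plugins`` and ``EXAMPLE_PLUGINS``, are assumptions made purely for illustration.

```python
import subprocess
import sys


def run_plugin(argv, timeout=10):
    """Run one plugin executable and return whatever it wrote to stdout."""
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout, check=True)
    return result.stdout


def collect_plugins(plugins):
    """Run every plugin and key its output by name (hypothetical layout)."""
    return {name: run_plugin(argv) for name, argv in plugins.items()}


# Stand-in for a site-specific script: a one-line Python program.
EXAMPLE_PLUGINS = {
    "hello": [sys.executable, "-c", "print('raid: ok')"],
}
```

With ``check=True`` a plugin that exits with a non-zero status raises ``subprocess.CalledProcessError``, so a failing plugin is noticed rather than silently contributing an empty section.
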
651 | 3301805f | Michele Tartara | Eventually, support for other kinds of plugins might be added as well, such as |
652 | 3301805f | Michele Tartara | plain text files which will be inserted into the report, or local unix or |
653 | 3301805f | Michele Tartara | network sockets from which the information has to be read. This should allow |
654 | 3301805f | Michele Tartara | the most flexibility for implementing an efficient system, while being able to keep |
655 | 3301805f | Michele Tartara | it as simple as possible. |
656 | 109e07c2 | Guido Trotter | |
657 | 109e07c2 | Guido Trotter | Data collectors |
658 | 109e07c2 | Guido Trotter | --------------- |
659 | 109e07c2 | Guido Trotter | |
660 | 109e07c2 | Guido Trotter | In order to ease testing as well as to make it simple to reuse this |
661 | 109e07c2 | Guido Trotter | subsystem it will be possible to run just the "data collectors" on each |
662 | 3301805f | Michele Tartara | node without passing through the agent daemon. |
663 | 109e07c2 | Guido Trotter | |
664 | 9ef3e121 | Michele Tartara | If a data collector is run independently, it should print on stdout its |
665 | 9ef3e121 | Michele Tartara | report, according to the format corresponding to a single data collector |
666 | 3301805f | Michele Tartara | report object, as described in the previous paragraphs. |
667 | 109e07c2 | Guido Trotter | |
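
A stand-alone run of a data collector, as described above, amounts to serializing a single report object to stdout. The sketch below assumes JSON output (the encoding used by the agent) and shows only the ``kind``, ``category`` and ``data`` keys mentioned in this section; the names ``run_standalone``, ``render`` and ``example_collector`` and the sample value are hypothetical.

```python
import io
import json
import sys


def example_collector():
    """Stand-in collector returning a report with an invented data section."""
    return {"kind": 0, "category": None, "data": {"cpu_number": 4}}


def run_standalone(collect, out=sys.stdout):
    """Write the collector's report as one JSON document followed by a newline."""
    json.dump(collect(), out, sort_keys=True)
    out.write("\n")


def render(collect):
    """Capture the stand-alone output in a string, which is handy in tests."""
    buf = io.StringIO()
    run_standalone(collect, out=buf)
    return buf.getvalue()
```

Calling ``run_standalone(example_collector)`` prints the report to stdout, while ``render`` makes the same output easy to check without going through the agent daemon.
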
668 | 109e07c2 | Guido Trotter | Mode of operation |
669 | 109e07c2 | Guido Trotter | ----------------- |
670 | 109e07c2 | Guido Trotter | |
671 | 109e07c2 | Guido Trotter | In order to be able to report information fast the monitoring agent |
672 | 109e07c2 | Guido Trotter | daemon will keep an in-memory or on-disk cache of the status, which will |
673 | 109e07c2 | Guido Trotter | be returned when queries are made. The status system will then |
674 | 109e07c2 | Guido Trotter | periodically check resources to make sure the status is up to date. |
675 | 109e07c2 | Guido Trotter | |
676 | 109e07c2 | Guido Trotter | Different parts of the report will be queried at different speeds. These |
677 | 109e07c2 | Guido Trotter | will depend on: |
678 | 109e07c2 | Guido Trotter | - how often they vary (or we expect them to vary) |
679 | 109e07c2 | Guido Trotter | - how fast they are to query |
680 | 109e07c2 | Guido Trotter | - how important their freshness is |
681 | 109e07c2 | Guido Trotter | |
682 | 109e07c2 | Guido Trotter | Of course the last parameter is installation specific, and while we'll |
683 | 109e07c2 | Guido Trotter | try to have defaults, it will be configurable. The first two instead we |
684 | 109e07c2 | Guido Trotter | can use adaptively to query a certain resource faster or slower |
685 | 109e07c2 | Guido Trotter | depending on those two parameters. |
686 | 109e07c2 | Guido Trotter | |
687 | 3301805f | Michele Tartara | When run as stand-alone binaries, the data collectors will not use any |
688 | 3301805f | Michele Tartara | caching system, and just fetch and return the data immediately. |
689 | 109e07c2 | Guido Trotter | |
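
The caching behaviour described above can be sketched as a small per-collector cache with its own refresh interval. This is an illustrative in-memory variant only: the class name ``CachedCollector``, its methods and the injectable clock are assumptions, and the adaptive tuning of the interval is left out.

```python
import time


class CachedCollector:
    """Serve the last collected value and refresh it at a fixed interval."""

    def __init__(self, collect, interval, clock=time.monotonic):
        self._collect = collect      # callable that produces a fresh report
        self.interval = interval     # seconds between refreshes
        self._clock = clock          # injectable so tests can control time
        self._value = None
        self._fetched_at = None

    def refresh_if_due(self):
        """Re-run the collector if the cached value is missing or too old."""
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self.interval:
            self._value = self._collect()
            self._fetched_at = now
            return True
        return False

    def query(self):
        """Return the cached value without touching the underlying resource."""
        return self._value
```

A daemon loop would call ``refresh_if_due`` periodically for each collector, while HTTP queries only ever call ``query``, so answering a request never waits on a slow resource.
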
690 | 109e07c2 | Guido Trotter | Implementation place |
691 | 109e07c2 | Guido Trotter | -------------------- |
692 | 109e07c2 | Guido Trotter | |
693 | 109e07c2 | Guido Trotter | The status daemon will be implemented as a standalone Haskell daemon. In |
694 | 109e07c2 | Guido Trotter | the future it should be easy to merge multiple daemons into one with |
695 | 109e07c2 | Guido Trotter | multiple entry points, should we find out it saves resources and doesn't |
696 | 109e07c2 | Guido Trotter | impact functionality. |
697 | 109e07c2 | Guido Trotter | |
698 | 109e07c2 | Guido Trotter | The libekg library should be looked at for easily providing metrics in |
699 | 109e07c2 | Guido Trotter | JSON format. |
700 | 109e07c2 | Guido Trotter | |
701 | 109e07c2 | Guido Trotter | Implementation order |
702 | 109e07c2 | Guido Trotter | -------------------- |
703 | 109e07c2 | Guido Trotter | |
704 | 109e07c2 | Guido Trotter | We will implement the agent system in this order: |
705 | 109e07c2 | Guido Trotter | |
706 | 3301805f | Michele Tartara | - initial example data collectors (e.g. for drbd and instance status). |
707 | 3301805f | Michele Tartara | - initial daemon for exporting data, integrating the existing collectors |
708 | 3301805f | Michele Tartara | - plugin system |
709 | 109e07c2 | Guido Trotter | - RPC updates for instance status reasons and disk to instance mapping |
710 | 3301805f | Michele Tartara | - cache layer for the daemon |
711 | 109e07c2 | Guido Trotter | - more data collectors |
712 | 109e07c2 | Guido Trotter | |
713 | 109e07c2 | Guido Trotter | |
714 | 109e07c2 | Guido Trotter | Future work |
715 | 109e07c2 | Guido Trotter | =========== |
716 | 109e07c2 | Guido Trotter | |
717 | 109e07c2 | Guido Trotter | As a future step it can be useful to "centralize" all this reporting |
718 | 109e07c2 | Guido Trotter | data in a single place. This for example can be just the master node, or |
719 | 109e07c2 | Guido Trotter | all the master candidates. We will evaluate doing this after the first |
720 | 109e07c2 | Guido Trotter | node-local version has been developed and tested. |
721 | 109e07c2 | Guido Trotter | |
722 | 109e07c2 | Guido Trotter | Another possible change is replacing the "read-only" RPCs with queries |
723 | 109e07c2 | Guido Trotter | to the agent system, thus having only one way of collecting information |
724 | 109e07c2 | Guido Trotter | from the nodes, both for a monitoring system and for Ganeti itself. |
725 | 109e07c2 | Guido Trotter | |
726 | 109e07c2 | Guido Trotter | One extra feature we may need is a way to query for only sub-parts of |
727 | 109e07c2 | Guido Trotter | the report (e.g. instances status only). This can be done by passing |
728 | 109e07c2 | Guido Trotter | arguments to the HTTP GET, which will be defined when we get to this |
729 | 109e07c2 | Guido Trotter | functionality. |
730 | 109e07c2 | Guido Trotter | |
731 | 109e07c2 | Guido Trotter | Finally, the autorepair system (see the :doc:`autorepair system design <design-autorepair>`) |
732 | 109e07c2 | Guido Trotter | can be expanded to use the monitoring agent system as a |
733 | 109e07c2 | Guido Trotter | source of information to decide which repairs it can perform. |
734 | 109e07c2 | Guido Trotter | |
735 | 109e07c2 | Guido Trotter | .. vim: set textwidth=72 : |
736 | 109e07c2 | Guido Trotter | .. Local Variables: |
737 | 109e07c2 | Guido Trotter | .. mode: rst |
738 | 109e07c2 | Guido Trotter | .. fill-column: 72 |
739 | 109e07c2 | Guido Trotter | .. End: |