Revision 99b67c35 doc/design-monitoring-agent.rst
- Ganeti daemons status, CPU usage, memory footprint
- Hypervisor resources report (memory, CPU, network interfaces)
- Node OS resources report (memory, CPU, network interfaces)
- Node OS CPU load average report
- Information from a plugin system

Format of the report
...
plugin) but we can easily just report the information above, since it's
standard enough across all systems.

Node OS CPU load average report
+++++++++++++++++++++++++++++++

This data collector will export CPU load statistics as seen by the host
system. Apart from being consumed by an external monitoring system, the
data can also be used to improve instance allocation and/or the Ganeti
cluster balance. To compute the CPU load average we will use a number of
values collected inside a time window. The collection process will be
done by an independent thread (see `Mode of Operation`_).

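The design does not say how the host's CPU statistics are read. As a minimal sketch, assume a Linux host whose collection function reads the per-CPU time counters in ``/proc/stat``. Those counters are cumulative since boot, so a single sample says nothing about load on its own; this is why the load has to be derived from several values taken inside a time window. The function name ``parse_proc_stat`` is hypothetical, and counting ``iowait`` as idle time is an assumption.

```python
from typing import List, Tuple


def parse_proc_stat(text: str) -> List[Tuple[int, int]]:
    """Return cumulative (busy, total) jiffies for each CPU.

    Hypothetical collection function for a Linux host: each "cpuN" line of
    /proc/stat starts with user, nice, system, idle, iowait, irq, softirq
    and steal times. Guest times follow, but they are already counted in
    user and nice, so they are skipped.
    """
    snapshot = []
    for line in text.splitlines():
        fields = line.split()
        # Keep only per-CPU lines ("cpu0", "cpu1", ...), not the "cpu" total.
        if len(fields) < 6 or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        counters = [int(v) for v in fields[1:9]]
        total = sum(counters)
        idle = counters[3] + counters[4]  # idle + iowait counted as not busy
        snapshot.append((total - idle, total))
    return snapshot


# On a Linux host the input would come from the real file:
#   with open("/proc/stat") as f:
#       sample = parse_proc_stat(f.read())
print(parse_proc_stat("cpu  10 0 10 80 0 0 0 0\ncpu0 5 0 5 40 0 0 0 0\ncpu1 5 0 5 40 0 0 0 0\n"))
# [(10, 50), (10, 50)]
```
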
This report is a subset of the previous report (`Node OS resources
report`_), and the two might eventually be merged once reporting for the
other fields (memory, filesystem, NICs) is implemented too.

Specifically:

The ``category`` field of the report will be ``null``.

The ``kind`` field will be ``0`` (`Performance reporting collectors`_).

The ``data`` section will include:

``cpu_number``
  The number of available CPUs.

``cpus``
  A list with one element per CPU, showing its average load.

``cpu_total``
  The total CPU load average, computed as the sum of the averages of all
  individual CPUs.

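To illustrate the fields listed above, the ``category``, ``kind`` and ``data`` parts of the report could be assembled as below. This is a minimal sketch: ``build_cpu_load_report`` is a hypothetical helper, the per-CPU loads are assumed to be fractions of busy time between 0 and 1 (the design does not fix a unit), and the common report fields described in `Format of the report`_ are left out.

```python
import json
from typing import Any, Dict, List


def build_cpu_load_report(per_cpu_loads: List[float]) -> Dict[str, Any]:
    """Build the CPU load specific report fields (hypothetical helper).

    The fields shared by all data collectors are not included here.
    """
    return {
        "category": None,  # serialized as JSON null
        "kind": 0,         # performance reporting collector
        "data": {
            "cpu_number": len(per_cpu_loads),
            "cpus": list(per_cpu_loads),
            "cpu_total": sum(per_cpu_loads),  # sum of the per-CPU averages
        },
    }


# Example: two CPUs, 25% and 50% busy on average.
print(json.dumps(build_cpu_load_report([0.25, 0.5])))
# {"category": null, "kind": 0, "data": {"cpu_number": 2, "cpus": [0.25, 0.5], "cpu_total": 0.75}}
```
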
The CPU load report function will take N values collected by the CPU
load collection function and calculate the above averages. Please see
the section `Mode of Operation`_ for more information on how the two
functions of the data collector interact.

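The averaging step can be sketched as follows, assuming (as in a Linux ``/proc/stat`` based collector) that each of the N values is a snapshot of cumulative (busy, total) time counters per CPU. The function name ``average_cpu_loads`` and the snapshot layout are hypothetical; the design only states that N values are turned into the ``cpus`` and ``cpu_total`` averages.

```python
from typing import List, Sequence, Tuple

# One snapshot: cumulative (busy, total) counters for each CPU, in CPU order.
Snapshot = Sequence[Tuple[int, int]]


def average_cpu_loads(snapshots: Sequence[Snapshot]) -> List[float]:
    """Average load of each CPU over the window covered by the snapshots."""
    if len(snapshots) < 2:
        raise ValueError("at least two snapshots are needed")
    first, last = snapshots[0], snapshots[-1]
    loads = []
    for (busy0, total0), (busy1, total1) in zip(first, last):
        elapsed = total1 - total0
        # Fraction of the elapsed time the CPU was busy (0.0 if no time passed).
        loads.append((busy1 - busy0) / elapsed if elapsed > 0 else 0.0)
    return loads


# Three snapshots of two CPUs, oldest first.
window = [[(10, 50), (0, 50)], [(30, 100), (10, 100)], [(60, 150), (20, 150)]]
cpus = average_cpu_loads(window)
print(cpus)  # [0.5, 0.2]
```

Because the counters are cumulative, the busy-time delta divided by the total-time delta across the whole window is the time-weighted mean of the per-interval loads, so only the oldest and newest values are needed. Averaging the individual per-interval loads instead would give every interval the same weight regardless of its length; both are plausible readings of "average" here.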
Format of the query
-------------------

...

When run as stand-alone binaries, the data collectors will not use any
caching system, and will just fetch and return the data immediately.

Since some performance collectors have to operate on a number of values
collected at earlier points in time, we need a mechanism, independent of
the data collectors, which will trigger the collection of those values
and also store them, so that they are available to the data collectors
for their calculations.

To collect data periodically, the monitoring agent will create a thread
which runs the collection function of every data collector that provides
one. The values returned by the collection function of each data
collector will be saved in an appropriate map, associating each value
with the corresponding collector and using the collector's name as the
key of the map. This map will be stored in mond's memory.

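The collection thread and its map can be sketched in Python as follows. This is a minimal sketch, not mond's actual implementation: the class ``CollectorThread``, its parameters, the fixed-size ``deque`` kept for each collector and the lock guarding the map are illustrative assumptions. ``collectors`` maps each collector's name to its collection function, and ``interval`` is the collection period in seconds.

```python
import itertools
import threading
import time
from collections import deque
from typing import Callable, Deque, Dict, List


class CollectorThread(threading.Thread):
    """Hypothetical sketch of the monitoring agent's collection thread.

    Runs every registered collection function, stores each result in a
    map keyed by collector name, then sleeps ``interval`` seconds.
    """

    def __init__(self, collectors: Dict[str, Callable[[], object]],
                 interval: float, max_values: int) -> None:
        super().__init__(daemon=True)
        self.collectors = collectors
        self.interval = interval
        # Bounded buffers: only the most recent max_values are kept.
        self.values: Dict[str, Deque[object]] = {
            name: deque(maxlen=max_values) for name in collectors}
        self.lock = threading.Lock()  # report functions read concurrently
        self._stop_event = threading.Event()

    def run(self) -> None:
        while not self._stop_event.is_set():
            for name, collect in self.collectors.items():
                value = collect()
                with self.lock:
                    self.values[name].append(value)
            # Returns early as soon as stop() is called.
            self._stop_event.wait(self.interval)

    def stop(self) -> None:
        self._stop_event.set()

    def last_values(self, name: str, n: int) -> List[object]:
        """Return (up to) the last n values stored for collector ``name``."""
        if n <= 0:
            return []
        with self.lock:
            return list(self.values[name])[-n:]


if __name__ == "__main__":
    counter = itertools.count()
    # "cpu-load" and the counter are stand-ins for a real collection function.
    agent = CollectorThread({"cpu-load": lambda: next(counter)},
                            interval=0.01, max_values=5)
    agent.start()
    time.sleep(0.1)
    agent.stop()
    agent.join()
    print(agent.last_values("cpu-load", 3))
```

Bounding each buffer keeps the memory used by the map constant; its length only needs to cover the largest N that any report function asks for. Note that the time between two collection rounds is ``interval`` plus the time spent in the collection functions.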
For example, the collection function of the CPU load collector will
collect a CPU load value and save it in the map mentioned above. The
collection function will be called by the collector thread every *t*
milliseconds. When the report function of the collector is called, it
will process the last N values stored in the map for that collector and
calculate the corresponding average.

Implementation place
--------------------
