Revision 99b67c35

b/doc/design-monitoring-agent.rst
46 46
- Ganeti daemons status, CPU usage, memory footprint
47 47
- Hypervisor resources report (memory, CPU, network interfaces)
48 48
- Node OS resources report (memory, CPU, network interfaces)
49
- Node OS CPU load average report
49 50
- Information from a plugin system
50 51

  
51 52
Format of the report
......
692 693
plugin) but we can easily just report the information above, since it's
693 694
standard enough across all systems.
694 695

  
696
Node OS CPU load average report
697
+++++++++++++++++++++++++++++++
698

  
699
This data collector will export CPU load statistics as seen by the host
700
system. Besides feeding an external monitoring system, the data can
701
also be used to improve instance allocation and/or the Ganeti
702
cluster balance. To compute the CPU load average we will use a number of
703
values collected inside a time window. The collection process will be
704
done by an independent thread (see `Mode of Operation`_).
705

  
706
This report is a subset of the previous report (`Node OS resources
707
report`_) and they might eventually get merged, once reporting for the
708
other fields (memory, filesystem, NICs) gets implemented too.
709

  
710
Specifically:
711

  
712
The ``category`` field of the report will be ``null``.
713

  
714
The ``kind`` field will be ``0`` (`Performance reporting collectors`_).
715

  
716
The ``data`` section will include:
717

  
718
``cpu_number``
719
  The number of available cpus.
720

  
721
``cpus``
722
  A list with one element per cpu, showing its average load.
723

  
724
``cpu_total``
725
  The total CPU load average, as the sum of the averages of all separate cpus.
726

  
727
The CPU load report function will get N values, collected by the
728
CPU load collection function and calculate the above averages. Please
729
see the section `Mode of Operation`_ for more information on how the
730
two functions of the data collector interact.
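
As an illustration only, the report function described above could be
sketched as follows. This assumes that each collected sample is a list with
one load value per cpu and that the average is a plain arithmetic mean over
the last N samples; the function name ``cpu_load_report`` and the sample
layout are hypothetical and not taken from the Ganeti code base.

```python
from typing import Dict, List, Sequence, Union

ReportData = Dict[str, Union[int, float, List[float]]]


def cpu_load_report(samples: Sequence[Sequence[float]], n: int) -> ReportData:
    """Build the ``data`` section from the last ``n`` collected samples.

    Assumption: every sample holds one load value per cpu, in the same order.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    window = list(samples)[-n:]
    if not window:
        raise ValueError("no samples collected yet")

    cpu_number = len(window[0])
    if any(len(sample) != cpu_number for sample in window):
        raise ValueError("samples disagree on the number of cpus")

    # Per-cpu arithmetic mean over the window.
    cpus = [
        sum(sample[i] for sample in window) / len(window)
        for i in range(cpu_number)
    ]
    return {
        "cpu_number": cpu_number,
        "cpus": cpus,
        # Total load is the sum of the per-cpu averages.
        "cpu_total": sum(cpus),
    }
```

If fewer than N samples are available, this sketch simply averages over the
samples it has; a real collector might prefer to report an error instead.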
731

  
695 732
Format of the query
696 733
-------------------
697 734

  
......
764 801
When run as stand-alone binaries, the data collector will not use any
765 802
caching system, and just fetch and return the data immediately.
766 803

  
804
Since some performance collectors have to operate on a number of values
805
collected at earlier points in time, we need a mechanism independent of the data
806
collector which will trigger the collection of those values and also
807
store them, so that they are available for calculation by the data
808
collectors.
809

  
810
To collect data periodically, a thread will be created by the monitoring
811
agent which will run the collection function of every data collector
812
that provides one. The values returned by the collection function of
813
the data collector will be saved in an appropriate map, associating each
814
value to the corresponding collector, using the collector's name as the
815
key of the map. This map will be stored in mond's memory.
816

  
817
For example: the collection function of the CPU load collector will
818
collect a CPU load value and save it in the map mentioned above. The
819
collection function will be called by the collector thread every t
820
milliseconds. When the report function of the collector is called, it
821
will process the last N values stored in the map for that collector and calculate the
822
corresponding average.
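
The interaction between the collector thread and the map can be sketched
roughly as below. This is a minimal illustration, not the mond
implementation: the class ``CollectorThread``, its method names, the
bounded history size ``max_values`` and the collector name used in the usage
note are all assumptions made for the example.

```python
import threading
from collections import deque
from typing import Callable, Deque, Dict, List


class CollectorThread:
    """Periodically runs collection functions and stores their values.

    Values are kept in an in-memory map keyed by collector name.
    """

    def __init__(self, collect_fns: Dict[str, Callable[[], object]],
                 interval_ms: int, max_values: int) -> None:
        self._collect_fns = collect_fns
        self._interval = interval_ms / 1000.0
        self._lock = threading.Lock()
        # One bounded history per collector; old values are dropped.
        self._values: Dict[str, Deque[object]] = {
            name: deque(maxlen=max_values) for name in collect_fns
        }
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def collect_once(self) -> None:
        """Call every collection function once and store the results."""
        for name, collect in self._collect_fns.items():
            value = collect()
            with self._lock:
                self._values[name].append(value)

    def _run(self) -> None:
        # Collect, then sleep for the interval or until asked to stop.
        while not self._stop_event.is_set():
            self.collect_once()
            self._stop_event.wait(self._interval)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop_event.set()
        self._thread.join()

    def last_values(self, name: str, n: int) -> List[object]:
        """Return up to the last ``n`` values stored for a collector."""
        with self._lock:
            values = list(self._values[name])
        return values[-n:] if n > 0 else []
```

A report function would then call something like
``agent.last_values("cpu-load", N)`` and compute its average from the result.
The lock is there because the report function and the collector thread may
touch the map at the same time.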
823

  
767 824
Implementation place
768 825
--------------------
769 826

  
