Revision e32e7886 doc/design-2.3.rst
b/doc/design-2.3.rst | ||
---|---|---|
12 | 12 |
- core changes, which affect the master daemon/job queue/locking or |
13 | 13 |
all/most logical units |
14 | 14 |
- logical unit/feature changes |
15 |
- external interface changes (e.g. command line, os api, hooks, ...)
|
|
15 |
- external interface changes (e.g. command line, OS API, hooks, ...)
|
|
16 | 16 |
|
17 | 17 |
Core changes |
18 | 18 |
============ |
... | ... | |
55 | 55 |
gnt-node group-del <group> # delete an empty group |
56 | 56 |
gnt-node group-list # list node groups |
57 | 57 |
gnt-node group-rename <oldname> <newname> # rename a group |
58 |
gnt-node list/info -g <group> # list only nodes belongin to a group |
|
58 |
gnt-node list/info -g <group> # list only nodes belonging to a group
|
|
59 | 59 |
gnt-node add -g <group> # add a node to a certain group |
60 | 60 |
gnt-node modify -g <group> # move a node to a new group |
61 | 61 |
|
... | ... | |
69 | 69 |
|
70 | 70 |
- The cluster will have a default group, which will initially be |
71 | 71 |
- Instance allocation will happen to the cluster's default group |
72 |
(which will be changable via gnt-cluster modify or RAPI) unless a
|
|
73 |
group is explicitely specified in the creation job (with -g or via
|
|
72 |
(which will be changeable via ``gnt-cluster modify`` or RAPI) unless
|
|
73 |
a group is explicitly specified in the creation job (with -g or via
|
|
74 | 74 |
RAPI). Iallocator will be only passed the nodes belonging to that |
75 | 75 |
group. |
76 | 76 |
- Moving an instance between groups can only happen via an explicit |
... | ... | |
119 | 119 |
Other work and future changes |
120 | 120 |
+++++++++++++++++++++++++++++ |
121 | 121 |
|
122 |
Commands like gnt-cluster command/copyfile will continue to work on the
|
|
123 |
whole cluster, but it will be possible to target one group only by
|
|
124 |
specifying it. |
|
122 |
Commands like ``gnt-cluster command``/``gnt-cluster copyfile`` will
|
|
123 |
continue to work on the whole cluster, but it will be possible to target
|
|
124 |
one group only by specifying it.
|
|
125 | 125 |
|
126 | 126 |
Commands which allow selection of sets of resources (for example |
127 |
gnt-instance start/stop) will be able to select them by node group as
|
|
128 |
well. |
|
127 |
``gnt-instance start``/``gnt-instance stop``) will be able to select
|
|
128 |
them by node group as well.
|
|
129 | 129 |
|
130 | 130 |
Initially node groups won't be taggable objects, to simplify the first |
131 | 131 |
implementation, but we expect this to be easy to add in a future version |
... | ... | |
139 | 139 |
won't implement this in the first version, but we'll evaluate it for the |
140 | 140 |
future, if we see scalability problems on big multi-group clusters. |
141 | 141 |
|
142 |
When Ganeti will support more storage models (eg. SANs, sheepdog, ceph)
|
|
142 |
When Ganeti will support more storage models (e.g. SANs, Sheepdog, Ceph)
|
|
143 | 143 |
we expect groups to be the basis for this, allowing for example a |
144 |
different sheepdog/ceph cluster, or a different SAN to be connected to
|
|
144 |
different Sheepdog/Ceph cluster, or a different SAN to be connected to
|
|
145 | 145 |
each group. In some cases this will mean that inter-group move operation |
146 | 146 |
will be necessarily performed with instance downtime, unless the |
147 | 147 |
hypervisor has block-migrate functionality, and we implement support for |
... | ... | |
176 | 176 |
cluster state. While this is still acceptable for smaller clusters where |
177 | 177 |
a small number of allocations/removal are presumed to occur between two |
178 | 178 |
periodic capacity calculations, on bigger clusters where we aim to |
179 |
parallelise heavily between node groups this is no longer true.
|
|
179 |
parallelize heavily between node groups this is no longer true.
|
|
180 | 180 |
|
181 | 181 |
|
182 | 182 |
|
... | ... | |
238 | 238 |
node will not invalidate the capacity, as we're more interested in “at |
239 | 239 |
least available” correctness, not “at most available”. |
240 | 240 |
|
241 |
Cache invalidations
|
|
242 |
+++++++++++++++++++
|
|
241 |
Cache invalidation |
|
242 |
++++++++++++++++++ |
|
243 | 243 |
|
244 | 244 |
If a partial node query is done (e.g. just for the node free space), and |
245 | 245 |
the returned values don't match with the cache, then the entire node |
... | ... | |
540 | 540 |
A job's priority can never go below -20. If a job hits priority -20, it |
541 | 541 |
must acquire its locks in blocking mode. |
542 | 542 |
|
543 |
Opcode priorities are synchronized to disk in order to be restored after
|
|
543 |
Opcode priorities are synchronised to disk in order to be restored after
|
|
544 | 544 |
a restart or crash of the master daemon. |
545 | 545 |
|
546 | 546 |
Priorities also need to be considered inside the locking library to |
... | ... | |
671 | 671 |
netutils: Utilities for handling common network tasks |
672 | 672 |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
673 | 673 |
|
674 |
Currently common util functions are kept in the utils modules. Since
|
|
675 |
this module grows bigger and bigger network-related functions are moved
|
|
676 |
to a separate module named *netutils*. Additionally all these utilities
|
|
677 |
will be IPv6-enabled. |
|
674 |
Currently common utility functions are kept in the ``utils`` module.
|
|
675 |
Since this module grows bigger and bigger network-related functions are
|
|
676 |
moved to a separate module named *netutils*. Additionally all these
|
|
677 |
utilities will be IPv6-enabled.
|
|
678 | 678 |
|
679 | 679 |
Cluster initialization |
680 | 680 |
~~~~~~~~~~~~~~~~~~~~~~ |
... | ... | |
726 | 726 |
Privilege Separation |
727 | 727 |
-------------------- |
728 | 728 |
|
729 |
Current state and short comings |
|
730 |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
731 |
|
|
732 |
As of Ganeti 2.2 we introduced privilege separation. This was affecting |
|
733 |
just Ganeti RAPI and also that just in a quickly short term solution. In |
|
734 |
this release we iterate again over it and make it more advanced and |
|
735 |
stable. This also means we'll remove the privilege separation again from |
|
736 |
the core and put it completely external so the daemons will be started |
|
737 |
on the final user already. |
|
729 |
Current state and shortcomings |
|
730 |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
738 | 731 |
|
739 |
Additionally this involves removing SSH code out auf bootstrap and core |
|
740 |
component and put it into a separate script. This means every |
|
741 |
daemon/script will assume that a working ssh setup is in place. |
|
732 |
In Ganeti 2.2 we introduced privilege separation for the RAPI daemon. |
|
733 |
This was done directly in the daemon's code in the process of |
|
734 |
daemonizing itself. Doing so leads to several potential issues. For |
|
735 |
example, a file could be opened while the code is still running as |
|
736 |
``root`` and for some reason not be closed again. Even after changing |
|
737 |
the user ID, the file descriptor can be written to. |
|
742 | 738 |
|
743 | 739 |
Implementation |
744 | 740 |
~~~~~~~~~~~~~~ |
745 | 741 |
|
746 |
We need to partially revert changes done in Ganeti 2.2 to move on the |
|
747 |
long term solution. This involves removing the drop privileges code in |
|
748 |
``daemons.py`` as this is already done on startup time by |
|
749 |
``start-stop-daemon`` util. |
|
742 |
To address these shortcomings, daemons will be started under the target |
|
743 |
user right away. The ``start-stop-daemon`` utility used to start daemons |
|
744 |
supports the ``--chuid`` option to change user and group ID before |
|
745 |
starting the executable. |
|
746 |
|
|
747 |
The intermediate solution for the RAPI daemon from Ganeti 2.2 will be |
|
748 |
removed again. |
|
750 | 749 |
|
751 |
The ssh code will be separated into one single script called upon |
|
752 |
``gnt-node add`` which guarantees that the SSH setup is done and |
|
753 |
functioning. |
|
750 |
Files written by the daemons may need to have an explicit owner and |
|
751 |
group set (easily done through ``utils.WriteFile``). |
|
754 | 752 |
|
755 |
Additionally some of the utils.WriteFile calls needs to be adjusted |
|
756 |
for the new permissions and ownerships. |
|
753 |
All SSH-related code is removed from the ``ganeti.bootstrap`` module and |
|
754 |
core components and moved to a separate script. The core code will |
|
755 |
simply assume a working SSH setup to be in place. |
|
757 | 756 |
|
758 | 757 |
Security Domains |
759 | 758 |
~~~~~~~~~~~~~~~~ |
... | ... | |
763 | 762 |
|
764 | 763 |
1. Public: ``0755`` respectively ``0644`` |
765 | 764 |
2. Ganeti wide: shared between the daemons (gntdaemons) |
766 |
3. Secret files: shared just between a specified set of daemons/users
|
|
765 |
3. Secret files: shared among a specific set of daemons/users
|
|
767 | 766 |
|
768 | 767 |
So for point 3 this tables shows the correlation of the sets to groups |
769 | 768 |
and their users: |
... | ... | |
772 | 771 |
Set Group Users Description |
773 | 772 |
=== ========== ============================== ========================== |
774 | 773 |
A gntrapi gntrapi, gntmasterd Share data between |
775 |
gntrapi & gntmasterd
|
|
774 |
gntrapi and gntmasterd
|
|
776 | 775 |
B gntadmins gntrapi, gntmasterd, *users* Shared between users who |
777 | 776 |
needs to call gntmasterd |
778 | 777 |
C gntconfd gntconfd, gntmasterd Share data between |
779 |
gntconfd & gntmasterd
|
|
778 |
gntconfd and gntmasterd
|
|
780 | 779 |
D gntmasterd gntmasterd masterd only; Currently |
781 | 780 |
only to redistribute the |
782 | 781 |
configuration, has access |
... | ... | |
798 | 797 |
gnt-node {add|remove} |
799 | 798 |
gnt-instance {console} |
800 | 799 |
|
801 |
Directory structure & permissions
|
|
802 |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
800 |
Directory structure and permissions
|
|
801 |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
803 | 802 |
|
804 |
Here's how we propose to change the filesystem hierachy and their |
|
803 |
Here's how we propose to change the filesystem hierarchy and their
|
|
805 | 804 |
permissions. |
806 | 805 |
|
807 | 806 |
Assuming it follows the defaults: ``gnt${daemon}`` for user and |
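
The file descriptor concern raised in the new "Current state and shortcomings" text (a file opened as ``root`` stays writable after the user ID changes) follows from access being checked when a file is opened, not on each write. The sketch below is not Ganeti code. It assumes a POSIX system and, so that it runs without ``root``, it substitutes removing the file's permission bits for changing the user ID; the function name is hypothetical.

```python
import os
import tempfile


def write_after_revoking_access():
    """Write through a descriptor opened before access was revoked.

    Illustrative stand-in for the privilege-drop case: instead of
    changing the user ID, all permission bits are removed from the file.
    """
    # mkstemp() returns a descriptor that is open for reading and writing.
    fd, path = tempfile.mkstemp()
    try:
        # After this, a fresh open() by an unprivileged user would fail.
        os.chmod(path, 0)
        # The descriptor obtained earlier is unaffected by the change.
        os.write(fd, b"still writable")
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, 100)
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(write_after_revoking_access())
```

Starting the daemon under the target user from the outset, as the "Implementation" text proposes, avoids this class of leak because no descriptor is ever opened with elevated privileges.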