Revision 2f2f1289 doc/design-2.2.rst

b/doc/design-2.2.rst
11 11

  
12 12
.. contents:: :depth: 4
13 13

  
14
Detailed design
15
===============
16

  
17 14
As for 2.1 we divide the 2.2 design into three areas:
18 15

  
19 16
- core changes, which affect the master daemon/job queue/locking or
20 17
  all/most logical units
21 18
- logical unit/feature changes
22
- external interface changes (eg. command line, os api, hooks, ...)
19
- external interface changes (e.g. command line, OS API, hooks, ...)
20

  
23 21

  
24 22
Core changes
25
------------
23
============
26 24

  
27 25
Master Daemon Scaling improvements
28
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
26
----------------------------------
29 27

  
30 28
Current state and shortcomings
31
++++++++++++++++++++++++++++++
29
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
32 30

  
33 31
Currently the Ganeti master daemon is based on four sets of threads:
34 32

  
......
50 48
scalability issues:
51 49

  
52 50
Core daemon connection handling
53
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
51
+++++++++++++++++++++++++++++++
54 52

  
55 53
Since the 16 client worker threads handle one connection each, it's very
56 54
easy to exhaust them, by just connecting to masterd 16 times and not
......
60 58
informed that everything is proceeding, and doesn't need to time out.
61 59

  
62 60
Wait for job change
63
^^^^^^^^^^^^^^^^^^^
61
+++++++++++++++++++
64 62

  
65 63
The REQ_WAIT_FOR_JOB_CHANGE luxi operation makes the relevant client
66 64
thread block on its job for a relatively long time. This is another easy
......
69 67
contention (see below).
70 68

  
71 69
Job Queue lock
72
^^^^^^^^^^^^^^
70
++++++++++++++
73 71

  
74 72
The job queue lock is quite heavily contended, and certain easily
75 73
reproducible workloads show that it's very easy to put masterd in
......
120 118
    remote rpcs to complete (starting, finishing, and submitting jobs)
121 119

  
122 120
Proposed changes
123
++++++++++++++++
121
~~~~~~~~~~~~~~~~
124 122

  
125 123
In order to be able to interact with the master daemon even when it's
126 124
under heavy load, and to make it simpler to add core functionality
......
135 133
understand, debug, and scale.
136 134

  
137 135
Connection handling
138
^^^^^^^^^^^^^^^^^^^
136
+++++++++++++++++++
139 137

  
140 138
We'll move the main thread of ganeti-masterd to asyncore, so that it can
141 139
share the mainloop code with all other Ganeti daemons. Then all luxi
......
148 146
thread on the socket.
149 147

  
150 148
Wait for job change
151
^^^^^^^^^^^^^^^^^^^
149
+++++++++++++++++++
152 150

  
153 151
The REQ_WAIT_FOR_JOB_CHANGE luxi request is changed to be
154 152
subscription-based, so that the executing thread doesn't have to be
......
173 171
    them at a maximum rate (lower priority).
174 172

  
175 173
Job Queue lock
176
^^^^^^^^^^^^^^
174
++++++++++++++
177 175

  
178 176
In order to decrease the job queue lock contention, we will change the
179 177
code paths in the following ways, initially:
......
202 200

  
203 201

  
204 202
Remote procedure call timeouts
205
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
203
------------------------------
206 204

  
207 205
Current state and shortcomings
208
++++++++++++++++++++++++++++++
206
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
209 207

  
210 208
The current RPC protocol used by Ganeti is based on HTTP. Every request
211 209
consists of an HTTP PUT request (e.g. ``PUT /hooks_runner HTTP/1.0``)
......
230 228
unresponsive node daemon cases.
231 229

  
232 230
Proposed changes
233
++++++++++++++++
231
~~~~~~~~~~~~~~~~
234 232

  
235 233
RPC glossary
236
^^^^^^^^^^^^
234
++++++++++++
237 235

  
238 236
Function call ID
239 237
  Unique identifier returned by ``ganeti-noded`` after invoking a
......
242 240
  Process started by ``ganeti-noded`` to call actual (backend) function.
243 241

  
244 242
Protocol
245
^^^^^^^^
243
++++++++
246 244

  
247 245
Initially we chose HTTP as our RPC protocol because there were existing
248 246
libraries, which, unfortunately, turned out to miss important features
......
273 271
would be an implicit ping-mechanism.
274 272

  
275 273
Request handling
276
^^^^^^^^^^^^^^^^
274
++++++++++++++++
277 275

  
278 276
To support the protocol changes described above, the way the node daemon
279 277
handles requests will have to change. Instead of forking and handling
......
345 343

  
346 344

  
347 345
Inter-cluster instance moves
348
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
346
----------------------------
349 347

  
350 348
Current state and shortcomings
351
++++++++++++++++++++++++++++++
349
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
352 350

  
353 351
With the current design of Ganeti, moving whole instances between
354 352
different clusters involves a lot of manual work. There are several ways
......
359 357
this process in Ganeti 2.2.
360 358

  
361 359
Proposed changes
362
++++++++++++++++
360
~~~~~~~~~~~~~~~~
363 361

  
364 362
Authorization, Authentication and Security
365
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
363
++++++++++++++++++++++++++++++++++++++++++
366 364

  
367 365
Until now, each Ganeti cluster was a self-contained entity and wouldn't
368 366
talk to other Ganeti clusters. Nodes within clusters only had to trust
......
424 422
certificate while providing a client certificate to the server.
425 423

  
426 424
Copying data
427
^^^^^^^^^^^^
425
++++++++++++
428 426

  
429 427
To simplify the implementation, we decided to operate at a block-device
430 428
level only, allowing us to easily support non-DRBD instance moves.
......
442 440
directly, where it'll be written to the new block device directly again.
443 441

  
444 442
Workflow
445
^^^^^^^^
443
++++++++
446 444

  
447 445
#. Third party tells source cluster to shut down instance, asks for the
448 446
   instance specification and for the public part of an encryption key
......
510 508
#. Source cluster removes the instance if requested
511 509

  
512 510
Instance move in pseudo code
513
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
511
++++++++++++++++++++++++++++
514 512

  
515 513
.. highlight:: python
516 514

  
......
651 649
.. highlight:: text
652 650

  
653 651
Miscellaneous notes
654
^^^^^^^^^^^^^^^^^^^
652
+++++++++++++++++++
655 653

  
656 654
- A very similar system could also be used for instance exports within
657 655
  the same cluster. Currently OpenSSH is being used, but could be
......
679 677

  
680 678

  
681 679
Privilege separation
682
~~~~~~~~~~~~~~~~~~~~
680
--------------------
683 681

  
684 682
Current state and shortcomings
685
++++++++++++++++++++++++++++++
683
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
686 684

  
687 685
All Ganeti daemons are run under the user root. This is not ideal from a
688 686
security perspective as for possible exploitation of any daemon the user
......
694 692
is in the same group.
695 693

  
696 694
Implementation
697
++++++++++++++
695
~~~~~~~~~~~~~~
698 696

  
699 697
For Ganeti 2.2 the implementation will be focused on the RAPI daemon
700 698
only. This involves changes to ``daemons.py`` so it's possible to drop
......
710 708

  
711 709

  
712 710
Feature changes
713
---------------
711
===============
714 712

  
715 713
KVM Security
716
~~~~~~~~~~~~
714
------------
717 715

  
718 716
Current state and shortcomings
719
++++++++++++++++++++++++++++++
717
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
720 718

  
721 719
Currently all kvm processes run as root. Taking ownership of the
722 720
hypervisor process, from inside a virtual machine, would mean a full
......
725 723
option of subverting other basic services on the cluster (e.g. ssh).
726 724

  
727 725
Proposed changes
728
++++++++++++++++
726
~~~~~~~~~~~~~~~~
729 727

  
730 728
We would like to decrease the surface of attack available if a
731 729
hypervisor is compromised. We can do so by adding different features to
......
734 732
subvert the node.
735 733

  
736 734
Dropping privileges in kvm to a single user (easy)
737
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
735
++++++++++++++++++++++++++++++++++++++++++++++++++
738 736

  
739 737
By passing the ``-runas`` option to kvm, we can make it drop privileges.
740 738
The user can be chosen by a hypervisor parameter, so that each instance
......
761 759
- read unprotected data on the node filesystem
762 760

  
763 761
Running kvm in a chroot (slightly harder)
764
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
762
+++++++++++++++++++++++++++++++++++++++++
765 763

  
766 764
By passing the ``-chroot`` option to kvm, we can restrict the kvm
767 765
process in its own (possibly empty) root directory. We need to set this
......
784 782

  
785 783

  
786 784
Running kvm with a pool of users (slightly harder)
787
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
785
++++++++++++++++++++++++++++++++++++++++++++++++++
788 786

  
789 787
If rather than passing a single user as a hypervisor parameter, we have
790 788
a pool of useable ones, we can dynamically choose a free one to use and
......
795 793
can still be combined with the chroot benefits.
796 794

  
797 795
Running iptables rules to limit network interaction (easy)
798
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
796
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
799 797

  
800 798
These don't need to be handled by Ganeti, but we can ship examples. If
801 799
the users used to run VMs would be blocked from sending some or all
......
808 806

  
809 807

  
810 808
Running kvm inside a container (even harder)
811
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
809
++++++++++++++++++++++++++++++++++++++++++++
812 810

  
813 811
Recent Linux kernels support different process namespaces through
814 812
control groups. PIDs, users, filesystems and even network interfaces can
......
820 818
just rely on iptables.
821 819

  
822 820
Implementation plan
823
+++++++++++++++++++
821
~~~~~~~~~~~~~~~~~~~
824 822

  
825 823
We will first implement dropping privileges for kvm processes as a
826 824
single user, and most probably backport it to 2.1. Then we'll ship
......
833 831

  
834 832

  
835 833
External interface changes
836
--------------------------
834
==========================
837 835

  
838 836

  
839 837
OS API
840
~~~~~~
838
------
841 839

  
842 840
The OS variants implementation in Ganeti 2.1 didn't prove to be useful
843 841
enough to alleviate the need to hack around the Ganeti API in order to
......
856 854

  
857 855

  
858 856
OS version
859
++++++++++
857
~~~~~~~~~~
860 858

  
861 859
A new ``os_version`` file will be supported by Ganeti. This file is not
862 860
required, but if it exists, its contents will be checked for consistency
......
870 868
intra-cluster migration.
871 869

  
872 870
Parameters
873
++++++++++
871
~~~~~~~~~~
874 872

  
875 873
The interface between Ganeti and the OS scripts will be based on
876 874
environment variables, and as such the parameters and their values will
877 875
need to be valid in this context.
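
As a rough illustration of what "valid in this context" could mean in
practice, the sketch below maps a set of OS parameters onto environment
variables using the upper-case, ``OSP_``-prefixed naming described under
"Environment variables". The helper ``build_os_param_env``, the name
pattern, and the sample parameters ``dhcp`` and ``fs_type`` are
assumptions made for this example; they are not taken from the Ganeti
code base.

```python
import re

# Assumption: names are limited to characters that are safe in
# environment variable names; the exact rule is not given here.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_os_param_env(params):
    """Return a dict of environment variables for the given OS parameters.

    Hypothetical helper for illustration only; not part of Ganeti.
    """
    env = {}
    for name, value in params.items():
        if not _NAME_RE.fullmatch(name):
            raise ValueError("invalid parameter name: %r" % name)
        if "\0" in value:
            # NUL bytes cannot be passed through the process environment
            raise ValueError("invalid value for parameter %r" % name)
        key = "OSP_" + name.upper()
        if key in env:
            # Two names differing only in case map to the same variable
            raise ValueError("parameter names clash: %r" % name)
        env[key] = value
    return env


if __name__ == "__main__":
    # Sample parameters are made up for the example
    print(build_os_param_env({"dhcp": "yes", "fs_type": "ext3"}))
```

Rejecting names that differ only in case keeps the mapping to upper-case
variable names unambiguous, and the values themselves are passed through
unchanged, since Ganeti treats them as freeform.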
878 876

  
879 877
Names
880
^^^^^
878
+++++
881 879

  
882 880
The parameter names will be declared in a new file, ``parameters.list``,
883 881
together with a one-line documentation (whitespace-separated). Example::
......
896 894
parameters which differ in case only.
897 895

  
898 896
Values
899
^^^^^^
897
++++++
900 898

  
901 899
The values of the parameters are, from Ganeti's point of view,
902 900
completely freeform. If a given parameter has, from the OS' point of
......
917 915

  
918 916

  
919 917
Environment variables
920
+++++++++++++++++++++
918
^^^^^^^^^^^^^^^^^^^^^
921 919

  
922 920
The parameters will be exposed in the environment upper-case and
923 921
prefixed with the string ``OSP_``. For example, a parameter declared in
