.. snf-network documentation master file, created by
   sphinx-quickstart on Wed Feb 12 20:00:16 2014.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to snf-network's documentation!
=======================================

snf-network is a set of scripts that handle the network configuration of
an instance inside a Ganeti cluster. The scripts take advantage of the
variables that Ganeti exports to their execution environment and issue
all the commands necessary to ensure network connectivity to the instance,
based on the requested setup.

Environment
-----------

Ganeti supports `IP pool management
<http://docs.ganeti.org/ganeti/master/html/design-network.html>`_
so that end users can put instances inside networks and access all
network-related information in scripts. Specifically, the following
options are exported:

* IP
* MAC
* MODE
* LINK

are specific to each NIC, whereas:

* NETWORK_SUBNET
* NETWORK_GATEWAY
* NETWORK_MAC_PREFIX
* NETWORK_TAGS
* NETWORK_SUBNET6
* NETWORK_GATEWAY6

are inherited from the network in which a NIC resides (if any).
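
As a rough illustration of how a script might consume these variables, the
sketch below prints a one-line summary of the NIC being configured. The
function name ``describe_nic`` and the output format are hypothetical; only
the variable names come from the lists above.

.. code-block:: shell

    # Hypothetical helper: summarize the NIC described by the environment.
    # Variables that are unset or empty are shown as "-".
    describe_nic() {
        summary="ip=${IP:--} mac=${MAC:--} mode=${MODE:--} link=${LINK:--}"
        # Network variables are present only if the NIC is inside a network.
        if [ -n "${NETWORK_SUBNET:-}" ]; then
            summary="$summary subnet=$NETWORK_SUBNET"
            summary="$summary gateway=${NETWORK_GATEWAY:--}"
        fi
        echo "$summary"
    }

Calling such a helper at the top of a custom script is a cheap way to log
what was actually exported before any command is issued.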

Scripts
-------

The scripts can be divided into two categories:

1. The scripts that are invoked explicitly by Ganeti upon NIC creation.

2. The scripts that are invoked by the Ganeti Hooks Manager before or after
   an opcode execution.

The first group has the exact info of the NIC that is about to be configured,
whereas the second has the info of the whole instance. The big difference is
that the instance configuration (from the master's perspective) might differ
partially or totally from the one that is currently running, because some
modifications can take place without hotplugging.


kvm-ifup-custom
^^^^^^^^^^^^^^^

Upon instance startup and NIC hotplugging, Ganeti creates the TAP devices
that correspond to the instance's NICs. It then invokes Ganeti's ``kvm-ifup``
script with the TAP name as first argument and an environment that includes
all the info of the NIC and of the corresponding network. This script searches
for a user-provided one under ``/etc/ganeti/kvm-ifup-custom`` and executes it
instead.
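
The "search for a custom script and run it instead" behaviour can be pictured
with the sketch below. It is not Ganeti's actual ``kvm-ifup`` code; the
function name ``run_ifup`` and the fallback message are hypothetical.

.. code-block:: shell

    # Hypothetical dispatcher: run the custom script if it exists and is
    # executable, otherwise fall back to some default action.
    run_ifup() {
        custom_script=$1   # e.g. /etc/ganeti/kvm-ifup-custom
        tap=$2             # TAP device name, passed as first argument
        if [ -x "$custom_script" ]; then
            # The exported environment (IP, MAC, MODE, ...) is inherited.
            "$custom_script" "$tap"
        else
            echo "no custom script, using default setup for $tap"
        fi
    }

The same pattern would apply to ``kvm-ifdown``, with the extra boolean
argument forwarded to the custom script.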


kvm-ifdown-custom
^^^^^^^^^^^^^^^^^

In order to clean up or modify the node's setup or the configuration of an
external component, Ganeti invokes the ``kvm-ifdown`` script upon instance
shutdown, upon successful instance migration (on the source node), and upon
NIC hot-unplug. The script gets the TAP name as first argument and a boolean
second argument that indicates whether to do local cleanup only (in case of
instance migration) or to totally unconfigure the interface along with, e.g.,
any DNS entries (in case of NIC hot-unplug). This script searches for a
user-provided one under ``/etc/ganeti/kvm-ifdown-custom`` and executes it
instead.


vif-custom
^^^^^^^^^^

Ganeti provides a hypervisor parameter that defines the script to be executed
per NIC upon instance startup: ``vif-script``. Ganeti provides ``vif-ganeti``
as an example script, which executes ``/etc/xen/scripts/vif-custom`` if found.


snf-network-hook
^^^^^^^^^^^^^^^^

This hook gets all static info related to an instance from environment
variables and issues any commands needed. It was used to fix the node's setup
upon migration when the ifdown script was not supported, but now it does
nothing.


snf-network-dnshook
^^^^^^^^^^^^^^^^^^^

This hook updates an external `DDNS <https://wiki.debian.org/DDNS>`_ setup via
``nsupdate``. Since entries are added and removed by the ifup/ifdown scripts,
this hook is used only during instance remove/shutdown/rename. It does not
rely on the exported environment; instead, it first queries the DNS server to
obtain the current entries and then invokes the necessary commands to remove
them (and the relevant reverse ones too).


Supported Setups
----------------

Since NICs in Ganeti are currently not taggable objects, we use network and
instance tags to customize each NIC's configuration. A NIC inherits the tags
of its network (if attached to any), and further customization can be achieved
with instance tags of the form ``<tag prefix>:<nic uuid or name>:<tag>``. The
following subsections describe all supported tags and the underlying setup
each one results in.
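
To make the instance tag format concrete, here is a minimal sketch of how such
a tag could be matched against a given NIC. The function name and the example
prefix ``synnefo:network`` are hypothetical; only the
``<tag prefix>:<nic uuid or name>:<tag>`` layout comes from the text above.

.. code-block:: shell

    # Hypothetical parser: print the <tag> part of an instance tag if its
    # prefix and NIC identifier match, and print nothing otherwise.
    nic_tag_from_instance_tag() {
        wanted="$1:$2:"      # "<tag prefix>:<nic uuid or name>:"
        instance_tag=$3
        case "$instance_tag" in
            "$wanted"?*)
                # Strip the matched leading part, keep the tag itself.
                echo "${instance_tag#"$wanted"}"
                ;;
        esac
    }

For example, ``nic_tag_from_instance_tag synnefo:network nic0
synnefo:network:nic0:dns`` would print ``dns``, while the same tag checked
against ``nic1`` prints nothing.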


ip-less-routed
^^^^^^^^^^^^^^

This setup has the following characteristics:

* An external gateway on the same collision domain with all nodes on some
  interface (e.g. eth1, eth0.200) is needed.
* Each node is a router for the hosted VMs.
* The node itself does not have an IP inside the routed network.
* The node does proxy ARP for IPv4 networks.
* The node does proxy NDP for IPv6 networks, while RS and NS are served
  locally (answered with RA and NA) by
  `nfdhcpd <http://www.synnefo.org/docs/nfdhcpd/latest/index.html>`_
  since the VMs are not on the same link with the router.

Let's analyze a simple ping from an instance to an external IP using this
setup. We assume the following:

* ``IP`` is the instance's IP
* ``GW_IP`` is the external router's IP
* ``NODE_IP`` is the node's IP
* ``ARP_IP`` is a dummy IP inside the network needed for proxy ARP

* ``MAC`` is the instance's MAC
* ``TAP_MAC`` is the tap's MAC
* ``DEV_MAC`` is the host's DEV MAC
* ``GW_MAC`` is the external router's MAC

* ``DEV`` is the node's device that the router is visible from
* ``TAP`` is the host interface connected with the instance's eth0

Since the VM is supposed to be on the same link with the router, ARP takes
place first:

1) The VM wants to know the GW_MAC. Since the traffic is routed, we do proxy
   ARP.

   - ARP, Request who-has GW_IP tell IP
   - ARP, Reply GW_IP is-at TAP_MAC
     ``echo 1 > /proc/sys/net/ipv4/conf/TAP/proxy_arp``
   - So ``arp -na`` inside the VM shows:
     ``(GW_IP) at TAP_MAC [ether] on eth0``

2) The host wants to know the GW_MAC. Since the node does **not** have an IP
   inside the network, we use the dummy one specified above.

   - ARP, Request who-has GW_IP tell ARP_IP (Created by DEV)
     ``arptables -I OUTPUT -o DEV --opcode 1 -j mangle --mangle-ip-s ARP_IP``
   - ARP, Reply GW_IP is-at GW_MAC

3) The host wants to know MAC so that it can proxy it.

   - We simulate here that the VM sees **only** GW on the link.
   - ARP, Request who-has IP tell GW_IP (Created by TAP)
     ``arptables -I OUTPUT -o TAP --opcode 1 -j mangle --mangle-ip-s GW_IP``
   - So ``arp -na`` inside the host shows:
     ``(GW_IP) at GW_MAC [ether] on DEV, (IP) at MAC on TAP``

4) GW wants to know who does proxy for IP.

   - ARP, Request who-has IP tell GW_IP
   - ARP, Reply IP is-at DEV_MAC (Created by host's DEV)
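
The commands quoted in the steps above can be gathered into one place. The
sketch below only prints them for a given set of values, so it can be reviewed
without touching the node; the function name and argument order are
hypothetical. The proxy ARP switch is written with its full sysctl path,
``/proc/sys/net/ipv4/conf/<interface>/proxy_arp``.

.. code-block:: shell

    # Print (do not run) the proxy ARP related commands for one NIC.
    proxy_arp_commands() {
        tap=$1; dev=$2; gw_ip=$3; arp_ip=$4
        # Step 1: let the host answer the VM's ARP requests on its TAP.
        echo "echo 1 > /proc/sys/net/ipv4/conf/$tap/proxy_arp"
        # Step 2: ARP requests leaving DEV use the dummy IP as source.
        echo "arptables -I OUTPUT -o $dev --opcode 1 -j mangle --mangle-ip-s $arp_ip"
        # Step 3: ARP requests leaving TAP pretend to come from the gateway.
        echo "arptables -I OUTPUT -o $tap --opcode 1 -j mangle --mangle-ip-s $gw_ip"
    }

Printing instead of executing keeps the sketch safe to try as an unprivileged
user; piping the output to a root shell would apply it.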


With the above we have a working proxy ARP configuration. The rest is done
via simple L3 routing. Let's assume the following:

* ``TABLE`` is the extra routing table
* ``SUBNET`` is the IPv4 subnet where the VM's IP resides

1) Outgoing traffic:

   - Traffic coming out of TAP is routed via TABLE
     ``ip rule add dev TAP table TABLE``
   - TABLE states that default route is GW_IP via DEV
     ``ip route add default via GW_IP dev DEV``

2) Incoming traffic:

   - Packet arrives at router
   - Router knows from proxy ARP that the IP is at DEV_MAC.
   - Router sends Ethernet frame with target DEV_MAC
   - Host receives the packet from DEV interface
   - Traffic arriving from DEV is routed via TABLE
     ``ip rule add dev DEV table TABLE``
   - Traffic targeting IP is routed to TAP
     ``ip route add IP dev TAP``

3) Host to VM traffic:

   - Impossible if the VM resides in the host
   - Otherwise there is a route for it: ``ip route add SUBNET dev DEV``
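
The routing rules for outgoing and incoming traffic can be summarized in a
similar print-only sketch. The function name is hypothetical, and appending
``table TABLE`` to the two ``ip route`` commands is an assumption made here so
that the routes end up in the extra table the rules point to; the commands
quoted above do not spell this out.

.. code-block:: shell

    # Print (do not run) the policy routing commands for one NIC.
    routing_commands() {
        tap=$1; dev=$2; gw_ip=$3; ip=$4; table=$5
        # Outgoing: traffic entering from TAP is looked up in TABLE,
        # whose default route points to the external gateway.
        echo "ip rule add dev $tap table $table"
        echo "ip route add default via $gw_ip dev $dev table $table"
        # Incoming: traffic entering from DEV is looked up in TABLE,
        # which sends packets for the instance's IP to its TAP.
        echo "ip rule add dev $dev table $table"
        echo "ip route add $ip dev $tap table $table"
    }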

The IPv6 setup is pretty similar, but instead of proxy ARP we have proxy NDP,
and RS and NS coming from TAP are served by nfdhcpd. The RA contains the
network's prefix and has the M flag unset, so that the VM obtains its IPv6
address via SLAAC, and the O flag set, so that it obtains static info
(nameservers, domain search list) via DHCPv6 (also served by nfdhcpd).

Again, on its local link the VM sees only TAP, which is supposed to be the
router. The host does proxy for the IPv6 address:
``ip -6 neigh add proxy EUI64 dev DEV``.

When an interface comes up inside a host, we should invalidate all entries
related to its IP on the other nodes and the router. For proxy ARP we do
``arpsend -U -c 1 -i IP DEV`` and for proxy NDP we do ``ndsend EUI64 DEV``.
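
Assuming ``EUI64`` denotes the address a VM gets via SLAAC, that is the
network's /64 prefix followed by the modified EUI-64 interface identifier
derived from the NIC's MAC, it can be computed as sketched below. The function
name is hypothetical, and the prefix is expected as its first four groups
(e.g. ``2001:db8:1:2``), which is a simplification of this sketch.

.. code-block:: shell

    # Derive the modified EUI-64 address from a /64 prefix and a MAC.
    # The body runs in a subshell so the caller's IFS is left untouched.
    eui64_address() (
        prefix64=$1
        IFS=:
        set -- $2            # split aa:bb:cc:dd:ee:ff into six octets
        # Flip the universal/local bit (0x02) of the first octet.
        first=$(printf '%02x' $(( 0x$1 ^ 0x02 )))
        # Insert ff:fe between the third and the fourth octet.
        echo "$prefix64:$first$2:${3}ff:fe$4:$5$6"
    )

With the MAC used in the ebtables example of this document,
``eui64_address 2001:db8:1:2 aa:55:66:1a:ae:82`` prints
``2001:db8:1:2:a855:66ff:fe1a:ae82``.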


private-filtered
^^^^^^^^^^^^^^^^

In order to provide L2 isolation among several VMs, we can use ebtables on a
**single** bridge. The infrastructure must provide a physical VLAN or separate
interface shared among all nodes in the cluster. All virtual interfaces will
be bridged on a common bridge (e.g. ``prv0``) and filtering will be done via
ebtables and MAC prefix. The concept is that all interfaces on the same L2
should have the same MAC prefix. MAC prefix uniqueness is guaranteed by
Synnefo and passed to Ganeti as a network option.

To ensure isolation, traffic coming from the tap is allowed only if it has the
expected source MAC, and traffic going to the tap is allowed only if its
source MAC is in the same MAC prefix. Applying those rules only in the FORWARD
chain will not guarantee isolation, because packets whose target MAC is a
`multicast address <http://en.wikipedia.org/wiki/Multicast_address>`_ go
through the INPUT and OUTPUT chains. To sum up, the following ebtables rules
are applied:

.. code-block:: console

    # Create new chains
    ebtables -t filter -N FROMTAP5
    ebtables -t filter -N TOTAP5

    # Filter multicast traffic from VM
    ebtables -t filter -A INPUT -i tap5 -j FROMTAP5

    # Filter multicast traffic to VM
    ebtables -t filter -A OUTPUT -o tap5 -j TOTAP5

    # Filter traffic from VM
    ebtables -t filter -A FORWARD -i tap5 -j FROMTAP5
    # Filter traffic to VM
    ebtables -t filter -A FORWARD -o tap5 -j TOTAP5

    # Allow only specific src MAC for outgoing traffic
    ebtables -t filter -A FROMTAP5 -s ! aa:55:66:1a:ae:82 -j DROP
    # Allow only specific src MAC prefix for incoming traffic
    ebtables -t filter -A TOTAP5 -s ! aa:55:60:0:0:0/ff:ff:f0:0:0:0 -j DROP
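
The rules above are written for one specific interface (``tap5``). A
parameterized, print-only sketch of the same rule set is shown below; the
function name and argument order are hypothetical, and the chain names simply
follow the ``FROMTAP5``/``TOTAP5`` pattern by upper-casing the tap name.

.. code-block:: shell

    # Print (do not run) the ebtables rules for one tap interface.
    ebtables_rules() {
        tap=$1; mac=$2; mac_prefix=$3; mask=$4
        suffix=$(echo "$tap" | tr '[:lower:]' '[:upper:]')
        from="FROM$suffix"; to="TO$suffix"
        # Per-interface chains.
        echo "ebtables -t filter -N $from"
        echo "ebtables -t filter -N $to"
        # Multicast traffic passes through INPUT and OUTPUT.
        echo "ebtables -t filter -A INPUT -i $tap -j $from"
        echo "ebtables -t filter -A OUTPUT -o $tap -j $to"
        # All other traffic passes through FORWARD.
        echo "ebtables -t filter -A FORWARD -i $tap -j $from"
        echo "ebtables -t filter -A FORWARD -o $tap -j $to"
        # Drop anything with an unexpected source MAC or MAC prefix.
        echo "ebtables -t filter -A $from -s ! $mac -j DROP"
        echo "ebtables -t filter -A $to -s ! $mac_prefix/$mask -j DROP"
    }

Running ``ebtables_rules tap5 aa:55:66:1a:ae:82 aa:55:60:0:0:0
ff:ff:f0:0:0:0`` reproduces the eight commands listed above.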


dns
^^^

snf-network can update an external `DDNS <https://wiki.debian.org/DDNS>`_
server. If the ``dns`` network tag is found, the ``ifup`` and ``ifdown``
scripts use ``nsupdate`` to add/remove the entries related to the interface
that is being managed.
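
As an illustration of what adding entries with ``nsupdate`` can look like, the
sketch below builds the input for one forward (A) and one reverse (PTR) IPv4
record. It is not the code the scripts actually use; the function name, the
host name and the TTL are hypothetical, and the FQDN is expected with its
trailing dot.

.. code-block:: shell

    # Build nsupdate input that adds an A record and the matching PTR
    # record. The two updates are sent separately because the forward and
    # the reverse names live in different zones.
    nsupdate_add_commands() {
        fqdn=$1; ip=$2; ttl=$3
        # Reverse the IPv4 octets to form the in-addr.arpa name.
        reverse=$(IFS=.; set -- $ip; echo "$4.$3.$2.$1.in-addr.arpa.")
        echo "update add $fqdn $ttl A $ip"
        echo "send"
        echo "update add $reverse $ttl PTR $fqdn"
        echo "send"
    }

The output is meant to be piped to ``nsupdate``, for example
``nsupdate_add_commands vm1.example.org. 10.0.0.2 300 | nsupdate -k KEYFILE``,
where ``KEYFILE`` stands for whatever key the DDNS server accepts.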


Contents:

.. toctree::
   :maxdepth: 2



Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
