.. snf-network documentation master file, created by
   sphinx-quickstart on Wed Feb 12 20:00:16 2014.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to snf-network's documentation!
=======================================

snf-network is a set of scripts that handle the network configuration of an
instance inside a Ganeti cluster. It takes advantage of the variables that
Ganeti exports to the scripts' execution environment and issues all the
necessary commands to ensure network connectivity for the instance, based on
the requested setup.

Ganeti supports `IP pool management
<http://docs.ganeti.org/ganeti/master/html/design-network.html>`_
so that the end-user can put instances inside networks and get all
information related to the network in scripts. Specifically, the following
options:

* IP
* MAC
* MODE
* LINK

are per-NIC specific, whereas:

* NETWORK_SUBNET
* NETWORK_GATEWAY
* NETWORK_MAC_PREFIX
* NETWORK_TAGS
* NETWORK_SUBNET6
* NETWORK_GATEWAY6

are inherited from the network in which a NIC resides (optional).
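
For illustration, the environment of a `kvm-ifup` call for a routed NIC might
then look roughly like this (interface name, addresses, MAC values and tags
are made up):

.. code-block:: console

   INTERFACE=tap5
   MAC=aa:55:66:1a:ae:82
   IP=203.0.113.14
   MODE=routed
   LINK=rt200
   NETWORK_SUBNET=203.0.113.0/24
   NETWORK_GATEWAY=203.0.113.1
   NETWORK_SUBNET6=2001:db8:100::/64
   NETWORK_GATEWAY6=2001:db8:100::1
   NETWORK_MAC_PREFIX=aa:55:66
   NETWORK_TAGS=ip-less-routed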

The scripts can be divided into two categories:

1. The scripts that are invoked explicitly by Ganeti upon NIC creation.

2. The scripts that are invoked by Ganeti's Hooks Manager before or after an
   opcode execution.

The first group has the exact info of the NIC that is about to be configured,
whereas the latter has the info of the whole instance. The big difference is
that the instance configuration (from the master's perspective) might vary or
be totally different from the one that is currently running, because some
modifications can take place without hotplug.

Upon instance startup and NIC hotplug, Ganeti creates a TAP device for each of
the instance's NICs. After that it invokes its `kvm-ifup` script with the TAP
name as first argument and an environment that includes the NIC's and the
corresponding network's info. This script searches for a user-provided one
under `/etc/ganeti/kvm-ifup-custom` and executes it instead.
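
A minimal sketch of such a custom script, assuming the environment described
above and only the standard `bridged` and `routed` Ganeti modes (with LINK
taken to name the routing table in routed mode), could look like this; it is
not the actual snf-network implementation:

.. code-block:: bash

   #!/bin/bash
   # /etc/ganeti/kvm-ifup-custom -- illustrative sketch only
   # $1 is the TAP device just created; NIC/network info comes from the environment
   TAP="$1"

   ip link set "$TAP" up

   case "$MODE" in
   bridged)
       # Plug the TAP device into the bridge named by LINK
       brctl addif "$LINK" "$TAP"
       ;;
   routed)
       # Route the instance's IP towards the TAP device
       ip route replace "$IP" dev "$TAP" table "$LINK"
       ;;
   esac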

In order to clean up or modify the node's setup or the configuration of an
external component, Ganeti invokes the `kvm-ifdown` script upon instance
shutdown, successful instance migration (on the source node) and NIC
hot-unplug. It passes the TAP name as first argument and a boolean second
argument denoting whether we want local cleanup only (in case of instance
migration) or to completely unconfigure the interface along with e.g., any DNS
entries (in case of NIC hot-unplug). This script searches for a user-provided
one under `/etc/ganeti/kvm-ifdown-custom` and executes it instead.
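
Correspondingly, a custom ifdown script would undo the above setup. A rough
sketch (the exact representation of the boolean second argument is an
assumption here):

.. code-block:: bash

   #!/bin/bash
   # /etc/ganeti/kvm-ifdown-custom -- illustrative sketch only
   # $1 is the TAP device, $2 says whether only local cleanup is wanted
   TAP="$1"
   LOCAL_CLEANUP_ONLY="$2"

   # Always undo the node-local configuration for this interface
   ip route del "$IP" dev "$TAP" table "$LINK" 2>/dev/null || true

   if [ "$LOCAL_CLEANUP_ONLY" != "True" ]; then
       # The NIC is going away for good: also clean up external state,
       # e.g. remove its DNS entries
       :
   fi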

Ganeti provides a hypervisor parameter that defines the script to be executed
per NIC upon instance startup: `vif-script`. Ganeti ships `vif-ganeti` as an
example script, which executes `/etc/xen/scripts/vif-custom` if found.

One such hook gets all static info related to an instance from environment
variables and issues any commands needed. It was used to fix the node's setup
upon migration when the ifdown script was not yet supported, but nowadays it
does nothing.

A second hook updates an external `DDNS <https://wiki.debian.org/DDNS>`_ setup
via ``nsupdate``. Since entries are added/removed by the ifup/ifdown scripts,
this hook is used only during instance remove/shutdown/rename. It does not
rely on the exported environment; instead, it first queries the DNS server to
obtain the current entries and then invokes the necessary commands to remove
them (along with the relevant reverse entries).
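
For illustration, removing an instance's entries with ``nsupdate`` could look
roughly like this (server, key file, names and addresses are made up):

.. code-block:: console

   nsupdate -k /etc/ddns/ddns.key <<EOF
   server ns.example.com
   update delete vm1.example.com. A
   update delete vm1.example.com. AAAA
   update delete 14.113.0.203.in-addr.arpa. PTR
   send
   EOF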

Currently, since NICs in Ganeti are not taggable objects, we use the network's
and the instance's tags to customize each NIC's configuration. A NIC inherits
the tags of the network it is attached to (if any), and further customization
can be achieved with instance tags of the form
``<tag prefix>:<nic uuid or name>:<tag>``. Below we describe all supported
tags and the underlying setup they result in.
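
Assuming the usual Ganeti tag commands, tagging could look like the following
(the prefix and the NIC name are placeholders; `dns` is the network tag
described below):

.. code-block:: console

   # Network-level tag: every NIC attached to network1 inherits it
   gnt-network add-tags network1 dns

   # Per-NIC customization via an instance tag
   gnt-instance add-tags instance1 <tag prefix>:eth0:dns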

The IP-less routed setup has the following characteristics:

* An external gateway on the same collision domain with all nodes, on some
  interface (e.g. eth1, eth0.200), is needed.
* Each node is a router for the hosted VMs.
* The node itself does not have an IP inside the routed network.
* The node does proxy ARP for IPv4 networks.
* The node does proxy NDP for IPv6 networks, while RA and NA are served
  locally by `nfdhcpd <http://www.synnefo.org/docs/nfdhcpd/latest/index.html>`_
  in response to RS and NS, since the VMs are not on the same link with the
  router.

Let's analyze a simple ping from an instance to an external IP using this
setup. We assume the following:

* ``IP`` is the instance's IP
* ``GW_IP`` is the external router's IP
* ``NODE_IP`` is the node's IP
* ``ARP_IP`` is a dummy IP inside the network, needed for proxy ARP

* ``MAC`` is the instance's MAC
* ``TAP_MAC`` is the TAP's MAC
* ``DEV_MAC`` is the host's DEV MAC
* ``GW_MAC`` is the external router's MAC

* ``DEV`` is the node's device from which the router is visible
* ``TAP`` is the host interface connected with the instance's eth0

Since the instance is supposed to be on the same link with the router, ARP
takes place first:

1) The VM wants to know GW_MAC. Since the traffic is routed, we do proxy ARP.

   - ARP, Request who-has GW_IP tell IP
   - ARP, Reply GW_IP is-at TAP_MAC
     ``echo 1 > /proc/sys/net/ipv4/conf/TAP/proxy_arp``
   - So ``arp -na`` inside the VM shows:
     ``(GW_IP) at TAP_MAC [ether] on eth0``

2) The host wants to know GW_MAC. Since the node does **not** have an IP
   inside the network, we use the dummy one specified above.

   - ARP, Request who-has GW_IP tell ARP_IP (created by DEV)
     ``arptables -I OUTPUT -o DEV --opcode 1 -j mangle --mangle-ip-s ARP_IP``
   - ARP, Reply GW_IP is-at GW_MAC

3) The host wants to know MAC so that it can proxy it.

   - We simulate here that the VM sees **only** the GW on the link.
   - ARP, Request who-has IP tell GW_IP (created by TAP)
     ``arptables -I OUTPUT -o TAP --opcode 1 -j mangle --mangle-ip-s GW_IP``
   - So ``arp -na`` inside the host shows:
     ``(GW_IP) at GW_MAC [ether] on DEV, (IP) at MAC on TAP``

4) The GW wants to know who does proxy for IP.

   - ARP, Request who-has IP tell GW_IP
   - ARP, Reply IP is-at DEV_MAC (created by the host's DEV)

With the above we have a working proxy ARP configuration. The rest is done via
simple L3 routing. Let's assume the following:

* ``TABLE`` is the extra routing table
* ``SUBNET`` is the IPv4 subnet where the VM's IP resides

1) VM-to-external traffic:

   - Traffic coming out of TAP is routed via TABLE
     ``ip rule add dev TAP table TABLE``
   - TABLE states that the default route is GW_IP via DEV
     ``ip route add default via GW_IP dev DEV table TABLE``

2) External-to-VM traffic:

   - The packet arrives at the router.
   - The router knows from proxy ARP that IP is at DEV_MAC.
   - The router sends the Ethernet frame with target MAC DEV_MAC.
   - The host receives the packet on the DEV interface.
   - Traffic coming in from DEV is routed via TABLE
     ``ip rule add dev DEV table TABLE``
   - Traffic targeting IP is routed to TAP
     ``ip route add IP dev TAP table TABLE``

3) Host-to-VM traffic:

   - Impossible if the VM resides on the host itself.
   - Otherwise there is a route for it: ``ip route add SUBNET dev DEV``
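
Putting the IPv4 pieces together, the per-TAP setup for this routed
configuration boils down to roughly the following commands, collected from the
steps above:

.. code-block:: console

   # Answer the VM's ARP requests for the gateway on the TAP
   echo 1 > /proc/sys/net/ipv4/conf/TAP/proxy_arp

   # Masquerade ARP requests, since the node has no IP inside the network
   arptables -I OUTPUT -o DEV --opcode 1 -j mangle --mangle-ip-s ARP_IP
   arptables -I OUTPUT -o TAP --opcode 1 -j mangle --mangle-ip-s GW_IP

   # Policy routing: both directions go through TABLE
   ip rule add dev TAP table TABLE
   ip rule add dev DEV table TABLE
   ip route add default via GW_IP dev DEV table TABLE
   ip route add IP dev TAP table TABLE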

The IPv6 setup is pretty similar, but instead of proxy ARP we have proxy NDP,
and the RS and NS coming from the TAP are served by nfdhcpd. The RA contains
the network's prefix and has the M flag unset, so that the VM obtains its IPv6
address via SLAAC, and the O flag set, so that it obtains static info
(nameservers, domain search list) via stateless DHCPv6 (also served by
nfdhcpd).

Again, the VM sees on its link only the TAP, which is supposed to be the
router. The host does proxy NDP for the instance's IPv6 address:
``ip -6 neigh add proxy EUI64 dev DEV``.

When an interface comes up inside a host, we should invalidate all entries
related to its IP on the other nodes and on the router. For proxy ARP we do
``arpsend -U -c 1 -i IP DEV`` and for proxy NDP we do ``ndsend EUI64 DEV``.
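
The IPv6 counterpart, again assembled from the commands mentioned above
(``EUI64`` being the instance's SLAAC address):

.. code-block:: console

   # Answer NS for the instance's address on the upstream device
   ip -6 neigh add proxy EUI64 dev DEV

   # After ifup, invalidate stale neighbour caches on other nodes and the router
   arpsend -U -c 1 -i IP DEV
   ndsend EUI64 DEV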

In order to provide L2 isolation among several VMs, we can use ebtables on a
**single** bridge. The infrastructure must provide a physical VLAN or a
separate interface shared among all nodes in the cluster. All virtual
interfaces will be bridged on a common bridge (e.g. ``prv0``) and filtering
will be done via ebtables and the MAC prefix. The concept is that all
interfaces on the same L2 should have the same MAC prefix. MAC prefix
uniqueness is guaranteed by Synnefo and passed to Ganeti as a network option.

To ensure isolation, we should allow traffic coming from the tap to have a
specific source MAC and, at the same time, allow traffic going to the tap to
have a source MAC within the same MAC prefix. Applying those rules only in the
FORWARD chain will not guarantee isolation, because packets with a `multicast
address <http://en.wikipedia.org/wiki/Multicast_address>`_ as target MAC go
through the INPUT and OUTPUT chains. To sum up, the following ebtables rules
are applied:

.. code-block:: console

   ebtables -t filter -N FROMTAP5
   ebtables -t filter -N TOTAP5

   # Filter multicast traffic from VM
   ebtables -t filter -A INPUT -i tap5 -j FROMTAP5

   # Filter multicast traffic to VM
   ebtables -t filter -A OUTPUT -o tap5 -j TOTAP5

   # Filter traffic from VM
   ebtables -t filter -A FORWARD -i tap5 -j FROMTAP5
   # Filter traffic to VM
   ebtables -t filter -A FORWARD -o tap5 -j TOTAP5

   # Allow only specific src MAC for outgoing traffic
   ebtables -t filter -A FROMTAP5 -s ! aa:55:66:1a:ae:82 -j DROP
   # Allow only specific src MAC prefix for incoming traffic
   ebtables -t filter -A TOTAP5 -s ! aa:55:60:0:0:0/ff:ff:f0:0:0:0 -j DROP
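
On ifdown, a corresponding cleanup could look like this (a sketch; the rule
specifications must match the ones that were inserted):

.. code-block:: console

   # Unhook the per-interface chains...
   ebtables -t filter -D INPUT -i tap5 -j FROMTAP5
   ebtables -t filter -D OUTPUT -o tap5 -j TOTAP5
   ebtables -t filter -D FORWARD -i tap5 -j FROMTAP5
   ebtables -t filter -D FORWARD -o tap5 -j TOTAP5

   # ...then flush and delete them
   ebtables -t filter -F FROMTAP5
   ebtables -t filter -X FROMTAP5
   ebtables -t filter -F TOTAP5
   ebtables -t filter -X TOTAP5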

snf-network can update an external `DDNS <https://wiki.debian.org/DDNS>`_
server. If the `dns` network tag is found, the `ifup` and `ifdown` scripts
will use `nsupdate` to add/remove entries related to the interface that is
being configured.
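
On ifup, the added records could look roughly like this (server, key file,
names and addresses are made up):

.. code-block:: console

   nsupdate -k /etc/ddns/ddns.key <<EOF
   server ns.example.com
   update add vm1.example.com. 300 A 203.0.113.14
   update add 14.113.0.203.in-addr.arpa. 300 PTR vm1.example.com.
   send
   EOF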