How ML2/VXLAN works
My setup:
1 controller node + 2 compute nodes
RDO Havana 2013.2.2, CentOS 6.5, Open vSwitch 1.11.0
VXLAN local IPs:
controller: 10.142.255.101
compute-1: 10.142.255.102
compute-2: 10.142.255.103
1. Setup VXLAN with ML2
ML2 is not installed by default by packstack, so we have to configure it manually.
On controller node:
yum install openstack-neutron-ml2 python-pyudev
Edit /etc/neutron/neutron.conf
core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin
service_plugins = neutron.services.l3_router.l3_router_plugin.L3RouterPlugin,neutron.services.loadbalancer.plugin.LoadBalancerPlugin
Change the plugin.ini link:
unlink /etc/neutron/plugin.ini
ln -s /etc/neutron/plugins/ml2/ml2_conf.ini /etc/neutron/plugin.ini
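A quick check that the link now points at the ML2 configuration:
ls -l /etc/neutron/plugin.ini   # should show plugin.ini -> /etc/neutron/plugins/ml2/ml2_conf.ini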
Edit /etc/neutron/plugin.ini
[ml2]
type_drivers = vxlan
tenant_network_types = vxlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
[ml2_type_gre]
[ml2_type_vxlan]
vni_ranges = 1001:2000
vxlan_group = 239.1.1.1
[database]
sql_connection = mysql://neutron:<neutron-db-password>@<controller-host>/neutron_ml2   # substitute your neutron DB password and DB host
[securitygroup]
firewall_driver = dummy_value_to_enable_security_groups_in_server
Edit /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini on every node:
[OVS]
vxlan_udp_port=4789
tunnel_type=vxlan
tunnel_id_ranges=1001:2000
tenant_network_type=vxlan
local_ip=10.142.255.101 #Use 102 for compute-1, 103 for compute-2
enable_tunneling=True
[AGENT]
tunnel_types = vxlan
polling_interval=2
[SECURITYGROUP]
firewall_driver=neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
Database creation:
mysql -e "drop database if exists neutron_ml2;"
mysql -e "create database neutron_ml2 character set utf8;"
mysql -e "grant all on neutron_ml2.* to 'neutron'@'%';"
neutron-db-manage --config-file /usr/share/neutron/neutron-dist.conf \
--config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini \
upgrade head
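To verify the migration created the new schema, list the tables; ML2 tables such as ml2_vxlan_allocations and ml2_vxlan_endpoints should be present:
mysql neutron_ml2 -e "show tables;" | grep ml2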
Restart neutron services
service neutron-server restart
service neutron-openvswitch-agent restart #On every node
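After the restart, all agents should report as alive (run on the controller with the packstack admin credentials sourced, e.g. keystonerc_admin):
neutron agent-list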
Check whether the tunnels are established on the controller:
[root@controller ~]# ovs-vsctl show
…
    Bridge br-tun
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port br-tun
            Interface br-tun
                type: internal
        Port "vxlan-10.142.255.103"
            Interface "vxlan-10.142.255.103"
                type: vxlan
                options: {in_key=flow, local_ip="10.142.255.101", out_key=flow, remote_ip="10.142.255.103"}
        Port "vxlan-10.142.255.102"
            Interface "vxlan-10.142.255.102"
                type: vxlan
                options: {in_key=flow, local_ip="10.142.255.101", out_key=flow, remote_ip="10.142.255.102"}
…
We see two tunnels created from the controller, one to compute-1 and one to compute-2.
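Each tunnel interface also gets an OpenFlow port number on br-tun, which the br-tun flow rules refer to; you can list the mapping with:
ovs-ofctl show br-tun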
Check whether the tunnels are established on a compute node (compute-1 as an example):
[root@compute ~]# ovs-vsctl show
…
    Bridge br-tun
        Port "vxlan-10.142.255.103"
            Interface "vxlan-10.142.255.103"
                type: vxlan
                options: {in_key=flow, local_ip="10.142.255.102", out_key=flow, remote_ip="10.142.255.103"}
        Port "vxlan-10.142.255.101"
            Interface "vxlan-10.142.255.101"
                type: vxlan
                options: {in_key=flow, local_ip="10.142.255.102", out_key=flow, remote_ip="10.142.255.101"}
…
Likewise, two tunnels are created from compute-1: one to the controller and one to compute-2.
Launch a couple of VM instances to see how their traffic goes through the VXLAN tunnels.
nova boot --flavor 1 --image cirros --num-instances 2 --nic net-id=<net-uuid> vm
Each compute node should take one VM:
[root@controller ~]# nova list --fields name,status,power_state,host,networks
+--------------------------------------+-----------------------------------------+--------+-------------+----------+---------------------+
| ID                                   | Name                                    | Status | Power State | Host     | Networks            |
+--------------------------------------+-----------------------------------------+--------+-------------+----------+---------------------+
| 2bc01296-f8d4-48ce-a600-5acf83ee2bbf | vm-2bc01296-f8d4-48ce-a600-5acf83ee2bbf | ACTIVE | Running     | compute2 | testnet=192.168.1.2 |
| 4f63feaf-e92a-4045-8e45-d3160c99fb84 | vm-4f63feaf-e92a-4045-8e45-d3160c99fb84 | ACTIVE | Running     | compute  | testnet=192.168.1.4 |
+--------------------------------------+-----------------------------------------+--------+-------------+----------+---------------------+
2. Unicast packets of VM traffic
Send an ICMP packet from the controller (the L3 agent router namespace) to a VM, and capture the traffic to see the VXLAN encapsulation.
[root@controller ~]# ip netns exec qrouter-28f6fe53-1f94-4355-81e3-85a2aad7b665 ping -c 1 192.168.1.4
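The capture can be taken on the underlay interface of either node while the ping runs; a minimal sketch, assuming eth1 is the interface carrying the 10.142.255.0/24 VXLAN endpoint addresses (adjust the interface name to your setup):
tcpdump -i eth1 -nn -e udp port 4789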
In the capture we can see the outer IP header and the VXLAN header with VNI 1000, followed by the inner IP header and the ICMP payload. The VXLAN packet is unicast from the controller's local IP 10.142.255.101 to compute-1's local IP 10.142.255.102.
3. Broadcast or multicast packets of VM traffic
Neutron and Open vSwitch handle VM broadcast and multicast in the same way, so here we take broadcast as an example.
Send an ARP request (a broadcast packet) from the controller (the L3 agent router namespace) to the VM network, and capture the traffic again.
[root@controller ~]# ip netns exec qrouter-28f6fe53-1f94-4355-81e3-85a2aad7b665 arping -c 1 -I qr-f3d1a9ea-9a 192.168.1.4
We can see the broadcast is actually sent as two unicast VXLAN packets, one from the controller to compute-1 and one to compute-2. Compute-1, which hosts the VM with IP 192.168.1.4, then replies to the ARP request.
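The replication is done by the flood flow on the controller's br-tun, which ends with one output action per VXLAN tunnel port. You can inspect it with ovs-ofctl; in this Havana agent the flood flows live in table 21, but the table number may differ in other releases:
ovs-ofctl dump-flows br-tun table=21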
This picture from the official OpenStack documentation illustrates the situation:
This kind of packet flooding is obviously not ideal, especially in large deployments. However, OpenStack supports better mechanisms to handle it; see the following chapters.
4. VXLAN uses multicast between VTEPs for VM broadcast/multicast traffic
According to chapter 4.2 of the VXLAN specification draft:
http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-08#page-9
when the VM traffic is broadcast or multicast, the VXLAN VTEPs are supposed to distribute it among themselves using multicast.
The Linux bridge agent supports this mode of operation.
Open vSwitch, however, does not; as we saw in the last chapter, it uses multiple unicast packets between VTEPs instead.
See the Open vSwitch FAQ:
http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob_plain;f=FAQ;hb=HEAD
Q: How much of the VXLAN protocol does Open vSwitch currently support?
A: Open vSwitch currently supports the framing format for packets on the
wire. There is currently no support for the multicast aspects of VXLAN.
To get around the lack of multicast support, it is possible to
pre-provision MAC to IP address mappings either manually or from a
controller.
So the configuration vxlan_group = <a multicast IP, e.g. 239.1.1.1>
in /etc/neutron/plugins/ml2/ml2_conf.ini
only applies to the Linux bridge mechanism driver.
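For comparison, this is roughly what the Linux bridge agent drives at the kernel level when vxlan_group is set: a multicast-based VXLAN device. A sketch only, with the VNI, group and underlay interface chosen arbitrarily, and requiring a kernel with native VXLAN support:
ip link add vxlan-1001 type vxlan id 1001 group 239.1.1.1 dev eth1
With such a device, broadcast, multicast and unknown-unicast frames are sent to the multicast group instead of being replicated as multiple unicasts.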
5. ML2 L2 population mechanism driver
From the way VM broadcast/multicast traffic is handled above, we can see there are many unnecessary broadcast-emulation (unicast) packets flying around between the VTEPs. The L2 population mechanism driver was introduced to reduce this flooding.
When the ML2 plugin is used with tunnels and a new port goes up, ML2 sends an update_port_postcommit notification, which is picked up and processed by the l2pop mechanism driver. l2pop then gathers the IP and MAC of the port, as well as the host the port was scheduled on, and sends an RPC notification to all layer 2 agents.
Every agent thus learns the MAC-IP-VTEP mappings; when a broadcast packet comes from a VM, the VTEP forwards it only to the relevant VTEPs, so no multicast emulation is needed.
This picture explains it:
Let’s go through how it works.
Add l2population to the mechanism driver list in /etc/neutron/plugins/ml2/ml2_conf.ini on the controller node:
[ml2]
mechanism_drivers = openvswitch,l2population
Enable l2population on every Open vSwitch agent node, in /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini:
[AGENT]
l2_population = True
Restart the neutron-server and Open vSwitch agent services on the controller and compute nodes.
service neutron-server restart #on controller
service neutron-openvswitch-agent restart #on all nodes
Now L2 population should start to work.
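To confirm the pre-population, dump the unicast-to-tunnel flows on a compute node: with l2_population enabled there should be one static flow per remote VM MAC address, each pointing at a specific tunnel port, instead of relying only on MAC addresses learned from incoming tunnel traffic (in this Havana agent the unicast table is table 20; check your release if it differs):
ovs-ofctl dump-flows br-tun table=20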
…To be continued with detailed packet level analysis…