How ML2/VXLAN works

My setup:
1 controller node + 2 compute nodes
RDO Havana 2013.2.2, CentOS 6.5, Open vSwitch 1.11.0

VXLAN local IPs:

controller: 10.142.255.101
compute-1: 10.142.255.102
compute-2: 10.142.255.103

1. Set up VXLAN with ML2

After a packstack installation, ML2 is not installed by default, so we have to configure it manually.

On controller node:

yum install  openstack-neutron-ml2 python-pyudev  

Edit  /etc/neutron/neutron.conf

core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin  
service_plugins = neutron.services.l3_router.l3_router_plugin.L3RouterPlugin,neutron.services.loadbalancer.plugin.LoadBalancerPlugin  

Change plugin.ini link

unlink /etc/neutron/plugin.ini  
ln -s /etc/neutron/plugins/ml2/ml2_conf.ini /etc/neutron/plugin.ini  
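
A quick sanity check: the symlink should now point at the ML2 configuration file.

ls -l /etc/neutron/plugin.ini    #Should point to /etc/neutron/plugins/ml2/ml2_conf.ini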

Edit  /etc/neutron/plugin.ini

[ml2]  
type_drivers = vxlan  
tenant_network_types = vxlan  
mechanism_drivers = openvswitch  
[ml2_type_flat]  
[ml2_type_vlan]  
[ml2_type_gre]  
[ml2_type_vxlan]  
vni_ranges = 1001:2000  
vxlan_group = 239.1.1.1  
[database]  
sql_connection = mysql://neutron:83105f1d6ded47cc@10.142.0.101/neutron_ml2  
[securitygroup]  
firewall_driver = dummy_value_to_enable_security_groups_in_server  
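
If you prefer not to edit the file by hand, the same values can be set with openstack-config from the openstack-utils package. The snippet below is only an equivalent sketch of the settings above; <db-password> is a placeholder for the real neutron database password.

openstack-config --set /etc/neutron/plugin.ini ml2 type_drivers vxlan
openstack-config --set /etc/neutron/plugin.ini ml2 tenant_network_types vxlan
openstack-config --set /etc/neutron/plugin.ini ml2 mechanism_drivers openvswitch
openstack-config --set /etc/neutron/plugin.ini ml2_type_vxlan vni_ranges 1001:2000
openstack-config --set /etc/neutron/plugin.ini ml2_type_vxlan vxlan_group 239.1.1.1
openstack-config --set /etc/neutron/plugin.ini database sql_connection mysql://neutron:<db-password>@10.142.0.101/neutron_ml2
openstack-config --set /etc/neutron/plugin.ini securitygroup firewall_driver dummy_value_to_enable_security_groups_in_server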

Edit /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini on every node

[OVS]  
vxlan_udp_port=4789  
tunnel_type=vxlan  
tunnel_id_ranges=1001:2000  
tenant_network_type=vxlan  
local_ip=10.142.255.101      #Use 102 for compute-1, 103 for compute-2  
enable_tunneling=True  
[AGENT]  
tunnel_types = vxlan  
polling_interval=2  
[SECURITYGROUP]  
firewall_driver=neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver  
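
Since only local_ip differs between the nodes, the per-node settings can also be scripted. This is just a sketch, again assuming openstack-utils is installed; set LOCAL_IP to the node's own VXLAN IP before running it.

LOCAL_IP=10.142.255.101    #10.142.255.102 on compute-1, 10.142.255.103 on compute-2
openstack-config --set /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini OVS local_ip $LOCAL_IP
openstack-config --set /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini OVS enable_tunneling True
openstack-config --set /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini AGENT tunnel_types vxlan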

Create the ML2 database and run the migration:

mysql -e "drop database if exists neutron_ml2;"  
mysql -e "create database neutron_ml2 character set utf8;"  
mysql -e "grant all on neutron_ml2.* to 'neutron'@'%';"  
neutron-db-manage --config-file /usr/share/neutron/neutron-dist.conf \
--config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini \
upgrade head  
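
To confirm the migration populated the new database, a quick check is to list the ML2 tables it should have created (table names are from the Havana ML2 schema, e.g. ml2_vxlan_allocations and ml2_vxlan_endpoints):

mysql neutron_ml2 -e "show tables;" | grep ml2    #Expect the ml2_* tables mentioned above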

Restart neutron services

service neutron-server restart  
service neutron-openvswitch-agent restart    #On every node  
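
After the restart, the agents should report in. A quick check from the controller:

neutron agent-list    #The Open vSwitch agents on all 3 nodes should show as alive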

Check if the tunnels are established on the controller:

[root@controller ~]# ovs-vsctl show  
…  
Bridge br-tun
    Port patch-int
        Interface patch-int
            type: patch
            options: {peer=patch-tun}
    Port br-tun
        Interface br-tun
            type: internal
    Port "vxlan-10.142.255.103"
        Interface "vxlan-10.142.255.103"
            type: vxlan
            options: {in_key=flow, local_ip="10.142.255.101", out_key=flow, remote_ip="10.142.255.103"}
    Port "vxlan-10.142.255.102"
        Interface "vxlan-10.142.255.102"
            type: vxlan
            options: {in_key=flow, local_ip="10.142.255.101", out_key=flow, remote_ip="10.142.255.102"}
…

We see 2 tunnels created from the controller to compute-1 and compute-2.
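
If a tunnel port is missing or misbehaving, its details can be inspected directly, for example:

ovs-vsctl list interface vxlan-10.142.255.102    #Shows the type, options and state of that tunnel port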

Check if the tunnels are established on a compute node (compute-1 as an example):

[root@compute ~]# ovs-vsctl show  
…  
Bridge br-tun
    Port "vxlan-10.142.255.103"
        Interface "vxlan-10.142.255.103"
            type: vxlan
            options: {in_key=flow, local_ip="10.142.255.102", out_key=flow, remote_ip="10.142.255.103"}
    Port "vxlan-10.142.255.101"
        Interface "vxlan-10.142.255.101"
            type: vxlan
            options: {in_key=flow, local_ip="10.142.255.102", out_key=flow, remote_ip="10.142.255.101"}
…

Similarly, 2 tunnels are created from compute-1 to the controller and to compute-2.

Launch some VM instances to test how VM traffic goes through the VXLAN tunnels.

nova boot --flavor 1 --image cirros --num-instances 2 --nic net-id=<net-uuid> vm  
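
The <net-uuid> placeholder is the UUID of the tenant network, which can be looked up first (testnet is the network name used in this setup):

neutron net-list    #Copy the id column of the testnet network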

Each compute node should take one VM:

[root@controller ~]# nova list --fields name,status,power_state,host,networks  
+--------------------------------------+-----------------------------------------+--------+-------------+----------+---------------------+
| ID                                   | Name                                    | Status | Power State | Host     | Networks            |
+--------------------------------------+-----------------------------------------+--------+-------------+----------+---------------------+
| 2bc01296-f8d4-48ce-a600-5acf83ee2bbf | vm-2bc01296-f8d4-48ce-a600-5acf83ee2bbf | ACTIVE | Running     | compute2 | testnet=192.168.1.2 |
| 4f63feaf-e92a-4045-8e45-d3160c99fb84 | vm-4f63feaf-e92a-4045-8e45-d3160c99fb84 | ACTIVE | Running     | compute  | testnet=192.168.1.4 |
+--------------------------------------+-----------------------------------------+--------+-------------+----------+---------------------+

2. Unicast packets of VM traffic

Send an ICMP packet from the controller (from the L3 agent's router namespace) to a VM, and capture the traffic to see the VXLAN packet.

[root@controller ~]# ip netns exec qrouter-28f6fe53-1f94-4355-81e3-85a2aad7b665 ping -c 1 192.168.1.4
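
To reproduce a capture like the screenshot below, something along these lines can be run on the controller's VXLAN-facing interface while the ping is sent; eth1 and the pcap file name are assumptions for this sketch.

tcpdump -i eth1 -nn -w vxlan-icmp.pcap udp port 4789    #Open the pcap in Wireshark to decode the VXLAN header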

[Figure: VXLAN-VM-Unicast-ICMP]

We can see the outer IP header plus the VXLAN header with VNI 1000, then the inner IP header plus the ICMP payload. The VXLAN packet is unicast from the controller's local IP 10.142.255.101 to compute-1's local IP 10.142.255.102.

3. Broadcast or multicast packets of VM traffic

Neutron and Open vSwitch handle VM broadcast and multicast traffic in the same way; here we take broadcast as an example.

Send one broadcast ARP request from the controller (from the L3 agent's router namespace) to the VM network, and capture the traffic to see the VXLAN packets.

[root@controller ~]# ip netns exec qrouter-28f6fe53-1f94-4355-81e3-85a2aad7b665 arping -c 1 -I qr-f3d1a9ea-9a 192.168.1.4

[Figure: VXLAN-VM-BroadCast-ARP-0]

[Figure: VXLAN-VM-BroadCast-ARP]

We can see the broadcast is actually sent as 2 unicast packets from the controller to compute-1 and compute-2. Then compute-1, which hosts the VM with IP 192.168.1.4, replies to the ARP request.
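
This flooding behaviour is also visible in the br-tun flow table on the controller: the flood entry carries one output action per VXLAN port. A rough way to look at it:

ovs-ofctl dump-flows br-tun | grep output    #The flood flow should list both vxlan ports in its actions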

This picture from the official OpenStack documentation explains the situation:

[Figure: ml2_without_l2pop_full_mesh]

This kind of packet flooding is obviously not ideal, especially in a large deployment. However, OpenStack supports better mechanisms to handle this; see the following chapters.

4. VXLAN uses multicast between VTEPs for VM broadcast/multicast traffic

According to the VXLAN specification draft, chapter 4.2:
http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-08#page-9

If the VM traffic is broadcast or multicast, VXLAN VTEPs use IP multicast to distribute it among themselves.

Linux Bridge supports this way of working.

However, Open vSwitch does not support it; as we saw in the last chapter, it uses multiple unicast packets between VTEPs instead.

See the Open vSwitch FAQ:
http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob_plain;f=FAQ;hb=HEAD

Q: How much of the VXLAN protocol does Open vSwitch currently support?

A: Open vSwitch currently supports the framing format for packets on the
wire. There is currently no support for the multicast aspects of VXLAN.
To get around the lack of multicast support, it is possible to
pre-provision MAC to IP address mappings either manually or from a
controller.

So the vxlan_group = <a multicast IP, e.g. 239.1.1.1> setting in /etc/neutron/plugins/ml2/ml2_conf.ini only applies to the Linux Bridge mechanism driver.

5. ML2 L2 population mechanism driver

From the VM broadcast/multicast handling scenario above, we know there are many useless broadcast-emulation (unicast) packets flying around between VTEPs. The L2 population mechanism driver was introduced to solve this flooding.

When the ML2 plugin is used with tunnels and a new port goes up, ML2 sends an update_port_postcommit notification, which is picked up and processed by the l2pop mechanism driver. l2pop then gathers the IP and MAC of the port, as well as the host the port was scheduled on, and sends an RPC notification to all layer 2 agents.

So every agent learns the MAC-IP-VTEP mappings. When a broadcast packet comes from a VM, the VTEP sends it only to the relevant VTEPs, with no multicast emulation needed.
This picture explains it:

[Figure: ml2_without_l2pop_partial_mesh]

Let’s go through how it works.

Add l2population to the mechanism_drivers list in /etc/neutron/plugins/ml2/ml2_conf.ini on the controller node.

[ml2]  
mechanism_drivers = openvswitch,l2population  

Enable l2population on every Open vSwitch agent node, in /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini:

[AGENT]  
l2_population = True  

Restart the neutron-server and Open vSwitch agent services on the controller and compute nodes.

service neutron-server restart    #on controller  
service neutron-openvswitch-agent restart   #on all nodes  

Now L2 population should start to work.
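
A first hint that it works: after booting a VM, each agent pre-populates per-MAC unicast flows in br-tun instead of relying on flooding alone. As a rough check (table 20 is the unicast-to-tunnel table used by the Havana OVS agent; the exact number is an implementation detail and may differ in other releases):

ovs-ofctl dump-flows br-tun table=20    #Expect one flow per remote VM MAC, pointing at a single vxlan port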

…To be continued with detailed packet level analysis…