How L3 and DHCP agent HA works in Red Hat OSP7

In Red Hat OpenStack Platform 7, the l3-agent and dhcp-agent run active-active on every controller node, instead of active-standby as in OSP6.

[root@overcloud-controller-2 ~]# pcs status | grep 'l3-agent\|dhcp-agent'  -A1
 Clone Set: neutron-l3-agent-clone [neutron-l3-agent]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
--
 Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Let's look in more detail at how the tenant DHCP server and vRouter work in HA.

dhcp-agent HA

Let's create a tenant network with dhcp enabled:

[root@overcloud-controller-2 ~]# neutron net-create testdhcp
[root@overcloud-controller-2 ~]# neutron subnet-create  --name subnet-testdhcp testdhcp 192.168.200.0/24

Because we have this line in /etc/neutron/neutron.conf:

[root@overcloud-controller-0 ~]# grep ^dhcp_agents_per_network /etc/neutron/neutron.conf
dhcp_agents_per_network = 3  

So we have 3 DHCP servers running, one on each of the 3 controller nodes, and the same namespace is created on each of them.

[root@overcloud-controller-0 ~]# for i in 0 1 2; do ssh overcloud-controller-$i ip netns; done
qdhcp-3f3f6372-0c96-4521-9d60-2524c139ab72  
qdhcp-3f3f6372-0c96-4521-9d60-2524c139ab72  
qdhcp-3f3f6372-0c96-4521-9d60-2524c139ab72  

We can see one dnsmasq server running on each controller, all with the same hosts file:

[root@overcloud-controller-0 ~]# for i in 0 1 2; do echo "On overcloud-controller-$i:" ; ssh overcloud-controller-$i ip netns exec qdhcp-3f3f6372-0c96-4521-9d60-2524c139ab72 ps -ef|grep dnsmasq; done
On overcloud-controller-0:  
root      1181 27974  0 09:05 pts/0    00:00:00 grep --color=auto dnsmasq  
nobody    6585     1  0 08:27 ?        00:00:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tapf1509d9f-57 --except-interface=lo --pid-file=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/host --addn-hosts=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/opts --dhcp-leasefile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/leases --dhcp-range=set:tag0,192.168.200.0,static,86400s --dhcp-lease-max=256 --conf-file=/etc/neutron/dnsmasq-neutron.conf --domain=openstacklocal  
On overcloud-controller-1:  
nobody    6714     1  0 08:27 ?        00:00:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap97155ac6-9b --except-interface=lo --pid-file=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/host --addn-hosts=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/opts --dhcp-leasefile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/leases --dhcp-range=set:tag0,192.168.200.0,static,86400s --dhcp-lease-max=256 --conf-file=/etc/neutron/dnsmasq-neutron.conf --domain=openstacklocal  
On overcloud-controller-2:  
nobody   21166     1  0 08:27 ?        00:00:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap05b08c68-49 --except-interface=lo --pid-file=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/host --addn-hosts=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/opts --dhcp-leasefile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/leases --dhcp-range=set:tag0,192.168.200.0,static,86400s --dhcp-lease-max=256 --conf-file=/etc/neutron/dnsmasq-neutron.conf --domain=openstacklocal

[root@overcloud-controller-0 ~]# for i in 0 1 2; do echo "On overcloud-controller-$i:" ;ssh overcloud-controller-$i ip netns exec qdhcp-3f3f6372-0c96-4521-9d60-2524c139ab72 cat  /var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/host ; done
On overcloud-controller-0:  
fa:16:3e:98:b5:e4,host-192-168-200-2.openstacklocal,192.168.200.2  
fa:16:3e:f8:8c:4e,host-192-168-200-4.openstacklocal,192.168.200.4  
fa:16:3e:d7:e4:22,host-192-168-200-3.openstacklocal,192.168.200.3  
On overcloud-controller-1:  
fa:16:3e:98:b5:e4,host-192-168-200-2.openstacklocal,192.168.200.2  
fa:16:3e:f8:8c:4e,host-192-168-200-4.openstacklocal,192.168.200.4  
fa:16:3e:d7:e4:22,host-192-168-200-3.openstacklocal,192.168.200.3  
On overcloud-controller-2:  
fa:16:3e:98:b5:e4,host-192-168-200-2.openstacklocal,192.168.200.2  
fa:16:3e:f8:8c:4e,host-192-168-200-4.openstacklocal,192.168.200.4  
fa:16:3e:d7:e4:22,host-192-168-200-3.openstacklocal,192.168.200.3  
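
Comparing the three hosts files by eye works for three entries but not for a busy network. The following is a minimal sketch of automating that comparison. The function name all_identical and the /tmp/host.N paths in the usage comment are hypothetical, and the sketch assumes the entries appear in the same order on every controller, as they do in the output above.

```shell
# all_identical FILE...: succeed only if every given file has the same
# checksum and size. Returns 2 if no files are given or one is unreadable.
all_identical() {
    [ "$#" -gt 0 ] || return 2
    # Refuse to answer if any file is unreadable, so a missing copy
    # cannot be mistaken for a match.
    for f in "$@"; do
        [ -r "$f" ] || return 2
    done
    # cksum prints "<crc> <size> <name>"; keep crc and size only and
    # succeed when exactly one distinct pair remains.
    cksum "$@" | awk '{ print $1, $2 }' | sort -u |
        awk 'END { exit (NR == 1) ? 0 : 1 }'
}

# Hypothetical usage: copy each controller's hosts file locally, then compare.
# NET=3f3f6372-0c96-4521-9d60-2524c139ab72
# for i in 0 1 2; do
#     ssh overcloud-controller-$i cat /var/lib/neutron/dhcp/$NET/host > /tmp/host.$i
# done
# all_identical /tmp/host.0 /tmp/host.1 /tmp/host.2 && echo "hosts files match"
```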

We can also see 3 neutron ports, one for each DHCP server IP:

[root@overcloud-controller-0 ~]# neutron port-list
+--------------------------------------+------+-------------------+--------------------------------------------------------------------------------------+
| id                                   | name | mac_address       | fixed_ips                                                                            |
+--------------------------------------+------+-------------------+--------------------------------------------------------------------------------------+
| 05b08c68-4963-4003-ba83-46175bb72d24 |      | fa:16:3e:98:b5:e4 | {"subnet_id": "183a7323-a015-4eb5-8108-9e1295dfbe42", "ip_address": "192.168.200.2"} |
| 97155ac6-9bc6-42ce-98a1-cb2c868868eb |      | fa:16:3e:f8:8c:4e | {"subnet_id": "183a7323-a015-4eb5-8108-9e1295dfbe42", "ip_address": "192.168.200.4"} |
| f1509d9f-5704-4114-a81b-e328d1076419 |      | fa:16:3e:d7:e4:22 | {"subnet_id": "183a7323-a015-4eb5-8108-9e1295dfbe42", "ip_address": "192.168.200.3"} |
+--------------------------------------+------+-------------------+--------------------------------------------------------------------------------------+

So in OSP7, tenant DHCP HA is achieved by running 3 DHCP servers at the same time. If one controller goes down, the other 2 DHCP servers keep running and serving DHCP requests.
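
To see how many DHCP agents are still serving a network after a controller failure, the agent table printed by the neutron CLI can be counted. This is only a sketch: count_alive is a hypothetical helper, and it assumes that neutron dhcp-agent-list-hosting-net prints a table with an "alive" column using the same ":-)" / "xxx" markers that appear in the l3-agent tables in this post.

```shell
# count_alive: read a neutron CLI agent table on stdin and print how many
# rows show ":-)" in the "alive" column.
count_alive() {
    awk -F'|' '
        NF < 2 { next }                  # skip +----+ border lines
        !col {                           # first table row is the header
            for (i = 1; i <= NF; i++) {
                name = $i; gsub(/ /, "", name)
                if (name == "alive") col = i
            }
            next
        }
        { v = $col; gsub(/ /, "", v); if (v == ":-)") n++ }
        END { print n + 0 }
    '
}

# Assumed usage (the output format is an assumption, see above):
# neutron dhcp-agent-list-hosting-net testdhcp | count_alive
```

With dhcp_agents_per_network = 3 and one controller down, the expected count would be 2.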

vRouter (l3-agent) HA

Let's create a vRouter:

[root@overcloud-controller-0 ~]# neutron router-create testrouter
Created a new router:  
+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | 2be95cbd-efee-4908-90cf-622fcef8cae8 |
| name                  | testrouter                           |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | c5cb88bd612949a5afaed8acf79350ef     |
+-----------------------+--------------------------------------+

We can see ha is True, because neutron.conf has this:

[root@overcloud-controller-0 ~]# grep ^l3_ha /etc/neutron/neutron.conf
l3_ha = True  
l3_ha_net_cidr = 169.254.192.0/18  

A "HA network" is also created, using the 169.254.192.0/18 network configured in neutron.conf:

[root@overcloud-controller-0 ~]# neutron net-list
+--------------------------------------+----------------------------------------------------+-------------------------------------------------------+
| id                                   | name                                               | subnets                                               |
+--------------------------------------+----------------------------------------------------+-------------------------------------------------------+
| 3f3f6372-0c96-4521-9d60-2524c139ab72 | testdhcp                                           | 183a7323-a015-4eb5-8108-9e1295dfbe42 192.168.200.0/24 |
| d160978b-fa7d-4d3e-bb45-a9ba1d98439f | HA network tenant c5cb88bd612949a5afaed8acf79350ef | 6ecc49ee-2dd9-47eb-9593-fa6fc55469a3 169.254.192.0/18 |
+--------------------------------------+----------------------------------------------------+-------------------------------------------------------+

We can see 3 ports created in this HA network, one per controller:

[root@overcloud-controller-0 ~]# neutron router-port-list testrouter
+--------------------------------------+-------------------------------------------------+-------------------+--------------------------------------------------------------------------------------+
| id                                   | name                                            | mac_address       | fixed_ips                                                                            |
+--------------------------------------+-------------------------------------------------+-------------------+--------------------------------------------------------------------------------------+
| 2dba7c38-0547-4d77-806c-e9bf9acb8a55 | HA port tenant c5cb88bd612949a5afaed8acf79350ef | fa:16:3e:3d:5b:95 | {"subnet_id": "6ecc49ee-2dd9-47eb-9593-fa6fc55469a3", "ip_address": "169.254.192.2"} |
| 5d00bea2-3dbe-4ea6-97bb-65ce74fb056b | HA port tenant c5cb88bd612949a5afaed8acf79350ef | fa:16:3e:fa:8c:8b | {"subnet_id": "6ecc49ee-2dd9-47eb-9593-fa6fc55469a3", "ip_address": "169.254.192.1"} |
| 835b136c-ec0b-4f5d-a96c-145de332efb6 | HA port tenant c5cb88bd612949a5afaed8acf79350ef | fa:16:3e:e7:22:d0 | {"subnet_id": "6ecc49ee-2dd9-47eb-9593-fa6fc55469a3", "ip_address": "169.254.192.3"} |
+--------------------------------------+-------------------------------------------------+-------------------+--------------------------------------------------------------------------------------+

A keepalived/VRRP process is running for this HA router:

[root@overcloud-controller-0 ~]# ps -ef | grep keepalived
neutron  22025     1  0 02:35 ?        00:00:00 /usr/bin/python2 /bin/neutron-keepalived-state-change --router_id=2be95cbd-efee-4908-90cf-622fcef8cae8 --namespace=qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 --conf_dir=/var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8 --monitor_interface=ha-835b136c-ec --monitor_cidr=169.254.0.1/24 --pid_file=/var/lib/neutron/external/pids/2be95cbd-efee-4908-90cf-622fcef8cae8.monitor.pid --state_path=/var/lib/neutron --user=998 --group=996  
root     22041 20724  0 03:15 pts/0    00:00:00 grep --color=auto keepalived  
root     22047     1  0 02:35 ?        00:00:00 keepalived -P -f /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8/keepalived.conf -p /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8.pid -r /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8.pid-vrrp  
root     22049 22047  0 02:35 ?        00:00:00 keepalived -P -f /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8/keepalived.conf -p /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8.pid -r /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8.pid-vrrp  

Let's check the keepalived.conf of this HA router:

[root@overcloud-controller-0 ~]# cat /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8/keepalived.conf
vrrp_instance VR_1 {  
    state BACKUP
    interface ha-835b136c-ec
    virtual_router_id 1
    priority 50
    garp_master_repeat 5
    garp_master_refresh 10
    nopreempt
    advert_int 2
    track_interface {
        ha-835b136c-ec
    }
    virtual_ipaddress {
        169.254.0.1/24 dev ha-835b136c-ec
    }

We can see one internal virtual_ipaddress, 169.254.0.1, defined for this router, and it is currently held by overcloud-controller-0:

[root@overcloud-controller-0 ~]# ip netns  exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a | grep " inet 16"
    inet 169.254.192.3/18 brd 169.254.255.255 scope global ha-835b136c-ec
    inet 169.254.0.1/24 scope global ha-835b136c-ec

On the other 2 controllers we can't see this VIP; only the keepalived/VRRP HA port IP is present:

[root@overcloud-controller-0 ~]# ssh overcloud-controller-1 ip netns  exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a | grep " inet 16"
    inet 169.254.192.2/18 brd 169.254.255.255 scope global ha-2dba7c38-05

[root@overcloud-controller-0 ~]# ssh overcloud-controller-2 ip netns  exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a | grep " inet 16"
    inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-5d00bea2-3d

This can be confirmed with neutron l3-agent-list-hosting-router:

[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router testrouter
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| a7b76ad9-83a7-4c2c-a9ef-59e60e175a81 | overcloud-controller-1.localdomain | True           | :-)   | standby  |
| 85836d82-a187-424a-9816-a3db79e0bb8b | overcloud-controller-2.localdomain | True           | :-)   | standby  |
| 51e840f0-5d52-4b9c-a1a5-52b976abff7d | overcloud-controller-0.localdomain | True           | :-)   | active   |
+--------------------------------------+------------------------------------+----------------+-------+----------+

We can see testrouter is active on overcloud-controller-0 and standby on the other 2 controllers.
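
For scripting, the active node can be pulled out of that table. The sketch below assumes the table layout shown above (columns named host, alive and ha_state); active_l3_host is a hypothetical helper name. It only reports a host whose agent is also alive, which matters during failover, when a dead agent can still be listed with a stale "active" state.

```shell
# active_l3_host: read "neutron l3-agent-list-hosting-router" output on
# stdin and print each host that is both alive and in ha_state "active".
active_l3_host() {
    awk -F'|' '
        NF < 2 { next }                  # skip +----+ border lines
        !state_col {                     # first table row is the header
            for (i = 1; i <= NF; i++) {
                name = $i; gsub(/ /, "", name)
                if (name == "host")     host_col = i
                if (name == "alive")    alive_col = i
                if (name == "ha_state") state_col = i
            }
            next
        }
        {
            host = $host_col; alive = $alive_col; state = $state_col
            gsub(/ /, "", host); gsub(/ /, "", alive); gsub(/ /, "", state)
            if (state == "active" && alive == ":-)") print host
        }
    '
}

# Usage:
# neutron l3-agent-list-hosting-router testrouter | active_l3_host
```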

Now let's add the testdhcp subnet to testrouter:

[root@overcloud-controller-0 ~]# neutron router-interface-add testrouter subnet-testdhcp
Added interface 946f1e38-2ef3-4747-8a81-60b14909d8c0 to router testrouter.  

Check the output of ip a in the vRouter namespace on overcloud-controller-0:

[root@overcloud-controller-0 ~]# ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a |grep " inet "
    inet 127.0.0.1/8 scope host lo
    inet 169.254.192.3/18 brd 169.254.255.255 scope global ha-835b136c-ec
    inet 169.254.0.1/24 scope global ha-835b136c-ec
    inet 192.168.200.1/24 scope global qr-946f1e38-2e

We can see that the gateway IP of the testdhcp network, 192.168.200.1, is now configured on overcloud-controller-0.

We should also see that keepalived.conf has been updated:

[root@overcloud-controller-0 ~]# cat /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8/keepalived.conf
vrrp_instance VR_1 {  
    state BACKUP
    interface ha-835b136c-ec
    virtual_router_id 1
    priority 50
    garp_master_repeat 5
    garp_master_refresh 10
    nopreempt
    advert_int 2
    track_interface {
        ha-835b136c-ec
    }
    virtual_ipaddress {
        169.254.0.1/24 dev ha-835b136c-ec
    }
    virtual_ipaddress_excluded {
        192.168.200.1/24 dev qr-946f1e38-2e
        fe80::f816:3eff:fe1a:58d8/64 dev qr-946f1e38-2e scope link
    }
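
The addresses that follow the VRRP master are the ones listed in the virtual_ipaddress and virtual_ipaddress_excluded blocks. As a rough sketch, they can be extracted from the generated file like this. list_vips is a hypothetical helper, and it assumes the layout shown above, with the opening brace on the same line as the block name and one address per line.

```shell
# list_vips FILE: print the addresses found inside the virtual_ipaddress
# and virtual_ipaddress_excluded blocks of a keepalived.conf.
list_vips() {
    awk '
        # Block opener, e.g. "virtual_ipaddress {".
        $1 ~ /^virtual_ipaddress(_excluded)?$/ && $2 == "{" { in_block = 1; next }
        # Closing brace ends the current address block.
        in_block && $1 == "}" { in_block = 0; next }
        # Inside a block the first field is the address with its prefix.
        in_block && NF { print $1 }
    ' "$1"
}

# Usage:
# list_vips /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8/keepalived.conf
```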

[Figure: how L3-agent HA works]

Let's see how the vRouter fails over when the active controller goes down. We shut down the active node, overcloud-controller-0:

[root@overcloud-controller-0 ~]# shutdown now
Connection to 192.0.2.7 closed by remote host.  

Now check which node takes over testrouter:

[root@overcloud-controller-1 ~]# neutron l3-agent-list-hosting-router testrouter
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| a7b76ad9-83a7-4c2c-a9ef-59e60e175a81 | overcloud-controller-1.localdomain | True           | :-)   | active   |
| 85836d82-a187-424a-9816-a3db79e0bb8b | overcloud-controller-2.localdomain | True           | :-)   | standby  |
| 51e840f0-5d52-4b9c-a1a5-52b976abff7d | overcloud-controller-0.localdomain | True           | xxx   | active   |
+--------------------------------------+------------------------------------+----------------+-------+----------+

We can see overcloud-controller-0 is no longer alive, and overcloud-controller-1 is now active. Let's check whether the virtual IP has moved to the new active node:

[root@overcloud-controller-1 ~]# ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a |grep " inet "
    inet 127.0.0.1/8 scope host lo
    inet 169.254.192.2/18 brd 169.254.255.255 scope global ha-2dba7c38-05
    inet 169.254.0.1/24 scope global ha-2dba7c38-05
    inet 192.168.200.1/24 scope global qr-946f1e38-2e

We can see the virtual IP 192.168.200.1 is now on overcloud-controller-1.

For this vRouter, overcloud-controller-0 is still listed as active even though it is not alive, while the virtual IP has moved to overcloud-controller-1. Now let's bring overcloud-controller-0 back online and see what happens.

Once it's online, check again:
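
The same check can be scripted so that it works on any controller. The helper below is a sketch with a hypothetical name, has_addr. It reads ip a output on stdin and succeeds only if the given IPv4 address is configured.

```shell
# has_addr ADDR: succeed if "ip a" output on stdin contains the IPv4
# address ADDR (compared without the /prefix length).
has_addr() {
    awk -v addr="$1" '
        $1 == "inet" { split($2, parts, "/"); if (parts[1] == addr) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage, run on the controller being checked:
# ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a |
#     has_addr 192.168.200.1 && echo "this node holds the gateway IP"
```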

[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router testrouter
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| a7b76ad9-83a7-4c2c-a9ef-59e60e175a81 | overcloud-controller-1.localdomain | True           | :-)   | active   |
| 85836d82-a187-424a-9816-a3db79e0bb8b | overcloud-controller-2.localdomain | True           | :-)   | standby  |
| 51e840f0-5d52-4b9c-a1a5-52b976abff7d | overcloud-controller-0.localdomain | True           | :-)   | standby  |
+--------------------------------------+------------------------------------+----------------+-------+----------+

[root@overcloud-controller-0 ~]# ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a |grep " inet "
    inet 127.0.0.1/8 scope host lo
    inet 169.254.192.3/18 brd 169.254.255.255 scope global ha-835b136c-ec

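Now overcloud-controller-0 is alive and in standby state (as expected), and the virtual IP is no longer present in its namespace. Because keepalived.conf sets nopreempt, the returning node does not take the router back from overcloud-controller-1.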