How L3 and DHCP agents HA work in Red Hat OSP7
In Red Hat OpenStack Platform 7, the neutron l3-agent and dhcp-agent run active-active on all controller nodes, instead of active-standby as in OSP6.
[root@overcloud-controller-0 ~]# pcs status | grep 'l3-agent\|dhcp-agent' -A1
Clone Set: neutron-l3-agent-clone [neutron-l3-agent]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
--
Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
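Besides pcs, agent health can also be checked from the neutron API side. For example (illustrative, assuming the overcloud admin credentials are sourced):
[root@overcloud-controller-0 ~]# neutron agent-list | grep 'L3 agent\|DHCP agent'
Each controller should show one L3 agent and one DHCP agent with alive = :-).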
Let's look in more detail at how the tenant DHCP servers and virtual routers achieve HA.
dhcp-agent HA
Let's create a tenant network with DHCP enabled:
[root@overcloud-controller-0 ~]# neutron net-create testdhcp
[root@overcloud-controller-0 ~]# neutron subnet-create --name subnet-testdhcp testdhcp 192.168.200.0/24
That is because we have this line in /etc/neutron/neutron.conf:
[root@overcloud-controller-0 ~]# grep ^dhcp_agents_per_network /etc/neutron/neutron.conf
dhcp_agents_per_network = 3
So we have 3 DHCP servers running on the 3 controller nodes, and the same qdhcp namespace gets created on each of them.
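Before poking at the namespaces, the scheduling can also be confirmed from the API side (illustrative, assuming admin credentials are loaded):
[root@overcloud-controller-0 ~]# neutron dhcp-agent-list-hosting-net testdhcp
This should list the DHCP agents of all 3 controllers as hosting this network.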
[root@overcloud-controller-0 ~]# for i in 0 1 2; do ssh overcloud-controller-$i ip netns; done
qdhcp-3f3f6372-0c96-4521-9d60-2524c139ab72
qdhcp-3f3f6372-0c96-4521-9d60-2524c139ab72
qdhcp-3f3f6372-0c96-4521-9d60-2524c139ab72
We can see a dnsmasq server with the same hosts file running on each controller:
[root@overcloud-controller-0 ~]# for i in 0 1 2; do echo "On overcloud-controller-$i:" ; ssh overcloud-controller-$i ip netns exec qdhcp-3f3f6372-0c96-4521-9d60-2524c139ab72 ps -ef|grep dnsmasq; done
On overcloud-controller-0:
root 1181 27974 0 09:05 pts/0 00:00:00 grep --color=auto dnsmasq
nobody 6585 1 0 08:27 ? 00:00:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tapf1509d9f-57 --except-interface=lo --pid-file=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/host --addn-hosts=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/opts --dhcp-leasefile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/leases --dhcp-range=set:tag0,192.168.200.0,static,86400s --dhcp-lease-max=256 --conf-file=/etc/neutron/dnsmasq-neutron.conf --domain=openstacklocal
On overcloud-controller-1:
nobody 6714 1 0 08:27 ? 00:00:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap97155ac6-9b --except-interface=lo --pid-file=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/host --addn-hosts=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/opts --dhcp-leasefile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/leases --dhcp-range=set:tag0,192.168.200.0,static,86400s --dhcp-lease-max=256 --conf-file=/etc/neutron/dnsmasq-neutron.conf --domain=openstacklocal
On overcloud-controller-2:
nobody 21166 1 0 08:27 ? 00:00:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap05b08c68-49 --except-interface=lo --pid-file=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/host --addn-hosts=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/opts --dhcp-leasefile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/leases --dhcp-range=set:tag0,192.168.200.0,static,86400s --dhcp-lease-max=256 --conf-file=/etc/neutron/dnsmasq-neutron.conf --domain=openstacklocal
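Each dnsmasq instance also keeps its own lease database; the path can be read straight from the --dhcp-leasefile argument above (a sketch, reusing this network's UUID):
[root@overcloud-controller-0 ~]# cat /var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/leases
Because the --dhcp-range is static and all 3 servers share the same hosts file (compared below), a client gets the same fixed IP no matter which server answers first.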
[root@overcloud-controller-0 ~]# for i in 0 1 2; do echo "On overcloud-controller-$i:" ;ssh overcloud-controller-$i ip netns exec qdhcp-3f3f6372-0c96-4521-9d60-2524c139ab72 cat /var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/host ; done
On overcloud-controller-0:
fa:16:3e:98:b5:e4,host-192-168-200-2.openstacklocal,192.168.200.2
fa:16:3e:f8:8c:4e,host-192-168-200-4.openstacklocal,192.168.200.4
fa:16:3e:d7:e4:22,host-192-168-200-3.openstacklocal,192.168.200.3
On overcloud-controller-1:
fa:16:3e:98:b5:e4,host-192-168-200-2.openstacklocal,192.168.200.2
fa:16:3e:f8:8c:4e,host-192-168-200-4.openstacklocal,192.168.200.4
fa:16:3e:d7:e4:22,host-192-168-200-3.openstacklocal,192.168.200.3
On overcloud-controller-2:
fa:16:3e:98:b5:e4,host-192-168-200-2.openstacklocal,192.168.200.2
fa:16:3e:f8:8c:4e,host-192-168-200-4.openstacklocal,192.168.200.4
fa:16:3e:d7:e4:22,host-192-168-200-3.openstacklocal,192.168.200.3
We can also see the 3 neutron ports holding the 3 DHCP server IPs:
[root@overcloud-controller-0 ~]# neutron port-list
+--------------------------------------+------+-------------------+--------------------------------------------------------------------------------------+
| id | name | mac_address | fixed_ips |
+--------------------------------------+------+-------------------+--------------------------------------------------------------------------------------+
| 05b08c68-4963-4003-ba83-46175bb72d24 | | fa:16:3e:98:b5:e4 | {"subnet_id": "183a7323-a015-4eb5-8108-9e1295dfbe42", "ip_address": "192.168.200.2"} |
| 97155ac6-9bc6-42ce-98a1-cb2c868868eb | | fa:16:3e:f8:8c:4e | {"subnet_id": "183a7323-a015-4eb5-8108-9e1295dfbe42", "ip_address": "192.168.200.4"} |
| f1509d9f-5704-4114-a81b-e328d1076419 | | fa:16:3e:d7:e4:22 | {"subnet_id": "183a7323-a015-4eb5-8108-9e1295dfbe42", "ip_address": "192.168.200.3"} |
+--------------------------------------+------+-------------------+--------------------------------------------------------------------------------------+
So in OSP7, tenant DHCP HA is achieved by running 3 DHCP servers at the same time: if one controller goes down, the other 2 DHCP servers keep running and serving DHCP requests.
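The replica count is tunable via the same option. A sketch using openstack-config from openstack-utils (apply the change on every controller, then restart neutron-server, whose scheduler reads this value; the pacemaker resource name here is assumed to follow the clone naming seen in pcs status above):
[root@overcloud-controller-0 ~]# openstack-config --set /etc/neutron/neutron.conf DEFAULT dhcp_agents_per_network 2
[root@overcloud-controller-0 ~]# pcs resource restart neutron-server-clone
Networks scheduled after this change should get 2 DHCP agents instead of 3; existing networks keep their current agents.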
vRouter (l3-agent) HA
Let's create a vRouter:
[root@overcloud-controller-0 ~]# neutron router-create testrouter
Created a new router:
+-----------------------+--------------------------------------+
| Field | Value |
+-----------------------+--------------------------------------+
| admin_state_up | True |
| distributed | False |
| external_gateway_info | |
| ha | True |
| id | 2be95cbd-efee-4908-90cf-622fcef8cae8 |
| name | testrouter |
| routes | |
| status | ACTIVE |
| tenant_id | c5cb88bd612949a5afaed8acf79350ef |
+-----------------------+--------------------------------------+
We can see ha is True, because we have this in /etc/neutron/neutron.conf:
[root@overcloud-controller-0 ~]# grep ^l3_ha /etc/neutron/neutron.conf
l3_ha = True
l3_ha_net_cidr = 169.254.192.0/18
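Two related options control how many l3-agents host each HA router; shown below with their upstream Kilo defaults (a sketch, not taken from this deployment's file):
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2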
And a "HA network" is created using 169.254.192.0/18
network as configured in neutron.conf
:
[[email protected] ~]# neutron net-list
+--------------------------------------+----------------------------------------------------+-------------------------------------------------------+
| id | name | subnets |
+--------------------------------------+----------------------------------------------------+-------------------------------------------------------+
| 3f3f6372-0c96-4521-9d60-2524c139ab72 | testdhcp | 183a7323-a015-4eb5-8108-9e1295dfbe42 192.168.200.0/24 |
| d160978b-fa7d-4d3e-bb45-a9ba1d98439f | HA network tenant c5cb88bd612949a5afaed8acf79350ef | 6ecc49ee-2dd9-47eb-9593-fa6fc55469a3 169.254.192.0/18 |
+--------------------------------------+----------------------------------------------------+-------------------------------------------------------+
We can see 3 ports created for 3 controllers in this HA network:
[root@overcloud-controller-0 ~]# neutron router-port-list testrouter
+--------------------------------------+-------------------------------------------------+-------------------+--------------------------------------------------------------------------------------+
| id | name | mac_address | fixed_ips |
+--------------------------------------+-------------------------------------------------+-------------------+--------------------------------------------------------------------------------------+
| 2dba7c38-0547-4d77-806c-e9bf9acb8a55 | HA port tenant c5cb88bd612949a5afaed8acf79350ef | fa:16:3e:3d:5b:95 | {"subnet_id": "6ecc49ee-2dd9-47eb-9593-fa6fc55469a3", "ip_address": "169.254.192.2"} |
| 5d00bea2-3dbe-4ea6-97bb-65ce74fb056b | HA port tenant c5cb88bd612949a5afaed8acf79350ef | fa:16:3e:fa:8c:8b | {"subnet_id": "6ecc49ee-2dd9-47eb-9593-fa6fc55469a3", "ip_address": "169.254.192.1"} |
| 835b136c-ec0b-4f5d-a96c-145de332efb6 | HA port tenant c5cb88bd612949a5afaed8acf79350ef | fa:16:3e:e7:22:d0 | {"subnet_id": "6ecc49ee-2dd9-47eb-9593-fa6fc55469a3", "ip_address": "169.254.192.3"} |
+--------------------------------------+-------------------------------------------------+-------------------+--------------------------------------------------------------------------------------+
A keepalived (VRRP) process is running for this HA router:
[root@overcloud-controller-0 ~]# ps -ef | grep keepalived
neutron 22025 1 0 02:35 ? 00:00:00 /usr/bin/python2 /bin/neutron-keepalived-state-change --router_id=2be95cbd-efee-4908-90cf-622fcef8cae8 --namespace=qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 --conf_dir=/var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8 --monitor_interface=ha-835b136c-ec --monitor_cidr=169.254.0.1/24 --pid_file=/var/lib/neutron/external/pids/2be95cbd-efee-4908-90cf-622fcef8cae8.monitor.pid --state_path=/var/lib/neutron --user=998 --group=996
root 22041 20724 0 03:15 pts/0 00:00:00 grep --color=auto keepalived
root 22047 1 0 02:35 ? 00:00:00 keepalived -P -f /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8/keepalived.conf -p /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8.pid -r /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8.pid-vrrp
root 22049 22047 0 02:35 ? 00:00:00 keepalived -P -f /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8/keepalived.conf -p /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8.pid -r /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8.pid-vrrp
Let's check the keepalived.conf of this HA router:
[root@overcloud-controller-0 ~]# cat /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8/keepalived.conf
vrrp_instance VR_1 {
    state BACKUP
    interface ha-835b136c-ec
    virtual_router_id 1
    priority 50
    garp_master_repeat 5
    garp_master_refresh 10
    nopreempt
    advert_int 2
    track_interface {
        ha-835b136c-ec
    }
    virtual_ipaddress {
        169.254.0.1/24 dev ha-835b136c-ec
    }
}
We can see there is one internal virtual_ipaddress, 169.254.0.1, defined for this router, and it is currently running on overcloud-controller-0:
[root@overcloud-controller-0 ~]# ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a | grep " inet 16"
inet 169.254.192.3/18 brd 169.254.255.255 scope global ha-835b136c-ec
inet 169.254.0.1/24 scope global ha-835b136c-ec
On the other 2 controllers we can't see this VIP; only the keepalived/VRRP HA addresses are present:
[root@overcloud-controller-0 ~]# ssh overcloud-controller-1 ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a | grep " inet 16"
inet 169.254.192.2/18 brd 169.254.255.255 scope global ha-2dba7c38-05
[root@overcloud-controller-0 ~]# ssh overcloud-controller-2 ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a | grep " inet 16"
inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-5d00bea2-3d
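Under the hood the election runs as plain VRRP over this HA network; VRRP is IP protocol 112, so the advertisements can be watched from the master's namespace (a sketch, using this router's HA interface on overcloud-controller-0):
[root@overcloud-controller-0 ~]# ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 tcpdump -n -i ha-835b136c-ec ip proto 112
Per the keepalived.conf above (advert_int 2, virtual_router_id 1, priority 50), an advertisement for vrid 1 should appear every 2 seconds, sourced from 169.254.192.3.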
This can also be confirmed with neutron l3-agent-list-hosting-router:
[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router testrouter
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| a7b76ad9-83a7-4c2c-a9ef-59e60e175a81 | overcloud-controller-1.localdomain | True | :-) | standby |
| 85836d82-a187-424a-9816-a3db79e0bb8b | overcloud-controller-2.localdomain | True | :-) | standby |
| 51e840f0-5d52-4b9c-a1a5-52b976abff7d | overcloud-controller-0.localdomain | True | :-) | active |
+--------------------------------------+------------------------------------+----------------+-------+----------+
We can see that testrouter is active on overcloud-controller-0 and standby on the other 2 controllers.
Now let's add the testdhcp network to testrouter:
[root@overcloud-controller-0 ~]# neutron router-interface-add testrouter subnet-testdhcp
Added interface 946f1e38-2ef3-4747-8a81-60b14909d8c0 to router testrouter.
Check the output of ip a in the vRouter namespace on overcloud-controller-0:
[root@overcloud-controller-0 ~]# ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a |grep " inet "
inet 127.0.0.1/8 scope host lo
inet 169.254.192.3/18 brd 169.254.255.255 scope global ha-835b136c-ec
inet 169.254.0.1/24 scope global ha-835b136c-ec
inet 192.168.200.1/24 scope global qr-946f1e38-2e
We can see that the gateway IP of the testdhcp network, 192.168.200.1, is now running on overcloud-controller-0.
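A quick way to check which node currently answers for this gateway is to ARP-ping it from one of the qdhcp namespaces (a sketch; the tap interface is controller-0's, taken from the dnsmasq output earlier):
[root@overcloud-controller-0 ~]# ip netns exec qdhcp-3f3f6372-0c96-4521-9d60-2524c139ab72 arping -I tapf1509d9f-57 -c 3 192.168.200.1
The replies should carry the MAC of the qr-946f1e38-2e port, answered by whichever node is the current VRRP master.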
We should also see that keepalived.conf gets updated:
[root@overcloud-controller-0 ~]# cat /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8/keepalived.conf
vrrp_instance VR_1 {
    state BACKUP
    interface ha-835b136c-ec
    virtual_router_id 1
    priority 50
    garp_master_repeat 5
    garp_master_refresh 10
    nopreempt
    advert_int 2
    track_interface {
        ha-835b136c-ec
    }
    virtual_ipaddress {
        169.254.0.1/24 dev ha-835b136c-ec
    }
    virtual_ipaddress_excluded {
        192.168.200.1/24 dev qr-946f1e38-2e
        fe80::f816:3eff:fe1a:58d8/64 dev qr-946f1e38-2e scope link
    }
}
To summarize how l3-agent HA works: each HA router gets a dedicated HA network and one keepalived instance per controller; the keepalived instances elect a master over VRRP, and only the master configures the router's virtual IP addresses.
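Before testing a failover, note that the current VRRP role can also be read per node: the neutron-keepalived-state-change monitor seen in the process list keeps a state file next to keepalived.conf (path assumed from the ha_confs layout above):
[root@overcloud-controller-0 ~]# cat /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8/state
master
On the standby controllers the same file should read backup.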
Let's see how the vRouter fails over when the active controller goes down. Now we shut down the active node, overcloud-controller-0:
[root@overcloud-controller-0 ~]# shutdown now
Connection to 192.0.2.7 closed by remote host.
Now check which node takes over testrouter:
[root@overcloud-controller-1 ~]# neutron l3-agent-list-hosting-router testrouter
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| a7b76ad9-83a7-4c2c-a9ef-59e60e175a81 | overcloud-controller-1.localdomain | True | :-) | active |
| 85836d82-a187-424a-9816-a3db79e0bb8b | overcloud-controller-2.localdomain | True | :-) | standby |
| 51e840f0-5d52-4b9c-a1a5-52b976abff7d | overcloud-controller-0.localdomain | True | xxx | active |
+--------------------------------------+------------------------------------+----------------+-------+----------+
We can see overcloud-controller-0 is no longer alive, and overcloud-controller-1 is now active. Let's check, on overcloud-controller-1, whether the virtual IP has moved to the new active node:
[root@overcloud-controller-1 ~]# ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a |grep " inet "
inet 127.0.0.1/8 scope host lo
inet 169.254.192.2/18 brd 169.254.255.255 scope global ha-2dba7c38-05
inet 169.254.0.1/24 scope global ha-2dba7c38-05
inet 192.168.200.1/24 scope global qr-946f1e38-2e
We can see the virtual IP 192.168.200.1 is now running on overcloud-controller-1.
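The transition can also be seen from the OS side: keepalived logs VRRP state changes to syslog (a sketch; exact wording varies with the keepalived version):
[root@overcloud-controller-1 ~]# grep 'VRRP_Instance(VR_1)' /var/log/messages
Around the failover, overcloud-controller-1 should show a "Transition to MASTER STATE" entry for VR_1.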
Note that the agent list above still shows overcloud-controller-0 as active even though it is not alive; that entry is simply stale, as the virtual IP has really moved to overcloud-controller-1. Now let's bring overcloud-controller-0 back online to see what happens. When it's online, check again:
[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router testrouter
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| a7b76ad9-83a7-4c2c-a9ef-59e60e175a81 | overcloud-controller-1.localdomain | True | :-) | active |
| 85836d82-a187-424a-9816-a3db79e0bb8b | overcloud-controller-2.localdomain | True | :-) | standby |
| 51e840f0-5d52-4b9c-a1a5-52b976abff7d | overcloud-controller-0.localdomain | True | :-) | standby |
+--------------------------------------+------------------------------------+----------------+-------+----------+
[root@overcloud-controller-0 ~]# ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a |grep " inet "
inet 127.0.0.1/8 scope host lo
inet 169.254.192.3/18 brd 169.254.255.255 scope global ha-835b136c-ec
Now overcloud-controller-0 is alive and back in standby state. This is expected: keepalived is configured with nopreempt, so the recovered node does not take the master role back, and in its namespace the virtual IP is no longer present.