I configured the following topology. OSPF did not balance the load between interfaces on its own, but I was able to balance it by using the WAN load balancing feature. Does VyOS support Equal Cost Multi-Path (ECMP) in OSPF?
Beware of ECMP:
- AFAIK, per-packet ECMP is used nowadays, not per UDP/TCP flow, so be prepared to receive packets out of order.
- Moreover, firewalls in between can't be stateful.
vyos@vyos:~$ show version
Version: VyOS 1.4-rolling-202112211328
Release train: sagitta
Built by: autobuild@vyos.net
Built on: Tue 21 Dec 2021 13:28 UTC
Build UUID: 22a9f692-6e20-4886-8bba-c2f51a59d217
Build commit ID: f84a69729ad517
Architecture: x86_64
Boot via: installed image
System type: KVM guest
My configuration for all routers
VyOS 1 (the left router)
set interfaces ethernet eth0 address '172.16.1.1/24'
set interfaces ethernet eth0 hw-id '50:00:00:12:00:00'
set interfaces ethernet eth1 address '172.16.2.1/24'
set interfaces ethernet eth1 hw-id '50:00:00:12:00:01'
set interfaces ethernet eth2 address '192.168.1.2/24'
set interfaces ethernet eth2 hw-id '50:00:00:12:00:02'
set protocols ospf area 0 network '172.16.1.0/24'
set protocols ospf area 0 network '172.16.2.0/24'
set protocols ospf redistribute connected
VyOS 2 ( The right router)
set interfaces ethernet eth0 address '172.16.3.1/24'
set interfaces ethernet eth0 hw-id '50:00:00:11:00:00'
set interfaces ethernet eth1 address '172.16.4.1/24'
set interfaces ethernet eth1 hw-id '50:00:00:11:00:01'
set interfaces ethernet eth2 address '10.10.10.2/24'
set interfaces ethernet eth2 hw-id '50:00:00:11:00:02'
set protocols ospf area 0 network '172.16.3.0/24'
set protocols ospf area 0 network '172.16.4.0/24'
set protocols ospf redistribute connected
VyOS 19
set interfaces ethernet eth0 address '172.16.2.2/24'
set interfaces ethernet eth0 hw-id '50:00:00:13:00:00'
set interfaces ethernet eth1 address '172.16.4.2/24'
set interfaces ethernet eth1 hw-id '50:00:00:13:00:01'
set protocols ospf area 0 network '172.16.2.0/24'
set protocols ospf area 0 network '172.16.4.0/24'
VyOS 20
set interfaces ethernet eth0 address '172.16.1.2/24'
set interfaces ethernet eth0 hw-id '50:00:00:14:00:00'
set interfaces ethernet eth1 address '172.16.3.2/24'
set interfaces ethernet eth1 hw-id '50:00:00:14:00:01'
set protocols ospf area 0 network '172.16.1.0/24'
set protocols ospf area 0 network '172.16.3.0/24'
VyOS 1 ( the left router)
vyos@vyos# run show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
O>* 10.10.10.0/24 [110/20] via 172.16.1.2, eth0, weight 1, 19:13:51
* via 172.16.2.2, eth1, weight 1, 19:13:51
O 172.16.1.0/24 [110/1] is directly connected, eth0, weight 1, 19:14:47
C>* 172.16.1.0/24 is directly connected, eth0, 19:14:54
O 172.16.2.0/24 [110/1] is directly connected, eth1, weight 1, 19:14:47
C>* 172.16.2.0/24 is directly connected, eth1, 19:14:55
O>* 172.16.3.0/24 [110/2] via 172.16.1.2, eth0, weight 1, 19:14:01
O>* 172.16.4.0/24 [110/2] via 172.16.2.2, eth1, weight 1, 19:13:57
C>* 192.168.1.0/24 is directly connected, eth2, 19:14:53
VyOS 2 ( the right router)
vyos@vyos# run show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
C>* 10.10.10.0/24 is directly connected, eth2, 19:15:43
O>* 172.16.1.0/24 [110/2] via 172.16.3.2, eth0, weight 1, 19:14:48
O>* 172.16.2.0/24 [110/2] via 172.16.4.2, eth1, weight 1, 19:14:53
O 172.16.3.0/24 [110/1] is directly connected, eth0, weight 1, 19:15:37
C>* 172.16.3.0/24 is directly connected, eth0, 19:15:44
O 172.16.4.0/24 [110/1] is directly connected, eth1, weight 1, 19:15:37
C>* 172.16.4.0/24 is directly connected, eth1, 19:15:44
O>* 192.168.1.0/24 [110/20] via 172.16.3.2, eth0, weight 1, 19:14:47
* via 172.16.4.2, eth1, weight 1, 19:14:47
[edit]
When I ping VPC22 from VPC21, I can see the ping request go through the lower path and the reply come back through the top path. However, this does not mean that the load is balanced between interfaces. When I enable WAN load balancing, on the other hand, it works properly: each path carries both ping requests and replies. Is there any misconfiguration?
It's possible you have some issues in your current configuration, but if you check VyOS 2 (the right router), it shows a prefix with two paths (ECMP behavior):
O>* 192.168.1.0/24 [110/20] via 172.16.3.2, eth0, weight 1, 19:14:47
* via 172.16.4.2, eth1, weight 1, 19:14:47
It is possible that the command set protocols ospf redistribute connected changes the metric. It should work with network statements instead, so that all the prefixes have the same attributes.
I removed the set protocols ospf redistribute connected command and used the network command to advertise the routes, but the problem is still there: the ping request goes through the lower path and the reply comes back through the top path.
VyOS 1 ( the left router)
vyos@vyos# run show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
O>* 10.10.10.0/24 [110/3] via 172.16.1.2, eth0, weight 1, 00:08:08
* via 172.16.2.2, eth1, weight 1, 00:08:08
O 172.16.1.0/24 [110/1] is directly connected, eth0, weight 1, 1d19h06m
C>* 172.16.1.0/24 is directly connected, eth0, 1d19h06m
O 172.16.2.0/24 [110/1] is directly connected, eth1, weight 1, 1d19h06m
C>* 172.16.2.0/24 is directly connected, eth1, 1d19h06m
O>* 172.16.3.0/24 [110/2] via 172.16.1.2, eth0, weight 1, 1d19h05m
O>* 172.16.4.0/24 [110/2] via 172.16.2.2, eth1, weight 1, 1d19h05m
O 192.168.1.0/24 [110/1] is directly connected, eth2, weight 1, 00:09:55
C>* 192.168.1.0/24 is directly connected, eth2, 1d19h06m
[edit]
VyOS 2 ( the right router)
vyos@vyos# run show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
O 10.10.10.0/24 [110/1] is directly connected, eth2, weight 1, 00:09:29
C>* 10.10.10.0/24 is directly connected, eth2, 1d19h07m
O>* 172.16.1.0/24 [110/2] via 172.16.3.2, eth0, weight 1, 1d19h06m
O>* 172.16.2.0/24 [110/2] via 172.16.4.2, eth1, weight 1, 1d19h06m
O 172.16.3.0/24 [110/1] is directly connected, eth0, weight 1, 1d19h07m
C>* 172.16.3.0/24 is directly connected, eth0, 1d19h07m
O 172.16.4.0/24 [110/1] is directly connected, eth1, weight 1, 1d19h07m
C>* 172.16.4.0/24 is directly connected, eth1, 1d19h07m
O>* 192.168.1.0/24 [110/3] via 172.16.3.2, eth0, weight 1, 00:11:05
* via 172.16.4.2, eth1, weight 1, 00:11:05
[edit]
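To see why the metric changed from 20 to 3 while both tables still show two next hops, here is a rough sketch of the cost arithmetic in Python. It assumes every OSPF interface in this lab has cost 1, which matches the [110/1] and [110/2] entries above; the path names and helper functions are made up for illustration. With redistribute connected, the prefix is presumably advertised as an external route carrying the default metric of 20 on both paths, so the two next hops tie in either case.

```python
# Assumed per-interface OSPF cost; consistent with the [110/1] entries above,
# but not set explicitly anywhere in the posted configuration.
LINK_COST = 1

# Hypothetical path descriptions: the outgoing interfaces crossed on the way
# from VyOS 1 to 10.10.10.0/24.
paths = {
    "via 172.16.1.2 (VyOS 20)": ["VyOS1 eth0", "VyOS20 eth1", "VyOS2 eth2"],
    "via 172.16.2.2 (VyOS 19)": ["VyOS1 eth1", "VyOS19 eth1", "VyOS2 eth2"],
}

def path_cost(hops):
    """Intra-area cost is the sum of the outgoing interface costs."""
    return LINK_COST * len(hops)

def equal_cost_next_hops(paths):
    """Return the lowest cost and every path that ties at that cost."""
    costs = {name: path_cost(hops) for name, hops in paths.items()}
    best = min(costs.values())
    return best, sorted(name for name, cost in costs.items() if cost == best)

best_cost, next_hops = equal_cost_next_hops(paths)
print(best_cost, next_hops)
```

Both paths come out at cost 3, which is what the [110/3] entries with two next hops show. The routing side of ECMP therefore looks correct in these tables.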
## rt1 - ( The left router) / ecmp 10.10.10.0/24
vyos@vyos-r1:~$ show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
O>* 10.10.10.0/24 [110/3] via 172.16.1.2, eth0, weight 1, 00:02:01
* via 172.16.2.2, eth1, weight 1, 00:02:01
O 172.16.1.0/24 [110/1] is directly connected, eth0, weight 1, 00:11:12
C>* 172.16.1.0/24 is directly connected, eth0, 00:11:53
O 172.16.2.0/24 [110/1] is directly connected, eth1, weight 1, 00:11:12
C>* 172.16.2.0/24 is directly connected, eth1, 00:12:07
O>* 172.16.3.0/24 [110/2] via 172.16.1.2, eth0, weight 1, 00:11:06
O>* 172.16.4.0/24 [110/2] via 172.16.2.2, eth1, weight 1, 00:06:22
rt2 - (the right router) / ecmp 192.168.1.0/24
vyos@vyosr2:~$ show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
O 10.10.10.0/24 [110/1] is directly connected, eth2, weight 1, 00:20:12
C>* 10.10.10.0/24 is directly connected, eth2, 00:20:54
O>* 172.16.1.0/24 [110/2] via 172.16.3.2, eth0, weight 1, 00:20:06
O>* 172.16.2.0/24 [110/2] via 172.16.4.2, eth1, weight 1, 00:20:06
O 172.16.3.0/24 [110/1] is directly connected, eth0, weight 1, 00:20:12
C>* 172.16.3.0/24 is directly connected, eth0, 00:20:57
O 172.16.4.0/24 [110/1] is directly connected, eth1, weight 1, 00:20:12
C>* 172.16.4.0/24 is directly connected, eth1, 00:21:13
O>* 192.168.1.0/24 [110/3] via 172.16.3.2, eth0, weight 1, 00:20:06
* via 172.16.4.2, eth1, weight 1, 00:20:0
If we check the kernel routing table, it looks fine:
vyos@vyos-r1:~$ ip route
10.10.10.0/24 nhid 41 proto ospf metric 20
nexthop via 172.16.1.2 dev eth0 weight 1
nexthop via 172.16.2.2 dev eth1 weight 1
172.16.1.0/24 dev eth0 proto kernel scope link src 172.16.1.1
172.16.2.0/24 dev eth1 proto kernel scope link src 172.16.2.1
172.16.3.0/24 nhid 25 via 172.16.1.2 dev eth0 proto ospf metric 20
172.16.4.0/24 nhid 36 via 172.16.2.2 dev eth1 proto ospf metric 20
192.168.1.0/24 dev eth2 proto kernel scope link src 192.168.1.2
vyos@vyosr2:~$ ip route
10.10.10.0/24 dev eth2 proto kernel scope link src 10.10.10.2
172.16.1.0/24 nhid 26 via 172.16.3.2 dev eth0 proto ospf metric 20
172.16.2.0/24 nhid 27 via 172.16.4.2 dev eth1 proto ospf metric 20
172.16.3.0/24 dev eth0 proto kernel scope link src 172.16.3.1
172.16.4.0/24 dev eth1 proto kernel scope link src 172.16.4.1
192.168.1.0/24 nhid 28 proto ospf metric 20
nexthop via 172.16.3.2 dev eth0 weight 1
nexthop via 172.16.4.2 dev eth1 weight 1
I understand that it should work with ECMP, but if we capture ICMP from 192.168.1.10 to 10.10.10.10 on rt2 (the right router), it shows the following:
vyos@vyosr2:~$ tcpdump -i any icmp
15:21:08.783492 eth0 In IP 192.168.1.10 > 10.10.10.10: ICMP echo request, id 58749, seq 1, length 64
15:21:08.783617 eth2 Out IP 192.168.1.10 > 10.10.10.10: ICMP echo request, id 58749, seq 1, length 64
15:21:08.784790 eth2 In IP 10.10.10.10 > 192.168.1.10: ICMP echo reply, id 58749, seq 1, length 64
15:21:08.784841 eth1 Out IP 10.10.10.10 > 192.168.1.10: ICMP echo reply, id 58749, seq 1, length 64
15:21:09.794159 eth0 In IP 192.168.1.10 > 10.10.10.10: ICMP echo request, id 59005, seq 2, length 64
15:21:09.794248 eth2 Out IP 192.168.1.10 > 10.10.10.10: ICMP echo request, id 59005, seq 2, length 64
15:21:09.795665 eth2 In IP 10.10.10.10 > 192.168.1.10: ICMP echo reply, id 59005, seq 2, length 64
15:21:09.795738 eth1 Out IP 10.10.10.10 > 192.168.1.10: ICMP echo reply, id 59005, seq 2, length 64
15:21:10.802999 eth0 In IP 192.168.1.10 > 10.10.10.10: ICMP echo request, id 59261, seq 3, length 64
15:21:10.803855 eth2 Out IP 192.168.1.10 > 10.10.10.10: ICMP echo request, id 59261, seq 3, length 64
15:21:10.804921 eth2 In IP 10.10.10.10 > 192.168.1.10: ICMP echo reply, id 59261, seq 3, length 64
15:21:10.804964 eth1 Out IP 10.10.10.10 > 192.168.1.10: ICMP echo reply, id 59261, seq 3, length 64
15:21:11.812172 eth0 In IP 192.168.1.10 > 10.10.10.10: ICMP echo request, id 59517, seq 4, length 64
15:21:11.813146 eth2 Out IP 192.168.1.10 > 10.10.10.10: ICMP echo request, id 59517, seq 4, length 64
15:21:11.814565 eth2 In IP 10.10.10.10 > 192.168.1.10: ICMP echo reply, id 59517, seq 4, length 64
15:21:11.814611 eth1 Out IP 10.10.10.10 > 192.168.1.10: ICMP echo reply, id 59517, seq 4, length 64
15:21:12.822752 eth0 In IP 192.168.1.10 > 10.10.10.10: ICMP echo request, id 59773, seq 5, length 64
15:21:12.823577 eth2 Out IP 192.168.1.10 > 10.10.10.10: ICMP echo request, id 59773, seq 5, length 64
15:21:12.825261 eth2 In IP 10.10.10.10 > 192.168.1.10: ICMP echo reply, id 59773, seq 5, length 64
15:21:12.825312 eth1 Out IP 10.10.10.10 > 192.168.1.10: ICMP echo reply, id 59773, seq 5, length 64
15:21:13.831602 eth0 In IP 192.168.1.10 > 10.10.10.10: ICMP echo request, id 60029, seq 6, length 64
15:21:13.832484 eth2 Out IP 192.168.1.10 > 10.10.10.10: ICMP echo request, id 60029, seq 6, length 64
Once installed into the FIB, FRR currently has little control over what nexthops are chosen to forward packets on. Currently the Linux kernel has a fib_multipath_hash_policy sysctl which dictates how the hashing algorithm is used to forward packets.
Let me show you: if you change the source of the packet, it chooses a different path to send the packet:
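To check which policy a given router is actually using, something like the following Python sketch could be run on the box. It only reads the procfs file behind the sysctl named above. The policy descriptions are paraphrased from the kernel's ip-sysctl documentation, and the helper name describe_policy is made up for this example. Values 2 and 3 may not exist on older kernels.

```python
from pathlib import Path

# Standard procfs location of the sysctl named in the quote above.
SYSCTL_PATH = Path("/proc/sys/net/ipv4/fib_multipath_hash_policy")

# Meanings paraphrased from the kernel's ip-sysctl documentation.
POLICIES = {
    0: "Layer 3: source and destination IP address",
    1: "Layer 4: addresses plus protocol and ports (5-tuple)",
    2: "Layer 3, or inner Layer 3 for encapsulated packets",
    3: "custom: fields chosen via fib_multipath_hash_fields",
}

def describe_policy(path=SYSCTL_PATH):
    """Return a human-readable description of the current hash policy."""
    try:
        value = int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return "unknown (sysctl not readable on this system)"
    return f"{value} = {POLICIES.get(value, 'unrecognised value')}"

print(describe_policy())
```

With a Layer 3 policy, all packets between one source address and one destination address hash to the same next hop, so a ping between two fixed hosts would be expected to stay on one path in each direction.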
vyos@vyos-r1:~$ sudo traceroute 10.10.10.10
traceroute to 10.10.10.10 (10.10.10.10), 30 hops max, 60 byte packets
1 172.16.1.2 (172.16.1.2) 4.595 ms 21.028 ms 20.893 ms
2 172.16.3.1 (172.16.3.1) 20.305 ms 20.175 ms 19.470 ms
3 10.10.10.10 (10.10.10.10) 57.175 ms 57.106 ms 57.040 ms
vyos@vyos-r1:~$ sudo traceroute -s 192.168.1.2 10.10.10.10
traceroute to 10.10.10.10 (10.10.10.10), 30 hops max, 60 byte packets
1 172.16.2.2 (172.16.2.2) 4.599 ms 2.803 ms 2.478 ms
2 172.16.4.1 (172.16.4.1) 22.095 ms 20.382 ms 19.858 ms
3 10.10.10.10 (10.10.10.10) 51.912 ms 51.007 ms 49.929 ms
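The two traceroutes above are what hash-based next-hop selection looks like from the outside. Here is a minimal Python sketch of the idea, assuming a Layer 3 policy (source and destination address only). The crc32 call is just a stand-in, since the kernel uses its own seeded hash, so the interface this toy version picks for a given pair will not match what a real router does. The function name pick_next_hop is made up for the example.

```python
import ipaddress
import zlib
from collections import Counter

# The two equal-cost next hops VyOS 1 has for 10.10.10.0/24.
NEXT_HOPS = ["172.16.1.2 dev eth0", "172.16.2.2 dev eth1"]

def pick_next_hop(src, dst, next_hops=NEXT_HOPS):
    """Map a (src, dst) address pair to one next hop, Layer 3 style."""
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    # crc32 is only a stand-in for the kernel's hash function.
    return next_hops[zlib.crc32(key) % len(next_hops)]

# One flow: every packet of the same src/dst pair lands on the same next hop.
single = {pick_next_hop("192.168.1.10", "10.10.10.10") for _ in range(5)}
print(single)

# Several sources towards the same destination: count how they are distributed.
spread = Counter(
    pick_next_hop(f"192.168.1.{host}", "10.10.10.10") for host in range(10, 30)
)
print(spread)
```

A single pair of hosts always maps to one next hop, which fits the capture on rt2 where every reply left on eth1. Traffic only tends to spread across both links once there are several distinct source/destination pairs.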
Thanks for your answer. After I added two more hosts to both the left and right routers, I saw that VyOS was able to balance the load between interfaces. It seems to work.