(1.4-rolling) route table for WAN load balancing not created

Hey all. I apologize for the long post. I recently upgraded from 1.3 equuleus to 1.4 rolling, and since then WAN load balancing has stopped working. I think a route table is missing, but before I file a bug report I want to check whether I’m doing something wrong.

Version:

10:55 vyos@gw 1.4-rolling-202303090317 /home/vyos
✎ edit » run show version
Version:          VyOS 1.4-rolling-202303090317
Release train:    current

Built by:         [email protected]
Built on:         Thu 09 Mar 2023 03:17 UTC
Build UUID:       eb850b00-0438-4c51-8d73-3c6438ff4c9a
Build commit ID:  8f4837fcf72865

Architecture:     x86_64
Boot via:         installed image
System type:       guest

Hardware vendor:  Deciso B.V.
Hardware model:   NetBoard-A10
Hardware S/N:
Hardware UUID:    12345678-1234-5678-90ab-cddeefaabbcc

Copyright:        VyOS maintainers and contributors

Two WAN interfaces (both DHCP) and an LACP bond for the VLAN trunk:

10:57 vyos@gw 1.4-rolling-202303090317 /home/vyos
✎ edit interfaces » show
 bonding bond0 {
     description "Bonded VLAN trunk to switch1 (Mikrotik CRS312-4C+8XG)"
     member {
         interface eth3
         interface eth4
     }
     mode 802.3ad
     vif 1 {
         address 192.168.69.1/24
         description Management
         mtu 1500
     }
     vif 7 {
         address 192.168.7.254/24
         description WWAN
         mtu 1500
     }
     vif 30 {
         address 192.168.30.1/24
         description IOT
         mtu 1500
     }
     vif 40 {
         address 192.168.40.1/24
         description Casting
         mtu 1500
     }
     vif 70 {
         address 192.168.70.1/24
         description Gedeeld
         mtu 1500
     }
     vif 80 {
         address 192.168.80.1/24
         description Kinderen
         mtu 1500
     }
     vif 88 {
         address 192.168.88.2/24
         description "GPEN21 management"
         mtu 1500
     }
     vif 90 {
         address 192.168.90.1/24
         description Gasten
         mtu 1500
     }
 }
 ethernet eth0 {
     address dhcp
     description "Port 0 (Ziggo WAN)"
     dhcp-options {
         default-route-distance 215
     }
     duplex auto
     hw-id f4:90:ea:00:99:df
     offload {
         gro
         gso
         sg
         tso
     }
     ring-buffer {
         rx 4096
         tx 4096
     }
     speed auto
 }
 ethernet eth1 {
     description "Port 1 (Fiber WAN)"
     duplex auto
     hw-id f4:90:ea:00:99:e0
     offload {
         gro
         gso
         sg
         tso
     }
     ring-buffer {
         rx 4096
         tx 4096
     }
     speed auto
     vif 261 {
         address dhcp
         description "Fiber WAN VLAN 261"
         dhcp-options {
             host-name gw.thuis.local
         }
     }
 }
 ethernet eth2 {
     description "Port X0 (SFP+)"
     hw-id f4:90:ea:00:99:e2
     offload {
         gro
         gso
         sg
         tso
     }
     speed auto
 }
 ethernet eth3 {
     description "Port 2 (RJ45), LAG member"
     duplex auto
     hw-id f4:90:ea:00:99:e1
     offload {
         gro
         gso
         rfs
         rps
         sg
         tso
     }
     ring-buffer {
         rx 4096
         tx 4096
     }
     speed auto
 }
 ethernet eth4 {
     description "Port X1 (SFP+), LAG member"
     hw-id f4:90:ea:00:99:e3
     offload {
         gro
         gso
         rfs
         rps
         sg
         tso
     }
 }
 loopback lo {
 }

WLB config:

10:59 vyos@gw 1.4-rolling-202303090317 /home/vyos
✎ edit load-balancing » show
 wan {
     flush-connections
     interface-health eth0 {
         failure-count 5
         nexthop dhcp
         success-count 1
         test 10 {
             resp-time 5
             target 8.8.8.8
             ttl-limit 1
             type ping
         }
         test 20 {
             resp-time 5
             target 1.1.1.1
             ttl-limit 1
             type ping
         }
     }
     interface-health eth1.261 {
         failure-count 5
         nexthop dhcp
         success-count 1
         test 10 {
             resp-time 5
             target 8.8.4.4
             ttl-limit 1
             type ping
         }
         test 20 {
             resp-time 5
             target 1.0.0.1
             ttl-limit 1
             type ping
         }
     }
     rule 1 {
         destination {
             address 192.168.0.0/16
         }
         exclude
         inbound-interface bond0+
         protocol all
     }
     rule 2 {
         destination {
             address 172.16.0.0/12
         }
         exclude
         inbound-interface bond0+
         protocol all
     }
     rule 3 {
         destination {
             address 10.0.0.0/8
         }
         exclude
         inbound-interface bond0+
         protocol all
     }
     rule 5 {
         description "Chromecast traffic first over Ziggo WAN"
         failover
         inbound-interface bond0.40
         interface eth0 {
             weight 10
         }
         interface eth1.261 {
             weight 1
         }
         protocol all
     }
     rule 6 {
         description "DSLReports Puma chipset test traffic first over Ziggo WAN"
         destination {
             address 83.150.0.50
         }
         failover
         inbound-interface bond0+
         interface eth0 {
             weight 10
         }
         interface eth1.261 {
             weight 1
         }
         protocol all
     }
     rule 7 {
         description "speedtest.serverius.net over Ziggo WAN"
         destination {
             address 178.21.16.76
         }
         failover
         inbound-interface bond0+
         interface eth0 {
             weight 10
         }
         interface eth1.261 {
             weight 1
         }
         protocol all
     }
     rule 10 {
         failover
         inbound-interface bond0+
         interface eth0 {
             weight 1
         }
         interface eth1.261 {
             weight 10
         }
         protocol all
     }
     sticky-connections {
         inbound
     }
 }

Static routes for the health-check targets:

11:00 vyos@gw 1.4-rolling-202303090317 /home/vyos
✎ edit protocols static » show
 route 1.0.0.1/32 {
     dhcp-interface eth1.261
 }
 route 1.1.1.1/32 {
     dhcp-interface eth0
 }
 route 8.8.4.4/32 {
     dhcp-interface eth1.261
 }
 route 8.8.8.8/32 {
     dhcp-interface eth0
 }

Content of the mangle table:

11:06 vyos@gw 1.4-rolling-202303090317 /home/vyos
» sudo nft -a list ruleset
<...>
# Warning: table ip mangle is managed by iptables-nft, do not touch!
table ip mangle { # handle 34
        chain PREROUTING { # handle 3
                type filter hook prerouting priority mangle; policy accept;
                iifname "eth1.261" ct state new counter packets 628 bytes 79151 jump ISP_eth1.261_IN # handle 46
                iifname "eth0" ct state new counter packets 159 bytes 7821 jump ISP_eth0_IN # handle 39
                counter packets 1488737 bytes 1671089785 jump WANLOADBALANCE_PRE # handle 32
        }

        chain WANLOADBALANCE_PRE { # handle 30
                iifname "bond0*" ip daddr 192.168.0.0/16 counter packets 7795 bytes 2171762 accept # handle 47
                iifname "bond0*" ip daddr 172.16.0.0/12 counter packets 29600 bytes 17621333 accept # handle 48
                iifname "bond0*" ip daddr 10.0.0.0/8 counter packets 0 bytes 0 accept # handle 49
                iifname "bond0.40" ct state new counter packets 24 bytes 2879 jump ISP_eth0 # handle 50
                iifname "bond0.40" counter packets 264 bytes 47133 meta mark set ct mark # handle 51
                iifname "bond0*" ip daddr 83.150.0.50 ct state new counter packets 302 bytes 15704 jump ISP_eth0 # handle 52
                iifname "bond0*" ip daddr 83.150.0.50 counter packets 1213 bytes 149896 meta mark set ct mark # handle 53
                iifname "bond0*" ip daddr 178.21.16.76 ct state new counter packets 13 bytes 764 jump ISP_eth0 # handle 54
                iifname "bond0*" ip daddr 178.21.16.76 counter packets 215379 bytes 1172910459 meta mark set ct mark # handle 55
                iifname "bond0*" ct state new counter packets 1487 bytes 164296 jump ISP_eth1.261 # handle 56
                iifname "bond0*" counter packets 545720 bytes 1418627338 meta mark set ct mark # handle 57
        }

        chain ISP_eth0 { # handle 33
                counter packets 339 bytes 19347 ct mark set 0xc9 # handle 34
                counter packets 339 bytes 19347 meta mark set 0xc9 # handle 35
                counter packets 339 bytes 19347 accept # handle 36
        }

        chain ISP_eth0_IN { # handle 37
                counter packets 159 bytes 7821 ct mark set 0xc9 # handle 38
        }

        chain ISP_eth1.261 { # handle 40
                counter packets 1487 bytes 164296 ct mark set 0xca # handle 41
                counter packets 1487 bytes 164296 meta mark set 0xca # handle 42
                counter packets 1487 bytes 164296 accept # handle 43
        }

        chain ISP_eth1.261_IN { # handle 44
                counter packets 628 bytes 79151 ct mark set 0xca # handle 45
        }
}

Check IP rules:

11:09 vyos@gw 1.4-rolling-202303090317 /home/vyos
» ip rule list
0:      from all lookup local
218:    from all fwmark 0xca lookup 202
219:    from all fwmark 0xc9 lookup 201
220:    from all lookup 220
32766:  from all lookup main
32767:  from all lookup default

Check where a packet from a Chromecast in the Casting VLAN (bond0.40) would be routed. Per the config it should go out via Ziggo on eth0 (rule 5, nft handle 50 above):

11:06 vyos@gw 1.4-rolling-202303090317 /home/vyos
» ip route get 9.9.9.9 from 192.168.40.100 iif bond0.40 mark 0xc9
9.9.9.9 from 192.168.40.100 via 188.239.183.1 dev eth1.261 mark 0xc9
    cache iif bond0.40

This shows that VyOS routes the packet out over eth1.261 (the wrong WAN!). According to the IP rules above, mark 0xc9 should select route table 201. So what is in tables 201 and 202?

11:11 vyos@gw 1.4-rolling-202303090317 /home/vyos
» sh ip ro table 201

11:11 vyos@gw 1.4-rolling-202303090317 /home/vyos
» sh ip ro table 202
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF default table 202:
K>* 0.0.0.0/0 [0/0] via 188.239.183.1, eth1.261, 00:32:58

Table 201 is empty! In equuleus I would see:

root@gw:~# show ip route table 201
VRF default table 201:

K>* 0.0.0.0/0 [0/0] via 82.217.228.1, eth0, 18:42:51
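For a quicker check than inspecting each table by hand, here is a minimal sketch that maps every fwmark in `ip rule list` to its table and reports whether that table holds a default route. The function names `fwmark_tables` and `check_wlb_tables` are made up for illustration; only the standard `ip rule list` and `ip route show table` commands are assumed.

```shell
#!/bin/sh
# Hypothetical helper: read `ip rule list` output on stdin and
# print one "mark table" pair per fwmark rule.
fwmark_tables() {
    awk '{
        mark = ""; table = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "fwmark") mark = $(i + 1)
            if ($i == "lookup") table = $(i + 1)
        }
        if (mark != "" && table != "") print mark, table
    }'
}

# Hypothetical helper: for each fwmark rule, report whether the
# selected table contains a default route.
check_wlb_tables() {
    ip rule list | fwmark_tables | while read -r mark table; do
        if ip route show table "$table" 2>/dev/null | grep -q '^default'; then
            echo "mark $mark -> table $table: default route present"
        else
            echo "mark $mark -> table $table: NO default route"
        fi
    done
}
```

Running `check_wlb_tables` on the router (after sourcing the snippet) should flag table 201 in the state shown above, since it is empty while table 202 has its default route.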

What can I do to test this?

Did you check the WAN load-balancing status? Are both interfaces up?

Thanks @n.fort for your reply.

» show wan
Interface:  eth0
  Status:  active
  Last Status Change:  Mon Mar 13 11:51:44 2023
  +Test:  ping  Target: 8.8.8.8
   Test:
    Last Interface Success:  0s
    Last Interface Failure:  n/a
    # Interface Failure(s):  0

Interface:  eth1.261
  Status:  active
  Last Status Change:  Mon Mar 13 11:51:44 2023
  +Test:  ping  Target: 8.8.4.4
   Test:  ping  Target: 1.0.0.1
    Last Interface Success:  0s
    Last Interface Failure:  n/a
    # Interface Failure(s):  0

» ping 8.8.8.8 interface eth0
PING 8.8.8.8 (8.8.8.8) from 84.107.140.244 eth0: 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=118 time=10.1 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=118 time=11.3 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=118 time=13.0 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=118 time=9.29 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=118 time=12.5 ms

» sh int
Codes: S - State, L - Link, u - Up, D - Down, A - Admin Down
Interface        IP Address                        S/L  Description
---------        ----------                        ---  -----------
bond0            -                                 u/u  Bonded VLAN trunk to switch1 (Mikrotik CRS312-4C+8XG)
bond0.1          192.168.69.1/24                   u/u  Management
bond0.7          192.168.7.254/24                  u/u  WWAN
bond0.30         192.168.30.1/24                   u/u  IOT
bond0.40         192.168.40.1/24                   u/u  Casting
bond0.70         192.168.70.1/24                   u/u  Gedeeld
bond0.80         192.168.80.1/24                   u/u  Kinderen
bond0.88         192.168.88.2/24                   u/u  GPEN21 management
bond0.90         192.168.90.1/24                   u/u  Gasten
eth0             x.x.x.x/24                        u/u  Port 0 (Ziggo WAN)
eth1             -                                 u/u  Port 1 (Fiber WAN)
eth1.261         x.x.x.x/24                        u/u  Fiber WAN VLAN 261
eth2             -                                 u/D  Port X0 (SFP+)
eth3             -                                 u/u  Port 2 (RJ45), LAG member
eth4             -                                 u/u  Port X1 (SFP+), LAG member
lo               127.0.0.1/8                       u/u
                 ::1/128

There is a known bug with WLB + DHCP: T4362.

Yep!

» cat /var/lib/dhcp/dhclient_eth0.lease
Sun Mar 12 14:07:11 CET 2023
reason='PREINIT'
interface='eth0'
new_expiry=''
new_dhcp_lease_time=''
medium=''
alias_ip_address=''
new_ip_address=''
new_broadcast_address=''
new_subnet_mask=''
new_domain_name=''
new_network_number=''
new_domain_name_servers=''
new_routers=''
new_static_routes=''
new_dhcp_server_identifier=''
new_dhcp_message_type=''
old_ip_address=''
old_subnet_mask=''
old_domain_name=''
old_domain_name_servers=''
old_routers=''
old_static_routes=''

» show dhcp client leases
interface  : eth0
last update: Sun Mar 12 14:07:11 CET 2023
reason     : PREINIT

interface  : eth1.261
ip address : 188.239.183.x    [Active]
subnet mask: 255.255.255.0
domain name: ngnet.nl   [overridden by domain-name set using CLI]
router     : 188.239.183.1
name server: 1.1.1.1 8.8.8.8
dhcp server: 83.174.143.254
lease time : 900
last update: Mon Mar 13 12:13:36 CET 2023
expiry     : Mon Mar 13 12:28:35 CET 2023
reason     : RENEW
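The notable part of the eth0 lease dump is the empty `new_routers` field. Assuming, as this thread and T4362 suggest, that WLB with `nexthop dhcp` depends on the router recorded in that file, a rough sketch to pull the value out for each WAN interface could look like this. The helper names are hypothetical, and the lease path for eth1.261 is assumed to follow the same `dhclient_<interface>.lease` pattern as the eth0 file above.

```shell
#!/bin/sh
# Hypothetical helper: read a dhclient lease dump on stdin and print
# the router from the last new_routers='...' line.
# Prints nothing when the field is empty, as in the PREINIT dump above.
lease_router() {
    sed -n "s/^new_routers='\(.*\)'\$/\1/p" | tail -n 1
}

# Hypothetical helper: report whether an interface's lease file
# (path pattern assumed from the eth0 example) records a router.
check_lease() {
    file="/var/lib/dhcp/dhclient_$1.lease"
    if [ ! -r "$file" ]; then
        echo "$1: cannot read $file"
        return 1
    fi
    router=$(lease_router < "$file")
    if [ -n "$router" ]; then
        echo "$1: router $router"
    else
        echo "$1: no router recorded in lease file"
    fi
}
```

Usage would be `check_lease eth0` and `check_lease eth1.261`; with the PREINIT dump shown above, the eth0 call should report that no router is recorded.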

Thanks @Viacheslav. I had actually subscribed to that bug but wasn’t sure it was the same issue :)
I added a link to this thread for reference.

Is there anything I can do to help analyze this? Do you need specific logs or debug output?

First of all, your WLB is configured incorrectly. There must be one primary interface. In your configuration, all interfaces are configured as failover.

https://docs.vyos.io/en/latest/configuration/loadbalancing/index.html#failover

Thanks @pepe for your comment. I’m struggling a bit to understand what you mean.

I followed the examples set out in WAN Load Balancer examples — VyOS 1.4.x (sagitta) documentation.

I get the expected result:

  • all traffic goes out eth1.261 [rule 10]

except for

  • excluded traffic [rules 1,2,3]
  • and specific traffic from Chromecast [rule 5]
  • and specific traffic to DSLreports and a speedtest site [rules 6 and 7]

When eth0 is down, eth1.261 takes over. And when eth1.261 is down, eth0 takes over, regardless of preference.

So I’m wondering what you think the correct syntax should be :)

Look at this topic: WAN Load Balancing when interface stays up - #13 by pepe

Wow, I think I get what you mean. I rewrote the rules as follows:

set load-balancing wan rule 5 description 'Chromecast traffic first over Ziggo WAN'
set load-balancing wan rule 5 inbound-interface 'bond0.40'
set load-balancing wan rule 5 interface eth0
set load-balancing wan rule 5 protocol 'all'
set load-balancing wan rule 6 description 'Chromecast traffic failover over Fiber'
set load-balancing wan rule 6 failover
set load-balancing wan rule 6 inbound-interface 'bond0.40'
set load-balancing wan rule 6 interface eth1.261
set load-balancing wan rule 6 protocol 'all'
set load-balancing wan rule 10 description 'LAN traffic first over Fiber'
set load-balancing wan rule 10 inbound-interface 'bond0+'
set load-balancing wan rule 10 interface eth1.261
set load-balancing wan rule 10 protocol 'all'
set load-balancing wan rule 15 description 'LAN traffic failover over Ziggo'
set load-balancing wan rule 15 failover
set load-balancing wan rule 15 inbound-interface 'bond0+'
set load-balancing wan rule 15 interface eth0
set load-balancing wan rule 15 protocol 'all'

Is that OK, @pepe?

Looks good. Test if it works as expected.
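One way to test it, reusing the `ip route get` check from earlier in the thread, is sketched below. The helper names `egress_dev` and `expect_dev` are made up for illustration. Note that `ip route get ... mark` only exercises the routing tables selected by a mark (0xc9 for eth0, 0xca for eth1.261 in the output above), not the nft rules that set the mark, so it will still fail while table 201 is empty.

```shell
#!/bin/sh
# Hypothetical helper: read `ip route get` output on stdin and
# print the egress device (the word following "dev").
egress_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Hypothetical helper: compare the actual egress device with the expected one.
# Usage: expect_dev <expected-dev> <arguments passed to `ip route get`>
expect_dev() {
    want=$1
    shift
    got=$(ip route get "$@" | egress_dev)
    if [ "$got" = "$want" ]; then
        echo "OK: $* -> $got"
    else
        echo "MISMATCH: $* -> ${got:-none} (expected $want)"
    fi
}
```

For example, `expect_dev eth0 9.9.9.9 from 192.168.40.100 iif bond0.40 mark 0xc9` checks the Chromecast case, and `expect_dev eth1.261 9.9.9.9 from 192.168.70.100 iif bond0.70 mark 0xca` checks ordinary LAN traffic (the 192.168.70.100 client address is an arbitrary example).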

Thanks pepe.
@Viacheslav I updated T4362 (Wan Load Balancing - Can't create routing tables) with a possible cause and workaround.