WAN load-balance failover trouble with DHCP

We are having trouble getting WAN load-balance (WLB) failover to work with DHCP, especially when one of the interfaces uses PPPoE. WLB works fine for us with static interfaces. The problem with DHCP seems to be the default gateway routes that are set up automatically: in that case, we have no control over the “distance” parameters for the gateways. Also, PPPoE DHCP adds routes to the kernel routing table, while Ethernet DHCP adds routes to the static routing table.

We have found a workaround that gets it working with a DHCP Ethernet primary interface and a DHCP PPPoE failover, but only if the failover has a static gateway, which of course we can never expect to be the case. The workaround consists of (1) setting a static default gateway for the PPPoE interface with a distance greater than the default gateway route distance chosen by DHCP on the primary interface; and (2) setting up a separate task on a one-minute timer that pings the “interface health” test addresses over the appropriate interfaces.
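A minimal sketch of what the timer task in step (2) could look like, written in Python. It is an illustration only, not the script described above: the names `HEALTH_TARGETS`, `build_health_pings` and `run_health_pings` are hypothetical, the interfaces and targets are copied from the `interface-health` sections of the config in this post, and it assumes the Linux iputils `ping` flags `-I` (interface), `-c` (count) and `-W` (reply timeout in seconds).

```python
import subprocess

# Hypothetical table: interface -> health-check targets, copied from the
# interface-health sections of the config in this post.
HEALTH_TARGETS = {
    "eth3": ["8.8.8.8", "205.171.2.25"],
    "pppoe2": ["8.8.4.4", "205.171.3.25"],
}


def build_health_pings(targets, count=1, timeout_s=5):
    """Return one ping argv per (interface, target) pair.

    Assumes the Linux iputils ping flags: -I (interface to send from),
    -c (packet count), -W (reply timeout in seconds).
    """
    commands = []
    for interface, addresses in targets.items():
        for address in addresses:
            commands.append(
                ["ping", "-I", interface, "-c", str(count),
                 "-W", str(timeout_s), address]
            )
    return commands


def run_health_pings(targets):
    """Run every ping and record which (interface, target) pairs answered."""
    results = {}
    for argv in build_health_pings(targets):
        completed = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # argv[2] is the interface, argv[-1] is the target address.
        results[(argv[2], argv[-1])] = completed.returncode == 0
    return results
```

Calling `run_health_pings(HEALTH_TARGETS)` once a minute from a scheduler would send one probe per target over each interface; what to do with the results is left open here.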

Here is a cleansed config that I was expecting to work in the first place. Any suggestions would be most welcome, especially if they have been tried and are known to work.

[code]firewall {
    all-ping enable
    broadcast-ping disable
    config-trap disable
    ipv6-receive-redirects disable
    ipv6-src-route disable
    ip-src-route disable
    log-martians enable
    name wan-in {
        default-action drop
        rule 10 {
            action accept
            state {
                established enable
                related enable
            }
        }
    }
    name wan-local {
        default-action drop
        rule 10 {
            action accept
            state {
                established enable
                related enable
            }
        }
        rule 20 {
            action accept
            protocol icmp
        }
        rule 30 {
            action accept
            destination {
                port ssh
            }
            protocol tcp
        }
    }
    receive-redirects disable
    send-redirects enable
    source-validation disable
    syn-cookies enable
    twa-hazards-protection disable
}
interfaces {
    ethernet eth0 {
        address 10.xx.0.1/24
        duplex auto
        hw-id xx:xx:xx:xx:xx:xx
        smp_affinity auto
        speed auto
    }
    ethernet eth1 {
        duplex auto
        hw-id xx:xx:xx:xx:xx:xx
        smp_affinity auto
        speed auto
    }
    ethernet eth2 {
        duplex auto
        hw-id xx:xx:xx:xx:xx:xx
        pppoe 2 {
            access-concentrator XXXX
            default-route auto
            mtu 1492
            name-server auto
            password 1234
            user-id xxxx
        }
        smp_affinity auto
        speed auto
    }
    ethernet eth3 {
        address dhcp
        duplex auto
        firewall {
            in {
                name wan-in
            }
            local {
                name wan-local
            }
        }
        hw-id xx:xx:xx:xx:xx:xx
        smp_affinity auto
        speed auto
    }
    loopback lo {
    }
}
load-balancing {
    wan {
        enable-local-traffic
        flush-connections
        interface-health eth3 {
            failure-count 2
            nexthop dhcp
            success-count 1
            test 10 {
                resp-time 5
                target 8.8.8.8
                ttl-limit 1
            }
            test 20 {
                resp-time 5
                target 205.171.2.25
                ttl-limit 1
            }
        }
        interface-health pppoe2 {
            failure-count 2
            nexthop dhcp
            success-count 1
            test 10 {
                resp-time 5
                target 8.8.4.4
                ttl-limit 1
            }
            test 20 {
                resp-time 5
                target 205.171.3.25
                ttl-limit 1
            }
        }
        rule 100 {
            failover
            inbound-interface eth0
            interface eth3 {
                weight 10
            }
            interface pppoe2 {
                weight 1
            }
            protocol all
        }
    }
}
nat {
    source {
        rule 10 {
            outbound-interface eth3
            translation {
                address masquerade
            }
        }
        rule 20 {
            outbound-interface pppoe2
            translation {
                address masquerade
            }
        }
    }
}
service {
    dhcp-server {
        disabled false
        shared-network-name LAN_POOL {
            authoritative enable
            subnet 10.xx.0.0/24 {
                default-router 10.42.0.1
                dns-server 10.42.0.1
                domain-name xx.lan
                lease 86400
                start 10.xx.0.32 {
                    stop 10.xx.0.250
                }
            }
        }
    }
    dns {
        forwarding {
            cache-size 150
            listen-on eth0
        }
    }
    ssh {
        port 22
    }
}
system {
    config-management {
        commit-revisions 20
    }
    console {
        device ttyS0 {
            speed 19200
        }
    }
    domain-name xx.lan
    host-name xxrouter
    name-server 4.2.2.1
    name-server 4.2.2.2
    ntp {
    }
    package {
        auto-sync 1
        repository community {
            components main
            distribution helium
            password ""
            url http://packages.vyos.net/vyos
            username ""
        }
    }
    time-zone America/Denver
}

/* Warning: Do not remove the following line. */
/* === vyatta-config-version: "cluster@1:config-management@1:conntrack-sync@1:conntrack@1:cron@1:dhcp-relay@1:dhcp-server@4:firewall@5:ipsec@4:nat@4:qos@1:quagga@2:system@6:vrrp@1:wanloadbalance@3:webgui@1:webproxy@1:zone-policy@1" === */
/* Release version: VyOS 1.1.7 */[/code]

This is my issue also.

This is a known issue and, unfortunately, we will work on a fix in 1.2.x only.

@syncer do you know if this bug has ever been solved? I could not find a bug task in Phabricator that specifically mentions PPPoE and WAN load balancing, although I can see that @ciprian.craciun has come up with a workaround, mentioned here: https://forum.vyos.io/t/solved-wan-load-balancing-with-policy-route-rules-previously-wan-load-balancing-with-2-pppoe-connections-with-tcp-mss-clamping/1968/3

Is this bug still present in the rolling release?

I can confirm it is present in 1.2.6-S1.

Hello @masterit, it should work in an easy way. You just need to understand that you need the routes via PPPoE and DHCP together. The simple way is to disable applying the default route received via PPPoE and define an interface route instead, for example:

set interfaces ethernet eth0 pppoe 0 default-route 'none'
set interfaces ethernet eth0 pppoe 0 password 'test'
set interfaces ethernet eth0 pppoe 0 user-id 'test'
set protocols static interface-route 0.0.0.0/0 next-hop-interface pppoe0 distance '230'

The route received via DHCP has distance 210, so now we have two default routes with the correct administrative distances (the first number in the brackets below).

vyos@vyos# run show ip route 
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued route, r - rejected route

S   0.0.0.0/0 [230/0] is directly connected, pppoe0, 00:00:55
S>* 0.0.0.0/0 [210/0] via 192.168.100.1, eth1, 00:39:38
C>* 10.255.255.255/32 is directly connected, pppoe0, 00:00:55
C>* 192.168.100.0/24 is directly connected, eth1, 00:39:38
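
The choice between the two default routes above can be sketched in a few lines of Python. This is a simplified model, assuming that among usable routes to the same prefix the one with the lowest administrative distance is selected; the `Route` class and `select_route` function are hypothetical names used only for illustration, and the distances and interfaces are taken from the output above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    distance: int   # administrative distance, the first number in [230/0]
    via: str        # outgoing interface
    up: bool = True


def select_route(candidates):
    """Pick the usable route with the lowest administrative distance."""
    usable = [route for route in candidates if route.up]
    if not usable:
        return None
    return min(usable, key=lambda route: route.distance)


default_routes = [
    Route("0.0.0.0/0", 230, "pppoe0"),
    Route("0.0.0.0/0", 210, "eth1"),
]
```

With both routes usable, `select_route(default_routes)` returns the eth1 route (distance 210). If the eth1 route is marked as down, the pppoe0 route (distance 230) is the one returned.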

WLB works properly

vyos@vyos# run show wan-load-balance 
Interface:  eth1
  Status:  active
  Last Status Change:  Mon Feb 22 18:18:44 2021
  +Test:  ping  Target: 8.8.8.8
    Last Interface Success:  0s 
    Last Interface Failure:  1m49s      
    # Interface Failure(s):  0

Interface:  pppoe0
  Status:  active
  Last Status Change:  Mon Feb 22 18:26:28 2021
  +Test:  ping  Target: 1.1.1.1
    Last Interface Success:  0s 
    Last Interface Failure:  1m27s      
    # Interface Failure(s):  0

That’s exactly how it’s set up.

When a WLB rule is set up to prefer a secondary PPPoE interface over another, and no DHCP WAN interface is in the mix, things function as they should. Once a DHCP WAN interface is added, the distance preference of the static routes is ignored. You can confirm this by setting a ‘PBR’ route in the WLB rules, testing, and subsequently adding a static route for the preferred interface.

You’ll notice that in both tests the traffic still exits via the primary interface and completely ignores any rules or static routes that were set up.

I’m a bit confused, but it all works in my lab:

set load-balancing wan rule 10 destination address '0.0.0.1/1'
set load-balancing wan rule 10 inbound-interface 'eth2'
set load-balancing wan rule 10 interface eth1 weight '10'
set load-balancing wan rule 10 interface pppoe0 weight '2'
set load-balancing wan rule 10 protocol 'all'
set load-balancing wan rule 10 source address '172.16.0.0/24'
set load-balancing wan rule 20 destination address '128.0.0.1/1'
set load-balancing wan rule 20 inbound-interface 'eth2'
set load-balancing wan rule 20 interface eth1 weight '2'
set load-balancing wan rule 20 interface pppoe0 weight '10'
set load-balancing wan rule 20 protocol 'all'
set load-balancing wan rule 20 source address '172.16.0.0/24'

Check the statistics:

vyos@vyos# run show wan-load-balance status 
Chain WANLOADBALANCE_PRE (1 references)
 pkts bytes target     prot opt in     out     source               destination         
   39  2460 ISP_eth1   all  --  eth2   *       172.16.0.0/24        0.0.0.0/1            state NEW statistic mode random probability 0.83333300008
   12   773 ISP_pppoe0  all  --  eth2   *       172.16.0.0/24        0.0.0.0/1            state NEW
    2   227 CONNMARK   all  --  eth2   *       172.16.0.0/24        0.0.0.0/1            CONNMARK restore
    6   360 ISP_eth1   all  --  eth2   *       172.16.0.0/24        128.0.0.0/1          state NEW statistic mode random probability 0.16666699992
   27  1636 ISP_pppoe0  all  --  eth2   *       172.16.0.0/24        128.0.0.0/1          state NEW
    0     0 CONNMARK   all  --  eth2   *       172.16.0.0/24        128.0.0.0/1          CONNMARK restore
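
The probabilities in the output above are consistent with the configured weights (10/12 ≈ 0.8333 and 2/12 ≈ 0.1667). A minimal Python sketch of that arithmetic follows, under the assumption that the rules in the chain are evaluated in order, each with probability equal to its weight divided by the weights not yet consumed, and the last rule catching everything that remains. The function name `cascade_probabilities` is hypothetical.

```python
def cascade_probabilities(weights):
    """Return the match probability of each rule in a sequential chain.

    Rule i only sees packets that earlier rules did not match, so its
    probability is weight_i divided by the sum of the weights still left.
    The last rule therefore always gets probability 1.0.
    """
    if not weights or any(weight <= 0 for weight in weights):
        raise ValueError("weights must be a non-empty list of positive numbers")
    probabilities = []
    remaining = sum(weights)
    for weight in weights:
        probabilities.append(weight / remaining)
        remaining -= weight
    return probabilities
```

For rule 10 (eth1 weight 10, pppoe0 weight 2) this gives roughly 0.8333 for the eth1 line and 1.0 for the pppoe0 line, which matches the pppoe0 line carrying no `statistic` match at all. For rule 20 (weights 2 and 10) the first value is roughly 0.1667.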

Traceroute from client

client@client# run traceroute 1.1.1.1
traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 60 byte packets
 1  172.16.0.1 (172.16.0.1)  0.956 ms  1.019 ms  1.602 ms
 2  192.168.100.1 (192.168.100.1)  10.151 ms  10.221 ms  10.413 ms
...

client@client# run traceroute www.us
Resolving Address: www.us
traceroute to www.us (156.154.52.34), 30 hops max, 60 byte packets
 1  172.16.0.1 (172.16.0.1)  0.645 ms  0.667 ms  0.662 ms
 2  10.255.255.255 (10.255.255.255)  1.169 ms  4.319 ms  4.270 ms
...

Here 192.168.100.1 is the gateway received via DHCP, and 10.255.255.255 is the gateway for pppoe0.

PS: Failover also works properly.

The current setup has the below static routes set as an example:

vyatta@router# show protocols static interface-route 4.2.2.1/32
 next-hop-interface pppoe0 {
 }
[edit]
vyatta@router# show protocols static interface-route 4.2.2.2/32
 next-hop-interface pppoe1 {
 }
[edit]
vyatta@router# show protocols static route 4.2.2.3/32
 dhcp-interface eth0.410
[edit]

vyatta@router# traceroute 4.2.2.1
traceroute to 4.2.2.1 (4.2.2.1), 30 hops max, 60 byte packets
 1  10.11.7.49 (10.11.7.49)  1.549 ms  1.764 ms  2.079 ms
 2  * * *
 3  64.230.59.184 (64.230.59.184)  13.832 ms  13.779 ms  13.759 ms
 4  tcore4-chicagocp_hundredgige0-5-0-0.net.bell.ca (64.230.79.155)  13.572 ms  13.528 ms  13.546 ms
 5  bx9-chicagodt_ae1-0.net.bell.ca (64.230.79.75)  12.951 ms  12.906 ms  12.936 ms
 6  lag-101.ear7.Chicago2.Level3.net (4.15.248.93)  13.408 ms  13.259 ms  13.283 ms
 7  * * *
 8  a.resolvers.level3.net (4.2.2.1)  15.278 ms  15.340 ms  15.229 ms
[edit]

vyatta@router# traceroute 4.2.2.2
traceroute to 4.2.2.2 (4.2.2.2), 30 hops max, 60 byte packets
 1  lo0-0-lns01-tor2.teksavvy.com (206.248.155.132)  4.197 ms  4.614 ms  4.603 ms
 2  ae1-2140-bdr01-tor2.teksavvy.com (69.196.136.138)  5.038 ms ae0-2150-bdr01-tor.teksavvy.com (69.196.136.172)  11.223 ms ae1-2140-bdr01-tor2.teksavvy.com (69.196.136.138)  5.164 ms
 3  toro-b1-link.telia.net (62.115.61.240)  5.683 ms toro-b1-link.telia.net (62.115.171.248)  5.647 ms toro-b1-link.telia.net (62.115.61.240)  5.635 ms
 4  level3-ic-361864-toro-b3.ip.twelve99-cust.net (213.248.94.123)  6.050 ms  5.978 ms  5.933 ms
 5  * * *
 6  b.resolvers.Level3.net (4.2.2.2)  19.791 ms  19.055 ms  19.120 ms
[edit]

vyatta@router# traceroute 4.2.2.3
traceroute to 4.2.2.3 (4.2.2.3), 30 hops max, 60 byte packets
 1  10.11.7.49 (10.11.7.49)  8.664 ms  9.033 ms  9.301 ms
 2  * * *
 3  64.230.59.184 (64.230.59.184)  18.294 ms  18.271 ms  18.220 ms
 4  tcore4-chicagocp_hundredgige0-5-0-0.net.bell.ca (64.230.79.155)  18.842 ms  18.795 ms  18.747 ms
 5  bx9-chicagodt_ae1-0.net.bell.ca (64.230.79.75)  10.949 ms  10.980 ms  10.964 ms
 6  lag-101.ear7.Chicago2.Level3.net (4.15.248.93)  11.396 ms  11.479 ms  11.464 ms
 7  * * *
 8  c.resolvers.level3.net (4.2.2.3)  13.041 ms  12.973 ms  12.991 ms
[edit]
vyatta@router#

As you can see above, both routes that use a PPPoE next hop forward as they should.

However, the route that uses the DHCP interface eth0.410 goes out via pppoe0 even though it shouldn’t.
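
For reference, the behaviour expected here is plain longest-prefix-match routing: a /32 route should win over any default route for that destination. The Python sketch below models only that expectation, using the static routes listed above plus a default route via eth0.410; it does not model WLB’s own packet marking. The `lookup` function and the `routes` list are hypothetical names for illustration.

```python
import ipaddress


def lookup(destination, routes):
    """Return the (prefix, interface) pair with the longest matching prefix.

    `routes` is a list of (prefix, interface) tuples. Ties on prefix length
    are not handled here; a real router would break them by distance.
    """
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), interface)
        for prefix, interface in routes
        if address in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    network, interface = max(matches, key=lambda match: match[0].prefixlen)
    return str(network), interface


routes = [
    ("0.0.0.0/0", "eth0.410"),    # default route via the DHCP interface
    ("4.2.2.1/32", "pppoe0"),
    ("4.2.2.2/32", "pppoe1"),
    ("4.2.2.3/32", "eth0.410"),
]
```

Under this model `lookup("4.2.2.3", routes)` resolves to eth0.410, which is what the static route asks for and not what the traceroute above shows.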

vyatta@router# run show ip route supernets-only
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued route, r - rejected route

S   0.0.0.0/0 [240/0] is directly connected, pppoe1, 02:48:51
S   0.0.0.0/0 [230/0] is directly connected, pppoe0, 02:48:52
S>* 0.0.0.0/0 [210/0] is directly connected, eth0.410, 02:48:52
  *                   via xxx.xxx.95.4, eth0.410, 02:48:52


vyatta@router# run show wan-load-balance status

     4586  382K ISP_pppoe0  all  --  eth1.3235 *       0.0.0.0/0            4.2.2.1              state NEW
      889 97975 CONNMARK   all  --  eth1.3235 *       0.0.0.0/0            4.2.2.1              CONNMARK restore
     4585  382K ISP_pppoe1  all  --  eth1.3235 *       0.0.0.0/0            4.2.2.2              state NEW
      873 96649 CONNMARK   all  --  eth1.3235 *       0.0.0.0/0            4.2.2.2              CONNMARK restore

You can see that the route policy for 4.2.2.3 is not listed.
Next, with enable-local-traffic enabled, I added a rule for 4.2.2.3: either an exclude based on that destination or an explicit rule such as:

vyatta@router# show load-balancing wan rule 60
 destination {
     address 4.2.2.3
 }
 inbound-interface any
 interface eth0.410 {
 }
 protocol all
[edit]

The result remains the same, and 4.2.2.3 still does not follow the route that was configured for it:

vyatta@router# run show wan-load-balance status

         4586  382K ISP_pppoe0  all  --  eth1.3235 *       0.0.0.0/0            4.2.2.1              state NEW
          889 97975 CONNMARK   all  --  eth1.3235 *       0.0.0.0/0            4.2.2.1              CONNMARK restore
         4585  382K ISP_pppoe1  all  --  eth1.3235 *       0.0.0.0/0            4.2.2.2              state NEW
          873 96649 CONNMARK   all  --  eth1.3235 *       0.0.0.0/0            4.2.2.2              CONNMARK restore

I should mention that enable-local-traffic has no bearing on this.

It seems as if there is a bug when multiple PPPoE interfaces have a 0.0.0.0/0 route.


Has this bug been resolved? I would like to go back to VyOS but need this support for multiple ISPs with different DHCP-assigned routes.

I use the latest rolling release with two PPPoE (DHCP-based) interfaces and WLB without any troubles.

Two PPPoE interfaces don’t use DHCP just because the IP changes on every renewal of the tunnel.

The problem mentioned here occurs when load balancing uses two or more separate interface types, e.g. pppoe0, eth0, wwan0, etc.

This issue is still present in the latest rolling 1.4 Sagitta release.