I am writing up the solution I found for dual-WAN failover with DHCP. I hope it is useful to other people with similar requirements.
Configuration:
eth0 is the main WAN, getting its address over DHCP.
eth1.11 is the secondary WAN, connected to a 4G router. It has address 192.168.11.123/24, with 192.168.11.1 as gateway.
Goal:
Implement a simple failover mechanism (no need for load balancing) using the VyOS failover protocol, which does not by itself support DHCP interfaces.
Implementation:
The first step is to define the default routes using the failover protocol. To do that, the default route must not be installed by DHCP. With my ISP, just setting the no-default-route option in the interface dhcp-options does not work. The goal can be reached using the DHCP hooks.
A pre-hook (e.g., /config/scripts/dhcp-client/pre-hooks.d/01-no-default-route) can be used to prevent the installation of the default route. That is usually achieved by emptying the new_routers variable:
RUN="yes"

# Use FD 19 to capture the debug stream caused by "set -x":
exec 19>/tmp/01-no-default-route.log
# Tell bash about it (there's nothing special about 19, it's arbitrary)
export BASH_XTRACEFD=19
set -x

# Setting new_routers to an empty string prevents the installation
# of the default route and allows the failover rules to be set up properly.
# That applies only to eth0, the main WAN getting the IP via DHCP.
#
# See /config/scripts/setup-failover-routes.sh
# See /config/scripts/dhcp-client/post-hooks.d/01-failover
# See https://vyos.dev/T5724
if [ "$RUN" = "yes" ]; then
    if [ "$interface" = "eth0" ]; then
        case "$reason" in
            BOUND|RENEW|REBIND|REBOOT)
                # Save the gateways for the post-hook, then clear new_routers
                export new_gw="$new_routers"
                export old_gw="$old_routers"
                new_routers=""
                ;;
        esac
    fi
fi
set +x
The new_routers and old_routers variables (set by DHCP) are exported as the new_gw and old_gw variables, which are used in the post-hook script (e.g., /config/scripts/dhcp-client/post-hooks.d/01-failover):
RUN="yes"

# Use FD 19 to capture the debug stream caused by "set -x":
exec 19>/tmp/01-failover.log
# Tell bash about it (there's nothing special about 19, it's arbitrary)
export BASH_XTRACEFD=19
set -x

# Execute the script to configure the failover mechanism in case of a
# BOUND, RENEW, REBIND, REBOOT.
# That applies only to eth0, the main WAN getting the IP via DHCP.
#
# See /config/scripts/setup-failover-routes.sh
# See /config/scripts/dhcp-client/pre-hooks.d/01-no-default-route
# See https://vyos.dev/T5724
if [ "$RUN" = "yes" ]; then
    if [ "$interface" = "eth0" ]; then
        case "$reason" in
            BOUND|RENEW|REBIND|REBOOT)
                # Quoted so that an empty old_gw still occupies the first argument
                sudo /config/scripts/setup-failover-routes.sh "$old_gw" "$new_gw"
                ;;
        esac
    fi
fi
set +x
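One detail worth checking when the two gateways are passed as positional arguments: if the old gateway is empty (I am assuming this can happen, for example on a first lease) and the variables are not quoted, the shell drops the empty word and the new gateway arrives as the first argument. The sketch below is not part of the setup; it only imitates shell word splitting with Python's shlex module to show the effect. The helper name argv_for and the address 203.0.113.1 are placeholders.

```python
import shlex


def argv_for(old_gw, new_gw, quoted):
    """Return the arguments the script would receive after word splitting."""
    if quoted:
        # shlex.quote("") yields '' so the empty value keeps its position
        line = f"setup-failover-routes.sh {shlex.quote(old_gw)} {shlex.quote(new_gw)}"
    else:
        # An empty value simply disappears, shifting new_gw into position 1
        line = f"setup-failover-routes.sh {old_gw} {new_gw}"
    return shlex.split(line)[1:]


if __name__ == "__main__":
    print(argv_for("", "203.0.113.1", quoted=False))
    print(argv_for("", "203.0.113.1", quoted=True))
```

With quoting, the script still sees two arguments and can tell an empty old gateway apart from a missing new one.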
The post-hook script calls /config/scripts/setup-failover-routes.sh, which actually configures the failover:
#!/bin/vbash
# Re-run under the vyattacfg group, needed to change the configuration.
if [ "$(id -g -n)" != 'vyattacfg' ] ; then
    # Arguments are quoted so that an empty old gateway keeps its position.
    exec sg vyattacfg -c "/bin/vbash $(readlink -f "$0") '$1' '$2'"
fi

# Use FD 19 to capture the debug stream caused by "set -x":
exec 19>/tmp/failover.log
# Tell bash about it (there's nothing special about 19, it's arbitrary)
export BASH_XTRACEFD=19
set -x
OLD_GW="$1"
NEW_GW="$2"
set +x

source /opt/vyatta/etc/functions/script-template
configure

if [ -n "$NEW_GW" ]; then
    # Rebuild both default routes: main WAN (metric 1) and 4G backup (metric 254)
    delete protocols failover route 0.0.0.0/0
    set protocols failover route 0.0.0.0/0 next-hop $NEW_GW check target '1.1.1.1'
    set protocols failover route 0.0.0.0/0 next-hop $NEW_GW check target '4.2.2.1'
    set protocols failover route 0.0.0.0/0 next-hop $NEW_GW check timeout '5'
    set protocols failover route 0.0.0.0/0 next-hop $NEW_GW check type 'icmp'
    set protocols failover route 0.0.0.0/0 next-hop $NEW_GW interface 'eth0'
    set protocols failover route 0.0.0.0/0 next-hop $NEW_GW metric '1'
    set protocols failover route 0.0.0.0/0 next-hop 192.168.11.1 check target '1.0.0.1'
    set protocols failover route 0.0.0.0/0 next-hop 192.168.11.1 check target '4.2.2.2'
    set protocols failover route 0.0.0.0/0 next-hop 192.168.11.1 check timeout '5'
    set protocols failover route 0.0.0.0/0 next-hop 192.168.11.1 check type 'icmp'
    set protocols failover route 0.0.0.0/0 next-hop 192.168.11.1 interface 'eth1.11'
    set protocols failover route 0.0.0.0/0 next-hop 192.168.11.1 metric '254'

    # Pin each check target to the WAN it is used to test
    delete protocols static route 1.1.1.1/32
    delete protocols static route 4.2.2.1/32
    delete protocols static route 1.0.0.1/32
    delete protocols static route 4.2.2.2/32
    if [ -n "$OLD_GW" ]; then
        delete protocols static route $OLD_GW/32
    fi
    set protocols static route $NEW_GW/32 interface eth0
    set protocols static route 1.1.1.1/32 next-hop $NEW_GW interface 'eth0'
    set protocols static route 4.2.2.1/32 next-hop $NEW_GW interface 'eth0'
    set protocols static route 1.0.0.1/32 next-hop 192.168.11.1 interface 'eth1.11'
    set protocols static route 4.2.2.2/32 next-hop 192.168.11.1 interface 'eth1.11'
fi

commit
exit
That script sets the two default routes using the new_gw and old_gw variables set in the DHCP pre-hook script. As you can see, the main WAN has a lower metric, so the corresponding default route is used when both WANs are up. Additionally, static routes are defined for the targets used for testing the two default routes, so that each target is always reached through the WAN it is meant to check.
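To make the role of the metrics concrete, here is a minimal Python sketch of the intended outcome: among the candidates whose check targets still answer, the one with the lowest metric carries the traffic. It models the net effect only and is not how VyOS implements it. The Candidate class, the pick_default_route helper, the "at least one target answers" rule, and the gateway 203.0.113.1 (a stand-in for the DHCP-assigned gateway) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    next_hop: str
    interface: str
    metric: int
    targets_up: List[bool]  # one entry per check target (True = answered)


def pick_default_route(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the usable candidate with the lowest metric, or None."""
    # Assumption: a next-hop is usable if at least one target answers
    usable = [c for c in candidates if any(c.targets_up)]
    return min(usable, key=lambda c: c.metric, default=None)


if __name__ == "__main__":
    main_wan = Candidate("203.0.113.1", "eth0", 1, [True, True])
    backup = Candidate("192.168.11.1", "eth1.11", 254, [True, True])
    print(pick_default_route([main_wan, backup]).interface)

    main_wan.targets_up = [False, False]  # main WAN stops answering
    print(pick_default_route([main_wan, backup]).interface)
```

The large gap between the two metrics (1 and 254) is not significant in itself; only the ordering matters.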
At this point, the failover itself is complete. In addition, I would like to receive an e-mail notification when one of the two WAN routes is added or removed, and I would like to flush conntrack when the status of the main WAN changes. To do that, I use the event handler:
event CellWanAdded {
    filter {
        pattern "ip route add.*dev eth1\\.11 .*"
        syslog-identifier vyos-failover
    }
    script {
        environment ACTION {
            value added
        }
        environment INTERFACE {
            value eth1.11
        }
        path /config/scripts/failover-handler.py
    }
}
event CellWanRemoved {
    filter {
        pattern "ip route del.*dev eth1\\.11 .*"
        syslog-identifier vyos-failover
    }
    script {
        environment ACTION {
            value deleted
        }
        environment INTERFACE {
            value eth1.11
        }
        path /config/scripts/failover-handler.py
    }
}
event MainWanAdded {
    filter {
        pattern "ip route add.*dev eth0 .*"
        syslog-identifier vyos-failover
    }
    script {
        environment ACTION {
            value added
        }
        environment FLUSH_CONNTRACK {
            value true
        }
        environment INTERFACE {
            value eth0
        }
        environment RESTORE_IPV6 {
            value true
        }
        path /config/scripts/failover-handler.py
    }
}
event MainWanRemoved {
    filter {
        pattern "ip route del.*dev eth0 .*"
        syslog-identifier vyos-failover
    }
    script {
        environment ACTION {
            value deleted
        }
        environment FLUSH_CONNTRACK {
            value true
        }
        environment INTERFACE {
            value eth0
        }
        environment REJECT_IPV6 {
            value true
        }
        path /config/scripts/failover-handler.py
    }
}
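The filter patterns can be sanity-checked offline before relying on them. The Python sketch below assumes that the doubled backslash in the configuration is only escaping and that the effective regex is eth1\.11, and that the pattern may match anywhere in the message (hence re.search). The sample log lines are hypothetical, written in the style of an "ip route" command, and 203.0.113.1 is a placeholder gateway.

```python
import re

# Effective patterns, assuming "\\." in the config means a literal dot
PATTERNS = {
    "CellWanAdded": re.compile(r"ip route add.*dev eth1\.11 .*"),
    "CellWanRemoved": re.compile(r"ip route del.*dev eth1\.11 .*"),
    "MainWanAdded": re.compile(r"ip route add.*dev eth0 .*"),
    "MainWanRemoved": re.compile(r"ip route del.*dev eth0 .*"),
}


def classify(message):
    """Return the name of the first event whose pattern matches, or None."""
    for name, pattern in PATTERNS.items():
        if pattern.search(message):
            return name
    return None


if __name__ == "__main__":
    # Hypothetical log messages, for illustration only
    samples = [
        "ip route add 0.0.0.0/0 via 192.168.11.1 dev eth1.11 metric 254",
        "ip route del 0.0.0.0/0 via 203.0.113.1 dev eth0 metric 1",
        "ip route add 0.0.0.0/0 via 10.0.0.1 dev eth0.100 metric 1",
    ]
    for line in samples:
        print(classify(line), "<-", line)
```

The space after the interface name in each pattern is what keeps the eth0 events from firing for a VLAN sub-interface such as eth0.100.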
A simple regex is used to identify the events corresponding to the add/remove actions for the two WANs, and environment variables are used to trigger actions in the /config/scripts/failover-handler.py handler script:
#!/usr/bin/env python3
import smtplib, ssl
from os import environ
from sys import exit
from email.mime.text import MIMEText
from systemd import journal
from vyos.util import rc_cmd


def sendMail(interface_name, interface_action):
    port = 465  # For SSL
    smtp_server = "smtp.gmail.com"
    sender_email = "[email protected]"  # Enter your address
    receiver_email = "[email protected]"  # Enter receiver address
    password = "xxxxxxxxxxxxx"
    # The event handler passes the matched log line in the 'message' variable
    event_message = environ.get('message')
    body = f"""\
eth0: main WAN
eth1.11: back-up cellular wan
Event message: {event_message}."""
    message = MIMEText(body)
    message['From'] = sender_email
    message['To'] = receiver_email
    message['Subject'] = f"VyOS failover: interface {interface_name}, action {interface_action}"
    try:
        context = ssl.create_default_context()
        with smtplib.SMTP_SSL(smtp_server, port, context=context) as server:
            server.login(sender_email, password)
            server.sendmail(sender_email, receiver_email, message.as_string())
    except Exception as err:
        journal.send(f'Error sending notification e-mail: {err}')


def flushConntrack():
    rc, command = rc_cmd('/usr/bin/sudo /usr/sbin/conntrack -F')
    if rc != 0:
        journal.send(f'{command} -- return-code [RC: {rc}]')
    else:
        journal.send('Flushed conntrack')


def rejectIPV6():
    rc, command = rc_cmd('/config/scripts/reject_ipv6.sh')
    if rc != 0:
        journal.send(f'{command} -- return-code [RC: {rc}]')
    else:
        journal.send('Default route for IPV6 is now rejected')


def restoreIPV6():
    rc, command = rc_cmd('/config/scripts/restore_ipv6.sh')
    if rc != 0:
        journal.send(f'{command} -- return-code [RC: {rc}]')
    else:
        journal.send('Default route for IPV6 is now restored')


if __name__ == '__main__':
    # All inputs come from the environment set in the event handler config
    interface = environ.get('INTERFACE')
    action = environ.get('ACTION')
    flush_conntrack = environ.get('FLUSH_CONNTRACK')
    reject_ipv6 = environ.get('REJECT_IPV6')
    restore_ipv6 = environ.get('RESTORE_IPV6')
    if reject_ipv6 == 'true':
        rejectIPV6()
    elif restore_ipv6 == 'true':
        restoreIPV6()
    if flush_conntrack == 'true':
        flushConntrack()
    sendMail(interface, action)
    exit(0)
My secondary WAN does not support IPv6, so IPv6 is “enabled/disabled” using the /config/scripts/reject_ipv6.sh and /config/scripts/restore_ipv6.sh scripts:
#!/bin/vbash
# /config/scripts/reject_ipv6.sh
if [ "$(id -g -n)" != 'vyattacfg' ] ; then
    exec sg vyattacfg -c "/bin/vbash $(readlink -f $0) $@"
fi
set +x
source /opt/vyatta/etc/functions/script-template
configure
set protocols static route6 ::/0 reject
commit
exit

#!/bin/vbash
# /config/scripts/restore_ipv6.sh
if [ "$(id -g -n)" != 'vyattacfg' ] ; then
    exec sg vyattacfg -c "/bin/vbash $(readlink -f $0) $@"
fi
set +x
source /opt/vyatta/etc/functions/script-template
configure
delete protocols static route6 ::/0 reject
commit
exit
I am aware that this solution is not completely generic. Still, it took me a while to get there, and I hope it is useful to somebody else.
Please note that the following two fixes have to be applied for this solution to work:
Of course, any feedback is very welcome.