I’m setting up some peers in the dn42 network and am running into issues with MP-BGP and the extended-nexthop capability. I’ve seen this strange behaviour on two different peers with similar config; the config and log output below are from just one of them.
My VyOS box is a virtual machine on Proxmox with the following details:
~$ show ver
Version: VyOS 1.5-rolling-202412100007
Release train: current
Release flavor: generic
Built by: [email protected]
Built on: Tue 10 Dec 2024 00:07 UTC
Build UUID: 1609a9df-717f-4871-b4d3-301696b398d3
Build commit ID: 0ba21e93c8213e
Architecture: x86_64
Boot via: installed image
System type: KVM guest
Secure Boot: n/a (BIOS)
Hardware vendor: QEMU
Hardware model: Standard PC (i440FX + PIIX, 1996)
Hardware S/N:
Hardware UUID: fcfe9107-a9a6-4a32-a16d-fdc872d30574
Copyright: VyOS maintainers and contributors
When I first set up the BGP peer, the session established and I was receiving routes from the peer, but none of them were valid, so nothing was installed into the routing table. The BGP table showed the next-hop IP address as “inaccessible”. I was, however, able to ping the next-hop address by appending the percent sign and the outgoing interface name.
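For example, using the link-local next-hop shown in the output further down and the tunnel interface from my config:

~$ ping fe80::207%wg4242420207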
I tried various troubleshooting steps with different variations of the config, and ultimately ended up rebooting the machine. To my surprise, when the BGP session established again after the reboot, everything was working fine: all the routes received from the peer were valid.
I can reproduce the issue by disabling the WireGuard tunnel interface and re-enabling it. Once the BGP session has re-established, all of the peer’s routes are marked as invalid because the next-hop is “inaccessible”. This is the related log message:
fe80::207: nexthop_set failed, local: [fe80::9898]:37229 remote: (null)p update_if: wg4242420207 resetting connection - intf (Unknown)
Again, rebooting the machine fixes the issue, but I can still reproduce it by resetting the WireGuard tunnel interface, after which the next-hop is marked as inaccessible again.
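To reset the tunnel I toggle the interface from configuration mode, roughly like this:

configure
set interfaces wireguard wg4242420207 disable
commit
delete interfaces wireguard wg4242420207 disable
commit
exit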
This is the related WireGuard tunnel interface config:
[edit]
# show interfaces wireguard
wireguard wg4242420207 {
    address fe80::9898/64
    address 172.20.47.225/32
    ip {
        source-validation disable
    }
    ipv6 {
        address {
        }
    }
    peer syd1 {
        address 194.195.252.178
        allowed-ips 0.0.0.0/0
        allowed-ips ::/0
        persistent-keepalive 60
        port 52227
        public-key xxxxx
    }
    port 20207
}
This is the related BGP config:
[edit]
# show protocols bgp
neighbor fe80::207 {
    interface {
        source-interface wg4242420207
        v6only {
        }
    }
    peer-group generic_mpbgp_ext-nh
    remote-as 4242420207
    update-source wg4242420207
}
peer-group generic_mpbgp_ext-nh {
    address-family {
        ipv4-unicast {
        }
        ipv6-unicast {
        }
    }
    capability {
        extended-nexthop
    }
    ebgp-multihop 20
}
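If it helps, I can also confirm from the neighbour detail whether the extended-nexthop capability is actually being negotiated on the session (I believe this is the right op-mode command):

~$ show bgp neighbors fe80::207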
This is the output from the BGP table showing an example route, and the next-hop being marked as “inaccessible”:
~$ show bgp ipv4 172.20.0.53
BGP routing table entry for 172.20.0.53/32, version 1173
Paths: (3 available, best #2, table default)
Advertised to non peer-group peers:
172.20.16.142 fe80::102 fe80::207 fd42:4242:2601:ac12::1
4242420207 4242423914
fe80::207 (inaccessible) from fe80::207 (172.20.19.83)
(fe80::207) (used)
Origin IGP, metric 0, invalid, external
Large Community: 4242420207:120:6 4242420207:130:1 4242420207:140:42
Last update: Mon Dec 23 15:33:28 2024
This is the same BGP table output after rebooting the machine:
~$ show bgp ipv4 172.20.0.53
BGP routing table entry for 172.20.0.53/32, version 605
Paths: (3 available, best #3, table default)
4242420207 4242423914
fe80::207 from fe80::207 (172.20.19.83)
(fe80::207) (used)
Origin IGP, metric 0, valid, external
Large Community: 4242420207:120:6 4242420207:130:1 4242420207:140:42
Last update: Mon Dec 23 15:41:35 2024
Interestingly, after rebooting the machine I get the same BGP error:
fe80::207: nexthop_set failed, local: [fe80::9898]:41563 remote: (null)p update_if: wg4242420207 resetting connection - intf (Unknown)
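If more data would help with diagnosis, I can also dump FRR’s nexthop tracking state after a tunnel reset; assuming vtysh access behaves the same as on stock FRR, something like:

~$ sudo vtysh -c "show bgp nexthop detail"
~$ sudo vtysh -c "show ip nht"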
Given that the issue is resolved by a reboot, it doesn’t sound like a configuration issue, but I could be wrong. Has anyone else encountered this before?