I’m setting up some peers in the dn42 network and am running into issues with MP-BGP and the extended-nexthop capability. I’ve seen this strange behaviour on two different peers with similar config; the config and log output below are from just one of them.
My VyOS box is a virtual machine on Proxmox with the following details:
~$ show ver
Version: VyOS 1.5-rolling-202412100007
Release train: current
Release flavor: generic
Built by: [email protected]
Built on: Tue 10 Dec 2024 00:07 UTC
Build UUID: 1609a9df-717f-4871-b4d3-301696b398d3
Build commit ID: 0ba21e93c8213e
Architecture: x86_64
Boot via: installed image
System type: KVM guest
Secure Boot: n/a (BIOS)
Hardware vendor: QEMU
Hardware model: Standard PC (i440FX + PIIX, 1996)
Hardware S/N:
Hardware UUID: fcfe9107-a9a6-4a32-a16d-fdc872d30574
Copyright: VyOS maintainers and contributors
When I first set up the BGP peer, the session established and I was receiving routes from the peer, but none of them were valid, so nothing was installed into the routing table. The BGP table showed the next-hop IP address as “inaccessible”. I was, however, able to ping the next-hop address by appending the percent sign and the outgoing interface name.
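For example, using the link-local next-hop shown in the output further down and the tunnel interface from my config:

~$ ping fe80::207%wg4242420207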
I tried various troubleshooting steps with different variations of the config, and ultimately ended up rebooting the machine. To my surprise, when the BGP session established again after the reboot, everything was working fine: all the routes received from the peer were valid.
I can reproduce the issue by disabling the WireGuard tunnel interface and re-enabling it. Once the BGP session has re-established, all of the peer’s routes are marked as invalid because the next-hop is “inaccessible”. This is the related log message:
fe80::207: nexthop_set failed, local: [fe80::9898]:37229 remote: (null)p update_if: wg4242420207 resetting connection - intf (Unknown)
Again, rebooting the machine fixes the issue, but I can still reproduce it by resetting the WireGuard tunnel interface, after which the next-hop is marked as inaccessible again.
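To reset the tunnel I toggle the interface from configuration mode, roughly like this:

configure
set interfaces wireguard wg4242420207 disable
commit
delete interfaces wireguard wg4242420207 disable
commit
exit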
This is the related WireGuard tunnel interface config:
[edit]
# show interfaces wireguard
wireguard wg4242420207 {
    address fe80::9898/64
    address 172.20.47.225/32
    ip {
        source-validation disable
    }
    ipv6 {
        address {
        }
    }
    peer syd1 {
        address 194.195.252.178
        allowed-ips 0.0.0.0/0
        allowed-ips ::/0
        persistent-keepalive 60
        port 52227
        public-key xxxxx
    }
    port 20207
}
This is the related BGP config:
[edit]
# show protocols bgp
neighbor fe80::207 {
    interface {
        source-interface wg4242420207
        v6only {
        }
    }
    peer-group generic_mpbgp_ext-nh
    remote-as 4242420207
    update-source wg4242420207
}
peer-group generic_mpbgp_ext-nh {
    address-family {
        ipv4-unicast {
        }
        ipv6-unicast {
        }
    }
    capability {
        extended-nexthop
    }
    ebgp-multihop 20
}
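If it helps, I can also confirm from the neighbour detail whether the extended-nexthop capability is actually being negotiated on the session (I believe this is the right op-mode command):

~$ show bgp neighbors fe80::207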
This is the output from the BGP table showing an example route, and the next-hop being marked as “inaccessible”:
~$ show bgp ipv4 172.20.0.53
BGP routing table entry for 172.20.0.53/32, version 1173
Paths: (3 available, best #2, table default)
Advertised to non peer-group peers:
172.20.16.142 fe80::102 fe80::207 fd42:4242:2601:ac12::1
4242420207 4242423914
fe80::207 (inaccessible) from fe80::207 (172.20.19.83)
(fe80::207) (used)
Origin IGP, metric 0, invalid, external
Large Community: 4242420207:120:6 4242420207:130:1 4242420207:140:42
Last update: Mon Dec 23 15:33:28 2024
This is the same BGP table output after rebooting the machine:
~$ show bgp ipv4 172.20.0.53
BGP routing table entry for 172.20.0.53/32, version 605
Paths: (3 available, best #3, table default)
4242420207 4242423914
fe80::207 from fe80::207 (172.20.19.83)
(fe80::207) (used)
Origin IGP, metric 0, valid, external
Large Community: 4242420207:120:6 4242420207:130:1 4242420207:140:42
Last update: Mon Dec 23 15:41:35 2024
Interestingly, after rebooting the machine I get the same BGP error:
fe80::207: nexthop_set failed, local: [fe80::9898]:41563 remote: (null)p update_if: wg4242420207 resetting connection - intf (Unknown)
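If more data would help with diagnosis, I can also dump FRR’s nexthop tracking state after a tunnel reset; assuming vtysh access behaves the same as on stock FRR, something like:

~$ sudo vtysh -c "show bgp nexthop detail"
~$ sudo vtysh -c "show ip nht"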
Given that the issue is resolved by a reboot, it doesn’t sound like a configuration issue, but I could be wrong. Has anyone else encountered this before?