BGP issue: all my peers reset BGP session after 3 minutes

Hi,

It seems that all my peers are resetting the BGP session after 3 minutes, as if my VyOS router were not sending any keepalives.

Any idea what I can check to debug this issue?

I didn’t change the default timers for any of my neighbors, and even when I set the keepalive timer to 60 seconds for some neighbors, nothing changes…

I’m open to any suggestion :wink:

Thanks in advance

1.4-rolling-202302010317

Are you sure that, when your BGP sessions come up, you’re not learning routes that stop your BGP peers from being able to reach each other?

You don’t provide any config, but if you’re doing eBGP multihop that’s probably what’s happening.

If your neighbours are all directly connected and it’s still timing out, I would suggest running tcpdump -i <interface> tcp port 179 and seeing what that shows you.

Hi,

all my peers are connected via Wireguard (it’s a DN42 network) and BGP is configured to use link-local IPv6 addresses with multiprotocol enabled.

I can consistently ping all my peers at any moment (before, during and after the BGP session comes up and prefixes are exchanged).

I ran tcpdump on some interfaces and I can see that just after the OPEN message, A (my peer) sends a KEEPALIVE packet to B (me), immediately followed by another KEEPALIVE message sent from B to A.
After that I can only see UPDATE messages and KEEPALIVE messages sent about every 60 seconds from A to B, which B only ACKs (B never sends any further KEEPALIVE messages; it only ACKs the ones received from A).
After a while, A closes the connection and another cycle is started, with another exchange of OPEN messages.
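For context on the 3-minute figure: per the BGP specification (RFC 4271), each side restarts its hold timer whenever a KEEPALIVE or UPDATE arrives and tears the session down when the timer expires. The sketch below assumes the common 60 s keepalive / 180 s hold time pair, since the thread doesn’t show the negotiated values; the function name and timestamps are made up for illustration.

```python
def hold_timer_expiry(received_at, hold_time=180.0):
    """Return the time (seconds) at which the BGP hold timer expires.

    received_at: sorted arrival times of KEEPALIVE/UPDATE messages
    hold_time:   negotiated hold time; 180 s is an assumed default
    The timer is restarted by every message that arrives before the
    current deadline; no messages are assumed after the last entry.
    """
    if not received_at:
        raise ValueError("need at least one message time")
    deadline = received_at[0] + hold_time
    for t in received_at[1:]:
        if t > deadline:
            # The timer already expired before this message arrived.
            return deadline
        deadline = t + hold_time
    return deadline


# Only the initial KEEPALIVE (t=0) ever reaches the peer:
print(hold_timer_expiry([0.0]))  # 180.0
```

If peer A only ever receives B’s first KEEPALIVE, its timer would run out 180 seconds later, which would be consistent with the resets after 3 minutes described above.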

I think it’s worth noting that I can see “clusters” of TCP errors in between the valid packets: errors like “TCP previous segment not captured”, “TCP Dup ACK”, “TCP Out-Of-Order”, “TCP Retransmission”.
I cannot say whether they are actually the cause of my issue, because I never ran tcpdump when everything was working fine, but doesn’t it seem strange that my router never sends KEEPALIVE messages and only ACKs the ones it receives?

Is it the expected behavior?

(Configuration is based on this: howto/vyos1.4.x)

Try to check the routes that the router receives.
Is it possible that the router receives more-specific prefixes that cover a neighbor’s address?
For example, the route to peer x.x.x.x is known via some multihop (or another protocol), and when the session is established the router receives the same x.x.x.x via BGP with a higher priority, so the session goes down.
It’s worth checking.

Or start by deleting the firewall.

I don’t think that’s the case, for two reasons: 1) I’m using link-local IPv6 addresses to connect with my peers, so BGP knows exactly which IP and which interface to use to reach the peer. 2) Prefixes are accepted only if correctly assigned to a specific ASN, and link-local addresses are forbidden by the registry.

In any case I checked, just to be sure, and I can confirm that no fe80:: prefix is received via BGP.
They are only directly connected routes.

I just tried deleting the firewall, but unfortunately nothing changed…

Here is an extract of what I can see when analyzing the packets.

fe80::ade0 is the remote peer.
fe80::42:688:42:3914 is my VyOS router.

The WireGuard MTU is set to the default (1420)…

Here is what happens just after the BGP OPEN:

Thanks

As you’re traversing a tunnel, make sure this isn’t an MTU issue.
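Note that ordinary pings succeeding doesn’t rule this out: default pings are small, while BGP UPDATE segments can fill the whole MTU. One common check is a ping with fragmentation disallowed and a payload sized to fill the MTU exactly (on Linux iputils this is typically something like ping -M do -s <size>; check your platform’s ping options). A small sketch of the payload arithmetic, assuming IP headers without options or extension headers; the helper name is hypothetical:

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
IPV6_HEADER = 40  # bytes, fixed IPv6 header
ICMP_HEADER = 8   # bytes, echo request header (ICMP and ICMPv6)


def ping_payload_size(mtu, ipv6=True):
    """Payload size that makes an echo request exactly `mtu` bytes long."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    payload = mtu - ip_header - ICMP_HEADER
    if payload < 0:
        raise ValueError("MTU too small for an echo request")
    return payload


# Inner IPv6 pings that would fill a 1420-byte and a 1400-byte tunnel MTU:
print(ping_payload_size(1420))  # 1372
print(ping_payload_size(1400))  # 1352
```

If the larger size gets no reply while the smaller one does, full-size packets are probably being dropped somewhere along the tunnel path.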

I was just going to write about that :wink:

Out of desperation I tried to decrease the default MTU of wireguard from 1420 to 1400… and it worked!

Now I just need to find out why (or where) the MTU changed in the path… but at least I can say problem solved!
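As a starting point for working out where the path shrinks, here is the usual WireGuard overhead arithmetic: each tunnelled packet gains a 32-byte WireGuard data header and tag, an 8-byte UDP header, and an outer IPv4 (20 bytes) or IPv6 (40 bytes) header. The sketch assumes no IP options or extension headers; the real underlay MTU of these tunnels isn’t known from the thread, and the function name is hypothetical.

```python
WG_OVERHEAD = 32   # bytes: 16-byte data message header + 16-byte auth tag
UDP_HEADER = 8     # bytes
OUTER_IPV4 = 20    # bytes, no options
OUTER_IPV6 = 40    # bytes, no extension headers


def required_underlay_mtu(tunnel_mtu, outer_ipv6=True):
    """Smallest underlay MTU that carries a full tunnel packet unfragmented."""
    outer_ip = OUTER_IPV6 if outer_ipv6 else OUTER_IPV4
    return tunnel_mtu + WG_OVERHEAD + UDP_HEADER + outer_ip


for mtu in (1420, 1400):
    print(mtu,
          required_underlay_mtu(mtu, outer_ipv6=True),
          required_underlay_mtu(mtu, outer_ipv6=False))
# 1420 1500 1480
# 1400 1480 1460
```

So the default of 1420 only fits if the underlay carries a full 1500 bytes over IPv6 (or 1480 over IPv4). If 1400 works where 1420 didn’t, that would suggest some hop on the path to the peer offers slightly less than that.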

Thanks for your help!
