Help, BGP crashes and is not recoverable after resetting interfaces

ACiD_GRiM · September 6, 2023, 5:26am

I am getting frustrated because I just solved a long-standing issue: my dynamic WireGuard clients can’t reconnect after they reset because their time is incorrect, and they don’t connect to the internet until the WireGuard tunnel is up (it provides the only route out).

I worked around this by checking the handshake time on each interface and running,

set interfaces wireguard wgX disable
commit
sudo ip link del dev wgX
delete interfaces wireguard wgX disable
commit

if the handshake time is greater than 400 seconds.

This has resolved the initial issue with extreme success, all wireguard remote peers can now connect regardless of their local time.
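For anyone wanting to reproduce the handshake check, here is a minimal sketch of how it could look in Python. It is not my exact script. It assumes the output format of `wg show <iface> latest-handshakes` (one public key and one Unix timestamp per line, with 0 meaning no handshake yet); the function names and the `STALE_AFTER` constant are made up for illustration.

```python
import subprocess
import time

STALE_AFTER = 400  # seconds, the threshold mentioned above


def stale_peers(handshake_output, now, max_age=STALE_AFTER):
    """Return public keys whose latest handshake is older than max_age.

    handshake_output is the text printed by `wg show <iface> latest-handshakes`:
    one public key and one Unix timestamp per line, separated by a tab.
    A timestamp of 0 means no handshake has completed, so it counts as stale.
    """
    stale = []
    for line in handshake_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        key, stamp = fields[0], int(fields[1])
        if stamp == 0 or now - stamp > max_age:
            stale.append(key)
    return stale


def check_interface(iface):
    """Query one interface. Needs root and the wg tool, so it only runs when called."""
    out = subprocess.run(
        ["wg", "show", iface, "latest-handshakes"],
        capture_output=True, text=True, check=True,
    ).stdout
    return stale_peers(out, int(time.time()))
```

If `check_interface("wgX")` returns a non-empty list, that is the point where the disable/delete/re-enable sequence above would be triggered for that interface.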

However, this now seems to cause BGP on this VyOS router to crash: all WireGuard peers lose their routes, and running restart bgp hangs.
Ultimately the only solution is to restart the whole router.

Can someone suggest a more graceful way to purge WireGuard peer sessions that doesn’t delete the interface so suddenly? I suspect the deletion is related to the BGP daemon locking up.
Alternatively, is there a way to refresh the BGP daemon directly in systemctl that bypasses the vbash run-level commands?

I recently updated to the 20230905 nightly of 1.4.

Viacheslav · September 6, 2023, 7:16am

We should separate those tasks.
First, I’d prefer to solve the issue with the WireGuard peers. Could you describe how to reproduce this issue? Is it a site-to-site or road warrior connection?
Did you file a bug report for this?

Second, we need some logs and a bug report if the FRR daemon crashed.

Third, which commands are you bypassing to FRR? You can add a feature request:

https://vyos.dev/

ACiD_GRiM · September 7, 2023, 4:10am

This is site-to-site, but the issue appears to be upstream in the WireGuard kernel module. The remote sites are unstable and can lose power at any moment. Just restarting the interfaces on a running peer does not cause the issue.
Essentially, if a peer is defined on the static-IP peer (or even between two dynamic-IP peers via DDNS) and a connection is established, the peer’s IP is registered in the endpoint definition. When either end resets, if the timestamp of its handshake is earlier than, or out of sync with, that of the surviving peer, the handshakes are rejected.
This is not an issue if the rebooted peer receives up-to-date NTP data.
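To illustrate what I mean, here is a toy model of the timestamp check as I understand it from the WireGuard protocol description: the responder remembers the greatest handshake timestamp seen per peer and drops initiations that are not newer. This is only a sketch, not the kernel code. The class and method names are hypothetical, plain integers stand in for the real TAI64N timestamps, and `forget` models my assumption that deleting and recreating the interface is what clears the remembered value.

```python
class Responder:
    """Toy model of the per-peer timestamp check; not the kernel implementation."""

    def __init__(self):
        # peer public key -> greatest initiation timestamp accepted so far
        self.latest = {}

    def accept_initiation(self, peer, timestamp):
        """Accept only initiations strictly newer than the last accepted one."""
        last = self.latest.get(peer)
        if last is not None and timestamp <= last:
            return False  # looks like a replay, so it is dropped
        self.latest[peer] = timestamp
        return True

    def forget(self, peer):
        """Drop the remembered timestamp (assumed effect of recreating the interface)."""
        self.latest.pop(peer, None)
```

In this model, a peer that reboots with its clock set back keeps getting `False` from `accept_initiation` until either its clock passes the stored value or the surviving side calls `forget`, which matches the behavior I am seeing.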
I will gather logs the next time it occurs.
I’m not bypassing FRR, just looking for a way to work around the fact that the vbash shell resets. I guess this could become a feature request to add WireGuard to the “reset interface” tree.

echowings · September 7, 2023, 5:45am

WireGuard has no status for connecting or disconnecting. Try detecting the connection with BFD enabled.

ACiD_GRiM · September 7, 2023, 1:13pm

That’s only technically true; the handshake requires the peers to agree on the timestamp. If there’s no handshake, BFD, which is already in use, has no impact.