Upload issue due to asymmetric routing?


I am facing an issue that is well beyond my understanding and technical knowledge, and I would like to know whether anyone has ever faced the same thing.

I have two virtual border routers running VyOS. Each of them connects to one uplink via eBGP and receives a full routing table. I have iBGP running between the two routers.
These two routers are connected via OSPF to a core router. Behind this core router I have a few virtual servers with public IPs, mostly web apps.

I received a complaint that although the web servers are reachable without problems from the global internet, customers struggle to upload files to them via HTTP. The file size doesn’t matter; it happens with any file.

I can replicate this issue by creating a test web server and attempting to upload images. The test web server has no firewall, and it is not behind NAT either.

I see the same problem when uploading files via SCP to the test web server, so it really looks like a network issue to me.

So I started to experiment a bit with my routing…

What I noticed is the following:

Assuming a user tries to upload to the web server from the internet:

If traffic comes in via uplink 1 and goes back via uplink 1, it works fine.
If traffic comes in via uplink 2 and goes back via uplink 2, it works fine.
If traffic comes in via uplink 1 and goes back via uplink 2, it works fine.
If traffic comes in via uplink 2 and goes back via uplink 1, it doesn’t work and gets stuck.

The last two scenarios really bother me, since both are asymmetric but only one of them fails. Overall, I believe asymmetric routing should not be such an issue. I have run a few SP networks connected to multiple uplinks and receiving only a default route; traffic was heavily asymmetric but worked fine.

I can confirm there is no QoS involved or any sort of firewalling. Only routers.

Does this ring a bell to anybody? I am running out of ideas.

Do pings work fine in this scenario? If pings work fine, it looks like a wrong TCP MSS. Try clamping it:

set interfaces ethernet eth1 ip adjust-mss xxx
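
For reference, the usual arithmetic behind a clamp value is the path MTU minus the IP and TCP headers. A minimal Python sketch of that calculation, assuming headers without options (the function name `clamp_mss` is made up for illustration, and the right MTU for this particular setup is not known from the thread):

```python
# Header sizes without options; these are the common defaults, not measured values.
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20


def clamp_mss(path_mtu: int, ipv6: bool = False) -> int:
    """Return the largest TCP payload that fits in one packet of size path_mtu."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return path_mtu - ip_header - TCP_HEADER


print(clamp_mss(1500))             # plain Ethernet, IPv4
print(clamp_mss(1492))             # e.g. a link with 8 bytes of PPPoE overhead
print(clamp_mss(1500, ipv6=True))  # plain Ethernet, IPv6
```

If some hop on the failing path has a smaller MTU than 1500, the value to try in place of xxx would be derived from that smaller MTU.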

Ping works fine, and the websites are accessible and super fast. File upload is the issue.

How do you think I should adjust the MSS? Should I compare the two routers' interfaces first?

I connected to the two routers and ran “sudo nmap -e ethX --script path-mtu” on the external interface towards the internet and on the internal interface towards the web server. On both routers the commands all returned a clean PMTU of 1500.
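
One possible cross-check that does not rely on nmap is to send pings with the Don't Fragment bit set and search for the largest payload that still gets a reply. The sketch below is only an illustration under stated assumptions: it assumes the Linux iputils `ping` (flags `-M do`, `-s`, `-c`, `-W`), IPv4, and that replies are not filtered; the function names and the example address are hypothetical. It would probably be most telling when run end to end from a client whose traffic takes the failing path, rather than from the routers themselves.

```python
import subprocess

ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP echo header


def ping_df(host: str, payload: int) -> bool:
    """Send one ping of the given payload size with DF set; True if a reply arrived."""
    cmd = ["ping", "-M", "do", "-c", "1", "-W", "1", "-s", str(payload), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def largest_passing(probe, lo: int, hi: int):
    """Binary-search the largest size in [lo, hi] for which probe(size) is True.

    Assumes that if a size passes, every smaller size passes too.
    Returns None when even the smallest size fails.
    """
    if not probe(lo):
        return None
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def path_mtu(host: str, lo: int = 548, hi: int = 1472):
    """Estimate the IPv4 path MTU towards host, or None if no probe got through."""
    best = largest_passing(lambda size: ping_df(host, size), lo, hi)
    return None if best is None else best + ICMP_OVERHEAD


# Example call (203.0.113.10 is a documentation address, not the real server):
# print(path_mtu("203.0.113.10"))
```

A result below 1500 on the "in via uplink 2, out via uplink 1" path would point towards an MTU or MSS problem somewhere on that path.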

Note that your routers aren’t the only ones involved in asymmetric routing; so are those of your ISPs.
For packets coming in on 2 and sent out via 1, do those return packets make it to R1's WAN port?
Even if ping works (are you sure it does in the scenario "in on 2, out via 1"?), TCP has other stateful behavior.

Yes, they made it back to R1. Ping works fine, and so does web browsing. Only upload is an issue.

I continued to troubleshoot, and from what I can see, this issue happens when the traffic enters the VyOS router through one interface and returns via another interface.

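What’s the difference between web browsing and upload? In both cases, the TCP connection is set up from client to server.
Probably the packet sizes are different: during an upload the large segments travel from client to server, whereas during browsing they travel from server to client. That suggests an MSS clamping issue.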

Yes, I agree with you. When uploading, the transfer starts, but the transfer speed drops very quickly until it fails.