Odd routing issue?


#1

We made a physical change to our network, and now when our systems try to reach the outside world, the connections are sporadic. It seems like the router, or some router, is stopping the transfer shortly after the connection is established. I’ve included some basic wget output from a Linux system to illustrate what’s happening.


[root@storage ~]# wget http://mirrors.cat.pdx.edu/centos/6.7/updates/x86_64/repodata/2123a9b143f6c786498bdf05a65efbd678b0905a27f53c53223e71b42a3656d0-primary.sqlite.bz2
--2015-10-05 15:27:02--  http://mirrors.cat.pdx.edu/centos/6.7/updates/x86_64/repodata/2123a9b143f6c786498bdf05a65efbd678b0905a27f53c53223e71b42a3656d0-primary.sqlite.bz2
Resolving mirrors.cat.pdx.edu... 131.252.208.20, 2610:10:20:208::20
Connecting to mirrors.cat.pdx.edu|131.252.208.20|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2010050 (1.9M) [text/plain]
Saving to: “2123a9b143f6c786498bdf05a65efbd678b0905a27f53c53223e71b42a3656d0-primary.sqlite.bz2”

 2% [===>                                                                                                                                                                  ] 59,368      38.8K/s   in 17s

2015-10-05 15:27:27 (3.35 KB/s) - Connection closed at byte 59368. Retrying.

--2015-10-05 15:27:28--  (try: 2)  http://mirrors.cat.pdx.edu/centos/6.7/updates/x86_64/repodata/2123a9b143f6c786498bdf05a65efbd678b0905a27f53c53223e71b42a3656d0-primary.sqlite.bz2
Connecting to mirrors.cat.pdx.edu|131.252.208.20|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 2010050 (1.9M), 1950682 (1.9M) remaining [text/plain]
Saving to: “2123a9b143f6c786498bdf05a65efbd678b0905a27f53c53223e71b42a3656d0-primary.sqlite.bz2”

12% [++++===============>                                                                                                                                                  ] 249,056     --.-K/s  eta 72m 23s ^C

As you can see, the connection stops and wget retries. The behaviour is the same when updating via yum: I get so many bytes of the package before it just stops downloading, then yum assumes the mirror is down because it lost the connection, and tries another mirror. Each attempt gets a few more bytes, but on larger packages we ultimately have to retry the download many times, cycling through every mirror.

I’m not sure where to begin looking in the VyOS config to troubleshoot; I’m very green with VyOS myself. I do have some networking knowledge, which is why I suspect a routing issue.

Layer 2 traffic flows perfectly to and from the virtual and physical hosts on the internal network. Connections to external sources are where we’re having the issue.

Thanks to anyone with ideas.


#2

If you are convinced it’s a Layer 3 problem, how about trying some long-running pings or mtr first, run from behind the VyOS system as well as on it? What do you have configured for routing on the VyOS router: static routes or a dynamic protocol, default gateway, etc.?
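To make the long-running ping comparison concrete, here is a minimal Python sketch, assuming Linux iputils `ping` (which numbers replies from `icmp_seq=1`). It lists which sequence numbers never got a reply and collapses them into bursts, so you can see whether losses are scattered or whether replies simply stop. The function names and the 100-probe default are hypothetical, chosen for illustration.

```python
import re
import subprocess

# Only count real echo replies, e.g.
# "64 bytes from 131.252.208.20: icmp_seq=12 ttl=53 time=20.1 ms".
# "From ... icmp_seq=N Destination Host Unreachable" lines are not replies.
REPLY_RE = re.compile(r"bytes from .*?icmp_seq=(\d+)")


def run_ping(target: str, count: int = 100) -> str:
    """Run Linux ping and return its stdout (hypothetical helper, not called on import)."""
    # ping exits non-zero when replies are lost; that is expected here, so no check=True.
    result = subprocess.run(["ping", "-c", str(count), target],
                            capture_output=True, text=True)
    return result.stdout


def missing_sequences(ping_output: str, count: int) -> list[int]:
    """Return the icmp_seq numbers in 1..count that never got a reply."""
    replied = {int(m.group(1)) for m in REPLY_RE.finditer(ping_output)}
    return [seq for seq in range(1, count + 1) if seq not in replied]


def loss_bursts(missing: list[int]) -> list[tuple[int, int]]:
    """Collapse consecutive missing sequence numbers into (first, last) runs."""
    bursts: list[tuple[int, int]] = []
    for seq in missing:
        if bursts and seq == bursts[-1][1] + 1:
            bursts[-1] = (bursts[-1][0], seq)  # extend the current run
        else:
            bursts.append((seq, seq))  # start a new run
    return bursts
```

Run it on the VyOS box and on a host behind it, e.g. `loss_bursts(missing_sequences(run_ping("131.252.208.20", 100), 100))`, and compare where the gaps fall on each system.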


#3

I have done long pings, and they’re also sporadic depending on which system starts them. Pings started from the VyOS have no issue. However, pings started from another system behind it are hit and miss. Some go through right away and never stop; others never even start and don’t receive any replies.

It is configured as a gateway and nothing else. Nothing really complicated about the setup.


#4

What does the ARP table look like on the hosts behind the VyOS? Do they agree on the L2 (MAC) address for the VyOS IP? When you run tcpdump, can you see multiple replies to an ARP request for the VyOS LAN IP?
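Both checks can be scripted. Below is a minimal Python sketch, assuming Linux hosts (which expose the kernel ARP cache in `/proc/net/arp`) and a capture taken with `tcpdump -n arp`. One function reads a host’s ARP entries so you can compare the MAC each host holds for the VyOS LAN IP; the other flags any IP that was answered from more than one MAC in the capture. Function names and the example addresses are hypothetical.

```python
import re
from collections import defaultdict

# ARP replies as printed by `tcpdump -n arp`, e.g.
# "10:00:00.000200 ARP, Reply 192.168.1.1 is-at 00:0c:29:aa:bb:cc, length 46"
ARP_REPLY_RE = re.compile(r"Reply (\S+) is-at ([0-9A-Fa-f:]+)")

ATF_COM = 0x2  # "complete" flag in /proc/net/arp; unset for unresolved entries


def parse_proc_net_arp(text: str) -> dict[str, str]:
    """Map IP -> MAC from the contents of /proc/net/arp, skipping incomplete entries."""
    table: dict[str, str] = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, mac = fields[:4]
        if int(flags, 16) & ATF_COM:
            table[ip] = mac.lower()
    return table


def read_local_arp(path: str = "/proc/net/arp") -> dict[str, str]:
    """Read the ARP cache of the Linux host this runs on (not called on import)."""
    with open(path) as f:
        return parse_proc_net_arp(f.read())


def conflicting_ips(tcpdump_output: str) -> dict[str, list[str]]:
    """Return IPs that were answered by more than one MAC in an ARP capture."""
    seen: dict[str, set[str]] = defaultdict(set)
    for ip, mac in ARP_REPLY_RE.findall(tcpdump_output):
        seen[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}
```

For example, save `tcpdump -n -i eth0 arp` output on a LAN host and pass its contents to `conflicting_ips`; an entry for the VyOS LAN IP would suggest more than one device is answering for that address.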