Starlink: bizarre packet issues on VyOS, works fine from laptop

I’m having a really bizarre issue when Starlink is connected to a VyOS router. One example is that I can dig most domains, e.g. dig apple.com @8.8.8.8 works fine, but if I do dig google.com @8.8.8.8 the query times out. It doesn’t matter what resolver I use; I tried 1.1.1.1 and 9.9.9.9 as well, with the same result.
I did a packet capture on the WAN interface, and I see the query go out, but no return packet ever arrives for the latter query.

The kicker is that when I connect the Starlink router to my laptop, I have none of these issues. The MTU is set to 1500 on both, and I’ve tried going all the way down to 1316 on VyOS without it having any effect.

I’ve tried with the Starlink router in both bridge mode, and in default router mode. This made no difference.

Also tested with both VyOS 1.4 and 1.5, on separate physical hardware.

How does this make any sense? Why would the DNS packet lose its way from VyOS, and only for a specific domain?
Any ideas on stuff I could try to narrow this down?

For what it’s worth, I just set up a Ubiquiti EdgeRouter X and connected it to the Starlink, and it works fine, no issues. I then connected the VyOS box to the ER-X, and again no issues. It appears to be a problem only if VyOS is connected directly to Starlink.

Are you actually typing dig google.com 8.8.8.8, or are you typing dig google.com @8.8.8.8?

If you’re typing it without the @, then it’s trying to resolve both google.com and 8.8.8.8. If you use the @, it’s trying to resolve google.com by querying 8.8.8.8.

Seems pedantic, but it would matter for troubleshooting as it could be a DNS software/resolution issue, or it could be a network issue related to connections made trying to resolve.

For example, if you’re typing dig google.com 8.8.8.8, it’s going to use the system’s resolver, which could be your router, the Starlink resolvers, etc. The questions then are whether that resolver is actually resolving recursively or just forwarding queries, and whether it can reach all the servers in the recursive lookup.

If you’re typing dig google.com @8.8.8.8, then the question is whether you can reach 8.8.8.8 each time. If you can, that points more towards the Google resolver doing something weird. Do you get better results if you try Cloudflare’s 1.1.1.1?
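
To separate "one resolver is misbehaving" from "one domain is misbehaving", it can help to run the same query against several resolvers in one go. The following is only a rough sketch: the function name dig_matrix and the DIG override variable are made up for illustration, and it treats a non-zero dig exit status or an empty +short answer as a failure.

```shell
#!/bin/sh
# Query every resolver for every domain and print OK/FAIL per combination.
# DIG may be overridden (hypothetical knob, handy for testing); default is dig.
dig_matrix() {
    resolvers="$1"   # space-separated resolver IPs
    domains="$2"     # space-separated domain names
    for r in $resolvers; do
        for d in $domains; do
            # +time=2 +tries=1 keeps a lost query from stalling the loop.
            # dig exits non-zero when no server replies.
            if answer=$(${DIG:-dig} +short +time=2 +tries=1 "$d" "@$r") && [ -n "$answer" ]; then
                echo "OK   $d @$r"
            else
                echo "FAIL $d @$r"
            fi
        done
    done
}
```

Usage would be something like dig_matrix "8.8.8.8 1.1.1.1 9.9.9.9" "apple.com google.com". If the same domain fails on every resolver, the resolver itself is unlikely to be the culprit.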

Sorry that was a typo in my post, fixing it now. Was certainly doing @8.8.8.8 and I did a wireshark over SSH to confirm that the DNS queries were going out via the starlink connected interface.

And also yes, tried other resolvers including cloudflare, also edited post to reflect that.

Are you testing connectivity from vyos directly, or from a host behind vyos?

A couple quick things you can do:

  • Remove the interface offloads if they are currently configured.
  • Verify you have SNAT correctly configured.
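
For the offload check, a small helper along these lines can list what is currently switched on. This is a sketch only: active_offloads and the ETHTOOL override are hypothetical names, and it assumes the usual "feature-name: on" line format printed by ethtool -k.

```shell
#!/bin/sh
# Print the offload features that are currently "on" for an interface.
# ETHTOOL may be overridden (hypothetical knob for testing); default is ethtool.
active_offloads() {
    # ethtool -k prints lines such as "tcp-segmentation-offload: on"
    ${ETHTOOL:-ethtool} -k "$1" |
        awk -F': ' '$2 ~ /^on/ { gsub(/^[ \t]+/, "", $1); print $1 }'
}
```

Run it as active_offloads eth0 (eth0 being a placeholder for the WAN interface). Individual features can then be switched off at runtime with, for example, ethtool -K eth0 gro off gso off tso off. On VyOS the persistent setting should live under the interface's offload configuration node.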

From a VyOS shell directly.

I don’t think any offloads were configured, but will mess with that next.

SNAT wouldn’t apply because I’m testing from the host (vyos) directly. But SNAT was configured correctly, confirmed because when starlink has the ER-X in between the VyOS box and the starlink router, there are no problems.

Do you allow for both 53/UDP and 53/TCP?

UDP is used as long as the response fits in a single packet: at most 512 bytes without EDNS, or up to the advertised EDNS buffer size (commonly 1232 bytes). If the response is larger, the server sets the truncation (TC) bit and the client retries over TCP instead.
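
A quick way to see whether UDP and TCP behave differently is to force each transport in turn. The sketch below is an illustration only: compare_transports and the DIG override are made-up names, and it judges success purely by dig's exit status.

```shell
#!/bin/sh
# Run the same query once over UDP only and once over TCP only.
# DIG may be overridden (hypothetical knob for testing); default is dig.
compare_transports() {
    domain="$1"
    resolver="$2"
    # UDP only: +ignore stops dig from retrying over TCP on a truncated reply
    if ${DIG:-dig} +notcp +ignore +time=2 +tries=1 "$domain" "@$resolver" >/dev/null 2>&1; then
        echo "udp ok"
    else
        echo "udp failed"
    fi
    # TCP only
    if ${DIG:-dig} +tcp +time=2 +tries=1 "$domain" "@$resolver" >/dev/null 2>&1; then
        echo "tcp ok"
    else
        echo "tcp failed"
    fi
}
```

For example, compare_transports google.com 8.8.8.8. If TCP works while UDP fails, the problem is more likely in how UDP replies are delivered than in anything DNS-specific.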

However when I try “dig @8.8.8.8 apple.com” and “dig @8.8.8.8 google.com” both with “any” and without any at the end the responses are between 54-1121 bytes so they should fit in a single packet no matter what.

I would also try other public resolvers, or go straight to the authoritative servers for apple.com and google.com, just to rule things out. Perhaps there is some anti-DDoS measure at 8.8.8.8 that blocks certain requests if the source IP belongs to a Starlink client?

OK spent some more time on this today. Disabled all offloads, but no change.

I think this is related: VyOS often fails to get a DHCP lease from the Starlink as well. I have to physically unplug the cable and plug it back in, usually 2-4 times, to get a lease.

@Apachez there is no firewall here, just VyOS to Starlink, literally the only thing configured on VyOS is the WAN and LAN interfaces, and SSH. As simple as I could make it.
And I’ve tested with several DNS services as mentioned above, used Google DNS as an example only.

I have tested multiple physical hardware boxes running VyOS, but they were both the same model (a Lanner NCA series) so perhaps as a next step I’ll find some other model to test with, and also I might try a USB ethernet adapter.

Possible for you to grab a tcpdump?

That DHCP issue sounds like what you get with *nix DHCP clients in a network with a Windows DHCP server.

There is a flag that controls whether the reply is sent as broadcast or unicast. DHCP relays (for example Cisco) don’t seem to be compatible with that, so the fix in the case I’m thinking of is to add a registry flag on the Windows DHCP server to make it behave according to the standard.

I’ll dig into this and see if I can find the article about it. I think I have posted this previously on this forum as well.

I found the references. I think most DHCP implementations have something similar.

The always-broadcast statement

always-broadcast flag;

The DHCP and BOOTP protocols both require DHCP and BOOTP clients to set
the broadcast bit in the flags field of the BOOTP message header.

Unfortunately, some DHCP and BOOTP clients do not do this, and therefore
may not receive responses from the DHCP server.

The DHCP server can be made to always broadcast its responses to clients
by setting this flag to ´on´ for the relevant scope; relevant scopes would
be inside a conditional statement, as a parameter for a class, or as a
parameter for a host declaration.

To avoid creating excess broadcast traffic on your network, we recommend 
that you restrict the use of this option to as few clients as possible.

For example, the Microsoft DHCP client is known not to have this problem,
as are the OpenTransport and ISC DHCP clients.

HKLM\SYSTEM\CurrentControlSet\Services\DhcpServer\Parameters

IgnoreBroadcastFlag

Default 1, workaround set to 0 to make it use unicast for replies.

Sure enough, it’s something to do with the NIC. The built-in Intel X553 1GbE adapters just don’t seem to like this Gen2 Starlink device (remember, I tested multiple boxes with the same NIC, so it’s not simply a one-off hardware issue).

But I plugged in a cheap ASIX AX88772A USB 2.0 adapter and it works great. I get DHCP right away and no weird DNS issues.

I just want to close this thread with the solution I ended up with, which is to simply set up a WAN switch. The physical connection goes from Starlink to the switch, and from the switch to VyOS.
This solves the problem. The root cause still bothers me, but I can’t justify spending more time on it.

If putting an L2 switch in between solves the issue, I would suspect a problem with EEE (802.3az).

It exists in two flavours: one autodetects the cable length so it doesn’t push more power than needed, and the other performs microsleeps in the PHY to save power between packets being sent (or received).

I have seen Apple devices having issues with this, which forced me to disable it on the L2 switch they were connected to.

Check with ethtool if this is the case for you?

This will display if EEE is active:

ethtool --show-eee ethX

And this will disable it when needed:

ethtool --set-eee ethX eee off

And if this solves your issue, there are also kernel boot parameters (unique per NIC driver) that disable EEE at boot, without having to involve ethtool through pre-boot scripting.
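
To check the EEE state on a box with several ports without reading the full output each time, the status line can be pulled out with a small helper. This is a sketch: eee_status and the ETHTOOL override are hypothetical names, and it assumes ethtool prints a line starting with "EEE status:", which may differ between ethtool versions and drivers.

```shell
#!/bin/sh
# Print only the EEE status of an interface, e.g. "enabled - active".
# ETHTOOL may be overridden (hypothetical knob for testing); default is ethtool.
eee_status() {
    # Assumes a line of the form "EEE status: <state>" in the output
    ${ETHTOOL:-ethtool} --show-eee "$1" 2>/dev/null |
        sed -n 's/^[[:space:]]*EEE status:[[:space:]]*//p'
}
```

For example, eee_status eth0. An empty result most likely means the driver does not report EEE at all.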
