[solved] DHCP server hands out wrong addresses

I am experiencing some weird DHCP server behaviour. My VyOS machine has 4 private subnets in different VLANs:

$ sh int
Codes: S - State, L - Link, u - Up, D - Down, A - Admin Down
Interface        IP Address                        S/L  Description
---------        ----------                        ---  -----------
eth0             192.168.0.66/24                   u/u  WAN 
eth1             192.168.1.1/24                    u/u  LAN 
eth1.2           192.168.2.1/24                    u/u  untrusted 
eth1.3           192.168.3.1/24                    u/u  WiFi_guest 
eth1.99          192.168.99.1/24                   u/u  DMZ 
ifb0             -                                 u/u  QoS aggregation 
lo               127.0.0.1/8                       u/u  
                 ::1/128
vtun0            172.21.36.27/23                   u/u  fra-a30.ipvanish.com 
vtun1            172.21.32.68/23                   u/u  ams-a16.ipvanish.com 
vtun2            172.21.32.18/23                   u/u  sto-a05.ipvanish.com 
vtun3            172.21.33.22/23                   u/u  iev-c02.ipvanish.com 
vtun4            172.21.34.161/23                  u/u  nyc-a07.ipvanish.com

On VLANs 1 (native), 2, and 3, VyOS acts as a DHCP server:

$ sh conf com | grep dhcp-server
set service dhcp-server shared-network-name LAN authoritative
set service dhcp-server shared-network-name LAN subnet 192.168.1.0/24 default-router '192.168.1.1'
set service dhcp-server shared-network-name LAN subnet 192.168.1.0/24 dns-server '1.1.1.1'
set service dhcp-server shared-network-name LAN subnet 192.168.1.0/24 lease '7200'
set service dhcp-server shared-network-name LAN subnet 192.168.1.0/24 range LAN start '192.168.1.10'
set service dhcp-server shared-network-name LAN subnet 192.168.1.0/24 range LAN stop '192.168.1.99'
set service dhcp-server shared-network-name WiFi_guest authoritative
set service dhcp-server shared-network-name WiFi_guest subnet 192.168.3.0/24 default-router '192.168.3.1'
set service dhcp-server shared-network-name WiFi_guest subnet 192.168.3.0/24 dns-server '1.1.1.1'
set service dhcp-server shared-network-name WiFi_guest subnet 192.168.3.0/24 lease '7200'
set service dhcp-server shared-network-name WiFi_guest subnet 192.168.3.0/24 range WiFi_guest start '192.168.3.10'
set service dhcp-server shared-network-name WiFi_guest subnet 192.168.3.0/24 range WiFi_guest stop '192.168.3.99'
set service dhcp-server shared-network-name untrusted authoritative
set service dhcp-server shared-network-name untrusted subnet 192.168.2.0/24 default-router '192.168.2.1'
set service dhcp-server shared-network-name untrusted subnet 192.168.2.0/24 dns-server '1.1.1.1'
set service dhcp-server shared-network-name untrusted subnet 192.168.2.0/24 lease '7200'
set service dhcp-server shared-network-name untrusted subnet 192.168.2.0/24 range untrusted start '192.168.2.10'
set service dhcp-server shared-network-name untrusted subnet 192.168.2.0/24 range untrusted stop '192.168.2.99'

Physically, eth1 connects to port 1 on my switch. My (VLAN-aware) WiFi access point connects to the same switch on port 8. Here is the switch’s VLAN configuration:

[screenshot: switch VLAN configuration]

And here are the corresponding ports:

[screenshot: switch port configuration]

The AP (on which exactly the same setup has worked before with an OPNsense machine) is configured like this:

[screenshot: AP configuration]

So, when a device connects to SSID “SchreckNET”, it should receive an IP address in the 192.168.1.0/24 network. --> This works as expected.

When a device connects to SSID “SchreckNET_ugly”, it should receive an IP address in the 192.168.2.0/24 network. --> This DOES NOT work reliably. It only works in some cases.

When a device connects to SSID “Camarilla”, it should receive an IP address in the 192.168.3.0/24 network. --> This DOES NOT work as expected. It never does.

As an example, here is what happens when I connect my phone to “SchreckNET_ugly”:

$ sh dhcp se le
IP address    Hardware address    Lease expiration     Pool       Client Name
------------  ------------------  -------------------  ---------  ----------------
192.168.2.10  68:54:fd:4c:98:c8   2018/11/14 00:55:07  untrusted  [redacted]
192.168.1.10  74:d4:35:10:a5:72   2018/11/14 00:30:41
192.168.1.11  30:05:5c:8c:54:87   2018/11/14 00:36:14  LAN        [redacted]
192.168.1.20  50:01:d9:82:06:b7   2018/11/14 00:58:31  LAN        Honor_6X [<-- my phone]

Although both 192.168.1.0/24 and 192.168.2.0/24 are allowed to access the internet, a device that is given an address from the “wrong” network is unable to reach anything on the outside. This happens not only with my phone but also with other people’s phones and laptops. Furthermore, whenever a device is given a wrong address, that address is always in the 192.168.1.0/24 range.
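To make the mismatch in the lease table explicit, here is a minimal Python sketch that checks which configured pool a leased address falls into. The subnets are copied from the dhcp-server configuration above; the helper name `pool_for` is hypothetical and only serves as an illustration.

```python
import ipaddress

# Subnets copied from the dhcp-server configuration shown earlier.
POOLS = {
    "LAN": ipaddress.ip_network("192.168.1.0/24"),
    "untrusted": ipaddress.ip_network("192.168.2.0/24"),
    "WiFi_guest": ipaddress.ip_network("192.168.3.0/24"),
}


def pool_for(address):
    """Return the name of the pool whose subnet contains address, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in POOLS.items():
        if ip in network:
            return name
    return None


# The phone joined "SchreckNET_ugly", so the expected pool is "untrusted".
expected = "untrusted"
actual = pool_for("192.168.1.20")
print(actual, actual == expected)  # prints: LAN False
```

The phone's lease (192.168.1.20) belongs to the LAN pool even though the SSID it joined maps to the untrusted VLAN, which matches the symptom described above.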

There is nothing in the log that I can see:

$ sh log dhcp 
Nov 13 17:50:42 mewdemstr1 dhcpd: 
Nov 13 17:50:42 mewdemstr1 dhcpd: No subnet declaration for eth0 (192.168.0.66).
Nov 13 17:50:42 mewdemstr1 dhcpd: ** Ignoring requests on eth0.  If this is not what
Nov 13 17:50:42 mewdemstr1 dhcpd:    you want, please write a subnet declaration
Nov 13 17:50:42 mewdemstr1 dhcpd:    in your dhcpd.conf file for the network segment
Nov 13 17:50:42 mewdemstr1 dhcpd:    to which interface eth0 is attached. **
Nov 13 17:50:42 mewdemstr1 dhcpd: 
Nov 13 17:50:42 mewdemstr1 dhcpd: 
Nov 13 17:50:42 mewdemstr1 dhcpd: No subnet declaration for eth1.99 (192.168.99.1).
Nov 13 17:50:42 mewdemstr1 dhcpd: ** Ignoring requests on eth1.99.  If this is not what
Nov 13 17:50:42 mewdemstr1 dhcpd:    you want, please write a subnet declaration
Nov 13 17:50:42 mewdemstr1 dhcpd:    in your dhcpd.conf file for the network segment
Nov 13 17:50:42 mewdemstr1 dhcpd:    to which interface eth1.99 is attached. **
Nov 13 17:50:42 mewdemstr1 dhcpd:

What is going on here, what am I overlooking?

I’m going to ask the obvious questions, but only because I run multiple setups like the one you describe without any issues. You don’t mention which version you are running.

Did the setup work end-to-end before, with the only change being the replacement of OPNsense with VyOS? Meaning same switch config and same AP config?

  1. Is the switch actually tagging those VLANs? I.e., is it smart enough to use the setup you pictured to put the tags on ports 1 and 8 without any additional config under one of those menus?
  2. Are the ports actually plugged into where you think they are?

Thank you for your reply.

  • I am running the current release candidate:
$ sh system im
The system currently has the following image(s) installed:

   1: 1.2.0-rc7 (default boot) (running image)
   2: 1.2.0-rc6
  • Yes, before the migration from OPNsense to VyOS, this worked as intended. The switch and AP configurations remained untouched.
  • Yes. The switch (a TP-Link TL-SG3210) is tagging the VLANs on egress ports 1 and 8.
  • Yes, I double checked the physical connections. VyOS connects to port 1, the AP connects to port 8.

One little addition:

When a device requests an address, sometimes (not always…), it will receive addresses from two pools. See the Android device below:

$ sh dhcp se le
IP address    Hardware address    Lease expiration     Pool       Client Name
------------  ------------------  -------------------  ---------  ------------------------
192.168.2.10  68:54:fd:4c:98:c8   2018/11/14 14:03:43
192.168.1.10  74:d4:35:10:a5:72   2018/11/14 14:18:43             PC
192.168.1.11  30:05:5c:8c:54:87   2018/11/14 14:36:14  LAN        BRN30055C8C5487
192.168.1.19  40:b8:37:c8:cf:7c   2018/11/14 14:52:42  LAN        android-17efb6ef2a127d5b
192.168.2.12  40:b8:37:c8:cf:7c   2018/11/14 14:52:42  untrusted  android-17efb6ef2a127d5b

The device is now able to reach the outside world from its 192.168.2.12 address. In this case, the wrongly issued lease seems to be only temporary:

$ ping 192.168.1.19
PING 192.168.1.19 (192.168.1.19) 56(84) bytes of data.
From 192.168.1.1 icmp_seq=1 Destination Host Unreachable
From 192.168.1.1 icmp_seq=2 Destination Host Unreachable
From 192.168.1.1 icmp_seq=3 Destination Host Unreachable
^C
--- 192.168.1.19 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3088ms
pipe 4

$ ping 192.168.2.12
PING 192.168.2.12 (192.168.2.12) 56(84) bytes of data.
64 bytes from 192.168.2.12: icmp_seq=1 ttl=64 time=95.2 ms
64 bytes from 192.168.2.12: icmp_seq=2 ttl=64 time=117 ms
64 bytes from 192.168.2.12: icmp_seq=3 ttl=64 time=8.46 ms
^C
--- 192.168.2.12 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 8.462/73.683/117.299/46.985 ms

And here is some current log data showing “wrong network” errors:

Nov 14 13:52:41 mewdemstr1 dhcpd[4059]: DHCPDISCOVER from 40:b8:37:c8:cf:7c (android-17efb6ef2a127d5b) via eth1
Nov 14 13:52:41 mewdemstr1 dhcpd[4059]: DHCPDISCOVER from 40:b8:37:c8:cf:7c via eth1.2
Nov 14 13:52:42 mewdemstr1 dhcpd[4059]: DHCPOFFER on 192.168.1.19 to 40:b8:37:c8:cf:7c (android-17efb6ef2a127d5b) via eth1
Nov 14 13:52:42 mewdemstr1 dhcpd[4059]: DHCPOFFER on 192.168.2.12 to 40:b8:37:c8:cf:7c (android-17efb6ef2a127d5b) via eth1.2
Nov 14 13:52:42 mewdemstr1 dhcpd[4059]: DHCPREQUEST for 192.168.1.19 (192.168.1.1) from 40:b8:37:c8:cf:7c (android-17efb6ef2a127d5b) via eth1.2: wrong network.
Nov 14 13:52:42 mewdemstr1 dhcpd[4059]: DHCPNAK on 192.168.1.19 to 40:b8:37:c8:cf:7c via eth1.2
Nov 14 13:52:42 mewdemstr1 dhcpd[4059]: DHCPREQUEST for 192.168.1.19 (192.168.1.1) from 40:b8:37:c8:cf:7c (android-17efb6ef2a127d5b) via eth1
Nov 14 13:52:42 mewdemstr1 dhcpd[4059]: DHCPACK on 192.168.1.19 to 40:b8:37:c8:cf:7c (android-17efb6ef2a127d5b) via eth1
Nov 14 13:52:42 mewdemstr1 dhcpd[4059]: DHCPDISCOVER from 40:b8:37:c8:cf:7c (android-17efb6ef2a127d5b) via eth1.2
Nov 14 13:52:42 mewdemstr1 dhcpd[4059]: DHCPOFFER on 192.168.2.12 to 40:b8:37:c8:cf:7c (android-17efb6ef2a127d5b) via eth1.2
Nov 14 13:52:42 mewdemstr1 dhcpd[4059]: DHCPDISCOVER from 40:b8:37:c8:cf:7c via eth1
Nov 14 13:52:42 mewdemstr1 dhcpd[4059]: DHCPOFFER on 192.168.1.19 to 40:b8:37:c8:cf:7c (android-17efb6ef2a127d5b) via eth1
Nov 14 13:52:42 mewdemstr1 dhcpd[4059]: DHCPREQUEST for 192.168.2.12 (192.168.2.1) from 40:b8:37:c8:cf:7c (android-17efb6ef2a127d5b) via eth1.2
Nov 14 13:52:42 mewdemstr1 dhcpd[4059]: DHCPACK on 192.168.2.12 to 40:b8:37:c8:cf:7c (android-17efb6ef2a127d5b) via eth1.2
Nov 14 13:52:42 mewdemstr1 dhcpd[4059]: DHCPREQUEST for 192.168.2.12 (192.168.2.1) from 40:b8:37:c8:cf:7c (android-17efb6ef2a127d5b) via eth1: wrong network.
Nov 14 13:52:42 mewdemstr1 dhcpd[4059]: DHCPNAK on 192.168.2.12 to 40:b8:37:c8:cf:7c via eth1
[...]
Nov 14 14:01:23 mewdemstr1 dhcpd[4059]: DHCPREQUEST for 192.168.2.10 (192.168.2.1) from 68:54:fd:4c:98:c8 via eth1.2
Nov 14 14:01:23 mewdemstr1 dhcpd[4059]: DHCPACK on 192.168.2.10 to 68:54:fd:4c:98:c8 (amazon-b3c8977f1) via eth1.2
Nov 14 14:01:23 mewdemstr1 dhcpd[4059]: DHCPREQUEST for 192.168.2.10 (192.168.2.1) from 68:54:fd:4c:98:c8 (amazon-b3c8977f1) via eth1: wrong network.
Nov 14 14:01:23 mewdemstr1 dhcpd[4059]: DHCPNAK on 192.168.2.10 to 68:54:fd:4c:98:c8 via eth1

Weirdly, the “amazon-b3c8977f1” device always gets a correct address, even though it too shows the “wrong network” error.
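The pattern in the log above (the same DHCPDISCOVER arriving via both eth1 and eth1.2) can be spotted automatically. The following is a minimal Python sketch, assuming log lines in the dhcpd format shown above; the function names `interfaces_per_client` and `leaking_clients` are hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   ... dhcpd[4059]: DHCPDISCOVER from 40:b8:37:c8:cf:7c (name) via eth1.2
DISCOVER = re.compile(r"DHCPDISCOVER from ([0-9a-f:]{17}).* via ([\w.]+)")


def interfaces_per_client(lines):
    """Map each client MAC to the sorted interfaces its DISCOVERs arrived on."""
    seen = defaultdict(set)
    for line in lines:
        match = DISCOVER.search(line)
        if match:
            mac, interface = match.groups()
            seen[mac].add(interface)
    return {mac: sorted(ifaces) for mac, ifaces in seen.items()}


def leaking_clients(lines):
    """Keep only clients whose DISCOVERs showed up on more than one interface."""
    per_client = interfaces_per_client(lines)
    return {mac: ifaces for mac, ifaces in per_client.items() if len(ifaces) > 1}


# Three lines taken from the log excerpt above.
sample = [
    "Nov 14 13:52:41 mewdemstr1 dhcpd[4059]: DHCPDISCOVER from 40:b8:37:c8:cf:7c (android-17efb6ef2a127d5b) via eth1",
    "Nov 14 13:52:41 mewdemstr1 dhcpd[4059]: DHCPDISCOVER from 40:b8:37:c8:cf:7c via eth1.2",
    "Nov 14 14:01:23 mewdemstr1 dhcpd[4059]: DHCPREQUEST for 192.168.2.10 (192.168.2.1) from 68:54:fd:4c:98:c8 via eth1.2",
]
print(leaking_clients(sample))
# prints: {'40:b8:37:c8:cf:7c': ['eth1', 'eth1.2']}
```

Note that a client which legitimately moves between SSIDs over time would also be listed, so the timestamps should be compared before concluding that a single broadcast crossed VLAN boundaries.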

One thing that might be worth trying is rebooting into the older image. Alternatively, you can always re-add an older image with “add system image old-rc6/5/etc”.

The issue has persisted since at least RC2 (which is the version I originally migrated to).

Hello, @matzus!
This behavior is really strange. I suggest trying one of the following:

  • analyze incoming traffic on the eth0 interface to see which VLAN the DHCP requests come from;
  • change the port 1 configuration on the switch to trunk mode and use a new VLAN instead of VLAN 1.

Okay, I found a hint here: https://serverfault.com/questions/463391/switch-sending-dhcp-packets-to-wrong-vlan

The issue seems to be that some component (maybe the Linux network stack?) does not like tagged and untagged (in my case native) VLAN traffic on the same NIC. Specifically, broadcasts originating from the native VLAN (e.g. DHCP requests) will be propagated across VLAN boundaries.

My solution was to create another subnet for native VLAN traffic, which I now use for management traffic only. This way, DHCP traffic is kept out of the other VLANs and stays restricted to its assigned subnet.