Problem with dhcp-relay

I just ran into an interesting problem with DHCP relay on a multihomed VyOS router (R2). DHCPDISCOVER requests from the client are relayed to a remote DHCP server, and the server responds with a DHCPOFFER. R2 receives the offer but does not forward it to the client, as tcpdump on R2 shows (and the client never actually gets an IP address). When I disable DHCP relay on R2 and add a single-homed VyOS router, R3, to the client subnet purely as a DHCP relay, everything works as it should: the client receives the offer, requests the offered address, and receives it (DHCPACK).

One obvious difference between the two is in the unicast addressing used:
R2:
DHCPDISCOVER: s:10.10.20.2 d:10.10.10.1
DHCPOFFER: s:10.10.10.1 d:10.20.10.1

R3:
DHCPDISCOVER: s:10.20.10.2 d:10.10.10.1
DHCPOFFER: s:10.10.10.1 d:10.20.10.2

R2 sends the DHCPDISCOVER out of its upstream interface (R2 eth0), but the DHCP server responds with a DHCPOFFER addressed to the IP listed in the DHCPDISCOVER’s giaddr field, which is the address of R2 eth1. This is correct behaviour as per the DHCP RFC. Apparently, the DHCP relay agent on R2 is unable to match the offer to the discover (tcpdump shows the transaction IDs match). When I use R3 as relay, it has only one interface, so the source address of the relayed discover and the giaddr are the same, and (presumably) the relay agent is able to match the offer and forward it to the client.

Did I miss something when I configured the DHCP-relay on R2, or is this a bug?

Some details:

[DHCPd]-10.10.10.0/24-[R1]-10.10.20.0/24-[R2]-10.20.10.0/24-[client]

DHCPd is a physical Linux box running ISC DHCP v4.3.2. v4.2.4 was tested as well.
R1 and R2 are dual-homed virtual VyOS routers
R3 is a single-homed virtual VyOS router connected to 10.20.10.0/24
Client is a virtual Linux client (Kali)
All virtual systems are VMware Workstation 11, running on a PC in subnet 10.10.10.0/24. R1 has a bridged NIC in this subnet.
All VyOS routers are v1.1.3.
Other than the DHCP issue, the networks work as intended.

tcpdump output from R2, with R2 as relay:

19:56:11.635394 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:0c:29:c2:f5:fb (oui Unknown), length 334
19:56:11.635628 IP 10.10.20.2.bootps > 10.10.10.1.bootps: BOOTP/DHCP, Request from 00:0c:29:c2:f5:fb (oui Unknown), length 334
19:56:11.637151 IP 10.10.10.1.bootps > 10.20.10.1.bootps: BOOTP/DHCP, Reply, length 339

R2:

interfaces {
    ethernet eth0 {
        address 10.10.20.2/24
        description Outside
        hw-id 00:0c:29:79:f8:ae
    }
    ethernet eth1 {
        address 10.20.10.1/24
        description Inside
        hw-id 00:0c:29:79:f8:b8
    }
    loopback lo {
    }
}
protocols {
    static {
        route 0.0.0.0/0 {
            next-hop 10.10.20.1 {
            }
        }
    }
}
service {
    dhcp-relay {
        interface eth1
        server 10.10.10.1
    }
}

R3:

interfaces {
    ethernet eth0 {
        address 10.20.10.2/24
        hw-id 00:0c:29:c0:b9:56
    }
    loopback lo {
    }
}
protocols {
    static {
        route 0.0.0.0/0 {
            next-hop 10.20.10.1 {
            }
        }
    }
}
service {
    dhcp-relay {
        interface eth0
        server 10.10.10.1
    }
}

R1 is a straightforward router with static routes:

interfaces {
    ethernet eth2 {
        address 10.10.20.1/24
        description Outside
        hw-id 00:0c:29:d7:77:53
    }
    ethernet eth3 {
        address 10.10.10.3/24
        description Thuisnet
        hw-id 00:0c:29:d7:77:49
    }
    loopback lo {
    }
}
protocols {
    static {
        route 0.0.0.0/0 {
            next-hop 10.10.10.1 {
            }
        }
        route 10.20.10.0/24 {
            next-hop 10.10.20.2 {
            }
        }
    }
}

Small update: after digging through the matter, including Vyatta and EdgeOS configurations, it seems DHCP relay needs to be configured on both interfaces of R2. This is counterintuitive and potentially broken behaviour, as it means both interfaces listen for DHCP broadcasts, even if they shouldn’t. It would definitely be broken if the DHCP server were on the subnet 10.10.20.0/24 in the example above. There should not be a relay on a subnet that holds a DHCP server.

Managed to do some experiments. Configuring DHCP relay on the DHCP-server-facing interface of R2 as well solves the problem. Unfortunately, this also means R2 now functions as a DHCP relay for the 10.10.20.0/24 subnet. Not a big issue in this setup, but if the DHCP server had been in that subnet, this would have been an undesirable situation. There should be no DHCP relays pointing to a server in the same subnet, as this results in unnecessary duplicate updates.
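
For reference, the workaround described above would presumably look like this on R2. This is only a sketch in the same configuration syntax as the listings earlier in the thread, assuming eth0 is the server-facing interface and eth1 the client-facing one, as in R2's config; it is not copied from a running router.

```
service {
    dhcp-relay {
        interface eth0
        interface eth1
        server 10.10.10.1
    }
}
```

The only change from the original R2 config is the extra "interface eth0" line, which makes the relay also listen on the interface where the server's replies arrive.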

I work at a university and we are teaching dhcp-relay using VyOS routers and noticed this same behavior. Found the same “solution” recently. Hoping for a fix sometime soon. Not a good way to teach students about how dhcp-relay should work. The VyOS guide (http://vyos.net/wiki/User_Guide#DHCP_Relay) seems to imply that dhcp-relay should only be needed on one interface.

Thanks,
Kizan

I hit this today in some lab work I am doing. I also found https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=648401 which matches the symptoms.

I’ve updated the wiki to reflect the requirement to specify the interface that the DHCP server’s replies arrive on, and linked to the Debian bug.