ARTICLE: Scalable and Secure VxLAN Multisite using NetBird

I’m starting a new series where I build VxLAN as a multi-tenant, multisite solution using WireGuard by way of NetBird. I intend for this to be at least a four-part series, covering:

  • L3VPN
  • L2VPN
  • Multitenancy
  • Microsegmentation

I may come up with additional ideas as I write the articles, but also feel free to make requests/recommendations if there’s something you want to see covered.

Here is part 1 of the series, where I go over the initial configuration and basic single tenant L3VPN.

Great stuff @L0crian! Thank you for sharing your exciting new series! Building a VxLAN multi-tenant, multisite solution using WireGuard through NetBird sounds like a fascinating project. Your plan to cover topics such as L3VPN, L2VPN, Multitenancy, and Microsegmentation over a series of articles is incredibly valuable for those interested in network architecture.

I’ve just read Part 1 of your series and found the initial configuration and single tenant L3VPN overview very insightful. Keep up the great work, and thank you for sharing your knowledge with the community!

Here is part 2 of this series, where I cover adding L2VPN functionality to this solution:

Thank you for your great articles.
I’ve been trying to create this setup with vyos-2025.03.15-0018-rolling-generic-amd64.
However, when I create the VRFs, the NetBird container loses its WAN connection, which breaks connectivity between the different nodes.
I recreated the setup a few times, both in VMs and on bare metal, and everything works fine until I issue the VRF config commands.
Have things changed in VRF since v1.4, or am I missing something else?

There’s some bugginess in later rolling releases for VxLAN that you may be hitting.

Can you provide your current config as well as the output for:
sudo vtysh -c "show run"

Output of sudo vtysh -c "show run"

Building configuration...

Current configuration:
!
frr version 10.2.1
frr defaults traditional
hostname vyos
hostname spoke1
service integrated-vtysh-config
!
ip route 0.0.0.0/0 192.168.122.1 eth0 tag 210 210
!
vrf Main
 vni 1001
exit-vrf
!
router bgp 65000
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 no bgp network import-check
 neighbor <hub-netbird-ip> remote-as 65000
 !
 address-family l2vpn evpn
  neighbor <hub-netbird-ip> activate
  neighbor <hub-netbird-ip> next-hop-self
 exit-address-family
exit
!
router bgp 65000 vrf Main
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 no bgp network import-check
 !
 address-family ipv4 unicast
  redistribute connected
 exit-address-family
 !
 address-family l2vpn evpn
  advertise ipv4 unicast
 exit-address-family
exit
!
end

Output of show configuration

container {
    name nb1 {
        allow-host-networks
        capability net-admin
        capability net-raw
        environment NB_SETUP_KEY {
            value <NETBIRD-KEY-REDACTED>
        }
        image netbirdio/netbird:latest
        volume NB_PATH {
            destination /etc/netbird
            source /config/containers/nb1
        }
    }
}
interfaces {
    bridge br0 {
        enable-vlan
        member {
            interface vxlan0 {
            }
        }
    }
    ethernet eth0 {
        address dhcp
        hw-id xx:xx:xx:xx:xx:87
        offload {
            gro
            gso
            sg
            tso
        }
    }
    ethernet eth10 {
        mtu 1400
    }
    loopback lo {
    }
    vxlan vxlan0 {
        mtu 1350
        parameters {
            external
        }
        port 4789
        source-interface eth10
    }
}
protocols {
    bgp {
        neighbor <netbird-hub-ip> {
            address-family {
                l2vpn-evpn {
                    nexthop-self {
                    }
                }
            }
            remote-as 65000
        }
        system-as 65000
    }
    static {
    }
}
service {
    ntp {
        allow-client {
            address 127.0.0.0/8
            address 169.254.0.0/16
            address 10.0.0.0/8
            address 172.16.0.0/12
            address 192.168.0.0/16
            address ::1/128
            address fe80::/10
            address fc00::/7
        }
        server time1.vyos.net {
        }
        server time2.vyos.net {
        }
        server time3.vyos.net {
        }
    }
    ssh {
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 115200
        }
    }
    host-name spoke1
    login {
        user vyos {
            authentication {
                encrypted-password ****************
                plaintext-password ****************
            }
        }
    }
    name-server 8.8.8.8
    syslog {
        local {
            facility all {
                level info
            }
            facility local7 {
                level debug
            }
        }
    }
}
vrf {
    name Main {
        protocols {
            bgp {
                address-family {
                    ipv4-unicast {
                        redistribute {
                            connected {
                            }
                        }
                    }
                    l2vpn-evpn {
                        advertise {
                            ipv4 {
                                unicast {
                                }
                            }
                        }
                    }
                }
                system-as 65000
            }
        }
        table 1000
        vni 1001
    }
}

But I actually don’t think the issue lies in VxLAN.
Everything works as described in part 1 of the tutorial, up until I commit the VRF config:

set vrf name Main protocols bgp address-family ipv4-unicast redistribute connected
set vrf name Main protocols bgp address-family l2vpn-evpn advertise ipv4 unicast
set vrf name Main protocols bgp system-as '65000'
set vrf name Main table '1000'
set vrf name Main vni '1001'

There’s a bug currently where making a change to a VxLAN interface deletes the FRR config, which is why I wanted to see your vtysh output, but it looks fine.

One thing I’d definitely change is to delete the offloads on the eth0 interface. I had an issue recently in a VxLAN lab where they weren’t playing nice, and they may not be playing nice with NetBird either.
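
For anyone following along, removing the offloads can be sketched as below. This assumes the offload settings sit under eth0 exactly as in the config posted above; adjust the interface name for your own box. The lines starting with # are annotations, not commands.

```
configure
# Remove the whole offload node (gro, gso, sg, tso) from eth0
delete interfaces ethernet eth0 offload
commit
save
exit
```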

Can you ping the internet from VyOS itself? Can you ping between any of the Netbird IPs (hub to either spoke, or spoke to spoke)? Is BGP up in either the ipv4-unicast or l2vpn-evpn address families (you can see all BGP address families at the same time with show bgp vrf all summary)? What’s the output of show ip route vrf all? I’m trying to understand what you mean when you say that Netbird lost its WAN connection. That’ll help tshoot what is going on with your lab.
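
The checks asked for above, collected as operational-mode commands. Here <hub-netbird-ip> is the same placeholder used in the posted config, and 8.8.8.8 is only an example internet target. The lines starting with # are annotations.

```
# Internet reachability from VyOS itself
ping 8.8.8.8 count 3
# Reachability across the NetBird overlay (hub from a spoke, or spoke to spoke)
ping <hub-netbird-ip> count 3
# BGP state for every address family in every VRF
show bgp vrf all summary
# Routing tables for every VRF
show ip route vrf all
```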

Not after I commit the “set vrf Main…” commands. Before that all seems to work just fine.

same as above

show bgp vrf all summary BEFORE vrf config

L2VPN EVPN Summary:
BGP router identifier 192.168.122.162, local AS number 65000 VRF default vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 1, using 24 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
<netbird-hub-ip>  4      65000         8        13        0    0    0 01:54:38       Active        0 FRRouting/10.2.1

Total number of neighbors 1

show bgp vrf all summary AFTER vrf config

L2VPN EVPN Summary:
BGP router identifier 192.168.122.162, local AS number 65000 VRF default vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 1, using 24 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
<netbird-hub-ip> 4      65000         8        13        0    0    0 01:42:32      Connect        0 FRRouting/10.2.1

Total number of neighbors 1
% No BGP neighbors found in VRF Main

show ip route vrf all AFTER vrf config

Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF Main:
K>* 127.0.0.0/8 [0/0] is directly connected, Main, weight 1, 00:15:46

VRF default:
S>* 0.0.0.0/0 [210/0] via 192.168.122.1, eth0, weight 1, 00:36:57
C>* 100.100.0.0/16 is directly connected, eth10, weight 1, 00:34:49
L>* <spoke1-nebird-ip>/32 is directly connected, eth10, weight 1, 00:34:49
C>* 192.168.122.0/24 is directly connected, eth0, weight 1, 00:36:57
K * 192.168.122.0/24 [0/0] is directly connected, eth0, weight 1, 00:36:57
L>* 192.168.122.162/32 is directly connected, eth0, weight 1, 00:36:57

As soon as I commit any VRF config, my WAN routing ‘breaks’ and VyOS can no longer reach the WAN. As a result, the NetBird container cannot reach the NetBird server, so the NetBird hub-spoke connection is lost.

Now, if I move my WAN interface (eth0) to the Main VRF, I’m able to do ping 8.8.8.8 vrf Main. However, when I try from inside the NetBird container using connect container nb1, it fails.
Running ip route inside the container shows no default route, and adding one manually does not help.
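
One possible explanation, offered as an assumption and not a confirmed diagnosis: with allow-host-networks the container shares the host's network namespace, and a process that is not bound to the VRF looks up routes in the default table, not in table 1000. Once eth0 lives in VRF Main, its routes exist only in the VRF table, which would match the empty ip route output. A quick way to compare the two tables from the VyOS shell (lines starting with # are annotations):

```
# Default VRF: what an unbound process, such as the container, would use
show ip route
# VRF Main (table 1000 in the posted config): where eth0's routes live after the move
show ip route vrf Main
# The same comparison with iproute2
ip route show
ip route show vrf Main
```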

Looks like BGP had been down for almost 2 hours even before you added (or re-added) the VRF commands. Seems like there’s something buggy going on with that version. I’ve had a lot of small issues with the rolling releases from this month, you may be seeing similar results.

I’d probably try to do the latest stream release rather than rolling and see how that works for you. That (in theory) should be more stable. Have you tried with the rc3 release from the blog post?

I just retried with 1.4rc3, but I’m running into the same result. I didn’t realise 1.4rc was available without a subscription, which is why I was running 1.5 rolling.
I copied every command from the article, only changing the NetBird 100.100.x.x IPs where applicable.

Something changed in the Netbird container. I went back to 0.25.9 and it works correctly. Let me see if I can see why it’s not working in newer versions.
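
A minimal sketch of pinning the container to that release instead of latest. It assumes the container is named nb1, as in the config posted earlier, and that the image is published under the tag 0.25.9. The tag spelling is an assumption, so check the registry before relying on it. Lines starting with # are annotations.

```
# Operational mode: pull the pinned image first
add container image netbirdio/netbird:0.25.9

configure
# Replace the floating "latest" tag with the pinned version
set container name nb1 image netbirdio/netbird:0.25.9
commit
save
exit
```

Pinning also keeps a later image update from silently reintroducing the behaviour change.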

0.25.9 is working for me as well.
Didn’t expect the issue to be caused by the newer netbird version.
Do you reckon it’s worth a try with NetBird 0.25.9 on VyOS 1.5, or should I just stick with 1.4rc3?

I’d probably stick with 1.4rc3, but totally up to you. Won’t hurt anything to try 1.5 or stream if you want to. I tested 0.25.9 on the latest rolling and it also worked.

I’m not sure if the NetBird team will consider this a bug or not. You can do the same thing with Tailscale as well, with the exception that you need to add a single static route so the BGP sessions will come up.

I should also add that I can’t find what NetBird changed that is causing traffic to fail. You can’t even ping local IPs on VyOS itself.

I’m late to the party, but generally speaking:

  1. When you encounter issues, first try the latest rolling release to rule out missing or changed commands, kernel driver support, etc.

  2. When fiddling with VRFs, this is a common thing compared to other vendors. When you configure a VRF for an interface, all IP address configuration will be wiped, so the pro tip is to either NOT do this remotely, OR use a known-good config, upload it to the device, and reboot to make that config active. That way you get a proper VRF setup without losing the box halfway.
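
If rebooting onto a staged config is not practical, one hedged alternative (not something suggested in this thread) is commit-confirm, which reverts to the previous configuration unless the change is confirmed in time. Depending on the VyOS version, the revert may be done through a reboot. A sketch, assuming eth0 is the interface being moved and Main is the VRF from the posts above; lines starting with # are annotations:

```
configure
# Move the interface into the VRF
set interfaces ethernet eth0 vrf Main
# Apply with a 5-minute safety window instead of a plain commit
commit-confirm 5
# Only after verifying the box is still reachable:
confirm
save
exit
```

If the session drops and confirm is never entered, the box should fall back to the last committed config on its own.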