Yesterday I upgraded one of my two route reflector servers from rolling 1.2.xxx to the latest 1.4-rolling-202102130218. Since then it seems to show packet loss roughly every 30-40 ICMP packets. In the logs I see the following events:
Feb 15 14:09:00 ef-1-routeserver2 ntpd[2015]: routing socket reports: No buffer space available
Feb 15 14:09:00 ef-1-routeserver2 ntpd[2015]: message repeated 2 times: [ routing socket reports: No buffer space available]
Feb 15 14:09:00 ef-1-routeserver2 ntpd[2015]: routing socket reports: No buffer space available
Feb 15 14:09:01 ef-1-routeserver2 ntpd[2015]: routing socket reports: No buffer space available
Feb 15 14:09:01 ef-1-routeserver2 ntpd[2015]: message repeated 16 times: [ routing socket reports: No buffer space available]
Feb 15 14:09:01 ef-1-routeserver2 ntpd[2015]: routing socket reports: No buffer space available
Feb 15 14:09:02 ef-1-routeserver2 ntpd[2015]: message repeated 3 times: [ routing socket reports: No buffer space available]
Feb 15 14:09:02 ef-1-routeserver2 ntpd[2015]: routing socket reports: No buffer space available
Feb 15 14:09:02 ef-1-routeserver2 ntpd[2015]: message repeated 8 times: [ routing socket reports: No buffer space available]
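To see how bursty these messages are, and to line them up against the moments when ICMP packets are lost, the log lines above can be tallied per second. The following is only a rough sketch: the function name `count_enobufs` is made up for illustration, and it assumes that a "message repeated N times" line stands for N occurrences, which depends on how the syslog daemon summarizes duplicates.

```python
import re
from collections import Counter

# Matches syslog lines written by ntpd, e.g.
#   Feb 15 14:09:01 host ntpd[2015]: routing socket reports: No buffer space available
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+ntpd\[\d+\]:\s+(?P<msg>.*)$"
)
# Matches the duplicate-suppression summary, e.g. "message repeated 16 times"
REPEAT_RE = re.compile(r"message repeated (\d+) times")
NEEDLE = "routing socket reports: No buffer space available"


def count_enobufs(lines):
    """Return a Counter mapping each timestamp (1 s resolution) to the
    number of 'No buffer space available' reports seen in that second.

    Assumption: 'message repeated N times' is counted as N occurrences.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or NEEDLE not in m.group("msg"):
            continue
        rep = REPEAT_RE.search(m.group("msg"))
        counts[m.group("ts")] += int(rep.group(1)) if rep else 1
    return counts


# Example usage (the log path is an assumption, adjust as needed):
# with open("/var/log/messages") as fh:
#     for ts, n in sorted(count_enobufs(fh).items()):
#         print(ts, n)
```

If the per-second counts spike at the same times the pings are dropped, that would suggest the two symptoms are related rather than coincidental.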
[email protected]:~$ show configuration commands | strip-private
set interfaces bonding bond0 description 'Trunk_to_Z9100'
set interfaces bonding bond0 hash-policy 'layer2'
set interfaces bonding bond0 member interface 'eth2'
set interfaces bonding bond0 member interface 'eth3'
set interfaces bonding bond0 mode '802.3ad'
set interfaces bonding bond0 vif 702
set interfaces bonding bond0 vif 704
set interfaces bonding bond0 vif 707 address 'xxx.xxx.79.185/30'
set interfaces bonding bond0 vif 707 address 'xxxx:xxxx:1::a/126'
set interfaces bonding bond0 vif 708 address 'xxx.xxx.79.189/30'
set interfaces bonding bond0 vif 708 address 'xxxx:xxxx:1::e/126'
set interfaces bonding bond0 vif 999 address 'xxx.xxx.0.4/29'
set interfaces ethernet eth0 address 'xxx.xxx.0.42/24'
set interfaces ethernet eth0 description 'Management'
set interfaces ethernet eth0 duplex 'auto'
set interfaces ethernet eth0 hw-id 'XX:XX:XX:XX:XX:e0'
set interfaces ethernet eth0 speed 'auto'
set interfaces ethernet eth1 duplex 'auto'
set interfaces ethernet eth1 hw-id 'XX:XX:XX:XX:XX:e1'
set interfaces ethernet eth1 speed 'auto'
set interfaces ethernet eth2 duplex 'auto'
set interfaces ethernet eth2 hw-id 'XX:XX:XX:XX:XX:50'
set interfaces ethernet eth2 speed 'auto'
set interfaces ethernet eth3 duplex 'auto'
set interfaces ethernet eth3 hw-id 'XX:XX:XX:XX:XX:52'
set interfaces ethernet eth3 speed 'auto'
set interfaces loopback lo address 'xxx.xxx.100.4/32'
set interfaces loopback lo address 'xxxx:xxxx::4/128'
set protocols bgp XXXXXX neighbor xxx.xxx.100.5 address-family ipv4-unicast soft-reconfiguration inbound
set protocols bgp XXXXXX neighbor xxx.xxx.100.5 remote-as '50919'
set protocols bgp XXXXXX neighbor xxx.xxx.100.5 update-source 'xxx.xxx.100.4'
set protocols bgp XXXXXX neighbor xxx.xxx.100.6 address-family ipv4-unicast soft-reconfiguration inbound
set protocols bgp XXXXXX neighbor xxx.xxx.100.6 remote-as '50919'
set protocols bgp XXXXXX neighbor xxx.xxx.100.6 update-source 'xxx.xxx.100.4'
set protocols bgp XXXXXX neighbor xxxx:xxxx::1 address-family ipv6-unicast
set protocols bgp XXXXXX neighbor xxxx:xxxx::1 remote-as '50919'
set protocols bgp XXXXXX neighbor xxxx:xxxx::1 update-source 'xxxx:xxxx::4'
set protocols bgp XXXXXX neighbor xxxx:xxxx::2 address-family ipv6-unicast
set protocols bgp XXXXXX neighbor xxxx:xxxx::2 remote-as '50919'
set protocols bgp XXXXXX neighbor xxxx:xxxx::2 update-source 'xxxx:xxxx::4'
set protocols ospf area 0 network 'xxx.xxx.0.0/29'
set protocols ospf area 0 network 'xxx.xxx.100.4/32'
set protocols ospf area 0 network 'xxx.xxx.79.188/30'
set protocols ospf area 0 network 'xxx.xxx.79.184/30'
set protocols ospf parameters abr-type 'cisco'
set protocols ospf parameters router-id 'xxx.xxx.100.4'
set protocols ospf passive-interface 'default'
set protocols ospf passive-interface-exclude 'bond0.999'
set protocols ospf passive-interface-exclude 'bond0.708'
set protocols ospf passive-interface-exclude 'bond0.707'
set protocols ospfv3 area xxx.xxx.0.0 interface bond0.707
set protocols ospfv3 area xxx.xxx.0.0 interface bond0.708
set protocols ospfv3 area xxx.xxx.0.0 range xxxx:xxxx:1::8/126
set protocols ospfv3 area xxx.xxx.0.0 range xxxx:xxxx:1::c/126
set protocols ospfv3 parameters router-id 'xxx.xxx.100.4'
set protocols ospfv3 redistribute connected
set protocols static route xxx.xxx.2.0/24 next-hop xxx.xxx.0.120
set protocols static route xxx.xxx.40.0/24 next-hop xxx.xxx.0.120
set service snmp community interworks authorization 'ro'
set service snmp community interworks network 'xxx.xxx.2.0/24'
set service ssh port '2222'
set system config-management commit-revisions '20'
set system console device ttyS0 speed '9600'
set system host-name xxxxxx
set system login user xxxxxx authentication encrypted-password xxxxxx
set system login user xxxxxx authentication plaintext-password xxxxxx
set system login user xxxxxx authentication encrypted-password xxxxxx
set system login user xxxxxx authentication plaintext-password xxxxxx
set system login user xxxxxx authentication encrypted-password xxxxxx
set system login user xxxxxx authentication plaintext-password xxxxxx
set system name-server 'xxx.xxx.8.8'
set system name-server 'xxx.xxx.4.4'
set system ntp server xxxxx.tld
set system ntp server xxxxx.tld
set system ntp server xxxxx.tld
set system syslog global facility all level 'notice'
set system syslog global facility protocols level 'debug'
set system syslog host xxx.xxx.2.130 facility all level 'warning'
set system time-zone 'Europe/Athens'
Could you run sudo top, press 1 to show the per-CPU load, and post the output?
Also, it would be useful to get the NIC information:
show interfaces ethernet eth0 physical
show interfaces ethernet eth1 physical
show interfaces ethernet eth2 physical
show interfaces ethernet eth3 physical
Also check the sudo dmesg output to confirm that this is not a conntrack table overflow issue.
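As a rough aid for that dmesg check, the sketch below filters kernel log text for the "table full, dropping packet" message that netfilter connection tracking prints when its table overflows. The helper name `find_conntrack_overflow` is hypothetical, and the pattern is deliberately loose because the exact prefix differs between kernel versions.

```python
import re

# Kernel message printed when the connection-tracking table is full.
# Some kernels repeat the "nf_conntrack:" prefix, so allow anything
# between the prefix and the message text.
CONNTRACK_FULL_RE = re.compile(r"nf_conntrack: .*table full, dropping packet")


def find_conntrack_overflow(dmesg_text):
    """Return the dmesg lines that report a full conntrack table."""
    return [
        line for line in dmesg_text.splitlines()
        if CONNTRACK_FULL_RE.search(line)
    ]


# Example usage (needs permission to read the kernel log, e.g. via sudo):
# import subprocess
# out = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
# hits = find_conntrack_overflow(out)
# print("conntrack overflow seen" if hits else "no conntrack overflow in dmesg")
```

An empty result only means that the message is absent from the current kernel ring buffer; older entries may already have been rotated out.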
I guess the main issue is related to the i40e firmware and the i40e kernel module, but on the other hand I see high CPU load from FRR components such as zebra, mpls, etc.
Had this exact problem on one of our core routers after an update. The NIC is actually flapping; it drops the BGP and OSPF routing tables and makes a mess of things, so we had to isolate the router. It may even run for 24 hours, but once it happens it's game over. Eventually this popped up. The interface logging was not showing anything, so I presumed it was routing overflows, but that was not the case.
Still having issues: no errors, but throughput topped out at 300 Mb on a 10 Gb link and it started dropping traffic. I can't find anything in the logs. The remote side says they were seeing flapping when it was pushed. Nothing in the logs on our side.