Monitoring VyOS using visualisation?

luratech · April 1, 2022, 11:31pm

My preferred “least-effort” option is check_mk with check_mk_agent.sh over SSH. That gives you availability/alerts and basic metrics that at least allow for rough capacity planning. “Production-ready” monitoring (the kind that is NOT MISSING the one dataset you so desperately need) is imho only possible with mostly custom code/configs, even with pricey commercial solutions that claim to cover everything.

My “no-regrets dream team” around VyOS:

check_mk for availability monitoring (typically at least 2 instances for internal/external view)
telegraf/influxdb/grafana for performance-monitoring
graylog for logs (and optionally netflow) with pure rsyslog for log-shipping
security onion for deep dives into security/traffic (I’m usually collecting/storing a couple of days/weeks of raw traffic from all relevant interfaces)
optional: rabbitmq for optimized message routing, buffering during maintenance of target systems, or to work around security restrictions…
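To give an idea of what the log-shipping leg has to deliver to Graylog, here is a minimal Python sketch of a GELF 1.1 payload. It is only an illustration of the message shape, not the rsyslog template used in this setup; the function name `to_gelf`, the host name and the extra field are made up for the example.

```python
import json
import time


def to_gelf(host, message, severity=6, timestamp=None, **extra):
    """Build a GELF 1.1 payload as a JSON string (illustrative helper).

    Syslog severities (0 = emergency ... 7 = debug) map directly onto the
    GELF "level" field. Extra keyword arguments become additional fields,
    which GELF requires to be prefixed with an underscore.
    """
    if not 0 <= severity <= 7:
        raise ValueError("severity must be a syslog level between 0 and 7")
    payload = {
        "version": "1.1",
        "host": host,
        "short_message": message,
        "timestamp": time.time() if timestamp is None else timestamp,
        "level": severity,
    }
    for key, value in extra.items():
        if key == "id":
            # "_id" is reserved by the GELF spec, so it is skipped here.
            continue
        payload["_" + key] = value
    return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical host name and message, purely for demonstration.
    print(to_gelf("vyos-fw01", "link down on eth1", severity=3, program="kernel"))
```

Whatever produces the message (rsyslog template, script, message queue consumer), the required fields are `version`, `host` and `short_message`; everything else is optional or a custom underscore-prefixed field.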

These are the main dashboards for daily operation:

Grafana provides the general Bandwidth/Firewall-Status (enhanced version of this dashboard with pandemic-induced focus on VPN-Server metrics)

Graylog gives a useful event-based overview of the network/firewall status.

To make all of that happen, the following components were needed:

rsyslog-configs with custom templates for agentless shipping of messages in gelf-format
some python for collecting vpn-server metrics via telegraf
grafana dashboard
a couple of lines of code in the vyos-postconfig-bootup-script that put everything into place on reboot
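For the “some python for telegraf” part, a rough sketch of the idea: telegraf can run a script through its `exec` input and parse the output as InfluxDB line protocol (`data_format = "influx"`). The code below is not the actual collector from this setup. The measurement name `vpn_server`, the field names and the session data are assumptions for illustration, and how you obtain the session list depends on your VPN server.

```python
def _escape(value, chars):
    """Backslash-escape the given separator characters for line protocol."""
    text = str(value)
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_line(measurement, tags, fields, timestamp_ns=None):
    """Render one InfluxDB line protocol record."""
    if not fields:
        raise ValueError("line protocol needs at least one field")
    tag_part = "".join(
        ",{}={}".format(_escape(k, ",= "), _escape(v, ",= "))
        for k, v in sorted(tags.items())
    )
    rendered = []
    for key, value in sorted(fields.items()):
        if isinstance(value, bool):  # must be checked before int
            text = "true" if value else "false"
        elif isinstance(value, int):
            text = "{}i".format(value)  # trailing "i" marks an integer field
        elif isinstance(value, float):
            text = repr(value)
        else:
            quoted = str(value).replace("\\", "\\\\").replace('"', '\\"')
            text = '"{}"'.format(quoted)
        rendered.append("{}={}".format(_escape(key, ",= "), text))
    line = "{}{} {}".format(_escape(measurement, ", "), tag_part, ",".join(rendered))
    if timestamp_ns is not None:
        line += " {}".format(timestamp_ns)
    return line


def vpn_metrics(server, sessions):
    """Summarise per-client sessions into one record (hypothetical schema)."""
    return format_line(
        "vpn_server",
        {"server": server},
        {
            "clients": len(sessions),
            "bytes_rx": sum(s["bytes_rx"] for s in sessions),
            "bytes_tx": sum(s["bytes_tx"] for s in sessions),
        },
    )


if __name__ == "__main__":
    # Made-up sample data; a real collector would read this from the VPN server.
    sample = [{"bytes_rx": 100, "bytes_tx": 250}, {"bytes_rx": 50, "bytes_tx": 25}]
    print(vpn_metrics("vyos-vpn01", sample))
```

With the sample data this prints `vpn_server,server=vyos-vpn01 bytes_rx=150i,bytes_tx=275i,clients=2i`. The timestamp is left out on purpose so that telegraf assigns the collection time itself.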