Monitoring VyOS using visualisation?

anowak · March 5, 2022, 4:27am

Hi All,

Wondering if anyone has had any experience setting up visualisation monitoring for VyOS?

I have been busy searching the web trying to find an answer and I see people mentioning the following setups.

Telegraf > InfluxDB > Grafana
VyOS > Graylog > ElasticSearch > Grafana
ElasticSearch > Logstash > Kibana

Unfortunately none of these methods come with how-to guides, and there seems to be a massive learning curve for Grafana, Graylog or Kibana. I did get Telegraf > InfluxDB > Grafana operational, but now I am just staring at nothing in Grafana, as I need to understand the Influx query language and then work out how to create a graph/table.

I also stood up a LibreNMS system, but it does not show what traffic is passing through, external connection attempts, etc. It mainly covers CPU temperature, port up/down state and traffic volume.

Adding syslog support into LibreNMS also does not help, as it only produces what is already possible through VyOS CLI. I’m looking at something similar to this project (GitHub - pfelk/pfelk: pfSense/OPNsense + Elastic Stack) where I can gather some more detail at a single glance.

Happy for suggestions, ideas or solutions with a guide, would be awesome!

Kind Regards

n.fort · March 5, 2022, 11:23am

I still had no time to prepare a monitoring tool for my VyOS.
But in the past I was able to monitor networking equipment using Prometheus+Grafana. The most valuable information was collected using the snmp exporter (in Prometheus, think of exporters as little modules). There are tons of exporters that help you get useful information.

As with every monitoring tool, there’s a learning curve to setting it up properly. But that is something you’ll have to deal with, regardless of which monitoring tool you choose.

anowak · March 6, 2022, 4:04am

Hi n.fort

Thanks for the reply. I understand that Telegraf is already in build 1.3 and active in 1.4 … not sure about Prometheus, but will take a look. I was hoping from this thread that someone had managed to battle through the hard yards and could point me in the right direction as to how that information is pulled out of the agents and presented. I can only hope.

Kind Regards

madmatt · March 6, 2022, 5:39pm

Vyos has all the framework to support the standard prometheus node_exporter package. Any tutorial for setting up prometheus+grafana+node_exporter will apply to the standard Linux parts of Vyos (Host/CPU/Network/disk stats)

The Prometheus node_exporter package is not in the standard 1.4 builds. You need to either add it to your custom build, if you have set up your dev environment, or apt-get install the relevant packages every time you upgrade:

vyos@vyos:~$ sudo apt list --installed | match node

prometheus-node-exporter-collectors/now 0+git20210115.7d89f19-1 all [installed,local]
prometheus-node-exporter/now 1.1.2+ds-2.1 amd64 [installed,local]

These two packages, when installed, will configure the standard prometheus node collector service

vyos@vyos:~$ sudo systemctl status prometheus-node-exporter
● prometheus-node-exporter.service - Prometheus exporter for machine metrics
     Loaded: loaded (/lib/systemd/system/prometheus-node-exporter.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2022-02-24 08:59:41 UTC; 1 weeks 3 days ago
       Docs: https://github.com/prometheus/node_exporter
   Main PID: 866 (prometheus-node)
      Tasks: 9 (limit: 4689)
     Memory: 31.5M
        CPU: 41.769s
     CGroup: /system.slice/prometheus-node-exporter.service
             └─866 /usr/bin/prometheus-node-exporter

vyos@vyos:~$ sudo netstat -anvp | grep LISTEN | grep 9100
netstat: no support for `AF INET (sctp)' on this system.
netstat: no support for `AF INET (sctp)' on this system.
tcp6       0      0 :::9100                 :::*                    LISTEN      866/prometheus-node

NB: the service listens on all interfaces so make sure you have your firewall filters set up appropriately to allow querying the metrics only from trusted networks/hosts

Once the local node_exporter service is up and running it will expose default linux system metrics on vyos_ip:9100/metrics
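For anyone who wants to look at that endpoint before wiring up Prometheus, here is a minimal Python sketch that reads the text format served on /metrics. It is only an illustration, not part of node_exporter: the names parse_metrics and fetch_metrics are made up for this example, and the parser handles only plain `name{labels} value` lines.

```python
import re
import urllib.request

# Matches lines such as: metric_name{label="value",other="x"} 12.5
_SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        try:
            value = float(raw_value)
        except ValueError:
            continue
        samples.append((name, labels, value))
    return samples


def fetch_metrics(url):
    """Download and parse a metrics page (hypothetical helper, needs network access)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Usage would be something like fetch_metrics("http://vyos_ip:9100/metrics"), with vyos_ip replaced by the router address and run from a host the firewall allows.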

and you can use the default Grafana-prometheus full node dashboard to check the data exported to prometheus: (Node Exporter Full | Grafana Labs)

or roll out your own.
There is also a prometheus exporter for FRR, that could be integrated in the mix, but I couldn’t find a ready made grafana dashboard to go with it. This would also need to be manually installed/added to the 1.4 rolling build

fernando · March 7, 2022, 2:09pm

Just to add a comment: another option, which you mentioned and which is native in VyOS, is telegraf. So telegraf + influx + grafana can be another good option.

https://phabricator.vyos.net/T3872

anowak · March 8, 2022, 1:25am

Hi madmatt,

Really appreciate your post - looking through it and I’ll see how I go. Does Prometheus also show the traffic going in and out of each port?

Kind Regards

anowak · March 8, 2022, 1:48am

Hi Fernando,

Yes, I looked at telegraf - I am using version 1.3 so it’s not part of the CLI. Manual configuration is necessary.

I installed InfluxDB using a docker container.

version: "3"

services:
  influxdb:
    image: influxdb:latest
    hostname: influxdb
    container_name: influxdb
    ports:
      - "8086:8086/tcp"
    volumes:
      - "./var-lib-influxdb2:/var/lib/influxdb2"
      - "./etc-influxdb2/config.yml:/etc/influxdb2/config.yml"
    restart: always

This brings up the container, and I run through the wizard to create a username, organization and bucket. It then provides me with a token.

I then modified /etc/telegraf.conf to get it pushing over to InfluxDB.
I only changed the following section:

[[outputs.influxdb_v2]]
  urls = ["http://dockersrv02.lan:8086"]
  token = "abcd"  # needs to be created in InfluxDB first
  organization = "localnet"
  bucket = "vyos-bucket"

I needed to punch a hole in the firewall to allow traffic from Local to Inside on port 8086.
The telegraf service also needs to be started.
I ran telegraf --debug to test whether it is grabbing information - all good here.

At this point I hit a blank, as I have no idea how to browse the data that has come into InfluxDB (it looks like InfluxDB has its own graphing tool? Why is this not used? I still need to read more.)

Again I use the docker version of Grafana.

version: "3"

services:
  grafana:
    image: grafana/grafana-oss:latest
    container_name: grafana
    ports:
      - "3000:3000/tcp"
    restart: always

I add the InfluxDB using flux query language which supports the token, org and bucket information.

That is about the point where I get to… How can I see what is being passed through? maybe someone knows? Also if Prometheus has some preconfigured graphs it may be helpful?

Kind Regards

madmatt · March 8, 2022, 7:01am

InfluxDB is a time series database. It used to have an admin GUI, but that has been deprecated. If you want to check whether VyOS is sending data to Influx correctly, you need to either:

add the visualization container for Influx, called Chronograf
open a shell to the influxdb container and run queries with the influx CLI
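A third way to check, sketched below in Python, is to send a Flux query to the InfluxDB 2.x HTTP API (POST /api/v2/query). This is only a sketch under assumptions: it reuses the URL, org, bucket and token placeholders from the telegraf.conf posted earlier, and it assumes Telegraf's `net` input is enabled, which writes a `net` measurement with `bytes_recv`/`bytes_sent` fields. The function names are made up for this example.

```python
import urllib.parse
import urllib.request


def build_flux_query(bucket, measurement="net", field="bytes_recv", window="-15m"):
    """Flux query: per-second rate of one counter field over a recent time window."""
    return (
        f'from(bucket: "{bucket}")\n'
        f'  |> range(start: {window})\n'
        f'  |> filter(fn: (r) => r._measurement == "{measurement}" and r._field == "{field}")\n'
        f'  |> derivative(unit: 1s, nonNegative: true)\n'
    )


def build_query_request(base_url, org, token, flux):
    """Prepare (but do not send) the HTTP request for the InfluxDB 2.x query API."""
    url = f"{base_url.rstrip('/')}/api/v2/query?org={urllib.parse.quote(org)}"
    headers = {
        "Authorization": f"Token {token}",
        "Content-Type": "application/vnd.flux",
        "Accept": "application/csv",
    }
    return urllib.request.Request(
        url, data=flux.encode("utf-8"), headers=headers, method="POST"
    )


# Placeholder values taken from the config earlier in the thread.
request = build_query_request(
    "http://dockersrv02.lan:8086", "localnet", "abcd", build_flux_query("vyos-bucket")
)
# To actually run it (needs network access to the InfluxDB host):
# print(urllib.request.urlopen(request, timeout=10).read().decode("utf-8"))
```

If the CSV that comes back is empty, no matching points have reached the bucket in the last 15 minutes. The same Flux text can be pasted into a Grafana panel that uses the Flux data source.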

Neither Prometheus nor InfluxDB has ‘pre-configured graphs’. Grafana, however, has ready-made dashboards tailored to specific data collection agents and specific backend databases.

In my previous post I linked to a dashboard that has pre-configured widgets when you use prometheus’s node_exporter as a collection agent and prometheus as a backend.
If you are using telegraf, one dashboard may be this one: InfluxDB Linux Server Telegraf | Grafana Labs
but it really depends on the model used for collecting and storing data in the database …

anowak · March 8, 2022, 9:15pm

Hi madmatt,

Thanks for the reply… I will see if I can set it up and see how I go!

Kind Regards

n.fort · March 10, 2022, 1:18pm

Prometheus, using exporters, is in charge of collecting data. For example, you can get in/out traffic for every interface using SNMP.
Then, in Grafana, you “import” the data collected by Prometheus and create the graphs you want.
In a simplified scenario, almost every advanced combination of monitoring tools works like that. The basic tasks these tools perform:

Tool that collect data (and stores it) → Prometheus
Tool that reads the data and generates graphs, dashboards, etc → Grafana
Alarm tools: based on defined values, alerts can be triggered → AlertManager
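On the in/out traffic point: interface traffic is normally exported as ever-increasing byte counters (for example node_exporter's node_network_receive_bytes_total, or ifHCInOctets over SNMP), and the graph shows the rate of change between two samples. Below is a rough Python sketch of that arithmetic. The function name is made up, and the counter-reset handling is a simplification of what a real time series database does.

```python
def bits_per_second(prev_bytes, curr_bytes, interval_seconds):
    """Convert two readings of a byte counter into an average bit rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_bytes - prev_bytes
    if delta < 0:
        # The counter went backwards, e.g. after a reboot. Assume it restarted
        # from zero, so the current value is the growth since the reset.
        delta = curr_bytes
    return delta * 8 / interval_seconds


# Two samples of the same interface counter taken 10 seconds apart.
rate = bits_per_second(prev_bytes=1_000_000, curr_bytes=2_250_000, interval_seconds=10)
print(f"{rate / 1_000_000:.1f} Mbit/s")
```

In Grafana with a Prometheus backend, the equivalent is usually a query such as rate(node_network_receive_bytes_total[5m]) * 8, so you rarely need to compute this by hand.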

anowak · March 11, 2022, 5:23am

Thanks n.fort. Guess it’s a learning curve I’ll need to get to sooner than later.

My concern is not having enough visibility with VyOS monitoring or looking at the messages log. Never seem to get any hits from the wan to the firewall even with logging enabled which troubles me? With previous opnsense firewall I would see constant hits to the firewall. Believe I had a post about monitoring a while back and never quite resolved it.

Kind Regards

RyVolodya · March 11, 2022, 7:09am

Hello @anowak
You can use Zabbix. Install Zabbix agent on VyOS and you will monitor the processor, memory, disks and network.

Viacheslav · March 12, 2022, 3:56pm

It will be in the next stable release, or you can build your own image, as this code is already in the equuleus branch.

luratech · April 1, 2022, 11:31pm

My preferred “least-effort” option is check_mk with check_mk_agent.sh over ssh. That gives you availability/alerts and basic metrics that at least allow for rough capacity planning. “Production-ready” monitoring (that’s NOT MISSING that one dataset that you need so desperately) is only possible with mostly custom code/configs imho (even with pricey commercial solutions that claim to cover everything).

My “no-regrets dreamteam” around vyos:

check_mk for availability monitoring (typically at least 2 instances for internal/external view)
telegraf/influxdb/grafana for performance-monitoring
graylog for logs (and optionally netflow) with pure rsyslog for log-shipping
security onion for deep dives into security/traffic (I’m usually collecting/storing a couple of days/weeks of raw traffic from all relevant interfaces)
optional: rabbitmq for optimized message routing, buffering during maintenance of targets systems or to workaround security restrictions…

these are the main dashboards for daily operation:

Grafana provides the general Bandwidth/Firewall-Status (enhanced version of this dashboard with pandemic-induced focus on VPN-Server metrics)

Graylog gives a useful event-based overview on the network/firewall status:

to make that all happen the following components were needed:

rsyslog-configs with custom templates for agentless shipping of messages in gelf-format
some python for collecting vpn-server metrics via telegraf
grafana dashboard
a couple of lines of code in vyos-postconfig-bootup-script that puts everything into place on reboots
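To give an idea of what a message in GELF format looks like on the wire, here is a small Python sketch that builds a GELF 1.1 payload. It is not the rsyslog template described above, just an illustration of the JSON that Graylog expects. The function name and the example field names (iface, action) are made up.

```python
import json
import time


def build_gelf_message(host, short_message, level=6, timestamp=None, **extra):
    """Build a GELF 1.1 JSON payload; extra fields get the required "_" prefix."""
    message = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time() if timestamp is None else timestamp,
        "level": level,  # syslog severity, 6 = informational
    }
    for key, value in extra.items():
        if key == "id":
            # "_id" is reserved by the GELF specification, so skip it.
            continue
        message[f"_{key}"] = value
    return json.dumps(message)


payload = build_gelf_message(
    "vyos", "firewall drop on eth0", level=4, iface="eth0", action="drop"
)
print(payload)
```

The extra fields (here _iface and _action) are what make the event-based dashboards in Graylog possible, because they can be searched and aggregated without parsing the message text again.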

luratech · April 2, 2022, 12:34am

I am using version 1.3 so it’s not part of the CLI.

I’m using telegraf without any issues since vyos 1.2.x - debian package of your choice + config-files + scripts/custom checks (e.g. vpn server metrics) + a couple of commands in vyos-postconfig-bootup.script does the trick

panks21 · May 29, 2022, 7:06am

While searching for a monitoring solution, I found this blog article.

This approach uses node_exporter container running on vyos and exporting metrics

I feel this is an excellent way for monitoring. There is also a very good dashboard available.

Viacheslav · May 29, 2022, 12:18pm

You can also have a Prometheus server pull the metrics:

set service monitoring telegraf prometheus-client

Output:

vyos@r14# curl localhost:9273/metrics |  egrep -v "#" | head -n 15
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
conntrack_ip_conntrack_count{host="r14"} 12
conntrack_ip_conntrack_max{host="r14"} 262144
cpu_usage_guest{cpu="cpu-total",host="r14"} 0
cpu_usage_guest{cpu="cpu0",host="r14"} 0
cpu_usage_guest{cpu="cpu1",host="r14"} 0
cpu_usage_guest_nice{cpu="cpu-total",host="r14"} 0
cpu_usage_guest_nice{cpu="cpu0",host="r14"} 0
cpu_usage_guest_nice{cpu="cpu1",host="r14"} 0
cpu_usage_idle{cpu="cpu-total",host="r14"} 99.67886962107082
cpu_usage_idle{cpu="cpu0",host="r14"} 99.7430956968473
cpu_usage_idle{cpu="cpu1",host="r14"} 99.55070603335469
cpu_usage_iowait{cpu="cpu-total",host="r14"} 0
cpu_usage_iowait{cpu="cpu0",host="r14"} 0
cpu_usage_iowait{cpu="cpu1",host="r14"} 0.06418485237483097

Splunk and azure-data-explorer outputs are also available:

set service monitoring telegraf splunk xxx
set service monitoring telegraf azure-data-explorer xxx

And of course the native influxdb exporter.

panks21 · May 29, 2022, 2:03pm

Fantastic. Many options available in the hand

anowak · May 29, 2022, 10:31pm

Yeah, there are heaps of options, but as I found, getting the data out of VyOS is the simple part; having an easy one-stop shop for monitoring is another matter. You need a data collector, then a graphing tool, then the time to tweak it all. Like @luratech mentioned: check_mk, telegraf/influxdb/grafana, graylog and security onion … that is a lot of tools just to get the information needed.

I spent some time looking into it and had to put it aside for now, as I just couldn’t get what I wanted. I need to get more familiar with graphing tools.

Thanks All!

luratech · June 7, 2022, 1:37pm

Finding an “easy one stop shop for monitoring” is as tricky as finding “an easy one stop option for transportation”. There are just too many different aspects with completely different requirements - service availability, performance, security… all require different data - just as you need completely different vehicles for transportation of people, computers or oil.