Is it safe to reconfigure telegraf?

monotux · December 28, 2020, 12:32pm

Hi,

So I’ve noticed that there’s a Telegraf instance running on my recently updated VyOS (1.3) machine.

As I’m currently using telegraf/influxdb/grafana, I’d love to use Telegraf with VyOS as well.

Is it ‘safe’? Or will my configuration file changes be overwritten on next update? What is the purpose of this telegraf instance?

I can probably just write another role for ansible to copy my configuration to my host if it’s overwritten, but I’m curious why it’s installed and how it’s intended to be used.

dmbaturin · December 28, 2020, 12:38pm

Telegraf is, so far, a purely experimental addition. It shouldn’t be running by default even—thanks for the find!
It was added by request from a specific user who wanted to test it. So far there’s no automatic config management for it, so it’s indeed safe to configure by hand. The config will survive reboots, though not image upgrades.

We are planning to make a CLI for it, but we aren’t active Telegraf users, so we’d like to hear from its fans what options they want to see there. If you have CLI design ideas, please share!

monotux · December 28, 2020, 12:44pm

Ah, cool!

I’m running 1.3-rolling-202012271303, but I haven’t tried to narrow down which build Telegraf was enabled in. I did notice that the ISO files increased in size from one day to the next very recently.

I’m just using Telegraf because I’ve been too lazy to understand enough about SNMP (so I push information from my current router/firewall to my influxdb).

As the default config is massive (2k+ lines) I guess designing a CLI might be tricky.
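
That said, a push setup like the one described above typically needs only a handful of those lines. A minimal sketch, assuming an InfluxDB 1.x server; the URL and database name are placeholders for illustration, not anything VyOS ships:

```toml
# Hypothetical minimal Telegraf config that pushes to InfluxDB 1.x.
# The URL and database name below are placeholders; adjust to your setup.
[agent]
  interval = "10s"        # how often inputs are collected
  flush_interval = "10s"  # how often metrics are written to outputs

[[outputs.influxdb]]
  urls = ["http://influxdb.example.net:8086"]  # placeholder server URL
  database = "telegraf"                        # placeholder database name

# Inputs with no options set fall back to their defaults.
[[inputs.cpu]]
[[inputs.mem]]
```

A CLI could plausibly start from a small subset like this (agent intervals, one output, a few inputs) rather than mirroring the whole default file.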

maznu · December 30, 2020, 7:08am

I also stumbled across telegraf running in 1.3-rolling-202012271303 and I must say, I’m rather pleased to find it.

The main thing I would find useful to make it usable is the ability to set the port it listens on (currently telegraf isn’t listening on anything). For example, if I were able to set just the [[outputs.prometheus_client]] section (or create /etc/telegraf.d/prometheus_client.conf with a barebones setup a bit like this):

[[outputs.prometheus_client]]
  listen = ":9273"
  metric_version = 2
  # ip_range = []
  path = "/metrics"
  expiration_interval = "60s"
  # collectors_exclude = ["gocollector", "process"]
  # string_as_label = true
  export_timestamp = true

…then I wouldn’t need to SNMP-poll my VyOS BGP edge routers, and could have much better telemetry about the devices.

I’ve got a bunch of other config templates for telegraf which we deploy to VM hosts and VPSs via SaltStack — the common configs we add to /etc/telegraf.d are:

# Gather metrics about network interfaces
[[inputs.net]]
  ## By default, telegraf gathers stats from any up interface (excluding loopback)
  ## Setting interfaces will tell it to gather these explicit interfaces,
  ## regardless of status. When specifying an interface, glob-style
  ## patterns are also supported.
  ##
  # interfaces = ["eth*", "enp0s[0-1]", "lo"]
  ##
  ## On linux systems telegraf also collects protocol stats.
  ## Setting ignore_protocol_stats to true will skip reporting of protocol metrics.
  ##
  # ignore_protocol_stats = false
  ##

[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## If true, collect raw CPU time metrics.
  collect_cpu_time = true
  ## If true, compute and report the sum of all non-idle CPU states.
  report_active = false

[[inputs.linux_sysctl_fs]]
  # no configuration, but useful for monitoring numbers of routes, conntrack entries, etc

# Read metrics about system load & uptime
[[inputs.system]]
  # no configuration

[[inputs.disk]]
  ## By default stats will be gathered for all mount points.
  ## Setting mount_points will restrict the stats to only the specified mount points.
  # mount_points = ["/"]

  ## Ignore mount points by filesystem type.
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]

# Get kernel statistics from /proc/stat
[[inputs.kernel]]
  # no configuration

[[inputs.temp]]
  # no configuration

# Read metrics about memory usage
[[inputs.mem]]
  # no configuration