What time-series monitoring solutions do you use??

Aaron
Aaron Member, Administrator, Moderator, Employee Posts: 665 admin

Hey team,

Metrics! Historical metrics!! So useful for post-mortem analysis of incidents.


(screenshot of a Grafana dashboard)

I've worked with quite a few different technologies over the years, but was wondering what other people's experience and preferences are for this. Let me know, I'm really interested!!

Here's some of mine:

  • RRD: long time ago, heavily used in the network monitoring space
  • InfluxDB: awesome time-series DB, supports completely dynamic intervals (seconds, milliseconds, micros, nanos), and also push-based data insertion
  • KDB (KX): traditionally used in finance, there is now a Solace official plugin for this
  • Prometheus: I haven't played with this much, but it is SUPER popular. I have some issues with it, especially that it is poll-based (i.e. it tightly couples data gathering and storage) and that it can store only one metric at a time
  • Splunk: some people use this for time-series, but is it the best choice??

Others?? What are your opinions?

Comments

  • nram
    nram Member, Employee Posts: 80 Solace Employee
    edited April 2021 #2

    Good discussion topic @Aaron ... I have also heard Graphite mentioned in this context, especially by customers who already have a tighter pairing with Grafana.

    To your observation on Prometheus being poll-based: yes, that is in fact the natural mode for scraping metrics. However, it's possible to push metrics from ephemeral and short-lived tasks to Prometheus using the Pushgateway. Putting aside the management complexity and any performance overhead, this provides a practical option for a push-based solution with Prometheus.
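
    As a rough illustration of the push option, here is a minimal sketch using only the Python standard library. It assumes a Pushgateway reachable at http://localhost:9091 (an assumption, adjust for your setup); the job name and metric name are hypothetical, and the helper functions are made up for this example rather than taken from any client library.

```python
import urllib.request
from urllib.parse import quote

# Assumption: a Pushgateway listening locally on port 9091.
PUSHGATEWAY_URL = "http://localhost:9091"


def build_push_request(job, metrics, gateway=PUSHGATEWAY_URL):
    """Build (without sending) a PUT request carrying metrics in text format.

    `metrics` maps hypothetical metric names to numeric values.
    """
    # One "name value" sample per line; the body ends with a newline.
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    # The job name is part of the URL path, so percent-encode it.
    url = f"{gateway}/metrics/job/{quote(job, safe='')}"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


def push(job, metrics, gateway=PUSHGATEWAY_URL):
    """Send the metrics; Prometheus later scrapes them from the gateway."""
    request = build_push_request(job, metrics, gateway)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Inspect the request a short-lived batch job would send (nothing is sent here).
req = build_push_request("nightly_backup", {"backup_duration_seconds": 42.5})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"), end="")
```

    Note that Prometheus still polls in this setup: it scrapes the gateway, and the gateway keeps serving the last pushed values until they are replaced or deleted. PUT replaces every metric in the job's group, while POST replaces only metrics with the same names.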

  • marc
    marc Member, Administrator, Moderator, Employee Posts: 972 admin
    edited April 2021 #3

    Hey @Aaron, I've used the Elastic stack on a few projects in the past. It's been a few years since I've used it, but at the time it was a great getting-started experience: easy to ingest the events using Logstash/Beats (push) and easy to visualize aggregations in Kibana. The shortcoming was really going beyond visualizing aggregations in Kibana... I don't really recall specifics now, but I remember it wasn't able to create some of the visuals that we desired.

    Hopefully I'll get a chance to check out Elastic Cloud and their latest enhancements sometime soon.

    @arih has been doing some work with Elastic lately too.

  • Aaron
    Aaron Member, Administrator, Moderator, Employee Posts: 665 admin

    Hey, I thought I wrote back to this!! Weird.

    Yes, @nram : Graphite, of course. What a classic. I've heard it more tightly integrates with Grafana, but I have yet to play with it. Pairing Grafana with InfluxDB seems to satisfy most of my needs. I wonder what "extra special stuff" you could get / expose. Someone deeply knowledgeable on all 3 would have to answer that.

    @marc : I've heard of people using Elastic for time-series metric data. That's not its primary purpose, but it seems it can do it. I found this article that looks at specifically that use case: https://medium.com/kudos-engineering/choosing-the-elastic-stack-as-a-time-series-database-9fac202c53ba. As you said, Kibana seems to show/visualize aggregate data, primarily logging data such as "how many GETs per minute per host" or "how many WARNs per minute". I don't think it's possible (aka I don't know how) to make Kibana display time-series metrics.

    Some of the big things I really like about InfluxDB:

    • Natively supports "pushing" into the database (like a normal database), accepts data whenever it arrives. Leaves the "polling" of the source data to you. Or to Telegraf or whatever.
    • No hard / constrained / required intervals defined for time-series. I think with most of the other ones I listed, you have to specify the interval you want the data stored at (e.g. every 10 seconds). Influx, in the "raw" data (e.g. before the "continuous query" downsampling), allows you to push data in at ANY interval you want, any rate, or with any gap between successive inserts. Super useful for non-regular-interval data.
    • Supports Strings and Booleans!! Not just numeric. How great is that? I can push all that data in there (e.g. operational state, redundancy state, status messages, etc.) and it will track when things change.
    • It's schema-less! Just start pushing data in, and Influx will figure it out. No need to predefine measurements (aka tables) or keys or anything. Note that you can add an i to integer numeric data to give Influx a hint that it's not a float.
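
    To make the points above concrete, here is a minimal sketch of building one point in InfluxDB's line protocol (measurement, tags, fields, timestamp) by hand in Python. The measurement, tag, and field names are hypothetical, and the helper functions are made up for this example; a real client library would handle more edge cases (e.g. NaN or very large numbers).

```python
import time


def _escape(text, chars):
    """Backslash-escape each character of `chars` that appears in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value):
    """Render one field value in line-protocol syntax."""
    # bool must be tested before int: in Python, True is also an int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # trailing "i" marks an integer rather than a float
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted, with backslashes and quotes escaped.
    return '"' + _escape(str(value), '\\"') + '"'


def to_line(measurement, tags, fields, ts_ns=None):
    """Build one line: measurement,tag=val field=val,... timestamp."""
    tag_part = "".join(
        f",{_escape(k, ', =')}={_escape(str(v), ', =')}"
        for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(k, ', =')}={format_field(v)}" for k, v in fields.items()
    )
    # Any timestamp is accepted, so points need not arrive at a fixed interval.
    ts = time.time_ns() if ts_ns is None else ts_ns
    return f"{_escape(measurement, ', ')}{tag_part} {field_part} {ts}"


print(to_line(
    "vpn_stats",
    {"host": "broker1"},
    {"connections": 12, "cpu": 0.5, "up": True, "state": "Active"},
    ts_ns=1618000000000000000,
))
```

    Nothing has to be declared up front: the first line written with a new measurement or field name is what creates it, and the field's type is taken from that first value.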

    One disadvantage (and I don't know if other time-series DBs can do this?) is that it can't store nested objects... you have to "flatten" any nested entries.
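
    A minimal sketch of that flattening step in Python, assuming the nested data arrives as plain dicts (e.g. parsed JSON). The `flatten` helper, the underscore separator, and the sample keys are all hypothetical choices for illustration; lists are left untouched here and would need their own handling.

```python
def flatten(nested, sep="_", prefix=""):
    """Collapse nested dicts into one level, joining key names with `sep`."""
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse, carrying the joined key path down as the new prefix.
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


status = {
    "uptime": 100,
    "redundancy": {"mode": "active", "mate": {"up": True}},
}
print(flatten(status))
```

    Each resulting key can then be written as its own field, at the cost of losing the original grouping unless the naming convention preserves it.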