Background

I have always used Elasticsearch for visualization, but for recording numerical data such as server metrics, I decided to try influxDB as well.

Overview of influxDB + Grafana

influxDB

influxDB is classified as a time series database. As the name suggests, a time series DB is a DB with a mechanism for storing data that changes over time. According to the [English Wikipedia article](https://en.wikipedia.org/wiki/Time_series_database), RRDTool is also a time series DB.

Compared with RRDTool, influxDB has the following features.

Data can be registered as JSON through a REST API.
It comes with an authentication mechanism by default.
Queries use an SQL-like grammar.
It has a Web UI where you can check query results interactively.

Grafana

Grafana is a dashboard tool forked from Kibana. It is not specific to influxDB and can also work with Graphite and CloudWatch. It also has an authentication mechanism by default.

Procedure

Start influxDB and Grafana

Because it is the quickest way, I launched the official Docker images with Docker Compose.

**2016-12-22 Addendum:** Added a data volume to the Grafana container settings, because the sqlite directory must be saved in order to persist Grafana's dashboards, user settings, and so on. Dashboard settings can also be stored in an external DB such as MySQL, but to keep things simple I decided to use the sqlite database provided by default this time. (Reference: Grafana - Configuration)

`docker-compose.yml`


version: "2"

services:
  influxdb:
    image: influxdb
    ports:
      - "8083:8083"
      - "8086:8086"
    volumes:
      - /tmp/influxdb:/var/lib/influxdb

  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    volumes:
      - /tmp/grafana:/var/lib/grafana

Of the two ports opened for influxDB, 8083 is the Web UI and 8086 is the REST API endpoint. The data is saved in /var/lib/influxdb, so that directory is mounted from the host OS as a data volume.

Grafana only needs a port opened for its web UI. With Kibana, the connection to Elasticsearch was set in kibana.yml, but Grafana configures its DB connection through the web UI, so there is no need to think about configuration files at this point.

Connection between influxDB and Grafana

Access http://(Docker Host IP):3000 from a web browser, select "Data Sources" from the icon at the upper left, and from "Add Data Source" connect to http://(Docker Host IP):8086.
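As a rough way to confirm that the REST API port is reachable after `docker-compose up`, here is a minimal sketch using only the standard library. It assumes the influxDB 1.x `/ping` endpoint, which is expected to answer with 204 No Content. The function names are my own, and the host you pass in is whatever your Docker host IP happens to be.

```python
import urllib.error
import urllib.request


def ping_url(host, port=8086):
    """Build the URL of the influxDB /ping endpoint for the given host."""
    return "http://{}:{}/ping".format(host, port)


def is_influxdb_up(host, port=8086, timeout=2.0):
    """Return True if the REST API port answers the /ping request."""
    try:
        with urllib.request.urlopen(ping_url(host, port), timeout=timeout) as res:
            # Assumption: influxDB 1.x answers /ping with 204 No Content
            return res.status == 204
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status
        return False
```

Calling `is_influxdb_up("(Docker Host IP)")` with your actual host address should return `True` once the container is running.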

Grafana Add Data Source

The default user ID and password are root:root for influxDB and admin:admin for Grafana.

Data input to influxDB

I used Python this time, but before getting to that, I will cover the structure of influxDB.

Structure of influxDB

Database: The largest data unit, similar to a database in an RDBMS. A database must be created before data is written.
Measurement: Corresponds to a table in an RDBMS.
Retention Policy: A data retention policy consisting of DURATION, which defines how long data is kept, and REPLICATION, which defines how many copies of the data are kept in the influxDB cluster. It is basically used by attaching it to a Database.

I have only just started trying it out, so I have not used it properly yet, but having a Retention Policy is nice. It can prevent data from accumulating without limit until capacity overflows, and it can help ensure availability.
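To make DURATION and REPLICATION a little more concrete, here is a minimal sketch that only builds the InfluxQL statement as a string. The policy name `thirty_days` and the 30-day duration are hypothetical values chosen for illustration. I have not run this against a real cluster.

```python
def create_retention_policy_query(name, database, duration, replication=1, default=False):
    """Build an InfluxQL CREATE RETENTION POLICY statement."""
    query = 'CREATE RETENTION POLICY "{}" ON "{}" DURATION {} REPLICATION {}'.format(
        name, database, duration, replication
    )
    if default:
        # DEFAULT makes this the policy used when a write names none
        query += " DEFAULT"
    return query


# Hypothetical policy: keep data in the "sample" database for 30 days
query = create_retention_policy_query("thirty_days", "sample", "30d", default=True)
print(query)
```

The resulting string could then be pasted into the Web UI or sent through whatever client you use.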

influxDB data structure

With Elasticsearch, you can insert more or less any JSON and start using it, so a lazy approach such as throwing in the JSON returned by an API as-is works fine. With influxDB, however, the structure of the JSON is fixed.

measurement: Specifies which measurement to write to.
fields: The data itself, as key-value pairs. Keys must be strings, but values can be float, integer, or boolean as well as string. You can include multiple fields.
tags: Optional metadata. For example, when writing server metrics, the server name would be a tag. You can include more than one, but note that only strings are supported.
timestamp: The timestamp of the data, which is the key to time-series data. If not specified, the time at which the write was performed is applied automatically. influxDB only supports *UTC*.

fields and tags may look interchangeable, but the big difference is that fields are not indexed while tags are. For example, the GROUP BY clause in a query works on tags. Metadata used for searching and the like should go into tags, and data that changes over time should go into fields. You rarely search on the changing values themselves, such as "when CPU usage was 80%", so it makes sense that fields are not indexed.
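As a minimal sketch of the rules above, the following helper assembles one point and rejects values that break the type constraints. The name `build_point` is my own. Placing the timestamp under a `"time"` key as a UTC string is my assumption about what the influxdb Python client expects, so check that against the client you actually use.

```python
from datetime import datetime, timedelta, timezone

FIELD_TYPES = (float, int, bool, str)


def build_point(measurement, fields, tags=None, timestamp=None):
    """Assemble one point dict after checking the type rules for fields and tags."""
    for key, value in fields.items():
        if not isinstance(key, str) or not isinstance(value, FIELD_TYPES):
            raise TypeError("unsupported field: {!r}={!r}".format(key, value))
    tags = tags or {}
    for key, value in tags.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("tags must be strings: {!r}={!r}".format(key, value))

    point = {"measurement": measurement, "fields": dict(fields), "tags": dict(tags)}
    if timestamp is not None:
        if timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")
        # Normalize to UTC; the "time" key is an assumption about the client
        point["time"] = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return point


# 09:00 in Japan Standard Time (UTC+9) is stored as 00:00 UTC
jst = timezone(timedelta(hours=9))
point = build_point(
    "metrics",
    {"cpu": 50.0, "mem": 20.0},
    {"category": "fuga", "machine": "web02"},
    datetime(2016, 12, 22, 9, 0, tzinfo=jst),
)
print(point["time"])
```

Passing a non-string tag such as `{"port": 80}` raises `TypeError`, which surfaces the "tags are strings only" rule before anything is sent to the server.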
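To illustrate grouping by a tag, here is a small sketch that builds an InfluxQL query averaging one field per time bucket and per tag value. The helper name and the default window and interval are hypothetical. The measurement, field, and tag names come from the sample data in this article.

```python
def mean_by_tag_query(measurement, field, tag, window="1h", interval="1m"):
    """Build an InfluxQL query that averages a field per time bucket and tag value."""
    return (
        'SELECT MEAN("{field}") FROM "{measurement}" '
        "WHERE time > now() - {window} "
        'GROUP BY time({interval}), "{tag}"'
    ).format(
        field=field, measurement=measurement, window=window, interval=interval, tag=tag
    )


# Average CPU usage per minute over the last hour, one series per machine
query = mean_by_tag_query("metrics", "cpu", "machine")
print(query)
```

The value being aggregated (`cpu`) is a field, while the key used for grouping (`machine`) is a tag, which matches the division of roles described above.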

Data input processing

I will briefly show the source.

from influxdb import InfluxDBClient

client = InfluxDBClient('127.0.0.1', 8086, 'root', 'root', 'sample')

# Check whether the database exists, and create it if it does not
dbs = client.get_list_database()
sample_db = {'name' : 'sample'}
if sample_db not in dbs:
    client.create_database('sample')

# Build the JSON data to write
import_array = [
{
  "fields" : {
    "cpu" : 50.0,
    "mem" : 20.0,
  },
  "tags" : {
    "category" : "fuga",
    "machine" : "web02"
  },
  "measurement" : "metrics"
}
]

# Write the data
client.write_points(import_array)

It's simple. The password is written directly in the `InfluxDBClient` call here, but hard-coding should be avoided in real operation. As for the JSON data, as I wrote earlier, I included multiple `fields` and `tags`. With this, you can draw a graph for each machine.
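One way to avoid the hard-coded password is to read the connection settings from environment variables. This is only a sketch: the variable names such as `INFLUXDB_USER` are hypothetical, and the dictionary keys are chosen on the assumption that they match the keyword arguments of `InfluxDBClient`.

```python
import os


def load_influx_settings(env=None):
    """Read connection settings from environment variables (names are hypothetical)."""
    env = os.environ if env is None else env
    return {
        "host": env.get("INFLUXDB_HOST", "127.0.0.1"),
        "port": int(env.get("INFLUXDB_PORT", "8086")),
        "username": env["INFLUXDB_USER"],      # required, raises KeyError if unset
        "password": env["INFLUXDB_PASSWORD"],  # required, raises KeyError if unset
        "database": env.get("INFLUXDB_DATABASE", "sample"),
    }
```

The result is meant to be passed on as `InfluxDBClient(**load_influx_settings())`, so the credentials never appear in the source file.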

Visualization with Grafana

Since this is a GUI operation, I will omit the details, but as in Kibana, you visualize data by arranging parts called Panels on a Dashboard. Besides Graph, there are Panels such as Singlestat, which displays only the latest value in large digits, and Text, where you can write prose in Markdown, so there is a lot of freedom.

Also, the data shown in a Graph and similar panels is specified with an SQL statement. That makes it easy to understand, but you do need to get a grip on how SQL statements work in influxDB. That said, you can build the SQL by clicking and combining candidates in the GUI, so even a somewhat vague understanding gets you quite far. It's very easy.

Usability, compared to ES + Kibana

Coming in with the feel of Elasticsearch + Kibana in mind, I was thrown off at first because the JSON keys that must be set on input are predefined and things work differently (or rather, ES, which lets you search almost anything as long as it is JSON, is godlike). In terms of usability, though, it is easy to visualize data and easy to read. At first I wondered whether it competes with Elasticsearch, but now I understand that they are tools with distinctly different areas of coverage.

influxDB: Literally for time-series data, specialized in visualizing how values change over time. You can put strings in fields, but they are not indexed and are not suitable for search.
Elasticsearch: It is easy to forget because visualization and graph drawing in Kibana are so convenient, but this is a full-text search engine. Use it when you want to search through a large amount of string data.

For example, when operating MySQL, Elasticsearch is suitable if you want to collect slow query logs over a long period and analyze them. Monitoring traffic volume is possible with Elasticsearch too, but as a division of roles, that seems to be a job for influxDB.

Get started with influxDB + Grafana