I have always used Elasticsearch for visualization, but for recording numerical data such as server metrics, I decided to try influxDB as well.
influxDB belongs to a class of software called a time series database. As the name suggests, a time series DB is a DB designed to store data that changes over time. [The English Wikipedia article](https://en.wikipedia.org/wiki/Time_series_database) says that RRDTool is also a time series DB.
Compared with RRDTool, influxDB has the following features.
Grafana is a dashboard tool forked from Kibana. It is not specific to influxDB and can also be connected to Graphite and CloudWatch. It also has an authentication mechanism by default.
Since it is the quickest way, I launched the official Docker images with Docker Compose.
**2016-12-22 Addendum:** Added a data volume to the Grafana container settings, because the sqlite directory must be saved in order to persist Grafana's dashboards, user settings, and so on. It is possible to store these settings in an external DB such as MySQL, but for simplicity I decided to use the sqlite that comes by default this time. (Reference: Grafana - Configuration)
docker-compose.yml
```yaml
version: "2"
services:
  influxdb:
    image: influxdb
    ports:
      - "8083:8083"
      - "8086:8086"
    volumes:
      - /tmp/influxdb:/var/lib/influxdb
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    volumes:
      - /tmp/grafana:/var/lib/grafana
```
Of the two ports opened for influxDB, `8083` is the Web UI and `8086` is the REST API endpoint. Also, since the data is saved in `/var/lib/influxdb`, that directory is synchronized with the host OS as a data volume.
Grafana only needs a port opened for its web UI. With Kibana, the connection to Elasticsearch was set in `kibana.yml`, but Grafana configures its data source connections from the web UI, so there is no need to think about configuration files at this point.
Access `http://(Docker Host IP):3000` from a web browser, select "Data Sources" from the icon in the upper left, and under "Add Data Source" connect to `http://(Docker Host IP):8086`.
The default credentials (user ID and password) are root:root for influxDB and admin:admin for Grafana.
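As a rough sketch of the addresses involved, the URLs can be assembled as below. This assumes influxDB 1.x, whose HTTP API exposes `/ping` and `/query`; the `DOCKER_HOST_IP` value and the helper function names are hypothetical placeholders of mine, not part of either tool. Nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: replace with the IP address of your Docker host.
DOCKER_HOST_IP = "192.168.99.100"


def grafana_url(host):
    """URL of the Grafana web UI (port 3000 in docker-compose.yml)."""
    return "http://{}:3000".format(host)


def influxdb_url(host, path="", **params):
    """URL on the influxDB REST API (port 8086 in docker-compose.yml)."""
    url = "http://{}:8086{}".format(host, path)
    if params:
        # Sort the parameters so the resulting URL is deterministic.
        url += "?" + urlencode(sorted(params.items()))
    return url


if __name__ == "__main__":
    print(grafana_url(DOCKER_HOST_IP))
    # /ping is a lightweight health check in the influxDB 1.x HTTP API.
    print(influxdb_url(DOCKER_HOST_IP, "/ping"))
    # /query takes an InfluxQL statement in q; u and p carry the credentials.
    print(influxdb_url(DOCKER_HOST_IP, "/query", q="SHOW DATABASES", u="root", p="root"))
```

The first URL is what you open in the browser, and the base of the second is what you enter as the data source URL in Grafana.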
I used Python to write data this time, but before that, let me explain how influxDB structures its data.
- `Database`: The largest data unit, similar to a database in an RDBMS. A `Database` needs to be created before data is written.
- `Measurement`: The equivalent of a table in an RDBMS.
- `Retention Policy`: A data retention policy consisting of `DURATION`, which defines how long data is retained, and `REPLICATION`, which defines how many copies of the data are kept in the influxDB cluster. It is basically used attached to a `Database`.

I have only just tried influxDB, so I have not used this properly yet, but having a `Retention Policy` is nice. It can prevent the situation where data accumulates without limit and overflows the available capacity, and it can help ensure availability.
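To make `DURATION` and `REPLICATION` concrete, here is a minimal sketch that builds an InfluxQL `CREATE RETENTION POLICY` statement as a string, assuming influxDB 1.x syntax. The function name and the policy name `one_week` are hypothetical; I have not run this policy in production.

```python
def create_retention_policy_query(name, database, duration, replication, default=False):
    """Build an InfluxQL CREATE RETENTION POLICY statement as a string.

    duration is an InfluxQL duration literal such as "7d" or "4w";
    replication is the number of copies kept in the cluster.
    """
    query = 'CREATE RETENTION POLICY "{}" ON "{}" DURATION {} REPLICATION {}'.format(
        name, database, duration, replication
    )
    if default:
        # DEFAULT makes this the policy used when a write names no policy.
        query += " DEFAULT"
    return query


if __name__ == "__main__":
    # Keep data in the "sample" database for one week, with a single copy.
    print(create_retention_policy_query("one_week", "sample", "7d", 1, default=True))
```

The resulting statement can be pasted into the influxDB Web UI or sent to the REST API.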
With Elasticsearch, you can insert more or less any JSON and start using it right away, so lazy approaches such as throwing in the JSON returned by an API as-is were possible. With influxDB, however, the data structure of the JSON is fixed.
- `measurement`: Specifies which `measurement` to write to.
- `fields`: The data itself, specified as key-value pairs. Keys must be strings, but values can be float, integer, or boolean as well as string. Multiple `fields` can be included.
- `tags`: Optional data. For example, when recording server metrics, the server name would go in `tags`. You can include more than one, but note that only string values are supported.
- `timestamp`: The timestamp of the data, which is the key to time-series data. If omitted, the time at which the write was performed is applied automatically. influxDB supports *UTC only*.

It may look as if `fields` and `tags` can be used interchangeably, but the big difference is that `fields` are not indexed while `tags` are. For example, you will probably use a `GROUP BY` clause when querying, and that is where `tags` come in. Metadata used for searching should go in `tags`, and the data that changes over time should be kept in `fields`. You rarely query on the changing values themselves, for example "when CPU usage is 80%", so it makes sense that `fields` are not indexed.
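To see how the four parts fit together, the sketch below renders a point dict as influxDB line protocol, the text format accepted by the write endpoint of influxDB 1.x. This is only an illustration of the structure, not what the Python client does internally; the function name is hypothetical, and I assume names and values contain no spaces, commas, equals signs, or quotes, so no escaping is implemented.

```python
def to_line_protocol(point):
    """Render one point dict as a line of influxDB line protocol.

    Assumption: no special characters that would need escaping.
    The optional timestamp is omitted, so the server assigns the time.
    """
    # Tags are indexed metadata and are always stored as strings.
    tags = "".join(
        ",{}={}".format(key, value)
        for key, value in sorted(point.get("tags", {}).items())
    )

    def format_value(value):
        # bool must be checked before int because bool is a subclass of int.
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return "{}i".format(value)  # integers carry an "i" suffix
        if isinstance(value, float):
            return repr(value)
        return '"{}"'.format(value)  # strings are double-quoted

    # Fields hold the measured values and are not indexed.
    fields = ",".join(
        "{}={}".format(key, format_value(value))
        for key, value in sorted(point["fields"].items())
    )
    return "{}{} {}".format(point["measurement"], tags, fields)


if __name__ == "__main__":
    point = {
        "measurement": "metrics",
        "tags": {"category": "fuga", "machine": "web02"},
        "fields": {"cpu": 50.0, "mem": 20.0},
    }
    print(to_line_protocol(point))
```

The tags end up attached to the measurement name, while the fields form a separate section, which mirrors the indexed versus non-indexed split described above.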
Here is the source, briefly.
```python
from influxdb import InfluxDBClient

client = InfluxDBClient('127.0.0.1', 8086, 'root', 'root', 'sample')

# Check whether the database exists and create it if it does not
dbs = client.get_list_database()
sample_db = {'name': 'sample'}
if sample_db not in dbs:
    client.create_database('sample')

# Build the JSON data to import
import_array = [
    {
        "fields": {
            "cpu": 50.0,
            "mem": 20.0,
        },
        "tags": {
            "category": "fuga",
            "machine": "web02"
        },
        "measurement": "metrics"
    }
]

# Write the data
client.write_points(import_array)
```
It's simple. The password is written directly in the `InfluxDBClient` call, but hard-coding should be avoided in real operation. As for the JSON data, as I wrote earlier, I included multiple `fields` and `tags`. This makes it possible to draw a graph for each machine.
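One way to avoid the hard-coded password is to read the connection settings from environment variables. The following is a minimal sketch under that assumption; the `INFLUXDB_*` variable names and the function name are hypothetical choices of mine, while the dict keys match the keyword arguments of `InfluxDBClient` (`host`, `port`, `username`, `password`, `database`).

```python
import os


def influxdb_settings(environ=os.environ):
    """Collect InfluxDBClient keyword arguments from environment variables.

    The INFLUXDB_* names are hypothetical. The password has no default on
    purpose, so a missing value fails loudly instead of falling back to root.
    """
    if "INFLUXDB_PASSWORD" not in environ:
        raise RuntimeError("INFLUXDB_PASSWORD is not set")
    return {
        "host": environ.get("INFLUXDB_HOST", "127.0.0.1"),
        "port": int(environ.get("INFLUXDB_PORT", "8086")),
        "username": environ.get("INFLUXDB_USER", "root"),
        "password": environ["INFLUXDB_PASSWORD"],
        "database": environ.get("INFLUXDB_DATABASE", "sample"),
    }


if __name__ == "__main__":
    # A plain dict stands in for the real environment in this demo.
    demo = influxdb_settings({"INFLUXDB_PASSWORD": "change-me"})
    print(sorted(demo))
    # With the influxdb package installed, the client would then be created as:
    #   client = InfluxDBClient(**influxdb_settings())
```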
Since this part is done in the GUI, I will omit the details, but as in Kibana, you visualize data by arranging parts called Panels on a Dashboard. Besides Graph, there are Panels such as Single stat, which displays only the latest value in large type, and Text, which lets you write prose in Markdown, so there is a lot of freedom.
Also, the data shown in a Graph and other Panels is specified with an SQL statement. That makes it easy to understand, but it also means you need a grasp of how SQL statements work in influxDB. However, since you can build the SQL by clicking and combining candidates in the GUI, even a somewhat vague understanding gets you quite far. It's very easy.
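For reference, a query of the kind used for a per-machine graph might look like the one built below. This is a sketch in plain InfluxQL for influxDB 1.x, using the `metrics` measurement and `machine` tag from the write example; it is not necessarily the exact statement Grafana generates, and the function name and default time ranges are hypothetical.

```python
def mean_by_tag_query(measurement, field, tag, window="1m", since="1h"):
    """Build an InfluxQL query averaging a field per time window and per tag."""
    return (
        'SELECT mean("{field}") FROM "{measurement}" '
        "WHERE time > now() - {since} "
        'GROUP BY time({window}), "{tag}"'
    ).format(field=field, measurement=measurement, since=since, window=window, tag=tag)


if __name__ == "__main__":
    # One series per machine, averaged over one-minute windows for the last hour.
    print(mean_by_tag_query("metrics", "cpu", "machine"))
```

Grouping by the `machine` tag is what produces one line per server, which is why the server name belongs in `tags` rather than `fields`.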
Having expected something like Elasticsearch + Kibana, I was thrown off at first because the JSON keys required on input are fixed and things work differently (or rather, ES, which makes almost anything reasonably searchable as long as it is JSON, is the divine one). As for usability, though, my impression is that data is easy to visualize and easy to read. At first I wondered whether it would compete with Elasticsearch, but now I understand that they are tools with clearly different areas of coverage.
`fields`, but it is not indexed and is not suitable for search purposes.

For example, when operating MySQL, Elasticsearch is the better fit if you want to collect slow query logs over a long period and analyze them. Elasticsearch could also be used to monitor traffic volume, but in terms of division of roles, that seems to be a job for influxDB.