For my first Advent calendar post, I will write about building an external monitoring setup with Prometheus, a monitoring tool I recently started using. Is your website monitored for liveness? Monitoring tends to be one of those features with a high barrier to entry: "Can I build it myself?", "The system isn't big enough to justify a SaaS contract...", "Setting up a server from scratch is hard", and so on. However, by combining Docker and Prometheus, it is surprisingly easy to create an external monitoring mechanism, so I will walk through it.
What is external monitoring? It is a method of checking, via HTTP access, whether a website or API server is operating normally: "Is the response normal?", "Does the response come back within the expected number of seconds?" The nice thing about it is that the target is accessed the same way a user would access it, so you can check the health of the entire system, network and API included. It is often used in combination with monitoring server resources (CPU, memory, etc.) from the inside. I happen to run this against an AWS service, but the setup is simple, and I won't cover the AWS side here, so look it up if you are interested...
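To get a feel for what an external monitor checks, here is the same kind of check done by hand with curl (a sketch; https://example.com is a stand-in for your own endpoint):

# Check the HTTP status code and total response time, just like an external monitor would
# (output will be something like "200 0.142s")
$ curl -s -o /dev/null -w '%{http_code} %{time_total}s\n' https://example.com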
This time I will do it on a Mac, but since everything runs in Docker, it will also boot on Windows or Linux.
├── docker-compose.yml
├── prometheus
│   ├── alert_roles.yml
│   └── prometheus.yml
├── blackbox_exporter
│   └── config.yml
└── alertmanager
    └── config.yml
docker-compose.yml
version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    volumes:
      - ./prometheus:/etc/prometheus
    command: "--config.file=/etc/prometheus/prometheus.yml"
    ports:
      - 9090:9090
    restart: always
  blackbox_exporter:
    image: prom/blackbox-exporter:latest
    volumes:
      - ./blackbox_exporter/config.yml:/etc/blackbox_exporter/config.yml
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    volumes:
      - ./alertmanager:/etc/alertmanager
    command: "--config.file=/etc/alertmanager/config.yml"
    ports:
      - 9093:9093
    restart: always
blackbox_exporter
The URL monitoring settings are described in YAML. Under `modules:`, create a setting for each protocol you want to check (tcp, pop3, ssh probers are also provided by default). This time, `http_post_2xx:` is added as a setting for the POST method. HTTP headers can be set under `headers:`. Refer to the blackbox_exporter Configuration documentation for everything that can be set.
blackbox_exporter/config.yml
modules:
  http_2xx:
    prober: http
    http:
  http_post_2xx:
    prober: http
    http:
      method: POST
      headers:
        xxx: yyyy
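You can test a module directly before wiring it into Prometheus. This is a sketch that assumes you also publish blackbox_exporter's port by adding `ports: - 9115:9115` to its service in docker-compose.yml (the compose file above does not publish it), and https://example.com stands in for your target:

# Ask blackbox_exporter to run the http_post_2xx module against a target;
# the output includes probe_success (1 on success, 0 on failure)
$ curl 'http://localhost:9115/probe?module=http_post_2xx&target=https://example.com'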
prometheus
Describe the external monitoring settings on the Prometheus side. The evaluation rules themselves are defined in `alert_roles.yml`.
prometheus/prometheus.yml
global:
  # Interval at which Prometheus scrapes exporters for metrics (blackbox_exporter in this case)
  scrape_interval: 15s
  # Interval at which rules are evaluated (here, the rules in alert_roles.yml listed under rule_files)
  evaluation_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'
rule_files:
  - /etc/prometheus/alert_roles.yml
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - alertmanager:9093
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets:
          - prometheus:9090
  # Each monitoring condition is handled in a unit called a job
  - job_name: 'blackbox_http'
    metrics_path: /probe
    # Specify the module defined in blackbox_exporter's config.yml
    params:
      module: [http_post_2xx]
    static_configs:
      # Specify the URL to be monitored
      - targets:
          - '[target_url]'
    relabel_configs:
      # Pass the monitored URL to /probe as its "target" query parameter
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the monitored URL as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Scrape blackbox_exporter itself instead of the target URL
      - target_label: __address__
        replacement: blackbox_exporter:9115
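After this relabeling, what Prometheus actually scrapes is roughly the following URL (a sketch; `[target_url]` is the placeholder from the config above):

http://blackbox_exporter:9115/probe?module=http_post_2xx&target=[target_url]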
alert_roles.yml
# Alert definition
groups:
  - name: blackbox_exporter
    rules:
      - alert: http_success
        # Metric and condition to evaluate:
        # the probe_success condition, applied to the "blackbox_http" job
        expr: probe_success{job='blackbox_http'} != 1
        # How long the condition must hold before the alert fires (10 seconds in the failing state)
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }}: http request did not return 200"
          description: "{{ $labels.instance }} http request did not return 200 for more than 10 seconds."
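The prom/prometheus image ships with promtool, so you can sanity-check the rule file before relying on it (a sketch, assuming the containers from the compose file above are running):

# Validate the alert rule syntax inside the running prometheus container
$ docker-compose exec prometheus promtool check rules /etc/prometheus/alert_roles.yml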
alertmanager
Alertmanager handles the notifications. This time I'm setting up notifications to Slack; see the official documentation for how to create a Slack webhook.
alertmanager/config.yml
global:
  # Specify the Slack webhook URL
  slack_api_url: '[slack webhook url]'
  smtp_smarthost: 'localhost:25'
  smtp_require_tls: false
  smtp_from: 'Alertmanager'
route:
  receiver: 'test-route'
  # Grouping setting (group alerts by alert name)
  group_by: ['alertname']
  # Time to wait before sending the first notification for a new alert group
  group_wait: 10s
  # Minimum interval between notifications within a group
  # (time to wait before notifying about new alerts added to the group)
  group_interval: 5m
  # Time to wait before re-sending a notification that is still firing
  repeat_interval: 1h
receivers:
  - name: 'test-route'
    # Slack channel name
    slack_configs:
      - channel: '#general'
    # Email notification (not displayed in Slack)
    email_configs:
      - to: "[email protected]"
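To check that the Slack route works without waiting for a real failure, you can push a hand-made alert at Alertmanager's HTTP API (a sketch using the v1 endpoint; the alert name and labels here are arbitrary, and newer Alertmanager releases may require the v2 API instead):

# POST a fake alert to Alertmanager; it should reach #general shortly after group_wait
$ curl -XPOST -H 'Content-Type: application/json' http://localhost:9093/api/v1/alerts -d '[
  {
    "labels": { "alertname": "test_alert", "severity": "critical" },
    "annotations": { "summary": "manual test alert" }
  }
]'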
Start the Docker containers. This time I use docker-compose to bring up multiple containers at once.
shell
$ docker-compose build
# Run in the background with the "-d" option
# (without it, the containers stop when you close the terminal)
$ docker-compose up -d
# Confirm that the containers started
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ab4455256cff prom/prometheus "/bin/prometheus --c…" 5 days ago Up 5 days 0.0.0.0:9090->9090/tcp prometheus
830bf6475888 prom/alertmanager "/bin/alertmanager -…" 5 days ago Up 5 days 0.0.0.0:9093->9093/tcp alertmanager
b6fcd4f26f57 prom/blackbox-exporter:latest "/bin/blackbox_expor…" 5 days ago Up 5 days 9115/tcp prometheus_sample_blackbox_exporter_1
Go to http://localhost:9090. In real operation you would add HTTPS, port forwarding, and so on, but since this is a development machine, I'll leave it as is.
Enter the metric evaluated in alert_roles.yml and click Execute to display the evaluation results as a graph. This time the HTTP request is evaluated for success (200 OK); a value of 1.0 means the target is healthy.
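For example, scoping the metric to our job plots 1 for healthy probes and 0 for failed ones (this is the same expression the alert rule builds on):

probe_success{job="blackbox_http"}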
You can check the alert status by selecting Alerts from the menu. The states are color-coded, so you can see at a glance which rule is failing.
To test it, change the monitored URL to one that does not exist, so that the HTTP response comes back as a 500-series error, and watch the alert. When the HTTP request returns an error, the status in the Alerts menu turns red. And Slack is notified that an error has occurred.
This time the notification went to Slack, but people who are not in the habit of responding to chat may tell you, "It's hard to notice if you only notify Slack!" It is also possible to place a phone call when an alert fires. I think the important thing is not simply to notify, but to design around how, and how reliably, a human will actually notice.
Honestly, Prometheus's UI looks plain (it has all the necessary features, so it's enough), but if you want a dashboard with a cooler design, hooking it up to Grafana looks great. Like Prometheus, Grafana has a Docker image, so it starts just by adding it to docker-compose.yml, as sketched below. If you are interested, why not give it a try?
(The image is an official demo)
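A minimal sketch of that addition, indented under the existing services: key (grafana/grafana and port 3000 are the image's usual defaults, not something covered in this article):

  grafana:
    image: grafana/grafana
    container_name: grafana
    ports:
      - 3000:3000
    restart: always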
For those who say, "Building monitoring from scratch is pretty hard...", how about celebrating Christmas by building a monitoring function with Docker + Prometheus, which is easy to stand up and keeps its settings managed as code?