For my first Advent calendar post, I will write about building an external monitoring setup with Prometheus, a monitoring tool I recently started using. Is your website monitored for liveness? Monitoring tends to be one of those features with a high barrier to entry: "Can I build it myself?", "The system isn't big enough to justify a SaaS contract...", "Setting up a server from scratch is hard", and so on. However, by combining Docker and Prometheus, it is surprisingly easy to create an external monitoring mechanism, so I will walk through it.
What is external monitoring? It is a method of checking, via HTTP access, whether a website or API server is operating normally: "Is the response normal?", "Does the response come back within the expected number of seconds?" The nice thing about it is that the target is accessed the same way a user would access it, so you can check the health of the entire system, network and API included. It is often used in combination with monitoring server resources (CPU, memory, etc.) from the inside. I happen to run this against an AWS service, but the setup is simple, and I won't cover the AWS side here, so look it up if you are interested...
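To get a feel for what an external monitor checks, here is the same kind of check done by hand with curl (a sketch; https://example.com is a stand-in for your own endpoint):

# Check the HTTP status code and total response time, just like an external monitor would
# (output will be something like "200 0.142s")
$ curl -s -o /dev/null -w '%{http_code} %{time_total}s\n' https://example.com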
This time I will do it on a Mac, but since everything runs in Docker, it will also boot on Windows or Linux.
├── docker-compose.yml
├── prometheus
│   ├── alert_roles.yml
│   └── prometheus.yml
├── blackbox_exporter
│   └── config.yml
└── alertmanager
    └── config.yml
docker-compose.yml
version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    volumes:
      - ./prometheus:/etc/prometheus
    command: "--config.file=/etc/prometheus/prometheus.yml"
    ports:
      - 9090:9090
    restart: always
  blackbox_exporter:
    image: prom/blackbox-exporter:latest
    volumes:
      - ./blackbox_exporter/config.yml:/etc/blackbox_exporter/config.yml
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    volumes:
      - ./alertmanager:/etc/alertmanager
    command: "--config.file=/etc/alertmanager/config.yml"
    ports:
      - 9093:9093
    restart: always
blackbox_exporter
The URL monitoring settings are described in YAML. Under `modules:`, create a setting for each protocol you want to check (tcp, pop3, ssh probers are also provided by default). This time, `http_post_2xx:` is added as a setting for the POST method. HTTP headers can be set under `headers:`. Refer to the blackbox_exporter Configuration documentation for everything that can be set.
blackbox_exporter/config.yml
modules:
  http_2xx:
    prober: http
    http:
  http_post_2xx:
    prober: http
    http:
      method: POST
      headers:
        xxx: yyyy
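You can test a module directly before wiring it into Prometheus. This is a sketch that assumes you also publish blackbox_exporter's port by adding `ports: - 9115:9115` to its service in docker-compose.yml (the compose file above does not publish it), and https://example.com stands in for your target:

# Ask blackbox_exporter to run the http_post_2xx module against a target;
# the output includes probe_success (1 on success, 0 on failure)
$ curl 'http://localhost:9115/probe?module=http_post_2xx&target=https://example.com'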
prometheus
Describe the external monitoring settings on the Prometheus side. The evaluation rules themselves are defined in `alert_roles.yml`.
prometheus/prometheus.yml
global:
  # Interval at which Prometheus scrapes exporters for metrics (blackbox_exporter in this case)
  scrape_interval: 15s
  # Interval at which rules are evaluated (here, the rules in alert_roles.yml listed under rule_files)
  evaluation_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'
rule_files:
  - /etc/prometheus/alert_roles.yml
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - alertmanager:9093
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets:
          - prometheus:9090
  # Each monitoring condition is handled in a unit called a job
  - job_name: 'blackbox_http'
    metrics_path: /probe
    # Specify the module defined in blackbox_exporter's config.yml
    params:
      module: [http_post_2xx]
    static_configs:
      # Specify the URL to be monitored
      - targets:
          - '[target_url]'
    relabel_configs:
      # Pass the monitored URL to /probe as its "target" query parameter
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the monitored URL as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Scrape blackbox_exporter itself instead of the target URL
      - target_label: __address__
        replacement: blackbox_exporter:9115
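After this relabeling, what Prometheus actually scrapes is roughly the following URL (a sketch; `[target_url]` is the placeholder from the config above):

http://blackbox_exporter:9115/probe?module=http_post_2xx&target=[target_url]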
alert_roles.yml
# Alert definition
groups:
  - name: blackbox_exporter
    rules:
      - alert: http_success
        # Metric and condition to evaluate:
        # the probe_success condition, applied to the "blackbox_http" job
        expr: probe_success{job='blackbox_http'} != 1
        # How long the condition must hold before the alert fires (10 seconds in the failing state)
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }}: http request did not return 200"
          description: "{{ $labels.instance }} http request did not return 200 for more than 10 seconds."
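The prom/prometheus image ships with promtool, so you can sanity-check the rule file before relying on it (a sketch, assuming the containers from the compose file above are running):

# Validate the alert rule syntax inside the running prometheus container
$ docker-compose exec prometheus promtool check rules /etc/prometheus/alert_roles.yml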
alertmanager
Alertmanager handles the notifications. This time I'm setting up notifications to Slack; see the official documentation for how to create a Slack webhook.
alertmanager/config.yml
global:
  # Specify the Slack webhook URL
  slack_api_url: '[slack webhook url]'
  smtp_smarthost: 'localhost:25'
  smtp_require_tls: false
  smtp_from: 'Alertmanager'
route:
  receiver: 'test-route'
  # Grouping setting (group alerts by alert name)
  group_by: ['alertname']
  # Time to wait before sending the first notification for a new alert group
  group_wait: 10s
  # Minimum interval between notifications within a group
  # (time to wait before notifying about new alerts added to the group)
  group_interval: 5m
  # Time to wait before re-sending a notification that is still firing
  repeat_interval: 1h
receivers:
  - name: 'test-route'
    # Slack channel name
    slack_configs:
      - channel: '#general'
    # Email notification (not displayed in Slack)
    email_configs:
      - to: "[email protected]"
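To check that the Slack route works without waiting for a real failure, you can push a hand-made alert at Alertmanager's HTTP API (a sketch using the v1 endpoint; the alert name and labels here are arbitrary, and newer Alertmanager releases may require the v2 API instead):

# POST a fake alert to Alertmanager; it should reach #general shortly after group_wait
$ curl -XPOST -H 'Content-Type: application/json' http://localhost:9093/api/v1/alerts -d '[
  {
    "labels": { "alertname": "test_alert", "severity": "critical" },
    "annotations": { "summary": "manual test alert" }
  }
]'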
Start the Docker containers. This time I use docker-compose to bring up multiple containers at once.
shell
$ docker-compose build
# Run in the background with the "-d" option
# (without it, the containers stop when you close the terminal)
$ docker-compose up -d
# Confirm that the containers started
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ab4455256cff prom/prometheus "/bin/prometheus --c…" 5 days ago Up 5 days 0.0.0.0:9090->9090/tcp prometheus
830bf6475888 prom/alertmanager "/bin/alertmanager -…" 5 days ago Up 5 days 0.0.0.0:9093->9093/tcp alertmanager
b6fcd4f26f57 prom/blackbox-exporter:latest "/bin/blackbox_expor…" 5 days ago Up 5 days 9115/tcp prometheus_sample_blackbox_exporter_1
Go to http://localhost:9090. In real operation you would add HTTPS, port forwarding, and so on, but since this is a development machine, I'll leave it as is.
Enter the metric evaluated in alert_roles.yml and click Execute to display the evaluation results as a graph. This time the HTTP request is evaluated for success (200 OK); a value of 1.0 means the target is healthy.
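For example, scoping the metric to our job plots 1 for healthy probes and 0 for failed ones (this is the same expression the alert rule builds on):

probe_success{job="blackbox_http"}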
You can check the alert status by selecting Alerts from the menu. The states are color-coded, so you can see at a glance which rule is failing.
To test it, change the monitored URL to one that does not exist, so that the HTTP response comes back as a 500-series error, and watch the alert. When the HTTP request returns an error, the status in the Alerts menu turns red. And Slack is notified that an error has occurred.
This time the notification went to Slack, but people who are not in the habit of responding to chat may tell you, "It's hard to notice if you only notify Slack!" It is also possible to place a phone call when an alert fires. I think the important thing is not simply to notify, but to design around how, and how reliably, a human will actually notice.
Honestly, Prometheus's UI looks plain (it has all the necessary features, so it's enough), but if you want a dashboard with a cooler design, hooking it up to Grafana looks great. Like Prometheus, Grafana has a Docker image, so it starts just by adding it to docker-compose.yml, as sketched below. If you are interested, why not give it a try?
(The image is an official demo)
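A minimal sketch of that addition, indented under the existing services: key (grafana/grafana and port 3000 are the image's usual defaults, not something covered in this article):

  grafana:
    image: grafana/grafana
    container_name: grafana
    ports:
      - 3000:3000
    restart: always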
For those who say, "Building monitoring from scratch is pretty hard...", how about celebrating Christmas by building a monitoring function with Docker + Prometheus, which is easy to stand up and keeps its settings managed as code?