This plugin calls the CloudWatch API from the blackbird side to collect various metrics for Elastic Load Balancing (hereinafter ELB).
The following CloudWatch metrics are collected. The columns are, from left to right: the CloudWatch metric name (see the official documentation for what each metric means), the statistic taken (for example the sum or the average per unit time, or strictly speaking per acquisition interval), and a brief description.
CloudWatch Metric Name | Statistics | Detail |
---|---|---|
HTTPCode_Backend_2XX | Sum | Total number of 2XX responses from the backend servers within the acquisition interval |
HTTPCode_Backend_3XX | Sum | Total number of 3XX responses from the backend servers within the acquisition interval |
HTTPCode_Backend_4XX | Sum | Total number of 4XX responses from the backend servers within the acquisition interval |
HTTPCode_ELB_4XX | Sum | Total number of 4XX responses generated by the ELB itself within the acquisition interval |
HTTPCode_ELB_5XX | Sum | Total number of 5XX responses generated by the ELB itself within the acquisition interval |
Latency (Average) | Average | Average response time within the acquisition interval |
Latency (Maximum) | Maximum | Maximum response time within the acquisition interval |
Latency (Minimum) | Minimum | Minimum response time within the acquisition interval |
RequestCount | Sum | Number of requests within the acquisition interval |
SpilloverCount | Sum | Number of requests that overflowed the ELB's internal request queue within the acquisition interval |
BackendConnectionErrors | Sum | Number of failed connection attempts to the backend servers within the acquisition interval |
HealthyHostCount | Maximum | Number of backend servers that passed the Health Check within the acquisition interval |
UnhealthyHostCount | Maximum | Number of backend servers that failed the Health Check within the acquisition interval |
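
To give a feel for what one row of the table turns into on the API side, here is a minimal sketch in Python. It is not the plugin's actual code: the `build_request` function and the `METRICS` list are names made up for this example, and it assumes the request is sent with boto3's CloudWatch `get_metric_statistics` call using the `AWS/ELB` namespace and the `LoadBalancerName` and `AvailabilityZone` dimensions.

```python
from datetime import datetime, timedelta, timezone

# (metric name, statistic) pairs copied from the table above.
METRICS = [
    ("HTTPCode_Backend_2XX", "Sum"),
    ("HTTPCode_Backend_3XX", "Sum"),
    ("HTTPCode_Backend_4XX", "Sum"),
    ("HTTPCode_ELB_4XX", "Sum"),
    ("HTTPCode_ELB_5XX", "Sum"),
    ("Latency", "Average"),
    ("Latency", "Maximum"),
    ("Latency", "Minimum"),
    ("RequestCount", "Sum"),
    ("SpilloverCount", "Sum"),
    ("BackendConnectionErrors", "Sum"),
    ("HealthyHostCount", "Maximum"),
    ("UnhealthyHostCount", "Maximum"),
]


def build_request(metric_name, statistic, load_balancer_name,
                  availability_zone, interval, now=None):
    """Build keyword arguments for one get_metric_statistics call.

    Hypothetical helper for illustration only. The query window is the
    last `interval` seconds, and the period equals the interval so that
    one datapoint comes back per acquisition.
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(seconds=interval)
    return {
        "Namespace": "AWS/ELB",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "LoadBalancerName", "Value": load_balancer_name},
            {"Name": "AvailabilityZone", "Value": availability_zone},
        ],
        "StartTime": start,
        "EndTime": end,
        "Period": interval,
        "Statistics": [statistic],
    }
```

With boto3 installed and credentials available, the result could be passed along as `boto3.client("cloudwatch", region_name="ap-northeast-1").get_metric_statistics(**request)`, and the value read from the `Datapoints` list in the response.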
About SpilloverCount
SpilloverCount, as the name implies, is the number of requests that overflowed. The ELB keeps requests in an internal queue (called the Surge Queue): when the backend is too busy to handle a request, the request waits in that queue for a while. When the queue overflows, SpilloverCount becomes 1 or more. (Writing this, I realize it's bad that the plugin doesn't collect SurgeQueueLength yet ... I'll add it soon.)
So a rising SurgeQueueLength already means the backend servers cannot keep up with the traffic, which is quite bad, but with some effort they may still manage to respond before the queue overflows. If SpilloverCount itself is rising, though, the service is probably barely functioning.
About BackendConnectionErrors
I think this one is similar in that the service is not really functioning once it rises. However, if the backend server handles I/O and network processing asynchronously with epoll or the like (nginx, for example), new connections can still be established even when the processing itself cannot keep up. In that case BackendConnectionErrors does not show up, but SpilloverCount seems to increase instead.
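
The way I read these two metrics together can be written down as a tiny Python sketch. This is only my rough interpretation from the paragraphs above, not anything AWS defines; the `diagnose` function and its return strings are invented for this example.

```python
def diagnose(spillover_count, backend_connection_errors):
    """Return a rough reading of the two counters (illustrative only)."""
    overflowing = spillover_count > 0
    failing = backend_connection_errors > 0
    if overflowing and failing:
        return "connections failing and surge queue overflowing"
    if overflowing:
        # Typical of an asynchronous backend: connections are accepted,
        # but requests are not processed fast enough.
        return "connections accepted but backend cannot keep up"
    if failing:
        return "connections to backend failing"
    return "ok"
```

For example, `diagnose(12, 0)` falls into the "accepted but cannot keep up" case described for asynchronous backends.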
From here on, let's talk about the Zabbix template.
Triggers
These are only rough functions, but the template raises alerts along these lines.
Each threshold is defined as a Macro on the Template, so please edit it to match the traffic characteristics of your service. Also, since each trigger comes in three levels of Severity, you can do things like notify via chat for Info, email for Average, and mobile phone for High.
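
To make the three-level idea concrete, here is a small Python sketch. It is not how Zabbix evaluates triggers internally; the `classify` function, the `NOTIFY` mapping, and the threshold values in the usage note are all made up for illustration and simply stand in for the Template's Macros.

```python
# Hypothetical mapping from severity to notification channel,
# following the chat / email / mobile phone idea above.
NOTIFY = {
    "Information": "chat",
    "Average": "email",
    "High": "mobile phone",
}


def classify(value, info, average, high):
    """Return the highest severity whose threshold the value reaches.

    The three thresholds play the role of the Template Macros and are
    expected to satisfy info <= average <= high.
    """
    if value >= high:
        return "High"
    if value >= average:
        return "Average"
    if value >= info:
        return "Information"
    return None  # below every threshold: no alert
```

With thresholds of, say, 10, 50 and 100, a value of 75 is classified as Average, so `NOTIFY[classify(75, 10, 50, 100)]` picks email.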
Graphs
RequestCount
Omitted, since it is just an ordinary bar graph.
When using pip
pip install blackbird-elb
It is registered on PyPI, so just install it with pip.
When using yum
First, let's create a repo file. (Of course, eventually I'd like to offer something like http://blackbird.example.com/blackbird_install.sh | sh, but please wait a little longer.)
[blackbird]
name=blackbird package repository
baseurl=https://vagrants.github.io/blackbird/repo/yum/6/x86_64
enabled=0
gpgcheck=0
yum install blackbird --enablerepo=blackbird
Configure your blackbird
#Any section name is fine, as in any ini-format file
#However, a thread is created internally with this name, so it is safer not to reuse it elsewhere.
[ANYTHING_OK]
#The acquisition interval. You can set it as low as 1 second, but CloudWatch does not support that, so the effective minimum is 60 seconds.
interval = 300
#The region name. The default is us-east-1.
region_name = ap-northeast-1
#AWS credentials
#I'd like to authenticate via STS using the EC2 instance's IAM Role, so please wait a while. I don't want to write credentials here either.
aws_access_key_id = ACCESS_KEY_ID
aws_secret_access_key = SECRET_ACCESS_KEY
#ELB name
load_balancer_name = YOUR_ELB_NAME
#ELB's CloudWatch metrics are available per AZ, so specify the availability zone (this is also handy because you can tell which AZ a problem comes from).
availability_zone = ap-northeast-1a
#Specify the Host as registered in Zabbix Web (the Zabbix screen you usually use). Without it the destination is unknown, so don't forget to create the Host on the Zabbix side first.
hostname = YOUR_ELB_NAME_ON_ZABBIX
#module specifies which plugin to use; here it is always elb
module = elb
Put this configuration in a file under `include_dir` (blackbird supports an Nginx-style include such as `/etc/blackbird/conf.d/*.conf`), or append it to blackbird's own defaults.cfg.
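
If you want to sanity-check such a section before starting blackbird, something like the following Python sketch could help. It uses the standard library's `configparser`, which is not necessarily what blackbird itself uses to load its config; `load_elb_sections`, `SAMPLE` and `MIN_INTERVAL` are names made up here, and clamping the interval up to 60 seconds is my own assumption based on the comment in the config above.

```python
import configparser

# Assumed lower bound, taken from the "minimum value is 60 seconds" comment.
MIN_INTERVAL = 60

# A trimmed copy of the section above (credentials left out on purpose).
SAMPLE = """
[ANYTHING_OK]
interval = 300
region_name = ap-northeast-1
load_balancer_name = YOUR_ELB_NAME
availability_zone = ap-northeast-1a
hostname = YOUR_ELB_NAME_ON_ZABBIX
module = elb
"""


def load_elb_sections(text):
    """Parse ini text and return settings for sections with module = elb."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    result = {}
    for name in parser.sections():
        section = parser[name]
        if section.get("module") != "elb":
            continue  # some other plugin's section
        interval = section.getint("interval", fallback=MIN_INTERVAL)
        result[name] = {
            # Raise too-small intervals to the assumed minimum.
            "interval": max(interval, MIN_INTERVAL),
            "region_name": section.get("region_name", fallback="us-east-1"),
            "load_balancer_name": section["load_balancer_name"],
            "availability_zone": section.get("availability_zone"),
            "hostname": section["hostname"],
        }
    return result
```

Calling `load_elb_sections(SAMPLE)` returns one entry keyed by the section name; a missing `load_balancer_name` or `hostname` raises `KeyError`, which is a deliberate choice here so that typos surface early.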
When installed with pip
blackbird --config YOUR_DEFAULT_CONFIG
When installed with yum
sudo service blackbird start
With this, values should start coming in once the time specified for interval has elapsed!