Monitoring PostgreSQL + php-fpm + nginx + disk using Zabbix

a Lot of information in the network according to Zabbix, and a lot of bespoke templates, I want to introduce to the audience his modification.
Zabbix is a very convenient and flexible monitoring tool. Want a hundred monitor, I want a thousand stations, and do not want to — watch a single server, take the cream in all sections. I would not mind to give to github, if anyone collects similar.

image

It so happened that we decided to put on the hosting database wrapper php-fpm+nginx. As the database is postgres. Thoughts to collect data on the machine was before the purchase of hosting is needed, this is useful! Magic kick up the backside to the implementation of the system gave the brakes a hard drive on our VDS stations at the beginning of the script every minute, put the time and measured by the speed in the file, and then build graphs in Excel to compare to as it was/became, take quantitative statistics. And this is just one option! And suddenly the fault is not VDS, and our applications that run on it. In General, it is necessary to monitor many monitor have comfortable!

I will not dwell on how to install the server, lots of options and documentation on this issue fully. I used the official
https://www.zabbix.com/documentation/ru/2.2/manual/installation/install_from_packages
The operating system is CentOS 6.5
The files that you need archive habr-zabbix-mons.zip

The station agent the zabbix-agent be sure to put the zabbix-sender:
the
yum install -y http://repo.zabbix.com/zabbix/2.2/rhel/6/x86_64/zabbix-2.2.4-1.el6.x86_64.rpm
yum install -y http://repo.zabbix.com/zabbix/2.2/rhel/6/x86_64/zabbix-agent-2.2.4-1.el6.x86_64.rpm
yum install -y http://repo.zabbix.com/zabbix/2.2/rhel/6/x86_64/zabbix-sender-2.2.4-1.el6.x86_64.rpm

the
[root@fliber ~]# vi /etc/zabbix/zabbix_agentd.conf
LogFileSize=1
Hostname=Ваш_агент_addr
Server=123.45.67.89
ServerActive=123.45.67.89
Instead of "123.45.67.89" is the IP of the machine where you have installed zabbix-server.
Instead of "Ваш_агент_addr" put the name/ip of the machine, as added agent on the server, the field "Host name".

the
chkconfig zabbix-agent --level 345 on
service zabbix-agent start


the

Monitoring the speed of the hard disk


I use the program hdparm. You can use another if you have a preference:
the
yum install hdparm

Choose the section that will monitor:
the
[root@fliber ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vda1 219608668 114505872 104106808 78% /
11489640 11489640 0 tmpfs 0% /dev/shm

Added to /etc/zabbix/zabbix_agentd.d/user.conf
the
UserParameter=hdparm.rspeed,sudo /sbin/hdparm -t /dev/vda1 | awk 'BEGIN{s=0} /MB\/sec/ {s=$11} /kB\/sec/ {s=$11/1024} END{print s}'

Allow sudo to run without a console (disable requiretty) and add the command that we run on behalf of the user zabbix:
the
[root@fliber ~]# visudo
#Defaults requiretty
zabbix ALL=(ALL) NOPASSWD: /sbin/hdparm -t /dev/vda1

Prolonged time in the query parameter in the config of the agent, because hdparm spends 3-10 or more seconds per measurement, depending on the jumps at speed, apparently.
the
[root@fliber ~]# vi /etc/zabbix/zabbix_agentd.conf
Timeout=30
service zabbix-agent restart

Also on the server it is necessary to correct the response time of the agent
the
[root@pentagon ~]# vi /usr/local/etc/zabbix_server.conf
Timeout=30

Go to the Web Admin and added to Zabbix configuration host (Configuration->Hosts->Wasserwerke->Items) or to any pattern, for example OS Linux (Configuration->Templates->Template OS Linux > Items) new parameter — click on “Create Item”:
the
Name: Hdparm: HDD speed
Key: hdparm.rspeed
Type of information: Numeric (float)
Units: MB/s
Update interval (in sec): 601
Applications: Filesystems
Go to “Graphs”- > “Ceate graph”
the
Name: Hdparm: HDD read speed
Y axis MIN value: Fixed 0.0000
Items: Add: “Hdparm: HDD speed”

Ready! We have the option, and there is a count on it. If you added to the template, attach the template to the host. Admire!

image

In our case that was before 31.07 — it's bad, though the average speed was high, but very often it is down below 1MB/s. Now (after the transfer us to another other node) it is stable and rarely drops until there was at least 5-6MB/s. I Think that the same nod she fell not because of the drive, and because of the employment of some other more important resources, but the important thing is that we see the failures!

the
 

the

Monitoring the logs of nginx


It would seem, why to monitor logs? Connected metric analyst Yes, and watch all there. But these guys don't show us the bots that do not run js on the page, as well as people, if they have js disabled. The proposed solution will show the frequency the robots crawl your pages and will help to prevent high load on your server from the search engine bots. Well, any statistics, if you dig deeper loghttp.sh

Added to /etc/zabbix/zabbix_agentd.d/user.conf
the
UserParameter=log.http.all,/etc/zabbix/scripts/loghttp.sh

Put loghttp.sh in /etc/zabbix/scripts in the same folder of executable
the
chmod o+x loghttp.sh
yum install curl
chown zabbix:zabbix /etc/zabbix/scripts
service zabbix-agent restart

Check the path to access.log
the
[root@fliber ~]# vi loghttp.sh
LOG=Putop

Import loghttp.xml in templates zabbix: Configuration -> Templates in the header line “CONFIGURATION OF TEMPLATES” looking for the right the button “Import”, choose file, import.
Connect the template to the host: Configuration->Hosts->Wasserwerke tab “Templates” in the “Link new templates” are going to write the “Logs” drop-down list appears — choose our pattern. “Add”, “Save”.
The template is written to monitor log every 10 minutes, so take your time to watch charts, but the logs you can check. On the client side “/var/log/zabbix/zabbix_agentd.log” and server side “/tmp/zabbix_server.log” or “/var/log/zabbix/zabbix_server.log”.
If all is well, soon you will be able to observe such a picture:

image

Google fellow — skanit one speed, Sometimes Mail comes, rarely Bing, and Yahoo and can not see. Yandex, with varying success index, but a lot is still in the results shown, it would be great :)
On the chart Google and Yandex on the left scale, the other on the right. The value on the scale — the number of visits from measuring up to the measurement, ie for 10 minutes. You can put 1 hour, but then risk missing a lot of visits at the time of log rotation.

the
 

the

Monitor nginx


Why monitor nginx't know, never problems was not with him. But let it be for statistics. I tried to use a set of templates ZTC, but very much I do not like the flashing process Python in memory, 10MB each. Want native, I want bash! And most importantly — for a single query to collect all of the parameters. That's what I wanted to achieve when monitoring all services minimum server load and maximum parameters.
Similar scripts you can find a lot, but since I came comprehensively monitoring Web servers, post your version.

Teach nginx to give a status page, add the configuration to localhost
the
server {
listen localhost;
server_name status.localhost;
keepalive_timeout 0;
allow 127.0.0.1;
deny all;
location /server-status {
stub_status on;
}
access_log off;
}

Don't forget to apply the changes:
service nginx reload

Added to /etc/zabbix/zabbix_agentd.d/user.conf
the
UserParameter=nginx.ping/etc/zabbix/scripts/nginx.sh

Put nginx.sh in /etc/zabbix/scripts in the same folder of executable
the
chmod o+x nginx.sh
service zabbix-agent restart

If you don't put curl in the previous step, it is necessary to install:
yum install curl

Check, just in case that nginx.sh the variables SENDER and CURL the right way.

Import loghttp.xml in the zabbix template, the included template to the host.
Well, enjoy the pictures!

image

This monitor is able to inform that nginx is not running, or that he started going too slow. The default threshold is this: if over the last 10 measurements of the rate of reaction nginx did not fall below 10ms, create a Warning. The monitor will report when the server returned incorrect status(nginx in memory, and responds with gibberish).

the
 

the

Monitor php-fpm


It is useful to implement if you use a dynamic set of processes (pm = dynamic in /etc/php-fpm.d/www.conf) by default, or consciously. The monitor is able to alert about the unavailability of the service or its slowing down.
I tried to do the survey without nginx, but I failed to find a program that would help in interaction with php-fpm. Prompt, if who knows.
Maybe php-fpm does not give a status, check
the
[root@fliber ~]# vi /etc/php-fpm.d/www.conf
pm.status_path = /status

If something is changed that is used:
service php-fpm reload

Added to /etc/zabbix/zabbix_agentd.d/user.conf
the
UserParameter=php.fpm.ping/etc/zabbix/scripts/php-fpm.sh

Put php-fpm.sh in /etc/zabbix/scripts in the same folder of executable
the
chmod o+x php-fpm.sh
service zabbix-agent restart

List php-fpm.sh the path to the FastCGI server (the listen parameter in /etc/php-fpm.d/www.conf)
the
[root@fliber ~]# vi /etc/zabbix/scripts/php-fpm.sh
LISTEN='127.0.0.1:9000'
Sockets are also supported.

If you have not installed the cgi-fcgi, it is necessary to install:
yum install fcgi

Import php-fpm.xml in the zabbix template, the included template to the host.

image

the
 


the

Monitoring PostgreSQL


It's the main course! His chef had cooked the longest :)
As the prototype was chosen pg_monz — open_source, maintained, many options, works with the latest version of postgres. The lack of global — I collect all the parameters in the service, because I do not know which one and when “jump”.
Great pg_monz and included the collection of all parameters for databases and tables, only about 700 pieces, the server load has increased 10 times! (probably with pgbouncer will not be so noticeable) Although parameters were collected once every 300 seconds. It is understandable — for each parameter, run psql and execute the query, often to the same table, just different fields. In General from pg_monz were only names of fields and tables. Well, we try!

Added to /etc/zabbix/zabbix_agentd.d/user.conf
the
UserParameter=psql.ping[*],/etc/zabbix/scripts/psql.sh $1 $2 $3 $4 $5
UserParameter=psql.db.ping[*],/etc/zabbix/scripts/psql_db_stats.sh $1 $2 $3 $4 "$5"
UserParameter=psql.db.discovery[*],psql -h $1-p $2-U $3 -d $4 -t-c "select '{\"data\":['||string_agg ('a{\"{#DBNAME}\":\"'||datname||'\"}',',')||' ]}' from pg_database where not datistemplate and datname~'$5'"
UserParameter=psql.t.discovery[*],/etc/zabbix/scripts/psql_table_list.sh $1 $2 $3 $4 "$5" "$6"

Put psql*.sh in /etc/zabbix/scripts in the same folder of executable
the
chmod o+x psql*.sh
service zabbix-agent restart

Check, just in case that nginx.sh and psql_db_stats.sh in the variable PSQLC the right path to psql.

Import psql.xml templates in zabbix. If you do not plan to collect data for databases and tables, then immediately disconnect on the tab “Discovery” template “PSQL DB list” and “PSQL table list”. And if you plan to begin, set the macro in the tab “Macros” agent {$PGTBL_REGEXP} — the name of the table that will be monitored in detail. Although it is likely, in the beginning you want to view all table :)

Connect the template to a host, look at how data is collected...

image

All the template parameters (inherited from pg_monz) and the values of their default you can see in the tab “Macros” template. I'll try to make a description of these options:
the the the the the the the the the the the the the the the the
Macro default Description
{$PGDATABASE} postgres database Name to connect
{$PGHOST} 127.0.0.1 Host PostgreSQL (relative to Zabbix agent if there: 127.0.0.1)
{$PGLOGDIR} /var/lib/pgsql/9.3/data/pg_log the directory with the PostgreSQL logs
{$PGPORT} 5432 the port Number of PostgreSQL
{$PGROLE} postgres the username to connect to PostgreSQL
{$PGDB_REGEXP} .
(all databases)
the name of the database for gathering detailed information*
{$PGTBL_REGEXP} .
(all tables)
the name of the collection table details*
{$PGCHECKPOINTS_THRESHOLD} 10 If the number checkpoint's exceeds the threshold the trigger --
{$PGCONNECTIONS_THRESHOLD} 2 If the average number of sessions for the last 10 minutes exceeds the threshold, trigger the trigger
{$PGDBSIZE_THRESHOLD} 1073741824 If the database size exceeds the specified limit, in bytes, of the trigger --
{$PGTEMPBYTES_THRESHOLD} 1048576 If the write speed to temporary files for the last 10 minutes exceeds PGTEMPBYTES_THRESHOLD, in bytes, of the trigger --
{$PGCACHEHIT_THRESHOLD} 90 If for the last 10 minutes average of the cache hit is below the threshold that fires a trigger on database
{$PGDEADLOCK_THRESHOLD} 0 as soon As the number of deadlocks exceeds the trigger --
{$PGSLOWQUERY_SEC} 1 If the query runs longer PGSLOWQUERY_SEC seconds to consider it slow
{$PGSLOWQUERY_THRESHOLD} 1 If the average number of slow queries in the last 10 minutes exceeds the threshold, then the trigger --
* as a parameter, use a regular expression, for example
org — all tables and diagrams that contain the substring org
\.(organization|resource|okved)$ tables with names of the organization, resource, okved in any schema
^msn\. — all tables in the schema, msn

It turns out that the parameters {$PGDB_REGEXP} and {$PGTBL_REGEXP} is not just a name, it is the substring that will be searched for in the names of all databases and schemas.tables.
Only now the regular season will not work all, but only those that do not contain characters \, ', ", `, *, ?, [, ], {, }, ~, $, !, &, ;, (, ), <, >, |, #, @, 0x0a. If you want to remove this limit, edit /etc/zabbix/zabbix_agentd.conf
the
UnsafeUserParameters=1
then, you will only use the double quote "" to put a single. To read about of the regular season in PostgreSQL here:
www.postgresql.org/docs/9.3/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP

If you do not receive the option “PSQL error log”, it is likely not right given path {$PGLOGDIR} — look at the agent file “postgresql-Sun.log” — where is this folder and write in the macro.

the
 

the

Monitor php-opcache


As is customary in the best houses — the dessert!
This article view of the entire chain from user request to data except one site — the php. To a little bit to open up the box, try to gather statistics built-in in php5 opcache accelerator function opcache_get_status.

Added to /etc/zabbix/zabbix_agentd.d/user.conf
the
UserParameter=php.opc.ping/etc/zabbix/scripts/php-opc.sh
UserParameter=php.opc.discovery/etc/zabbix/scripts/php-opc.sh discover

Put php-opc.* /etc/zabbix/scripts in the same folder of executable
the
chmod o+r php-opc.php
chmod o+x php-opc.sh
service zabbix-agent restart

List php-opc.sh the path to the FastCGI server (the listen parameter in /etc/php-fpm.d/www.conf)
the
[root@fliber ~]# vi /etc/zabbix/scripts/php-opc.sh
LISTEN='127.0.0.1:9000'
Sockets are also supported. Here you can disable discovery(unloaded network), SCRIPTS_ENABLE=0

If not set fcgi to monitor php-fpm, it is necessary to install:
yum install fcgi

Import php-opc.xml in the zabbix template, the included template to the host.

image

While I was creating triggers, reduced the size of the memory to cache your opcache 2 times. So that it is useful to see these statistics. If this server is not updated with new modules, would reduce to 4 times easily.

the
 

Opinion


Don't know what other parameters to monitor. These are the basic services that run on our VDS machine. We have — to share, of course. It would be nice to add a template script mysql, but the problem yet. If you really need — do :)
The approach can be discussed — I did it separate gadgets for each service, and it's too bad — took the script, added a config, filled in with a template ready.
If you collect these monitors in one package, of course, to do additional discovery on each service — may be more nginx, php-fpm, and postgres. They can listen to as ports, and sockets.

Version
CentOS release 6.5 (Final)
ZabbixServer 2.2.4
ZabbixAgent 2.2.4
hdparm 9.43
nginx 1.6.1
php-fpm 5.3.3
PostgreSQL 9.3.4
php-opcache 5.5.15

All screenshots taken from server statistics http://www.fliber.net registration — only for legal entities.
Templates and scripts here: http://www.fliber.net/assets/for_articles/2014-08-habr-zabbix-mons.zip
To monitor php-fpm and pfp-opcache in archive there is a version of the script to work with curl(with the same templates). Description of the settings commented out in the relevant script php*_curl.sh, see upd2
upd1
Now "HOST=Ваш_агент_addr", "SERVER=Ваш_сервер_addr" ask not necessary.
But you need to specify "Hostname=Ваш_агент_addr" in zabbix_agentd.conf. The value "system.hostname" will not do.
Outdated templates and scripts here: www.fliber.net
Thanks for the tip xenozauros.
upd2
Now to monitor php-fpm is not necessary to register the nginx configuration
the
server {...
location ~ ^/(status|ping)$ {
include /etc/nginx/fastcgi_params;
location 127.0.0.1:9000;
fastcgi_param SCRIPT_FILENAME status;
}
...}
service nginx reload

Now to monitor the php opcache is not necessary to register the nginx configuration
the
server {...
location /opc-status {
include /etc/nginx/fastcgi_params;
location 127.0.0.1:9000;
fastcgi_param SCRIPT_FILENAME /etc/zabbix/scripts/php-opc.php;
}
...}
service nginx reload

But you want prescription to LISTEN to php-fpm in the scripts and install the cgi-fcgi:
yum install fcgi

It is switching from the curl to the fcgi reduced the query execution time monitoring in 2 times
Also added the copyright notice and check for correct paths to the programs. Now when you run the script from the console, he knows how to swear, if not from the console, the error will be in server logs, not the correct answer.

Outdated templates and scripts here: www.fliber.net
Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

Templates ESKD and GOST 7.32 for Lyx 1.6.x

Customize your Google