Introduction to Linux monitoring and alerting

https://www.redhat.com/sysadmin/linux-monitoring-and-alerting

The mail command is quite interesting.
Have you ever wanted to set up a process monitor that alerts you when it's offline without spending thousands of budget dollars to do so? Every system administrator has, and here's how to do it.

Posted October 3, 2019 | by Ken Hess (Red Hat)

There are system administrators who love to do things themselves, and then there are those of us who must do things ourselves because budgets just don't allow for mega-purchases. Enterprise monitoring and alerting suites are for companies that either have large budgets or run mission-critical applications, systems, or services that absolutely must be up 100% of the time. There are some open-source monitoring and alerting suites, but they require a dedicated system and a considerable amount of time to set up. Most also require agents to be installed on monitored endpoints, which requires approval and time to deploy. The quicker and easier solution is to create your own monitoring and alerting scripts and then schedule them via cron. The best part of localized (per-server) monitoring and alerting is that you can customize thresholds for each system and service, rather than having to live with a global configuration that might not meet your needs.

This article takes you through the process of creating a script that checks every five minutes for the Apache web server process, attempts to restart it if it's down, and then alerts you via email if it's down for more than 30 seconds and cannot be restarted.

Most processes have a process ID (PID) file under the /run directory when they are running, and many of those have their own separate directories that contain their corresponding PID files. In this example, the Apache web server (httpd) has a PID file: /run/httpd/httpd.pid.
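
A PID file can be left behind when a process crashes, so its presence alone does not prove that the service is running. The following is a minimal sketch of a stricter check, not part of the original article: it reads the PID from the file and asks the kernel whether that process exists. The function name pid_is_alive is hypothetical, and the check assumes it runs as root, because kill -0 also fails when the caller lacks permission to signal the process.

```shell
#!/bin/bash

# pid_is_alive FILE
# Hypothetical helper: succeeds only if FILE exists, contains a number,
# and a process with that PID can be signalled by the caller.
pid_is_alive() {
    local file=$1 pid
    [ -f "$file" ] || return 1
    # Read the first line; tolerate a file without a trailing newline.
    read -r pid < "$file" || [ -n "$pid" ] || return 1
    # Reject empty or non-numeric content.
    [[ $pid =~ ^[0-9]+$ ]] || return 1
    # Signal 0 sends nothing; it only tests whether the PID can be signalled.
    kill -0 "$pid" 2>/dev/null
}
```

A monitoring script could call pid_is_alive /run/httpd/httpd.pid wherever it would otherwise test only for the file's existence.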

I named this script apache.sh and placed it in root's home directory. Be sure to change permissions on the file to 750 (rwxr-x---) so that users other than the file's owner and group cannot execute or even read it, regardless of location:

$ sudo chmod 750 apache.sh

Note: If you don't have Apache installed, it doesn't matter, because you can replace the httpd.pid file pointed to in the script with any other PID file that works for your system.

There are many different ways to create such a script, but this is how I did it, and it works. I stored the path to the PID file in the variable FILE. I decided that rather than have an alert sent as soon as the Apache web server was found to be down, I would have the script attempt a service restart, and then check again. The script repeats this process two more times, waiting 10 seconds between checks. If the Apache service is still down and cannot be restarted after 30 seconds, then the script sends the system administrator team an email:

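#!/bin/bash

# Path to the Apache PID file; it exists while httpd is running.
FILE=/run/httpd/httpd.pid

# Check 1: if the PID file is missing, try to start Apache.
if ! [ -f "$FILE" ]; then
    systemctl start httpd.service
fi
sleep 10s

# Check 2: same test and restart attempt, 10 seconds later.
if ! [ -f "$FILE" ]; then
    systemctl start httpd.service
fi
sleep 10s

# Check 3: last restart attempt.
if ! [ -f "$FILE" ]; then
    systemctl start httpd.service
fi
sleep 10s

# Final check after about 30 seconds: email the team if Apache is still down.
if ! [ -f "$FILE" ]; then
    mail -s 'Apache is down' sysadmins@mydomain.com <<< 'Apache is down on SERVER1 and cannot be restarted'
fi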

You could just as easily send an SMS message to a team on-call mobile phone. This script checks for the non-existence of the httpd.pid file and then takes action if it's not found. If the file exists, then no action is taken. No one wants to receive emails or notices that a service is up every five minutes.
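
The three identical check-and-restart steps can also be expressed as a loop. The sketch below is one possible variant, not the author's script: the function name check_and_restart and its parameters are hypothetical. Unlike the script above, it returns as soon as the PID file is found instead of always sleeping for the full 30 seconds.

```shell
#!/bin/bash

# check_and_restart PIDFILE SERVICE [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper. Returns 0 if PIDFILE exists, possibly after a
# restart; returns 1 if it is still missing after ATTEMPTS restart attempts.
check_and_restart() {
    local pidfile=$1 service=$2 attempts=${3:-3} delay=${4:-10}
    local i
    for ((i = 0; i < attempts; i++)); do
        # Service looks healthy: nothing more to do.
        [ -f "$pidfile" ] && return 0
        # Try a restart, then give the service time to write its PID file.
        systemctl start "$service"
        sleep "$delay"
    done
    # Final check after the last restart attempt.
    [ -f "$pidfile" ]
}

# Example call, using the values from the article's example
# ($(hostname) in place of the hard-coded SERVER1 is an assumption):
# check_and_restart /run/httpd/httpd.pid httpd.service ||
#     mail -s 'Apache is down' sysadmins@mydomain.com \
#         <<< "Apache is down on $(hostname) and cannot be restarted"
```

Passing the PID file and service name as arguments lets the same function watch any service that writes a PID file, with its own retry count and delay.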

Once you've tested your script and are satisfied that it operates as desired, add it to the root user's crontab:

$ sudo crontab -e

The entry I've made below runs the script every five minutes:

*/5 * * * * /root/apache.sh
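
If you would rather not edit the crontab interactively, for example when rolling the entry out to several systems, the entry can be appended from a script. This is a rough sketch under assumptions, not something from the original article: the helper name add_cron_line is hypothetical. It reads an existing crontab on standard input and prints it with the new entry appended, unless that exact line is already there, so running it twice does not create duplicates.

```shell
#!/bin/bash

# add_cron_line LINE
# Hypothetical helper: reads a crontab on stdin and prints it with LINE
# appended, unless an identical line is already present.
add_cron_line() {
    local line=$1 current
    current=$(cat)
    if grep -Fxq -- "$line" <<< "$current"; then
        # Entry already present: pass the crontab through unchanged.
        printf '%s\n' "$current"
    else
        # Keep existing entries (if any), then append the new one.
        [ -n "$current" ] && printf '%s\n' "$current"
        printf '%s\n' "$line"
    fi
}

# Example (run as root so that root's crontab is the one modified):
# crontab -l 2>/dev/null | add_cron_line '*/5 * * * * /root/apache.sh' | crontab -
```

The commented example pipes the current crontab through the helper and back into crontab, which replaces the installed crontab with the helper's output.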

This script is an example of a quick method for setting up a process monitor and alert on a local system. Yes, it's primitive and simple, but it works and it's free. It also doesn't require any budget discussions, nor does it require a maintenance window for agent installation. You'll also find that this script doesn't significantly impact performance on your system. These are all good things. And if you're an Ansible administrator, you could deliver this script to your entire fleet of systems without having to touch each one individually.

Want to learn more advanced techniques for monitoring in Linux? Check out The open source guide to DevOps monitoring tools.
