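• The goal of this walkthrough is to send an alert email with Alertmanager.
  • The environment uses the Prometheus components as standalone binaries.
  • We monitor the memory of one node and send an email alert when usage exceeds 2% (a deliberately low threshold, for testing).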

For Kubernetes clusters, refer to the official Prometheus documentation.

Environment setup

Download the binaries from https://prometheus.io/download/

https://github.com/prometheus/prometheus/releases/download/v2.0.0/prometheus-2.0.0.linux-amd64.tar.gz
https://github.com/prometheus/alertmanager/releases/download/v0.12.0/alertmanager-0.12.0.linux-amd64.tar.gz
https://github.com/prometheus/node_exporter/releases/download/v0.15.2/node_exporter-0.15.2.linux-amd64.tar.gz

Extract

/root/
├── alertmanager -> alertmanager-0.12.0.linux-amd64
├── alertmanager-0.12.0.linux-amd64
├── alertmanager-0.12.0.linux-amd64.tar.gz
├── node_exporter-0.15.2.linux-amd64
├── node_exporter-0.15.2.linux-amd64.tar.gz
├── prometheus -> prometheus-2.0.0.linux-amd64
├── prometheus-2.0.0.linux-amd64
└── prometheus-2.0.0.linux-amd64.tar.gz
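The download-and-extract step can be scripted. Below is a minimal Python sketch. It assumes the tarballs are already downloaded into the target directory and that each one contains a single top-level folder named after the archive. `install_release` is a hypothetical helper, not part of any Prometheus tooling. It reproduces the version-less symlinks shown in the tree, e.g. `alertmanager -> alertmanager-0.12.0.linux-amd64`.

```python
import os
import tarfile


def install_release(tarball_path: str, dest_dir: str, link_name: str) -> str:
    """Extract a release tarball into dest_dir and point a version-less symlink at it.

    Assumption: the archive holds one top-level directory named like the
    archive without its .tar.gz suffix (e.g. alertmanager-0.12.0.linux-amd64).
    """
    base = os.path.basename(tarball_path)
    if not base.endswith(".tar.gz"):
        raise ValueError(f"unexpected archive name: {base}")
    top_dir = base[: -len(".tar.gz")]

    with tarfile.open(tarball_path, "r:gz") as tar:
        tar.extractall(dest_dir)

    link_path = os.path.join(dest_dir, link_name)
    # Re-point an existing link so an upgrade only swaps the symlink
    if os.path.islink(link_path):
        os.remove(link_path)
    # Relative target, matching "alertmanager -> alertmanager-0.12.0.linux-amd64"
    os.symlink(top_dir, link_path)
    return link_path


# Example (paths as in the tree above):
# install_release("/root/alertmanager-0.12.0.linux-amd64.tar.gz", "/root", "alertmanager")
```

A version-less symlink keeps start commands and paths such as /root/prometheus/rule.yml unchanged after an upgrade. Only the link needs to be re-pointed.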

Lab architecture

Configure Alertmanager

Create alert.yml

[root@n1 alertmanager]# ls
alertmanager alert.yml amtool data LICENSE NOTICE simple.yml

alert.yml defines who sends the alert, for which events, to whom, and how.

cat alert.yml
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'maotai@163.com'
  smtp_auth_username: 'maotai@163.com'
  smtp_auth_password: '123456'
templates:
  - '/root/alertmanager/template/*.tmpl'
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 10m
  receiver: default-receiver
receivers:
  - name: 'default-receiver'
    email_configs:
      - to: 'maotai@foxmail.com'

Once configured, start it:
./alertmanager -config.file=./alert.yml
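Before wiring up Prometheus, you can check that the SMTP settings in alert.yml actually deliver mail by pushing a hand-made alert straight into Alertmanager. This is a sketch with several assumptions. Alertmanager is assumed to listen on localhost:9093. It uses the `POST /api/v1/alerts` endpoint of the 0.12-era API; newer releases use `/api/v2/alerts` instead. `build_test_alert` and `send_alerts` are hypothetical helper names.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(alertname: str, instance: str, summary: str) -> list:
    """Build the JSON body for POST /api/v1/alerts: a list of alert objects."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [{
        "labels": {"alertname": alertname, "instance": instance, "severity": "warning"},
        "annotations": {"summary": summary},
        "startsAt": now,  # no endsAt: the alert stays active until it times out
    }]


def send_alerts(alerts: list, base_url: str = "http://localhost:9093") -> int:
    """POST the alerts to Alertmanager and return the HTTP status code."""
    body = json.dumps(alerts).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/api/v1/alerts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

For example: `send_alerts(build_test_alert("TestEmail", "db1", "manual SMTP test"))`. Because `group_wait` is 30s in alert.yml, the email should arrive roughly 30 seconds later rather than immediately.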

Configure Prometheus

Alert rule file rule.yml (loaded from prometheus.yml)

Send an email alert when memory usage exceeds 2% (for testing)

$ cat rule.yml
groups:
- name: test-rule
  rules:
  - alert: NodeMemoryUsage
    expr: (node_memory_MemTotal - (node_memory_MemFree + node_memory_Buffers + node_memory_Cached)) / node_memory_MemTotal * 100 > 2
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: High Memory usage detected"
      description: "{{$labels.instance}}: Memory usage is above 2% (current value is: {{ $value }})"

The key part is this expression:

(node_memory_MemTotal - (node_memory_MemFree+node_memory_Buffers+node_memory_Cached )) / node_memory_MemTotal * 100 > 2
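To see what the expression computes, here is the same arithmetic in Python. The sample values are made up and are not readings from the node in this article. Memory that is free, or used only for buffers or page cache, counts as available. Everything else counts as used.

```python
def memory_usage_percent(mem_total: float, mem_free: float,
                         buffers: float, cached: float) -> float:
    """Same arithmetic as the PromQL expression: used memory as % of total."""
    used = mem_total - (mem_free + buffers + cached)
    return used / mem_total * 100


# Hypothetical sample values in bytes (8 GiB total)
GiB = 1024 ** 3
usage = memory_usage_percent(8 * GiB, 2 * GiB, 0.5 * GiB, 1.5 * GiB)
print(f"{usage:.1f}%")  # 50.0%
print(usage > 2)        # True: the 2% test threshold would fire
```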

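labels: attaches extra labels (here severity) to the alerts this rule produces

annotations: the descriptive text of the alert, i.e. the content of the notification

Where do the metric names node_memory_MemTotal, node_memory_Buffers and node_memory_Cached come from? (Covered below.)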

prometheus.yml configuration

  • Add a scrape job for node_exporter (named linux below)

  • Add the alert rules under rule_files; the rule_files section loads rule.yml

$ cat prometheus.yml
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
alerting:
  alertmanagers:
  - static_configs:
    - targets: ["localhost:9093"]
rule_files:
  - /root/prometheus/rule.yml
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['192.168.14.11:9090']
  - job_name: linux
    static_configs:
      - targets: ['192.168.14.11:9100']
        labels:
          instance: db1
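The `labels: instance: db1` entry overrides the label Prometheus would otherwise derive from the target address. That is why `{{$labels.instance}}` in the alert text shows `db1` rather than `192.168.14.11:9100`. Below is a simplified Python model of that rule. It ignores `relabel_configs`, and `final_target_labels` is a hypothetical name.

```python
def final_target_labels(job_name: str, address: str, static_labels: dict) -> dict:
    """Simplified model of the labels a static_configs target ends up with.

    Prometheus sets job from job_name (unless a label already provides it) and
    defaults instance to the target address when no instance label is set.
    """
    labels = {"job": job_name}
    labels.update(static_labels)          # labels from static_configs win
    labels.setdefault("instance", address)
    return labels


print(final_target_labels("linux", "192.168.14.11:9100", {"instance": "db1"}))
print(final_target_labels("prometheus", "192.168.14.11:9090", {}))
```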

Once configured, start Prometheus and open its web UI. The node target now shows up on the targets page.
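To confirm the targets from a script instead of the web UI, you can read Prometheus's `/api/v1/targets` endpoint. This minimal sketch assumes the response carries an `activeTargets` list whose entries have `labels` and `health` fields; check this against your Prometheus version. The host is the one from prometheus.yml above, and `target_health` and `fetch_targets` are hypothetical helper names.

```python
import json
import urllib.request


def target_health(payload: dict) -> dict:
    """Map 'job/instance' -> health ('up', 'down', ...) from a targets response."""
    result = {}
    for target in payload.get("data", {}).get("activeTargets", []):
        labels = target.get("labels", {})
        key = f'{labels.get("job")}/{labels.get("instance")}'
        result[key] = target.get("health")
    return result


def fetch_targets(base_url: str = "http://192.168.14.11:9090") -> dict:
    """Download and decode the JSON body of /api/v1/targets."""
    with urllib.request.urlopen(base_url + "/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Usage against a live server: print(target_health(fetch_targets()))
```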

View the metrics exposed by node_exporter
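node_exporter serves its metrics as plain text at `/metrics` on port 9100, and each sample line starts with the metric name. The sketch below lists the metric names that have a given prefix, which is a quick way to find keys such as `node_memory_MemTotal`. The sample text is a hypothetical excerpt. Metric names changed in later node_exporter releases (0.16 added a `_bytes` suffix to the memory metrics), so always check your own exporter's output. `metric_names` and `fetch_metrics` are hypothetical helper names.

```python
import urllib.request


def metric_names(exposition: str, prefix: str = "") -> list:
    """Collect distinct metric names (in order) from text exposition format."""
    names = []
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        # The name ends at the label block '{' or at the first whitespace
        end = len(line)
        for sep in ("{", " ", "\t"):
            idx = line.find(sep)
            if idx != -1:
                end = min(end, idx)
        name = line[:end]
        if name.startswith(prefix) and name not in names:
            names.append(name)
    return names


def fetch_metrics(url: str = "http://192.168.14.11:9100/metrics") -> str:
    """Download node_exporter's plain-text metrics page."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


SAMPLE = """\
# TYPE node_memory_MemTotal gauge
node_memory_MemTotal 8.0e+09
node_memory_MemFree 2.0e+09
node_cpu{cpu="cpu0",mode="idle"} 1234.5
"""
print(metric_names(SAMPLE, "node_memory_"))  # ['node_memory_MemTotal', 'node_memory_MemFree']
```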

Open the Alerts page to see the current state of the alert rule
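The Alerts page shows each rule as inactive, pending or firing. Because of `for: 1m`, the alert stays pending until the condition has held for a full minute across evaluations, which run every 15s here per `evaluation_interval`. Below is a simplified Python model of that state machine. `alert_state` is a hypothetical helper, and it ignores details such as how long resolved alerts are retained.

```python
def alert_state(samples, for_seconds: float) -> str:
    """Replay (timestamp_seconds, condition_true) evaluations; return the final state.

    Simplified model: the first true evaluation makes the alert pending, it
    becomes firing once it has been true for at least for_seconds, and any
    false evaluation resets it to inactive.
    """
    active_since = None
    state = "inactive"
    for ts, condition in samples:
        if not condition:
            active_since, state = None, "inactive"
            continue
        if active_since is None:
            active_since = ts
        state = "firing" if ts - active_since >= for_seconds else "pending"
    return state


# Evaluations every 15s with the condition true throughout: t = 0, 15, 30, 45, 60
evals = [(t, True) for t in range(0, 61, 15)]
print(alert_state(evals[:2], 60))  # pending (true for only 15s)
print(alert_state(evals, 60))      # firing (true for 60s >= for: 1m)
```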

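The metric names used in these expressions can be looked up here, provided the corresponding exporter is installed. Write your alert expressions using these names.

Check the email that arrives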



WeChat alert configuration

global:
  resolve_timeout: 6m
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: '172.16.100.14:25'
  smtp_from: 'svnbuild_yf@iflytek.com'
  smtp_auth_username: 'svnbuild_yf'
  smtp_auth_password: 'tag#write@2015313'
  smtp_require_tls: false
  # The auth token for Hipchat.
  hipchat_auth_token: '1234556789'
  # Alternative host for Hipchat.
  hipchat_api_url: 'https://hipchat.foobar.org/'
  wechat_api_url: "https://qyapi.weixin.qq.com/cgi-bin/"
  wechat_api_secret: "4tQroVeB0xUcccccccc65Yfkj2Nkt90a80MH3ayI"
  wechat_api_corp_id: "wxaf5acxxxx5f8eb98"

# The directory from which notification templates are read.
templates:
  - 'templates/*.tmpl'

# The root route on which each incoming alert enters.
route:
  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ['alertname']
  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This way ensures that you get multiple alerts for the same group that start
  # firing shortly after another are batched together on the first
  # notification.
  group_wait: 3s
  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 5m
  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend them.
  repeat_interval: 1h
  # A default receiver
  receiver: ybyang2
  routes:
  - match:
      job: "11"
      #service: "node_exporter"
    routes:
    - match:
        status: yellow
      receiver: ybyang2
    - match:
        status: orange
      receiver: berlin

# Inhibition rules allow to mute a set of alerts given that another alert is
# firing.
# Here they mute alerts for a dependent service (target) on the same instance
# while the service it depends on (source) is already alerting.
inhibit_rules:
- source_match:
    service: 'up'
  target_match:
    service: 'mysql'
  # Apply inhibition if the instance is the same.
  equal: ["instance"]
- source_match:
    service: "mysql"
  target_match:
    service: "mysql-query"
  equal: ['instance']
- source_match:
    service: "A"
  target_match:
    service: "B"
  equal: ["instance"]
- source_match:
    service: "B"
  target_match:
    service: "C"
  equal: ["instance"]

receivers:
- name: 'ybyang2'
  email_configs:
  - to: 'ybyang2@iflytek.com'
    send_resolved: true
    html: '{{ template "email.default.html" . }}'
    # Subject: "Testing Technology Dept. monitoring alert email"
    headers: { Subject: "[mail] 测试技术部监控告警邮件" }
- name: "berlin"
  wechat_configs:
  - send_resolved: true
    to_user: "@all"
    to_party: ""
    to_tag: ""
    agent_id: "1"
    corp_id: "wxaf5a99ccccc5f8eb98"
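The routing tree and inhibition rules above are easier to reason about with a small model. This Python sketch assumes every route uses the default `continue: false` and only exact `match` (no `match_re`). `Route`, `resolve_receiver` and `is_inhibited` are hypothetical names, not Alertmanager code. The model encodes the `routes` and the first two `inhibit_rules` from the config.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Route:
    match: Dict[str, str] = field(default_factory=dict)
    receiver: Optional[str] = None  # None means: inherit from the parent route
    routes: List["Route"] = field(default_factory=list)


def resolve_receiver(route: Route, labels: Dict[str, str],
                     inherited: Optional[str] = None) -> Optional[str]:
    """The first matching child wins (depth-first); otherwise this node handles it."""
    receiver = route.receiver or inherited
    for child in route.routes:
        if all(labels.get(k) == v for k, v in child.match.items()):
            return resolve_receiver(child, labels, receiver)
    return receiver


def is_inhibited(target: Dict[str, str], firing: List[Dict[str, str]],
                 rules: List[dict]) -> bool:
    """True if a firing source alert matches a rule and shares its 'equal' labels."""
    for rule in rules:
        if not all(target.get(k) == v for k, v in rule["target_match"].items()):
            continue
        for source in firing:
            if source is target:
                continue
            if (all(source.get(k) == v for k, v in rule["source_match"].items())
                    and all(source.get(k) == target.get(k) for k in rule["equal"])):
                return True
    return False


ROOT = Route(receiver="ybyang2", routes=[
    Route(match={"job": "11"}, routes=[
        Route(match={"status": "yellow"}, receiver="ybyang2"),
        Route(match={"status": "orange"}, receiver="berlin"),
    ]),
])
INHIBIT_RULES = [
    {"source_match": {"service": "up"}, "target_match": {"service": "mysql"}, "equal": ["instance"]},
    {"source_match": {"service": "mysql"}, "target_match": {"service": "mysql-query"}, "equal": ["instance"]},
]

print(resolve_receiver(ROOT, {"job": "11", "status": "orange"}))  # berlin
print(resolve_receiver(ROOT, {"job": "db"}))                      # ybyang2
```

With this tree, an alert with `job="11"` and `status="orange"` goes to the WeChat receiver `berlin`. Everything else falls back to `ybyang2`, including `job="11"` alerts with no matching `status`.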
