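Why build a monitoring system?

–Understand the design principles behind IT monitoring systems
–Develop a simplified, Zabbix-like monitoring system
–Master the program design approach and the architectural decoupling principles used in automation development projects

Design discussion of common monitoring systems

Zabbix
Nagios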

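Monitoring system requirements

1. Monitor common system services, applications, network devices, and so on
2. A single host can have several different services monitored, and each service can have its own monitoring interval
3. The same service can have different monitoring intervals and alert thresholds on different hosts
4. Monitored services can be added to, removed from, or modified on a whole batch of hosts at once
5. Alert levels:
  • Services differ in business importance, so each can be given its own alert level for when it fails
  • Events for a particular service or alert level can be routed to particular users
  • Alert escalation settings
6. Storage and optimization of historical data
  • Store as much useful data as possible in as little space as possible
  • How can 5 years of monitoring data for every service on a host be retrieved within 1 second?
7. Data visualization: how to build a clean, attractive user interface?
8. How can a single monitoring server support 5000+ monitored machines?
9. Which communication mode: active or passive?
10. How can the monitoring server be scaled horizontally?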
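
One common way to approach requirement 6 is to keep raw samples only for a short period and roll older data up into coarser time buckets. At a 60-second interval a single index produces about 2.6 million points over five years (5 × 365 × 1440 = 2,628,000), so reading raw data for every service within a second is unrealistic, while hourly roll-ups are 60 times smaller. The following is a minimal sketch of that idea, not the design used in this project; the function name `downsample`, the tuple layout, and the bucket size are all assumptions made for illustration.

```python
from collections import OrderedDict


def downsample(samples, bucket_seconds):
    """Roll (timestamp, value) samples up into fixed-size time buckets.

    Returns a list of (bucket_start, avg, min, max) tuples, one per bucket,
    ordered by bucket_start. (Hypothetical helper, not part of the project.)
    """
    buckets = OrderedDict()
    for ts, value in sorted(samples):
        start = ts - ts % bucket_seconds  # align to the bucket boundary
        buckets.setdefault(start, []).append(value)
    return [
        (start, sum(vals) / float(len(vals)), min(vals), max(vals))
        for start, vals in buckets.items()
    ]


# Six samples taken every 60 s collapse into two 180 s buckets.
raw = [(0, 10), (60, 20), (120, 30), (180, 40), (240, 50), (300, 60)]
rolled_up = downsample(raw, 180)
```

Keeping the minimum and maximum next to the average means short spikes are still visible after the roll-up, at the cost of three stored values per bucket instead of one.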
 

Which architecture?

•MySQL
•Active communication? SNMP, wget…
•Passive communication? Agent ---how to communicate with the monitor server
•Socket server –> Socket client
•Can an off-the-shelf C/S architecture be used? RabbitMQ, Redis publish/subscribe, HTTP?
 

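Benefits of using HTTP

1. Simple interface design

2. Easy to scale horizontally into a distributed system

3. The underlying socket handling is stable and mature, which saves much of the effort of maintaining the communication layer

HTTP characteristics:

1. Short-lived connections

2. Stateless

3. Security and authentication

4. Passive communication (the server only answers requests; the client has to initiate them)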
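
To make these characteristics concrete, here is a minimal sketch of an agent pushing one report to the server over HTTP, written in Python 3 with only the standard library. Because HTTP is stateless, every request carries the host identity and the service name itself. The URL path `/api/report/`, the payload fields, and the names `ReportHandler` and `send_report` are assumptions for illustration and are not taken from the original project; authentication is left out to keep the sketch short.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # reports collected by the demo server


class ReportHandler(BaseHTTPRequestHandler):
    """Hypothetical server endpoint that accepts agent reports."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        report = json.loads(self.rfile.read(length).decode("utf-8"))
        received.append(report)
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def send_report(base_url, host_id, service, data):
    """Agent side: one short-lived, self-describing POST per report."""
    payload = json.dumps(
        {"host_id": host_id, "service": service, "data": data}
    ).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/api/report/",  # hypothetical path
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # Ignore any system proxy so the local demo always connects directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


# Demo: start the server on a free local port, send one report, shut down.
server = HTTPServer(("127.0.0.1", 0), ReportHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d" % server.server_address[1]
reply = send_report(url, 1, "LinuxCPU", {"idle": 80, "iowait": 5})
server.shutdown()
server.server_close()
```

Since no state lives in the connection, any number of identical server processes behind a load balancer could accept such reports, which is what makes horizontal scaling straightforward.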

Monitoring system architecture design

# -*- coding: utf-8 -*-
# Written for Python 2 / Django 1.x. On Django 2.0+ every ForeignKey needs an
# explicit on_delete argument, and on Python 3 __unicode__ becomes __str__.
from django.db import models

# Create your models here.


class Host(models.Model):
    name = models.CharField(max_length=64, unique=True)
    ip_addr = models.GenericIPAddressField(unique=True)
    host_groups = models.ManyToManyField('HostGroup', blank=True)  # A B C
    templates = models.ManyToManyField("Template", blank=True)  # A D E
    monitored_by_choices = (
        ('agent', 'Agent'),
        ('snmp', 'SNMP'),
        ('wget', 'WGET'),
    )
    monitored_by = models.CharField(u'Monitoring method', max_length=64, choices=monitored_by_choices)
    status_choices = (
        (1, 'Online'),
        (2, 'Down'),
        (3, 'Unreachable'),
        (4, 'Offline'),
    )
    status = models.IntegerField(u'Status', choices=status_choices, default=1)
    memo = models.TextField(u"Memo", blank=True, null=True)

    def __unicode__(self):
        return self.name


class HostGroup(models.Model):
    name = models.CharField(max_length=64, unique=True)
    templates = models.ManyToManyField("Template", blank=True)
    memo = models.TextField(u"Memo", blank=True, null=True)

    def __unicode__(self):
        return self.name


class ServiceIndex(models.Model):
    name = models.CharField(max_length=64)
    key = models.CharField(max_length=64)
    data_type_choices = (
        ('int', "int"),
        ('float', "float"),
        ('str', "string"),
    )
    data_type = models.CharField(u'Index data type', max_length=32, choices=data_type_choices, default='int')
    memo = models.CharField(u"Memo", max_length=128, blank=True, null=True)

    def __unicode__(self):
        return "%s.%s" % (self.name, self.key)


class Service(models.Model):
    name = models.CharField(u'Service name', max_length=64, unique=True)
    interval = models.IntegerField(u'Monitoring interval', default=60)
    plugin_name = models.CharField(u'Plugin name', max_length=64, default='n/a')
    items = models.ManyToManyField('ServiceIndex', verbose_name=u"Index list", blank=True)
    memo = models.CharField(u"Memo", max_length=128, blank=True, null=True)

    def __unicode__(self):
        return self.name

    # def get_service_items(obj):
    #     return ",".join([i.name for i in obj.items.all()])


class Template(models.Model):
    name = models.CharField(u'Template name', max_length=64, unique=True)
    services = models.ManyToManyField('Service', verbose_name=u"Service list")
    triggers = models.ManyToManyField('Trigger', verbose_name=u"Trigger list", blank=True)

    def __unicode__(self):
        return self.name


# Earlier draft of TriggerExpression, kept disabled inside a string literal.
'''
class TriggerExpression(models.Model):
    name = models.CharField(u"Trigger expression name", max_length=64, blank=True, null=True)
    service = models.ForeignKey(Service, verbose_name=u"Related service")
    service_index = models.ForeignKey(ServiceIndex, verbose_name=u"Related service index")
    logic_type_choices = (('or', 'OR'), ('and', 'AND'))
    logic_type = models.CharField(u"Logic relation", choices=logic_type_choices, max_length=32, blank=True, null=True)
    left_sibling = models.ForeignKey('self', verbose_name=u"Left condition", blank=True, null=True, related_name='left_sibling_condition')
    operator_type_choices = (('eq', '='), ('lt', '<'), ('gt', '>'))
    operator_type = models.CharField(u"Operator", choices=operator_type_choices, max_length=32)
    data_calc_type_choices = (
        ('avg', 'Average'),
        ('max', 'Max'),
        ('hit', 'Hit'),
        ('last', 'Last'),
    )
    data_calc_func = models.CharField(u"Data calculation method", choices=data_calc_type_choices, max_length=64)
    data_calc_args = models.CharField(u"Function arguments", help_text=u"Separate multiple arguments with commas; the first value is the time", max_length=64)
    threshold = models.IntegerField(u"Threshold")

    def __unicode__(self):
        return "%s %s(%s(%s))" % (self.service_index, self.operator_type, self.data_calc_func, self.data_calc_args)
'''


class TriggerExpression(models.Model):
    # name = models.CharField(u"Trigger expression name", max_length=64, blank=True, null=True)
    trigger = models.ForeignKey('Trigger', verbose_name=u"Owning trigger")
    service = models.ForeignKey(Service, verbose_name=u"Related service")
    service_index = models.ForeignKey(ServiceIndex, verbose_name=u"Related service index")
    specified_index_key = models.CharField(verbose_name=u"Monitor only this specified index key", max_length=64, blank=True, null=True)
    operator_type_choices = (('eq', '='), ('lt', '<'), ('gt', '>'))
    operator_type = models.CharField(u"Operator", choices=operator_type_choices, max_length=32)
    data_calc_type_choices = (
        ('avg', 'Average'),
        ('max', 'Max'),
        ('hit', 'Hit'),
        ('last', 'Last'),
    )
    data_calc_func = models.CharField(u"Data calculation method", choices=data_calc_type_choices, max_length=64)
    data_calc_args = models.CharField(u"Function arguments", help_text=u"Separate multiple arguments with commas; the first value is the time", max_length=64)
    threshold = models.IntegerField(u"Threshold")

    logic_type_choices = (('or', 'OR'), ('and', 'AND'))
    logic_type = models.CharField(u"Logic relation with another condition", choices=logic_type_choices, max_length=32, blank=True, null=True)
    # next_condition = models.ForeignKey('self', verbose_name=u"Right condition", blank=True, null=True, related_name='right_sibling_condition')

    def __unicode__(self):
        return "%s %s(%s(%s))" % (self.service_index, self.operator_type, self.data_calc_func, self.data_calc_args)

    class Meta:
        pass  # unique_together = ('trigger_id', 'service')


class Trigger(models.Model):
    name = models.CharField(u'Trigger name', max_length=64)
    # expressions = models.TextField(u"Expressions")
    severity_choices = (
        (1, 'Information'),
        (2, 'Warning'),
        (3, 'Average'),
        (4, 'High'),
        (5, 'Disaster'),
    )
    # expressions = models.ManyToManyField(TriggerExpression, verbose_name=u"Condition expressions")
    severity = models.IntegerField(u'Alert severity', choices=severity_choices)
    enabled = models.BooleanField(default=True)
    memo = models.TextField(u"Memo", blank=True, null=True)

    def __unicode__(self):
        return "<service:%s, severity:%s>" % (self.name, self.get_severity_display())


class Action(models.Model):
    name = models.CharField(max_length=64, unique=True)
    host_groups = models.ManyToManyField('HostGroup', blank=True)
    hosts = models.ManyToManyField('Host', blank=True)

    conditions = models.TextField(u'Alert conditions')
    interval = models.IntegerField(u'Alert interval (s)', default=300)
    operations = models.ManyToManyField('ActionOperation')

    recover_notice = models.BooleanField(u'Send a notification after the fault recovers', default=True)
    recover_subject = models.CharField(max_length=128, blank=True, null=True)
    recover_message = models.TextField(blank=True, null=True)

    enabled = models.BooleanField(default=True)

    def __unicode__(self):
        return self.name


class ActionOperation(models.Model):
    name = models.CharField(max_length=64)
    step = models.SmallIntegerField(u"Nth alert", default=1)
    action_type_choices = (
        ('email', 'Email'),
        ('sms', 'SMS'),
        ('script', 'RunScript'),
    )
    action_type = models.CharField(u"Action type", choices=action_type_choices, default='email', max_length=64)
    # notifiers = models.ManyToManyField(host_models.UserProfile, verbose_name=u"Notification recipients", blank=True)

    def __unicode__(self):
        return self.name


class Maintenance(models.Model):
    name = models.CharField(max_length=64, unique=True)
    hosts = models.ManyToManyField('Host', blank=True)
    host_groups = models.ManyToManyField('HostGroup', blank=True)
    content = models.TextField(u"Maintenance details")
    start_time = models.DateTimeField()
    end_time = models.DateTimeField()

    def __unicode__(self):
        return self.name


'''
CPU:
    idle 80
    usage 90
    system 30
    user
    iowait 50

memory:
    usage
    free
    swap
    cache
    buffer

load:
    load1
    load5
    load15
'''
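
The TriggerExpression model above stores a calculation (`data_calc_func`), a comparison (`operator_type`), and a `threshold`, but the source does not show how they are evaluated. The sketch below is one possible reading, in plain Python without Django. It assumes the samples for the time window have already been fetched into a list, it assumes `logic_type` joins an expression to the one that follows it, and it evaluates strictly left to right with no operator precedence. The `hit` function is left out because its semantics are not defined in the source. The names `expression_matches` and `trigger_fires` are hypothetical.

```python
import operator

# Keys mirror operator_type_choices and data_calc_type_choices in the models.
OPERATORS = {"eq": operator.eq, "lt": operator.lt, "gt": operator.gt}
CALC_FUNCS = {
    "avg": lambda values: sum(values) / float(len(values)),
    "max": max,
    "last": lambda values: values[-1],
}


def expression_matches(values, data_calc_func, operator_type, threshold):
    """Return True when calc(values) <operator> threshold holds."""
    if not values:
        return False  # no data in the window: nothing to compare
    calculated = CALC_FUNCS[data_calc_func](values)
    return OPERATORS[operator_type](calculated, threshold)


def trigger_fires(results_with_logic):
    """Combine per-expression results left to right.

    results_with_logic is a list of (matched, logic_type) pairs, where
    logic_type ('and' / 'or' / None) is assumed to join an expression to
    the next one. A missing logic_type is treated as 'or'.
    """
    combined, pending = None, None
    for matched, logic_type in results_with_logic:
        if combined is None:
            combined = matched
        elif pending == "and":
            combined = combined and matched
        else:
            combined = combined or matched
        pending = logic_type
    return bool(combined)


# avg(iowait) > 50 over the window, and last(idle) < 10
iowait_high = expression_matches([35, 60, 70], "avg", "gt", 50)
idle_low = expression_matches([10, 5, 20], "last", "lt", 10)
fires_with_or = trigger_fires([(iowait_high, "or"), (idle_low, None)])
fires_with_and = trigger_fires([(iowait_high, "and"), (idle_low, None)])
```

Keeping the calculation and comparison tables keyed by the same strings as the model choices means a new choice only needs one extra dictionary entry.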

Table structure design
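
With this table structure, a host can receive templates directly and through its host groups, so the same service may be reachable by more than one path. The sketch below shows one way the server might flatten that into a per-host configuration for an agent to fetch. It uses plain dictionaries in place of the Django querysets, and the function name `build_host_config` and the dictionary layout are assumptions for illustration only.

```python
def build_host_config(host_templates, group_templates, services):
    """Flatten templates into {service_name: (plugin_name, interval)}.

    host_templates / group_templates: lists of {"name": ..., "services": [...]}
    services: {service_name: {"plugin_name": ..., "interval": ...}}
    A service reachable through several templates is listed only once.
    """
    config = {}
    for template in list(host_templates) + list(group_templates):
        for service_name in template["services"]:
            if service_name not in config:
                service = services[service_name]
                config[service_name] = (service["plugin_name"], service["interval"])
    return config


# Hypothetical sample data: template A is attached both directly and via a group.
services = {
    "LinuxCPU": {"plugin_name": "get_linux_cpu", "interval": 30},
    "LinuxMemory": {"plugin_name": "get_linux_memory", "interval": 60},
    "Nginx": {"plugin_name": "get_nginx_status", "interval": 60},
}
template_a = {"name": "A", "services": ["LinuxCPU", "LinuxMemory"]}
template_b = {"name": "B", "services": ["LinuxCPU"]}
template_d = {"name": "D", "services": ["Nginx"]}

host_config = build_host_config([template_a, template_d], [template_a, template_b], services)
```

Attaching or detaching a template on a host group then changes the monitored services for every host in the group, which is how the batch-editing requirement can be met without touching hosts one by one.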
