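Key Points and Common Errors When Building a MySQL Log Platform on ELK

Part 1: Overview

ELK is a log analysis platform that combines distributed data storage, log parsing, and visual querying. ELK = Elasticsearch + Logstash + Kibana; each component has its own job, and the three work together to process log data. Their main functions are:

Elasticsearch: data storage and full-text search;
Logstash: log processing, the "mover" of log data;
Kibana: data visualization and operations management.

When building the platform we also used Filebeat. Filebeat is a collector for local log files: it monitors log directories or specific log files (tail file) and forwards the data to Elasticsearch, Logstash, and other outputs.

In this case study, ELK is used to collect, manage, and search the slow query logs and error logs of MySQL instances.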

The data flows roughly as follows: MySQL log files → Filebeat → Logstash → Elasticsearch → Kibana.

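Part 2: Elasticsearch

2.1 ES Features and Advantages

Distributed real-time document storage: every field can be indexed and made searchable.
A distributed search engine with real-time analytics. Distributed means an index is split into multiple shards, and each shard can have zero or more replicas; load rebalancing and routing happen automatically in most cases.
It scales out to hundreds of servers and handles petabytes of structured or unstructured data. It can also run on a single PC.
It supports a plugin mechanism: analysis (tokenizer) plugins, synchronization plugins, Hadoop plugins, visualization plugins, and so on.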

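2.2 Key ES Concepts

Elasticsearch	MySQL
Index	Database
Type [fixed to _doc since 7.0]	Table
Document	Row
Field	Column
Mapping	Schema
Everything is indexed	Index
Query DSL [Domain Specific Language]	SQL
GET http://...	Select * from table …
PUT http://...	Update table set …

A database in a relational database corresponds to an index in ES;
A relational database holds N tables, just as one index holds N types (before 7.0; since 7.0 the only type is _doc);
The data in a table consists of rows and columns (attributes), just as a type consists of documents and fields;
In a relational database, the schema defines the tables, the columns of each table, and the relationships between tables and columns. Correspondingly, in ES the mapping defines how the fields of a type in an index are handled: how the index is built, the field types, whether the original JSON document is stored, whether it is compressed, whether a field is analyzed (tokenized), and how;
The insert, delete, update, and select operations of a relational database correspond to PUT/POST, DELETE, _update, and GET in ES.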

2.3 Execution Permission Problem

Error message:

[usernimei@testes01 bin]$ Exception in thread "main" org.elasticsearch.bootstrap.BootstrapException: java.nio.file.AccessDeniedException: /data/elasticsearch/elasticsearch-7.4./config/elasticsearch.keystore

Likely root cause: java.nio.file.AccessDeniedException: /data/elasticsearch/elasticsearch-7.4./config/elasticsearch.keystore

    at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:)

    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:)

    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:)

    at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:)

    at java.base/java.nio.file.Files.newByteChannel(Files.java:)

    at java.base/java.nio.file.Files.newByteChannel(Files.java:)

    at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:)

    at org.elasticsearch.common.settings.KeyStoreWrapper.load(KeyStoreWrapper.java:)

    at org.elasticsearch.bootstrap.Bootstrap.loadSecureSettings(Bootstrap.java:)

    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:)

    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:)

    at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:)

    at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:)

    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:)

    at org.elasticsearch.cli.Command.main(Command.java:)

    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:)

    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:)

Refer to the log for complete error details

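Analysis:

Elasticsearch was mistakenly started as root the first time, so the elasticsearch.keystore file created under the config path ended up owned by root: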

-rw-rw----  root      root         Mar  : elasticsearch.keystore

Solution: switch to the root user and fix the ownership of elasticsearch.keystore

Change the owner back to the ES user, i.e.:

chown -R <es_user>:<es_group> elasticsearch.keystore
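
As a rough illustration of the check behind this fix, the sketch below tests who owns the keystore before Elasticsearch is started. The path, the account name es, the environment variables ES_KEYSTORE and ES_USER, and the helper owned_by_uid are all placeholders chosen for this example, not values taken from the installation above; stat -c is the GNU coreutils form found on Linux.

```shell
#!/bin/sh
# Succeeds when the file ($2) is owned by the numeric user id ($1).
# Uses GNU stat (Linux); other platforms need a different stat syntax.
owned_by_uid() {
    [ "$(stat -c '%u' "$2")" = "$1" ]
}

# Hypothetical keystore path and service account; adjust to your install.
keystore="${ES_KEYSTORE:-/data/elasticsearch/config/elasticsearch.keystore}"
es_user="${ES_USER:-es}"

# Only run the check when both the file and the account exist.
if [ -e "$keystore" ] && id -u "$es_user" >/dev/null 2>&1; then
    if owned_by_uid "$(id -u "$es_user")" "$keystore"; then
        echo "keystore ownership looks fine"
    else
        echo "keystore is not owned by $es_user; fix it with chown as root"
    fi
fi
```

Running a check like this from the start script would catch the problem before the AccessDeniedException appears.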

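Problem 2.4: maximum shards open

According to the official documentation, starting with Elasticsearch v7.0.0 each node in a cluster is limited to 1000 shards by default, so a cluster with 3 data nodes can hold at most 3000 shards. We have only one ES node, so our limit is 1000. Once the limit is reached, creating a new index fails, as the Logstash log below shows: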

[--11T11::,][WARN ][logstash.outputs.elasticsearch][main] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://qqelastic:xxxxxx@155.155.155.155:55944/][Manticore::SocketTimeout] Read timed out {:url=>http://qqelastic:xxxxxx@155.155.155.155:55944/, :error_message=>"Elasticsearch Unreachable: [http://qqelastic:xxxxxx@155.155.155.155:55944/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}

[--11T11::,][ERROR][logstash.outputs.elasticsearch][main] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://qqelastic:xxxxxx@155.155.155.155:55944/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}

[--11T11::,][WARN ][logstash.outputs.elasticsearch][main] Restored connection to ES instance {:url=>"http://qqelastic:xxxxxx@155.155.155.155:55944/"}

[--11T11::,][WARN ][logstash.outputs.elasticsearch][main] Could not index event to Elasticsearch. {:status=>, :action=>["index", {:_id=>nil, :_index=>"mysql-error-testqq-2019.05.11", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x65416fce>], :response=>{"index"=>{"_index"=>"mysql-error-qqweixin-2020.05.11", "_type"=>"_doc", "_id"=>nil, "status"=>, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;"}}}}

The limit can be raised from Kibana by changing the cluster setting cluster.max_shards_per_node.

Main command:

PUT /_cluster/settings

{

  "transient": {

    "cluster": {

      "max_shards_per_node":

    }

  }

}

Note:

Restarting the Logstash service after changing the setting is recommended.
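
The arithmetic behind the limit is simple enough to sketch. The helper names below are made up for this example; the numbers are the default of 1000 shards per node quoted above and the counts from the error message.

```shell
#!/bin/sh
# Cluster-wide cap on open shards: data nodes x cluster.max_shards_per_node.
# Arguments: data_nodes per_node_limit
max_open_shards() {
    echo $(( $1 * $2 ))
}

# Succeeds when `wanted` more shards still fit under the cap.
# Arguments: data_nodes per_node_limit open_now wanted
can_add_shards() {
    [ $(( $3 + $4 )) -le $(( $1 * $2 )) ]
}

max_open_shards 1 1000    # single node, default limit
max_open_shards 3 1000    # three data nodes, default limit

# The failing request above: 1000 shards already open, 2 more requested.
if can_add_shards 1 1000 1000 2; then
    echo "request fits"
else
    echo "request rejected: raise the limit or remove old indices"
fi
```

Because the indices here are created per day, removing old daily indices is another way to get back under the cap without raising it.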

Part 3: Filebeat

Problem 3.1: Filebeat does not read data from the log file

--23T19::41.772+    INFO    [monitoring]    log/log.go:    Non-zero metrics in the last 30s   
 {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":,"time":{"ms":}},"total":{"ticks":,"time":{"ms":},"value":},"user":{"ticks":,"time":{"ms":}}},"handles":{"limit":{"hard":,"soft":},"open":},"info":{"ephemeral_id":"a4c61321-ad02-2c64-9624-49fe4356a4e9","uptime":{"ms":}},"memstats":{"gc_next":,"memory_alloc":,"memory_total":},"runtime":{"goroutines":}},"filebeat":{"harvester":{"open_files":,"running":}},"libbeat":{"config":{"module":{"running":}},"pipeline":{"clients":,"events":{"active":}}},"registrar":{"states":{"current":}},"system":{"load":{"":,"":0.05,"":0.01,"norm":{"":,"":0.0125,"":0.0025}}}}}}

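The fix is to correct the configuration parameters in filebeat.yml; typically this means checking that the input is enabled and that its paths match the MySQL log files.

Problem 3.2: Multiple service processes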

--27T20::22.985+    ERROR    logstash/async.go:    Failed to publish events caused by: write tcp [::]:->[::]:: write: connection reset by peer

--27T20::23.985+    INFO    [monitoring]    log/log.go:    Non-zero metrics in the last 30s    {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":,"time":{"ms":}},"total":{"ticks":,"time":{"ms":},"value":},"user":{"ticks":,"time":{"ms":}}},"handles":{"limit":{"hard":,"soft":},"open":},"info":{"ephemeral_id":"a02ed909-a7a0-49ee-aff9-5fdab26ecf70","uptime":{"ms":}},"memstats":{"gc_next":,"memory_alloc":,"memory_total":,"rss":},"runtime":{"goroutines":}},"filebeat":{"events":{"active":,"added":},"harvester":{"open_files":,"running":}},"libbeat":{"config":{"module":{"running":}},"output":{"events":{"batches":,"failed":,"total":},"write":{"errors":}},"pipeline":{"clients":,"events":{"active":,"published":,"total":}}},"registrar":{"states":{"current":}},"system":{"load":{"":0.05,"":0.11,"":0.06,"norm":{"":0.0063,"":0.0138,"":0.0075}}}}}}

--27T20::24.575+    ERROR    pipeline/output.go:    Failed to publish events: write tcp [::]:->[::]:: write: connection reset by peer

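The cause was several Logstash processes running at the same time. Stop the extra processes and restart Logstash.

Problem 3.3: Managing Filebeat as a service

Directory of the Filebeat service unit file:

/etc/systemd/system

Edit the filebeat.service file: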

[Unit]

Description=filebeat.service

[Service]

User=root

ExecStart=/data/filebeat/filebeat-7.4.-linux-x86_64/filebeat -e -c /data/filebeat/filebeat-7.4.-linux-x86_64/filebeat.yml

[Install]

WantedBy=multi-user.target

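Service management commands:

systemctl start filebeat              # start the filebeat service
systemctl enable filebeat             # enable start on boot
systemctl disable filebeat            # disable start on boot
systemctl status filebeat             # show the current service status
systemctl restart filebeat            # restart the service
systemctl list-units --type=service   # list the service units currently loaded

Problem 3.4: Filebeat fails to start

Error message: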

Exiting: error loading config file: yaml: line : did not find expected key

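The main cause is broken formatting in filebeat.yml. Pay particular attention to the places you modified or added, and compare them with the surrounding lines to check whether the indentation or format has changed.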
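
One frequent way to break the format is a tab character in the indentation, since YAML only accepts spaces there. The sketch below is one way to spot such lines; find_tab_indent is a helper name invented for this example, and tabs are only one of several possible causes of this error. Filebeat's own filebeat test config -c filebeat.yml subcommand can then confirm that the file parses.

```shell
#!/bin/sh
# Print the line numbers whose leading whitespace contains a tab.
find_tab_indent() {
    tab=$(printf '\t')
    grep -n "^ *${tab}" "$1" | cut -d: -f1
}

# Example run against a throwaway file with a tab on line 2.
sample=$(mktemp)
printf 'filebeat.inputs:\n\t- type: log\n' > "$sample"
find_tab_indent "$sample"
rm -f "$sample"
```

An empty result means no tab indentation was found, in which case misaligned spaces around the edited keys are the next thing to compare.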

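Problem 3.5: Linux version too old to manage Filebeat with systemctl

In this case the service can be managed with the service command instead: create a file named filebeat.service in the init.d directory. The main script is as follows: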

#!/bin/bash

agent="/data/filebeat/filebeat-7.4.2-linux-x86_64/filebeat"
args="-e -c /data/filebeat/filebeat-7.4.2-linux-x86_64/filebeat.yml"

# Print the PID of the running filebeat process, or nothing if it is not running.
get_pid() {
    ps -ef | grep "$agent" | grep -v grep | awk '{print $2}'
}

start() {
    pid=$(get_pid)
    if [ -z "$pid" ]; then
        echo "Starting filebeat: "
        # Run in the background, discarding stdout and stderr.
        nohup $agent $args >/dev/null 2>&1 &
        if [ $? -eq 0 ]; then
            echo "start filebeat ok"
        else
            echo "start filebeat failed"
        fi
    else
        echo "filebeat is still running!"
        exit
    fi
}

stop() {
    echo -n "Stopping filebeat: "
    pid=$(get_pid)
    if [ -z "$pid" ]; then
        echo "filebeat is not running"
    else
        kill $pid
        echo "stop filebeat ok"
    fi
}

restart() {
    stop
    start
}

status() {
    pid=$(get_pid)
    if [ -z "$pid" ]; then
        echo "filebeat is not running"
    else
        echo "filebeat is running"
    fi
}

case "$1" in
    start)
        start
    ;;
    stop)
        stop
    ;;
    restart)
        restart
    ;;
    status)
        status
    ;;
    *)
        echo "Usage: $0 {start|stop|restart|status}"
        exit
    ;;
esac

Notes

1. Grant the file execute permission:

chmod +x filebeat.service

2. Enable start on boot:

chkconfig --add filebeat.service

Adding the service above to the boot sequence reports an error.

The solution is to add the following two comment lines at the top of the service file, right after the #!/bin/bash line; the rest of the script stays unchanged. The chkconfig line gives the runlevels (2345), the start priority (10), and the stop priority (80):

# chkconfig:   2345 10 80
# description:  filebeat is a tool for collecting log data

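Part 4: Logstash

Problem 4.1: Running Logstash as a service

The most common way to run Logstash is from the command line with ./bin/logstash -f logstash.conf, stopping it with Ctrl+C. This is convenient, but hard to manage, and the maintenance cost rises further if the server reboots. In production, running Logstash as a service is recommended, using systemctl to start it on boot.

(1) Modify startup.options in the config directory under the installation directory
Main changes:
1. The service runs as user and group logstash by default; this can be changed to root;
2. Set LS_HOME to the Logstash installation directory, for example: /data/logstash/logstash-7.6.0
3. Set LS_SETTINGS_DIR to the directory containing logstash.yml, for example: /data/logstash/logstash-7.6.0/config
4. In LS_OPTS, add the pipeline configuration file with the -f option, for example: LS_OPTS="--path.settings ${LS_SETTINGS_DIR} -f /data/logstash/logstash-7.6.0/config/logstash.conf"

(2) Create the service by running the Logstash install script as root

Command to create the service:

<install_dir>/bin/system-install

After the command runs, a logstash.service file is generated in /etc/systemd/system/.

(3) Managing the Logstash service

Enable start on boot: systemctl enable logstash
Start the service: systemctl start logstash
Stop the service: systemctl stop logstash
Restart the service: systemctl restart logstash
Show the service status: systemctl status logstash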

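Problem 4.2: A JDK must be installed before installing the Logstash service

Creating the service reports an error when no JDK is present.

Check the Java version (for example with java -version) to verify whether it is installed.

If it is not installed, download (or upload) the installation package to the server and install it.

The install command is:

yum localinstall jdk-8u211-linux-x64.rpm

After the installation completes, check the Java version again to verify it.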

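Problem 4.3: Linux version too old; the installed Logstash service does not work

Check the Linux system version.

Cause: CentOS 6.5 does not support managing services with systemctl.

Solution: manage the service with initctl (Upstart) instead.

Related commands

Start the service:

initctl start logstash

Check the status:

initctl status logstash

Note:

The command that generates the service still has to be run:

./system-install

Otherwise the following error is reported:

initctl: Unknown job: logstash

Problem 4.4: Index names defined in the configuration file must be lowercase

Elasticsearch rejects index names that contain uppercase letters, as the errors below show: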

"Invalid index name [mysql-error-Test-2019.05.13], must be lowercase", "index_uuid"=>"_na_", "index"=>"mysql-error-Test-2019.05.13"}}}}

May  :: hzvm1996 logstash[]: [--13T13::,][ERROR][logstash.outputs.elasticsearch][main] Could not index event to Elasticsearch. {:status=>, :action=>["index", {:_id=>nil, :_index=>"mysql-slow-Test-2020.05.13", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x1f0aedbc>], :response=>{"index"=>{"_index"=>"mysql-slow-Test-2019.05.13", "_type"=>"_doc", "_id"=>nil, "status"=>, "error"=>{"type"=>"invalid_index_name_exception", "reason"=>"Invalid index name [mysql-slow-Test-2019.05.13], must be lowercase", "index_uuid"=>"_na_", "index"=>"mysql-slow-Test-2019.05.13"}}}}

May  :: hzvm1996 logstash[]: [--13T13::,][ERROR][logstash.outputs.elasticsearch][main] Could not index event to Elasticsearch. {:status=>, :action=>["index", {:_id=>nil, :_index=>"mysql-error-Test-2020.05.13", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x4bdce1db>], :response=>{"index"=>{"_index"=>"mysql-error-Test-2019.05.13", "_type"=>"_doc", "_id"=>nil, "status"=>, "error"=>{"type"=>"invalid_index_name_exception", "reason"=>"Invalid index name [mysql-error-Test-2019.05.13], must be lowercase", "index_uuid"=>"_na_", "index"=>"mysql-error-Test-2019.05.13"}}}}
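
Since the uppercase letters here come from a name segment such as Test inside the index name, one option is to normalise the name before it goes into the configuration. A minimal sketch, with to_index_name as a made-up helper name:

```shell
#!/bin/sh
# Lowercase a candidate index name so Elasticsearch accepts its case.
to_index_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Prints: mysql-error-test-2019.05.13
to_index_name "mysql-error-Test-2019.05.13"
```

If the name segment comes from an event field rather than a fixed string, the lowercase option of Logstash's mutate filter may be a better place to do the same thing.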

Part 5: Kibana

Problem 5.1: Enabling password authentication

[root@testkibaba bin]# ./kibana-plugin install x-pack

Plugin installation was unsuccessful due to error "Kibana now contains X-Pack by default, there is no longer any need to install it as it is already present.

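Explanation: recent versions of Elasticsearch and Kibana ship with X-Pack built in, so it does not need to be installed explicitly. Older versions require a separate installation.

Problem 5.2: Error on startup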

[root@testkibana bin]# ./kibana

Error:

Kibana should not be run as root.  Use --allow-root to continue.

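Add a dedicated account:

useradd qqweixinkibaba    # add the account

chown -R qqweixinkibaba:hzdbakibaba kibana-7.4.-linux-x86_64    # give the new account ownership of the Kibana directory

su qqweixinkibaba    # switch to the new account, then start Kibana

Problem 5.3: Error when logging in to Kibana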

{"statusCode":,"error":"Forbidden","message":"Forbidden"}

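Cause: the login was attempted with the kibana account. That built-in account is meant for the Kibana server's own connection to Elasticsearch; logging in as the elastic user works.

Problem 5.4: Implementing multi-tenancy

A company may have several business lines and several development teams. How can the collected data be opened only to the team it belongs to, so that each team sees only its own data? One approach is to build a separate ELK stack per business line, but that wastes resources and increases the operations workload. The other approach is multi-tenancy.

Keep the following in mind when implementing it:

While logged in as the elastic account, switch to the target space first, and then set up the index pattern.

Create the role first (and associate it with the space), and create the user last.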

