ES three-node cluster reports "no known master node" after restart
2024-09-01 12:15:35
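I. Problem
I had been looking into how to monitor ES and wanted to take a shortcut. Instead of pulling metrics through the API and computing them myself, I wanted a ready-made plugin or monitoring tool where I would only have to install an agent. That is how I ended up with X-Pack. Once the plugin was installed, the ES cluster had to be restarted. Since the production ES was a cluster, I assumed that restarting the nodes one at a time would be safe. I overestimated it: after I restarted one node, the entire cluster went down...
II. Procedure
1. OS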
[centos@ip---- bin]$ cat /etc/redhat-release
CentOS Linux release 7.6. (Core)
2. ES version
[centos@ip---- bin]$ ./elasticsearch --version
Version: 5.0., Build: f6b4951/--24T10::.101Z, JVM: 1.8.0_131
3. Killing the process
ps -ef | grep pid
kill - pid
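I regretted this right after doing it. Not every service should be killed this way, and I still don't know whether this step contributed to the cluster going down.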
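In hindsight, a gentler way to stop a node is to send SIGTERM and wait for the JVM to exit on its own, since Elasticsearch shuts down in an orderly way on SIGTERM while SIGKILL gives it no chance to clean up. Below is a minimal bash sketch of that idea. The function name `graceful_stop` and its 60-second default timeout are illustrative choices of mine, not part of Elasticsearch. It assumes you already know the node's PID, for example from `ps -ef`, or from a pid file if the node was started with `./elasticsearch -d -p <pidfile>`.

```bash
#!/usr/bin/env bash
# Minimal sketch: stop a process with SIGTERM and wait until it is gone.
# graceful_stop and its default timeout are illustrative, not part of Elasticsearch.

graceful_stop() {
  local pid="$1"
  local timeout="${2:-60}"   # seconds to wait before giving up (arbitrary default)
  local waited=0

  # SIGTERM lets the JVM run its shutdown hooks; SIGKILL (kill -9) does not.
  if ! kill -TERM "$pid" 2>/dev/null; then
    echo "no such process: $pid" >&2
    return 1
  fi

  # kill -0 sends no signal; it only tests whether the PID still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "process $pid still running after ${timeout}s" >&2
      return 2
    fi
    sleep 1
    waited=$((waited + 1))
  done

  echo "process $pid exited after about ${waited}s"
}
```

Used one node at a time, this makes it easier to confirm that each node is fully down before starting it again, instead of guessing after a forced kill.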
4. Error log
[--17T08::,][INFO ][o.e.p.PluginsService ] [node-] loaded module [lang-painless]
[--17T08::,][INFO ][o.e.p.PluginsService ] [node-] loaded module [percolator]
[--17T08::,][INFO ][o.e.p.PluginsService ] [node-] loaded module [reindex]
[--17T08::,][INFO ][o.e.p.PluginsService ] [node-] loaded module [transport-netty3]
[--17T08::,][INFO ][o.e.p.PluginsService ] [node-] loaded module [transport-netty4]
[--17T08::,][INFO ][o.e.p.PluginsService ] [node-] no plugins loaded
[--17T08::,][INFO ][o.e.n.Node ] [node-] initialized
[--17T08::,][INFO ][o.e.n.Node ] [node-] starting ...
[--17T08::,][INFO ][o.e.t.TransportService ] [node-] publish_address {172.0.0.16:}, bound_addresses {172.30.36.146:}
[--17T08::,][INFO ][o.e.b.BootstrapCheck ] [node-] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[--17T08::,][WARN ][o.e.n.Node ] [node-] timed out while waiting for initial discovery state - timeout: 30s
[--17T08::,][INFO ][o.e.h.HttpServer ] [node-] publish_address {172.0.0.16:}, bound_addresses {172.30.36.146:}
[--17T08::,][INFO ][o.e.n.Node ] [node-] started
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-] no known master node, scheduling a retry
[--17T08::,][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-] timed out while retrying [cluster:monitor/state] after failure (timeout [30s])
[--17T08::,][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-] timed out while retrying [cluster:monitor/state] after failure (timeout [30s])
[--17T08::,][WARN ][r.suppressed ] path: /_cluster/state/metadata, params: {metric=metadata}
org.elasticsearch.discovery.MasterNotDiscoveredException
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$.onTimeout(TransportMasterNodeAction.java:) [elasticsearch-5.0..jar:5.0.]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:) [elasticsearch-5.0..jar:5.0.]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:) [elasticsearch-5.0..jar:5.0.]
at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:) [elasticsearch-5.0..jar:5.0.]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:) [elasticsearch-5.0..jar:5.0.]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:) [?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:) [?:1.8.0_151]
at java.lang.Thread.run(Thread.java:) [?:1.8.0_151]
[--17T08::,][WARN ][r.suppressed ] path: /_cluster/state/metadata, params: {metric=metadata}
org.elasticsearch.discovery.MasterNotDiscoveredException
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$.onTimeout(TransportMasterNodeAction.java:) [elasticsearch-5.0..jar:5.0.]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:) [elasticsearch-5.0..jar:5.0.]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:) [elasticsearch-5.0..jar:5.0.]
at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:) [elasticsearch-5.0..jar:5.0.]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:) [elasticsearch-5.0..jar:5.0.]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:) [?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:) [?:1.8.0_151]
at java.lang.Thread.run(Thread.java:) [?:1.8.0_151]
[--17T08::,][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-] timed out while retrying [cluster:monitor/state] after failure (timeout [30s])
[--17T08::,][WARN ][r.suppressed ] path: /_cluster/state/metadata, params: {metric=metadata}
org.elasticsearch.discovery.MasterNotDiscoveredException
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$.onTimeout(TransportMasterNodeAction.java:) [elasticsearch-5.0..jar:5.0.]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:) [elasticsearch-5.0..jar:5.0.]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:) [elasticsearch-5.0..jar:5.0.]
at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:) [elasticsearch-5.0..jar:5.0.]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:) [elasticsearch-5.0..jar:5.0.]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:) [?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:) [?:1.8.0_151]
at java.lang.Thread.run(Thread.java:) [?:1.8.0_151]
5. Configuration file
cluster.name: lile
node.name: node-
bootstrap.memory_lock: true
network.host: 172.0.0.16
http.port:
discovery.zen.ping.unicast.hosts: ["172.0.0.16","172.0.0.17","172.0.0.18"]
discovery.zen.minimum_master_nodes:
http.cors.enabled: true
http.cors.allow-origin: "*"
path.data: /data/elasticsearch/data
path.logs: /data/elasticsearch/logs
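The config above shows no value for `discovery.zen.minimum_master_nodes`, but the usual guidance for Zen discovery in ES 5.x is a strict majority of the master-eligible nodes, (N / 2) + 1 with integer division, which is 2 for three nodes. The bash sketch below checks a config file against that rule. It assumes that every host listed in `discovery.zen.ping.unicast.hosts` is master-eligible (`node.master` defaults to true). The function names are hypothetical helpers, and the YAML "parsing" is a crude grep/sed that only understands the single-line list style used here.

```bash
#!/usr/bin/env bash
# Sketch: compare discovery.zen.minimum_master_nodes with a quorum of the
# master-eligible nodes. ASSUMPTION: every host in
# discovery.zen.ping.unicast.hosts is master-eligible.

# Strict majority of n master-eligible nodes: n/2 + 1 (integer division).
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Count the comma-separated entries inside [ ... ] on the unicast.hosts line.
count_unicast_hosts() {
  grep -m1 '^discovery\.zen\.ping\.unicast\.hosts:' "$1" \
    | sed 's/.*\[\(.*\)].*/\1/' \
    | tr ',' '\n' \
    | sed '/^[[:space:]]*$/d' \
    | wc -l
}

check_min_master() {
  local yml="$1" hosts configured want
  hosts=$(count_unicast_hosts "$yml")
  hosts=$((hosts))   # normalise wc -l output to a plain integer
  want=$(quorum "$hosts")
  # Captures the digits after the key; an empty value yields an empty string.
  configured=$(sed -n 's/^discovery\.zen\.minimum_master_nodes:[[:space:]]*\([0-9]*\).*/\1/p' "$yml")

  if [ -z "$configured" ]; then
    echo "minimum_master_nodes is not set; quorum for $hosts nodes is $want"
    return 1
  fi
  if [ "$configured" -lt "$want" ]; then
    echo "minimum_master_nodes=$configured is below quorum $want: split-brain risk"
    return 1
  fi
  echo "minimum_master_nodes=$configured OK (quorum $want of $hosts nodes)"
}
```

For example, `check_min_master /path/to/elasticsearch.yml` (path hypothetical) prints a warning and returns non-zero when the setting is missing or lower than the quorum, which is exactly the situation where several nodes can each elect themselves.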
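III. Solution
No amount of restarting helped. Everything I found online said a restart would fix it, but restarting over and over did not. When I set discovery.zen.minimum_master_nodes to 1, the nodes did start, but all three became master (each node formed its own one-node cluster, i.e. a split brain). Later I came across the parameter below; after adding it and restarting all the nodes, the cluster came back.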
discovery.zen.ping_timeout: 60s
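For context, `discovery.zen.ping_timeout` defaults to 3s in Zen discovery, so the line above gives nodes far longer to find each other before an election. After a restart, one quick way to confirm that the nodes really formed a single cluster, rather than three one-node clusters as happened with minimum_master_nodes set to 1, is to ask each node which master it sees via the `_cat/master` API. The bash sketch below does that. Port 9200 is an assumption (it is the Elasticsearch default; the actual http.port value is not shown in the config above), and the function names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: ask every node which master it currently sees and compare answers.
# ASSUMPTION: HTTP listens on port 9200 (the ES default); override with ES_PORT.
ES_PORT="${ES_PORT:-9200}"

# Print the master node name as seen by host $1.
# -f makes curl print nothing on HTTP errors (e.g. while no master is known).
master_seen_by() {
  curl -sf --max-time 5 "http://$1:${ES_PORT}/_cat/master?h=node"
}

# Usage: check_single_master host1 host2 host3
check_single_master() {
  local answers count n
  answers=$(for n in "$@"; do master_seen_by "$n" || true; echo; done \
            | sed -e 's/[[:space:]]*$//' -e '/^$/d' | sort -u)
  count=$(printf '%s\n' "$answers" | sed '/^$/d' | wc -l)
  count=$((count))

  if [ "$count" -eq 0 ]; then
    echo "no node reports an elected master"
    return 1
  elif [ "$count" -eq 1 ]; then
    echo "all reachable nodes agree on master: $answers"
  else
    echo "nodes disagree on the master (possible split brain):"
    printf '%s\n' "$answers"
    return 1
  fi
}
```

For this cluster the call would be `check_single_master 172.0.0.16 172.0.0.17 172.0.0.18`. Unreachable nodes simply contribute no answer, which is why the success message only speaks for the reachable ones.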
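IV. Cause analysis
I haven't dug into it yet. My guess is that the time the nodes spent looking for each other was too short, so they never found one another, and at least 2 nodes are needed to form the cluster.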
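The "2 nodes" in this guess is the quorum rule of Zen discovery: a master can only be elected once at least `minimum_master_nodes` master-eligible nodes can see each other, and the recommended value is a strict majority of the N master-eligible nodes. Worked out for this three-node cluster, assuming all three nodes are master-eligible:

```latex
\text{minimum\_master\_nodes} = \left\lfloor \frac{N}{2} \right\rfloor + 1,
\qquad N = 3 \;\Rightarrow\; \left\lfloor \tfrac{3}{2} \right\rfloor + 1 = 2
```

So if a node starts and cannot reach any peer within the ping timeout, it sees only itself (1 < 2) and cannot elect a master, which matches the repeated "no known master node" lines in the log. With the setting lowered to 1, every isolated node satisfies 1 >= 1 and elects itself, which matches the three-masters outcome described in the solution section.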