
test=# select version();
KingbaseES V008R006C005B0041 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)


[kingbase@node101 bin]$ cat /etc/hosts
localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
node101 # original primary
node102 # original standby


 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node101 | primary | * running |          | running | 11180 | no      | n/a
 2  | node102 | standby |   running | node101  | running | 9242  | no      | 0 second(s) ago



[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
1 | node101 | primary | * running | | default | 100 | 1 | host= user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | running | node101 | default | 100 | 1 | host= user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3




[root@node101 ~]# ifconfig enp0s3 down


Mar 29 16:39:47 node101 avahi-daemon[782]: Interface enp0s3.IPv4 no longer relevant for mDNS.
Mar 29 16:39:47 node101 avahi-daemon[782]: Leaving mDNS multicast group on interface enp0s3.IPv4 with address
Mar 29 16:39:47 node101 avahi-daemon[782]: Withdrawing address record for fe80::a00:27ff:fe73:47f6 on enp0s3.
Mar 29 16:39:47 node101 avahi-daemon[782]: Withdrawing address record for on enp0s3.


=== The following log can be divided into two parts ===

[2022-03-29 16:36:52] [INFO] node "node102" (ID: 2) monitoring upstream node "node101" (ID: 1) in normal state
[2022-03-29 16:39:58] [WARNING] unable to ping "host= user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3"
[2022-03-29 16:39:58] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2022-03-29 16:39:58] [WARNING] unable to connect to upstream node "node101" (ID: 1)
[2022-03-29 16:39:58] [INFO] sleeping 6 seconds until next reconnection attempt
[2022-03-29 16:40:04] [INFO] checking state of node 1, 1 of 10 attempts
[2022-03-29 16:41:52] [INFO] checking state of node 1, 10 of 10 attempts
[2022-03-29 16:41:53] [WARNING] unable to ping "user=system connect_timeout=10 dbname=esrep host= port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
[2022-03-29 16:41:53] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2022-03-29 16:41:53] [WARNING] unable to reconnect to node 1 after 10 attempts
[2022-03-29 16:41:53] [NOTICE] setting "wal_retrieve_retry_interval" to 86405000 milliseconds
[2022-03-29 16:41:53] [WARNING] wal receiver not running
[2022-03-29 16:41:53] [NOTICE] WAL receiver disconnected on all sibling nodes
[2022-03-29 16:41:53] [INFO] WAL receiver disconnected on all 0 sibling nodes
[2022-03-29 16:41:53] [INFO] 0 active sibling nodes registered
[2022-03-29 16:41:53] [INFO] primary and this node have the same location ("default")
[2022-03-29 16:41:53] [INFO] no other sibling nodes - we win by default
[2022-03-29 16:41:53] [NOTICE] setting "wal_retrieve_retry_interval" to 5000 ms
[2022-03-29 16:41:53] [NOTICE] this node is the only available candidate and will now promote itself
[2022-03-29 16:41:53] [INFO] try to ping the trusted_servers "" before execute promote_command
[2022-03-29 16:41:55] [NOTICE] PING ( 56(84) bytes of data.
--- ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.204/0.237/0.296/0.045 ms
[2022-03-29 16:41:55] [NOTICE] successfully ping one or more of the trusted_servers ""
[2022-03-29 16:41:55] [INFO] promote_command is:
"/home/kingbase/cluster/R6HA/kha/kingbase/bin/repmgr standby promote -f /home/kingbase/cluster/R6HA/kha/kingbase/etc/repmgr.conf"
NOTICE: promoting standby to primary
DETAIL: promoting server "node102" (ID: 2) using sys_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
DETAIL: server "node102" (ID: 2) was successfully promoted to primary
[2022-03-29 16:41:57] [INFO] switching to primary monitoring mode
[2022-03-29 16:41:57] [NOTICE] monitoring cluster primary "node102" (ID: 2)
[2022-03-29 16:41:57] [INFO] create a thread 0x7f9ae9028700 to check the cluster status
[2022-03-29 16:41:57] [INFO] child node: 1; attached: no
[2022-03-29 16:41:57] [INFO] check node status again, try 1 / 10 times
[2022-03-29 16:42:15] [INFO] check node status again, try 10 / 10 times
[2022-03-29 16:42:17] [WARNING] [thread 0x7f9ae9028700] unable to connect via ES to host ""
[2022-03-29 16:42:17] [INFO] child node: 1; attached: no
[2022-03-29 16:42:17] [INFO] found node down, recovery will be triggered after recovery delay time 20s
[2022-03-29 16:42:19] [INFO] child node: 1; attached: no
[2022-03-29 16:42:27] [INFO] child node: 1; attached: no
[2022-03-29 16:42:29] [WARNING] [thread 0x7f9ae9028700] unable to connect via ES to host ""
[2022-03-29 16:42:29] [INFO] child node: 1; attached: no
[2022-03-29 16:42:37] [INFO] child node: 1; attached: no
[2022-03-29 16:42:37] [INFO] recovery delay time reached. can do recovery now.
[2022-03-29 16:42:37] [INFO] [thread pid:2995] do_nodes_recovery thread begin. The pthread_t tid is 0x7f9ae9829700
[2022-03-29 16:42:37] [NOTICE] [thread pid:2995] node (ID: 1; host: "") is not attached, ready to auto-recovery
[2022-03-29 16:42:41] [NOTICE] [thread pid:2995] Now, the primary host ip:
[2022-03-29 16:42:47] [WARNING] unable to connect to remote host "" via ES
[2022-03-29 16:42:47] [NOTICE] [thread pid:2995] node "node101" (ID: 1) auto-recovery failed: unable to connect via ES to host "", user "", do nothing
[2022-03-29 16:42:47] [WARNING] [thread pid:2995] node (ID: 1) auto-recovery failed: unable to connect via ES to host "", user "", do nothing
[2022-03-29 16:42:47] [INFO] [thread pid:2995] Is standby node "node101" (ID: 1) ready for connection?
[2022-03-29 16:42:53] [ERROR] [thread pid:2995] standby node "node101" (ID: 1) connected ... FAILED
[2022-03-29 16:42:53] [DETAIL] [thread pid:2995] could not connect to server: No route to host
Is the server running on host "" and accepting
TCP/IP connections on port 54321?
......
[2022-03-29 16:44:05] [ERROR] [thread pid:3259] standby node "node101" (ID: 1) connected ... FAILED
[2022-03-29 16:44:05] [DETAIL] [thread pid:3259] could not connect to server: No route to host
Is the server running on host "" and accepting
TCP/IP connections on port 54321?
[2022-03-29 16:44:05] [INFO] [thread pid:3259] do_nodes_recovery thread ends. The pthread_t tid is 0x7f9ae9829700
[2022-03-29 16:44:05] [WARNING] [thread 0x7f9ae9028700] unable to connect via ES to host ""
[2022-03-29 16:44:06] [INFO] thread tid:0x7f9ae9829700 is not running
[2022-03-29 16:44:06] [INFO] the recovery thread was exited, reset tid
[2022-03-29 16:44:06] [INFO] child node: 1; attached: no
[2022-03-29 16:44:06] [INFO] found node down, recovery will be triggered after recovery delay time 20s
[2022-03-29 16:44:07] [NOTICE] [thread 0x7f9ae9028700] the TimeLineID (1) of node (ID: 1) is smaller than the TimeLineID (2) of local node (ID: 2)
[2022-03-29 16:44:07] [NOTICE] [thread 0x7f9ae9028700] try to stop primary db on node (ID: 1, host: "")
[2022-03-29 16:44:06] [INFO] stop database ...
[2022-03-29 16:44:07] [INFO] stop db done.
[2022-03-29 16:44:08] [NOTICE] [thread 0x7f9ae9028700] success to stop primary db on node (ID: 1, host: "")
[2022-03-29 16:44:08] [INFO] child node: 1; attached: no
[2022-03-29 16:44:10] [INFO] child node: 1; attached: no
[2022-03-29 16:44:11] [INFO] node (ID: 1): no server running
[2022-03-29 16:44:11] [INFO] [thread 0x7f9ae9028700] the cluster has no other running primary node, exit
[2022-03-29 16:44:12] [INFO] child node: 1; attached: no
[2022-03-29 16:44:26] [INFO] child node: 1; attached: no
[2022-03-29 16:44:26] [INFO] recovery delay time reached. can do recovery now.
[2022-03-29 16:44:26] [INFO] [thread pid:3662] do_nodes_recovery thread begin. The pthread_t tid is 0x7f9ae9028700
[2022-03-29 16:44:26] [NOTICE] [thread pid:3662] node (ID: 1; host: "") is not attached, ready to auto-recovery
[2022-03-29 16:44:26] [NOTICE] [thread pid:3662] Now, the primary host ip:
[2022-03-29 16:44:27] [INFO] [thread pid:3662] ES connection to host "" succeeded, ready to do auto-recovery
[2022-03-29 16:44:27] [INFO] unlink file /tmp/.s.KINGBASE.54321.lock
### Run "repmgr node rejoin --force-rewind" to recover the original primary.
[2022-03-29 16:44:27] [NOTICE] executing repmgr command "/home/kingbase/cluster/R6HA/kha/kingbase/bin/repmgr --dbname="host= dbname=esrep user=system port=54321" node rejoin --force-rewind"
NOTICE: sys_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/7000028
NOTICE: executing sys_rewind
DETAIL: sys_rewind command is "/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_rewind -D '/home/kingbase/cluster/R6HA/kha/kingbase/data' --source-server='host= user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'"
sys_rewind: servers diverged at WAL location 0/6001210 on timeline 1
sys_rewind: rewinding from last common checkpoint at 0/5000498 on timeline 1
sys_rewind: find last common checkpoint start time from 2022-03-29 16:44:27.204902 CST to 2022-03-29 16:44:27.308161 CST, in "0.103259" seconds.
sys_rewind: update the control file: minRecoveryPoint is '0/600FE20', minRecoveryPointTLI is '2', and database state is 'in archive recovery'
sys_rewind: we will remove the dir '/home/kingbase/cluster/R6HA/kha/kingbase/data/sys_replslot/repmgr_slot_2.rewind' and all the file/dir in it.
sys_rewind: rewind start wal location 0/5000468 (file 000000010000000000000005), end wal location 0/600FE20 (file 000000020000000000000006). time from 2022-03-29 16:44:27.204902 CST to 2022-03-29 16:44:28.615051 CST, in "1.410149" seconds.
sys_rewind: Done!
NOTICE: 0 files copied to /home/kingbase/cluster/R6HA/kha/kingbase/data
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host= user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3"
NOTICE: begin to start server at 2022-03-29 16:44:28.628597
NOTICE: starting server using "/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_ctl -w -t 90 -D '/home/kingbase/cluster/R6HA/kha/kingbase/data' -l /home/kingbase/cluster/R6HA/kha/kingbase/bin/logfile start"
NOTICE: start server finish at 2022-03-29 16:44:29.034402
DETAIL: node 1 is now attached to node 2
[2022-03-29 16:44:29] [NOTICE] kbha: node (ID: 1) rejoin success.
[2022-03-29 16:44:29] [NOTICE] [thread pid:3662] node "node101" (ID: 1) auto-recovery success
[2022-03-29 16:44:29] [INFO] [thread pid:3662] Is standby node "node101" (ID: 1) ready for connection?
[2022-03-29 16:44:29] [INFO] [thread pid:3662] the standby node "node101" (ID: 1) connected ... OK
[2022-03-29 16:44:29] [INFO] [thread pid:3662] do_nodes_recovery thread ends. The pthread_t tid is 0x7f9ae9028700
[2022-03-29 16:44:30] [INFO] SET synchronous TO "quorum" on primary host
[2022-03-29 16:44:30] [INFO] thread tid:0x7f9ae9028700 is not running
[2022-03-29 16:44:30] [INFO] the recovery thread was exited, reset tid
[2022-03-29 16:44:30] [NOTICE] Some nodes reconnect, all standby nodes are OK now
[2022-03-29 16:44:34] [NOTICE] new standby "node101" (ID: 1) has connected
[2022-03-29 16:46:57] [INFO] monitoring primary node "node102" (ID: 2) in normal state
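
To see how long each phase took, the timestamps of a few key log lines can be compared. The following is a minimal sketch, not part of KingbaseES or repmgr: the four sample lines are copied from the log above, while the event labels (`upstream_lost`, `promoted`, and so on) and the function names are hypothetical and chosen only for illustration.

```python
import re
from datetime import datetime

# Key lines copied verbatim from the repmgrd log above.
LOG = """\
[2022-03-29 16:39:58] [WARNING] unable to connect to upstream node "node101" (ID: 1)
[2022-03-29 16:41:53] [WARNING] unable to reconnect to node 1 after 10 attempts
[2022-03-29 16:41:57] [INFO] switching to primary monitoring mode
[2022-03-29 16:44:29] [NOTICE] kbha: node (ID: 1) rejoin success.
"""

# Matches "[timestamp] [LEVEL] message" lines; continuation lines are skipped.
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \[(\w+)\] (.*)$")

# Hypothetical event labels mapped to a substring that identifies the log line.
MARKERS = {
    "upstream_lost": "unable to connect to upstream node",
    "reconnect_gave_up": "unable to reconnect to node",
    "promoted": "switching to primary monitoring mode",
    "old_primary_rejoined": "rejoin success",
}


def find_events(log_text):
    """Return the first timestamp at which each marker appears."""
    events = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        for label, needle in MARKERS.items():
            if needle in m.group(3) and label not in events:
                events[label] = ts
    return events


def seconds_since_loss(events):
    """Express every event as seconds elapsed since the upstream was lost."""
    start = events["upstream_lost"]
    return {label: int((ts - start).total_seconds()) for label, ts in events.items()}


if __name__ == "__main__":
    for label, secs in seconds_since_loss(find_events(LOG)).items():
        print(f"{label}: +{secs}s")
```

For this log, the standby gave up reconnecting 115 seconds after first losing the upstream, finished promoting itself at 119 seconds, and the original primary was rejoined as a standby at 271 seconds.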





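=== As shown in the figure below, a primary/standby switchover occurred, and the original primary rejoined the cluster as the new standby. ===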
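
Reading back through the repmgrd log, the standby's decisions can be paraphrased as a small sketch: retry the upstream a fixed number of times, check that a trusted server is still reachable, promote, and later compare timelines to decide how to handle the old primary. This is only an interpretation of the log messages, not KingbaseES or repmgr source code. All class, field, and function names are hypothetical, and the rule that a failed trusted-server ping blocks promotion is an assumption, since the log only shows the successful case.

```python
from dataclasses import dataclass


@dataclass
class StandbyView:
    """What the standby can observe; all field names are hypothetical."""
    reconnect_attempts_failed: int   # consecutive failed checks of the upstream
    max_attempts: int                # 10 in the log ("10 of 10 attempts")
    trusted_server_reachable: bool   # outcome of pinging trusted_servers
    active_siblings: int             # other standbys that could be candidates


def should_promote(view: StandbyView) -> bool:
    # Keep retrying until the configured number of attempts is exhausted.
    if view.reconnect_attempts_failed < view.max_attempts:
        return False
    # Assumption: if the trusted server is unreachable, this node may be the
    # isolated one, so it does not promote itself.
    if not view.trusted_server_reachable:
        return False
    # With no siblings the node "wins by default" as in the log; an election
    # among several standbys is not modeled here.
    return view.active_siblings == 0


def recovery_action(old_tli: int, new_tli: int, old_primary_running: bool) -> str:
    """Paraphrase of what the new primary did with the old primary."""
    if old_tli < new_tli and old_primary_running:
        # Log: "try to stop primary db on node (ID: 1 ...)"
        return "stop_old_primary"
    if old_tli < new_tli and not old_primary_running:
        # Log: "node rejoin --force-rewind", which runs sys_rewind first
        return "rejoin_with_rewind"
    return "none"


if __name__ == "__main__":
    print(should_promote(StandbyView(10, 10, True, 0)))
    print(recovery_action(1, 2, True))
    print(recovery_action(1, 2, False))
```

The two functions mirror the two halves of the log: `should_promote` covers the failover on node102, and `recovery_action` covers the later handling of node101 once it became reachable again.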




