KingbaseES V8R3集群运维案例之---主库系统down failover切换过程分析

案例说明：

KingbaseES V8R3集群failover时两个cluster都会触发，但只有一个cluster会调用脚本去执行真正的切换流程，另一个有对应的打印，但不会调用脚本，只是走相关的流程去修改状态。那么你需要确认一点，到底是主库还是备库的cluster在执行脚本，关键字：failover_stream.sh ，看cluster.log中有没有这个的名称。本案例通过主库系统宕机，分析了failover 切换过程。

适用版本：

KingbaseES V8R3

集群节点状态信息：

# 集群节点状态信息

TEST=# show pool_nodes;

 node_id |   hostname    | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay

---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------

 0       | 192.168.1.101 | 54321 | up     | 0.500000  | primary | 1          | true              | 0

 1       | 192.168.1.102 | 54321 | up     | 0.500000  | standby | 0          | false             | 0

(2 rows)

对于KingbaseES V8R3集群failover切换的分析：

1. 可以通过主备库的cluster.log，获取整个主备切换的过程。

2. 从failover.log日志中获取执行failover过程，注意只有执行failover_stream.sh的节点才会产生failover.log。

3. recovery.log日志记录了原主库failover完成后，从主库recovery为新备库的过程。

一、Failover切换过程分析

关闭主库系统：

[root@node101 ~]# shutdown -h now

备库cluster.log日志分析：（主库系统关机，备库调用failover_stream.sh执行failover切换）

1、主备库处于正常状态时，备库收到主库的heartbeat。

2022-08-30 15:38:03: pid 17612: DETAIL:  the last heartbeat from "192.168.1.101:9999" received 0 seconds ago

2022-08-30 15:38:06: pid 17612: LOG:  watchdog checking if kingbasecluster is alive using heartbeat

2022-08-30 15:38:06: pid 17612: DETAIL:  the last heartbeat from "192.168.1.101:9999" received 0 seconds ago

2、备库kingbasecluster检测到主库集群服务和数据库服务shutdown，kingbasecluster发生主备切换，当前的kingbasecluster由standby切换为master。

2022-08-30 15:38:09: pid 17585: LOG:  remote node "192.168.1.101:9999 Linux node101" is shutting down

2022-08-30 15:38:09: pid 17585: LOG:  watchdog cluster has lost the coordinator node

2022-08-30 15:38:09: pid 17585: LOG:  unassigning the remote node "192.168.1.101:9999 Linux node101" from watchdog cluster master

2022-08-30 15:38:09: pid 17585: LOG:  We have lost the cluster master node "192.168.1.101:9999 Linux node101"

2022-08-30 15:38:09: pid 17585: LOG:  watchdog node state changed from [STANDBY] to [JOINING]

2022-08-30 15:38:09: pid 17612: LOG:  watchdog checking if kingbasecluster is alive using heartbeat

2022-08-30 15:38:09: pid 17612: DETAIL:  the last heartbeat from "192.168.1.101:9999" received 1 seconds ago

2022-08-30 15:38:10: pid 17547: ERROR:  failed to authenticate

2022-08-30 15:38:10: pid 17547: DETAIL:  the database system is shutting down

2022-08-30 15:38:12: pid 17612: LOG:  watchdog checking if kingbasecluster is alive using heartbeat

2022-08-30 15:38:12: pid 17612: DETAIL:  the last heartbeat from "192.168.1.101:9999" received 4 seconds ago

3、备库kingbasecluster对主库的数据库服务根据：health_check_max_retries * health_check_retry_delay配置尝试连接。

2022-08-30 15:38:13: pid 17547: LOG:  health checking retry count 1

2022-08-30 15:38:13: pid 17585: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]

2022-08-30 15:38:14: pid 17585: LOG:  I am the only alive node in the watchdog cluster

2022-08-30 15:38:14: pid 17585: HINT:  skipping stand for coordinator state

2022-08-30 15:38:14: pid 17585: LOG:  watchdog node state changed from [INITIALIZING] to [MASTER]

2022-08-30 15:38:14: pid 17585: LOG:  I am announcing my self as master/coordinator watchdog node

2022-08-30 15:38:15: pid 17612: LOG:  watchdog checking if kingbasecluster is alive using heartbeat

2022-08-30 15:38:15: pid 17612: DETAIL:  the last heartbeat from "192.168.1.101:9999" received 7 seconds ago

2022-08-30 15:38:18: pid 17585: LOG:  I am the cluster leader node

2022-08-30 15:38:18: pid 17585: DETAIL:  our declare coordinator message is accepted by all nodes

2022-08-30 15:38:18: pid 17585: LOG:  setting the local node "192.168.1.102:9999 Linux node102" as watchdog cluster master

4、备库开启升级进程（escalation process），cluster vip漂移到新的master节点。

2022-08-30 15:38:18: pid 17585: LOG:  I am the cluster leader node. Starting escalation process

2022-08-30 15:38:18: pid 17585: LOG:  escalation process started with PID:22088

2022-08-30 15:38:18: pid 17547: LOG:  connect_inet_domain_socket: select() interrupted by certain signal. retrying...

2022-08-30 15:38:18: pid 22088: LOG:  watchdog: escalation started

2022-08-30 15:38:18: pid 17612: LOG:  watchdog checking if kingbasecluster is alive using heartbeat

2022-08-30 15:38:18: pid 17612: DETAIL:  the last heartbeat from "192.168.1.101:9999" received 10 seconds ago

2022-08-30 15:38:21: pid 22088: LOG:  successfully acquired the delegate IP:"192.168.1.201"

2022-08-30 15:38:21: pid 22088: DETAIL:  'if_up_cmd' returned with success

.......

5、备库对原主库数据库服务的连接次数超过阈值后，将原主库的数据库服务设置为down状态。

2022-08-30 15:39:20: pid 17547: LOG:  health checking retry count 10

2022-08-30 15:39:22: pid 17612: LOG:  watchdog checking if kingbasecluster is alive using heartbeat

2022-08-30 15:39:22: pid 17612: DETAIL:  the last heartbeat from "192.168.1.101:9999" received 73 seconds ago

2022-08-30 15:39:23: pid 17547: LOG:  failed to connect to kingbase server on "192.168.1.101:54321", getsockopt() detected error "

No route to host"

2022-08-30 15:39:23: pid 17547: ERROR:  failed to make persistent db connection

2022-08-30 15:39:23: pid 17547: DETAIL:  connection to host:"192.168.1.101:54321" failed

2022-08-30 15:39:23: pid 17547: LOG:  setting backend node 0 status to NODE DOWN

6、收到本节点kingbasecluster发出的failover request，并获取到failover lock。

2022-08-30 15:39:23: pid 17547: LOG:  received degenerate backend request for node_id: 0 from pid [17547]

2022-08-30 15:39:23: pid 17585: LOG:  new IPC connection received

2022-08-30 15:39:23: pid 17585: LOG:  watchdog received the failover command from local kingbasecluster on IPC interface

2022-08-30 15:39:23: pid 17585: LOG:  watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local

 kingbasecluster on IPC interface

2022-08-30 15:39:23: pid 17585: LOG:  I am the only kingbasecluster node in the watchdog cluster

2022-08-30 15:39:23: pid 17585: DETAIL:  no need to propagate the failover command [DEGENERATE_BACKEND_REQUEST]

2022-08-30 15:39:23: pid 17585: LOG:  new IPC connection received

2022-08-30 15:39:23: pid 17585: LOG:  received the failover command lock request from local kingbasecluster on IPC interface

2022-08-30 15:39:23: pid 17585: LOG:  local kingbasecluster node "192.168.1.102:9999 Linux node102" is requesting to become a lock

 holder for failover ID: 0

7、获取到failover lock后，开始执行failover_stream.sh进行failover的切换。

2022-08-30 15:39:23: pid 17585: LOG:  local kingbasecluster node "192.168.1.102:9999 Linux node102" is the lock holder

2022-08-30 15:39:23: pid 17547: LOG:  starting degeneration. shutdown host 192.168.1.101(54321)

2022-08-30 15:39:23: pid 17547: LOG:  Restart all children

2022-08-30 15:39:23: pid 17547: LOG:  execute command: /home/kingbase/cluster/R3HA/kingbasecluster/bin/failover_stream.sh 192.168.

1.102 0 0 192.168.1.101 192.168.1.101 1 0 /home/kingbase/cluster/R3HA/db/data

2022-08-30 15:39:25: pid 17612: LOG:  watchdog checking if kingbasecluster is alive using heartbeat

2022-08-30 15:39:25: pid 17612: DETAIL:  the last heartbeat from "192.168.1.101:9999" received 76 seconds ago

ssh: connect to host 192.168.1.101 port 22: No route to host

2022-08-30 15:39:31: pid 17612: LOG:  watchdog checking if kingbasecluster is alive using heartbeat

2022-08-30 15:39:31: pid 17612: DETAIL:  the last heartbeat from "192.168.1.101:9999" received 82 seconds ago

2022-08-30 15:39:33: pid 17585: LOG:  new IPC connection received

2022-08-30 15:39:33: pid 17585: LOG:  received the failover command lock request from local kingbasecluster on IPC interface

2022-08-30 15:39:33: pid 17585: LOG:  local kingbasecluster node "192.168.1.102:9999 Linux node102" is requesting to release [FAIL

OVER] lock for failover ID 0

8、执行failover切换，并查找新的primary。（当前备库切换为primary）

2022-08-30 15:39:33: pid 17547: LOG:  find_primary_node_repeatedly: waiting for finding a primary node

2022-08-30 15:39:33: pid 17547: LOG:  find_primary_node: checking backend no 0

2022-08-30 15:39:33: pid 17547: LOG:  find_primary_node: checking backend no 1

2022-08-30 15:39:33: pid 17547: LOG:  find_primary_node: primary node id is 1

2022-08-30 15:39:33: pid 17585: LOG:  new IPC connection received

2022-08-30 15:39:33: pid 17585: LOG:  received the failover command lock request from local kingbasecluster on IPC interface

2022-08-30 15:39:33: pid 17585: LOG:  local kingbasecluster node "192.168.1.102:9999 Linux node102" is requesting to release [FOLL

OW MASTER] lock for failover ID 0

2022-08-30 15:39:33: pid 17585: LOG:  local kingbasecluster node "192.168.1.102:9999 Linux node102" has released the [FOLLOW MASTE

R] lock for failover ID 0

2022-08-30 15:39:33: pid 17547: LOG:  failover: set new primary node: 1

2022-08-30 15:39:33: pid 17547: LOG:  failover: set new master node: 1

2022-08-30 15:39:33: pid 21501: LOG:  worker process received restart request

2022-08-30 15:39:33: pid 17585: LOG:  new IPC connection received

2022-08-30 15:39:33: pid 17585: LOG:  received the failover command lock request from local kingbasecluster on IPC interface

2022-08-30 15:39:33: pid 17585: LOG:  local kingbasecluster node "192.168.1.102:9999 Linux node102" is requesting to resign from a

 lock holder for failover ID 0

2022-08-30 15:39:33: pid 17585: LOG:  local kingbasecluster node "192.168.1.102:9999 Linux node102" has resigned from the lock hol

der

failover done. shutdown host 192.168.1.101(54321)2022-08-30 15:39:33: pid 17547: LOG:  failover done. shutdown host 192.168.1.101(

54321)

备库Failover.log分析：

1、执行failover_stream.sh进行failover切换

-----------------2022-08-30 15:39:23 failover beging---------------------------------------

----failover-stats is %H = hostname of the new master node [192.168.1.102], %P = old primary node id [0], %d = node id[0], %h = host name [192.168.1.101], %O = old primary host[192.168.1.101] %m = new master node id [1], %M = old master node id [0], %D = database cluster path [/home/kingbase/cluster/R3HA/db/data].

Tips：

     1）new_master : 192.168.1.102

     2) new_master_id:   1

     3) old_primary: 192.168.1.101

     4) old_primary_id : 0

2、ping 网关测试网络连通性

----ping trust ip

ping trust ip 192.168.1.1 success

3、ssh连接远程原主库，执行关闭数据库服务和释放db vip；并将db vip加载到当前主库节点。

----determine whether the faulty db is master or standby

master down, let 192.168.1.102 become new primary.....

 2022-08-30 15:39:25 del old primary VIP on 192.168.1.101

ssh connect host:192.168.1.101 failed, can not stop old primary db and del old primary db vip

2022-08-30 15:39:29 add VIP on 192.168.1.102

ADD VIP NOW AT 2022-08-30 15:39:30 ON enp0s3

execute: [/sbin/ip addr add 192.168.1.200/24 dev enp0s3 label enp0s3:2]

execute: /home/kingbase/cluster/R3HA/db/bin/arping -U 192.168.1.200 -I enp0s3 -w 1

ARPING 192.168.1.200 from 192.168.1.200 enp0s3

Sent 1 probes (1 broadcast(s))

Received 0 response(s)

2022-08-30 15:39:31 promote begin...let 192.168.1.102 become master

4、通过连接db vip执行“select 33333”测试当前数据库服务状态，如果正常，执行promote，提升为新主库，将同步模式设为async。

check db if is alive

ksql "port=54321 user=SUPERMANAGER_V8ADMIN  dbname=TEST connect_timeout=10"  -c "select 33333;"

2022-08-30 15:39:31 kingbase is ok , to prepare execute promote

execute promote

server promoting

check db if is alive after promote

ksql "port=54321 user=SUPERMANAGER_V8ADMIN  dbname=TEST connect_timeout=10"  -c "select 33333;"

2022-08-30 15:39:32 after execute promote , kingbase status is ok.

after execute promote, kingbase is ok.

2022-08-30 15:39:32 sync to async

ALTER SYSTEM

 SYS_RELOAD_CONF

-----------------

 t

(1 row)

5、连接新主库数据库服务，如果成功，在新主库上执行checkpoint，checkpoint成功，则failover切换完成。

2022-08-30 15:39:32 make checkpoint

check the db to see if it is alive

ksql "port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEST connect_timeout=10"  -c "select 33333;"

2022-08-30 15:39:32 kingbase is ok , to prepare execute checkpoint

execute checkpoint

CHECKPOINT

check the db to see if it is alive after execute checkpoint

ksql "port=54321 user=SUPERMANAGER_V8ADMIN  dbname=TEST connect_timeout=10"  -c "select 33333;"

2022-08-30 15:39:33 after execute checkpoint, kingbase is ok.

after execute checkpoint, kingbase is ok.

-----------------2022-08-30 15:39:33 failover end---------------------------------------

二、原主库recovery为新备库

1、在原主库生成recovery.conf文件（系统通过crond计划任务，调用network_rewind.sh脚本自动recovery为备库）

[kingbase@node101 data]$ cp ../etc/recovery.done ./recovery.conf

2、查看recovery.log日志信息

[kingbase@node101 log]$  more recovery.log

---------------------------------------------------------------------

1）开启recovery，检测存储读写功能是否正常。

2022-08-30 16:42:01 recover beging...

my pid is 11927,officially began to perform recovery

2022-08-30 16:42:01 check read/write on mount point

2022-08-30 16:42:01 check read/write on mount point (1 / 6).

2022-08-30 16:42:01 stat the directory of the mount point "/home/kingbase/cluster/R3HA/db/data" ...

2022-08-30 16:42:01 stat the directory of the mount point "/home/kingbase/cluster/R3HA/db/data" ... OK

2022-08-30 16:42:01 create/write the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ...

2022-08-30 16:42:01 create/write the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ... OK

2022-08-30 16:42:01 stat the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ...

2022-08-30 16:42:01 stat the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ... OK

2022-08-30 16:42:01 read the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ...

2022-08-30 16:42:01 read the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ... OK

2022-08-30 16:42:01 delete the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ...

2022-08-30 16:42:01 delete the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ... OK

2022-08-30 16:42:01 success to check read/write on mount point (1 / 6).

2022-08-30 16:42:01 check read/write on mount point ... ok

2）ping 网关检测网络的连通性

2022-08-30 16:42:01 check if the network is ok

ping trust ip 192.168.1.1 success ping times :[3], success times:[2]

3）判断当前节点role（master OR standby）,如果是standby节点，并判断是否需要recovery。（ I,m status is [1] (1 is down), I will be in recovery.）

determine if i am master or standby

 node_id |   hostname    | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay

---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------

 0       | 192.168.1.101 | 54321 | down   | 0.500000  | standby | 0          | false             | 0

 1       | 192.168.1.102 | 54321 | up     | 0.500000  | primary | 0          | true              | 0

(2 rows)

i am standby in cluster,determine if recovery is needed

2022-08-30 16:42:03 now will del vip [192.168.1.200/24]

but no 192.168.1.200/24 on my DEV, nothing to do with del

ksql execute success,but node:node101 does not have correct streaming(or catchup) replication ,will retry ,retry times:[1/10]

......

ksql execute success,but node:node101 does not have correct streaming(or catchup) replication ,will retry ,retry times:[10/10]

primary node/Im node status is changed, primary ip[192.168.1.102], recovery.conf NEED_CHANGE [0] (0 is need ), I,m status is [1] (1 is down), I will be in recovery.

 node_id |   hostname    | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay

---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------

 0       | 192.168.1.101 | 54321 | down   | 0.500000  | standby | 0          | false             | 0

 1       | 192.168.1.102 | 54321 | up     | 0.500000  | primary | 0          | true              | 0

(2 rows)

4）如果需要做recovery，执行sys_rewind。

if recover node up, let it down , for rewind

2022-08-30 16:42:34 sys_rewind...

sys_rewind  --target-data=/home/kingbase/cluster/R3HA/db/data --source-server="host=192.168.1.102 port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEST"

datadir_source = /home/kingbase/cluster/R3HA/db/data

rewinding from last common checkpoint at 0/66000028 on timeline 15

find last common checkpoint start time from 2022-08-30 16:42:34.727711 CST to 2022-08-30 16:42:34.839545 CST, in "0.111834" seconds.

reading source file list

reading target file list

reading WAL in target

Rewind datadir file from source

update the control file: minRecoveryPoint is '0/68024E80', minRecoveryPointTLI is '16', and database state is 'in archive recovery'

rewind start wal location 0/66000028 (file 0000000F0000000000000066), end wal location 0/68024E80 (file 000000100000000000000068). time from 2022-08-30 16:42:36.727711 CST to 2022-08-30 16:42:37.201678 CST, in "2.473967" seconds.

Done!

5）sys_rewind成功，修改集群配置文件。

 sed conf change #synchronous_standby_names

2022-08-30 16:42:39 file operate

cp recovery.conf...

 change recovery.conf ip -> primary.ip

2022-08-30 16:42:39 no need change recovery.conf, primary node is 192.168.1.102

6）启动新备库数据库服务，删除原复制槽并重建。备库数据库服务启动成功后，配置同步为sync。

delete pid file if exist

del the replication_slots if exis

drop the slot [slot_node101].

drop the slot [slot_node102].

2022-08-30 16:42:39 start up the kingbase...

waiting for server to start....LOG:  redirecting log output to logging collector process

HINT:  Future log output will appear in directory "/home/kingbase/cluster/R3HA/db/data/sys_log".

 done

server started

ksql "port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEST connect_timeout=10"   -c "select 33333;"

 SYS_CREATE_PHYSICAL_REPLICATION_SLOT

--------------------------------------

 (slot_node101,)

(1 row)

2022-08-30 16:42:41 create the slot [slot_node101] success.

 SYS_CREATE_PHYSICAL_REPLICATION_SLOT

--------------------------------------

 (slot_node102,)

(1 row)

2022-08-30 16:42:41 create the slot [slot_node102] success.

2022-08-30 16:42:41 start up standby successful!

cluster is sync cluster.

SYNC RECOVER MODE ...

2022-08-30 16:42:41 remote primary node change sync

ALTER SYSTEM

 SYS_RELOAD_CONF

-----------------

 t

(1 row)

SYNC RECOVER MODE DONE

7）执行pcp_attach_node将当前节点加入到集群。

2022-08-30 16:42:44 attach pool...

IM Node is 0, will try [pcp_attach_node -U kingbase -W MTIzNDU2 -h 192.168.1.201 -n 0]

pcp_attach_node -- Command Successful

 node_id |   hostname    | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay

---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------

 0       | 192.168.1.101 | 54321 | up     | 0.500000  | standby | 0          | false             | 0

 1       | 192.168.1.102 | 54321 | up     | 0.500000  | primary | 0          | true              | 0

(2 rows)

2022-08-30 16:42:45 attach end..

recovery success,exit script with success

三、总结

通过对原主库执行系统重启服务，如果备库在阈值范围内尝试连接主库数据库服务失败，其failover切换过程和主库系统shutdown是一致的。

巴特西

KingbaseES V8R3集群运维案例之---主库系统down failover切换过程分析

最新文章

热门文章