数据库异常关闭时,数据库关闭时来不及或者没机会做checkpoint,则需要从上一个一致性检查的开始恢复。KingbaseES备机checkpoint是不能产生checkpoint WAL日志条目的,因为如果写这样类型的checkpoint的话,就会将接收的WAL打乱,那么日志将混乱,回放会出问题。

那么问题来了,备机支持checkpoint吗?它的checkpoint怎么做的?

KingbaseES为了缩短恢复时间,备机上也支持checkpoint,即CreateRestartPoint。但是在sys_control文件的checkpoint记录的位点还是从主机传过来WAL里面的checkpoint记录位置。

如下所示:

当在备库执行后台执行checkpoint或手工执行checkpoint时,sys_log日志,会出现以下“recovery restart point”的日志信息。

我们通过测试来看看主备库checkpoint的不同点。

节点信息:

一、在主库执行checkpoint

1)主库checkpoint

test=# checkpoint;
CHECKPOINT

2、 主库控制文件被更新

[kingbase@node101 bin]$ ./sys_controldata -D ../data
sys_control version number: 1201
Catalog version number: 202112261
Database system identifier: 7080367334319169673
Database cluster state: in production
sys_control last modified: Wed 10 Aug 2022 11:52:41 AM CST
Latest checkpoint location: 1/D20002E0
Latest checkpoint's REDO location: 1/D20002B0
Latest checkpoint's REDO WAL file: 0000000700000001000000D2

3、 查看wal日志中checkpoint记录

[kingbase@node101 bin]$ ./sys_waldump ../data/sys_wal/0000000700000001000000D2
.......
rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 1/D20002E0, prev 1/D20002B0, desc: CHECKPOINT_ONLINE redo 1/D20002B0; tli 7; prev tli 7; fpw true; xid 0:1950; oid 123514; multi 1; offset 0; oldest xid 867 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 1950; online
rmgr: Standby len (rec/tot): 42/ 42, tx: 0, lsn: 1/D2000358, prev 1/D20002E0, desc: RUNNING_XACTS nextXid 1950 latestCompletedXid 3523216096 oldestRunningXid 1950
sys_waldump: fatal: error in WAL record at 1/D2000358: invalid record length at 1/D2000388: wanted 24, got 0

=如下图所示,在wal日志文件中,增加了一条记录checkpoint的wal日志条目,并且和控制文件中的信息一致。=

二、备库手工执行checkpoint

1、查看checkpoint之前wal日志记录

2、执行checkpoint

test=# checkpoint;
CHECKPOINT

3、再次查看wal日志的变化

[kingbase@node102 bin]$ ./sys_waldump ../data/sys_wal/0000000700000001000000D2
.......
rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 1/D20002E0, prev 1/D20002B0, desc: CHECKPOINT_ONLINE redo 1/D20002B0; tli 7; prev tli 7; fpw true; xid 0:1950; oid 123514; multi 1; offset 0; oldest xid 867 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 1950; online
rmgr: Standby len (rec/tot): 42/ 42, tx: 0, lsn: 1/D2000358, prev 1/D20002E0, desc: RUNNING_XACTS nextXid 1950 latestCompletedXid 3523216096 oldestRunningXid 1950
sys_waldump: fatal: error in WAL record at 1/D2000358: invalid record length at 1/D2000388: wanted 24, got 0 # 如上所示,在备库执行checkpoint后,并没有在备库的wal日志中增加相关的日志条目。

=经测试,在备库单独执行checkpoint,是不会生成检查点;备库产生的检查点,是当主库执行checkpoint(后台或手工)后,写入wal日志,当wal日志同步到备库后,备库执行checkpoint(手工或后台),读取wal日志,根据wal日志的记录,产生‘recovery restart point’,并更新备库的controlfile。=

三、备库数据库进程意外重启的恢复

1、重启之前备库控制文件信息

[kingbase@node102 bin]$ ./sys_controldata -D ../data
sys_control version number: 1201
Catalog version number: 202112261
Database system identifier: 7080367334319169673
Database cluster state: in archive recovery
sys_control last modified: Thu 11 Aug 2022 10:17:12 AM CST
Latest checkpoint location: 1/D973F910
Latest checkpoint's REDO location: 1/D973F8E0
Latest checkpoint's REDO WAL file: 0000000900000001000000D9

2、数据库故障重启

[kingbase@node102 sys_wal]$ kill -9 2546
[kingbase@node102 bin]$ ./sys_ctl start -D /home/kingbase/cluster/R6HA/kha/kingbase/data

3、数据库启动后sys_log日志

2022-08-11 11:52:34.532 CST,,,32224,,62f47d02.7de0,1,,2022-08-11 11:52:34 CST,,0,LOG,00000,"database system was interrupted while in recovery at log time 2022-08-11 10:16:34 CST",,"If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.",,,,,,,""
2022-08-11 11:52:35.631 CST,,,32224,,62f47d02.7de0,2,,2022-08-11 11:52:34 CST,,0,LOG,00000,"entering standby mode",,,,,,,,,""
2022-08-11 11:52:35.636 CST,,,32224,,62f47d02.7de0,3,,2022-08-11 11:52:34 CST,1/0,0,LOG,00000,"redo starts at 1/D973F8E0",,,,,,,,,""
2022-08-11 11:52:35.636 CST,,,32224,,62f47d02.7de0,4,,2022-08-11 11:52:34 CST,1/0,0,LOG,00000,"redo wal segment count 44",,,,,,,,,""
2022-08-11 11:52:35.743 CST,"system","esrep",32228,"192.168.1.102:38099",62f47d03.7de4,1,"",2022-08-11 11:52:35 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,""

=如下所示,备库意外重启,从最近的checkpoint开始recovery,最近的checkpoint和重启前控制文件记录的checkpoint一致。=

四、总结

对于KingbaseES集群中的主备库checkpoint机制不太一样,主库通过后台或手工checkpoint时,会在wal日志中记录日志条目;但是备库的checkpoint不会记录日志条目,备库在做checkpoint时,会显示“recovery restart point”,备库控制文件的checkpoint信息,其实是读取的主库传递的wal日志记录,当备库意外宕机重启,做recovery时,是从最近的checkpoint位置开时执行恢复。

最新文章

  1. GIT本地配置和PUSH
  2. SDL播放视频
  3. Python在windows下的安装与配置
  4. 【Android测试】【第一节】性能——CPU
  5. MySQL 操作表命令
  6. Linux查找文件夹名
  7. Bootstrap_表单_按钮
  8. ser2net使用
  9. Knockout应用开发指南
  10. opencv---cvor
  11. bzoj 4237: 稻草人
  12. [SinGuLaRiTy] NOIP2017 提高组
  13. java第一个程序——Hello World
  14. 第一章:Netty介绍
  15. 你真的理解 Spring Boot 项目中的 parent 吗?
  16. 37.QT-QTSingleApplication-程序只运行一个实例
  17. ES7的新特性
  18. Confluence 6 数据库表-空间(Spaces)
  19. 关于HTML或JS加密解密的七种方式
  20. Python之字符编码(一)

热门文章

  1. UiPath Orchestrator安装步骤
  2. Redis开机自启
  3. 干货 |《2022B2B新增长系列之企服行业橙皮书》重磅发布
  4. java面向对象编程---方法
  5. Node.js精进(9)——性能监控(上)
  6. 区间统计——ST算法
  7. idea java 打包的方法
  8. 5-3 Dubbo | 负载均衡
  9. HBase学习(一) 基本概念和安装基本命令
  10. 【学习笔记】带你从0开始学习 01Trie