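Case description:

In production environments, security policy often forbids SSH trust (passwordless login) between hosts for the root user. For a KingbaseES R6 repmgr cluster, this means that when the cluster is started with the sys_monitor.sh script, the nodes cannot reach each other over ssh and the startup fails. This case uses es_server and es_client to establish trusted connections between users as a replacement for ssh access.

Database version used for testing: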

test=# select version();
version
----------------------------------------------------------------------------------------------------------------------
KingbaseES V008R006C003B0010 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

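As shown in the figure below, because a trusted connection for the root user cannot be established, sys_monitor.sh fails to start the cluster:

I. Configure and start es_server (all nodes)

es_server configuration: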



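Note: the kingbase here is neither a database user nor an operating system user; it is the user that connects to es_server. The default password is 123456, and the password is encrypted with the MD5 algorithm.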
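If the password needs to be changed, a new digest has to be produced. The following is only a sketch: it assumes es_server expects a plain, unsalted MD5 hex digest, which the text above does not confirm, and the helper name md5_hex is hypothetical.

```shell
# Hypothetical helper: print the plain (unsalted) MD5 hex digest of a string.
# Assumption: es_server stores the password in this form; verify before use.
md5_hex() {
    printf '%s' "$1" | md5sum | awk '{print $1}'
}
# Usage: md5_hex 123456
```

printf is used instead of echo so that no trailing newline ends up in the digest.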

Start es_server:

[kingbase@node3 bin]$ ./esHAmodel.sh start
[kingbase@node3 bin]$ ps -ef |grep es_server
kingbase 28024 1 0 15:18 pts/2 00:00:00 /home/kingbase/cluster/R6HA/KHA/kingbase/bin/es_server
[kingbase@node3 bin]$ netstat -an |grep 8890
tcp 0 0 0.0.0.0:8890 0.0.0.0:* LISTEN
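The same check can be scripted so that it is easy to repeat on every node. A minimal sketch, assuming the port is 8890 as in the output above; the function name port_listening is illustrative and it only parses `netstat -an` style text from standard input.

```shell
# Reads `netstat -an` style output on stdin and succeeds if a TCP socket
# is in LISTEN state on the given port (8890 in the transcript above).
port_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
# Usage: netstat -an | port_listening 8890 && echo "es_server port is open"
```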

Test the es_server connection:

[kingbase@node3 bin]$ ./es_client --help
es-client
Usage:
es-client [OPTION...] -o
Options:
-U, --username=NAME username for ES authentication
-h, --host=HOSTNAME ES Server host
-p, --port=PORT ES Server port number
-W, --password password
-d, --debug enable debug message (optional)
-?, --help print this help
-o, --option use user-define cmd: like "ls ."
[kingbase@node3 bin]$ ./es_client -h 192.168.7.248 -U kingbase -W 123456 -o "hostname"
node1
[kingbase@node3 bin]$ ./es_client -h 192.168.7.249 -U kingbase -W 123456 -o "hostname"
node2
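With more than a couple of nodes, the connectivity test can be wrapped in a loop. The sketch below uses only the es_client options listed in the help output above; the ES_CLIENT variable and the function name check_es_nodes are hypothetical, and it assumes es_client exits with a non-zero status when the connection fails.

```shell
# ES_CLIENT is an assumed override hook pointing at the es_client binary.
ES_CLIENT="${ES_CLIENT:-./es_client}"

# Run `hostname` on each node through es_server and report failures.
# Arguments: <es user> <es password> <node address>...
check_es_nodes() {
    es_user="$1"; es_pass="$2"; shift 2
    failed=0
    for node in "$@"; do
        if out=$("$ES_CLIENT" -h "$node" -U "$es_user" -W "$es_pass" -o "hostname" 2>/dev/null); then
            echo "$node: ok ($out)"
        else
            echo "$node: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
# Usage: check_es_nodes kingbase 123456 192.168.7.248 192.168.7.249
```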

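II. Configure repmgr.conf to support bmj-mode connections

As shown in the figure below, the sys_monitor.sh script communicates through es_server and es_client when bmj is switched on (the on_bmj=on setting in repmgr.conf), so repmgr.conf must be modified to enable bmj communication.

Configure repmgr.conf (all nodes):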

[kingbase@node3 bin]$ cat ../etc/repmgr.conf
# enable bmj
on_bmj=on
node_id=3
node_name=node243
promote_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr standby promote -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr standby follow -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
conninfo='host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2'
log_file='/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log'
data_directory='/home/kingbase/cluster/R6HA/KHA/kingbase/data'
sys_bindir='/home/kingbase/cluster/R6HA/KHA/kingbase/bin'
ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22'
reconnect_attempts=2
reconnect_interval=3
failover='automatic'
recovery='automatic'
monitoring_history='no'
trusted_servers='192.168.7.1'
virtual_ip='192.168.7.240/24'
net_device='enp0s3'
ipaddr_path='/sbin'
arping_path='/sbin'
synchronous='quorum'
repmgrd_pid_file='/home/kingbase/cluster/R6HA/KHA/kingbase/hamgrd.pid'
ping_path='/usr/bin'
#priority=0
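Because the setting has to be present on every node, a quick check helps to catch a node that was missed. A minimal sketch, assuming the parameter is written as on_bmj=on without quotes, as in the listing above; the function name bmj_enabled is illustrative.

```shell
# Succeeds if the given repmgr.conf contains an uncommented on_bmj=on line.
bmj_enabled() {
    grep -Eq '^[[:space:]]*on_bmj[[:space:]]*=[[:space:]]*on[[:space:]]*(#.*)?$' "$1"
}
# Usage: bmj_enabled ../etc/repmgr.conf && echo "bmj is enabled"
```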

III. Test starting the cluster with sys_monitor.sh

[kingbase@node3 bin]$ ./sys_monitor.sh restart
2021-03-01 15:25:58 Ready to stop all DB ...
sh: /etc/cron.d/KINGBASECRON: Permission denied
sh: /etc/cron.d/KINGBASECRON: Permission denied
sh: /etc/cron.d/KINGBASECRON: Permission denied
2021-03-01 15:25:59 begin to stop repmgrd on "[192.168.7.248]".
2021-03-01 15:25:59 repmgrd on "[192.168.7.248]" stop success.
2021-03-01 15:25:59 begin to stop repmgrd on "[192.168.7.243]".
2021-03-01 15:25:59 repmgrd on "[192.168.7.243]" stop success.
2021-03-01 15:25:59 begin to stop repmgrd on "[192.168.7.249]".
2021-03-01 15:25:59 repmgrd on "[192.168.7.249]" stop success.
2021-03-01 15:25:59 begin to stop DB on "[192.168.7.248]".
waiting for server to shut down.... done
server stopped
2021-03-01 15:26:00 DB on "[192.168.7.248]" stop success.
2021-03-01 15:26:00 begin to stop DB on "[192.168.7.249]".
waiting for server to shut down.... done
server stopped
2021-03-01 15:26:00 DB on "[192.168.7.249]" stop success.
2021-03-01 15:26:00 begin to stop DB on "[192.168.7.243]".
waiting for server to shut down..... done
server stopped
2021-03-01 15:26:01 DB on "[192.168.7.243]" stop success.
2021-03-01 15:26:01 Done.
2021-03-01 15:26:02 Ready to start all DB ...
2021-03-01 15:26:02 begin to start DB on "[192.168.7.243]".
waiting for server to start.... done
server started
2021-03-01 15:26:02 execute to start DB on "[192.168.7.243]" success, connect to check it.
2021-03-01 15:26:03 DB on "[192.168.7.243]" start success.
2021-03-01 15:26:03 Try to ping trusted_servers on host 192.168.7.248 ...
2021-03-01 15:26:05 Try to ping trusted_servers on host 192.168.7.243 ...
2021-03-01 15:26:07 Try to ping trusted_servers on host 192.168.7.249 ...
2021-03-01 15:26:09 begin to start DB on "[192.168.7.248]".
waiting for server to start.... done
server started
2021-03-01 15:26:10 execute to start DB on "[192.168.7.248]" success, connect to check it.
2021-03-01 15:26:11 DB on "[192.168.7.248]" start success.
2021-03-01 15:26:11 begin to start DB on "[192.168.7.249]".
waiting for server to start.... done
server started
2021-03-01 15:26:12 execute to start DB on "[192.168.7.249]" success, connect to check it.
2021-03-01 15:26:13 DB on "[192.168.7.249]" start success.
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
1 | node248 | standby | ! running | node243 | default | 100 | 23 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
2 | node249 | witness | * running | node243 | default | 0 | 1 | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
3 | node243 | primary | * running | | default | 100 | 23 | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
WARNING: following issues were detected
- node "node248" (ID: 1) is running but the repmgr node record is inactive
2021-03-01 15:26:13 The primary DB is started.
WARNING: There are no 2 standbys in pg_stat_replication, please check all the standby servers replica from primary
2021-03-01 15:26:37 Success to load virtual ip [192.168.7.240/24] on primary host [192.168.7.243].
2021-03-01 15:26:37 Try to ping vip on host 192.168.7.248 ...
2021-03-01 15:26:39 Try to ping vip on host 192.168.7.243 ...
2021-03-01 15:26:41 Try to ping vip on host 192.168.7.249 ...
2021-03-01 15:26:43 begin to start repmgrd on "[192.168.7.248]".
2021-03-01 15:26:43 repmgrd on "[192.168.7.248]" already started.
2021-03-01 15:26:43 begin to start repmgrd on "[192.168.7.243]".
2021-03-01 15:26:43 repmgrd on "[192.168.7.243]" already started.
2021-03-01 15:26:43 begin to start repmgrd on "[192.168.7.249]".
2021-03-01 15:26:43 repmgrd on "[192.168.7.249]" already started.
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node248 | standby | running | node243 | running | 3589 | no | 0 second(s) ago
2 | node249 | witness | * running | node243 | running | 23739 | no | 0 second(s) ago
3 | node243 | primary | * running | | running | 30496 | no | n/a
sh: /etc/cron.d/KINGBASECRON: Permission denied
sh: /etc/logrotate.d/kingbase: Permission denied
chown: changing ownership of ‘/etc/logrotate.d/kingbase’: Operation not permitted
chmod: changing permissions of ‘/etc/logrotate.d/kingbase’: Operation not permitted
sh: /etc/cron.d/KINGBASECRON: Permission denied
sh: /etc/logrotate.d/kingbase: Permission denied
chown: changing ownership of ‘/etc/logrotate.d/kingbase’: Operation not permitted
chmod: changing permissions of ‘/etc/logrotate.d/kingbase’: Operation not permitted
sh: /etc/cron.d/KINGBASECRON: Permission denied
sh: /etc/logrotate.d/kingbase: Permission denied
chown: changing ownership of ‘/etc/logrotate.d/kingbase’: Operation not permitted
chmod: changing permissions of ‘/etc/logrotate.d/kingbase’: Operation not permitted
2021-03-01 15:26:44 Done.

As shown in the figure below, permission errors occur when the sys_monitor.sh script accesses the files "/etc/cron.d/KINGBASECRON" and "/etc/logrotate.d/kingbase" during startup:

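Notes:

1) /etc/cron.d/KINGBASECRON is the scheduled task (cron job) created when the repmgr cluster starts; it is used to start the repmgrd process.
2) /etc/logrotate.d/kingbase is the configuration file used to rotate the hamgr.log and kbha.log logs.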
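Before restarting the cluster, it can be useful to confirm which of these files the cluster's operating system user is actually unable to write. A minimal sketch to be run as that user on each node; the function name report_unwritable is illustrative.

```shell
# Print each path the current user cannot write (or create) and return
# a non-zero status if any such path was found.
report_unwritable() {
    rc=0
    for f in "$@"; do
        if [ -e "$f" ]; then
            [ -w "$f" ] || { echo "not writable: $f"; rc=1; }
        else
            # A missing file can only be created if its directory is writable.
            [ -w "$(dirname "$f")" ] || { echo "cannot create: $f"; rc=1; }
        fi
    done
    return "$rc"
}
# Usage: report_unwritable /etc/cron.d/KINGBASECRON /etc/logrotate.d/kingbase
```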

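Configuration related to /etc/cron.d/KINGBASECRON in the sys_monitor.sh script:

Configuration related to /etc/logrotate.d/kingbase in the sys_monitor.sh script:

1) Change the permissions on the /etc/cron.d/KINGBASECRON file (as shown in the figure below) (all nodes)

2) Change the permissions on /etc/logrotate.d/kingbase (all nodes)

Change the owner of the kingbase file (all nodes):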
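The exact commands for this step are not preserved in the text. The sketch below covers only the logrotate file and assumes the target owner is kingbase:kingbase, which matches the `ls -lh /etc/logrotate.d/kingbase` listing in the appendix. The function name set_owner and the APPLY switch are hypothetical; by default it only prints the command.

```shell
# Print (default) or run the chown that hands a file to the cluster OS user.
# Assumption: owner and group share the same name, e.g. kingbase:kingbase.
# Set APPLY=1 and run as root to actually change ownership.
set_owner() {
    owner="$1"; target="$2"
    if [ "${APPLY:-0}" = "1" ]; then
        chown "$owner:$owner" "$target"
    else
        echo "chown $owner:$owner $target"
    fi
}
# Usage (as root, on every node):
#   APPLY=1 set_owner kingbase /etc/logrotate.d/kingbase
```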

Comment out the statements in the sys_monitor.sh script that change the owner and permissions of the kingbase configuration file:

function init_log_rotate()
{
_host="$1"
_final_target_file="/etc/logrotate.d/kingbase"
eval _rep_log_file=`grep log_file ${rep_conf} | awk -F '=' '{print $2}'`
execute_command ${super_user} $host "\
echo -e '# Generate by sys_monitor.sh at `date`\n\
${kbha_file} {\n\
weekly\n\
maxsize 100M\n\
su ${execute_user} ${execute_user}\n\
create 0600 ${execute_user} ${execute_user}\n\
rotate 3\n\
copytruncate\n\
dateext\n\
}\n\
${_rep_log_file} {\n\
weekly\n\
maxsize 100M\n\
su ${execute_user} ${execute_user}\n\
create 0600 ${execute_user} ${execute_user}\n\
rotate 3\n\
copytruncate\n\
dateext\n\
}\n\
' > ${_final_target_file}"
#execute_command ${super_user} $host "chown ${super_user}:${super_user} ${_final_target_file}"
#execute_command ${super_user} $host "chmod 644 ${_final_target_file}"

As shown in the figure below:
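Editing the script by hand on every node is error-prone, so the same change can be applied with sed. This is a sketch rather than the author's procedure: it assumes the two statements look like the ones in the listing above (an execute_command line that runs chown or chmod on ${_final_target_file}), and the function name comment_out_owner_fix is illustrative. A .bak copy of the script is kept.

```shell
# Comment out execute_command lines that chown/chmod ${_final_target_file}.
# $1 is the path to sys_monitor.sh; the original is saved as $1.bak.
comment_out_owner_fix() {
    sed -i.bak -E \
        '/^[[:space:]]*execute_command .*(chown|chmod) .*_final_target_file/ s/^([[:space:]]*)/\1#/' \
        "$1"
}
# Usage: comment_out_owner_fix ./sys_monitor.sh
```

Lines that are already commented out do not match the address, so running the function twice leaves the file unchanged.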

IV. Test cluster startup

[kingbase@node3 bin]$ ./sys_monitor.sh restart
2021-03-01 15:52:08 Ready to stop all DB ...
2021-03-01 15:52:08 begin to stop repmgrd on "[192.168.7.248]".
2021-03-01 15:52:08 repmgrd on "[192.168.7.248]" stop success.
2021-03-01 15:52:08 begin to stop repmgrd on "[192.168.7.243]".
2021-03-01 15:52:08 repmgrd on "[192.168.7.243]" stop success.
2021-03-01 15:52:08 begin to stop repmgrd on "[192.168.7.249]".
2021-03-01 15:52:08 repmgrd on "[192.168.7.249]" stop success.
2021-03-01 15:52:08 begin to stop DB on "[192.168.7.248]".
waiting for server to shut down..... done
server stopped
2021-03-01 15:52:09 DB on "[192.168.7.248]" stop success.
2021-03-01 15:52:09 begin to stop DB on "[192.168.7.249]".
waiting for server to shut down.... done
server stopped
2021-03-01 15:52:10 DB on "[192.168.7.249]" stop success.
2021-03-01 15:52:10 begin to stop DB on "[192.168.7.243]".
waiting for server to shut down..... done
server stopped
2021-03-01 15:52:12 DB on "[192.168.7.243]" stop success.
2021-03-01 15:52:12 Done.
2021-03-01 15:52:12 Ready to start all DB ...
2021-03-01 15:52:12 begin to start DB on "[192.168.7.243]".
waiting for server to start.... done
server started
2021-03-01 15:52:12 execute to start DB on "[192.168.7.243]" success, connect to check it.
2021-03-01 15:52:13 DB on "[192.168.7.243]" start success.
2021-03-01 15:52:13 Try to ping trusted_servers on host 192.168.7.248 ...
2021-03-01 15:52:15 Try to ping trusted_servers on host 192.168.7.243 ...
2021-03-01 15:52:17 Try to ping trusted_servers on host 192.168.7.249 ...
2021-03-01 15:52:19 begin to start DB on "[192.168.7.248]".
waiting for server to start.... done
server started
2021-03-01 15:52:20 execute to start DB on "[192.168.7.248]" success, connect to check it.
2021-03-01 15:52:21 DB on "[192.168.7.248]" start success.
2021-03-01 15:52:21 begin to start DB on "[192.168.7.249]".
waiting for server to start.... done
server started
2021-03-01 15:52:21 execute to start DB on "[192.168.7.249]" success, connect to check it.
2021-03-01 15:52:22 DB on "[192.168.7.249]" start success.
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
1 | node248 | standby | running | node243 | default | 100 | 23 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
2 | node249 | witness | * running | node243 | default | 0 | 1 | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
3 | node243 | primary | * running | | default | 100 | 23 | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
2021-03-01 15:52:22 The primary DB is started.
WARNING: There are no 2 standbys in pg_stat_replication, please check all the standby servers replica from primary
2021-03-01 15:52:46 Success to load virtual ip [192.168.7.240/24] on primary host [192.168.7.243].
2021-03-01 15:52:46 Try to ping vip on host 192.168.7.248 ...
2021-03-01 15:52:48 Try to ping vip on host 192.168.7.243 ...
2021-03-01 15:52:50 Try to ping vip on host 192.168.7.249 ...
2021-03-01 15:52:52 begin to start repmgrd on "[192.168.7.248]".
[2021-03-01 15:54:17] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/../etc/repmgr.conf"
[2021-03-01 15:54:17] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log"
2021-03-01 15:52:52 repmgrd on "[192.168.7.248]" start success.
2021-03-01 15:52:52 begin to start repmgrd on "[192.168.7.243]".
[2021-03-01 15:52:52] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/../etc/repmgr.conf"
[2021-03-01 15:52:52] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log"
2021-03-01 15:52:52 repmgrd on "[192.168.7.243]" start success.
2021-03-01 15:52:52 begin to start repmgrd on "[192.168.7.249]".
[2021-03-01 14:50:47] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/../etc/repmgr.conf"
[2021-03-01 14:50:47] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log"
2021-03-01 15:52:53 repmgrd on "[192.168.7.249]" start success.
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node248 | standby | running | node243 | running | 13909 | no | 0 second(s) ago
2 | node249 | witness | * running | node243 | running | 28830 | no | n/a
3 | node243 | primary | * running | | running | 6643 | no | n/a
2021-03-01 15:52:53 Done.

As shown in the figure below, the cluster starts normally.
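The status tables printed by sys_monitor.sh can also be checked mechanically. In the first restart, node248 was reported as "! running", while here every node is "running" or "* running". A minimal sketch that flags any other status, assuming the pipe-separated table layout shown in the logs; the function name unhealthy_nodes is illustrative.

```shell
# Reads a cluster status table on stdin (columns separated by "|", with
# Name in column 2 and Status in column 4) and prints every node whose
# status is not a clean "running" or "* running".
unhealthy_nodes() {
    awk -F'|' '
        NF >= 4 && $1 ~ /^[[:space:]]*[0-9]+[[:space:]]*$/ {
            name = $2;   gsub(/^[[:space:]]+|[[:space:]]+$/, "", name)
            status = $4; gsub(/^[[:space:]]+|[[:space:]]+$/, "", status)
            if (status != "running" && status != "* running")
                print name ": " status
        }
    '
}
# Usage: pipe the status table into it, for example
#   ./repmgr cluster show | unhealthy_nodes
```

Empty output means no node was flagged. The header and separator rows are skipped because their first column is not a numeric node ID.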

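Appendix: troubleshooting the /etc/logrotate.d/kingbase permission error

As shown in the figure below, the following error occurs when the sys_monitor.sh script starts the cluster:

Attempted solution: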

[root@node3 ~]# which chmod
/usr/bin/chmod
[root@node3 ~]# which chown
/usr/bin/chown
[root@node3 ~]# ls -lh /usr/bin/chown
-rwxr-xr-x. 1 root root 62K Nov 20 2015 /usr/bin/chown
[root@node3 ~]# ls -lh /usr/bin/chmod
-rwxr-xr-x. 1 root root 58K Nov 20 2015 /usr/bin/chmod
[root@node3 ~]# chmod u+s /usr/bin/chown
[root@node3 ~]# chmod u+s /usr/bin/chmod
[root@node3 ~]# ls -lh /usr/bin/chmod
-rwsr-xr-x. 1 root root 58K Nov 20 2015 /usr/bin/chmod
[root@node3 ~]# ls -lh /usr/bin/chown
-rwsr-xr-x. 1 root root 62K Nov 20 2015 /usr/bin/chown
[root@node3 ~]# ls -lh /etc/logrotate.d/kingbase
-rw-r--r--. 1 kingbase kingbase 492 Mar 1 15:52 /etc/logrotate.d/kingbase
[root@node3 ~]# su - kingbase
Last login: Mon Mar 1 15:51:39 CST 2021 on pts/1
Last failed login: Mon Mar 1 15:58:21 CST 2021 from :0 on :0
There was 1 failed login attempt since the last successful login.
[kingbase@node3 ~]$ chown root.root /etc/logrotate.d/kingbase
[kingbase@node3 ~]$ ls -lh /etc/logrotate.d/kingbase
-rw-r--r--. 1 root root 492 Mar 1 15:52 /etc/logrotate.d/kingbase
[kingbase@node3 ~]$ chown kingbase.kingbase /etc/logrotate.d/kingbase
[kingbase@node3 ~]$ ls -lh /etc/logrotate.d/kingbase
-rw-r--r--. 1 kingbase kingbase 492 Mar 1 15:52 /etc/logrotate.d/kingbase
# Manually run "sh /etc/logrotate.d/kingbase"
[kingbase@node3 bin]$ sh /etc/logrotate.d/kingbase
/etc/logrotate.d/kingbase: line 2: /home/kingbase/cluster/R6HA/KHA/kingbase/bin/../kbha.log: Permission denied
/etc/logrotate.d/kingbase: line 3: weekly: command not found
/etc/logrotate.d/kingbase: line 4: maxsize: command not found
[kingbase@node3 kingbase]$ chmod u+x kbha.log
[kingbase@node3 kingbase]$ sh /etc/logrotate.d/kingbase
/etc/logrotate.d/kingbase: line 2: /home/kingbase/cluster/R6HA/KHA/kingbase/bin/../kbha.log: Text file busy
/etc/logrotate.d/kingbase: line 3: weekly: command not found
/etc/logrotate.d/kingbase: line 4: maxsize: command not found
Password:

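Even after the steps above, the "sh /etc/logrotate.d/kingbase" error still appeared when starting the cluster with the sys_monitor.sh script, so the sys_monitor.sh script itself was modified, which resolved the problem.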
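The file under /etc/logrotate.d is a logrotate configuration rather than a shell script, which is why running it with sh reports "weekly: command not found" in the transcript above. A minimal sketch of validating it with logrotate's debug mode instead (the -d option parses the configuration and reports what would be done without rotating anything); the function name check_logrotate_conf and its return codes are illustrative.

```shell
# Dry-run a logrotate configuration file.
# Returns 2 if the file is unreadable, 3 if logrotate is not installed,
# otherwise the exit status of `logrotate -d`.
check_logrotate_conf() {
    conf="$1"
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 2
    fi
    if ! command -v logrotate >/dev/null 2>&1; then
        echo "logrotate not found in PATH" >&2
        return 3
    fi
    logrotate -d "$conf"
}
# Usage: check_logrotate_conf /etc/logrotate.d/kingbase
```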
