1. Problem description

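When a Red Hat HA cluster performs a failover, it first has to stop the service and release the resources held by the current host, such as the IP Address and the Filesystem. While verifying HA failover today, I found that the service kept failing to stop, which in turn made the failover fail. The error messages were as follows: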

Jun 6 11:10:50 SHQZ-PS-IOT-SV2-BILL01 rgmanager[2529]: Stopping service service:service_bill

Jun 6 11:10:50 SHQZ-PS-IOT-SV2-BILL01 rgmanager[7526]: [ip] Removing IPv4 address 172.17.131.222/26 from eth0

Jun 6 11:10:51 SHQZ-PS-IOT-SV2-BILL01 ntpd[2152]: Deleting interface #7 eth0, 172.17.131.222#123, interface stats: received=0, sent=0, dropped=0, active_time=1542 secs

Jun 6 11:11:01 SHQZ-PS-IOT-SV2-BILL01 rgmanager[7670]: [fs] unmounting /billdata

Jun 6 11:11:06 SHQZ-PS-IOT-SV2-BILL01 rgmanager[7804]: [fs] unmounting /billdata

Jun 6 11:11:11 SHQZ-PS-IOT-SV2-BILL01 rgmanager[7938]: [fs] unmounting /billdata

Jun 6 11:11:11 SHQZ-PS-IOT-SV2-BILL01 rgmanager[7983]: [fs] 'umount /billdata' failed, error=1

Jun 6 11:11:11 SHQZ-PS-IOT-SV2-BILL01 rgmanager[2529]: stop on fs "billdata" returned 1 (generic error)

Jun 6 11:11:12 SHQZ-PS-IOT-SV2-BILL01 rgmanager[2529]: #12: RG service:service_bill failed to stop; intervention required

Jun 6 11:11:12 SHQZ-PS-IOT-SV2-BILL01 rgmanager[2529]: Service service:service_bill is failed

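I have only two resources configured here, an IP Address and a Filesystem. The log above shows that the IP Address is removed first, without any problem. The Filesystem resource is released next, and that is where things go wrong: rgmanager tries to unmount /billdata three times and fails each time, so the service cannot be stopped and the HA failover fails as well.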
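
The pattern in this log (repeated unmount attempts followed by a failure) can be pulled out of a longer syslog with a small filter. The following is only a minimal sketch and not part of rgmanager: summarize_umount is a hypothetical helper name, and it assumes the two message formats shown in the excerpt above.

```shell
#!/bin/sh
# summarize_umount MOUNTPOINT  (hypothetical helper, not an rgmanager tool)
# Reads syslog text on stdin and reports how many times rgmanager tried to
# unmount MOUNTPOINT and whether a "'umount ...' failed" message was logged.
summarize_umount() {
    awk -v mp="$1" '
        # "[fs] unmounting /billdata": the last two fields are the verb and the mount point
        NF >= 2 && $(NF-1) == "unmounting" && $NF == mp { attempts++ }
        # "[fs] \047umount /billdata\047 failed, error=1" (\047 is a single quote)
        index($0, "\047umount " mp "\047 failed")       { failed = 1 }
        END { printf "attempts=%d failed=%s\n", attempts, (failed ? "yes" : "no") }
    '
}
```

Fed the excerpt above, it prints attempts=3 failed=yes. On a live node it could be used as `grep rgmanager /var/log/messages | summarize_umount /billdata`, assuming syslog is written to /var/log/messages, which this setup does not state.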

2. Solution

Log in to the HA management interface, click "Resource", and select the Filesystem resource I configured.



Note: check the Force Unmount option, then click "Apply".
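
The setting can also be checked from the command line. This is a sketch under several assumptions that the original setup does not confirm: that the cluster configuration lives in /etc/cluster/cluster.conf, that the checkbox is stored as the force_unmount attribute of the fs resource, and that the `<fs>` element sits on a single line. fs_force_unmount_enabled is a hypothetical helper name.

```shell
#!/bin/sh
# fs_force_unmount_enabled NAME  (hypothetical helper)
# Reads cluster.conf XML on stdin and succeeds (exit status 0) if the fs
# resource called NAME carries a truthy force_unmount attribute.
# Assumption: the whole <fs .../> element is on one line.
fs_force_unmount_enabled() {
    grep '<fs ' \
        | grep "name=\"$1\"" \
        | grep -Eq 'force_unmount="(1|yes|on|true)"'
}
```

Usage, with the resource name billdata taken from the log above: `fs_force_unmount_enabled billdata < /etc/cluster/cluster.conf && echo "force unmount is on"`.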

Next, test the HA failover again. The resources are currently on SHQZ-PS-IOT-SV2-BILL01, so run clusvcadm -r service_bill -m SHQZ-PS-IOT-SV2-BILL02 to relocate the service to SHQZ-PS-IOT-SV2-BILL02.

# Check the cluster status with clustat
[root@SHQZ-PS-IOT-SV2-BILL01 cluster]# clustat
Cluster Status for cl_bill @ Wed Jun 6 14:13:42 2018
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 SHQZ-PS-IOT-SV2-BILL02       2    Online, rgmanager
 SHQZ-PS-IOT-SV2-BILL01       3    Online, Local, rgmanager

 Service Name             Owner (Last)               State
 ------- ----             ----- ------               -----
 service:service_bill     SHQZ-PS-IOT-SV2-BILL01     started

# Relocate the service to SHQZ-PS-IOT-SV2-BILL02
[root@SHQZ-PS-IOT-SV2-BILL01 cluster]# clusvcadm -r service_bill -m SHQZ-PS-IOT-SV2-BILL02
Trying to relocate service:service_bill to SHQZ-PS-IOT-SV2-BILL02...Success
service:service_bill is now running on SHQZ-PS-IOT-SV2-BILL02
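
When the relocation is scripted, the owner can be read back from clustat instead of being checked by eye. A minimal sketch, assuming the service line has the three-column layout shown in the output above; service_owner is a hypothetical helper name.

```shell
#!/bin/sh
# service_owner SERVICE  (hypothetical helper)
# Reads clustat output on stdin and prints "<owner> <state>" for the given
# service, relying on the "service:NAME  OWNER  STATE" line format.
service_owner() {
    awk -v svc="service:$1" '$1 == svc { print $2, $3 }'
}
```

For example, `clustat | service_owner service_bill` run after the relocation above would be expected to print the new owner followed by the state.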

This time it finally worked. Log in to SHQZ-PS-IOT-SV2-BILL02 and check that the IP Address and Filesystem resources are present:

[root@SHQZ-PS-IOT-SV2-BILL02 cluster]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
link/ether 00:50:56:9e:34:c1 brd ff:ff:ff:ff:ff:ff
inet 172.17.131.224/26 brd 172.17.131.255 scope global eth0
inet 172.17.131.222/26 scope global secondary eth0
inet6 fe80::250:56ff:fe9e:34c1/64 scope link
valid_lft forever preferred_lft forever

[root@SHQZ-PS-IOT-SV2-BILL02 cluster]# df -h /billdata
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_bill-lvbilldata
985G 97G 838G 11% /billdata

As you can see, the resources are now all on SHQZ-PS-IOT-SV2-BILL02.

3. Root cause analysis

With the Force Unmount option checked, the log on SHQZ-PS-IOT-SV2-BILL01 during the failover looks like this:

Jun 6 12:41:05 SHQZ-PS-IOT-SV2-BILL01 rgmanager[2529]: Stopping service service:service_bill

Jun 6 12:41:05 SHQZ-PS-IOT-SV2-BILL01 rgmanager[5791]: [ip] Removing IPv4 address 172.17.131.222/26 from eth0

Jun 6 12:41:06 SHQZ-PS-IOT-SV2-BILL01 ntpd[2152]: Deleting interface #9 eth0, 172.17.131.222#123, interface stats: received=0, sent=0, dropped=0, active_time=1516 secs

Jun 6 12:41:15 SHQZ-PS-IOT-SV2-BILL01 rgmanager[5852]: [fs] unmounting /billdata

Jun 6 12:41:15 SHQZ-PS-IOT-SV2-BILL01 rgmanager[5918]: [fs] Sending SIGTERM to processes on /billdata

Jun 6 12:41:21 SHQZ-PS-IOT-SV2-BILL01 rgmanager[5953]: [fs] unmounting /billdata

Jun 6 12:41:24 SHQZ-PS-IOT-SV2-BILL01 rgmanager[2529]: Service service:service_bill is stopped

The log on SHQZ-PS-IOT-SV2-BILL02 looks like this:

Jun 6 12:41:24 SHQZ-PS-IOT-SV2-BILL02 rgmanager[2368]: Starting stopped service service:service_bill

Jun 6 12:41:25 SHQZ-PS-IOT-SV2-BILL02 rgmanager[12296]: [fs] mounting /dev/dm-2 on /billdata

Jun 6 12:41:25 SHQZ-PS-IOT-SV2-BILL02 rgmanager[12318]: [fs] mount -t ext4 /dev/dm-2 /billdata

Jun 6 12:41:25 SHQZ-PS-IOT-SV2-BILL02 kernel: EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts:

Jun 6 12:41:25 SHQZ-PS-IOT-SV2-BILL02 rgmanager[12404]: [ip] Adding IPv4 address 172.17.131.222/26 to eth0

Jun 6 12:41:29 SHQZ-PS-IOT-SV2-BILL02 rgmanager[2368]: Service service:service_bill started

Jun 6 12:41:29 SHQZ-PS-IOT-SV2-BILL02 ntpd[2123]: Listening on interface #6 eth0, 172.17.131.222#123 Enabled

From the logs above, my guess at the cause is as follows:

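If processes keep accessing the /billdata directory (and mine really is accessed by processes all the time), then before Force Unmount is checked, rgmanager's umount of /billdata fails because the directory is still in use.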

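With Force Unmount checked, if rgmanager finds processes still using /billdata when it unmounts it, it sends SIGTERM to those processes so that they release /billdata, then unmounts the file system, and finally stops the service. Once the service has stopped, SHQZ-PS-IOT-SV2-BILL02 starts it and takes over the IP Address and Filesystem resources.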
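
To see which processes would receive that SIGTERM, it helps to list what is holding the directory before relocating. Where it is installed, `fuser -vm /billdata` reports this directly. The sketch below is a rough stand-in that scans /proc on Linux; pids_using_dir is a hypothetical helper name, it only looks at working directories and open file descriptors (not memory maps), and without root it can only inspect your own processes.

```shell
#!/bin/sh
# pids_using_dir DIR  (hypothetical helper, Linux only)
# Prints the PID of every process whose working directory or an open file
# descriptor points at DIR or something below it.
pids_using_dir() {
    # Resolve DIR to its physical path so it compares equal to /proc link targets.
    dir=$(cd "$1" && pwd -P) || return 1
    for p in /proc/[0-9]*; do
        for link in "$p"/cwd "$p"/fd/*; do
            # Unreadable or vanished entries are simply skipped.
            target=$(readlink "$link" 2>/dev/null) || continue
            case "$target" in
                "$dir"|"$dir"/*)
                    echo "${p#/proc/}"
                    break   # one hit per process is enough
                    ;;
            esac
        done
    done
}
```

Running `pids_using_dir /billdata` as root on the node that owns the service gives the list of PIDs to look at; whether those processes can tolerate a SIGTERM is worth knowing before Force Unmount is relied on.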
