Problem Description
---------------------------------------------------
Tue Sep 01 04:05:33 2020
skgxpvfynet: mtype: 61 process 417356 failed because of a resource problem in the OS. The OS has most likely run out of buffers (rval: 4)
Errors in file /u01/app/oracle/diag/rdbms/syntong/syntong1/trace/syntong1_w001_417356.trc (incident=96021):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
Incident details in: /u01/app/oracle/diag/rdbms/syntong/syntong1/incident/incdir_96021/syntong1_w001_417356_i96021.trc
opidrv aborting process W001 ospid (417356) as a result of ORA-603
 
Error Codes
---------------------------------------------------
ORA-00603 ORA-27504 ORA-27300 ORA-27301 ORA-27302
 
A search of the My Oracle Support (MOS) documents turned up a matching entry for the status 105 error:
 
STATUS 105 - ENOBUFS - No buffer space available
This error means that a socket cannot be created until resources are freed.

ORA-00603: ORACLE server session terminated by fatal error

ORA-27504: IPC error creating OSD context

ORA-27300: OS system dependent operation:sendmsg failed with status: 105

ORA-27301: OS failure message: No buffer space available

ORA-27302: failure occurred at: sskgxpsnd2

See: Oracle Linux: ORA-27301:OS Failure Message: No Buffer Space Available (Doc ID 2041723.1)

Check the MTU:

The MTU of the loopback adapter is too high. The command "netstat -in" shows the current MTU size.

Linux: #netstat -in

Kernel Interface table
   Iface   MTU    Met  RX-OK    RX-ERR  RX-DRP  RX-OVR  TX-OK  TX-ERR  TX-DRP  TX-OVR  Flg
   eth0    1500   0    1371747  0       0       0       1858   0       0       0       BMRU
   lo      65536  0    46943    0       0       0       46943  0       0       0       LRU
   virbr0  1500   0    0        0       0       0       32     0       0       0       BMRU
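
The same check can be scripted without parsing the netstat output. The sketch below reads the MTU from sysfs and reports whether it is above the 16436 recommended by Doc ID 2041723.1. The helper names `iface_mtu` and `mtu_too_high`, and the optional sysfs-root argument, are my own illustrative assumptions and not part of any Oracle tooling.

```shell
# Sketch: read an interface's MTU from sysfs and compare it with a limit.
# iface_mtu and mtu_too_high are hypothetical helper names.

# Print the MTU of interface $1; $2 optionally overrides the sysfs root.
iface_mtu() (
    root=${2:-/sys/class/net}
    cat "$root/$1/mtu"
)

# Succeed (exit 0) when MTU $1 is larger than limit $2.
mtu_too_high() {
    [ "$1" -gt "$2" ]
}

# Report on the loopback adapter, if sysfs exposes it on this host.
if mtu=$(iface_mtu lo 2>/dev/null); then
    if mtu_too_high "$mtu" 16436; then
        echo "lo MTU is $mtu, above 16436"
    else
        echo "lo MTU is $mtu"
    fi
fi
```

On the server shown above, where lo reports 65536, this would take the first branch.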

Solution given in the document:

Doc ID 2041723.1:

CAUSE

This happens when too little space is available for network buffer reservation.

SOLUTION

1. On servers with a large amount of physical memory, the parameter vm.min_free_kbytes should be set to about 0.4% of total physical memory. This keeps a larger range of defragmented memory pages available for network buffers, reducing the probability of low-buffer-space conditions.

*** For example, on a server which is having 256GB RAM, the parameter vm.min_free_kbytes should be set to 1073742 ***

On NUMA-enabled systems, the value of vm.min_free_kbytes should be multiplied by the number of NUMA nodes, since the value is split across all the nodes: vm.min_free_kbytes = n * 0.4% of total physical memory, where 'n' is the number of NUMA nodes.

2. Additionally, the MTU value should be modified as below

#ifconfig lo mtu 16436

To make the change persistent over reboot add the following line in the file /etc/sysconfig/network-scripts/ifcfg-lo :

MTU=16436

Save the file and restart the network service to load the changes

#service network restart
Note: When making the changes on CRS nodes, restarting the network while CRS is up can hang CRS, so cluster services should be stopped before the network restart.

vm.min_free_kbytes

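This parameter sets the minimum amount of free memory that the Linux VM subsystem keeps in reserve. When free memory falls below this value, the kernel reclaims cache memory to free up pages.

The unit is KB.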
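
The sizing rule from the MOS document is easy to get wrong by a factor of 1024, so here is a small sketch of the arithmetic. The helper name `recommended_min_free_kbytes` is hypothetical. It takes physical memory in KB and an optional NUMA node count, and rounds 0.4% to the nearest KB using integer math.

```shell
# Sketch: vm.min_free_kbytes = n * 0.4% of physical memory (in KB).
# recommended_min_free_kbytes is a hypothetical helper name.

# $1 = total physical memory in KB, $2 = number of NUMA nodes (default 1).
recommended_min_free_kbytes() (
    mem_kb=$1
    nodes=${2:-1}
    # 0.4% = 4/1000; adding 500 rounds to the nearest KB in integer math.
    echo $(( ($mem_kb * 4 * $nodes + 500) / 1000 ))
)

# 256 GB of RAM expressed in KB, single NUMA node.
recommended_min_free_kbytes $((256 * 1024 * 1024))   # prints 1073742
```

On a live host the first argument can be taken from the MemTotal line of /proc/meminfo, which is already reported in KB.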

Below are the solution steps I put together for 苏大文正:

SOLUTION:
 
 I. Preparation

1. Check the status of the cluster and the databases

#su - grid

#crs_stat -t

#su - oracle

#sqlplus / as sysdba

SQL>select INST_ID,INSTANCE_NUMBER,INSTANCE_NAME,STATUS,DATABASE_STATUS,INSTANCE_ROLE from gv$Instance;

2. Check the database backup status

SQL>

col INPUT_BYTES_DISPLAY for a10
col OUTPUT_BYTES_DISPLAY for a10
col TIME_TAKEN_DISPLAY for a10
set linesize 222
 
select input_type,
       status,
       to_char(start_time,
               'yyyy-mm-dd hh24:mi:ss'),
       to_char(end_time,
               'yyyy-mm-dd hh24:mi:ss'),
       input_bytes_display,
       output_bytes_display,
       time_taken_display
  from v$rman_backup_job_details
 where start_time > date '2020-09-10'
 order by 3 desc;
 

3. Back up the /etc/sysctl.conf file on node 1

#cp /etc/sysctl.conf /home/oracle/pst
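
If the change is repeated later, a plain cp overwrites the earlier copy. A minimal sketch of a timestamped backup follows. The helper name `backup_with_timestamp` is hypothetical, and the destination directory is simply the one used above.

```shell
# Sketch: copy a file into a backup directory with a timestamp suffix.
# backup_with_timestamp is a hypothetical helper name.

# $1 = file to back up, $2 = destination directory. Prints the backup path.
backup_with_timestamp() (
    src=$1
    dest_dir=$2
    stamp=$(date +%Y%m%d_%H%M%S)
    target="$dest_dir/$(basename "$src").$stamp"
    mkdir -p "$dest_dir" && cp -p "$src" "$target" && echo "$target"
)

# Example call (commented out so that sourcing this file changes nothing):
# backup_with_timestamp /etc/sysctl.conf /home/oracle/pst
```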

 

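II. Configuration changes

In a cluster environment, stop the CRS cluster services first, because changing the network configuration while CRS is up will hang CRS. Changing the system parameter requires a database restart.

Shut down in order: databases first, then the cluster (three databases in total: syntong1, carddb1, urpdb1)

1. Stop the databases: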
#su - grid
#srvctl stop instance -d syntong -i syntong1
#srvctl stop instance -d syntong -i syntong2
#srvctl stop instance -d carddb -i carddb1
#srvctl stop instance -d carddb -i carddb2
#srvctl stop instance -d urpdb -i urpdb1
#srvctl stop instance -d urpdb -i urpdb2
#which crsctl
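
With three databases and two instances each, the command list is easy to mistype. The sketch below only prints the srvctl commands so they can be reviewed before being run. The helper name `instance_cmds` is hypothetical, and it assumes two instances per database named <db>1 and <db>2, as in this cluster.

```shell
# Sketch: generate the srvctl commands for every instance of every database.
# instance_cmds is a hypothetical helper; it only prints, it runs nothing.

# $1 = action (stop or start); remaining arguments = database names.
# Assumes two instances per database, named <db>1 and <db>2.
instance_cmds() (
    action=$1
    shift
    for db in "$@"; do
        for n in 1 2; do
            echo "srvctl $action instance -d $db -i $db$n"
        done
    done
)

instance_cmds stop syntong carddb urpdb
```

Calling it with start instead of stop gives the matching list for bringing the instances back up.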
 
2. Stop the cluster:
#su - root
#cd /u01/app/grid/product/11.2.0/db_1/bin
#./crsctl stop cluster -all
 

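Node 1: (node 1 is the one reporting the ORA-2730* errors, so the changes are made on node 1)

3. Change the network configuration:

Edit the configuration file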

#vi /etc/sysconfig/network-scripts/ifcfg-lo

MTU=16436

Restart the network service

# systemctl restart network
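
Editing ifcfg-lo by hand can leave two MTU lines in the file. As a sketch, the hypothetical helper `set_config_value` below replaces an existing KEY= line or appends one if none exists, so it can be rerun safely. It assumes a simple KEY=VALUE file that ends with a newline.

```shell
# Sketch: set KEY=VALUE in a shell-style config file, replacing an existing
# KEY line or appending one. set_config_value is a hypothetical helper name.

# $1 = file, $2 = key, $3 = value.
set_config_value() (
    file=$1 key=$2 value=$3
    if grep -q "^$key=" "$file"; then
        # Rewrite through a temporary file, then copy the content back so
        # the original file keeps its owner and permissions.
        tmp=$(mktemp) &&
        sed "s|^$key=.*|$key=$value|" "$file" > "$tmp" &&
        cat "$tmp" > "$file" &&
        rm -f "$tmp"
    else
        echo "$key=$value" >> "$file"
    fi
)

# Example call (commented out; run as root with cluster services stopped):
# set_config_value /etc/sysconfig/network-scripts/ifcfg-lo MTU 16436
```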

 

4. Change the system parameter:

Set vm.min_free_kbytes to 0.4% of physical memory

This server has 131357180 KB of memory, so the value is 131357180 * 0.4% ≈ 525429

#vi /etc/sysctl.conf

vm.min_free_kbytes = 525429

Apply the change

#sysctl -p
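
To confirm that the running kernel picked up the new value, the setting in the file can be compared with /proc/sys/vm/min_free_kbytes. This is a minimal sketch. The helper name `conf_value` is hypothetical, and it only handles plain "key = value" lines, ignoring commented-out ones.

```shell
# Sketch: compare a key in sysctl.conf with the value the kernel is using.
# conf_value is a hypothetical helper name.

# $1 = config file, $2 = key. Prints the last uncommented value for the key.
conf_value() (
    awk -F= -v key="$2" '
        { k = $1; v = $2
          gsub(/[ \t]/, "", k); gsub(/[ \t]/, "", v)
          if (k == key) last = v }
        END { if (last != "") print last }
    ' "$1"
)

conf_file=/etc/sysctl.conf
live_file=/proc/sys/vm/min_free_kbytes
if [ -r "$conf_file" ] && [ -r "$live_file" ]; then
    conf=$(conf_value "$conf_file" vm.min_free_kbytes)
    live=$(cat "$live_file")
    if [ "$conf" = "$live" ]; then
        echo "vm.min_free_kbytes in effect: $live"
    else
        echo "mismatch: sysctl.conf=${conf:-unset} running=$live"
    fi
fi
```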

Start up in order: cluster first, then the databases (three databases in total: syntong1, carddb1, urpdb1)

5. Start the cluster:
#su - root
#cd /u01/app/grid/product/11.2.0/db_1/bin
#./crsctl start cluster -all
 
6. Start the databases:
#su - grid
#srvctl start instance -d syntong -i syntong1
#srvctl start instance -d syntong -i syntong2
#srvctl start instance -d carddb -i carddb1
#srvctl start instance -d carddb -i carddb2
#srvctl start instance -d urpdb -i urpdb1
#srvctl start instance -d urpdb -i urpdb2

Check the status of the cluster and the databases

#su - grid

#crs_stat -t

#su - oracle

#sqlplus / as sysdba

SQL>select INST_ID,INSTANCE_NUMBER,INSTANCE_NAME,STATUS,DATABASE_STATUS,INSTANCE_ROLE from gv$Instance;

III. Follow-up monitoring
Check whether the ORA-2730* errors continue to appear:
#su - oracle
#adrci
adrci>show problem
adrci>show incident
#cd $ORACLE_BASE/diag/rdbms/syntong/syntong1/trace
#tail -f alert_syntong1.log
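
Watching tail -f works for a few minutes. For a check over several days, a count of the error lines is easier to compare. The sketch below uses a hypothetical helper, `count_ora_errors`, that counts lines matching ORA-27300 through ORA-27302 in a log file. The default pattern is my own choice and can be overridden.

```shell
# Sketch: count alert-log lines that contain the ORA-2730x errors.
# count_ora_errors is a hypothetical helper name.

# $1 = log file, $2 = optional extended regex (default: ORA-27300..27302).
count_ora_errors() (
    pattern=${2:-'ORA-2730[0-2]'}
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c -E "$pattern" "$1" || true
)

# Example call (commented out; the path is the trace directory used above):
# count_ora_errors "$ORACLE_BASE/diag/rdbms/syntong/syntong1/trace/alert_syntong1.log"
```

A count that stays flat after the change suggests the errors have stopped. The adrci incident list remains the more reliable record.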
 
 
