How to analyze a vmcore with crash: the basic approach, case 1

Viewing the kernel log with dmesg (inside crash, the log command prints the same messages)

[2493382.671020] systemd-shutdown[1]: Sending SIGKILL to PID 28975 (docker-containe).
[2493382.671078] systemd-shutdown[1]: Sending SIGKILL to PID 29015 (systemd).
[2493420.208723] EXT4-fs (nvme0n1p1): sb orphan head is 140906170
[2493420.209198] sb_info orphan list:
[2493420.209663] inode nvme0n1p1:140906170 at ffff88490edabfb8: mode 100666, nlink 0, next 149423507
[2493420.210129] inode nvme0n1p1:149423507 at ffff8801b99391a8: mode 100666, nlink 0, next 17567381
[2493420.210583] inode nvme0n1p1:17567381 at ffff8806d4a26998: mode 100744, nlink 0, next 17570510
[2493420.211050] inode nvme0n1p1:17570510 at ffff886387f82ef8: mode 100644, nlink 0, next 17570503
[2493420.211508] inode nvme0n1p1:17570503 at ffff886a1f15bfb8: mode 100644, nlink 0, next 241700498
[2493420.211966] inode nvme0n1p1:241700498 at ffff8877481800e8: mode 100644, nlink 0, next 243138756
[2493420.212431] inode nvme0n1p1:243138756 at ffff88761ad10518: mode 100644, nlink 0, next 241565954
[2493420.212900] inode nvme0n1p1:241565954 at ffff8870d64bbfb8: mode 100755, nlink 0, next 241566333
[2493420.213366] inode nvme0n1p1:241566333 at ffff88721ae74c48: mode 100644, nlink 0, next 241050093
[2493420.213833] inode nvme0n1p1:241050093 at ffff887704958948: mode 100755, nlink 0, next 241567324
[2493420.214545] ------------[ cut here ]------------
[2493420.219336] kernel BUG at fs/ext4/super.c:879! <<<====== the source location of the BUG
[2493420.223948] invalid opcode: 0000 [#1] SMP
[2493420.228133] Modules linked in: kpatch_D751550(OE) kpatch_D631237(OE) unix_diag(E) af_packet_diag(E) netlink_diag(E) dccp_diag(E) dccp(E) tcp_diag(E) udp_diag(E) inet_diag(E) [last unloaded: aisqos_hotfixes]
[2493420.246846] CPU: 58 PID: 1 Comm: systemd-shutdow Tainted: G W OE K 4.9.79-009.ali3000.alios7.x86_64 #1
[2493420.257009] Hardware name: Inventec AliServer Thor01-2U /TB800G4-G1 , BIOS A1.20 03/06/2018
[2493420.267339] task: ffff887e45918000 task.stack: ffffc90000014000
[2493420.273425] RIP: 0010:[<ffffffffa031a8df>] [<ffffffffa031a8df>] ext4_put_super+0x36f/0x3c0 [ext4] <<<======= the code location of the BUG
[2493420.282593] RSP: 0018:ffffc90000017de8 EFLAGS: 00010206
[2493420.288079] RAX: ffff88490edabf50 RBX: ffff887e43299000 RCX: 00000001949b336d
[2493420.295384] RDX: 0000000000000000 RSI: 0000000000000206 RDI: 0000000000000206
[2493420.302682] RBP: ffffc90000017e18 R08: 00000000000081a4 R09: 0000000000000000
[2493420.309988] R10: 0000000000000cb8 R11: 0000000000001e92 R12: ffff887e43299278
[2493420.317293] R13: ffff887e43298800 R14: ffff887e43299278 R15: ffffffffa034ff88
[2493420.324598] FS: 00007f3241ccf840(0000) GS:ffff887e78480000(0000) knlGS:0000000000000000
[2493420.332850] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2493420.338767] CR2: 00007f5e1372fbd0 CR3: 00000004daa52000 CR4: 00000000007606f0
[2493420.346065] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2493420.353361] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[2493420.360660] PKRU: 55555554
[2493420.363536] Stack:
[2493420.365721] 9cbae75a00000000 ffff887e43298800 ffffffffa034a5e0 ffff887e3818c7b8
[2493420.373365] 0000000000000000 ffff887e45918bb0 ffffc90000017e38 ffffffff81244aaf
[2493420.380991] 0000000000000083 ffff887e357b8680 ffffc90000017e58 ffffffff81244e37
[2493420.388617] Call Trace:
[2493420.391239] [<ffffffff81244aaf>] generic_shutdown_super+0x6f/0x100
[2493420.397676] [<ffffffff81244e37>] kill_block_super+0x27/0x70
[2493420.403508] [<ffffffff81244f73>] deactivate_locked_super+0x43/0x70
[2493420.409945] [<ffffffff8124547a>] deactivate_super+0x5a/0x60
[2493420.415770] [<ffffffff81264b2f>] cleanup_mnt+0x3f/0x90
[2493420.421169] [<ffffffff81264bc2>] __cleanup_mnt+0x12/0x20
[2493420.426733] [<ffffffff810a7b50>] task_work_run+0x80/0xa0
[2493420.432306] [<ffffffff810032ba>] exit_to_usermode_loop+0xaa/0xb0
[2493420.438572] [<ffffffff81003baa>] syscall_return_slowpath+0xaa/0xb0
[2493420.445011] [<ffffffff8171a783>] entry_SYSCALL_64_fastpath+0xc3/0xc5
[2493420.451623] Code: 60 04 00 00 48 8b 80 e0 00 00 <0f> 0b 49 c7 c7 88 ff 34 a0 49 8b
[2493420.459829] RIP [<ffffffffa031a8df>] ext4_put_super+0x36f/0x3c0 [ext4]
[2493420.466633] RSP <ffffc90000017de8>
crash>

From the kernel log, the buggy code can be located in two ways:

1. [2493420.219336] kernel BUG at fs/ext4/super.c:879!

2. [2493420.273425] RIP: 0010:[<ffffffffa031a8df>]  [<ffffffffa031a8df>] ext4_put_super+0x36f/0x3c0 [ext4]
Here, 0x36f is the byte offset from the entry of ext4_put_super, and 0x3c0 is the size of the whole function in bytes.

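Using method 2 to find the exact crash location. Note that 0x36f is a byte offset into the function; gdb's p command merely converts it to decimal:

(gdb) p 0x36f
$11 = 879

That this equals the BUG line number (super.c:879) is a coincidence. A byte offset into the machine code is unrelated to source line numbers. The line has to be resolved from the debug information, for example with dis -l or l as shown below.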

Disassemble the function to find the location:

crash> dis -l ext4_put_super

Viewing the source code in crash

crash can display source code by itself, but only after the module's debug information has been loaded. For example:

Load the symbols and debug information of the ext4 module:

crash> mod -s ext4
crash> mod <<---- lists all loaded modules

Show the source around line 879:

crash> l *ext4_put_super+0x36f
0xffffffffa031a8df is in ext4_put_super (fs/ext4/super.c:879).
874 * isn't empty. The on-disk one can be non-empty if we've
875 * detected an error and taken the fs readonly, but the
876 * in-memory list had better be clean by this point. */
877 if (!list_empty(&sbi->s_orphan))
878 dump_orphan_list(sb, sbi);
879 J_ASSERT(list_empty(&sbi->s_orphan));
880
881 sync_blockdev(sb->s_bdev);
882 invalidate_bdev(sb->s_bdev);
883 if (sbi->journal_bdev && sbi->journal_bdev != sb->s_bdev) {

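Only once we have found the exact code can we go on to analyze why it crashed. For example, what values did this function's arguments (often pointers to some struct) actually hold?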

Printing the stack with bt

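In the bt output, [exception RIP: ext4_put_super+879] also shows that the crash occurred in ext4_put_super. crash prints this offset in decimal, so +879 is the same byte offset as +0x36f in the oops; it is not a line number.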

crash> bt
PID: 1 TASK: ffff887e45918000 CPU: 58 COMMAND: "systemd-shutdow"
#0 [ffffc90000017a58] machine_kexec at ffffffff810603e8
#1 [ffffc90000017ab8] __crash_kexec at ffffffff811211cd
#2 [ffffc90000017b80] __crash_kexec at ffffffff811212a5
#3 [ffffc90000017b98] crash_kexec at ffffffff811212eb
#4 [ffffc90000017bb8] oops_end at ffffffff81030905
#5 [ffffc90000017be0] die at ffffffff81030ddb
#6 [ffffc90000017c10] do_trap at ffffffff8102df02
#7 [ffffc90000017c60] do_error_trap at ffffffff8102e2d9
#8 [ffffc90000017d20] do_invalid_op at ffffffff8102e830
#9 [ffffc90000017d30] invalid_op at ffffffff8171b63e
[exception RIP: ext4_put_super+879]
RIP: ffffffffa031a8df RSP: ffffc90000017de8 RFLAGS: 00010206
RAX: ffff88490edabf50 RBX: ffff887e43299000 RCX: 00000001949b336d
RDX: 0000000000000000 RSI: 0000000000000206 RDI: 0000000000000206
RBP: ffffc90000017e18 R8: 00000000000081a4 R9: 0000000000000000
R10: 0000000000000cb8 R11: 0000000000001e92 R12: ffff887e43299278
R13: ffff887e43298800 R14: ffff887e43299278 R15: ffffffffa034ff88
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffffc90000017de0] ext4_put_super at ffffffffa031a91c [ext4]
#11 [ffffc90000017e20] generic_shutdown_super at ffffffff81244aaf
#12 [ffffc90000017e40] kill_block_super at ffffffff81244e37
#13 [ffffc90000017e60] deactivate_locked_super at ffffffff81244f73
#14 [ffffc90000017e80] deactivate_super at ffffffff8124547a
#15 [ffffc90000017e98] cleanup_mnt at ffffffff81264b2f
#16 [ffffc90000017eb0] __cleanup_mnt at ffffffff81264bc2
#17 [ffffc90000017ec0] task_work_run at ffffffff810a7b50
#18 [ffffc90000017f00] exit_to_usermode_loop at ffffffff810032ba
#19 [ffffc90000017f30] syscall_return_slowpath at ffffffff81003baa
#20 [ffffc90000017f50] entry_SYSCALL_64_fastpath at ffffffff8171a783
RIP: 00007f3241195c47 RSP: 00007fffb3db5438 RFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000560b87fbd920 RCX: 00007f3241195c47
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000560b87fbdd10
RBP: 0000560b87fbda00 R8: 0000000000000000 R9: 00007f32410e416d
R10: 0000000000000021 R11: 0000000000000246 R12: 0000560b87fbdd10
R13: 00007fffb3db5538 R14: 00007fffb3db5523 R15: 0000000000000000
ORIG_RAX: 00000000000000a6 CS: 0033 SS: 002b
crash>

Disassembling the caller and the callee

Once we have found the exact faulting line, the next step is to analyze the arguments that were passed in and the structs they point to.

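First, look at the prototype of ext4_put_super: static void ext4_put_super(struct super_block *sb). It takes a single argument, a pointer to struct super_block. So what is the value of the sb pointer? It must have been passed in by the caller, generic_shutdown_super.

The key question, then, is which pointer generic_shutdown_super passed to ext4_put_super in the call that returns to ffffffff81244aaf (the return address shown in the call trace).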

To answer it, first disassemble generic_shutdown_super and find the address ffffffff81244aaf:

crash> dis -l generic_shutdown_super
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/super.c: 436
0xffffffff81244aa0 <generic_shutdown_super+96>: mov 0x30(%r12),%rax
0xffffffff81244aa5 <generic_shutdown_super+101>: test %rax,%rax
0xffffffff81244aa8 <generic_shutdown_super+104>: je 0xffffffff81244aaf <generic_shutdown_super+111>
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/super.c: 437
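0xffffffff81244aaa <generic_shutdown_super+106>: mov %rbx,%rdi <=== the first argument (rdi) is copied from rbx, so both hold the same value, sb
0xffffffff81244aad <generic_shutdown_super+109>: callq *%rax <=== the next function (ext4_put_super) is called here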
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/include/linux/compiler.h: 243
0xffffffff81244aaf <generic_shutdown_super+111>: mov 0x608(%rbx),%rax
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/super.c: 439
0xffffffff81244ab6 <generic_shutdown_super+118>: lea 0x608(%rbx),%rdx
0xffffffff81244abd <generic_shutdown_super+125>: cmp %rax,%rdx
0xffffffff81244ac0 <generic_shutdown_super+128>: jne 0xffffffff81244b1f <generic_shutdown_super+223>

Next, disassemble ext4_put_super. Its prologue pushes a number of registers onto the stack:

crash> dis -l ext4_put_super
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/ext4/super.c: 824
0xffffffffa031a570 <ext4_put_super>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffa031a575 <ext4_put_super+5>: push %rbp
0xffffffffa031a576 <ext4_put_super+6>: mov %rsp,%rbp
0xffffffffa031a579 <ext4_put_super+9>: push %r15 <=== 1st register pushed
0xffffffffa031a57b <ext4_put_super+11>: push %r14 <=== 2nd register pushed
0xffffffffa031a57d <ext4_put_super+13>: push %r13 <=== 3rd register pushed
0xffffffffa031a57f <ext4_put_super+15>: push %r12 <=== 4th register pushed
0xffffffffa031a581 <ext4_put_super+17>: mov %rdi,%r13 <=== sb (the first argument) is also copied into r13
0xffffffffa031a584 <ext4_put_super+20>: push %rbx <=== 5th register pushed (rbx still holds the value the caller set, so the value saved here is the first argument of ext4_put_super, the sb pointer)
0xffffffffa031a585 <ext4_put_super+21>: sub $0x8,%rsp
0xffffffffa031a589 <ext4_put_super+25>: mov 0x460(%rdi),%rbx
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/ext4/super.c: 826
0xffffffffa031a590 <ext4_put_super+32>: mov 0xe0(%rbx),%r14
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/ext4/super.c: 830
0xffffffffa031a597 <ext4_put_super+39>: callq 0xffffffffa03133f0 <ext4_unregister_li_request>
crash> bt -f
#10 [ffffc90000017de0] ext4_put_super at ffffffffa031a91c [ext4]
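ffffc90000017de8: 9cbae75a00000000(slot reserved by sub $0x8,%rsp) ffff887e43298800(value of the 5th register, rbx)
ffffc90000017df8: ffffffffa034a5e0(value of the 4th register, r12) ffff887e3818c7b8(value of the 3rd register, r13)
ffffc90000017e08: 0000000000000000(value of the 2nd register, r14) ffff887e45918bb0(value of the 1st register, r15)
ffffc90000017e18: ffffc90000017e38 ffffffff81244aaf(not among the five registers: the saved rbp and the return address into generic_shutdown_super)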
#11 [ffffc90000017e20] generic_shutdown_super at ffffffff81244aaf
ffffc90000017e28: 0000000000000083 ffff887e357b8680
ffffc90000017e38: ffffc90000017e58 ffffffff81244e37
crash> struct super_block ffff887e43298800
struct super_block {
s_list = {
next = 0xffffffff81cb3db0 <super_blocks>, <======= this confirms that ffff887e43298800 really is a struct super_block: s_list links into the global super_blocks list
prev = 0xffff887e43968800
},
s_dev = 271581185,
s_blocksize_bits = 12 '\f',
s_blocksize = 4096,
s_maxbytes = 17592186040320,
s_type = 0xffffffffa03589c0 <ext4_fs_type>,
s_op = 0xffffffffa034a5e0 <ext4_sops>,
dq_op = 0xffffffffa034a720 <ext4_quota_operations>,
s_qcop = 0xffffffff81843f60 <dquot_quotactl_sysfile_ops>,
s_export_op = 0xffffffffa034a580 <ext4_export_ops>,
s_flags = 805371904,
s_iflags = 1,
s_magic = 61267,
s_root = 0x0,
s_umount = {
count = {
counter = -4294967295
},
wait_list = {
next = 0xffff887e43298878,
prev = 0xffff887e43298878
},
wait_lock = {
raw_lock = {
val = {
counter = 0
}
}

References

https://blog.csdn.net/u013982161/article/details/51347944
