View Issue Details

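ID: 0000017
Project: Anolis OS 8
Category: kernel
View Status: public
Last Update: 2021-07-23 10:29
Reporter: wb-zmy745940
Assigned To: cherryliyumei
Priority: low
Severity: minor
Reproducibility: always
Status: assigned
Resolution: open
Platform: x86_64
OS: Anolis OS
OS Version: 8
Product Version: 8.2-rc1
Target Version: 8.2 official release
Summary: 0000017: [Anolis 8.2-RC1-4.18-x86] With kdump configured as crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M, the system does not come back up after a crash is triggered with echo c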
Description:
[Defect description]:

With kdump configured as crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M, the system does not come back up after a crash is triggered with echo c > /proc/sysrq-trigger.


# cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-193.el8.x86_64 root=/dev/mapper/ao-root ro mem=0 crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M resume=/dev/mapper/ao-swap rd.lvm.lv=ao/root rd.lvm.lv=ao/swap rhgb quiet
# cat /sys/kernel/kexec_crash_size
268435456
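
The reported kexec_crash_size is consistent with the range syntax on the command line. As a minimal sketch (not the kernel's own parser), the following assumes the first range whose start is at or below the amount of system RAM and whose end is above it wins, ignores any optional @offset, and only handles the K/M/G/T suffixes. The function names and the 16 GiB figure (rounded up from the MemTotal shown in this report) are assumptions made for illustration.

```python
import re

# Binary size suffixes handled by this sketch (the kernel accepts more).
UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text):
    """Convert a size such as '192M' or '2G' to bytes."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"bad size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def reserved_bytes(spec, system_ram):
    """Return the reservation selected for system_ram bytes, or 0 if no range matches."""
    for entry in spec.split(","):
        range_part, size_part = entry.split(":")
        start_text, end_text = range_part.split("-")
        start = parse_size(start_text)
        # An empty end, as in '8G-', means the range is open-ended.
        end = parse_size(end_text) if end_text else None
        if system_ram >= start and (end is None or system_ram < end):
            return parse_size(size_part)
    return 0


SPEC = "0M-2G:0M,2G-8G:192M,8G-:256M"
# Assumed RAM size: 16 GiB, which falls in the open-ended '8G-' range.
print(reserved_bytes(SPEC, 16 << 30))  # 268435456
```

Under these assumptions a 16 GiB guest lands in the 8G- range, so 256M (268435456 bytes) is reserved, matching the value read from /sys/kernel/kexec_crash_size.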


[Reproducibility]

Always



[Reproduction environment]

Host: virtual machine, x86

OS: Anolis OS release 8.2

kernel: 4.18.0-193.el8.x86_64



# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           2
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
Stepping:            7
CPU MHz:             2500.000
BogoMIPS:            5000.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_vnni



# cat /proc/meminfo
MemTotal:       16254768 kB
MemFree:        13529776 kB
MemAvailable:   15595552 kB
Buffers:            3268 kB
Cached:          2296212 kB
SwapCached:            0 kB
Active:           860072 kB
Inactive:        1600944 kB
Active(anon):     153496 kB
Inactive(anon):    24792 kB
Active(file):     706576 kB
Inactive(file):  1576152 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       1888252 kB
SwapFree:        1888252 kB
Dirty:                96 kB
Writeback:             0 kB
AnonPages:        156916 kB
Mapped:           170500 kB
Shmem:             26580 kB
KReclaimable:      72452 kB
Slab:             178440 kB
SReclaimable:      72452 kB
SUnreclaim:       105988 kB
KernelStack:        4292 kB
PageTables:         8784 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    10015636 kB
Committed_AS:    1355408 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:             1808 kB
HardwareCorrupted:     0 kB
AnonHugePages:     36864 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      147328 kB
DirectMap2M:     5095424 kB
DirectMap1G:    13631488 kB


[Steps to reproduce]:

1. Install the required packages and restart the kdump service

yum -y install crash kexec-tools

systemctl restart kdump

2. Edit the grub file to configure crashkernel, then reboot

sed -i 's/\(^GRUB_CMDLINE_LINUX=.*\)\(crashkernel\)=[^ ]\+ /\1 mem=0 \2=0M-2G:0M,2G-8G:192M,8G-:256M /' /etc/default/grub

grub2-mkconfig -o /boot/grub2/grub.cfg

shutdown -r now

3. Check whether the crashkernel memory was reserved successfully

cat /sys/kernel/kexec_crash_size

4. Trigger a crash

echo c >/proc/sysrq-trigger

5. Check whether the system generated a vmcore

ll /var/crash

6. Parse the vmcore with crash
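
To see what the sed expression in step 2 does to a GRUB_CMDLINE_LINUX line, it can be mirrored with Python's re module. This is only an illustration: the sample line below is hypothetical and is not taken from this system's /etc/default/grub, and the Python pattern is a translation of the sed basic regular expression rather than the command itself.

```python
import re

# Python translation of the sed pattern and replacement used in step 2.
PATTERN = r"(^GRUB_CMDLINE_LINUX=.*)(crashkernel)=[^ ]+ "
REPLACEMENT = r"\1 mem=0 \2=0M-2G:0M,2G-8G:192M,8G-:256M "


def rewrite(line):
    """Apply the substitution once, as sed does without the g flag."""
    return re.sub(PATTERN, REPLACEMENT, line, count=1)


# Hypothetical input line, assumed for illustration only.
sample = 'GRUB_CMDLINE_LINUX="crashkernel=auto resume=/dev/mapper/ao-swap rhgb quiet"'
print(rewrite(sample))

# The pattern requires a space after the crashkernel value, so a line where
# crashkernel is the last parameter is left unchanged.
last_param = 'GRUB_CMDLINE_LINUX="rhgb quiet crashkernel=auto"'
print(rewrite(last_param) == last_param)  # True
```

The second case is worth checking before rebooting: if the substitution does not match, the old crashkernel value stays in place and /proc/cmdline will show it after the restart.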



[Expected result]:

After the crash is triggered, a vmcore is generated and crash can parse it normally.



[Actual result]:

After a crash is triggered with echo c >/proc/sysrq-trigger, the system does not come back up.



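[Root cause analysis]:

With crashkernel set to 256M, the system does not come back up after the crash is triggered.

With crashkernel=0M-2G:0M,2G-8G:192M,8G-:512M, the system comes back up after the crash is triggered and a vmcore is generated normally: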

# cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-193.el8.x86_64 root=/dev/mapper/ao-root ro mem=0 crashkernel=0M-2G:0M,2G-8G:192M,8G-:512M resume=/dev/mapper/ao-swap rd.lvm.lv=ao/root rd.lvm.lv=ao/swap rhgb quiet

# cat /sys/kernel/kexec_crash_size
536870912
# cd /var/crash/127.0.0.1-2021-03-10-15\:39\:12/
# ll
total 111680
-rw-------. 1 root root 114313580 Mar 11 04:39 vmcore
-rw-r--r--. 1 root root     40970 Mar 11 04:39 vmcore-dmesg.txt





[Suggested fix]:
Tags: No tags attached.

Activities

wb-wpp899309

2021-04-13 15:13

reporter   ~0000068

The rc2 kernel (4.18.0-193.el8.x86_64) has the same problem.
1. Boot parameters:
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-193.el8.x86_64 root=/dev/mapper/ao-root ro crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M resume=/dev/mappe/ao-swap rd.lvm.lv=ao/root rd.lvm.lv=ao/swap rhgb quiet
2. Memory:
# cat /proc/meminfo
MemTotal: 16156464 kB
MemFree: 15475960 kB
MemAvailable: 15596584 kB
Buffers: 3332 kB
Cached: 368824 kB
SwapCached: 0 kB
Active: 204508 kB
Inactive: 287424 kB
Active(anon): 120848 kB
Inactive(anon): 17332 kB
Active(file): 83660 kB
Inactive(file): 270092 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 528380 kB
SwapFree: 528380 kB
Dirty: 108 kB
Writeback: 0 kB
AnonPages: 115728 kB
Mapped: 112824 kB
Shmem: 18408 kB
KReclaimable: 40100 kB
Slab: 116172 kB
SReclaimable: 40100 kB
SUnreclaim: 76072 kB
KernelStack: 4288 kB
PageTables: 8380 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 8606612 kB
Committed_AS: 997084 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
Percpu: 1568 kB
HardwareCorrupted: 0 kB
AnonHugePages: 26624 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 114560 kB
DirectMap2M: 4079616 kB
DirectMap1G: 14680064 kB

xuehaolin

2021-04-16 16:10

developer   ~0000089

Last edited: 2021-04-16 16:14

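Our kernel currently does not support the crashkernel=auto parameter, so the crashkernel value has to be configured manually, and the appropriate value depends on the amount of system memory.

With crashkernel set to 256M, the system does not come back up after a crash is triggered because the crashkernel value is too small.
With crashkernel=0M-2G:0M,2G-8G:192M,8G-:512M, the system comes back up after a crash is triggered and a vmcore is generated normally.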
# cat /sys/kernel/kexec_crash_size
536870912
This shows that kexec_crash_size has been set to 512M.
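
The byte counts reported by /sys/kernel/kexec_crash_size in this issue map to the configured sizes as follows. This is a small arithmetic sketch; the helper name is made up for illustration.

```python
def to_mib(size_bytes):
    """Convert a byte count to whole mebibytes (1 MiB = 1048576 bytes)."""
    return size_bytes // (1 << 20)


# The two values read from /sys/kernel/kexec_crash_size in this report.
for reading in (268435456, 536870912):
    print(reading, "bytes =", to_mib(reading), "MiB")
```

The first value corresponds to the failing 256M setting and the second to the working 512M setting.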

Could the kernel SIG team confirm whether this needs to be supported?

jacobwang

2021-05-06 10:44

manager   ~0000132

This issue is low priority and does not block the release.

wb-wpp899309

2021-06-30 15:04

reporter   ~0000294

# uname -r
4.18.0-305.an8.x86_64
[root@VM20210305-3 wb-wpp899309]# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.4"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.4"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.4"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"
The same problem occurs here.

geliwei-ali

2021-07-23 10:29

manager   ~0000328

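The default parameter for RHCK is crashkernel=auto. If the test case reserves 256M on systems with more than 8G of memory, we need to confirm whether 256M is a reasonable value, and whether 512M is sufficient after the change (for example, on a system with 32G of memory).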

Issue History

Date Modified | Username | Field | Change
2021-03-19 11:16 | wb-zmy745940 | New Issue |
2021-03-19 11:16 | wb-zmy745940 | Status | new => assigned
2021-03-19 11:16 | wb-zmy745940 | Assigned To | => geliwei-ali
2021-03-19 11:39 | wb-zmy745940 | Category | kdump-anaconda-addon => kexec-tools
2021-03-19 22:44 | swordantcs | Product Version | 8.2 official release => 8.2-rc1
2021-03-19 22:45 | swordantcs | Target Version | => 8.2-rc2
2021-03-26 10:19 | jacobwang | Assigned To | geliwei-ali => xuehaolin
2021-04-02 03:22 | swordantcs | Category | kexec-tools => (no category)
2021-04-02 12:21 | swordantcs | Category | (no category) => kernel
2021-04-13 15:13 | wb-wpp899309 | Note Added: 0000068 |
2021-04-16 16:10 | xuehaolin | Note Added: 0000089 |
2021-04-16 16:14 | xuehaolin | Note Edited: 0000089 |
2021-04-26 10:31 | xuehaolin | Assigned To | xuehaolin => cherryliyumei
2021-05-06 10:43 | jacobwang | Priority | normal => low
2021-05-06 10:44 | jacobwang | Note Added: 0000132 |
2021-05-06 11:02 | jacobwang | Target Version | 8.2-rc2 => 8.2 official release
2021-06-30 15:04 | wb-wpp899309 | Note Added: 0000294 |
2021-07-23 10:29 | geliwei-ali | Note Added: 0000328 |