View Issue Details

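ID: 0000509
Project: Anolis OS 8
Category: kernel
View Status: public
Last Update: 2021-11-29 16:00
Reporter: wb-wpp899309
Assigned To: fghuims
Priority: high
Severity: major
Reproducibility: always
Status: assigned
Resolution: open
Platform: aarch64
OS: Anolis OS
OS Version: 8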
Summary: 0000509: [Anolis OS 8.2][4.19.91-25.rc1] [aarch64] kernel crash: Unable to handle kernel NULL pointer dereference at virtual address 000000
Description:
[Defect description]:
The system crashes immediately when the stress-ng stress test is run.

Partial vmcore output:
[ 1531.077555] mmap: stress-ng (3661) uses deprecated remap_file_pages() syscall. See Documentation/vm/remap_file_pages.rst.
[ 1532.141477] Unable to handle kernel NULL pointer dereference at virtual address 00000000000006d8
[ 1532.150415] Mem abort info:
[ 1532.153247] ESR = 0x96000006
[ 1532.156337] Exception class = DABT (current EL), IL = 32 bits
[ 1532.162339] SET = 0, FnV = 0
[ 1532.165424] EA = 0, S1PTW = 0
[ 1532.168791] Data abort info:
[ 1532.171894] ISV = 0, ISS = 0x00000006
[ 1532.175968] CM = 0, WnR = 0
[ 1532.178965] user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000021b47eb8
[ 1532.185675] [00000000000006d8] pgd=000020a9fe92b003, pud=000020a9fe92c003, pmd=0000000000000000
[ 1532.194508] Internal error: Oops: 96000006 [#1] SMP
[ 1532.199450] Modules linked in: unix_diag(E+) tun(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) bonding(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nf_tables_set(E) nft_chain_nat_ipv6(E) nf_nat_ipv6(E) nft_chain_route_ipv6(E) nft_chain_nat_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) nft_chain_route_ipv4(E) ip6_tables(E) nft_compat(E) ip_set(E) nf_tables(E) nfnetlink(E) vfat(E) fat(E) ipmi_ssif(E) aes_ce_blk(E) crypto_simd(E) cryptd(E) aes_ce_cipher(E) mousedev(E) crc32_ce(E) crct10dif_ce(E) hibmc_drm(E) ghash_ce(E) ttm(E) sha2_ce(E) drm_kms_helper(E) sha256_arm64(E) sha1_ce(E) sbsa_gwdt(E) drm(E) hns_roce_hw_v2(E) hisi_sas_v3_hw(E) hisi_sas_main(E) hns_roce(E) fb_sys_fops(E) syscopyarea(E) sysfillrect(E)
[ 1532.272594] ib_core(E) ipmi_si(E) libsas(E) sysimgblt(E) ipmi_devintf(E) scsi_transport_sas(E) ipmi_msghandler(E) spi_dw_mmio(E) sch_fq_codel(E) ip_tables(E) sd_mod(E) sg(E) realtek(E) hns3(E) mlx5_core(E) ahci(E) nvme(E) i2c_designware_platform(E) nfit(E) libahci(E) i2c_designware_core(E) hclge(E) mlxfw(E) nvme_core(E) hnae3(E) libata(E) devlink(E) i2c_core(E) libnvdimm(E)
[ 1532.318940] Process stress-ng (pid: 3530, stack limit = 0x0000000074d061ef)
[ 1532.332537] CPU: 63 PID: 3530 Comm: stress-ng Kdump: loaded Tainted: G E 4.19.91-25.rc1.an8.aarch64 #1
[ 1532.349948] Hardware name: H3C R4960 G3/BC82AMDDA, BIOS 1.70 01/07/2021
[ 1532.363346] pstate: 80400009 (Nzcv daif +PAN -UAO)
[ 1532.374895] pc : is_mem_section_removable+0x70/0x1f8
[ 1532.386778] lr : show_mem_removable+0x90/0xd8
[ 1532.397659] sp : ffff00004cf8bc10
[ 1532.407363] x29: ffff00004cf8bc10 x28: 0000000000000001
[ 1532.419396] x27: ffff7e0000000000 x26: 0000000000040000
[ 1532.431226] x25: ffffa0a9efb62e00 x24: ffff0000095632e0
[ 1532.443006] x23: ffff00000959df60 x22: ffff000008d0b000
[ 1532.455008] x21: ffff0000091f9780 x20: ffffa0accfbbf000
[ 1532.466754] x19: 0000000000000001 x18: 0000000000000000
[ 1532.478463] x17: 0000000000000000 x16: 0000000000000000
[ 1532.490205] x15: 0000000000000000 x14: 0000000000000000
[ 1532.501953] x13: 0000000000000000 x12: 0000000000000000
[ 1532.513625] x11: 0000000000000000 x10: 0000000000000000
[ 1532.525193] x9 : ffff0000086e46d0 x8 : 0000000000000000
[ 1532.536789] x7 : 0000000000080000 x6 : 0000000000000680
[ 1532.548314] x5 : 0000000000000000 x4 : 0000000000000005
[ 1532.549907] sched: DL replenish lagged too much
[ 1532.570412] x3 : 0000000000000001 x2 : 0000000001000000
[ 1532.581842] x1 : ffff7e0001000000 x0 : 0000000000000001
[ 1532.593193] Call trace:
[ 1532.601623] is_mem_section_removable+0x70/0x1f8
[ 1532.612115] show_mem_removable+0x90/0xd8
[ 1532.621856] dev_attr_show+0x28/0x60
[ 1532.631043] sysfs_kf_seq_show+0x8c/0x160
[ 1532.640684] kernfs_seq_show+0x30/0x38
[ 1532.650232] seq_read+0x148/0x478
[ 1532.659094] kernfs_fop_read+0x2c/0x1e0
[ 1532.668493] __vfs_read+0x20/0x48
[ 1532.677371] vfs_read+0x98/0x168
[ 1532.686159] ksys_read+0x6c/0xe0
[ 1532.694933] __arm64_sys_read+0x20/0x28
[ 1532.704294] el0_svc_common.constprop.0+0xa8/0x200
[ 1532.714531] el0_svc_handler+0x30/0x80
[ 1532.723606] el0_svc+0x10/0x14
[ 1532.731795] Code: 8b030466 f8607aa8 8b060866 8b061d06 (f9402cc3)
[ 1532.742939] ---[ end trace 3dc4b79508ccd27a ]---
[ 1532.752705] Kernel panic - not syncing: Fatal exception
[ 1532.762683] SMP: stopping secondary CPUs
[ 1532.771222] Kernel Offset: disabled
[ 1532.779130] CPU features: 0x88,22200a38
[ 1532.787220] Memory Limit: none
[ 1532.796499] Starting crashdump kernel...
[ 1532.804315] Bye!
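
The fault address can be cross-checked against the register dump. The sketch below is not part of the original report: it decodes the ESR value and the faulting instruction word (the parenthesised word in the Code: line) with shell arithmetic. The function names esr_fields and ldr64_uoff_decode are hypothetical helpers. The bit layouts are assumptions taken from the Arm architecture: ESR_ELx has EC in bits [31:26], IL in bit 25 and ISS in bits [24:0] (for data aborts, WnR is bit 6 and DFSC is bits [5:0]), and the instruction is assumed to be the A64 LDR (immediate, unsigned offset, 64-bit) encoding.

```shell
# Hypothetical helper: split an AArch64 ESR_ELx value into the fields the oops prints.
# Assumed layout: EC = bits [31:26], IL = bit 25, ISS = bits [24:0];
# for data aborts, WnR = ISS bit 6 and DFSC = ISS bits [5:0].
esr_fields() (
    esr=$(( $1 ))
    printf 'EC=0x%x IL=%d ISS=0x%x WnR=%d DFSC=0x%02x\n' \
        $(( (esr >> 26) & 0x3f )) \
        $(( (esr >> 25) & 1 )) \
        $(( esr & 0x1ffffff )) \
        $(( (esr >> 6) & 1 )) \
        $(( esr & 0x3f ))
)

# Hypothetical helper: decode an A64 "LDR Xt, [Xn, #imm]" instruction word
# (64-bit load, unsigned scaled offset). Register number 31 would mean
# sp (base) or xzr (target); that case does not occur in this log.
ldr64_uoff_decode() (
    insn=$(( $1 ))
    # Bits [31:22] are 0b1111100101 (0x3e5) for this encoding only.
    if [ $(( insn >> 22 )) -ne $(( 0x3e5 )) ]; then
        echo "not LDR (64-bit, unsigned offset)" >&2
        exit 1
    fi
    # imm12 (bits [21:10]) is scaled by the 8-byte access size.
    printf 'ldr x%d, [x%d, #0x%x]\n' \
        $(( insn & 0x1f )) \
        $(( (insn >> 5) & 0x1f )) \
        $(( ((insn >> 10) & 0xfff) * 8 ))
)

esr_fields 0x96000006                 # prints: EC=0x25 IL=1 ISS=0x6 WnR=0 DFSC=0x06
ldr64_uoff_decode 0xf9402cc3          # prints: ldr x3, [x6, #0x58]
printf '0x%x\n' $(( 0x680 + 0x58 ))   # x6 from the register dump plus the offset; prints: 0x6d8
```

With the values from the log, the faulting instruction is a load from x6 + 0x58. x6 is 0x680 in the register dump, so the access lands at 0x6d8, which matches the reported fault address. EC 0x25 is a data abort taken at the current exception level, WnR=0 marks it as a read, and DFSC 0x06 is a level 2 translation fault, which agrees with pmd=0000000000000000 in the page table walk line. This suggests, but does not prove, that is_mem_section_removable computed its base pointer in x6 from a NULL or near-NULL value.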


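[Reproduction rate]:
Reproduced in both of the 2 runs so far. The system crashes immediately after the stress-ng command is executed.

[Reproduction environment]:
Kernel: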
4.19.91-25.rc1.an8.aarch64

# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.2"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.2"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.2"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.org/"

CPU information:
# lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 1
Core(s) per socket: 48
Socket(s): 2
NUMA node(s): 1
Vendor ID: 0x48
Model: 0
Stepping: 0x1
CPU max MHz: 2600.0000
CPU min MHz: 200.0000
BogoMIPS: 200.00
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
L3 cache: 24576K
NUMA node0 CPU(s): 0-95
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm

Memory information:
# free -h
              total        used        free      shared  buff/cache   available
Mem:          755Gi       3.5Gi       739Gi        10Mi        12Gi       748Gi
Swap:         2.0Gi          0B       2.0Gi

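[Expected result]:
The system stays up for the whole stress-ng stress run, with no crash.

[Actual result]:
The system crashes immediately after the stress-ng stress run starts.
Steps To Reproduce:
[Reproduction steps]:
1. Preparation
Configure the kernel parameters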
echo 1 > /proc/sys/kernel/panic
echo 1 > /proc/sys/kernel/hardlockup_panic
echo 1 > /proc/sys/kernel/softlockup_panic
echo 50 > /proc/sys/kernel/watchdog_thresh
echo 1200 > /proc/sys/kernel/hung_task_timeout_secs
echo 0 > /proc/sys/kernel/hung_task_panic

Mount the data disk
[ -d /disk1 ] || mkdir /disk1
wipefs -a --force /dev/nvme0n1p1 # in VM environments the device is more often /dev/vdb1
mkfs -t ext4 -q -F /dev/nvme0n1p1
mount -t ext4 /dev/nvme0n1p1 /disk1
 
Create the log directory
mkdir -p /disk1/tmpdir/stress-ng

2. Download and build stress-ng
git clone https://github.com/ColinIanKing/stress-ng.git
cd stress-ng
make
make install

3. Run the command
nohup stress-ng -a 1 -x softlockup,resources -t 72h --metrics --times --verify -v -Y /disk1/tmpdir/stress-ng/stress-statistic-12.yaml --log-file /disk1/tmpdir/stress-ng/stress-logfile-12.txt --temp-path /disk1/tmpdir/stress-ng/ &
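
The call trace in the description enters the kernel through a plain read() of a sysfs file and reaches show_mem_removable. In the mainline 4.19 sources that function is the show handler for the removable attribute of a memory block under /sys/devices/system/memory. A narrower check than the full stress-ng run could therefore be a loop that reads those attributes directly. The sketch below is an untested shortcut, not something the report tried: read_removable is a hypothetical helper, and whether a direct read triggers the same crash on this machine is unconfirmed. On an affected kernel it may panic the system, so it should only be run on a test machine with kdump configured.

```shell
# Hypothetical helper: read the "removable" attribute of every memory block.
# read_removable [DIR]   DIR defaults to the live sysfs memory directory.
read_removable() (
    dir=${1:-/sys/devices/system/memory}
    for f in "$dir"/memory*/removable; do
        [ -e "$f" ] || continue          # the glob matched nothing
        block=${f%/removable}            # strip the attribute name
        # Each read goes through the sysfs show handler for this attribute.
        echo "${block##*/} removable=$(cat "$f")"
    done
)
```

Calling read_removable with no argument walks the live sysfs tree and prints one line per memory block. Passing a directory lets the loop be tried against a copied or mock tree first.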

Tags: No tags attached.

Activities

Shiloong

2021-11-23 16:04

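developer   ~0000771

@wb-wpp899309 Does this occur only on the ARM64 architecture? Has it been tested in an Alinux2 environment?
@baolinwang Could you please arrange for someone to take a look? Thanks!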

fghuims

2021-11-29 16:00

developer   ~0000794

So far this cannot be reproduced in the test environment, and a related known patch has been added.

Issue History

Date Modified     Username      Field        Change
2021-11-10 15:29  wb-wpp899309  New Issue
2021-11-10 21:25  jacobwang     Assigned To  => Shiloong
2021-11-10 21:25  jacobwang     Status       new => assigned
2021-11-23 16:04  Shiloong      Assigned To  Shiloong => baolinwang
2021-11-23 16:04  Shiloong      Note Added: 0000771
2021-11-23 16:25  baolinwang    Assigned To  baolinwang => fghuims
2021-11-29 16:00  fghuims       Note Added: 0000794