View Issue Details

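ID: 0000622
Project: Anolis OS 8
Category: cloud kernel 5.10
View Status: public
Last Update: 2022-01-17 09:20
Reporter: kangwen429
Assigned To: (unassigned)
Priority: high
Severity: major
Reproducibility: random
Status: new
Resolution: open
Platform: x86_64
OS: Anolis OS
OS Version: 8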
Summary: 0000622: [Anolis 8.4-5.10-x86] After upgrading to kernel 5.10.84-10_rc2.an8.x86, stability testing crashes: watchdog: BUG: soft lockup RIP: 0010:rt_flush_dev+0x84/0xb0
Description: After upgrading to kernel 5.10.84-10_rc2.an8.x86, stability testing crashes the system with: watchdog: BUG: soft lockup RIP: 0010:rt_flush_dev+0x84/0xb0

An excerpt of the vmcore-dmesg log follows; see the attachment for the full log:

[10582.149998] Kernel panic - not syncing: softlockup: hung tasks
[10582.150683] CPU: 42 PID: 119122 Comm: kworker/u128:21 Kdump: loaded Tainted: G W EL 5.10.84-10_rc2.an8.x86_64 #1
[10582.151411] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 90210cb 04/01/2014
[10582.152116] Workqueue: netns cleanup_net
[10582.152746] Call Trace:
[10582.153333] <IRQ>
[10582.153921] dump_stack+0x57/0x6a
[10582.154520] panic+0x10d/0x2e9
[10582.155100] watchdog_timer_fn.cold.14+0xc/0x16
[10582.155699] ? report_softlockup+0x1a0/0x1a0
[10582.156286] __hrtimer_run_queues+0xf1/0x230
[10582.156876] hrtimer_interrupt+0x100/0x210
[10582.157433] __sysvec_apic_timer_interrupt+0x5d/0xd0
[10582.158000] asm_call_irq_on_stack+0xf/0x20
[10582.158548] </IRQ>
[10582.159057] sysvec_apic_timer_interrupt+0x73/0x80
[10582.159606] asm_sysvec_apic_timer_interrupt+0x12/0x20
[10582.160150] RIP: 0010:rt_flush_dev+0x84/0xb0
[10582.160675] Code: ff ff 48 39 c6 74 36 48 39 1a 75 1e 48 8b 0d ab 60 35 02 48 89 0a 48 8b 89 d0 04 00 00 65 ff 01 48 8b 8b d0 04 00 00 65 ff 09 <48> 8b 8a a0 00 00 00 48 8d 91 60 ff ff ff 48 39 ce 75 ca 4c 89 e7
[10582.161902] RSP: 0018:ffffa0ea10d2bd10 EFLAGS: 00000287
[10582.162476] RAX: ffff8ea0b2718e20 RBX: ffff8e958125d000 RCX: ffff8e7f0152e820
[10582.163097] RDX: ffff8e7f0152e780 RSI: ffff8eb4433b4988 RDI: ffff8eb4433b4980
[10582.163705] RBP: 0000000000034980 R08: 0000000000000000 R09: 0000000000000016
[10582.164302] R10: 00000000ffffffff R11: 0000000000000001 R12: ffff8eb4433b4980
[10582.164911] R13: 0000000000000016 R14: 0000000000000020 R15: 0000000000000000
[10582.165528] fib_netdev_event+0x110/0x140
[10582.166071] raw_notifier_call_chain+0x41/0x50
[10582.166628] ? dev_disable_lro+0xe0/0xe0
[10582.167172] rollback_registered_many+0x320/0x5b0
[10582.167728] unregister_netdevice_many+0x17/0x70
[10582.168276] default_device_exit_batch+0x131/0x150
[10582.168834] ? do_wait_intr_irq+0xa0/0xa0
[10582.169391] cleanup_net+0x224/0x340
[10582.169942] process_one_work+0x19e/0x340
[10582.170503] worker_thread+0x30/0x360
[10582.171037] ? process_one_work+0x340/0x340
[10582.171575] kthread+0x116/0x130
[10582.172102] ? __kthread_cancel_work+0x40/0x40
[10582.172648] ret_from_fork+0x1f/0x30
[10582.174524] Kernel Offset: 0x22000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
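
When going through the full vmcore-dmesg log in the attachment, it may help to list which function each reported RIP falls in and how often. The following is a minimal sketch and not part of the original report: `rip_symbols` is a hypothetical helper name, and it assumes the log uses the `RIP: 0010:symbol+0xOFFSET/0xSIZE` form seen in the excerpt above.

```shell
# Hypothetical helper: list the distinct "symbol+offset/size" values that
# follow "RIP: 0010:" in a kernel log, most frequent first.
# Reads the named files, or standard input when no file is given.
rip_symbols() {
    cat -- "$@" |
        grep -o 'RIP: 0010:[A-Za-z0-9_.]*+0x[0-9a-f]*/0x[0-9a-f]*' |
        sed 's/^RIP: 0010://' |
        sort | uniq -c | sort -rn
}
```

For example, `rip_symbols vmcore-dmesg.txt` would summarize the unpacked attachment, assuming it was extracted to a file with that (hypothetical) name.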

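[Expected result]: Stability testing does not crash the environment.

[Actual result]: After running the stress-ng test for about 1 hour, the environment crashes.

[Reproduction rate]: Reproduces with high probability.

[Environment]:
Kernel: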
# uname -r
5.10.84-10_rc2.an8.x86_64


Machine type:
ECS

Operating system:
# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.4"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.4"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.4"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

# free -mh
              total        used        free      shared  buff/cache   available
Mem:          247Gi       820Mi       245Gi       2.0Mi       755Mi       244Gi
Swap:            0B          0B          0B

# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
BIOS Vendor ID: Alibaba Cloud
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8369B CPU @ 2.70GHz
BIOS Model name: pc-i440fx-2.1
Stepping: 6
CPU MHz: 2699.998
BogoMIPS: 5399.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 48K
L1i cache: 32K
L2 cache: 1280K
L3 cache: 49152K
NUMA node0 CPU(s): 0-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm arch_capabilities
Steps To Reproduce: Install the latest version of stress-ng, then set the following kernel parameters:
echo 1  > /proc/sys/kernel/panic
echo 1  > /proc/sys/kernel/hardlockup_panic
echo 1  > /proc/sys/kernel/softlockup_panic
echo 50 > /proc/sys/kernel/watchdog_thresh
echo 1200 > /proc/sys/kernel/hung_task_timeout_secs
echo 0   > /proc/sys/kernel/hung_task_panic
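
The same settings can also be expressed as sysctl keys, which makes them easier to review before applying. This is a minimal sketch rather than the reporter's procedure: `apply_lockup_settings` and the `DRY_RUN` variable are hypothetical names introduced here, and the key names assume the usual mapping from `/proc/sys/kernel/NAME` to `kernel.NAME`. As far as the kernel documentation describes it, the soft lockup threshold is twice `watchdog_thresh`, so a value of 50 should mean a CPU has to be stuck for roughly 100 seconds before the panic above is triggered.

```shell
# Sketch: the six settings from the report, written as sysctl keys.
# With DRY_RUN=1 (the default here) the commands are only printed;
# run with DRY_RUN=0 as root to change the running kernel.
apply_lockup_settings() {
    for kv in \
        kernel.panic=1 \
        kernel.hardlockup_panic=1 \
        kernel.softlockup_panic=1 \
        kernel.watchdog_thresh=50 \
        kernel.hung_task_timeout_secs=1200 \
        kernel.hung_task_panic=0
    do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "sysctl -w $kv"
        else
            sysctl -w "$kv"
        fi
    done
}
```

Calling `apply_lockup_settings` prints the six commands; `DRY_RUN=0 apply_lockup_settings` applies them. Like the `echo` form, these changes last only until the next reboot.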

Run the stress test:
nohup stress-ng -a 1 -x seccomp,mmap,mmapaddr,mmapfixed,mmapfork,mmapmany,mremap,rlimit,stack,bigheap,env,brk,bad-altstack,aio,sysfs,bad-altstack,shm,close,clock,fallocate,l1cache,pci,sigio,rlimit,binderfs,munmap,softlockup,resources,fifo,set,zlib,wcs,tree,splice,sockfd,sctp,radixsort,pipe,mergesort,key,inotify,heapsort,epoll,dccp,cap,aiol,vforkmany,switch,sock,cyclic,cpu-online,mlockmany,oom-pipe,sysinval,watchdog -t 72h --metrics --times --verify -v -Y /disk1/tmpdir/stress-ng/stress-statistic-11.yaml --log-file /disk1/tmpdir/stress-ng/stress-logfile-11.txt --temp-path /disk1/tmpdir/stress-ng/ --oomable --skip-silent &
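
The exclusion list passed to `-x` above names `bad-altstack` and `rlimit` twice. The repeats are presumably harmless, but a cleaned-up list is easier to compare between runs. The following is a small sketch for that; `dedupe_csv` is a hypothetical helper name and not part of stress-ng.

```shell
# Hypothetical helper: drop repeated names from a comma-separated list,
# keeping the first occurrence of each name in its original position.
dedupe_csv() {
    printf '%s\n' "$1" | tr ',' '\n' | awk '!seen[$0]++' | paste -sd, -
}
```

Passing the full exclusion list as the single argument yields a list that can be given to `-x` unchanged in meaning.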
Tags: No tags attached.

Activities

kangwen429

2022-01-11 15:05

Reporter

vmcore-dmesg.rar (176,394 bytes)

Issue History

Date Modified | Username | Field | Change
2022-01-11 15:05 | kangwen429 | New Issue |
2022-01-11 15:05 | kangwen429 | File Added: vmcore-dmesg.rar |
2022-01-11 15:06 | kangwen429 | Summary | [Anolis 8.4-5.10-x86] After upgrading to kernel 5.10.84-10_rc2.an8.x86, stability testing crashes: watchdog: BUG: soft lockup RIP: 0010:rt_flush_dev+0x80/0xc0 => [Anolis 8.4-5.10-x86] After upgrading to kernel 5.10.84-10_rc2.an8.x86, stability testing crashes: watchdog: BUG: soft lockup RIP: 0010:rt_flush_dev+0x8
2022-01-11 15:06 | kangwen429 | Description Updated |
2022-01-11 15:07 | kangwen429 | Summary | [Anolis 8.4-5.10-x86] After upgrading to kernel 5.10.84-10_rc2.an8.x86, stability testing crashes: watchdog: BUG: soft lockup RIP: 0010:rt_flush_dev+0x8 => [Anolis 8.4-5.10-x86] After upgrading to kernel 5.10.84-10_rc2.an8.x86, stability testing crashes: watchdog: BUG: soft lockup RIP: 0010:rt_flush_dev+0x84/0xb0