View Issue Details

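ID: 0000517
Project: Anolis OS 8
Category: kernel
View Status: public
Last Update: 2021-11-11 13:50
Reporter: meil_wei
Assigned To:
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: new
Resolution: open
Summary: 0000517: [Anolis OS 8.4][4.19.91-25.rc1] [x86] stress-ng stress test hits "Kernel panic - not syncing: softlockup: hung tasks" after running for 6 hours
Description: [Defect description]:
During a stress-ng stress test, the system hit "Kernel panic - not syncing: softlockup: hung tasks" after running for 6 hours.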
crash /usr/lib/debug/lib/modules/4.19.91-25.rc1.an8.x86_64/vmlinux /var/crash/127.0.0.1-2021-11-10-21\:11\:38/vmcore

crash 7.2.9-2.an8
Copyright (C) 2002-2020 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [384MB]: patching 99096 gdb minimal_symbol values

crash: inconsistent active task indications for CPU 18:
           runqueue: ffff93a0ca639f80 "stress-ng" (default)
       current_task: ffff931060068000 "stress-ng" (reassigned)

crash: inconsistent active task indications for CPU 24:
           runqueue: ffff93702924bf00 "stress-ng" (default)
       current_task: ffff93701f4c1f80 "stress-ng"

crash: inconsistent active task indications for CPU 39:
           runqueue: ffff93a0208b0000 "stress-ng" (default)
       current_task: ffff9340cf0e1f80 "stress-ng"

crash: inconsistent active task indications for CPU 50:
           runqueue: ffff93a0334b5e80 "stress-ng" (default)
       current_task: ffff937116145e80 "stress-ng" (reassigned)

crash: inconsistent active task indications for CPU 53:
           runqueue: ffff933ffeecde80 "stress-ng" (default)
       current_task: ffff937020563f00 "stress-ng"

crash: inconsistent active task indications for CPU 56:
           runqueue: ffff930f811dbf00 "stress-ng" (default)
       current_task: ffff93a033639f80 "stress-ng"

crash: inconsistent active task indications for CPU 57:
           runqueue: ffff93703e75de80 "stress-ng" (default)
       current_task: ffff9370f2300000 "stress-ng" (reassigned)

crash: inconsistent active task indications for CPU 58:
           runqueue: ffff93710b5f5e80 "stress-ng" (default)
       current_task: ffff937084bb1f80 "stress-ng"

crash: inconsistent active task indications for CPU 62:
           runqueue: ffff930f85ae5e80 "stress-ng" (default)
       current_task: ffff93a0334b0000 "stress-ng"

crash: inconsistent active task indications for CPU 68:
           runqueue: ffff93a064e4bf00 "stress-ng" (default)
       current_task: ffff933fbe2fde80 "stress-ng"

crash: inconsistent active task indications for CPU 75:
           runqueue: ffff9340fc22bf00 "stress-ng" (default)
       current_task: ffff93700b701f80 "stress-ng" (reassigned)

crash: inconsistent active task indications for CPU 80:
           runqueue: ffff930eb91e9f80 "stress-ng" (default)
       current_task: ffff93703e5ede80 "stress-ng" (reassigned)

crash: inconsistent active task indications for CPU 83:
           runqueue: ffff9370249a3f00 "stress-ng" (default)
       current_task: ffff93702fa78000 "stress-ng" (reassigned)

crash: inconsistent active task indications for CPU 89:
           runqueue: ffff9340c92f3f00 "stress-ng" (default)
       current_task: ffff93703e5e3f00 "stress-ng"

crash: inconsistent active task indications for CPU 91:
           runqueue: ffff9370f2305e80 "stress-ng" (default)
       current_task: ffff937026109f80 "stress-ng"

crash: inconsistent active task indications for CPU 133:
           runqueue: ffff93707e338000 "stress-ng" (default)
       current_task: ffff93703e5ebf00 "stress-ng"

crash: inconsistent active task indications for CPU 138:
           runqueue: ffff930f812c1f80 "stress-ng" (default)
       current_task: ffff937024bd8000 "stress-ng" (reassigned)

crash: inconsistent active task indications for CPU 141:
           runqueue: ffff937024bd9f80 "stress-ng" (default)
       current_task: ffff93707a3a3f00 "stress-ng"

crash: inconsistent active task indications for CPU 144:
           runqueue: ffff9340e38e1f80 "stress-ng" (default)
       current_task: ffff934007935e80 "stress-ng"

crash: inconsistent active task indications for CPU 150:
           runqueue: ffff9370d5075e80 "stress-ng" (default)
       current_task: ffff930eb90d5e80 "stress-ng" (reassigned)

crash: inconsistent active task indications for CPU 151:
           runqueue: ffff9370234b5e80 "stress-ng" (default)
       current_task: ffff93a013f10000 "stress-ng"

crash: inconsistent active task indications for CPU 155:
           runqueue: ffff930e5b3a1f80 "stress-ng" (default)
       current_task: ffff93707e278000 "stress-ng" (reassigned)

crash: inconsistent active task indications for CPU 159:
           runqueue: ffff937114348000 "stress-ng" (default)
       current_task: ffff93a0120d9f80 "stress-ng"

crash: inconsistent active task indications for CPU 161:
           runqueue: ffff9370290dde80 "stress-ng" (default)
       current_task: ffff93a031bdbf00 "stress-ng" (reassigned)

crash: inconsistent active task indications for CPU 164:
           runqueue: ffff933ffeec9f80 "stress-ng" (default)
       current_task: ffff93703e47de80 "stress-ng"

crash: inconsistent active task indications for CPU 168:
           runqueue: ffff937020763f00 "stress-ng" (default)
       current_task: ffff93708c9f8000 "stress-ng" (reassigned)

crash: inconsistent active task indications for CPU 169:
           runqueue: ffff9370f2303f00 "stress-ng" (default)
       current_task: ffff9370b5f30000 "stress-ng" (reassigned)

crash: inconsistent active task indications for CPU 170:
           runqueue: ffff93707a3a0000 "stress-ng" (default)
       current_task: ffff930f85ae3f00 "stress-ng" (reassigned)

crash: inconsistent active task indications for CPU 174:
           runqueue: ffff937082345e80 "stress-ng" (default)
       current_task: ffff934000a73f00 "stress-ng"

crash: inconsistent active task indications for CPU 175:
           runqueue: ffff93a064ee8000 "stress-ng" (default)
       current_task: ffff933ffefb1f80 "stress-ng" (reassigned)

crash: inconsistent active task indications for CPU 182:
           runqueue: ffff934007931f80 "stress-ng" (default)
       current_task: ffff93712bc4de80 "stress-ng" (reassigned)

crash: inconsistent active task indications for CPU 183:
           runqueue: ffff93701c85bf00 "stress-ng" (default)
       current_task: ffff937084bb0000 "stress-ng" (reassigned)

crash: inconsistent active task indications for CPU 184:
           runqueue: ffff9340c92f0000 "stress-ng" (default)
       current_task: ffff93702fb43f00 "stress-ng"

crash: inconsistent active task indications for CPU 185:
           runqueue: ffff930e5299de80 "stress-ng" (default)
       current_task: ffff930eb91abf00 "stress-ng" (reassigned)

crash: inconsistent active task indications for CPU 186:
           runqueue: ffff937020761f80 "stress-ng" (default)
       current_task: ffff933ffeec8000 "stress-ng"

crash: inconsistent active task indications for CPU 188:
           runqueue: ffff93a019bc0000 "stress-ng" (default)
       current_task: ffff9340c92f5e80 "stress-ng"

crash: inconsistent active task indications for CPU 189:
           runqueue: ffff93707e339f80 "stress-ng" (default)
       current_task: ffff937020565e80 "stress-ng" (reassigned)

      KERNEL: /usr/lib/debug/lib/modules/4.19.91-25.rc1.an8.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2021-11-10-21:11:38/vmcore [PARTIAL DUMP]
        CPUS: 192
        DATE: Wed Nov 10 21:11:29 CST 2021
      UPTIME: 04:44:05
LOAD AVERAGE: 33779.73, 33837.45, 33877.68
       TASKS: 53809
    NODENAME: iZbp18qbfn9gqbwvpref2wZ
     RELEASE: 4.19.91-25.rc1.an8.x86_64
     VERSION: #1 SMP Mon Nov 1 23:08:57 CST 2021
     MACHINE: x86_64 (3400 Mhz)
      MEMORY: 766.7 GB
       PANIC: "Kernel panic - not syncing: softlockup: hung tasks"
         PID: 6961
     COMMAND: "stress-ng"
        TASK: ffff93a0121dde80 [THREAD_INFO: ffff93a0121dde80]
         CPU: 12
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 6961 TASK: ffff93a0121dde80 CPU: 12 COMMAND: "stress-ng"
 #0 [ffff931140703d68] machine_kexec at ffffffff99063a8a
 #1 [ffff931140703db8] __crash_kexec at ffffffff991450aa
 #2 [ffff931140703e78] panic at ffffffff9909fd57
 #3 [ffff931140703f20] __hrtimer_run_queues at ffffffff99127870
 #4 [ffff931140703f78] hrtimer_interrupt at ffffffff99128360
 #5 [ffff931140703fd8] smp_apic_timer_interrupt at ffffffff99a0259a
 #6 [ffff931140703ff0] apic_timer_interrupt at ffffffff99a01aff
--- <IRQ stack> ---
 #7 [ffffb56cf25efcb8] apic_timer_interrupt at ffffffff99a01aff
    [exception RIP: smp_call_function_many+494]
    RIP: ffffffff9913ba0e RSP: ffffb56cf25efd60 RFLAGS: 00000202
    RAX: 0000000000000067 RBX: ffffffff99079c60 RCX: ffff931140be9f40
    RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff931138fa5080
    RBP: ffffb56cf25efda8 R8: 0000000000027080 R9: ffffffff9905849c
    R10: fffffa5d0ddcf480 R11: 0000000000000003 R12: 0000000000000001
    R13: 0000000000023880 R14: 00000000000000c0 R15: ffff9311407238c0
    ORIG_RAX: ffffffffffffff13 CS: 0010 SS: 0000
 #8 [ffffb56cf25efda0] flush_tlb_mm_range at ffffffff9907a03c
 #9 [ffffb56cf25efdd8] change_protection at ffffffff99245fcf
#10 [ffffb56cf25efe90] change_prot_numa at ffffffff99266cf8
#11 [ffffb56cf25efe98] task_numa_work at ffffffff990d29ed
#12 [ffffb56cf25efee8] task_work_run at ffffffff990be614
#13 [ffffb56cf25eff20] exit_to_usermode_loop at ffffffff9900398b
#14 [ffffb56cf25eff38] prepare_exit_to_usermode at ffffffff99003e90
    RIP: 00000000004170be RSP: 00007fe7d7815e80 RFLAGS: 00000202
    RAX: 00007fe8009f3880 RBX: 00007fe7f2050000 RCX: 00000000000000e2
    RDX: 00007fe80024fa69 RSI: 000000000000004f RDI: 00007fe8009f4000
    RBP: 0000000000003a69 R8: 00000000000000d1 R9: 000000000000000b
    R10: 00007fe7f224bfde R11: 001cc9dcd0b2a184 R12: 00007fe7f204c000
    R13: 0000000000000001 R14: 0000000000495030 R15: 00007fe7d7815fc0
    ORIG_RAX: ffffffffffffff13 CS: 0033 SS: 002b
crash>
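
The trace above shows CPU 12 interrupted inside smp_call_function_many, reached from flush_tlb_mm_range during NUMA-balancing work (task_numa_work). In waiting mode that function spins until every target CPU has run the requested callback, so a natural next step is to look at what the other CPUs were doing. The sketch below is not part of the original report. It reuses the vmlinux and vmcore paths from the session above; the choice of crash commands (bt -a, runq), the helper name write_crash_cmds and the output file name are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: paths are taken from the report; the command list and
# the output file name are assumptions made for illustration.
VMLINUX=/usr/lib/debug/lib/modules/4.19.91-25.rc1.an8.x86_64/vmlinux
VMCORE=/var/crash/127.0.0.1-2021-11-10-21:11:38/vmcore

# Hypothetical helper: write the crash commands to run into file $1.
write_crash_cmds() {
    cat > "$1" <<'EOF'
bt -a
runq
exit
EOF
}

cmds=$(mktemp)
write_crash_cmds "$cmds"

# Run crash only where the tool and the dump are actually available.
if command -v crash >/dev/null 2>&1 && [ -r "$VMCORE" ]; then
    crash -s -i "$cmds" "$VMLINUX" "$VMCORE" > crash-all-cpus.txt
fi
rm -f "$cmds"
```

bt -a prints the stack of the active task on every CPU, which may help identify a CPU that never serviced the TLB-flush IPI.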

[Reproduction rate]:
Not yet reproduced

[Reproduction environment]:
Production ECS instance
Instance type: ecs.ebmhfg7.48xlarge
Kernel: 4.19.91-25.rc1.an8.x86_64

cat /etc/os-release
NAME="Anolis OS"
VERSION="8.4"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.4"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.4"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 4
NUMA node(s): 4
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.40GHz
BIOS Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.40GHz
Stepping: 11
CPU MHz: 3798.790
CPU max MHz: 4200.0000
CPU min MHz: 1200.0000
BogoMIPS: 6800.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 33792K
NUMA node0 CPU(s): 0-23,96-119
NUMA node1 CPU(s): 24-47,120-143
NUMA node2 CPU(s): 48-71,144-167
NUMA node3 CPU(s): 72-95,168-191
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 dtherm ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities

free -m
              total used free shared buff/cache available
Mem: 772367 2626 760430 17 9310 766181
Swap: 0 0 0

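[Expected result]:
The system keeps running normally under the stress-ng load and does not crash.

[Actual result]:
The system crashed while running the stress-ng load.
Steps To Reproduce: [Reproduction steps]:
1. Preparation
Set the kernel parameters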
echo 1 > /proc/sys/kernel/panic
echo 1 > /proc/sys/kernel/hardlockup_panic
echo 1 > /proc/sys/kernel/softlockup_panic
echo 50 > /proc/sys/kernel/watchdog_thresh
echo 1200 > /proc/sys/kernel/hung_task_timeout_secs
echo 0 > /proc/sys/kernel/hung_task_panic
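
The kernel documents the soft lockup threshold as 2 * watchdog_thresh, so the watchdog_thresh of 50 set above corresponds to a threshold of about 100 seconds. A minimal sketch for reading the settings back before starting the run follows; the helper name softlockup_threshold_secs is hypothetical and not part of the report.

```shell
#!/bin/sh
# Hypothetical helper: soft lockup threshold in seconds for a given
# watchdog_thresh value (documented as 2 * watchdog_thresh).
softlockup_threshold_secs() {
    echo $(( $1 * 2 ))
}

# Read back the values written in the preparation step, if present.
for name in panic hardlockup_panic softlockup_panic watchdog_thresh \
            hung_task_timeout_secs hung_task_panic; do
    f=/proc/sys/kernel/$name
    if [ -r "$f" ]; then
        printf '%s = %s\n' "$name" "$(cat "$f")"
    fi
done

# Report the effective soft lockup threshold for the current setting.
thresh_file=/proc/sys/kernel/watchdog_thresh
if [ -r "$thresh_file" ]; then
    echo "soft lockup threshold: $(softlockup_threshold_secs "$(cat "$thresh_file")")s"
fi
```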

Mount the data disk
[ -d /disk1 ] || mkdir /disk1
wipefs -a --force /dev/nvme0n1p1 # on virtual machines the device is more often /dev/vdb1
mkfs -t ext4 -q -F /dev/nvme0n1p1
mount -t ext4 /dev/nvme0n1p1 /disk1
 
Create the log directory
mkdir -p /disk1/tmpdir/stress-ng

2. Download and build stress-ng
git clone https://github.com/ColinIanKing/stress-ng.git
cd stress-ng
make
make install

3. Run the commands
ulimit -s unlimited
echo 1 > /sys/kernel/mm/transparent_hugepage/hugetext_enabled
nohup stress-ng -a 1 -x softlockup,resources,fifo,set,zlib,wcs,tree,splice,sockfd,sctp,radixsort,pipe,mergesort,key,inotify,heapsort,epoll,dccp,cap,aiol,vforkmany,switch,sock,cyclic -t 48h --metrics --times --verify -v -Y /disk1/tmpdir/stress-ng/stress-statistic-11.yaml --log-file /disk1/tmpdir/stress-ng/stress-logfile-11.txt --temp-path /disk1/tmpdir/stress-ng/ --vm-bytes 90% --vm-hang 10 &
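
The exclusion list passed to -x above is long enough that a typo is easy to miss. One possible way to keep it reviewable, sketched here as an assumption rather than the reporter's method, is to list the excluded stressors as separate words and join them with commas. join_csv and the EXCLUDE variable are hypothetical names; the stressor names are copied from the command above.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments with commas.
join_csv() {
    out=""
    for item in "$@"; do
        out="${out:+$out,}$item"
    done
    printf '%s\n' "$out"
}

# Stressors excluded in the report's command line, in the same order.
EXCLUDE=$(join_csv \
    softlockup resources fifo set zlib wcs tree splice \
    sockfd sctp radixsort pipe mergesort key inotify heapsort \
    epoll dccp cap aiol vforkmany switch sock cyclic)

# Print the joined list so it can be compared with the original command.
echo "$EXCLUDE"
```

The resulting value could then be passed as -x "$EXCLUDE" in place of the inline list.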
Tags: No tags attached.

Activities

There are no notes attached to this issue.

Issue History

Date Modified  Username  Field  Change
2021-11-11 13:50  meil_wei  New Issue