View Issue Details

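ID: 0000301
Project: Anolis OS 8
Category: cloud kernel 4.19
View Status: public
Last Update: 2021-11-23 16:09
Reporter: CruzZhao
Assigned To: CruzZhao
Priority: normal
Severity: minor
Reproducibility: always
Status: assigned
Resolution: open
Platform: x86_64
OS: Anolis OS
OS Version: 8
Product Version: 8.2-rc1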
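Summary: 0000301: [sched] leaf_cfs_rq_list is maintained incorrectly in throttle scenarios, causing a hard lockup
Description: A hard lockup occurs in throttle scenarios. The kernel log shows the following warnings: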

[68334.886818] ------------[ cut here ]------------
[68334.886826] rq->tmp_alone_branch != &rq->leaf_cfs_rq_list
[68334.886854] WARNING: CPU: 22 PID: 0 at kernel/sched/fair.c:4658 unthrottle_cfs_rq+0x277/0x280
[68334.886855] Modules linked in: ipt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter bpfilter xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc sunrpc intel_rapl_msr intel_rapl_common iosf_mbi sb_edac kvm irqbypass crct10dif_pclmul i2c_piix4 crc32_pclmul ghash_clmulni_intel pcbc aesni_intel mousedev crypto_simd cryptd psmouse glue_helper pcspkr ip_tables ata_generic pata_acpi cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel libata serio_raw i2c_core uhci_hcd floppy
[68334.886883] CPU: 22 PID: 0 Comm: swapper/22 Kdump: loaded Not tainted 4.19.91-23.1.redis.5.al7.x86_64 #1
[68334.886884] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[68334.886887] RIP: 0010:unthrottle_cfs_rq+0x277/0x280
[68334.886889] Code: ff 0f 0b e9 0e fe ff ff 80 3d 38 f6 25 01 00 0f 85 08 ff ff ff 48 c7 c7 f0 fa 09 b7 31 c0 c6 05 22 f6 25 01 01 e8 c9 21 fc ff <0f> 0b e9 ec fe ff ff 66 90 66 66 66 66 90 55 48 89 fd 53 e8 a1 39
[68334.886891] RSP: 0018:ffff8e8b2fb83ea0 EFLAGS: 00010086
[68334.886893] RAX: 000000000000002d RBX: ffff8e8b2b39c000 RCX: 0000000000000000
[68334.886894] RDX: 0000000000000005 RSI: ffffffffb78e72ad RDI: 0000000000000046
[68334.886896] RBP: ffff8e8b28fadc00 R08: 00000000e949c223 R09: ffff8e8b2fb83e40
[68334.886897] R10: ffffffffb78e7aa4 R11: 0000000000000295 R12: ffff8e8b2ba28600
[68334.886899] R13: ffff8e8b2fba2880 R14: 0000000000000001 R15: 0000000000000001
[68334.886901] FS:  0000000000000000(0000) GS:ffff8e8b2fb80000(0000) knlGS:0000000000000000
[68334.886902] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68334.886904] CR2: 00007efff63a8000 CR3: 000000038520a001 CR4: 00000000000606e0
[68334.886909] Call Trace:
[68334.886931]  <IRQ>
[68334.886936]  distribute_cfs_runtime+0xd6/0x100
[68334.886939]  sched_cfs_period_timer+0x13c/0x270
[68334.886941]  ? sched_cfs_slack_timer+0xb0/0xb0
[68334.886944]  __hrtimer_run_queues+0xeb/0x250
[68334.886947]  hrtimer_interrupt+0x122/0x270
[68334.886951]  ? update_ts_time_stats+0x53/0x80
[68334.886955]  smp_apic_timer_interrupt+0x6a/0x140
[68334.886958]  apic_timer_interrupt+0xf/0x20
[68334.886959]  </IRQ>
[68334.886963] RIP: 0010:native_safe_halt+0xe/0x10
[68334.886964] Code: 01 00 f0 80 48 02 20 48 8b 00 a8 08 0f 84 7a ff ff ff eb bc 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 56 33 58 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 46 33 58 00 f4 c3 90 90 66 66 66 66
[68334.886965] RSP: 0018:ffffac34019afe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[68334.886967] RAX: 0000000080000000 RBX: 0000000000000001 RCX: ffff8e8b2d706300
[68334.886968] RDX: 0000000000000001 RSI: ffffffffb730b280 RDI: ffff8e8b2fbac700
[68334.886969] RBP: 0000000000000016 R08: 00000000e949c223 R09: ffff8e8b3ffffb48
[68334.886970] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffb730b2f8
[68334.886971] R13: 0000000000000016 R14: ffffffffb730b280 R15: 00003e266f7b1dc1
[68334.886974]  default_idle+0x1a/0x140
[68334.886979]  default_enter_idle+0x22/0x32
[68334.886981]  cpuidle_enter_state+0x80/0x2d0
[68334.886984]  do_idle+0x1cc/0x270
[68334.886986]  cpu_startup_entry+0x5f/0x70
[68334.886990]  start_secondary+0x197/0x1d0
[68334.886994]  secondary_startup_64+0xa4/0xb0
[68334.886997] ---[ end trace 17771a4b59b4ec99 ]---
[68336.089741] ------------[ cut here ]------------
[68336.089743] rq->tmp_alone_branch != &rq->leaf_cfs_rq_list
[68336.089770] WARNING: CPU: 7 PID: 30632 at kernel/sched/fair.c:374 enqueue_task_fair+0x9a5/0x9b0
[68336.089771] Modules linked in: ipt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter bpfilter xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc sunrpc intel_rapl_msr intel_rapl_common iosf_mbi sb_edac kvm irqbypass crct10dif_pclmul i2c_piix4 crc32_pclmul ghash_clmulni_intel pcbc aesni_intel mousedev crypto_simd cryptd psmouse glue_helper pcspkr ip_tables ata_generic pata_acpi cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel libata serio_raw i2c_core uhci_hcd floppy
[68336.089800] CPU: 7 PID: 30632 Comm: stress-ng Kdump: loaded Tainted: G        W         4.19.91-23.1.redis.5.al7.x86_64 #1
[68336.089801] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[68336.089804] RIP: 0010:enqueue_task_fair+0x9a5/0x9b0
[68336.089806] Code: ff 0f 0b e9 ff f6 ff ff 80 3d bc f8 25 01 00 0f 85 9d f7 ff ff 48 c7 c7 f0 fa 09 b7 31 c0 c6 05 a6 f8 25 01 01 e8 4b 24 fc ff <0f> 0b e9 81 f7 ff ff 0f 1f 40 00 66 66 66 66 90 41 56 41 55 41 54
[68336.089807] RSP: 0000:ffff8e8b2f7c3e68 EFLAGS: 00010096
[68336.089808] RAX: 000000000000002d RBX: 0000000000000000 RCX: 0000000000000000
[68336.089809] RDX: 0000000000000005 RSI: ffffffffb78e72ad RDI: 0000000000000046
[68336.089810] RBP: ffff8e8b2f822900 R08: 00000000e949c223 R09: ffff8e8b2f7c3e08
[68336.089811] R10: ffffffffb78e7aa4 R11: 00000000000002c1 R12: ffff8e8b2f822880
[68336.089812] R13: ffff8e8b2f822880 R14: 0000000000000082 R15: ffff8e8b262cba00
[68336.089814] FS:  00007f231365f740(0000) GS:ffff8e8b2f7c0000(0000) knlGS:0000000000000000
[68336.089815] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68336.089816] CR2: 00007f23122c4e70 CR3: 000000039e28e005 CR4: 00000000000606e0
[68336.089820] Call Trace:
[68336.089823]  <IRQ>
[68336.089828]  ? remove_entity_load_avg+0x27/0x70
[68336.089832]  ttwu_do_activate+0x63/0x90
[68336.089835]  try_to_wake_up+0x1ef/0x580
[68336.089839]  ? hrtimer_run_softirq+0xa0/0xa0
[68336.089840]  hrtimer_wakeup+0x1e/0x30
[68336.089842]  __hrtimer_run_queues+0xeb/0x250
[68336.089844]  hrtimer_interrupt+0x122/0x270
[68336.089848]  smp_apic_timer_interrupt+0x6a/0x140
[68336.089851]  apic_timer_interrupt+0xf/0x20
[68336.089852]  </IRQ>
[68336.089854] RIP: 0033:0x49752c
[68336.089856] Code: b3 42 00 84 c0 0f 84 44 02 00 00 e8 4e 13 f7 ff 83 f8 ff 0f 84 69 03 00 00 85 c0 41 89 c4 0f 85 ec 01 00 00 8b 35 34 bc 84 00 <31> ff bb e0 1b c2 00 e8 d8 0f f7 ff 0f 1f 84 00 00 00 00 00 48 8b
[68336.089857] RSP: 002b:00007ffe39138120 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[68336.089859] RAX: 0000000000000000 RBX: 00007ffe391383ba RCX: 00007f23128e8300
[68336.089860] RDX: 00007f23122c4e48 RSI: 00000000000076df RDI: 00007f23122c2640
[68336.089860] RBP: 0000000000c21e68 R08: 00007f23128e4270 R09: 00000000000076df
[68336.089861] R10: 00007f231365fa10 R11: 00007f231365f740 R12: 0000000000000000
[68336.089862] R13: 0000000000000004 R14: 0000000000000004 R15: 0000000000000014
[68336.089864] ---[ end trace 17771a4b59b4ec9a ]---
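To check whether a machine has hit the same warning, the kernel log can be searched for the assertion text shown above. The following is a minimal sketch, not part of the original report: the function name `count_leaf_list_warnings` and the log path in the usage comment are hypothetical.

```shell
# Hypothetical helper (not from the original report): count how many times
# the leaf_cfs_rq_list assertion appears in a saved kernel log.
count_leaf_list_warnings() {
    # $1: path to a file containing dmesg output.
    # -F matches the text literally; -c prints the number of matching lines.
    grep -c -F 'rq->tmp_alone_branch != &rq->leaf_cfs_rq_list' "$1"
}

# Usage (the path is only an example):
#   dmesg > /tmp/dmesg.txt
#   count_leaf_list_warnings /tmp/dmesg.txt
```

A count above zero only indicates that the assertion fired. It does not by itself confirm that a hard lockup followed.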
Steps To Reproduce:
#!/bin/bash
# Requires bash: the script uses arrays and C-style for loops.

KUBEPOD_DIR=/sys/fs/cgroup/cpu,cpuacct/kubepods
POD_DIR=$KUBEPOD_DIR/pod
CONTAINER_DIR=$POD_DIR/container
SCHBENCH=sysbench

# Create the kubepods/pod hierarchy and set a 50 ms quota on each level.
prepare_environment(){
    for ((i=0;i<12;i++)); do
        CG_DIR[$i]=$POD_DIR/$i
    done

    [ -d $KUBEPOD_DIR ] || mkdir $KUBEPOD_DIR
    echo 50000 > $KUBEPOD_DIR/cpu.cfs_quota_us

    [ -d $POD_DIR ] || mkdir $POD_DIR
    echo 50000 > $POD_DIR/cpu.cfs_quota_us
}

# Run stress-ng in 12 child cgroups, each limited to 5 ms per 100 ms period.
run_test(){
    for ((i=0;i<12;i++)); do
        [ -d ${CG_DIR[$i]} ] || mkdir ${CG_DIR[$i]}
        echo 100000 > ${CG_DIR[$i]}/cpu.cfs_period_us
        echo 5000 > ${CG_DIR[$i]}/cpu.cfs_quota_us
        # Start the load in the background, discard its output, record its PID.
        nohup stress-ng -c 24 -l 90 > /dev/null 2>&1 &
        pid[$i]=$!
        echo ${pid[$i]} > ${CG_DIR[$i]}/cgroup.procs
    done

    sleep 60

    # Move each process back to the root cgroup, then kill it.
    for ((i=0;i<12;i++)); do
        echo ${pid[$i]} > /sys/fs/cgroup/cpu,cpuacct/cgroup.procs
        kill -9 ${pid[$i]}
    done
}

clear_environment(){
    # Remove every child cgroup, not only the first array element.
    rmdir "${CG_DIR[@]}"
    rmdir $POD_DIR
}

prepare_environment
run_test
clear_environment
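While the reproducer runs, it can help to confirm that the child cgroups are actually being throttled. On cgroup v1, each CPU cgroup exposes a `cpu.stat` file with `nr_periods`, `nr_throttled`, and `throttled_time`. The sketch below reads `nr_throttled` for the 12 test cgroups. It is not part of the original report: the helper name `throttled_count` is hypothetical, and the path assumes the same layout as the script above.

```shell
# Hypothetical helper (not from the original report): print the nr_throttled
# value from a cgroup v1 cpu.stat file.
throttled_count() {
    # $1: path to a cpu.stat file
    awk '$1 == "nr_throttled" { print $2 }' "$1"
}

# Assumed to match the reproducer's cgroup layout.
POD_DIR=/sys/fs/cgroup/cpu,cpuacct/kubepods/pod

# Report throttling for each test cgroup that currently exists.
i=0
while [ "$i" -lt 12 ]; do
    stat_file=$POD_DIR/$i/cpu.stat
    if [ -r "$stat_file" ]; then
        echo "cgroup $i: nr_throttled=$(throttled_count "$stat_file")"
    fi
    i=$((i + 1))
done
```

A steadily growing `nr_throttled` suggests the quota settings are taking effect, which is the condition under which the warnings above were reported.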
Additional Information: Aone id: 37060933
Tags: No tags attached.

Activities

Shiloong

2021-11-23 16:09

developer   ~0000772

@CruzZhao Could you update the status of this issue? Has it been fixed? Thanks!

Issue History

Date Modified     Username     Field        Change
2021-09-27 21:16  CruzZhao     New Issue
2021-10-18 15:57  geliwei-ali  Assigned To  => Shiloong
2021-10-18 15:57  geliwei-ali  Status       new => assigned
2021-11-23 16:08  Shiloong     Assigned To  Shiloong => CruzZhao
2021-11-23 16:09  Shiloong     Note Added: 0000772