View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000301 | Anolis OS 8 | - cloud kernel 4.19 | public | 2021-09-27 21:16 | 2021-11-23 16:09 |
Reporter | CruzZhao | Assigned To | CruzZhao | ||
Priority | normal | Severity | minor | Reproducibility | always |
Status | assigned | Resolution | open | ||
Platform | x86_64 | OS | Anolis OS | OS Version | 8 |
Product Version | 8.2-rc1 | ||||
Summary | 0000301: [sched] leaf_cfs_rq_list is maintained incorrectly in the throttle scenario, causing a hard lockup | ||||
Description | A hard lockup occurs in the throttle scenario. Kernel log:
[68334.886818] ------------[ cut here ]------------
[68334.886826] rq->tmp_alone_branch != &rq->leaf_cfs_rq_list
[68334.886854] WARNING: CPU: 22 PID: 0 at kernel/sched/fair.c:4658 unthrottle_cfs_rq+0x277/0x280
[68334.886855] Modules linked in: ipt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter bpfilter xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc sunrpc intel_rapl_msr intel_rapl_common iosf_mbi sb_edac kvm irqbypass crct10dif_pclmul i2c_piix4 crc32_pclmul ghash_clmulni_intel pcbc aesni_intel mousedev crypto_simd cryptd psmouse glue_helper pcspkr ip_tables ata_generic pata_acpi cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel libata serio_raw i2c_core uhci_hcd floppy
[68334.886883] CPU: 22 PID: 0 Comm: swapper/22 Kdump: loaded Not tainted 4.19.91-23.1.redis.5.al7.x86_64 #1
[68334.886884] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[68334.886887] RIP: 0010:unthrottle_cfs_rq+0x277/0x280
[68334.886889] Code: ff 0f 0b e9 0e fe ff ff 80 3d 38 f6 25 01 00 0f 85 08 ff ff ff 48 c7 c7 f0 fa 09 b7 31 c0 c6 05 22 f6 25 01 01 e8 c9 21 fc ff <0f> 0b e9 ec fe ff ff 66 90 66 66 66 66 90 55 48 89 fd 53 e8 a1 39
[68334.886891] RSP: 0018:ffff8e8b2fb83ea0 EFLAGS: 00010086
[68334.886893] RAX: 000000000000002d RBX: ffff8e8b2b39c000 RCX: 0000000000000000
[68334.886894] RDX: 0000000000000005 RSI: ffffffffb78e72ad RDI: 0000000000000046
[68334.886896] RBP: ffff8e8b28fadc00 R08: 00000000e949c223 R09: ffff8e8b2fb83e40
[68334.886897] R10: ffffffffb78e7aa4 R11: 0000000000000295 R12: ffff8e8b2ba28600
[68334.886899] R13: ffff8e8b2fba2880 R14: 0000000000000001 R15: 0000000000000001
[68334.886901] FS: 0000000000000000(0000) GS:ffff8e8b2fb80000(0000) knlGS:0000000000000000
[68334.886902] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68334.886904] CR2: 00007efff63a8000 CR3: 000000038520a001 CR4: 00000000000606e0
[68334.886909] Call Trace:
[68334.886931] <IRQ>
[68334.886936] distribute_cfs_runtime+0xd6/0x100
[68334.886939] sched_cfs_period_timer+0x13c/0x270
[68334.886941] ? sched_cfs_slack_timer+0xb0/0xb0
[68334.886944] __hrtimer_run_queues+0xeb/0x250
[68334.886947] hrtimer_interrupt+0x122/0x270
[68334.886951] ? update_ts_time_stats+0x53/0x80
[68334.886955] smp_apic_timer_interrupt+0x6a/0x140
[68334.886958] apic_timer_interrupt+0xf/0x20
[68334.886959] </IRQ>
[68334.886963] RIP: 0010:native_safe_halt+0xe/0x10
[68334.886964] Code: 01 00 f0 80 48 02 20 48 8b 00 a8 08 0f 84 7a ff ff ff eb bc 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 56 33 58 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 46 33 58 00 f4 c3 90 90 66 66 66 66
[68334.886965] RSP: 0018:ffffac34019afe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[68334.886967] RAX: 0000000080000000 RBX: 0000000000000001 RCX: ffff8e8b2d706300
[68334.886968] RDX: 0000000000000001 RSI: ffffffffb730b280 RDI: ffff8e8b2fbac700
[68334.886969] RBP: 0000000000000016 R08: 00000000e949c223 R09: ffff8e8b3ffffb48
[68334.886970] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffb730b2f8
[68334.886971] R13: 0000000000000016 R14: ffffffffb730b280 R15: 00003e266f7b1dc1
[68334.886974] default_idle+0x1a/0x140
[68334.886979] default_enter_idle+0x22/0x32
[68334.886981] cpuidle_enter_state+0x80/0x2d0
[68334.886984] do_idle+0x1cc/0x270
[68334.886986] cpu_startup_entry+0x5f/0x70
[68334.886990] start_secondary+0x197/0x1d0
[68334.886994] secondary_startup_64+0xa4/0xb0
[68334.886997] ---[ end trace 17771a4b59b4ec99 ]---
[68336.089741] ------------[ cut here ]------------
[68336.089743] rq->tmp_alone_branch != &rq->leaf_cfs_rq_list
[68336.089770] WARNING: CPU: 7 PID: 30632 at kernel/sched/fair.c:374 enqueue_task_fair+0x9a5/0x9b0
[68336.089771] Modules linked in: ipt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter bpfilter xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc sunrpc intel_rapl_msr intel_rapl_common iosf_mbi sb_edac kvm irqbypass crct10dif_pclmul i2c_piix4 crc32_pclmul ghash_clmulni_intel pcbc aesni_intel mousedev crypto_simd cryptd psmouse glue_helper pcspkr ip_tables ata_generic pata_acpi cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel libata serio_raw i2c_core uhci_hcd floppy
[68336.089800] CPU: 7 PID: 30632 Comm: stress-ng Kdump: loaded Tainted: G W 4.19.91-23.1.redis.5.al7.x86_64 #1
[68336.089801] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[68336.089804] RIP: 0010:enqueue_task_fair+0x9a5/0x9b0
[68336.089806] Code: ff 0f 0b e9 ff f6 ff ff 80 3d bc f8 25 01 00 0f 85 9d f7 ff ff 48 c7 c7 f0 fa 09 b7 31 c0 c6 05 a6 f8 25 01 01 e8 4b 24 fc ff <0f> 0b e9 81 f7 ff ff 0f 1f 40 00 66 66 66 66 90 41 56 41 55 41 54
[68336.089807] RSP: 0000:ffff8e8b2f7c3e68 EFLAGS: 00010096
[68336.089808] RAX: 000000000000002d RBX: 0000000000000000 RCX: 0000000000000000
[68336.089809] RDX: 0000000000000005 RSI: ffffffffb78e72ad RDI: 0000000000000046
[68336.089810] RBP: ffff8e8b2f822900 R08: 00000000e949c223 R09: ffff8e8b2f7c3e08
[68336.089811] R10: ffffffffb78e7aa4 R11: 00000000000002c1 R12: ffff8e8b2f822880
[68336.089812] R13: ffff8e8b2f822880 R14: 0000000000000082 R15: ffff8e8b262cba00
[68336.089814] FS: 00007f231365f740(0000) GS:ffff8e8b2f7c0000(0000) knlGS:0000000000000000
[68336.089815] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68336.089816] CR2: 00007f23122c4e70 CR3: 000000039e28e005 CR4: 00000000000606e0
[68336.089820] Call Trace:
[68336.089823] <IRQ>
[68336.089828] ? remove_entity_load_avg+0x27/0x70
[68336.089832] ttwu_do_activate+0x63/0x90
[68336.089835] try_to_wake_up+0x1ef/0x580
[68336.089839] ? hrtimer_run_softirq+0xa0/0xa0
[68336.089840] hrtimer_wakeup+0x1e/0x30
[68336.089842] __hrtimer_run_queues+0xeb/0x250
[68336.089844] hrtimer_interrupt+0x122/0x270
[68336.089848] smp_apic_timer_interrupt+0x6a/0x140
[68336.089851] apic_timer_interrupt+0xf/0x20
[68336.089852] </IRQ>
[68336.089854] RIP: 0033:0x49752c
[68336.089856] Code: b3 42 00 84 c0 0f 84 44 02 00 00 e8 4e 13 f7 ff 83 f8 ff 0f 84 69 03 00 00 85 c0 41 89 c4 0f 85 ec 01 00 00 8b 35 34 bc 84 00 <31> ff bb e0 1b c2 00 e8 d8 0f f7 ff 0f 1f 84 00 00 00 00 00 48 8b
[68336.089857] RSP: 002b:00007ffe39138120 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[68336.089859] RAX: 0000000000000000 RBX: 00007ffe391383ba RCX: 00007f23128e8300
[68336.089860] RDX: 00007f23122c4e48 RSI: 00000000000076df RDI: 00007f23122c2640
[68336.089860] RBP: 0000000000c21e68 R08: 00007f23128e4270 R09: 00000000000076df
[68336.089861] R10: 00007f231365fa10 R11: 00007f231365f740 R12: 0000000000000000
[68336.089862] R13: 0000000000000004 R14: 0000000000000004 R15: 0000000000000014
[68336.089864] ---[ end trace 17771a4b59b4ec9a ]---
| ||||
Steps To Reproduce |
#!/bin/bash
# Uses bash arrays and C-style for loops, so it must run under bash, not sh.
KUBEPOD_DIR=/sys/fs/cgroup/cpu,cpuacct/kubepods
POD_DIR=$KUBEPOD_DIR/pod
CONTAINER_DIR=$POD_DIR/container
SCHBENCH=sysbench

# Create kubepods/pod and limit each level to a 50000 us quota.
prepare_environment(){
    for ((i=0;i<12;i++)); do
        CG_DIR[$i]=$POD_DIR/$i
    done
    [ -d $KUBEPOD_DIR ] || {
        mkdir $KUBEPOD_DIR
    }
    echo 50000 > $KUBEPOD_DIR/cpu.cfs_quota_us
    [ -d $POD_DIR ] || {
        mkdir $POD_DIR
    }
    echo 50000 > $POD_DIR/cpu.cfs_quota_us
}

# Start one stress-ng instance in each of 12 tightly throttled child cgroups.
run_test(){
    for ((i=0;i<12;i++)); do
        [ -d ${CG_DIR[$i]} ] || {
            mkdir ${CG_DIR[$i]}
        }
        echo 100000 > ${CG_DIR[$i]}/cpu.cfs_period_us
        echo 5000 > ${CG_DIR[$i]}/cpu.cfs_quota_us
        # Redirect before backgrounding, then record the background PID.
        nohup stress-ng -c 24 -l 90 > /dev/null 2>&1 &
        pid[$i]=$!
        echo ${pid[$i]} > ${CG_DIR[$i]}/cgroup.procs
    done
    sleep 60
    for ((i=0;i<12;i++)); do
        echo ${pid[$i]} > /sys/fs/cgroup/cpu,cpuacct/cgroup.procs
        kill -9 ${pid[$i]}
    done
}

clear_environment(){
    # Remove every child cgroup; a bare $CG_DIR expands to element 0 only.
    rmdir "${CG_DIR[@]}"
    rmdir $POD_DIR
}

prepare_environment
run_test
clear_environment
| ||||
Additional Information | Aone id: 37060933 | ||||
Tags | No tags attached. | ||||
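
To make clear why the reproduction script keeps every child cgroup almost permanently throttled, the quota arithmetic can be sketched as below. This is an illustrative calculation only: `cpu_limit_pct` is a hypothetical helper, not part of the reported script, and the parent figure assumes that `kubepods` and `pod` keep the default `cpu.cfs_period_us` of 100000, since the script sets only their quota.

```sh
#!/bin/sh
# cpu_limit_pct QUOTA_US PERIOD_US  (hypothetical helper for illustration)
# Prints the CFS bandwidth limit as a percentage of one CPU.
cpu_limit_pct() {
    echo $(( $1 * 100 / $2 ))
}

# Values taken from the reproduction script.
child_quota=5000      # cpu.cfs_quota_us of each child cgroup
child_period=100000   # cpu.cfs_period_us of each child cgroup
workers=24            # stress-ng -c 24
load_pct=90           # stress-ng -l 90

limit=$(cpu_limit_pct "$child_quota" "$child_period")
demand=$(( workers * load_pct ))
echo "child cgroup limit:  ${limit}% of one CPU"
echo "child cgroup demand: up to ${demand}% of one CPU"

# Assumption: the parents use the default period of 100000 us.
parent=$(cpu_limit_pct 50000 100000)
echo "kubepods/pod limit:  ${parent}% of one CPU"
```

With these values each child cgroup is allowed 5% of one CPU while its workers ask for up to 2160%, and all 12 children share a parent limit of 50%. The throttle and unthrottle paths seen in the call traces are therefore exercised in nearly every period.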
Date | Username | Field | Change |
---|---|---|---|
2021-09-27 21:16 | CruzZhao | New Issue | |
2021-10-18 15:57 | geliwei-ali | Assigned To | => Shiloong |
2021-10-18 15:57 | geliwei-ali | Status | new => assigned |
2021-11-23 16:08 | Shiloong | Assigned To | Shiloong => CruzZhao |
2021-11-23 16:09 | Shiloong | Note Added: 0000772 |
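
When checking whether a machine hits the same problem, it can help to reduce the kernel log to one line per `WARNING` entry. The following is a minimal sketch, assuming the log lines have the same shape as the two `WARNING` lines in the description; `summarize_warnings` is a hypothetical helper name, not an existing tool.

```sh
#!/bin/sh
# summarize_warnings (hypothetical helper)
# Reads kernel log text on stdin and prints, for each WARNING line,
# the CPU, PID, source location, and function name.
summarize_warnings() {
    sed -n 's/.*WARNING: CPU: \([0-9][0-9]*\) PID: \([0-9][0-9]*\) at \([^ ]*\) \([A-Za-z0-9_.]*\)+.*/cpu=\1 pid=\2 at=\3 func=\4/p'
}

# Example input: the first WARNING line from the report.
sample='[68334.886854] WARNING: CPU: 22 PID: 0 at kernel/sched/fair.c:4658 unthrottle_cfs_rq+0x277/0x280'
summary=$(printf '%s\n' "$sample" | summarize_warnings)
echo "$summary"
```

On a live system the same function could be fed from the kernel ring buffer, for example `dmesg | summarize_warnings`. Lines that are not `WARNING` entries are dropped.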