View Issue Details

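ID: 0000345
Project / Category: Anolis OS 8 - cloud kernel 4.19
View Status: public
Last Update: 2021-11-23 16:07
Reporter: guanjun
Assigned To: guanjun
Priority: high
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Summary: 0000345: ANCK 4.19 triggers a divide-by-zero error under certain conditions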
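Description: The standard NVMe drive hardware supports 129 queues. The latest NVMe driver sizes its IRQ allocation as the smaller of the number of possible CPUs (on a machine with 128 CPUs and HT disabled, possible cpus = 256) and the number of queues the hardware supports (129), so it requests 129 IRQs. In this situation the kernel hits a divide-by-zero bug, with the following call stack: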

4.19.91-22.2.al7.x86_64

2021-10-02 18:41:58 [ 31.234817] CPU: 9 PID: 1154 Comm: kworker/u513:2 Not tainted 4.19.91-22.2.al7.x86_64 #1
2021-10-02 18:41:58 [ 31.234819] Hardware name: Inventec Horsea-12U /Horsea-F , BIOS 1.1.EY.IV.D.060.02 08/17/2020
2021-10-02 18:41:58 [ 31.234827] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
2021-10-02 18:41:58 [ 31.270457] RIP: 0010:__irq_build_affinity_masks.isra.3+0x17a/0x360
2021-10-02 18:41:58 [ 31.270459] Code: 24 14 48 63 54 24 24 48 c1 e2 06 48 03 54 24 28 89 c3 e8 c9 c1 35 00 be 00 02 00 00 4c 89 ef e8 4c c7 35 00 39 c3 0f 4f d8 99 <f7> fb 85 db 89 5c 24 10 89 54 24 04 89 44 24 08 0f 8e b8 01 00 00
2021-10-02 18:41:58 [ 31.306520] RSP: 0018:ffffb2025cd47b60 EFLAGS: 00010287
2021-10-02 18:41:58 [ 31.306522] RAX: 0000000000000010 RBX: 0000000000000000 RCX: 0000000000000200
2021-10-02 18:41:58 [ 31.306522] RDX: 0000000000000000 RSI: 0000000000000200 RDI: 0000000000000000
2021-10-02 18:41:58 [ 31.306525] RBP: 0000000000000040 R08: 0000000000000010 R09: 0000000000000008
2021-10-02 18:41:58 [ 31.344929] R10: ffffb2025cd47bf0 R11: ffffd2393d0cd400 R12: 0000000000000040
2021-10-02 18:41:58 [ 31.344929] R13: ffffb2025cd47bf0 R14: 0000000000000081 R15: 000000000000f1a0
2021-10-02 18:41:58 [ 31.344930] FS: 0000000000000000(0000) GS:ffff8dc81f640000(0000) knlGS:0000000000000000
2021-10-02 18:41:58 [ 31.344931] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2021-10-02 18:41:58 [ 31.344932] CR2: 00007f416fa9722d CR3: 0000003f4e97c000 CR4: 0000000000340ee0
2021-10-02 18:41:58 [ 31.344932] Call Trace:
2021-10-02 18:41:58 [ 31.344936] irq_build_affinity_masks.isra.4+0xf3/0x170
2021-10-02 18:41:58 [ 31.344938] irq_create_affinity_masks+0x205/0x300
2021-10-02 18:41:58 [ 31.344941] __pci_enable_msix_range+0x209/0x520
2021-10-02 18:41:58 [ 31.344942] pci_alloc_irq_vectors_affinity+0xbb/0x110
2021-10-02 18:41:58 [ 31.344944] nvme_reset_work+0xad2/0x162d [nvme]
2021-10-02 18:41:58 [ 31.344948] ? dequeue_entity+0x1e6/0x970
2021-10-02 18:41:58 [ 31.344950] ? sched_clock+0x5/0x10
2021-10-02 18:41:58 [ 31.344952] ? sched_clock_cpu+0xc/0xa0
2021-10-02 18:41:58 [ 31.344953] ? try_to_wake_up+0x219/0x580
2021-10-02 18:41:58 [ 31.344954] process_one_work+0x15b/0x370
2021-10-02 18:41:58 [ 31.344956] worker_thread+0x49/0x3e0
2021-10-02 18:41:58 [ 31.344957] kthread+0xf8/0x130
2021-10-02 18:41:58 [ 31.344958] ? process_one_work+0x370/0x370
2021-10-02 18:41:58 [ 31.344959] ? kthread_park+0xb0/0xb0
2021-10-02 18:41:58 [ 31.344961] ret_from_fork+0x1f/0x40
2021-10-02 18:41:58 [ 31.344963] Modules linked in: kvm_amd(+) sunrpc kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd cryptd glue_helper pcspkr nvme i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt vfat fb_sys_fops fat sp5100_tco nvme_core drm sg i2c_piix4 i2c_designware_platform ipmi_si(+) i2c_designware_core iosf_mbi ipmi_devintf i2c_core ipmi_msghandler pcc_cpufreq acpi_cpufreq ip_tables sd_mod crc32c_intel ahci libahci libata
2021-10-02 18:41:58 [ 31.344991] ---[ end trace 1506a87a8299d5c8 ]---
2021-10-02 18:41:59 [ 32.765477] RIP: 0010:__irq_build_affinity_masks.isra.3+0x17a/0x360
2021-10-02 18:41:59 [ 32.765480] Code: 24 14 48 63 54 24 24 48 c1 e2 06 48 03 54 24 28 89 c3 e8 c9 c1 35 00 be 00 02 00 00 4c 89 ef e8 4c c7 35 00 39 c3 0f 4f d8 99 <f7> fb 85 db 89 5c 24 10 89 54 24 04 89 44 24 08 0f 8e b8 01 00 00
2021-10-02 18:41:59 [ 32.765481] RSP: 0018:ffffb2025cd47b60 EFLAGS: 00010287
2021-10-02 18:41:59 [ 32.765482] RAX: 0000000000000010 RBX: 0000000000000000 RCX: 0000000000000200
2021-10-02 18:41:59 [ 32.765482] RDX: 0000000000000000 RSI: 0000000000000200 RDI: 0000000000000000
2021-10-02 18:41:59 [ 32.765483] RBP: 0000000000000040 R08: 0000000000000010 R09: 0000000000000008
2021-10-02 18:41:59 [ 32.765483] R10: ffffb2025cd47bf0 R11: ffffd2393d0cd400 R12: 0000000000000040
2021-10-02 18:41:59 [ 32.765483] R13: ffffb2025cd47bf0 R14: 0000000000000081 R15: 000000000000f1a0
2021-10-02 18:41:59 [ 32.765484] FS: 0000000000000000(0000) GS:ffff8dc81f640000(0000) knlGS:0000000000000000
2021-10-02 18:41:59 [ 32.765485] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2021-10-02 18:41:59 [ 32.765486] CR2: 00007f416fa9722d CR3: 0000003f4e97c000 CR4: 0000000000340ee0
2021-10-02 18:41:59 [ 32.765487] Kernel panic - not syncing: Fatal exception
2021-10-02 18:41:59 [ 32.766622] Kernel Offset: 0x1e000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
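
As a quick check of the numbers in the description, the IRQ count the driver requests can be sketched as a plain minimum of the two limits. This is only an illustration: `nr_irqs_requested` is a hypothetical helper name, not a function in the NVMe driver. The result, 129 (0x81), also happens to match the value in R14 in the register dump above, although the report does not state what R14 holds.

```c
/*
 * Hypothetical helper (not a kernel function): the number of IRQ
 * vectors requested, taken as the smaller of the possible-CPU count
 * and the number of queues the hardware supports.
 */
static unsigned int nr_irqs_requested(unsigned int possible_cpus,
                                      unsigned int hw_queues)
{
    return possible_cpus < hw_queues ? possible_cpus : hw_queues;
}
```

With the values from this report, `nr_irqs_requested(256, 129)` evaluates to 129.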
 

Corresponding code:

138 ncpus = cpumask_weight(nmsk);
139 vecs_to_assign = min(vecs_per_node, ncpus);
140
141 /* Account for rounding errors */
142 extra_vecs = ncpus - vecs_to_assign * (ncpus / vecs_to_assign);

140 in kernel/irq/affinity.c
141 in kernel/irq/affinity.c
142 in kernel/irq/affinity.c
   0xffffffff81103d99 <+377>: cltd
   0xffffffff81103d9a <+378>: idiv %ebx
   0xffffffff81103da2 <+386>: mov %edx,0x4(%rsp)
   0xffffffff81103da6 <+390>: mov %eax,0x8(%rsp)
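
One reading of the dump, offered as an interpretation rather than a confirmed root cause: the faulting bytes `<f7> fb` are the `idiv %ebx` at offset +378, and the registers show RBX = 0 with RAX = 0x10. That corresponds to `ncpus / vecs_to_assign` on line 142 with `ncpus` = 16 and `vecs_to_assign` = 0, which in turn implies `vecs_per_node` was 0 when line 139 took the minimum. The sketch below reproduces that arithmetic in user space with a guard against a zero divisor. The function name `extra_vecs_checked` and the -1 sentinel are assumptions made for illustration; this is not the fix that was merged into ANCK 4.19.

```c
/*
 * User-space sketch of lines 138-142 of kernel/irq/affinity.c as
 * quoted above. Hypothetical helper: returns the rounding remainder
 * (extra_vecs), or -1 when the divisor would be zero or negative.
 */
static int extra_vecs_checked(int ncpus, int vecs_per_node)
{
    /* Line 139: vecs_to_assign = min(vecs_per_node, ncpus) */
    int vecs_to_assign = vecs_per_node < ncpus ? vecs_per_node : ncpus;

    /* Without this guard, line 142 divides by zero (the idiv above). */
    if (vecs_to_assign <= 0)
        return -1;

    /* Line 142: account for rounding errors */
    return ncpus - vecs_to_assign * (ncpus / vecs_to_assign);
}
```

With the values read from the dump, `extra_vecs_checked(16, 0)` returns -1 instead of faulting, while a non-degenerate case such as `extra_vecs_checked(16, 3)` returns 1.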
Tags: No tags attached.

Activities

guanjun

2021-10-20 10:52

developer   ~0000512

The fix has been merged into ANCK 4.19; this issue can be closed.

Issue History

Date Modified  Username  Field  Change
2021-10-16 16:56 guanjun New Issue
2021-10-18 15:14 geliwei-ali Assigned To => Shiloong
2021-10-18 15:14 geliwei-ali Status new => assigned
2021-10-20 10:52 guanjun Note Added: 0000512
2021-11-23 16:07 Shiloong Assigned To Shiloong => guanjun
2021-11-23 16:07 Shiloong Status assigned => resolved
2021-11-23 16:07 Shiloong Resolution open => fixed