排查Linux D住进程原因¶
查询僵尸和D进程¶
僵尸进程 zombie
是虽然不占用资源,但是表明系统有不正常的资源回收问题。
根据经验,我们知道Linux中有一种状态为 D
的进程,会阻塞CPU使用(因为CPU需要等待设备返回,例如等待磁盘),所以我们检查一下系统是否存在 D
进程:
ps r -A
可以看到如下进程:
PID TTY STAT TIME COMMAND
...
932 ? D 78:08 [load_calc]
32180 ? Dl 49:37 /opt/taobao/java/bin/java -classpath /opt/flink/conf:/opt/flink/lib/blink-launcher-blink-3.4.3-SNAPSHOT.jar:/opt/flink/lib/blink-metrics-ceresdb-blink-3.4.3-SNAPSHOT.jar:/opt/flink/lib/blink-pangu-fs-blink-3.4.3-SNAPSHOT.jar:/opt/flink/lib/blink-statebackend-antkv_2.11-blink-3.4.3-SNAPSHOT.jar:/opt/flink/lib/blink-statebackend-gemini-2.1.19-20210307.140847-1.jar:/opt/flink/lib/
...
184543 ? Dl 444:32 /opt/taobao/java/bin/java -classpath /opt/flink/conf:/opt/flink/lib/blink-launcher-blink-3.4.2-SNAPSHOT.jar:/opt/flink/lib/blink-metrics-ceresdb-blink-3.4.2-SNAPSHOT.jar:/opt/flink/lib/blink-pangu-fs-blink-3.4.2-SNAPSHOT.jar:/opt/flink/lib/blink-statebackend-antkv_2.11-blink-3.4.2-SNAPSHOT.jar:/opt/flink/lib/blink-statebackend-gemini-2.1.19-20210307.140847-1.jar:/opt/flink/lib/
...
检查
D
住进程的堆栈:
1#cat /proc/932/stack
2[<0>] load_calc_func+0x57/0x130
3[<0>] kthread+0xf5/0x130
4[<0>] ret_from_fork+0x1f/0x30
5[<0>] 0xffffffffffffffff
6
7#cat /proc/32180/stack
8[<0>] __refrigerator+0x75/0x160
9[<0>] do_exit+0x224/0xc60
10[<0>] do_group_exit+0x3a/0xa0
11[<0>] get_signal+0x156/0x8c0
12[<0>] do_signal+0x36/0x610
13[<0>] exit_to_usermode_loop+0x95/0x100
14[<0>] do_syscall_64+0x178/0x1a0
15[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
16[<0>] 0xffffffffffffffff
17
18#cat /proc/184543/stack
19[<0>] __refrigerator+0x75/0x160
20[<0>] do_exit+0x224/0xc60
21[<0>] do_group_exit+0x3a/0xa0
22[<0>] get_signal+0x156/0x8c0
23[<0>] do_signal+0x36/0x610
24[<0>] exit_to_usermode_loop+0x95/0x100
25[<0>] do_syscall_64+0x178/0x1a0
26[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
27[<0>] 0xffffffffffffffff
可以看到:
[load_calc]
系统进程在ret_from_fork+0x1f/0x30
出现异常阻塞blink
进程在entry_SYSCALL_64_after_hwframe+0x44/0xa9
出现异常阻塞
进程信息排查¶
Linux “魔力” 系统请求组合键 SysRq 的提供了 t
指令dump出当前任务的信息,所以我在服务器上执行以下命令,打开进程信息:
echo t > /proc/sysrq-trigger
此时,
dmesg -T
可以看到大量的系统调用被记录下来,其中有非常值得注意的 “Bad RIP value”含义
可以看到 “Bad RIP value”含义 同样集中在系统调用 entry_SYSCALL_64_after_hwframe+0x44/0xa9
,也就是阻塞进程的调用。
1[Tue Aug 10 17:16:00 2021] Call Trace:
2[Tue Aug 10 17:16:00 2021] ? __schedule+0x3cf/0x660
3[Tue Aug 10 17:16:00 2021] schedule+0x33/0xc0
4[Tue Aug 10 17:16:00 2021] schedule_hrtimeout_range_clock+0xb9/0x1a0
5[Tue Aug 10 17:16:00 2021] ? __hrtimer_init+0xb0/0xb0
6[Tue Aug 10 17:16:00 2021] ep_poll+0x351/0x450
7[Tue Aug 10 17:16:00 2021] ? do_futex+0x2f7/0x590
8[Tue Aug 10 17:16:00 2021] ? wake_up_q+0x70/0x70
9[Tue Aug 10 17:16:00 2021] do_epoll_wait+0xaf/0xd0
10[Tue Aug 10 17:16:00 2021] __x64_sys_epoll_wait+0x1a/0x20
11[Tue Aug 10 17:16:00 2021] do_syscall_64+0x55/0x1a0
12[Tue Aug 10 17:16:00 2021] entry_SYSCALL_64_after_hwframe+0x44/0xa9
13[Tue Aug 10 17:16:00 2021] RIP: 0033:0x7f4f6a70b4f3
14[Tue Aug 10 17:16:00 2021] Code: Bad RIP value.
15[Tue Aug 10 17:16:00 2021] RSP: 002b:00007ffc43c43470 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8
16[Tue Aug 10 17:16:00 2021] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4f6a70b4f3
17[Tue Aug 10 17:16:00 2021] RDX: 0000000000000040 RSI: 0000564608bb2dc0 RDI: 0000000000000007
18[Tue Aug 10 17:16:00 2021] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000004000001
19[Tue Aug 10 17:16:00 2021] R10: 000000000000e95f R11: 0000000000000293 R12: 00005646074fce40
20[Tue Aug 10 17:16:00 2021] R13: 0000000000000000 R14: 0000000000000000 R15: 00005646074fce40
21[Tue Aug 10 17:16:00 2021] auditd S 0 2101 1 0x00000000
22[Tue Aug 10 17:16:00 2021] Call Trace:
23[Tue Aug 10 17:16:00 2021] ? __schedule+0x3cf/0x660
24[Tue Aug 10 17:16:00 2021] ? __ext4_journal_stop+0x36/0xa0 [ext4]
25[Tue Aug 10 17:16:00 2021] schedule+0x33/0xc0
26[Tue Aug 10 17:16:00 2021] futex_wait_queue_me+0xc1/0x120
27[Tue Aug 10 17:16:00 2021] futex_wait+0xf6/0x250
28[Tue Aug 10 17:16:00 2021] do_futex+0x12b/0x590
29[Tue Aug 10 17:16:00 2021] __x64_sys_futex+0x88/0x180
30[Tue Aug 10 17:16:00 2021] do_syscall_64+0x55/0x1a0
31[Tue Aug 10 17:16:00 2021] entry_SYSCALL_64_after_hwframe+0x44/0xa9
32[Tue Aug 10 17:16:00 2021] RIP: 0033:0x7f4f6b4269b5
33[Tue Aug 10 17:16:00 2021] Code: Bad RIP value.
34[Tue Aug 10 17:16:00 2021] RSP: 002b:00007f4f68caad20 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
35[Tue Aug 10 17:16:00 2021] RAX: ffffffffffffffda RBX: 00005646074fa220 RCX: 00007f4f6b4269b5
36[Tue Aug 10 17:16:00 2021] RDX: 000000000a1ab4e3 RSI: 0000000000000080 RDI: 00005646074fa254
37[Tue Aug 10 17:16:00 2021] RBP: 00005646074fa228 R08: 00005646074fa200 R09: 00000000050d5a71
38[Tue Aug 10 17:16:00 2021] R10: 0000000000000000 R11: 0000000000000246 R12: 00005646074fa250
39[Tue Aug 10 17:16:00 2021] R13: 0000564608bb76c0 R14: 0000000000000000 R15: 00007f4f68cab700
40[Tue Aug 10 17:16:00 2021] kworker/107:1H I 0 2063 2 0x80000000
41[Tue Aug 10 17:16:00 2021] Workqueue: (null) (kblockd)
再次出现 [load_calc]
系统进程D¶
我在 AMD CPU c-state ,调整了内核参数( processor.max_cstate=1 intel_idle.max_cstate=0
)关闭海光CPU的电源管理,禁止进入idle状态( CPU c-state )。但是,服务器重启,再次出现 [load_calc]
进程D。检查进程堆栈,也和上文相同。
我尝试上文方法 Linux “魔力” 系统请求组合键 SysRq 的提供了
t
指令dump出当前任务的信息:echo t > /proc/sysrq-trigger
结果发现这次命令卡住没有返回,同时系统一下子出现了更多的D进程:
$ps r -A | grep D
PID TTY STAT TIME COMMAND
1016 ? D 0:06 [load_calc]
13693 ? D 0:00 runc init
13806 ? D 0:00 runc init
13895 ? D 0:00 runc init
14020 ? D 0:00 runc init
14073 ? D 0:00 runc init
14095 ? D 0:00 runc init
16303 ? Dl 0:00 runsc --version
18678 ? Dl 0:00 runsc --version
458475 ? D 0:01 [kworker/u256:0+]
重复出现如下调用 :
[Thu Jul 21 22:46:47 2022] RIP: 0033:0x460193
[Thu Jul 21 22:46:47 2022] Code: 63 d5 48 8b 34 d0 e8 1c ae fb ff 83 c5 01 48 89 d8 89 2d a8 87 28 00 48 83 c4 08 5b 5d c3 66 0f 1f 84 00 00 00 00 00 b3 01 c7 <05> a4 87 28 00 01 00 00 00 b2 01 48 01 eb 80 3b 7b 0f 85 cc fe ff
[Thu Jul 21 22:46:47 2022] RSP: 002b:000000c0009a1df8 EFLAGS: 00000286 ORIG_RAX: 00000000000000ca
[Thu Jul 21 22:46:47 2022] RAX: ffffffffffffffda RBX: 000000c000259180 RCX: 0000000000460193
[Thu Jul 21 22:46:47 2022] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000000c0002592c8
[Thu Jul 21 22:46:47 2022] RBP: 000000c0009a1e40 R08: 0000000000000000 R09: 0000000000000000
[Thu Jul 21 22:46:47 2022] R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000000005
[Thu Jul 21 22:46:47 2022] R13: 0000000000000001 R14: 000000c00152e360 R15: 0000000000000010
[Thu Jul 21 22:46:47 2022] resource-agent S 0 24480 9939 0x00000080
[Thu Jul 21 22:46:47 2022] Call Trace:
[Thu Jul 21 22:46:47 2022] ? __schedule+0x396/0x6a0
[Thu Jul 21 22:46:47 2022] schedule+0x33/0xc0
[Thu Jul 21 22:46:47 2022] futex_wait_queue_me+0xc1/0x120
[Thu Jul 21 22:46:47 2022] futex_wait+0xf6/0x250
[Thu Jul 21 22:46:47 2022] ? get_futex_key+0x3a9/0x420
[Thu Jul 21 22:46:47 2022] do_futex+0xb8/0x600
[Thu Jul 21 22:46:47 2022] __x64_sys_futex+0xa3/0x1a0
[Thu Jul 21 22:46:47 2022] ? __audit_syscall_exit+0x206/0x290
[Thu Jul 21 22:46:47 2022] do_syscall_64+0x7b/0x200
[Thu Jul 21 22:46:47 2022] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Thu Jul 21 22:46:49 2022] RIP: 0033:0x40a821
[Thu Jul 21 22:46:49 2022] Code: 00 17 4b 00 00 11 00 19 00 f0 a5 6e 00 00 00 00 00 08 00 00 00 00 00 00 00 bf 7e 00 00 12 00 0d 00 40 15 44 00 00 00 00 00 72 <05> 00 00 00 00 00 00 a5 1f 00 00 12 00 0d 00 b0 51 45 00 00 00 00
[Thu Jul 21 22:46:49 2022] RSP: 002b:000000c0016d3ed8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[Thu Jul 21 22:46:49 2022] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000026e4e00
[Thu Jul 21 22:46:49 2022] RDX: 000000c0031b6800 RSI: 000000c0031b5000 RDI: 0000000000000002
[Thu Jul 21 22:46:49 2022] RBP: 000000c0016d3e40 R08: 0000000000000000 R09: 0000000000000000
[Thu Jul 21 22:46:49 2022] R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000000075
[Thu Jul 21 22:46:49 2022] R13: 0000000002675000 R14: 0000000000000000 R15: 0000000000000000
[Thu Jul 21 22:46:49 2022] resource-agent S 0 49034 9939 0x00000080
[Thu Jul 21 22:46:49 2022] Call Trace:
[Thu Jul 21 22:46:49 2022] ? __schedule+0x396/0x6a0
[Thu Jul 21 22:46:49 2022] ? sched_clock+0x5/0x10
[Thu Jul 21 22:46:49 2022] ? sched_clock_cpu+0xc/0xa0
[Thu Jul 21 22:46:49 2022] ? record_times+0x16/0x80
[Thu Jul 21 22:46:49 2022] ? psi_group_change+0x47/0x200
[Thu Jul 21 22:46:49 2022] ? psi_task_switch+0xa9/0x170
[Thu Jul 21 22:46:49 2022] ? __schedule+0x396/0x6a0
[Thu Jul 21 22:46:49 2022] ? schedule+0x33/0xc0
[Thu Jul 21 22:46:49 2022] ? exit_to_usermode_loop+0x6e/0x100
[Thu Jul 21 22:46:49 2022] ? prepare_exit_to_usermode+0x9c/0xc0
[Thu Jul 21 22:46:49 2022] ? retint_user+0x8/0x8
[Thu Jul 21 22:46:49 2022] resource-agent R running task 0 71826 9939 0x00000080
[Thu Jul 21 22:46:49 2022] Call Trace:
[Thu Jul 21 22:46:50 2022] ? __schedule+0x396/0x6a0
[Thu Jul 21 22:46:50 2022] schedule+0x33/0xc0
[Thu Jul 21 22:46:50 2022] exit_to_usermode_loop+0x6e/0x100
[Thu Jul 21 22:46:50 2022] prepare_exit_to_usermode+0x9c/0xc0
[Thu Jul 21 22:46:50 2022] retint_user+0x8/0x8
系统大量的内核进程开始运行,异常缓慢:
#ps r -A PID TTY STAT TIME COMMAND 26 ? R 0:00 [migration/3] 27 ? R 0:11 [ksoftirqd/3] 77 ? R 0:00 [migration/13] 78 ? R 0:00 [ksoftirqd/13] 226 ? R 1:18 [migration/42] 313 ? R 0:00 [migration/59] 314 ? R 0:00 [ksoftirqd/59] 498 ? R 0:00 [migration/96] 499 ? R 0:00 [ksoftirqd/96] 518 ? R 0:00 [migration/100] 519 ? R 0:00 [ksoftirqd/100] 578 ? R 0:00 [migration/112] 579 ? R 0:00 [ksoftirqd/112] 653 ? R 0:00 [migration/127] 654 ? R 0:00 [ksoftirqd/127] 742 ? R 0:01 [kworker/42:1-mm] 1016 ? D 0:06 [load_calc]
采用 内核崩溃dump解决方案 强制core dump,重启服务器
重启后 [load_calc]
系统进程依然 D
但是看不出其他异常,系统运行应用完全正常。这个问题后续再想办法排查…