排查Linux D住进程原因

查询僵尸和D进程

僵尸进程 zombie 是虽然不占用资源,但是表明系统有不正常的资源回收问题。

根据经验,我们知道Linux中有一种状态为 D 的进程,会阻塞CPU使用(因为CPU需要等待设备返回,例如等待磁盘),所以我们检查一下系统是否存在 D 进程:

ps r -A

可以看到如下进程:

PID TTY      STAT   TIME COMMAND
...
932 ?        D     78:08 [load_calc]
32180 ?      Dl    49:37 /opt/taobao/java/bin/java -classpath /opt/flink/conf:/opt/flink/lib/blink-launcher-blink-3.4.3-SNAPSHOT.jar:/opt/flink/lib/blink-metrics-ceresdb-blink-3.4.3-SNAPSHOT.jar:/opt/flink/lib/blink-pangu-fs-blink-3.4.3-SNAPSHOT.jar:/opt/flink/lib/blink-statebackend-antkv_2.11-blink-3.4.3-SNAPSHOT.jar:/opt/flink/lib/blink-statebackend-gemini-2.1.19-20210307.140847-1.jar:/opt/flink/lib/
...
184543 ?     Dl   444:32 /opt/taobao/java/bin/java -classpath /opt/flink/conf:/opt/flink/lib/blink-launcher-blink-3.4.2-SNAPSHOT.jar:/opt/flink/lib/blink-metrics-ceresdb-blink-3.4.2-SNAPSHOT.jar:/opt/flink/lib/blink-pangu-fs-blink-3.4.2-SNAPSHOT.jar:/opt/flink/lib/blink-statebackend-antkv_2.11-blink-3.4.2-SNAPSHOT.jar:/opt/flink/lib/blink-statebackend-gemini-2.1.19-20210307.140847-1.jar:/opt/flink/lib/
...
  • 检查 D 住进程的堆栈:
d_process_stack
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
#cat /proc/932/stack 
[<0>] load_calc_func+0x57/0x130
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x1f/0x30
[<0>] 0xffffffffffffffff

#cat /proc/32180/stack
[<0>] __refrigerator+0x75/0x160
[<0>] do_exit+0x224/0xc60
[<0>] do_group_exit+0x3a/0xa0
[<0>] get_signal+0x156/0x8c0
[<0>] do_signal+0x36/0x610
[<0>] exit_to_usermode_loop+0x95/0x100
[<0>] do_syscall_64+0x178/0x1a0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0xffffffffffffffff

#cat /proc/184543/stack
[<0>] __refrigerator+0x75/0x160
[<0>] do_exit+0x224/0xc60
[<0>] do_group_exit+0x3a/0xa0
[<0>] get_signal+0x156/0x8c0
[<0>] do_signal+0x36/0x610
[<0>] exit_to_usermode_loop+0x95/0x100
[<0>] do_syscall_64+0x178/0x1a0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0xffffffffffffffff

可以看到上述 blink 进程在 entry_SYSCALL_64_after_hwframe+0x44/0xa9 出现异常阻塞

进程信息排查

Linux “魔力” 系统请求组合键 SysRq 的提供了 t 指令dump出当前任务的信息,所以我在服务器上执行以下命令,打开进程信息:

echo t > /proc/sysrq-trigger
  • 此时, dmesg -T 可以看到大量的系统调用被记录下来,其中有非常值得注意的 “Bad RIP value”含义

可以看到 “Bad RIP value”含义 同样集中在系统调用 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ,也就是阻塞进程的调用。

dump_tasks_info
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
[Tue Aug 10 17:16:00 2021] Call Trace:
[Tue Aug 10 17:16:00 2021]  ? __schedule+0x3cf/0x660
[Tue Aug 10 17:16:00 2021]  schedule+0x33/0xc0
[Tue Aug 10 17:16:00 2021]  schedule_hrtimeout_range_clock+0xb9/0x1a0
[Tue Aug 10 17:16:00 2021]  ? __hrtimer_init+0xb0/0xb0
[Tue Aug 10 17:16:00 2021]  ep_poll+0x351/0x450
[Tue Aug 10 17:16:00 2021]  ? do_futex+0x2f7/0x590
[Tue Aug 10 17:16:00 2021]  ? wake_up_q+0x70/0x70
[Tue Aug 10 17:16:00 2021]  do_epoll_wait+0xaf/0xd0
[Tue Aug 10 17:16:00 2021]  __x64_sys_epoll_wait+0x1a/0x20
[Tue Aug 10 17:16:00 2021]  do_syscall_64+0x55/0x1a0
[Tue Aug 10 17:16:00 2021]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 10 17:16:00 2021] RIP: 0033:0x7f4f6a70b4f3
[Tue Aug 10 17:16:00 2021] Code: Bad RIP value.
[Tue Aug 10 17:16:00 2021] RSP: 002b:00007ffc43c43470 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8
[Tue Aug 10 17:16:00 2021] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4f6a70b4f3
[Tue Aug 10 17:16:00 2021] RDX: 0000000000000040 RSI: 0000564608bb2dc0 RDI: 0000000000000007
[Tue Aug 10 17:16:00 2021] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000004000001
[Tue Aug 10 17:16:00 2021] R10: 000000000000e95f R11: 0000000000000293 R12: 00005646074fce40
[Tue Aug 10 17:16:00 2021] R13: 0000000000000000 R14: 0000000000000000 R15: 00005646074fce40
[Tue Aug 10 17:16:00 2021] auditd          S    0  2101      1 0x00000000
[Tue Aug 10 17:16:00 2021] Call Trace:
[Tue Aug 10 17:16:00 2021]  ? __schedule+0x3cf/0x660
[Tue Aug 10 17:16:00 2021]  ? __ext4_journal_stop+0x36/0xa0 [ext4]
[Tue Aug 10 17:16:00 2021]  schedule+0x33/0xc0
[Tue Aug 10 17:16:00 2021]  futex_wait_queue_me+0xc1/0x120
[Tue Aug 10 17:16:00 2021]  futex_wait+0xf6/0x250
[Tue Aug 10 17:16:00 2021]  do_futex+0x12b/0x590
[Tue Aug 10 17:16:00 2021]  __x64_sys_futex+0x88/0x180
[Tue Aug 10 17:16:00 2021]  do_syscall_64+0x55/0x1a0
[Tue Aug 10 17:16:00 2021]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Aug 10 17:16:00 2021] RIP: 0033:0x7f4f6b4269b5
[Tue Aug 10 17:16:00 2021] Code: Bad RIP value.
[Tue Aug 10 17:16:00 2021] RSP: 002b:00007f4f68caad20 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[Tue Aug 10 17:16:00 2021] RAX: ffffffffffffffda RBX: 00005646074fa220 RCX: 00007f4f6b4269b5
[Tue Aug 10 17:16:00 2021] RDX: 000000000a1ab4e3 RSI: 0000000000000080 RDI: 00005646074fa254
[Tue Aug 10 17:16:00 2021] RBP: 00005646074fa228 R08: 00005646074fa200 R09: 00000000050d5a71
[Tue Aug 10 17:16:00 2021] R10: 0000000000000000 R11: 0000000000000246 R12: 00005646074fa250
[Tue Aug 10 17:16:00 2021] R13: 0000564608bb76c0 R14: 0000000000000000 R15: 00007f4f68cab700
[Tue Aug 10 17:16:00 2021] kworker/107:1H  I    0  2063      2 0x80000000
[Tue Aug 10 17:16:00 2021] Workqueue:            (null) (kblockd)