排查Linux D住进程原因

查询僵尸和D进程

僵尸进程 zombie 是虽然不占用资源,但是表明系统有不正常的资源回收问题。

根据经验,我们知道Linux中有一种状态为 D 的进程,会阻塞CPU使用(因为CPU需要等待设备返回,例如等待磁盘),所以我们检查一下系统是否存在 D 进程:

ps r -A

可以看到如下进程:

PID TTY      STAT   TIME COMMAND
...
932 ?        D     78:08 [load_calc]
32180 ?      Dl    49:37 /opt/taobao/java/bin/java -classpath /opt/flink/conf:/opt/flink/lib/blink-launcher-blink-3.4.3-SNAPSHOT.jar:/opt/flink/lib/blink-metrics-ceresdb-blink-3.4.3-SNAPSHOT.jar:/opt/flink/lib/blink-pangu-fs-blink-3.4.3-SNAPSHOT.jar:/opt/flink/lib/blink-statebackend-antkv_2.11-blink-3.4.3-SNAPSHOT.jar:/opt/flink/lib/blink-statebackend-gemini-2.1.19-20210307.140847-1.jar:/opt/flink/lib/
...
184543 ?     Dl   444:32 /opt/taobao/java/bin/java -classpath /opt/flink/conf:/opt/flink/lib/blink-launcher-blink-3.4.2-SNAPSHOT.jar:/opt/flink/lib/blink-metrics-ceresdb-blink-3.4.2-SNAPSHOT.jar:/opt/flink/lib/blink-pangu-fs-blink-3.4.2-SNAPSHOT.jar:/opt/flink/lib/blink-statebackend-antkv_2.11-blink-3.4.2-SNAPSHOT.jar:/opt/flink/lib/blink-statebackend-gemini-2.1.19-20210307.140847-1.jar:/opt/flink/lib/
...
  • 检查 D 住进程的堆栈:

d_process_stack
 1#cat /proc/932/stack 
 2[<0>] load_calc_func+0x57/0x130
 3[<0>] kthread+0xf5/0x130
 4[<0>] ret_from_fork+0x1f/0x30
 5[<0>] 0xffffffffffffffff
 6
 7#cat /proc/32180/stack
 8[<0>] __refrigerator+0x75/0x160
 9[<0>] do_exit+0x224/0xc60
10[<0>] do_group_exit+0x3a/0xa0
11[<0>] get_signal+0x156/0x8c0
12[<0>] do_signal+0x36/0x610
13[<0>] exit_to_usermode_loop+0x95/0x100
14[<0>] do_syscall_64+0x178/0x1a0
15[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
16[<0>] 0xffffffffffffffff
17
18#cat /proc/184543/stack
19[<0>] __refrigerator+0x75/0x160
20[<0>] do_exit+0x224/0xc60
21[<0>] do_group_exit+0x3a/0xa0
22[<0>] get_signal+0x156/0x8c0
23[<0>] do_signal+0x36/0x610
24[<0>] exit_to_usermode_loop+0x95/0x100
25[<0>] do_syscall_64+0x178/0x1a0
26[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
27[<0>] 0xffffffffffffffff

可以看到:

  • [load_calc] 系统进程在 ret_from_fork+0x1f/0x30 出现异常阻塞

  • blink 进程在 entry_SYSCALL_64_after_hwframe+0x44/0xa9 出现异常阻塞

进程信息排查

Linux “魔力” 系统请求组合键 SysRq 的提供了 t 指令dump出当前任务的信息,所以我在服务器上执行以下命令,打开进程信息:

echo t > /proc/sysrq-trigger
  • 此时, dmesg -T 可以看到大量的系统调用被记录下来,其中有非常值得注意的 “Bad RIP value”含义

可以看到 “Bad RIP value”含义 同样集中在系统调用 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ,也就是阻塞进程的调用。

dump_tasks_info
 1[Tue Aug 10 17:16:00 2021] Call Trace:
 2[Tue Aug 10 17:16:00 2021]  ? __schedule+0x3cf/0x660
 3[Tue Aug 10 17:16:00 2021]  schedule+0x33/0xc0
 4[Tue Aug 10 17:16:00 2021]  schedule_hrtimeout_range_clock+0xb9/0x1a0
 5[Tue Aug 10 17:16:00 2021]  ? __hrtimer_init+0xb0/0xb0
 6[Tue Aug 10 17:16:00 2021]  ep_poll+0x351/0x450
 7[Tue Aug 10 17:16:00 2021]  ? do_futex+0x2f7/0x590
 8[Tue Aug 10 17:16:00 2021]  ? wake_up_q+0x70/0x70
 9[Tue Aug 10 17:16:00 2021]  do_epoll_wait+0xaf/0xd0
10[Tue Aug 10 17:16:00 2021]  __x64_sys_epoll_wait+0x1a/0x20
11[Tue Aug 10 17:16:00 2021]  do_syscall_64+0x55/0x1a0
12[Tue Aug 10 17:16:00 2021]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
13[Tue Aug 10 17:16:00 2021] RIP: 0033:0x7f4f6a70b4f3
14[Tue Aug 10 17:16:00 2021] Code: Bad RIP value.
15[Tue Aug 10 17:16:00 2021] RSP: 002b:00007ffc43c43470 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8
16[Tue Aug 10 17:16:00 2021] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4f6a70b4f3
17[Tue Aug 10 17:16:00 2021] RDX: 0000000000000040 RSI: 0000564608bb2dc0 RDI: 0000000000000007
18[Tue Aug 10 17:16:00 2021] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000004000001
19[Tue Aug 10 17:16:00 2021] R10: 000000000000e95f R11: 0000000000000293 R12: 00005646074fce40
20[Tue Aug 10 17:16:00 2021] R13: 0000000000000000 R14: 0000000000000000 R15: 00005646074fce40
21[Tue Aug 10 17:16:00 2021] auditd          S    0  2101      1 0x00000000
22[Tue Aug 10 17:16:00 2021] Call Trace:
23[Tue Aug 10 17:16:00 2021]  ? __schedule+0x3cf/0x660
24[Tue Aug 10 17:16:00 2021]  ? __ext4_journal_stop+0x36/0xa0 [ext4]
25[Tue Aug 10 17:16:00 2021]  schedule+0x33/0xc0
26[Tue Aug 10 17:16:00 2021]  futex_wait_queue_me+0xc1/0x120
27[Tue Aug 10 17:16:00 2021]  futex_wait+0xf6/0x250
28[Tue Aug 10 17:16:00 2021]  do_futex+0x12b/0x590
29[Tue Aug 10 17:16:00 2021]  __x64_sys_futex+0x88/0x180
30[Tue Aug 10 17:16:00 2021]  do_syscall_64+0x55/0x1a0
31[Tue Aug 10 17:16:00 2021]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
32[Tue Aug 10 17:16:00 2021] RIP: 0033:0x7f4f6b4269b5
33[Tue Aug 10 17:16:00 2021] Code: Bad RIP value.
34[Tue Aug 10 17:16:00 2021] RSP: 002b:00007f4f68caad20 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
35[Tue Aug 10 17:16:00 2021] RAX: ffffffffffffffda RBX: 00005646074fa220 RCX: 00007f4f6b4269b5
36[Tue Aug 10 17:16:00 2021] RDX: 000000000a1ab4e3 RSI: 0000000000000080 RDI: 00005646074fa254
37[Tue Aug 10 17:16:00 2021] RBP: 00005646074fa228 R08: 00005646074fa200 R09: 00000000050d5a71
38[Tue Aug 10 17:16:00 2021] R10: 0000000000000000 R11: 0000000000000246 R12: 00005646074fa250
39[Tue Aug 10 17:16:00 2021] R13: 0000564608bb76c0 R14: 0000000000000000 R15: 00007f4f68cab700
40[Tue Aug 10 17:16:00 2021] kworker/107:1H  I    0  2063      2 0x80000000
41[Tue Aug 10 17:16:00 2021] Workqueue:            (null) (kblockd)

再次出现 [load_calc] 系统进程D

我在 AMD CPU c-state ,调整了内核参数( processor.max_cstate=1 intel_idle.max_cstate=0 )关闭海光CPU的电源管理,禁止进入idle状态( CPU c-state )。但是,服务器重启,再次出现 [load_calc] 进程D。检查进程堆栈,也和上文相同。

结果发现这次命令卡住没有返回,同时系统一下子出现了更多的D进程:

$ps r -A | grep D
   PID TTY      STAT   TIME COMMAND
  1016 ?        D      0:06 [load_calc]
 13693 ?        D      0:00 runc init
 13806 ?        D      0:00 runc init
 13895 ?        D      0:00 runc init
 14020 ?        D      0:00 runc init
 14073 ?        D      0:00 runc init
 14095 ?        D      0:00 runc init
 16303 ?        Dl     0:00 runsc --version
 18678 ?        Dl     0:00 runsc --version
458475 ?        D      0:01 [kworker/u256:0+]

重复出现如下调用 :

重复出现大量 entry_SYSCALL_64_after_hwframe D住
[Thu Jul 21 22:46:47 2022] RIP: 0033:0x460193
[Thu Jul 21 22:46:47 2022] Code: 63 d5 48 8b 34 d0 e8 1c ae fb ff 83 c5 01 48 89 d8 89 2d a8 87 28 00 48 83 c4 08 5b 5d c3 66 0f 1f 84 00 00 00 00 00 b3 01 c7 <05> a4 87 28 00 01 00 00 00 b2 01 48 01 eb 80 3b 7b 0f 85 cc fe ff
[Thu Jul 21 22:46:47 2022] RSP: 002b:000000c0009a1df8 EFLAGS: 00000286 ORIG_RAX: 00000000000000ca
[Thu Jul 21 22:46:47 2022] RAX: ffffffffffffffda RBX: 000000c000259180 RCX: 0000000000460193
[Thu Jul 21 22:46:47 2022] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000000c0002592c8
[Thu Jul 21 22:46:47 2022] RBP: 000000c0009a1e40 R08: 0000000000000000 R09: 0000000000000000
[Thu Jul 21 22:46:47 2022] R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000000005
[Thu Jul 21 22:46:47 2022] R13: 0000000000000001 R14: 000000c00152e360 R15: 0000000000000010
[Thu Jul 21 22:46:47 2022] resource-agent  S    0 24480   9939 0x00000080
[Thu Jul 21 22:46:47 2022] Call Trace:
[Thu Jul 21 22:46:47 2022]  ? __schedule+0x396/0x6a0
[Thu Jul 21 22:46:47 2022]  schedule+0x33/0xc0
[Thu Jul 21 22:46:47 2022]  futex_wait_queue_me+0xc1/0x120
[Thu Jul 21 22:46:47 2022]  futex_wait+0xf6/0x250
[Thu Jul 21 22:46:47 2022]  ? get_futex_key+0x3a9/0x420
[Thu Jul 21 22:46:47 2022]  do_futex+0xb8/0x600
[Thu Jul 21 22:46:47 2022]  __x64_sys_futex+0xa3/0x1a0
[Thu Jul 21 22:46:47 2022]  ? __audit_syscall_exit+0x206/0x290
[Thu Jul 21 22:46:47 2022]  do_syscall_64+0x7b/0x200
[Thu Jul 21 22:46:47 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
退出用户态也卡住
[Thu Jul 21 22:46:49 2022] RIP: 0033:0x40a821
[Thu Jul 21 22:46:49 2022] Code: 00 17 4b 00 00 11 00 19 00 f0 a5 6e 00 00 00 00 00 08 00 00 00 00 00 00 00 bf 7e 00 00 12 00 0d 00 40 15 44 00 00 00 00 00 72 <05> 00 00 00 00 00 00 a5 1f 00 00 12 00 0d 00 b0 51 45 00 00 00 00
[Thu Jul 21 22:46:49 2022] RSP: 002b:000000c0016d3ed8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[Thu Jul 21 22:46:49 2022] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000026e4e00
[Thu Jul 21 22:46:49 2022] RDX: 000000c0031b6800 RSI: 000000c0031b5000 RDI: 0000000000000002
[Thu Jul 21 22:46:49 2022] RBP: 000000c0016d3e40 R08: 0000000000000000 R09: 0000000000000000
[Thu Jul 21 22:46:49 2022] R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000000075
[Thu Jul 21 22:46:49 2022] R13: 0000000002675000 R14: 0000000000000000 R15: 0000000000000000
[Thu Jul 21 22:46:49 2022] resource-agent  S    0 49034   9939 0x00000080
[Thu Jul 21 22:46:49 2022] Call Trace:
[Thu Jul 21 22:46:49 2022]  ? __schedule+0x396/0x6a0
[Thu Jul 21 22:46:49 2022]  ? sched_clock+0x5/0x10
[Thu Jul 21 22:46:49 2022]  ? sched_clock_cpu+0xc/0xa0
[Thu Jul 21 22:46:49 2022]  ? record_times+0x16/0x80
[Thu Jul 21 22:46:49 2022]  ? psi_group_change+0x47/0x200
[Thu Jul 21 22:46:49 2022]  ? psi_task_switch+0xa9/0x170
[Thu Jul 21 22:46:49 2022]  ? __schedule+0x396/0x6a0
[Thu Jul 21 22:46:49 2022]  ? schedule+0x33/0xc0
[Thu Jul 21 22:46:49 2022]  ? exit_to_usermode_loop+0x6e/0x100
[Thu Jul 21 22:46:49 2022]  ? prepare_exit_to_usermode+0x9c/0xc0
[Thu Jul 21 22:46:49 2022]  ? retint_user+0x8/0x8
[Thu Jul 21 22:46:49 2022] resource-agent  R  running task        0 71826   9939 0x00000080
[Thu Jul 21 22:46:49 2022] Call Trace:
[Thu Jul 21 22:46:50 2022]  ? __schedule+0x396/0x6a0
[Thu Jul 21 22:46:50 2022]  schedule+0x33/0xc0
[Thu Jul 21 22:46:50 2022]  exit_to_usermode_loop+0x6e/0x100
[Thu Jul 21 22:46:50 2022]  prepare_exit_to_usermode+0x9c/0xc0
[Thu Jul 21 22:46:50 2022]  retint_user+0x8/0x8
  • 系统大量的内核进程开始运行,异常缓慢:

    #ps r -A
       PID TTY      STAT   TIME COMMAND
        26 ?        R      0:00 [migration/3]
        27 ?        R      0:11 [ksoftirqd/3]
        77 ?        R      0:00 [migration/13]
        78 ?        R      0:00 [ksoftirqd/13]
       226 ?        R      1:18 [migration/42]
       313 ?        R      0:00 [migration/59]
       314 ?        R      0:00 [ksoftirqd/59]
       498 ?        R      0:00 [migration/96]
       499 ?        R      0:00 [ksoftirqd/96]
       518 ?        R      0:00 [migration/100]
       519 ?        R      0:00 [ksoftirqd/100]
       578 ?        R      0:00 [migration/112]
       579 ?        R      0:00 [ksoftirqd/112]
       653 ?        R      0:00 [migration/127]
       654 ?        R      0:00 [ksoftirqd/127]
       742 ?        R      0:01 [kworker/42:1-mm]
      1016 ?        D      0:06 [load_calc]
    
  • 采用 内核崩溃dump解决方案 强制core dump,重启服务器

重启后 [load_calc] 系统进程依然 D 但是看不出其他异常,系统运行应用完全正常。这个问题后续再想办法排查…

参考