Recovering the journal after the root disk filled up

I ran into a production issue: after the root filesystem ran out of disk space, the journal log files were corrupted:

  • The disk hardware itself appears to be fine: files under /var/log can still be deleted individually, and new files can still be written.

  • However, every journalctl command fails with Error was encountered while opening journal files: Input/output error:

The journal files cannot be accessed; every command returns the same error
#journalctl --verify
Error was encountered while opening journal files: Input/output error

#journalctl -e
Error was encountered while opening journal files: Input/output error

#journalctl --vacuum-size=10M
Error was encountered while opening journal files: Input/output error

I had already deleted some sar files under /var/log by hand, freeing roughly a few hundred MB on the root filesystem.
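When the root filesystem is full, it helps to see which files under /var/log are the largest before deciding what to delete. The function below is a minimal sketch, not part of the original procedure; `largest_files` is a hypothetical helper name, and it assumes GNU find (for `-printf`), as shipped on most Linux distributions.

```shell
# Sketch (hypothetical helper): list the N largest regular files under a
# directory, largest first. Output format: "<size in bytes><TAB><path>".
# Requires GNU find for -printf; -xdev keeps the scan on one filesystem.
largest_files() {
    local dir=${1:-/var/log}   # directory to scan (default: /var/log)
    local n=${2:-10}           # how many entries to print
    find "$dir" -xdev -type f -printf '%s\t%p\n' 2>/dev/null \
        | sort -rn \
        | head -n "$n"
}
```

For example, `largest_files /var/log 20` prints the 20 largest files with their sizes in bytes; running `df -h /` and `df -i /` before and after the cleanup shows how much space and how many inodes were recovered.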

  • Try verifying the journal files

Verifying the journal files with debug logging enabled
SYSTEMD_LOG_LEVEL=debug journalctl --verify

But it still failed.
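When journalctl cannot open the journal as a whole, checking each file separately with `--file` may show whether only one or a few files are damaged. This is a sketch that was not part of the troubleshooting above; `verify_each_journal` is a hypothetical helper, and it assumes the persistent-journal layout /var/log/journal/<machine-id>/.

```shell
# Sketch (hypothetical helper): run "journalctl --verify" on each journal
# file on its own and print OK/FAIL per file. Assumes the layout
# <dir>/<machine-id>/*.journal, plus *.journal~ files, which journald
# creates when it renames a file it found corrupted or not cleanly closed.
# Returns 1 if any file failed verification.
verify_each_journal() {
    local dir=${1:-/var/log/journal}
    local restore f status=0
    restore=$(shopt -p nullglob) || true   # remember caller's nullglob setting
    shopt -s nullglob                      # unmatched patterns expand to nothing
    for f in "$dir"/*/*.journal "$dir"/*/*.journal~; do
        if journalctl --verify --file="$f" >/dev/null 2>&1; then
            echo "OK   $f"
        else
            echo "FAIL $f"
            status=1
        fi
    done
    eval "$restore"                        # put nullglob back as it was
    return "$status"
}
```

Run it as root, e.g. `verify_each_journal /var/log/journal`. Files marked FAIL are the candidates to move aside, which might let the healthy files be kept.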

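  • I could not find any way to fix it, and journalctl does not seem to have a repair option, so in the end I deleted all historical logs and started over: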

The journal files could not be repaired, so delete all logs and start over
#journalctl
Error was encountered while opening journal files: Input/output error

#systemctl stop systemd-journald.service
Warning: Stopping systemd-journald.service, but it can still be activated by:
  systemd-journald.socket

#systemctl stop systemd-journald.socket

#rm -f /var/log/journal/0eef22fd6e9d4b3da022179e6b831d26/*

#ls

#systemctl start systemd-journald.service

#ls
system.journal

#journalctl
-- Logs begin at Wed 2023-12-27 21:54:01 CST, end at Wed 2023-12-27 21:54:03 CST. --
Dec 27 21:54:01 gpuxdn033188212154.ea133 systemd-journal[347367]: Permanent journal is using 8.0M (max allowed 4.0G, trying to leave 4.0G free of 4.4G available → current limit 488.5M).
Dec 27 21:54:01 gpuxdn033188212154.ea133 systemd-journald[341770]: Received SIGTERM from PID 1 (systemd).
Dec 27 21:54:01 gpuxdn033188212154.ea133 systemd-journal[347367]: Journal started
Dec 27 21:54:01 gpuxdn033188212154.ea133 polkitd[199757]: Unregistered Authentication Agent for unix-process:347362:2498725326 (system bus name :1.1430, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bu
Dec 27 21:54:01 gpuxdn033188212154.ea133 admin[347378]: alicmd:root:systemctl start systemd-journald.service:admin pts/0 2023-12-27 21:22 (33.189.253.79)
Dec 27 21:54:02 gpuxdn033188212154.ea133 admin[347542]: alicmd:root:ls:admin pts/0 2023-12-27 21:22 (33.189.253.79)
Dec 27 21:54:03 gpuxdn033188212154.ea133 su[347621]: (to root) root on none
Dec 27 21:54:03 gpuxdn033188212154.ea133 su[347621]: pam_unix(su:session): session opened for user root by (uid=0)
Dec 27 21:54:03 gpuxdn033188212154.ea133 su[347621]: pam_unix(su:session): session closed for user root
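The same reset can be wrapped in a small script. This sketch differs from the steps above in one respect: the old files are moved to a backup directory instead of being deleted, so they can still be inspected later. Moving within the same filesystem is a rename, so it needs almost no extra space, but it also frees none. The function names and the backup path under /var/tmp are hypothetical; the sketch assumes root privileges and that the journal directory is named after /etc/machine-id, as in the layout above. If systemctl warns about further sockets (for example systemd-journald-dev-log.socket on newer systemd versions), stop those as well.

```shell
# Sketch of the reset procedure above (run as root). Difference from the
# original steps: old journal files are moved to a backup directory
# instead of being deleted.

# Move all *.journal and *.journal~ files from directory $1 into $2.
move_journal_files() {
    local src=$1 dest=$2 f
    mkdir -p -- "$dest" || return 1
    for f in "$src"/*.journal "$src"/*.journal~; do
        [ -e "$f" ] || continue        # pattern matched nothing
        mv -- "$f" "$dest"/ || return 1
    done
}

reset_journal() {
    local machine_id src backup
    machine_id=$(cat /etc/machine-id) || return 1
    src=/var/log/journal/$machine_id
    # Hypothetical backup location; adjust to your system.
    backup=/var/tmp/journal-backup-$(date +%Y%m%d%H%M%S)
    [ -d "$src" ] || { echo "not found: $src" >&2; return 1; }

    # Stop the socket too, otherwise it can re-activate journald
    # (also stop any other sockets systemctl warns about).
    systemctl stop systemd-journald.socket systemd-journald.service || return 1
    move_journal_files "$src" "$backup" || return 1
    # On typical systemd units the service requires its socket,
    # so starting the service brings the socket back as well.
    systemctl start systemd-journald.service || return 1
    echo "old journal files moved to $backup"
}
```

Call `reset_journal` as root and check the result with journalctl, as above; once the backup is no longer needed, delete it to reclaim the space.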

References