Troubleshooting pods that were deleted (restarted)
In a Kubernetes cluster, pods managed by a Deployment are automatically recreated when they crash. We still need to find out why the original pods crashed, though:
kubectl -n kube-system get pods -o wide | grep scheduler
Output:
scheduler-86c895bdc9-b7rnr 1/1 Running 3 65d 192.168.37.200 z-k8s-m-1 <none> <none>
scheduler-86c895bdc9-ng4gw 1/1 Running 5 65d 192.168.37.192 z-k8s-m-2 <none> <none>
scheduler-86c895bdc9-w8mrh 1/1 Running 4 60d 192.168.37.231 z-k8s-m-3 <none> <none>
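The RESTARTS column already shows how many times each pod's containers have restarted. To surface the most-restarted pods first, you can sort on that counter; a minimal sketch, assuming the first container ([0]) is the one of interest:

# sort pods by the restart count of their first container (highest last)
kubectl -n kube-system get pods --sort-by='.status.containerStatuses[0].restartCount'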
To find out which pod restarted, check the events of the namespace in question:

kubectl -n kube-system get event

This shows the events from the last hour:
LAST SEEN TYPE REASON OBJECT MESSAGE
7m9s Warning Unhealthy pod/apiserver-65cc4bc696-7grkn Readiness probe failed: Get https://192.168.37.203:6443/readyz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
59m Warning Unhealthy pod/apiserver-65cc4bc696-kndsn Readiness probe failed: Get https://192.168.37.158:6443/readyz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
29m Normal Pulling pod/scheduler-756b6fcfc7-9wxww Pulling image "reg.docker.huatai.me/k8s/scheduler:release-v1.2.5_20211026111044_216c6766"
29m Normal Pulled pod/scheduler-756b6fcfc7-9wxww Successfully pulled image "reg.docker.huatai.me/k8s/scheduler:release-v1.2.5_20211026111044_216c6766"
29m Normal Created pod/scheduler-756b6fcfc7-9wxww Created container scheduler
29m Normal Started pod/scheduler-756b6fcfc7-9wxww Started container scheduler
29m Normal WithOutPostStartHook pod/scheduler-756b6fcfc7-9wxww Container scheduler with out poststart hook
29m Normal Pulling pod/scheduler-86c895bdc9-ng4gw Pulling image "reg.docker.huatai.me/k8s/scheduler:release-v1.2.4_20210928203943_53875ece"
29m Normal Pulled pod/scheduler-86c895bdc9-ng4gw Successfully pulled image "reg.docker.huatai.me/k8s/scheduler:release-v1.2.4_20210928203943_53875ece"
29m Normal Created pod/scheduler-86c895bdc9-ng4gw Created container scheduler
29m Normal Started pod/scheduler-86c895bdc9-ng4gw Started container scheduler
29m Normal WithOutPostStartHook pod/scheduler-86c895bdc9-ng4gw Container scheduler with out poststart hook
34m Normal Pulling pod/scheduler-86c895bdc9-w8mrh Pulling image "reg.docker.huatai.me/k8s/scheduler:release-v1.2.4_20210928203943_53875ece"
34m Normal Pulled pod/scheduler-86c895bdc9-w8mrh Successfully pulled image "reg.docker.huatai.me/k8s/scheduler:release-v1.2.4_20210928203943_53875ece"
34m Normal Created pod/scheduler-86c895bdc9-w8mrh Created container scheduler
34m Normal Started pod/scheduler-86c895bdc9-w8mrh Started container scheduler
34m Normal WithOutPostStartHook pod/scheduler-86c895bdc9-w8mrh Container scheduler with out poststart hook
To filter for a specific pod, one option is to pipe through grep, for example kubectl get events | grep scheduler-86c895bdc9-w8mrh. Another is to use a standard field-selector:
First, find the field. Running kubectl -n kube-system get events --output json shows:
... { "apiVersion": "v1", ... "involvedObject": { ... "name": "scheduler-86c895bdc9-w8mrh", ... } }
This tells us the field-selector is involvedObject.name, so the query

kubectl -n kube-system get events --field-selector involvedObject.name=scheduler-86c895bdc9-w8mrh

produces the same result as the grep approach.
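Field selectors can also be combined with commas to narrow the query further, for example restricting to Warning events for the same pod:

# only Warning events for this pod
kubectl -n kube-system get events --field-selector involvedObject.name=scheduler-86c895bdc9-w8mrh,type=Warning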
There is a problem, though: you do not know the name of the pod that was replaced before the restart, which makes it hard to investigate how the previous pod crashed.
Earlier kubectl versions had an -a query flag:

kubectl -n kube-system get pods -a

which, similar to docker ps -a, could list pods that had already been cleaned up. Recent versions no longer support it, so use the following command instead:
kubectl -n kube-system get event -o custom-columns=NAME:.metadata.name | cut -d "." -f1
Unfortunately, this command also only covers the last hour of events. In the case above, the events for the earlier pod creation were older than one hour, so the old pod names could not be recovered this way either.
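The one-hour window comes from the kube-apiserver's event retention setting, --event-ttl, whose default is 1h0m0s. If longer event history is needed for future incidents, that flag can be raised; a sketch for checking it, assuming a kubeadm-style cluster where the apiserver runs as a static pod (the manifest path is an assumption):

# check whether kube-apiserver overrides the default 1h event TTL
grep event-ttl /etc/kubernetes/manifests/kube-apiserver.yaml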
However, for control-plane pods, the scheduler only runs on a limited set of nodes (z-k8s-m-1 through z-k8s-m-3), so the corresponding records and logs can still be found on those nodes.
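Note also that the restart counts in the pod listing above indicate container restarts within pods that still exist, and for those the log of the crashed instance is still retrievable directly:

# show the log of the previous (crashed) container instance of a still-existing pod
kubectl -n kube-system logs scheduler-86c895bdc9-w8mrh --previous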
Note

Getting the pod restart time: used to find application servers that were recently restarted (recreated).
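For reference, the last termination of a still-existing pod's container can be read straight from its status; a sketch assuming a single container and at least one prior restart:

# print the exit reason and time of the last terminated container instance
kubectl -n kube-system get pod scheduler-86c895bdc9-w8mrh -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason} {.status.containerStatuses[0].lastState.terminated.finishedAt}{"\n"}'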