Deploying fedora-dev-tini with kind
(tini instead of systemd)¶
Having finished building the Fedora image (tini instead of systemd), push the image to the kind cluster's local Registry and retry deploying the personal development environment (the kind deployment of fedora-dev is shelved for now).
Preparation¶
Tag the finished Fedora image (tini instead of systemd), fedora-dev-tini, and push it to the local Registry:
docker tag fedora-dev-tini localhost:5001/fedora-dev-tini:latest
docker push localhost:5001/fedora-dev-tini:latest
Deployment¶
Simple deployment¶
Deploy along the lines of the Kubernetes Squid quick-start deployment:
---
apiVersion: v1
kind: Service
metadata:
  name: fedora-dev-service
  labels:
    app: fedora-dev-tini
spec:
  #type: LoadBalancer
  ports:
  - name: ssh
    protocol: TCP
    port: 22
    targetPort: 22
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  - name: https
    protocol: TCP
    port: 443
    targetPort: 443
  selector:
    app: fedora-dev-tini
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora-dev-tini
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora-dev-tini
  template:
    metadata:
      labels:
        app: fedora-dev-tini
    spec:
      containers:
      - name: fedora-dev-tini
        image: localhost:5001/fedora-dev-tini:latest
        ports:
        - containerPort: 22
          name: ssh
          protocol: TCP
Deploy:
kubectl apply -f fedora-dev-tini-deployment.yaml
(I have already fixed the fedora-dev-tini image from Fedora image (tini instead of systemd).) If everything works (obviously it will not be that simple; see my struggles below), you will see:
% kubectl get pods
NAME READY STATUS RESTARTS AGE
fedora-dev-tini-6d6d88c84f-864s7 1/1 Running 0 25s
% kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fedora-dev-service ClusterIP 10.96.175.32 <none> 22/TCP,80/TCP,443/TCP 47h
Note

Since fedora-dev-tini is a container for development work, it differs from a service container:

(Optional) If you configure Liveness, Readiness and Startup probes, probe only the port of the ssh service that starts by default, not the other service ports (http/https): a development host cannot guarantee that its web services are always available, but the ssh service always is.
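A minimal sketch of such an ssh-only probe, assuming sshd listens on port 22 inside the container; this fragment would be merged under the container entry of the Deployment spec:

```yaml
# Probe only the always-on ssh port; http/https are deliberately left
# unprobed because they may not be running on a development host.
readinessProbe:
  tcpSocket:
    port: 22
  initialDelaySeconds: 5
  periodSeconds: 10
```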
Note

A successful deployment is only the first step: you still need to configure kind Ingress and deploy MetalLB on kind before the services can be exposed externally.
Troubleshooting¶
Note

In practice, deploying fedora-dev-tini as described above ran into plenty of trouble. The cause was that the /entrypoint.sh script used when I first built the Fedora image (tini instead of systemd) was the entrypoint_ssh_cron_bash variant from Docker tini process manager:
#!/usr/bin/env bash

sshd() {
    /usr/bin/ssh-keygen -A
    /usr/sbin/sshd
}

crond() {
    /usr/sbin/crond
}

main() {
    sshd
    crond
    # Ending with /bin/bash works fine under Docker, but Kubernetes sees the
    # process finish and judges the pod terminated/crashed, so it never stays Running
    /bin/bash
}

main
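The failure mode in that final comment can be reproduced outside Kubernetes. In a pod there is no TTY and stdin is closed, so an argument-less bash hits EOF and returns immediately instead of blocking. A quick sketch:

```shell
# With stdin closed (as in a pod with no TTY attached), an argument-less
# bash reads EOF at once and exits with status 0 -- the same "success"
# that makes Kubernetes treat the container's program as finished.
/bin/bash </dev/null
echo "bash exited with $?"
```

This prints `bash exited with 0`: the exit is a success as far as the shell is concerned, which is exactly why it looks like a completed (then crashed) pod to Kubernetes.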
The diagnosis and fixes are described below; the problem was ultimately solved by improving the /entrypoint.sh script (see the end of this article).
Overriding the Dockerfile ENTRYPOINT and CMD¶
Just like the kind deployment of fedora-dev, the fedora-dev-tini pod also went into CrashLoopBackOff, so I applied the forced configuration from Configuring Liveness, Readiness and Startup probes:
---
apiVersion: v1
kind: Service
metadata:
  name: fedora-dev-service
  labels:
    app: fedora-dev-tini-force
spec:
  #type: LoadBalancer
  ports:
  - name: ssh
    protocol: TCP
    port: 22
    targetPort: 22
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  - name: https
    protocol: TCP
    port: 443
    targetPort: 443
  selector:
    app: fedora-dev-tini-force
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora-dev-tini-force
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora-dev-tini-force
  template:
    metadata:
      labels:
        app: fedora-dev-tini-force
    spec:
      containers:
      - args:
        - date; sleep 10; echo 'Hello from fedora-dev'; touch /tmp/healthy; sleep 1; while true; do sleep 120; done;
        command:
        - /bin/bash
        - -ec
        name: fedora-dev
        image: localhost:5001/fedora-dev-tini:latest
        livenessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          failureThreshold: 8
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 15
        ports:
        - containerPort: 22
          name: ssh
          protocol: TCP
Force-start it:
kubectl apply -f fedora-dev-tini-force-deployment.yaml
Strange: when fedora-dev-tini runs in the Docker environment it starts sshd and crond automatically, which shows the image works correctly and that /tini and the entrypoint.sh script in the image root run fine. Why do they stop working in the kind environment?
Compare the table in ENTRYPOINT and COMMAND: differences and cooperation:

Description | Dockerfile field | Kubernetes field
---|---|---
The command run in the container | Entrypoint | command
The arguments passed to the command | Cmd | args

The Kubernetes command and args override, respectively, the Entrypoint and Cmd from Building a Docker image from a Dockerfile. In other words, these lines of the force-start fedora-dev-tini-force-deployment.yaml:
spec:
  containers:
  - args:
    - date; sleep 10; echo 'Hello from fedora-dev'; touch /tmp/healthy; sleep 1; while true; do sleep 120; done;
    command:
    - /bin/bash
    - -ec
completely override the Entrypoint and Cmd defined in the Fedora image (tini instead of systemd), i.e. none of the commands from Building a Docker image from a Dockerfile run at all. The forced start is therefore only good for inspecting the image contents; it is not suitable for verifying that the Docker container runs correctly.
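The interaction between the two pairs of fields can be summarized as follows (a sketch of stock Kubernetes behavior, not specific to this image):

```yaml
# command set,   args set   -> image ENTRYPOINT and CMD are both ignored
# command set,   args unset -> only command runs; image CMD is ignored
# command unset, args set   -> image ENTRYPOINT runs with args; image CMD is ignored
# command unset, args unset -> image ENTRYPOINT and CMD run as built
```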
Overriding only the Dockerfile CMD¶
If the Dockerfile Entrypoint itself is not suspect (the Docker tini process manager is a standard program and rarely at fault), then in the Kubernetes yaml simply do not define command; define only args. This neatly overrides just the run argument from the Fedora image (tini instead of systemd), namely the /entrypoint.sh script handed to the Docker tini process manager, replacing it with a substitute script of your own that is certain to run (and that creates a probe file):
spec:
  containers:
  - name: fedora-dev-tini
    image: localhost:5001/fedora-dev-tini:latest
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 10; echo 'Hello from fedora-dev-tini'; date; while true; do (sleep 120; echo 'Hello from fedora-dev-tini'; date); done
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
This way only /entrypoint.sh is overridden, so it still verifies whether /tini itself runs correctly. Try fedora-dev-tini-args-deployment.yaml:
---
apiVersion: v1
kind: Service
metadata:
  name: fedora-dev-service
  labels:
    app: fedora-dev-tini-args
spec:
  #type: LoadBalancer
  ports:
  - name: ssh
    protocol: TCP
    port: 22
    targetPort: 22
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  - name: https
    protocol: TCP
    port: 443
    targetPort: 443
  selector:
    app: fedora-dev-tini-args
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora-dev-tini-args
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora-dev-tini-args
  template:
    metadata:
      labels:
        app: fedora-dev-tini-args
    spec:
      containers:
      - name: fedora-dev-tini-args
        image: localhost:5001/fedora-dev-tini:latest
        ports:
        - containerPort: 22
          name: ssh
          protocol: TCP
        args:
        - /bin/sh
        - -c
        - touch /tmp/healthy; sleep 10; echo 'Hello from fedora-dev-tini'; date; while true; do (sleep 120; echo 'Hello from fedora-dev-tini'; date); done
        livenessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 5
Start it:
kubectl apply -f fedora-dev-tini-args-deployment.yaml
Sure enough, it comes up:
% kubectl get pods
NAME READY STATUS RESTARTS AGE
fedora-dev-tini-args-7d588768f-6gbhg 1/1 Running 0 11s
Logging in to the running container, top shows:
top - 16:39:26 up 2:22, 0 users, load average: 0.69, 0.67, 0.86
Tasks: 6 total, 1 running, 5 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.9 us, 3.9 sy, 0.0 ni, 89.6 id, 0.1 wa, 0.0 hi, 0.6 si, 0.0 st
MiB Mem : 7851.5 total, 217.1 free, 2670.6 used, 4963.8 buff/cache
MiB Swap: 1024.0 total, 987.0 free, 37.0 used. 4625.9 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 2248 772 692 S 0.0 0.0 0:00.12 tini
10 root 20 0 3916 2856 2596 S 0.0 0.0 0:00.00 sh
58 root 20 0 4564 3744 3128 S 0.0 0.0 0:00.01 bash
183 root 20 0 6412 2452 2148 R 0.0 0.0 0:00.58 top
1519 root 20 0 3916 1540 1280 S 0.0 0.0 0:00.00 sh
1520 root 20 0 2252 772 696 S 0.0 0.0 0:00.00 sleep
Simplified deployment with only one port¶
Since tini runs fine, and I had even seen the very first deployment flash into the running state for an instant, I suspected a probe error. Had I defined too many services (only the ssh service actually starts in the container) without configuring a livenessProbe, so that the default might be to check every configured port?
The simplified configuration fedora-dev-tini-1port-deployment.yaml:
---
apiVersion: v1
kind: Service
metadata:
  name: fedora-dev-service
  labels:
    app: fedora-dev-tini-1port
spec:
  #type: LoadBalancer
  ports:
  - name: ssh
    protocol: TCP
    port: 22
    targetPort: 22
  selector:
    app: fedora-dev-tini-1port
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora-dev-tini-1port
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora-dev-tini-1port
  template:
    metadata:
      labels:
        app: fedora-dev-tini-1port
    spec:
      containers:
      - name: fedora-dev-tini-1port
        image: localhost:5001/fedora-dev-tini:latest
        ports:
        - containerPort: 22
          name: ssh
          protocol: TCP
Still CrashLoopBackOff. Following Configuring Liveness, Readiness and Startup probes, add a simple port-checking livenessProbe:
---
apiVersion: v1
kind: Service
metadata:
  name: fedora-dev-service
  labels:
    app: fedora-dev-tini-1port-live
spec:
  #type: LoadBalancer
  ports:
  - name: ssh
    protocol: TCP
    port: 22
    targetPort: 22
  selector:
    app: fedora-dev-tini-1port-live
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora-dev-tini-1port-live
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora-dev-tini-1port-live
  template:
    metadata:
      labels:
        app: fedora-dev-tini-1port-live
    spec:
      containers:
      - name: fedora-dev-tini-1port-live
        image: localhost:5001/fedora-dev-tini:latest
        ports:
        - containerPort: 22
          name: ssh
          protocol: TCP
        readinessProbe:
          tcpSocket:
            port: 22
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 22
          initialDelaySeconds: 15
          periodSeconds: 20
Still CrashLoopBackOff; kubectl get pods xxx -o yaml shows:
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-01-28T09:17:05Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-01-28T09:17:05Z"
    message: 'containers with unready status: [fedora-dev-tini-1port-livefile]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
Strange. I switched to a livenessProbe that is guaranteed to succeed, i.e. checking the /etc/hosts file:
---
apiVersion: v1
kind: Service
metadata:
  name: fedora-dev-service
  labels:
    app: fedora-dev-tini-1port-livefile
spec:
  #type: LoadBalancer
  ports:
  - name: ssh
    protocol: TCP
    port: 22
    targetPort: 22
  selector:
    app: fedora-dev-tini-1port-livefile
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora-dev-tini-1port-livefile
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora-dev-tini-1port-livefile
  template:
    metadata:
      labels:
        app: fedora-dev-tini-1port-livefile
    spec:
      containers:
      - name: fedora-dev-tini-1port-livefile
        image: localhost:5001/fedora-dev-tini:latest
        ports:
        - containerPort: 22
          name: ssh
          protocol: TCP
        livenessProbe:
          exec:
            command:
            - cat
            - /etc/hosts
          initialDelaySeconds: 5
          periodSeconds: 5
Still the same ContainersNotReady. Hold on, why ContainersNotReady; does a readinessProbe have to be configured as well? Revise the config to add a readinessProbe:
---
apiVersion: v1
kind: Service
metadata:
  name: fedora-dev-service
  labels:
    app: fedora-dev-tini-1port-livefile-readyfile
spec:
  #type: LoadBalancer
  ports:
  - name: ssh
    protocol: TCP
    port: 22
    targetPort: 22
  selector:
    app: fedora-dev-tini-1port-livefile-readyfile
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora-dev-tini-1port-livefile-readyfile
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora-dev-tini-1port-livefile-readyfile
  template:
    metadata:
      labels:
        app: fedora-dev-tini-1port-livefile-readyfile
    spec:
      containers:
      - name: fedora-dev-tini-1port-livefile-readyfile
        image: localhost:5001/fedora-dev-tini:latest
        ports:
        - containerPort: 22
          name: ssh
          protocol: TCP
        livenessProbe:
          exec:
            command:
            - cat
            - /etc/hosts
          initialDelaySeconds: 5
          periodSeconds: 5
        readinessProbe:
          exec:
            command:
            - cat
            - /etc/hosts
          initialDelaySeconds: 5
          periodSeconds: 5
Still the same ContainersNotReady. So what on earth is wrong, given that the tini init service demonstrably works (we verified above that overriding the Dockerfile's /entrypoint.sh via args lets the pod start, with the tini process running normally inside the container)? Yet /entrypoint.sh itself looks blameless: run inside an already-started container it behaves perfectly well...
Suddenly, a flash of insight: wait, every livenessProbe experiment used a detection script with an infinite loop, a script that never stops executing... whereas /entrypoint.sh, after starting sshd and crond, executes a bash with no arguments, which returns to the console right away. Even though $? is 0, meaning success, to Kubernetes that simply means the program has finished! No wonder kubectl get pods would show the pod flash through running for a brief moment, then switch to completed, followed by the endless Crash loop: Kubernetes concluded that the container's program had terminated!!!
I immediately imitated the livenessProbe scripts and rewrote a simplified deployment:
---
apiVersion: v1
kind: Service
metadata:
  name: fedora-dev-service
  labels:
    app: fedora-dev-tini-simple
spec:
  #type: LoadBalancer
  ports:
  - name: ssh
    protocol: TCP
    port: 22
    targetPort: 22
  selector:
    app: fedora-dev-tini-simple
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora-dev-tini-simple
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora-dev-tini-simple
  template:
    metadata:
      labels:
        app: fedora-dev-tini-simple
    spec:
      containers:
      - name: fedora-dev-tini-simple
        image: localhost:5001/fedora-dev-tini:latest
        ports:
        - containerPort: 22
        args:
        - /bin/sh
        - -c
        - /usr/sbin/sshd; sleep 10; echo 'Hello from fedora-dev-tini'; date; while true; do (sleep 120; echo 'Hello from fedora-dev-tini'; date); done
Deploy fedora-dev-tini-simple:
kubectl apply -f fedora-dev-tini-simple.yaml
WOW, it finally starts:
# top
top - 21:39:03 up 5:20, 0 users, load average: 0.52, 0.97, 1.09
Tasks: 8 total, 1 running, 7 sleeping, 0 stopped, 0 zombie
%Cpu(s): 6.2 us, 5.2 sy, 0.0 ni, 88.1 id, 0.1 wa, 0.0 hi, 0.4 si, 0.0 st
MiB Mem : 7851.5 total, 136.2 free, 2715.8 used, 4999.4 buff/cache
MiB Swap: 1024.0 total, 984.4 free, 39.6 used. 4583.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
165 root 20 0 6404 2408 2100 R 0.3 0.0 0:00.01 top
1 root 20 0 2248 756 676 S 0.0 0.0 0:00.55 tini
10 root 20 0 3916 2824 2564 S 0.0 0.0 0:00.03 sh
12 root 20 0 14932 2832 1800 S 0.0 0.0 0:00.00 sshd
17 root 20 0 4576 3720 3112 S 0.0 0.0 0:00.02 bash
141 root 20 0 3916 1528 1268 S 0.0 0.0 0:00.00 sh
142 root 20 0 2252 768 688 S 0.0 0.0 0:00.00 sleep
143 root 20 0 4576 3704 3112 S 0.0 0.0 0:00.00 bash
# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 2248 756 ? Ss 20:08 0:00 /tini -- /bin/sh -c /usr/sbin/sshd; sleep 10; echo 'Hello from fedora-dev-tini'; date; while true; do (sleep 120; echo 'Hello from fedora-dev
root 10 0.0 0.0 3916 2824 ? S 20:08 0:00 /bin/sh -c /usr/sbin/sshd; sleep 10; echo 'Hello from fedora-dev-tini'; date; while true; do (sleep 120; echo 'Hello from fedora-dev-tini'; d
root 12 0.0 0.0 14932 2832 ? Ss 20:08 0:00 sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
root 17 0.0 0.0 4576 3720 pts/0 Ss+ 20:08 0:00 /bin/bash
root 143 0.0 0.0 4576 3716 pts/1 Ss 21:43 0:00 /bin/bash
root 178 0.0 0.0 3916 1528 ? S 21:56 0:00 /bin/sh -c /usr/sbin/sshd; sleep 10; echo 'Hello from fedora-dev-tini'; date; while true; do (sleep 120; echo 'Hello from fedora-dev-tini'; d
root 179 0.0 0.0 2252 764 ? S 21:56 0:00 sleep 120
root 180 0.0 0.0 6100 1392 pts/1 R+ 21:56 0:00 ps aux
Solution¶
After all the trial and error above, the lessons boil down to this:
The simplest Kubernetes deployment does not require Liveness, Readiness and Startup probes to be configured.
But you must make sure that the final ENTRYPOINT+CMD of the image built from the Dockerfile is a continuously running foreground program; otherwise Kubernetes decides the application has finished and puts the pod into the Completed state. At that point, probes or no probes, the pod is judged to have crashed, unless you configure a Kubernetes args that overrides the image's CMD with a script that loops forever.
My solution was to rewrite the /entrypoint.sh script in the Fedora image (tini instead of systemd), turning the final /bin/bash into a loop that keeps running and printing output:
#!/usr/bin/env bash

sshd() {
    /usr/sbin/sshd
}

crond() {
    /usr/sbin/crond
}

main() {
    sshd
    crond
    # Under k8s a bare bash would finish straight away and the pod would be
    # judged crashed, so replace it with a loop that keeps running
    #/bin/bash
    /bin/bash -c "while true; do (echo 'Hello from tini'; date; sleep 120); done"
}

main
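The effect of that trailing loop can be sketched locally; here `timeout` merely stands in for Kubernetes eventually stopping the container, so the sketch terminates (this is an illustration, not part of the image):

```shell
# The loop never exits on its own; timeout has to kill it after one
# second, which GNU timeout reports as exit status 124. A process that
# only dies when killed is exactly what keeps the pod Running.
timeout 1 /bin/bash -c "while true; do sleep 0.2; done"
echo "loop ended with status $?"
```

This prints `loop ended with status 124`: unlike the bare /bin/bash, the loop blocks indefinitely and PID 1 stays alive until something terminates it.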