Deploying fedora-dev-tini on kind (tini instead of systemd)

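After building the Fedora image (tini instead of systemd), I pushed it to the kind cluster's local Registry and tried once more to deploy my personal development environment (deploying fedora-dev on kind is shelved for now).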

Preparation

Tag the fedora-dev-tini image and push it to the Local Registry
docker tag fedora-dev-tini localhost:5001/fedora-dev-tini:latest
docker push localhost:5001/fedora-dev-tini
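Before deploying, it can help to confirm that the push actually landed in the local registry. The sketch below assumes the kind local registry is a standard registry:2 container on localhost:5001 that answers the Registry HTTP API V2 endpoint GET /v2/<name>/tags/list with compact JSON; the helper name registry_has_tag is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: registry_has_tag REGISTRY REPO TAG
# Asks the registry for the repository's tag list (Registry HTTP API V2)
# and succeeds only if TAG appears inside the "tags" array.
# Assumes a response such as {"name":"fedora-dev-tini","tags":["latest"]}.
registry_has_tag() {
  local registry="$1" repo="$2" tag="$3"
  curl -fsS "http://${registry}/v2/${repo}/tags/list" \
    | grep -o '"tags": *\[[^]]*]' \
    | grep -q "\"${tag}\""
}
```

For example: registry_has_tag localhost:5001 fedora-dev-tini latest && echo 'image is in the registry'. The grep-based parsing is deliberately crude; with jq available, a JSON-aware filter would be more robust.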

Deployment

Simple deployment

fedora-dev-tini-deployment.yaml for deploying to the kind cluster
---
apiVersion: v1
kind: Service
metadata:
  name: fedora-dev-service
  labels:
    app: fedora-dev-tini
spec:
  #type: LoadBalancer
  ports:
    - name: ssh
      protocol: TCP
      port: 22
      targetPort: 22
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
    - name: https
      protocol: TCP
      port: 443
      targetPort: 443
  selector:
    app: fedora-dev-tini
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora-dev-tini
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora-dev-tini
  template:
    metadata:
      labels:
        app: fedora-dev-tini
    spec:
      containers:
      - name: fedora-dev-tini
        image: localhost:5001/fedora-dev-tini:latest
        ports:
          - containerPort: 22
            name: ssh
            protocol: TCP
  • Deploy:

Deploy fedora-dev-tini to the kind cluster
kubectl apply -f fedora-dev-tini-deployment.yaml
Check kubectl get pods after deploying fedora-dev-tini
% kubectl get pods
NAME                                READY   STATUS    RESTARTS     AGE
fedora-dev-tini-6d6d88c84f-864s7    1/1     Running   0            25s
Check kubectl get services after deploying fedora-dev-tini
% kubectl get services
NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                 AGE
fedora-dev-service   ClusterIP   10.96.175.32    <none>        22/TCP,80/TCP,443/TCP   47h

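Note

Since fedora-dev-tini is a container meant for development work, it differs somewhat from a service container:

Note

A successful deployment is only the first step: the service can only be reached from outside after configuring kind Ingress / deploying MetalLB in kind.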

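Troubleshooting

Note

In practice, deploying fedora-dev-tini as described above took many detours. The cause was that the /entrypoint.sh script I originally used when building the Fedora image (tini instead of systemd) was the one from Docker tini process manager entrypoint_ssh_cron_bash:

The Docker tini process manager entrypoint_ssh_cron_bash script has a flaw: Kubernetes decides the command has finished, so the pod keeps crashing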
#!/usr/bin/env bash

sshd() {
    /usr/bin/ssh-keygen -A
    /usr/sbin/sshd
}

crond() {
    /usr/sbin/crond
}

main() {
    sshd
    crond
    # Ending with /bin/bash works in Docker, but K8s sees the program exit and treats the pod as terminated (Crash), so it never stays Running
    /bin/bash
}

main

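The troubleshooting steps and fixes follow below; the actual fix was an improved /entrypoint.sh script (see the end).

Overriding the Dockerfile ENTRYPOINT and CMD

fedora-dev-tini-force-deployment.yaml: force-starting fedora-dev-tini-force on the kind cluster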
---
apiVersion: v1
kind: Service
metadata:
  name: fedora-dev-service
  labels:
    app: fedora-dev-tini-force
spec:
  #type: LoadBalancer
  ports:
    - name: ssh
      protocol: TCP
      port: 22
      targetPort: 22
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
    - name: https
      protocol: TCP
      port: 443
      targetPort: 443
  selector:
    app: fedora-dev-tini-force
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora-dev-tini-force
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora-dev-tini-force
  template:
    metadata:
      labels:
        app: fedora-dev-tini-force
    spec:
      containers:
      - args:
        - date; sleep 10; echo 'Hello from fedora-dev'; touch /tmp/healthy; sleep 1; while
          true; do sleep 120; done;
        command:
        - /bin/bash
        - -ec
        name: fedora-dev
        image: localhost:5001/fedora-dev-tini:latest
        livenessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          failureThreshold: 8
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 15
        ports:
          - containerPort: 22
            name: ssh
            protocol: TCP
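  • Force start:

Force-start fedora-dev-tini-force on the kind cluster
kubectl apply -f fedora-dev-tini-force-deployment.yaml

Strange: running fedora-dev-tini under Docker starts sshd and crond automatically, which shows that the image works and that /tini and the /entrypoint.sh script in the image's root directory run correctly. So why doesn't it work under kind?

Refer to the comparison table in Differences and cooperation between ENTRYPOINT and COMMAND: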

Correspondence between Docker and Kubernetes ENTRYPOINT and COMMAND

Description                          Dockerfile field    Kubernetes field
Command run in the container         Entrypoint          command
Arguments passed to the command      Cmd                 args

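Kubernetes command and args override, respectively, the Entrypoint and Cmd of the image built in Building a Docker image from a Dockerfile. In other words, these lines in the force-start fedora-dev-tini-force-deployment.yaml: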

spec:
  containers:
  - args:
    - date; sleep 10; echo 'Hello from fedora-dev'; touch /tmp/healthy; sleep 1; while
      true; do sleep 120; done;
    command:
    - /bin/bash
    - -ec

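completely override the Entrypoint and Cmd defined in the Fedora image (tini instead of systemd) image: none of the commands from Building a Docker image from a Dockerfile run at all. The force start is therefore only useful for inspecting the image contents, not for verifying that the Docker container runs correctly.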
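The override rules in the table can be read as a small decision procedure. The sketch below models each field as a plain string (real Kubernetes command/args and Dockerfile ENTRYPOINT/CMD are string arrays, so quoting is ignored here); join_nonempty and effective_cmd are hypothetical names, and the example values /tini -- (ENTRYPOINT) and /entrypoint.sh (CMD) are assumed from the ps aux output and the description of this image.

```shell
#!/usr/bin/env bash
# Join the non-empty arguments with single spaces.
join_nonempty() {
  local out="" part
  for part in "$@"; do
    if [[ -n "$part" ]]; then
      out="${out:+$out }$part"
    fi
  done
  printf '%s\n' "$out"
}

# effective_cmd IMAGE_ENTRYPOINT IMAGE_CMD POD_COMMAND POD_ARGS
# An empty string means "field not set".
effective_cmd() {
  local entrypoint="$1" cmd="$2" command="$3" args="$4"
  if [[ -n "$command" ]]; then
    join_nonempty "$command" "$args"     # command set: image ENTRYPOINT and CMD are both ignored
  elif [[ -n "$args" ]]; then
    join_nonempty "$entrypoint" "$args"  # only args set: image ENTRYPOINT runs with the Pod args
  else
    join_nonempty "$entrypoint" "$cmd"   # neither set: image defaults
  fi
}
```

The args-only case is the one used in the following experiments, which is why PID 1 in the later ps aux output is still /tini -- /bin/sh -c ...: tini stays, only /entrypoint.sh is replaced.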

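Overriding only the Dockerfile CMD

Unless you suspect the Entrypoint in the Dockerfile itself (Docker tini process manager is a standard program and rarely goes wrong), do not define command in the Kubernetes YAML file; define only args. This neatly overrides just the run arguments of the Fedora image (tini instead of systemd), i.e. the /entrypoint.sh script passed to Docker tini process manager, replacing it with a script of your own that is sure to run correctly and creates a marker file: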

spec:
  containers:
  - name: fedora-dev-tini
    image: localhost:5001/fedora-dev-tini:latest
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 10; echo 'Hello from fedora-dev-tini'; date; while
      true; do (sleep 120; echo 'Hello from fedora-dev-tini'; date); done
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5

This way only /entrypoint.sh is overridden, so it is still possible to verify that /tini runs correctly.

  • Try fedora-dev-tini-args-deployment.yaml:

Keep tini running and use k8s args to override the Dockerfile CMD: fedora-dev-tini-args-deployment.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: fedora-dev-service
  labels:
    app: fedora-dev-tini-args
spec:
  #type: LoadBalancer
  ports:
    - name: ssh
      protocol: TCP
      port: 22
      targetPort: 22
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
    - name: https
      protocol: TCP
      port: 443
      targetPort: 443
  selector:
    app: fedora-dev-tini-args
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora-dev-tini-args
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora-dev-tini-args
  template:
    metadata:
      labels:
        app: fedora-dev-tini-args
    spec:
      containers:
      - name: fedora-dev-tini-args
        image: localhost:5001/fedora-dev-tini:latest
        ports:
          - containerPort: 22
            name: ssh
            protocol: TCP
        args:
          - /bin/sh
          - -c
          - touch /tmp/healthy; sleep 10; echo 'Hello from fedora-dev-tini'; date; while true; do (sleep 120; echo 'Hello from fedora-dev-tini'; date); done
        livenessProbe:
          exec:
            command:
              - cat
              - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 5
  • Start:

kubectl apply -f fedora-dev-tini-args-deployment.yaml

Sure enough, it starts:

% kubectl get pods
NAME                                    READY   STATUS             RESTARTS        AGE
fedora-dev-tini-args-7d588768f-6gbhg    1/1     Running            0               11s

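Logging in to this running container and checking with top shows:

Keep tini running and use k8s args to override the Dockerfile CMD: checking processes with top inside the container shows tini running normally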
top - 16:39:26 up  2:22,  0 users,  load average: 0.69, 0.67, 0.86
Tasks:   6 total,   1 running,   5 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.9 us,  3.9 sy,  0.0 ni, 89.6 id,  0.1 wa,  0.0 hi,  0.6 si,  0.0 st
MiB Mem :   7851.5 total,    217.1 free,   2670.6 used,   4963.8 buff/cache
MiB Swap:   1024.0 total,    987.0 free,     37.0 used.   4625.9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1 root      20   0    2248    772    692 S   0.0   0.0   0:00.12 tini
     10 root      20   0    3916   2856   2596 S   0.0   0.0   0:00.00 sh
     58 root      20   0    4564   3744   3128 S   0.0   0.0   0:00.01 bash
    183 root      20   0    6412   2452   2148 R   0.0   0.0   0:00.58 top
   1519 root      20   0    3916   1540   1280 S   0.0   0.0   0:00.00 sh
   1520 root      20   0    2252    772    696 S   0.0   0.0   0:00.00 sleep

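Simplified deployment with a single port

Since tini runs fine, and I noticed that the very first deployment was in fact Running for a brief moment, I guessed the health check was at fault. Had I declared too many service ports (only the ssh service actually starts in the container) without configuring a livenessProbe, so that by default every declared port might be probed?

  • Simplified configuration fedora-dev-tini-1port-deployment.yaml

Simplified configuration with only the ssh service port: fedora-dev-tini-1port-deployment.yaml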
---
apiVersion: v1
kind: Service
metadata:
  name: fedora-dev-service
  labels:
    app: fedora-dev-tini-1port
spec:
  #type: LoadBalancer
  ports:
    - name: ssh
      protocol: TCP
      port: 22
      targetPort: 22
  selector:
    app: fedora-dev-tini-1port
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora-dev-tini-1port
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora-dev-tini-1port
  template:
    metadata:
      labels:
        app: fedora-dev-tini-1port
    spec:
      containers:
      - name: fedora-dev-tini-1port
        image: localhost:5001/fedora-dev-tini:latest
        ports:
          - containerPort: 22
            name: ssh
            protocol: TCP

Still CrashLoopBackOff

Configure a livenessProbe (plus a readinessProbe) for the ssh service port: fedora-dev-tini-1port-live-deployment.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: fedora-dev-service
  labels:
    app: fedora-dev-tini-1port-live
spec:
  #type: LoadBalancer
  ports:
    - name: ssh
      protocol: TCP
      port: 22
      targetPort: 22
  selector:
    app: fedora-dev-tini-1port-live
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora-dev-tini-1port-live
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora-dev-tini-1port-live
  template:
    metadata:
      labels:
        app: fedora-dev-tini-1port-live
    spec:
      containers:
      - name: fedora-dev-tini-1port-live
        image: localhost:5001/fedora-dev-tini:latest
        ports:
          - containerPort: 22
            name: ssh
            protocol: TCP
        readinessProbe:
          tcpSocket:
            port: 22
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 22
          initialDelaySeconds: 15
          periodSeconds: 20

Still CrashLoopBackOff; checking with kubectl get pods xxx -o yaml shows:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-01-28T09:17:05Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-01-28T09:17:05Z"
    message: 'containers with unready status: [fedora-dev-tini-1port-livefile]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
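These Pod conditions only say ContainersNotReady, which hides the more useful signal. The kubelet records how the container last exited under status.containerStatuses[].lastState.terminated (fields reason and exitCode); a reason of Completed with exit code 0 means nothing crashed, the main process simply finished. The sketch below reads those two fields with kubectl's JSONPath output; it assumes a single-container Pod, and last_termination and diagnose are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print "<reason> <exitCode>" for the last exit of the Pod's first container.
# Assumes a single-container Pod; prints just " " if it has never exited.
last_termination() {
  local pod="$1"
  kubectl get pod "$pod" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{" "}{.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Turn the raw reason/exit code into a short human-readable verdict.
diagnose() {
  local summary
  summary="$(last_termination "$1")"
  case "$summary" in
    "Completed 0") echo "main process exited normally (entrypoint finished)" ;;
    "OOMKilled "*) echo "killed for exceeding its memory limit" ;;
    "Error "*)     echo "main process exited with code ${summary#Error }" ;;
    ""|" ")        echo "no terminated state recorded yet" ;;
    *)             echo "other: $summary" ;;
  esac
}
```

Usage: diagnose <pod-name>, with the name taken from kubectl get pods. For the original /entrypoint.sh, a Completed 0 verdict would point straight at the bare bash at the end of the script.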

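Strange. Next I switched to a livenessProbe that is guaranteed to succeed, one that simply reads the file /etc/hosts:

livenessProbe checking only a file that is guaranteed to exist: fedora-dev-tini-1port-livefile-deployment.yaml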
---
apiVersion: v1
kind: Service
metadata:
  name: fedora-dev-service
  labels:
    app: fedora-dev-tini-1port-livefile
spec:
  #type: LoadBalancer
  ports:
    - name: ssh
      protocol: TCP
      port: 22
      targetPort: 22
  selector:
    app: fedora-dev-tini-1port-livefile
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora-dev-tini-1port-livefile
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora-dev-tini-1port-livefile
  template:
    metadata:
      labels:
        app: fedora-dev-tini-1port-livefile
    spec:
      containers:
      - name: fedora-dev-tini-1port-livefile
        image: localhost:5001/fedora-dev-tini:latest
        ports:
          - containerPort: 22
            name: ssh
            protocol: TCP
        livenessProbe:
          exec:
            command:
            - cat
            - /etc/hosts
          initialDelaySeconds: 5
          periodSeconds: 5

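Still the same ContainersNotReady

Wait, why ContainersNotReady? Does a readinessProbe have to be configured as well?

So I revised the configuration and added a readinessProbe too:

livenessProbe and readinessProbe both checking a file that is guaranteed to exist: fedora-dev-tini-1port-livefile-readyfile-deployment.yaml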
---
apiVersion: v1
kind: Service
metadata:
  name: fedora-dev-service
  labels:
    app: fedora-dev-tini-1port-livefile-readyfile
spec:
  #type: LoadBalancer
  ports:
    - name: ssh
      protocol: TCP
      port: 22
      targetPort: 22
  selector:
    app: fedora-dev-tini-1port-livefile-readyfile
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora-dev-tini-1port-livefile-readyfile
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora-dev-tini-1port-livefile-readyfile
  template:
    metadata:
      labels:
        app: fedora-dev-tini-1port-livefile-readyfile
    spec:
      containers:
      - name: fedora-dev-tini-1port-livefile-readyfile
        image: localhost:5001/fedora-dev-tini:latest
        ports:
          - containerPort: 22
            name: ssh
            protocol: TCP
        livenessProbe:
          exec:
            command:
            - cat
            - /etc/hosts
          initialDelaySeconds: 5
          periodSeconds: 5
        readinessProbe:
          exec:
            command:
            - cat
            - /etc/hosts
          initialDelaySeconds: 5
          periodSeconds: 5

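Still the same ContainersNotReady

What on earth keeps the tini init service from working when it clearly runs fine? (Above I had already verified that overriding the Dockerfile's /entrypoint.sh via args lets the pod start, and that the tini process runs normally inside the container.) Yet the /entrypoint.sh script shows nothing wrong either, and running /entrypoint.sh inside the started container works perfectly...

Then, a flash of insight: wait, the deployments that worked (the ones with a livenessProbe) all ran an infinite-loop script that never finishes... whereas /entrypoint.sh, after starting sshd and crond, runs a bare bash with no arguments that returns straight away. $? is 0, meaning success, but to Kubernetes doesn't that mean the program has finished? No wonder kubectl get pods briefly showed the pod Running, then immediately Completed, followed by an endless cycle of crashes: Kubernetes concluded that the container's program had terminated!!!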
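The missing link is why the same bare bash stays up under Docker but not in a Pod. A likely explanation, assumed here because the docker run flags are not shown: under Docker the container was started interactively (for example with docker run -it), so bash had a terminal to wait on, while a Pod whose container does not set stdin/tty gives the main process no input, so bash reads end-of-file and exits with status 0. The sketch reproduces that ending of the original /entrypoint.sh outside Kubernetes, using /dev/null as a stand-in for the Pod's empty stdin.

```shell
#!/usr/bin/env bash
# Reproduce the last step of the original /entrypoint.sh: a bare `bash`
# whose stdin has nothing to offer (/dev/null stands in for a Pod without stdin/tty).
start=$SECONDS
bash </dev/null        # non-interactive bash reads EOF and returns at once
status=$?
elapsed=$(( SECONDS - start ))
echo "bash exited with status ${status} after ~${elapsed}s"
```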

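  • Immediately write a simplified deployment modeled on the livenessProbe examples:

Simplified configuration: args overrides the Dockerfile's entrypoint.sh script that starts sshd, but this time args ends with a loop that never finishes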
---
apiVersion: v1
kind: Service
metadata:
  name: fedora-dev-service
  labels:
    app: fedora-dev-tini-simple
spec:
  #type: LoadBalancer
  ports:
    - name: ssh
      protocol: TCP
      port: 22
      targetPort: 22
  selector:
    app: fedora-dev-tini-simple
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora-dev-tini-simple
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora-dev-tini-simple
  template:
    metadata:
      labels:
        app: fedora-dev-tini-simple
    spec:
      containers:
      - name: fedora-dev-tini-simple
        image: localhost:5001/fedora-dev-tini:latest
        ports:
        - containerPort: 22
        args:
        - /bin/sh
        - -c
        - /usr/sbin/sshd; sleep 10; echo 'Hello from fedora-dev-tini'; date; while true; do (sleep 120; echo 'Hello from fedora-dev-tini'; date); done
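  • Deploy fedora-dev-tini-simple:

Deploy the simplified configuration: args overrides the Dockerfile's entrypoint.sh script that starts sshd, but this time args ends with a loop that never finishes
kubectl apply -f fedora-dev-tini-simple.yaml

WOW, it finally starts:

Simplified configuration fedora-dev-tini-simple: the infinite-loop script makes Kubernetes see the container as continuously running, and inside the container sshd can be seen running under the tini process manager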
# top
top - 21:39:03 up  5:20,  0 users,  load average: 0.52, 0.97, 1.09
Tasks:   8 total,   1 running,   7 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.2 us,  5.2 sy,  0.0 ni, 88.1 id,  0.1 wa,  0.0 hi,  0.4 si,  0.0 st
MiB Mem :   7851.5 total,    136.2 free,   2715.8 used,   4999.4 buff/cache
MiB Swap:   1024.0 total,    984.4 free,     39.6 used.   4583.0 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    165 root      20   0    6404   2408   2100 R   0.3   0.0   0:00.01 top
      1 root      20   0    2248    756    676 S   0.0   0.0   0:00.55 tini
     10 root      20   0    3916   2824   2564 S   0.0   0.0   0:00.03 sh
     12 root      20   0   14932   2832   1800 S   0.0   0.0   0:00.00 sshd
     17 root      20   0    4576   3720   3112 S   0.0   0.0   0:00.02 bash
    141 root      20   0    3916   1528   1268 S   0.0   0.0   0:00.00 sh
    142 root      20   0    2252    768    688 S   0.0   0.0   0:00.00 sleep
    143 root      20   0    4576   3704   3112 S   0.0   0.0   0:00.00 bash

# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0   2248   756 ?        Ss   20:08   0:00 /tini -- /bin/sh -c /usr/sbin/sshd; sleep 10; echo 'Hello from fedora-dev-tini'; date; while true; do (sleep 120; echo 'Hello from fedora-dev
root          10  0.0  0.0   3916  2824 ?        S    20:08   0:00 /bin/sh -c /usr/sbin/sshd; sleep 10; echo 'Hello from fedora-dev-tini'; date; while true; do (sleep 120; echo 'Hello from fedora-dev-tini'; d
root          12  0.0  0.0  14932  2832 ?        Ss   20:08   0:00 sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
root          17  0.0  0.0   4576  3720 pts/0    Ss+  20:08   0:00 /bin/bash
root         143  0.0  0.0   4576  3716 pts/1    Ss   21:43   0:00 /bin/bash
root         178  0.0  0.0   3916  1528 ?        S    21:56   0:00 /bin/sh -c /usr/sbin/sshd; sleep 10; echo 'Hello from fedora-dev-tini'; date; while true; do (sleep 120; echo 'Hello from fedora-dev-tini'; d
root         179  0.0  0.0   2252   764 ?        S    21:56   0:00 sleep 120
root         180  0.0  0.0   6100  1392 pts/1    R+   21:56   0:00 ps aux

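Solution

After the repeated attempts above, the approach can be summed up as follows:

My fix was to rewrite the /entrypoint.sh script in the Fedora image (tini instead of systemd), replacing the final /bin/bash with a loop that keeps running and printing messages:

Rewritten /entrypoint.sh script that keeps running (loop)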
#!/usr/bin/env bash

sshd() {
    /usr/sbin/sshd
}

crond() {
    /usr/sbin/crond
}

main() {
    sshd
    crond
    # In k8s the script must not end with a bash that simply exits, or the pod is judged to have crashed; replace it with a loop that keeps running
    #/bin/bash
    /bin/bash -c "while true; do (echo 'Hello from tini'; date; sleep 120); done"
}

main
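Before rebuilding the image, a rough local check can catch this kind of mistake early: start the candidate entrypoint with stdin redirected from /dev/null (an approximation of a Pod without stdin/tty, not an exact model of the kubelet) and see whether it is still running a few seconds later. stays_alive is a hypothetical helper, not part of tini or Kubernetes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stays_alive SECONDS CMD [ARGS...]
# Runs CMD in the background with stdin from /dev/null, waits SECONDS, and
# returns 0 if CMD is still running (it is then stopped), 1 if it already exited.
stays_alive() {
  local wait_s="$1"; shift
  "$@" </dev/null >/dev/null 2>&1 &
  local pid=$!
  sleep "$wait_s"
  if kill -0 "$pid" 2>/dev/null; then
    kill "$pid" 2>/dev/null || true   # still alive: stop it again
    wait "$pid" 2>/dev/null || true
    return 0
  fi
  wait "$pid" 2>/dev/null || true     # already gone: reap it
  return 1
}
```

For example, stays_alive 3 /bin/bash -c "while true; do (echo 'Hello from tini'; date; sleep 120); done" should succeed, while stays_alive 3 bash (the old ending) should fail. Killing only the outer bash may leave its current sleep running until it times out. Checking the complete /entrypoint.sh this way needs sshd and crond present, so it is best run inside the image itself.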