Building a Highly Available Kubernetes Cluster on DNS Round-Robin

Warning

I am redeploying the Kubernetes cluster of my private cloud architecture, using the latest 1.24 release. Although I already had some deployment experience, starting again from scratch still brought plenty of setbacks and detours, so these notes are rather scattered (I worked on this on and off, and some content is repeated). If you want a concise, repeatable reference that gets everything right in one pass, please read the Kubernetes cluster (z-k8s) document, where I distill these notes into a systematic guide to completing a highly available Kubernetes deployment quickly.

Note

In the load-balancer based highly available Kubernetes cluster deployment, load balancing is implemented with HAProxy combined with keepalived for automatic VIP failover. The DNS round-robin deployment described here can be converted to that load-balanced mode on top of the same foundation.

Once the private-cloud TLS-authenticated etcd cluster is deployed and the external (extended) etcd architecture is in place, you can start the deployment in this document, building a highly available Kubernetes cluster on DNS round-robin. Adding a load balancer such as HAProxy later turns it into the load-balancer based highly available Kubernetes cluster architecture.

Preparing the etcd access certificates

Since the cluster uses an external, extended etcd cluster, the etcd certificates first have to be copied to the control plane nodes so that the control plane services (such as the apiserver) can read and write etcd once they start:

The certificates the etcd client needs can be taken from the etcdctl client configuration set up in the private-cloud TLS-authenticated etcd cluster deployment; the corresponding files are:

etcd client configuration: using certificates
export ETCDCTL_API=3
#export ETCDCTL_ENDPOINTS='https://etcd.staging.huatai.me:2379'
export ETCDCTL_ENDPOINTS=https://192.168.6.204:2379,https://192.168.6.205:2379,https://192.168.6.206:2379
export ETCDCTL_CACERT=/etc/etcd/ca.pem
export ETCDCTL_CERT=/etc/etcd/client.pem
export ETCDCTL_KEY=/etc/etcd/client-key.pem
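
Before distributing anything, it is worth confirming that this client configuration actually reaches the etcd cluster. A minimal check (a sketch, assuming etcdctl is installed and the ETCDCTL_* variables above are exported):

# Every endpoint listed in ETCDCTL_ENDPOINTS should report healthy
etcdctl endpoint health
etcdctl member list -w table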

The etcdctl client configuration files above map one-to-one onto the key files Kubernetes uses to access etcd, as follows:

Correspondence between the cfssl-generated etcd client keys and the etcd key files used by the kubernetes apiserver

  cfssl-generated etcd client key    corresponding k8s etcd access key file
  ca.pem                             ca.crt
  client.pem                         apiserver-etcd-client.crt
  client-key.pem                     apiserver-etcd-client.key

  • Distribute the etcd certificates used by the Kubernetes apiserver:

Distribute the etcd certificates used by the Kubernetes apiserver
for host in z-k8s-m-1 z-k8s-m-2 z-k8s-m-3;do
   scp /etc/etcd/ca.pem $host:/tmp/ca.crt
   scp /etc/etcd/client.pem $host:/tmp/apiserver-etcd-client.crt
   scp /etc/etcd/client-key.pem $host:/tmp/apiserver-etcd-client.key
   
   ssh $host 'sudo mkdir -p /etc/kubernetes/pki/etcd' 
   ssh $host 'sudo mv /tmp/ca.crt /etc/kubernetes/pki/etcd/ca.crt'
   ssh $host 'sudo mv /tmp/apiserver-etcd-client.crt /etc/kubernetes/pki/apiserver-etcd-client.crt'
   ssh $host 'sudo mv /tmp/apiserver-etcd-client.key /etc/kubernetes/pki/apiserver-etcd-client.key'
done

Note

I used the host z-b-data-1, which has key-based authentication set up, as the client, and ran the deploy_k8s_etcd_key.sh key-distribution script above over ssh against z-k8s-m-1 / z-k8s-m-2 / z-k8s-m-3.
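
As a quick sanity check (a sketch, run from the same admin host), you can confirm that the files landed with the expected names on each control plane node:

for host in z-k8s-m-1 z-k8s-m-2 z-k8s-m-3;do
   ssh $host 'sudo ls -l /etc/kubernetes/pki/etcd/ca.crt /etc/kubernetes/pki/apiserver-etcd-client.crt /etc/kubernetes/pki/apiserver-etcd-client.key'
done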

Configuring the first control plane node

  • Create the create_kubeadm-config.sh script:

Generate the first control plane node configuration kubeadm-config.yaml
K8S_API_ENDPOINT=z-k8s-api.staging.huatai.me
K8S_API_ENDPOINT_PORT=6443
K8S_CLUSTER_NAME=z-k8s

ETCD_0_IP=192.168.6.204
ETCD_1_IP=192.168.6.205
ETCD_2_IP=192.168.6.206

cat << EOF > kubeadm-config.yaml
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: stable
clusterName: ${K8S_CLUSTER_NAME}
controlPlaneEndpoint: "${K8S_API_ENDPOINT}:${K8S_API_ENDPOINT_PORT}"
etcd:
  external:
    endpoints:
      - https://${ETCD_0_IP}:2379
      - https://${ETCD_1_IP}:2379
      - https://${ETCD_2_IP}:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
EOF

Note

With the systemd cgroup driver configured, the containerd runtime uses the systemd cgroup driver, and the kubelet needs to use the systemd cgroup driver as well. The official Kubernetes document Configuring a cgroup driver points out:

  • kubelet can be explicitly configured to use the systemd cgroup driver by specifying cgroupDriver: systemd in kubeadm-config.yaml, so that the kubelet in the cluster created by kubeadm init uses the systemd cgroup driver correctly (a sketch of the corresponding KubeletConfiguration snippet follows this list)

  • Starting with Kubernetes 1.22, kubelet defaults to the systemd cgroup driver even without an explicit cgroupDriver: systemd setting

  • If the cluster was created without the systemd cgroup driver (and the version is older than 1.22), you can fix it afterwards with kubectl edit cm kubelet-config -n kube-system and set cgroupDriver: systemd
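
If you prefer to make the setting explicit rather than rely on the 1.22+ default, a sketch of appending a KubeletConfiguration document to the kubeadm-config.yaml generated above (this could equally live inside create_kubeadm-config.sh):

cat << EOF >> kubeadm-config.yaml
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
EOF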

Note

Kubernetes by kubeadm config yamls provides kubeadm-config.yaml examples, e.g. how to customize the cluster name, network CIDRs and so on. You can go further and customize the apiserver / controller / scheduler configuration by following the official documentation, which that article also introduces and references. Well worth reading.

  • Run sh create_kubeadm-config.sh to generate the kubeadm-config.yaml configuration file

  • Initialize the first control plane node:

Initialize the first control plane node with kubeadm init
sudo kubeadm init --config kubeadm-config.yaml --upload-certs

Note

In practice, as long as the network is reachable (you can get through the firewall to Google's image registry) and the prerequisites are in place, kubeadm init completes very smoothly. The terminal then prints some very useful information, such as how to start using the cluster and how to set up the admin environment.

If everything goes well, you will see output like this:

Output of kubeadm init for the first control plane node
[init] Using Kubernetes version: v1.24.2
[preflight] Running pre-flight checks
        [WARNING SystemVerification]: missing optional cgroups: hugetlb blkio
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [apiserver.staging.huatai.me kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local z-k8s-m-1] and IPs [10.96.0.1 192.168.6.101]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] External etcd mode: Skipping etcd/ca certificate authority generation
[certs] External etcd mode: Skipping etcd/server certificate generation
[certs] External etcd mode: Skipping etcd/peer certificate generation
[certs] External etcd mode: Skipping etcd/healthcheck-client certificate generation
[certs] External etcd mode: Skipping apiserver-etcd-client certificate generation
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 16.010266 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
<CERTIFICATE KEY>
[mark-control-plane] Marking the node z-k8s-m-1 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node z-k8s-m-1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: <TOKEN>
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[kubelet-check] Initial timeout of 40s passed.
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

This shows that the first control plane node of the cluster was initialized successfully!

  • Following the prompt, run the following commands to set up the admin configuration for your own account:

Configure your personal account to administer the k8s cluster
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

The output also provides the command for adding further control plane nodes (it contains secrets, so keep it confidential) as well as the command for adding worker nodes (likewise secret).

No containers appear to be running

I then hit a problem: kubeadm init reported a successful initialization, yet docker ps showed no containers at all:

$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

What is going on?

  • Check the ports:

    netstat -an | grep 6443
    

The port is indeed being listened on:

tcp        0      0 192.168.6.101:39868     192.168.6.101:6443      ESTABLISHED
tcp        0      0 192.168.6.101:38514     192.168.6.101:6443      ESTABLISHED
tcp        0      0 192.168.6.101:40006     192.168.6.101:6443      TIME_WAIT
tcp        0      0 192.168.6.101:39982     192.168.6.101:6443      TIME_WAIT
tcp        0      0 192.168.6.101:39926     192.168.6.101:6443      ESTABLISHED
tcp        0      0 192.168.6.101:40070     192.168.6.101:6443      TIME_WAIT
tcp6       0      0 :::6443                 :::*                    LISTEN
tcp6       0      0 ::1:51780               ::1:6443                ESTABLISHED
tcp6       0      0 192.168.6.101:6443      192.168.6.101:39868     ESTABLISHED
tcp6       0      0 ::1:6443                ::1:51780               ESTABLISHED
tcp6       0      0 192.168.6.101:6443      192.168.6.101:38514     ESTABLISHED
tcp6       0      0 192.168.6.101:6443      192.168.6.101:39926     ESTABLISHED

Could it be that docker really is no longer used, and the cluster talks directly to a container runtime such as containerd?

  • Check the top output:

    Tasks: 149 total,   1 running, 148 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  3.0 us,  1.8 sy,  0.0 ni, 92.7 id,  2.2 wa,  0.0 hi,  0.2 si,  0.2 st
    MiB Mem :   3929.8 total,   2579.8 free,    531.7 used,    818.4 buff/cache
    MiB Swap:      0.0 total,      0.0 free,      0.0 used.   3179.4 avail Mem
    
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      18408 root      20   0 1173260 336344  71252 S   4.6   8.4   0:32.39 kube-apiserver
        437 root      20   0 1855336  97924  63844 S   2.0   2.4   2:49.35 kubelet
      19247 root      20   0  818924  89480  59096 S   2.0   2.2   0:02.62 kube-controller
        422 root      20   0 1296700  60624  30648 S   0.7   1.5   0:51.12 containerd
        262 root      19  -1   93020  37364  36260 S   0.3   0.9   0:07.20 systemd-journal
      18028 root      20   0       0      0      0 I   0.3   0.0   0:00.05 kworker/1:0-events
    

The system is already running kube-apiserver and kube-controller-manager, which means the Kubernetes containers are in fact running; otherwise these processes would not exist.

  • Check the nodes:

    kubectl get nodes -o wide
    

It only reports:

Unable to connect to the server: Forbidden

Why?

Since my deployment uses DNS round-robin, resolving apiserver.staging.huatai.me might be landing on the other two servers that have not yet joined the control plane, so I tried changing the DNS record to resolve only to node 1, the IP of the first control plane node. That still did not fix it. (A quick resolution check is sketched below.)
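
To see which addresses the round-robin API name currently resolves to, a minimal check (a sketch, assuming dig from the dnsutils package is available; substitute the controlPlaneEndpoint name you configured):

# All A records currently returned for the API endpoint name
dig +short z-k8s-api.staging.huatai.me
# What the node itself resolves (honours /etc/hosts and the local resolver)
getent hosts z-k8s-api.staging.huatai.me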

  • Check the kubelet.service logs:

    sudo journalctl -u kubelet.service | less
    

After kubelet starts there are error messages, the first of which concerns the network not being ready:

Jul 11 09:06:40 z-k8s-m-1 kubelet[437]: E0711 09:06:40.420728     437 kubelet.go:2424] "Error getting node" err="node \"z-k8s-m-1\" not found"
Jul 11 09:06:40 z-k8s-m-1 kubelet[437]: E0711 09:06:40.521709     437 kubelet.go:2424] "Error getting node" err="node \"z-k8s-m-1\" not found"
Jul 11 09:06:40 z-k8s-m-1 kubelet[437]: E0711 09:06:40.581769     437 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jul 11 09:06:40 z-k8s-m-1 kubelet[437]: E0711 09:06:40.622654     437 kubelet.go:2424] "Error getting node" err="node \"z-k8s-m-1\" not found"
...
Jul 11 09:06:41 z-k8s-m-1 kubelet[437]: I0711 09:06:41.796597     437 kubelet_node_status.go:70] "Attempting to register node" node="z-k8s-m-1"
...
Jul 11 09:06:43 z-k8s-m-1 kubelet[437]: I0711 09:06:43.232192     437 kubelet_node_status.go:108] "Node was previously registered" node="z-k8s-m-1"
Jul 11 09:06:43 z-k8s-m-1 kubelet[437]: I0711 09:06:43.232303     437 kubelet_node_status.go:73] "Successfully registered node" node="z-k8s-m-1"
Jul 11 09:06:43 z-k8s-m-1 kubelet[437]: I0711 09:06:43.427832     437 apiserver.go:52] "Watching apiserver"
Jul 11 09:06:43 z-k8s-m-1 kubelet[437]: I0711 09:06:43.438830     437 topology_manager.go:200] "Topology Admit Handler"
...
Jul 11 09:06:43 z-k8s-m-1 kubelet[437]: E0711 09:06:43.440795     437 pod_workers.go:951] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized" pod="kube-system/cor
edns-6d4b75cb6d-pf7bs" podUID=79bc049c-0f1a-4387-aeed-ccfb04dfe7ca
Jul 11 09:06:43 z-k8s-m-1 kubelet[437]: E0711 09:06:43.440978     437 pod_workers.go:951] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized" pod="kube-system/cor
edns-6d4b75cb6d-m2hf6" podUID=b2d40bb9-f431-4c53-a164-215bfcd4da47
...

Access to the apiserver is failing and the CNI plugin has not been initialized. I recalled reading in the official documentation that the network must be initialized before things work properly. (Although my earlier experience creating a single control plane (single master) cluster was that, even without a CNI, you could at least see pods running and use docker ps.)

In addition, the Container Runtimes documentation states that Kubernetes 1.24 removed docker support; does that mean docker can no longer be used at all? Following the runtime socket check described in Container Runtimes, I confirmed that the system only has the containerd socket file.

Reading the kubeadm init output more carefully, it does tell you how to install a CNI:

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

The reference Troubleshooting CNI plugin-related errors offers advice on diagnosing CNI plugin problems. Right at the start it states:

  • To avoid CNI plugin related errors, first upgrade the container runtime to a version whose CNI plugins have been validated to work with your Kubernetes version. For example, Kubernetes 1.24 requires:

    • containerd v1.6.4 or later, or v1.5.11 or later

    • CRI-O v1.24.0 or later

I checked my containerd version and found it had been installed from the distribution's docker.io package, so it was not current: containerd was only 1.5.9, very likely too old to be compatible. The document also notes:

In Kubernetes, the containerd runtime adds a loopback interface lo to pods as a default feature. The containerd runtime implements this through a loopback CNI plugin distributed as part of the containerd release package; containerd v1.6.0 and later, for example, ship a CNI v1.0.0 compatible loopback plugin among the default CNI plugins.

Note

Based on the investigation above, the reason the kubeadm init containers would not start properly is that Kubernetes 1.24, the latest release I installed, has removed its bundled CNI plugin handling and hands that work over to the CNI plugins shipped with the container runtime. This requires a matching, sufficiently recent containerd runtime. A current containerd provides the loopback plugin by default, which is enough for the Kubernetes control plane pods to run, so the control plane pods can be brought up from the start and provide the initial environment for installing a Kubernetes network later. Once the cluster is running, installing cilium in the external etcd environment lets you customize the network directly on the live cluster.

containerd versioning and release lists the containerd versions that support each Kubernetes release, and it does indeed state explicitly that Kubernetes 1.24 requires containerd 1.6.4+ or 1.5.11+ (a quick version check is sketched below).
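
Before redeploying it helps to confirm which runtime version each node actually runs. A minimal check (a sketch, assuming containerd is installed and crictl, introduced below, is available):

containerd --version
# The version reported over the CRI socket that kubelet talks to
sudo crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock version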

Troubleshooting the startup problem again

I restarted the deployment as follows:

  • Run kubeadm cluster reset to tear down the broken k8s cluster

  • Install the official containerd binaries to upgrade the distribution's containerd to 1.6.4+, on every node, making sure the latest containerd is in use

  • Improve kubeadm-config.yaml with a custom cluster name (see above; already folded into the configuration)

I then reran kubeadm init, but kubectl get nodes still failed with Unable to connect to the server: Forbidden

Note

I did not expect that switching from the familiar Docker Atlas to the containerd runtime would bring this much trouble:

  • In Kubernetes 1.24 the docker command can no longer be used, and in practice the ctr command is also problematic (the containers were actually running, but ctr showed nothing to manage)

  • You have to switch entirely to crictl, the standard CRI interface command used by Kubernetes 1.24, to observe the running containers

  • Configure /etc/crictl.yaml for crictl as follows:

crictl configuration file /etc/crictl.yaml
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 10
#debug: true
debug: false
  • Check the pods:

    crictl pods
    

The output shows:

POD ID              CREATED             STATE               NAME                                NAMESPACE           ATTEMPT             RUNTIME
173ffdaeefab9       13 hours ago        Ready               kube-proxy-vwqsn                    kube-system         0                   (default)
0952e1399e340       13 hours ago        Ready               kube-scheduler-z-k8s-m-1            kube-system         0                   (default)
424cc7a5a9bfc       13 hours ago        Ready               kube-controller-manager-z-k8s-m-1   kube-system         0                   (default)
7249ac0122d31       13 hours ago        Ready               kube-apiserver-z-k8s-m-1            kube-system         0                   (default)
  • Check the images:

    crictl images
    

The output shows:

IMAGE                                TAG                 IMAGE ID            SIZE
k8s.gcr.io/coredns/coredns           v1.8.6              a4ca41631cc7a       13.6MB
k8s.gcr.io/kube-apiserver            v1.24.2             d3377ffb7177c       33.8MB
k8s.gcr.io/kube-apiserver            v1.24.3             d521dd763e2e3       33.8MB
k8s.gcr.io/kube-controller-manager   v1.24.2             34cdf99b1bb3b       31MB
k8s.gcr.io/kube-controller-manager   v1.24.3             586c112956dfc       31MB
k8s.gcr.io/kube-proxy                v1.24.2             a634548d10b03       39.5MB
k8s.gcr.io/kube-proxy                v1.24.3             2ae1ba6417cbc       39.5MB
k8s.gcr.io/kube-scheduler            v1.24.2             5d725196c1f47       15.5MB
k8s.gcr.io/kube-scheduler            v1.24.3             3a5aa3a515f5d       15.5MB
k8s.gcr.io/pause                     3.5                 ed210e3e4a5ba       301kB
k8s.gcr.io/pause                     3.6                 6270bb605e12e       302kB
k8s.gcr.io/pause                     3.7                 221177c6082a8       311kB
  • Check the containers:

    crictl ps -a
    

The containers running on the host:

CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID              POD
fd65e2a037600       2ae1ba6417cbc       16 hours ago        Running             kube-proxy                0                   173ffdaeefab9       kube-proxy-vwqsn
5922644848149       3a5aa3a515f5d       16 hours ago        Running             kube-scheduler            0                   0952e1399e340       kube-scheduler-z-k8s-m-1
58e48e8bc6861       586c112956dfc       16 hours ago        Running             kube-controller-manager   0                   424cc7a5a9bfc       kube-controller-manager-z-k8s-m-1
901b1dc06eed1       d521dd763e2e3       16 hours ago        Running             kube-apiserver            0                   7249ac0122d31       kube-apiserver-z-k8s-m-1

These checks show that the containers and pods on the z-k8s-m-1 server are all running normally.

  • Check the container logs, for example the apiserver container log:

    crictl logs 901b1dc06eed1
    

No errors were found; the apiserver appears to be running normally.

At my wits' end, I googled it one more time and finally couldn't hold it together:

The reason for the Unable to connect to the server: Forbidden error is simple: z-k8s-m-1 had an HTTP proxy configured.

It turned out that, to get through the firewall and download files from Google's repositories, I had configured a curl proxy, and the error came from the proxy server rejecting the requests. What a ridiculous mistake!!!

Simply unsetting the proxy environment variables lets kubectl run normally (an alternative no_proxy approach is sketched after this snippet):

unset http_proxy
unset https_proxy
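
If the proxy has to stay in place for pulling images, an alternative (a sketch, assuming environment-variable based proxy settings; adjust the hostnames and CIDRs to your own network, and note that not every tool honours CIDR notation in no_proxy) is to exclude the cluster endpoints instead of disabling the proxy entirely:

# Keep the proxy for external traffic but bypass it for cluster-internal endpoints
export no_proxy="localhost,127.0.0.1,192.168.6.0/24,10.96.0.0/12,z-k8s-api.staging.huatai.me"
export NO_PROXY="$no_proxy"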

Note

Still, no need to be discouraged...

This week of fiddling at least gave me a better understanding of newer technology such as Kubernetes Container Runtimes, and I picked up some related skills along the way... not giving up.

  • Now I can finally manage the newly deployed cluster with kubectl:

    kubectl get nodes
    

The output shows:

NAME        STATUS     ROLES           AGE   VERSION
z-k8s-m-1   NotReady   control-plane   18h   v1.24.2
  • Check the pods:

    kubectl get pods -n kube-system -o wide
    

Output at this point:

Before a network is installed coredns cannot start; kubectl get pods output at this point
NAME                                READY   STATUS    RESTARTS   AGE   IP              NODE        NOMINATED NODE   READINESS GATES
coredns-6d4b75cb6d-jnfmj            0/1     Pending   0          18h   <none>          <none>      <none>           <none>
coredns-6d4b75cb6d-nm5fz            0/1     Pending   0          18h   <none>          <none>      <none>           <none>
kube-apiserver-z-k8s-m-1            1/1     Running   0          18h   192.168.6.101   z-k8s-m-1   <none>           <none>
kube-controller-manager-z-k8s-m-1   1/1     Running   0          18h   192.168.6.101   z-k8s-m-1   <none>           <none>
kube-proxy-vwqsn                    1/1     Running   0          18h   192.168.6.101   z-k8s-m-1   <none>           <none>
kube-scheduler-z-k8s-m-1            1/1     Running   0          18h   192.168.6.101   z-k8s-m-1   <none>           <none>

Note

Two issues remain unresolved at this point:

  • The z-k8s-m-1 node status is NotReady

  • The coredns control plane pods cannot start (no network has been configured)

    • I have seen this before when creating a single control plane (single master) cluster: once the correct network plugin is installed for the Kubernetes cluster, the containers start

    • Note that the three main Kubernetes components apiserver / scheduler / controller-manager all use the physical host IP address 192.168.6.101; in other words, these three control plane components can start even without a network plugin installed. This is also why the controlPlaneEndpoint domain z-k8s-api.staging.huatai.me configured in kubeadm-config.yaml resolves to the physical host IP addresses

Installing the Cilium network

Note that for the private-cloud TLS-authenticated etcd cluster (extended external etcd) you need to follow the procedure for installing cilium in an external etcd environment:

  • First install helm on the node:

Install helm on Linux
version=3.12.2
wget https://get.helm.sh/helm-v${version}-linux-amd64.tar.gz
tar -zxvf helm-v${version}-linux-amd64.tar.gz
sudo mv linux-amd64/helm /usr/local/bin/helm
  • Add the cilium Helm repository:

Add the cilium Helm repository
helm repo add cilium https://helm.cilium.io/
  • Deploy Cilium with helm:

Create the Kubernetes secret cilium uses to access etcd, and install cilium with SSL access to etcd
VERSION=1.11.7

ETCD_0_IP=192.168.6.204
ETCD_1_IP=192.168.6.205
ETCD_2_IP=192.168.6.206

kubectl create secret generic -n kube-system cilium-etcd-secrets \
    --from-file=etcd-client-ca.crt=/etc/kubernetes/pki/etcd/ca.crt \
    --from-file=etcd-client.key=/etc/kubernetes/pki/apiserver-etcd-client.key \
    --from-file=etcd-client.crt=/etc/kubernetes/pki/apiserver-etcd-client.crt

helm install cilium cilium/cilium --version ${VERSION} \
  --namespace kube-system \
  --set etcd.enabled=true \
  --set etcd.ssl=true \
  --set "etcd.endpoints[0]=https://${ETCD_0_IP}:2379" \
  --set "etcd.endpoints[1]=https://${ETCD_1_IP}:2379" \
  --set "etcd.endpoints[2]=https://${ETCD_2_IP}:2379"
  • Now, with a CNI such as cilium installed, the coredns containers that could not run earlier in the deployment are assigned IP addresses and start running:

    kubectl -n kube-system get pods -o wide
    

The output shows:

After the cilium CNI network is installed coredns can run; kubectl get pods now shows all pods with IPs assigned and running
NAME                                READY   STATUS    RESTARTS   AGE     IP              NODE        NOMINATED NODE   READINESS GATES
cilium-7c5nv                        1/1     Running   0          8m40s   192.168.6.101   z-k8s-m-1   <none>           <none>
cilium-operator-68dffdc9f7-cqvqr    1/1     Running   0          8m40s   192.168.6.101   z-k8s-m-1   <none>           <none>
cilium-operator-68dffdc9f7-rph4w    0/1     Pending   0          8m40s   <none>          <none>      <none>           <none>
coredns-6d4b75cb6d-jnfmj            1/1     Running   0          25h     10.0.0.241      z-k8s-m-1   <none>           <none>
coredns-6d4b75cb6d-nm5fz            1/1     Running   0          25h     10.0.0.141      z-k8s-m-1   <none>           <none>
kube-apiserver-z-k8s-m-1            1/1     Running   0          25h     192.168.6.101   z-k8s-m-1   <none>           <none>
kube-controller-manager-z-k8s-m-1   1/1     Running   0          25h     192.168.6.101   z-k8s-m-1   <none>           <none>
kube-proxy-vwqsn                    1/1     Running   0          25h     192.168.6.101   z-k8s-m-1   <none>           <none>
kube-scheduler-z-k8s-m-1            1/1     Running   0          25h     192.168.6.101   z-k8s-m-1   <none>           <none>
  • Install the cilium CLI:

Install the cilium CLI
curl -L --remote-name-all https://github.com/cilium/cilium-cli/releases/latest/download/cilium-linux-amd64.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-amd64.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
rm cilium-linux-amd64.tar.gz{,.sha256sum}
  • Check:

    cilium status
    
Verify the cilium status after installation
    /¯¯\
 /¯¯\__/¯¯\    Cilium:         OK
 \__/¯¯\__/    Operator:       1 errors, 1 warnings
 /¯¯\__/¯¯\    Hubble:         disabled
 \__/¯¯\__/    ClusterMesh:    disabled
    \__/

Deployment        cilium-operator    Desired: 2, Ready: 1/2, Available: 1/2, Unavailable: 1/2
DaemonSet         cilium             Desired: 1, Ready: 1/1, Available: 1/1
Containers:       cilium-operator    Running: 1, Pending: 1
                  cilium             Running: 1
Cluster Pods:     2/2 managed by Cilium
Image versions    cilium             quay.io/cilium/cilium:v1.11.7@sha256:66a6f72a49e55e21278d07a99ff2cffa7565ed07f2578d54b5a92c1a492a6597: 1
                  cilium-operator    quay.io/cilium/operator-generic:v1.11.7@sha256:0f8ed5d815873d20848a360df3f2ebbd4116481ff817d3f295557801e0b45900: 2
Errors:           cilium-operator    cilium-operator                     1 pods of Deployment cilium-operator are not ready
Warnings:         cilium-operator    cilium-operator-68dffdc9f7-rph4w    pod is pending
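
The single pending cilium-operator replica is expected on a one-node cluster: the operator Deployment defaults to two replicas that are not scheduled onto the same node, so the second replica stays Pending until another node joins (it does start once z-k8s-m-2 is added, as shown further below). A sketch for confirming the scheduling reason, or for running a single replica in the meantime (the pod name comes from the status output above; operator.replicas is a standard cilium Helm value):

# See why the second operator replica cannot be scheduled yet
kubectl -n kube-system describe pod cilium-operator-68dffdc9f7-rph4w | grep -A 10 Events
# Optional: run only one operator replica until more nodes join
# helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values --set operator.replicas=1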

Note

At this point the first control plane node is essentially installed; next comes adding more control plane nodes and adding worker nodes. The commands for this can be taken from the original kubeadm init output (if you saved that output at the time).

Adding the second control plane node

  • Following the kubeadm init output, run the join on the second control plane node z-k8s-m-2:

Add a control-plane node with kubeadm join
kubeadm join z-k8s-api.staging.huatai.me:6443 --token <token> \
      --discovery-token-ca-cert-hash sha256:<hash> \
      --control-plane --certificate-key <hash>

Troubleshooting control plane node joins

Note

The certificate and the token generated when kubeadm initializes the cluster both have limited lifetimes (the token lasts 24 hours). So if a longer time passes after initialization before you add control plane and worker nodes, you will run into token and certificate related errors. You then need to re-upload the certificates and regenerate a token. In addition, when using an external etcd, you also need to pass the etcd parameters via kubeadm-config.yaml.

When running kubeadm join to add a control plane node I got this error:

error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "XXXXXXX"
To see the stack trace of this error execute with --v=5 or higher

The error above occurs because the original token has expired, so a new one needs to be generated:

Regenerate a token with kubeadm token create (fixes the expired token from cluster initialization)
kubeadm token create --print-join-command
  • Then rerun the kubeadm join command with the token replaced by the newly generated, valid one (still keeping the previous --certificate-key); this time it complains that kubeadm-certs cannot be found:

    ...
    [download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
    error execution phase control-plane-prepare/download-certs: error downloading certs: error downloading the secret: Secret "kubeadm-certs" was not found in the "kube-system" Namespace. This Secret might have expired. Please, run `kubeadm init phase upload-certs --upload-certs` on a control plane to generate a new one
    To see the stack trace of this error execute with --v=5 or higher
    

As the message suggests, kubeadm token create on its own only covers worker nodes; for control plane nodes the certificates also need to be re-uploaded (see How do I find the join command for kubeadm on the master?). In other words, when the token has expired, joining a new control plane node takes two steps:

  • Re-upload the certificates on an already working control plane node:

Re-upload the certificates with kubeadm init phase upload-certs
sudo kubeadm init phase upload-certs --upload-certs

It reports that the certificates have been stored again and prints a new certificate key:

[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
XXXXXXXXXX
  • Then run the token recreation command again (no need to repeat it if you already ran it above):

Regenerate a token with kubeadm token create (fixes the expired token from cluster initialization)
kubeadm token create --print-join-command
  • You can list the tokens to check:

    kubeadm token list
    

This shows the token that was just regenerated.

  • Splice the generated certificate key into the kubeadm join command (using the newly generated token) and try adding the control plane node again. It still fails:

    [download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
    error execution phase control-plane-prepare/download-certs: error downloading certs: the Secret does not include the required certificate or key - name: external-etcd.crt, path: /etc/kubernetes/pki/apiserver-etcd-client.crt
    To see the stack trace of this error execute with --v=5 or higher
    

It turns out that the default certificate re-upload command assumes an internal (stacked) etcd; with an external etcd, re-uploading the certificates must use the kubeadm-config.yaml that contains the etcd configuration (see Control plane certs not working with external etcd #1886), i.e. the command below (a combined recovery sketch follows it):

With an external etcd, re-uploading certificates via kubeadm init requires kubeadm-config.yaml
sudo kubeadm init phase upload-certs --upload-certs --config kubeadm-config.yaml
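
Putting the two steps together, a sketch of rebuilding a fresh control-plane join command once the original token and uploaded certs have expired (run the first two commands on an existing control plane node; the helper variables are only illustrative):

# Re-upload the control plane certificates (external etcd needs the config file);
# the new certificate key is the last line of the output
CERT_KEY=$(sudo kubeadm init phase upload-certs --upload-certs --config kubeadm-config.yaml | tail -1)

# Print a fresh join command with a new token
JOIN_CMD=$(sudo kubeadm token create --print-join-command)

# Run the resulting command on the new control plane node
echo "sudo ${JOIN_CMD} --control-plane --certificate-key ${CERT_KEY}"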

Adding worker nodes

  • Following the kubeadm init output, run on worker node z-k8s-n-1 and the others:

Add a worker node with kubeadm join
kubeadm join z-k8s-api.staging.huatai.me:6443 --token <token> \
        --discovery-token-ca-cert-hash <hash>

Final checks

  • After everything is finished, check the nodes and pods to get the complete lists:

Check kubectl get nodes after completing the K8s cluster with external etcd
$ kubectl get nodes -o wide
NAME        STATUS   ROLES           AGE     VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
z-k8s-m-1   Ready    control-plane   35h     v1.24.2   192.168.6.101   <none>        Ubuntu 20.04.4 LTS   5.4.0-121-generic   containerd://1.6.6
z-k8s-m-2   Ready    control-plane   26m     v1.24.2   192.168.6.102   <none>        Ubuntu 20.04.4 LTS   5.4.0-121-generic   containerd://1.6.6
z-k8s-m-3   Ready    control-plane   21m     v1.24.2   192.168.6.103   <none>        Ubuntu 20.04.4 LTS   5.4.0-121-generic   containerd://1.6.6
z-k8s-n-1   Ready    <none>          6m10s   v1.24.2   192.168.6.111   <none>        Ubuntu 20.04.4 LTS   5.4.0-110-generic   containerd://1.6.6
z-k8s-n-2   Ready    <none>          4m10s   v1.24.2   192.168.6.112   <none>        Ubuntu 20.04.4 LTS   5.4.0-110-generic   containerd://1.6.6
z-k8s-n-3   Ready    <none>          4m1s    v1.24.2   192.168.6.113   <none>        Ubuntu 20.04.4 LTS   5.4.0-110-generic   containerd://1.6.6
z-k8s-n-4   Ready    <none>          3m55s   v1.24.2   192.168.6.114   <none>        Ubuntu 20.04.4 LTS   5.4.0-110-generic   containerd://1.6.6
z-k8s-n-5   Ready    <none>          3m51s   v1.24.2   192.168.6.115   <none>        Ubuntu 20.04.4 LTS   5.4.0-110-generic   containerd://1.6.6
Check kubectl get pods after completing the K8s cluster with external etcd
$ kubectl get pods -n kube-system -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP              NODE        NOMINATED NODE   READINESS GATES
cilium-4mrgm                        1/1     Running   0          5m53s   192.168.6.114   z-k8s-n-4   <none>           <none>
cilium-5v6sn                        1/1     Running   0          5m49s   192.168.6.115   z-k8s-n-5   <none>           <none>
cilium-7c5nv                        1/1     Running   0          10h     192.168.6.101   z-k8s-m-1   <none>           <none>
cilium-dx68f                        1/1     Running   0          8m8s    192.168.6.111   z-k8s-n-1   <none>           <none>
cilium-ln686                        1/1     Running   0          5m59s   192.168.6.113   z-k8s-n-3   <none>           <none>
cilium-operator-68dffdc9f7-cqvqr    1/1     Running   0          10h     192.168.6.101   z-k8s-m-1   <none>           <none>
cilium-operator-68dffdc9f7-rph4w    1/1     Running   0          10h     192.168.6.102   z-k8s-m-2   <none>           <none>
cilium-p9jtz                        1/1     Running   0          27m     192.168.6.102   z-k8s-m-2   <none>           <none>
cilium-pkqfj                        1/1     Running   0          23m     192.168.6.103   z-k8s-m-3   <none>           <none>
cilium-xn4gf                        1/1     Running   0          6m8s    192.168.6.112   z-k8s-n-2   <none>           <none>
coredns-6d4b75cb6d-jnfmj            1/1     Running   0          35h     10.0.0.241      z-k8s-m-1   <none>           <none>
coredns-6d4b75cb6d-nm5fz            1/1     Running   0          35h     10.0.0.141      z-k8s-m-1   <none>           <none>
kube-apiserver-z-k8s-m-1            1/1     Running   0          35h     192.168.6.101   z-k8s-m-1   <none>           <none>
kube-apiserver-z-k8s-m-2            1/1     Running   0          27m     192.168.6.102   z-k8s-m-2   <none>           <none>
kube-apiserver-z-k8s-m-3            1/1     Running   0          23m     192.168.6.103   z-k8s-m-3   <none>           <none>
kube-controller-manager-z-k8s-m-1   1/1     Running   0          35h     192.168.6.101   z-k8s-m-1   <none>           <none>
kube-controller-manager-z-k8s-m-2   1/1     Running   0          27m     192.168.6.102   z-k8s-m-2   <none>           <none>
kube-controller-manager-z-k8s-m-3   1/1     Running   0          23m     192.168.6.103   z-k8s-m-3   <none>           <none>
kube-proxy-2njjl                    1/1     Running   0          5m59s   192.168.6.113   z-k8s-n-3   <none>           <none>
kube-proxy-fzrz7                    1/1     Running   0          8m8s    192.168.6.111   z-k8s-n-1   <none>           <none>
kube-proxy-gvlwt                    1/1     Running   0          6m8s    192.168.6.112   z-k8s-n-2   <none>           <none>
kube-proxy-nr5wd                    1/1     Running   0          5m53s   192.168.6.114   z-k8s-n-4   <none>           <none>
kube-proxy-tg794                    1/1     Running   0          5m49s   192.168.6.115   z-k8s-n-5   <none>           <none>
kube-proxy-tv4sx                    1/1     Running   0          27m     192.168.6.102   z-k8s-m-2   <none>           <none>
kube-proxy-vwqsn                    1/1     Running   0          35h     192.168.6.101   z-k8s-m-1   <none>           <none>
kube-proxy-z8xj7                    1/1     Running   0          23m     192.168.6.103   z-k8s-m-3   <none>           <none>
kube-scheduler-z-k8s-m-1            1/1     Running   0          35h     192.168.6.101   z-k8s-m-1   <none>           <none>
kube-scheduler-z-k8s-m-2            1/1     Running   0          27m     192.168.6.102   z-k8s-m-2   <none>           <none>
kube-scheduler-z-k8s-m-3            1/1     Running   0          23m     192.168.6.103   z-k8s-m-3   <none>           <none>

The cluster has been created successfully!!!

Verification

Since Docker Atlas has been abandoned, image building and running is done with nerdctl (the hands-on steps are recorded in nerdctl); a quick sketch follows.
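
A minimal sketch of using nerdctl against the containerd namespace Kubernetes uses, plus a docker-style build and run (assuming nerdctl is installed and buildkitd is running for the build step; the image name is only an example):

# Kubernetes-created containers live in containerd's k8s.io namespace
sudo nerdctl --namespace k8s.io ps

# Build and run an image, docker-style (nerdctl build requires buildkitd)
sudo nerdctl build -t example/app:dev .
sudo nerdctl run --rm example/app:dev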

References