Deploying Kubernetes on ARM¶
Deployment environment¶
The Raspberry Pi 4 runs Ubuntu 20.04 (Focal Fossa), a 64-bit ARM operating system that runs AArch64 container images natively and so avoids many of the software problems that plague 32-bit images. A miniature Kubernetes cluster built on Raspberry Pis is a good playground for exercising new cloud-native technology.
Note
Technically, AArch64 and ARM64 are the same 64-bit architecture. ARM and x86 use different instruction sets, so ARM64 images cannot run on x86 servers, and vice versa.
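A quick way to check which architecture a node actually reports (plain standard tooling, nothing specific to this setup):
uname -m                      # aarch64 on the Raspberry Pi / Jetson nodes, x86_64 on the ThinkPad
dpkg --print-architecture     # arm64 on 64-bit Ubuntu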
Preparation¶
- 3 (or more) Raspberry Pis; I used:
  - 1 × 2GB Raspberry Pi 4 as the control-plane node pi-master1 (running 64-bit Ubuntu)
  - 2 × 8GB Raspberry Pi 4 as the worker nodes pi-worker1 and pi-worker2 (running 64-bit Ubuntu)
  - 1 × 4GB Raspberry Pi 400 as the worker node kali (running Kali Linux)
Before building the Kubernetes cluster, the main issue to address is the Raspberry Pi's poor TF-card I/O; booting Ubuntu Server 20.04 from USB storage on the Raspberry Pi 4 greatly improves storage I/O performance.
- I also used a Jetson Nano device as a GPU worker node
- To test and verify mixing architectures in one Kubernetes cluster, I added a ThinkPad X220 laptop running Arch Linux to the ARM cluster as a simulated x86 worker node
Installing and configuring Docker¶
I run 64-bit Ubuntu on the Raspberry Pi 4b. Ubuntu 20.04 ships a fairly recent Docker release, v19.03, which can be installed directly with apt:
sudo apt install -y docker.io
Managing cgroups with systemd¶
After installing Docker, some configuration is needed to make sure cgroups (Control Groups) are enabled. cgroups are a kernel mechanism for limiting and isolating resources; they let Kubernetes manage the resources used by the container runtime more precisely and add security by isolating containers.
Run docker info to check:
# Check `docker info` (some output omitted)
$ sudo docker info
(...)
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
(...)
WARNING: No memory limit support
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No kernel memory TCP limit support
WARNING: No oom kill disable support
Note
Note that Cgroup Version: 1 is the default here; current Docker releases also support Docker Cgroup v2, which offers finer-grained I/O isolation.
The output shows that the cgroups driver should be changed to systemd, so that systemd acts as the single cgroup manager. Modify or create /etc/docker/daemon.json as follows:
$ sudo cat > /etc/docker/daemon.json <<EOF
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2"
}
EOF
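After changing daemon.json, Docker has to be restarted for the new driver to take effect (the reboot performed later also covers this); a quick sketch for applying and checking it right away:
sudo systemctl restart docker
sudo docker info 2>/dev/null | grep -i 'cgroup driver'   # expect: systemd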
Note
On Kali Linux on the Raspberry Pi, the latest docker.io package already defaults to systemd and supports Control Group v2, so docker info shows:
...
Cgroup Driver: systemd
Cgroup Version: 2
...
WARNING: No memory limit support
WARNING: No swap limit support
WARNING: Support for cgroup v2 is experimental
I left these defaults alone for now to see whether Kubernetes runs fine with them.
Docker Engine 20.10 Released: Supports cgroups v2 and Dual Logging — see the Docker 20.10 notes for details.
Enabling cgroup limit support¶
The docker info output above shows that cgroup limits are not enabled; the kernel boot options have to be changed to enable them. On the Raspberry Pi 4, add the following settings to /boot/firmware/cmdline.txt:
cgroup_enable=cpuset
cgroup_enable=memory
cgroup_memory=1
swapaccount=1
Make sure these options are appended to the end of the existing line in cmdline.txt, which the following sed command does:
sudo sed -i '$ s/$/ cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1 swapaccount=1/' /boot/firmware/cmdline.txt
After the next reboot, docker info will show that the cgroup driver is systemd and the warnings about cgroup limits are gone.
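After the reboot, the kernel command line itself can also be checked to confirm that the options were picked up (a simple sanity check):
cat /proc/cmdline | tr ' ' '\n' | grep -E 'cgroup|swapaccount'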
Letting iptables see bridged traffic¶
Kubernetes needs iptables to see bridged network traffic; adjust the sysctl configuration with the following commands:
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system
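These bridge sysctls only exist when the br_netfilter kernel module is loaded. On the Ubuntu images used here it normally is, but a minimal sketch to make sure:
sudo modprobe br_netfilter
lsmod | grep br_netfilter
# load it on every boot as well
echo br_netfilter | sudo tee /etc/modules-load.d/k8s.conf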
Installing the Kubernetes packages¶
Add the Kubernetes repo:
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
Note
On the latest Kali Linux, running apt-key prints a deprecation warning:
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
Note
Ubuntu 20.04 is codenamed Focal; Xenial is the codename of Ubuntu 16.04. Check which Ubuntu LTS names the kubernetes.io apt repository actually provides. At the time of writing, https://packages.cloud.google.com/apt/dists only lists kubernetes-xenial and no focal variant, so the apt source above is configured with kubernetes-xenial. Keep an eye on that site and switch to a Focal repository once one is published.
Note
If apt update fails with the following error:
Err:2 https://packages.cloud.google.com/apt kubernetes-xenial InRelease
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY FEEA9169307EA071 NO_PUBKEY 8B57C5C2836F4BEB
...
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: https://packages.cloud.google.com/apt kubernetes-xenial InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY FEEA9169307EA071 NO_PUBKEY 8B57C5C2836F4BEB
W: Failed to fetch https://apt.kubernetes.io/dists/kubernetes-xenial/InRelease The following signatures couldn't be verified because the public key is not available: NO_PUBKEY FEEA9169307EA071 NO_PUBKEY 8B57C5C2836F4BEB
W: Some index files failed to download. They have been ignored, or old ones used instead.
then run the following again:
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
Install the three required Kubernetes packages:
sudo apt update && sudo apt install -y kubelet kubeadm kubectl
After installation, use apt-mark hold to pin these three packages. Upgrading Kubernetes takes careful manual work, so it should not happen through routine system updates:
sudo apt-mark hold kubelet kubeadm kubectl
To release the hold later, run sudo apt-mark unhold kubelet kubeadm kubectl
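To confirm which packages are currently pinned, a quick check:
apt-mark showhold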
Note
apt-mark hold matters: if these packages are allowed to upgrade along with the rest of the system, the control-plane and client versions drift apart until the nodes can no longer be managed. To upgrade Kubernetes, follow the manual procedure for upgrading Kubernetes with kubeadm.
Creating the Kubernetes cluster¶
Before creating the Kubernetes cluster, decide the following:
- One Raspberry Pi takes the role of control-plane node (Control Plane node); the remaining nodes are workers.
- Choose a network CIDR for the cluster's pods. This walkthrough uses the Flannel CNI, a simple but well-performing Container Network Interface (CNI). Make sure the pod CIDR does not conflict with any subnet managed by your router or DHCP server:
  - plan a generously sized range, because the number of pods tends to outgrow the original plan
  - here: 10.244.0.0/16
Note
In my lab I use both the Raspberry Pi's wireless and wired interfaces; the wireless subnet makes it convenient to debug services. By pinning clients and servers to the same wireless AP (fixed BSSID) there is no need for a routable segment; it is enough that the Kubernetes pod CIDR does not conflict with the subnet handed out by the router's DHCP.
Initializing the control plane¶
Kubernetes uses a bootstrap token to authenticate nodes joining the cluster; the token is passed to the kubeadm init command that initializes the control-plane node.
Create a token with kubeadm token generate:
TOKEN=$(sudo kubeadm token generate)
echo $TOKEN
Record the $TOKEN value; later commands need it.
--kubernetes-version pins the control-plane version to initialize:
sudo kubeadm init --token=${TOKEN} --kubernetes-version=v1.19.4 --pod-network-cidr=10.244.0.0/16
Output:
W1129 23:50:41.292869   27185 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.4
[preflight] Running pre-flight checks
        [WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
        [WARNING SystemVerification]: missing optional cgroups: hugetlb
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local pi-master1] and IPs [10.96.0.1 192.168.166.91]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost pi-master1] and IPs [192.168.166.91 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost pi-master1] and IPs [192.168.166.91 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 37.509322 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.19" in namespace kube-system with the configuration for the kubelets in the cluster
[kubelet-check] Initial timeout of 40s passed.
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node pi-master1 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node pi-master1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: <TOKEN>
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.166.91:6443 --token <TOKEN> \
    --discovery-token-ca-cert-hash sha256:<DISCOVERY-TOKEN>
As the user who will manage the cluster, finish the setup with:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
With that done, check the nodes:
kubectl get nodes
The node shows as ready (the sample output below is from my second cluster rebuild, hence version v1.22.0):
NAME STATUS ROLES AGE VERSION
pi-master1 Ready control-plane,master 32m v1.22.0
Multi-NIC trouble¶
While running kubeadm init to initialize the cluster, I noticed that on a Raspberry Pi with two NICs (wlan0 and eth0), etcd defaulted to the wlan0 address 192.168.166.91:
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost pi- master1] and IPs [192.168.166.91 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost pi-master1] and IPs [192.168.166.91 127.0.0.1 ::1]
while the apiserver serving certificate covers names for both (10.96.0.1 is not the IP of any NIC on this server):
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local pi-master1]
and IPs [10.96.0.1 192.168.166.91]
Note
The wireless interface gets its address from DHCP, so the IP can change after a reboot — a problem that needs to be dealt with.
Because kubeadm init without extra arguments advertises the IP of the interface that holds the default route — not the fixed IP on the wired NIC I had intended — I had to initialize again. The second time, pass --apiserver-advertise-address to set the advertised IP explicitly.
The procedure follows my earlier notes on changing the Kubernetes master IP and re-initializing:
systemctl stop kubelet docker
cd /etc/
# backup old kubernetes data
mv kubernetes kubernetes-backup
mv /var/lib/kubelet /var/lib/kubelet-backup
# restore certificates
mkdir -p kubernetes
cp -r kubernetes-backup/pki kubernetes
rm kubernetes/pki/{apiserver.*,etcd/peer.*}
systemctl start docker
# reinit master with data in etcd
# add --kubernetes-version, --pod-network-cidr and --token options if needed
# the original article used this command:
# kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd
# but since I use the Flannel network, the pod CIDR flag must be added here, otherwise the flannel addon pods will fail to start later
kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address 192.168.6.11
- After re-initializing, wait a little while and then remove the stale node:
sleep 120
kubectl get nodes --sort-by=.metadata.creationTimestamp
One master node shows up as unhealthy:
NAME STATUS ROLES AGE VERSION
pi-master1 NotReady master 2d10h v1.19.4
- Delete the problem node:
kubectl delete node $(kubectl get nodes -o jsonpath='{.items[?(@.status.conditions[0].status=="Unknown")].metadata.name}')
which returned:
error: resource(s) were provided, but no name, label selector, or --all flag specified
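The error just means the jsonpath selector matched no node, so kubectl delete received no name. Running the selector on its own (the same expression as above) makes that visible:
kubectl get nodes -o jsonpath='{.items[?(@.status.conditions[0].status=="Unknown")].metadata.name}'
# empty output: no node currently reports an Unknown condition, so there is nothing to delete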
Check the pods:
# check running pods
kubectl get pods --all-namespaces -o wide
Output:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-f9fd979d6-gd94x 0/1 Pending 0 2d10h <none> <none> <none> <none>
kube-system coredns-f9fd979d6-hbqx9 0/1 Pending 0 2d10h <none> <none> <none> <none>
kube-system etcd-pi-master1 1/1 Running 0 11h 192.168.6.11 pi-master1 <none> <none>
kube-system kube-apiserver-pi-master1 1/1 Running 0 11h 192.168.6.11 pi-master1 <none> <none>
kube-system kube-controller-manager-pi-master1 1/1 Running 1 2d10h 192.168.6.11 pi-master1 <none> <none>
kube-system kube-proxy-525kd 1/1 Running 1 2d10h 192.168.6.11 pi-master1 <none> <none>
kube-system kube-scheduler-pi-master1 1/1 Running 1 2d10h 192.168.6.11 pi-master1 <none> <none>
Access to the cluster apiserver itself is fine, though:
kubectl cluster-info
Output:
Kubernetes master is running at https://192.168.6.11:6443
KubeDNS is running at https://192.168.6.11:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Troubleshooting node NotReady¶
- Check why the pods are Pending:
kubectl -n kube-system describe pods coredns-f9fd979d6-gd94x
Scheduling is failing:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 58s (x215 over 5h21m) default-scheduler 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
Check why the node is NotReady:
kubectl describe nodes pi-master1
The output shows:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Wed, 02 Dec 2020 22:59:43 +0800 Sun, 29 Nov 2020 23:53:52 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 02 Dec 2020 22:59:43 +0800 Sun, 29 Nov 2020 23:53:52 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 02 Dec 2020 22:59:43 +0800 Sun, 29 Nov 2020 23:53:52 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Wed, 02 Dec 2020 22:59:43 +0800 Sun, 29 Nov 2020 23:53:52 +0800 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
The KubeletNotReady reason is runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized — in other words, the Flannel CNI we chose has not been installed yet, so the container runtime's network cannot come up.
Rebuilding the cluster¶
After deleting the kubeadm-built Kubernetes cluster I recreated it, this time applying the lesson learned about dual NICs, so the command became:
sudo kubeadm init --token=${TOKEN} --kubernetes-version=v1.22.0 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address 192.168.6.11
Output:
[init] Using Kubernetes version: v1.22.0
[preflight] Running pre-flight checks
        [WARNING SystemVerification]: missing optional cgroups: hugetlb
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local pi-master1] and IPs [10.96.0.1 192.168.6.11]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost pi-master1] and IPs [192.168.6.11 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost pi-master1] and IPs [192.168.6.11 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 29.516593 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.22" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node pi-master1 as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node pi-master1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 9i7czi.rcz4i0nf2i03237f
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.6.11:6443 --token TOKEN \
    --discovery-token-ca-cert-hash sha256:SHASTRING
Note
The output contains:
...
[WARNING SystemVerification]: missing optional cgroups: hugetlb
This refers to the HugeTLB controller available under Control Group v1.
Installing the CNI plugin¶
The CNI plugin handles setting up and tearing down pod networking. Here I use the simplest option, the Flannel CNI plugin; downloading the Flannel YAML and applying it with kubectl apply is all it takes:
# Download the Flannel YAML data and apply it
# (output omitted)
#$ curl -sSL https://raw.githubusercontent.com/coreos/flannel/v0.12.0/Documentation/kube-flannel.yml | kubectl apply -f -
# from Kubernetes v1.17+ the following command can be used
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
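It can take a minute or two for the flannel DaemonSet pod to start; a simple way to watch progress (generic kubectl usage, nothing specific to this setup):
kubectl get pods -n kube-system -w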
Sure enough, once the Flannel network plugin is installed correctly the node status returns to Ready:
kubectl get nodes
Back to normal:
NAME STATUS ROLES AGE VERSION
pi-master1 Ready master 2d23h v1.19.4
Also check the pod status:
kubectl get pods --all-namespaces -o wide
coredns is running normally again as well:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-f9fd979d6-gd94x 1/1 Running 0 2d23h 10.244.0.3 pi-master1 <none> <none>
kube-system coredns-f9fd979d6-hbqx9 1/1 Running 0 2d23h 10.244.0.2 pi-master1 <none> <none>
kube-system etcd-pi-master1 1/1 Running 1 23h 192.168.6.11 pi-master1 <none> <none>
kube-system kube-apiserver-pi-master1 1/1 Running 1 23h 192.168.6.11 pi-master1 <none> <none>
kube-system kube-controller-manager-pi-master1 1/1 Running 2 2d23h 192.168.6.11 pi-master1 <none> <none>
kube-system kube-flannel-ds-arm64-5c2kf 1/1 Running 0 3m21s 192.168.6.11 pi-master1 <none> <none>
kube-system kube-proxy-525kd 1/1 Running 2 2d23h 192.168.6.11 pi-master1 <none> <none>
kube-system kube-scheduler-pi-master1 1/1 Running 2 2d23h 192.168.6.11 pi-master1 <none> <none>
Joining worker nodes to the cluster¶
With the CNI add-on deployed, worker nodes can now be added to the cluster.
Log in to a worker node, for example pi-worker1, and run kubeadm join:
kubeadm join 192.168.6.11:6443 --token <TOKEN> \
    --discovery-token-ca-cert-hash sha256:<DISCOVERY-TOKEN>
This failed with:
[preflight] Running pre-flight checks
[WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
[WARNING SystemVerification]: missing optional cgroups: hugetlb
error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "8pile8"
To see the stack trace of this error execute with --v=5 or higher
As explained in Kubernetes: unable to join a remote master node, the cause is that the token has expired or been removed; regenerate it, together with the full join command, as follows:
kubeadm token create --print-join-command
The output shows:
W1203 11:50:33.907625 484892 kubelet.go:200] cannot automatically set CgroupDriver when starting the Kubelet: cannot execute 'docker info -f {{.CgroupDriver}}': exit status 2
W1203 11:50:33.918553 484892 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
kubeadm join 192.168.6.11:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<DISCOVERY-TOKEN>
Note
Generated tokens have a limited lifetime by default, which is why the token above could no longer be used.
A token that never expires can be created with the following command (at the cost of a security risk):
kubeadm token create --ttl 0
List existing tokens with:
kubeadm token list
Then regenerate the CA certificate digest (hash) used for discovery:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
These pieces combine into a join command for adding the node:
kubeadm join 192.168.6.11:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<DISCOVERY-TOKEN>
Run the join again on pi-worker1 as instructed:
kubeadm join 192.168.6.11:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<DISCOVERY-TOKEN>
Output:
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: hugetlb
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
After a short wait, check on the control-plane node:
kubectl get nodes -o wide
Output:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
pi-master1 Ready master 3d12h v1.19.4 192.168.6.11 <none> Ubuntu 20.04.1 LTS 5.4.0-1022-raspi docker://19.3.8
pi-worker1 Ready <none> 2m10s v1.19.4 192.168.6.15 <none> Ubuntu 20.04.1 LTS 5.4.0-1022-raspi docker://19.3.8
Jetson node (GPU)¶
As mentioned earlier, my ARM fleet also includes a Jetson Nano device, used to verify running GPU containers on Kubernetes and to study Machine Learning Atlas.
The Jetson Nano's L4T system (a customized Ubuntu 18.04) ships with Docker 19.03 preinstalled, which meets the requirements for running Kubernetes, but the Cgroup Driver still needs the same adjustment:
- docker info shows:
Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 20.10.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: nvidia runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc
 Init Binary: docker-init
 containerd version:
 runc version:
 init version:
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.9.201-tegra
 Operating System: Ubuntu 18.04.5 LTS
 OSType: linux
 Architecture: aarch64
 CPUs: 4
 Total Memory: 3.863GiB
 Name: jetson
 ID: NNUG:AYZB:I6K7:CMR4:NCMA:5TPD:OPT6:CNGI:OYO5:NM2D:VJG7:N4W3
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio weight support
WARNING: No blkio weight_device support
- Note that Docker on the Jetson Nano enables the nvidia runtime, so the default daemon.json looks like this:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
Change it to:
{
    "exec-opts": ["native.cgroupdriver=systemd"],
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "100m"
    },
    "storage-driver": "overlay2",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
Then restart the Docker service:
systemctl restart docker
and verify with docker info that the Cgroup Driver is now systemd.
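A one-line sketch of that check, using docker info's template output:
docker info -f '{{.CgroupDriver}}'    # expect: systemd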
The docker info output above still shows two warnings:
WARNING: No blkio weight support
WARNING: No blkio weight_device support
Note
The Docker versions (19.x) originally shipped by Ubuntu 20/18 did not show these warnings; they appeared after upgrading Docker to 20.10.2. The reason is that Docker supports Control Group v2 starting with Docker Engine 20.10, and Docker Cgroup v2 provides better I/O isolation. Outside production these warnings can be ignored for now.
The sysctl configuration of the Jetson's L4T kernel already allows iptables to see bridged traffic by default:
sysctl -a | grep net.bridge.bridge-nf-call-ip
which shows:
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
Add the Kubernetes repo:
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
Install the Kubernetes packages:
sudo apt update && sudo apt install -y kubelet kubeadm kubectl
Note
Installing Kubernetes requires direct access to Google services; behind the GFW you need a tunnel such as an OpenConnect VPN, or a proxy such as a Squid parent SOCKS proxy.
Pin the Kubernetes version (optional; a test cluster with no continuity requirements can skip this):
sudo apt-mark hold kubelet kubeadm kubectl
Turn off swap:
# turn off the 4 zram swap devices the system enables by default
for i in {0..3}; do swapoff /dev/zram${i}; echo $i > /sys/class/zram-control/hot_remove; done
# keep swap disabled across reboots
systemctl disable nvzramconfig.service
Note
The Jetson's customized L4T operating system builds its swap on zram (compressed in-memory block devices); see the notes on Jetson Nano swap for details.
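A quick sketch to confirm that no swap remains active after the commands above:
swapon --show            # no output means swap is off
free -h | grep -i swap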
On the control-plane node, get the current join command:
kubeadm token create --print-join-command
Back on the Jetson node, run the join command:
kubeadm join 192.168.6.11:6443 --token <TOKEN> \
    --discovery-token-ca-cert-hash sha256:<HASH_TOKEN>
Note
As the kubeadm packages keep being updated, a freshly installed worker may run a newer version than the existing Kubernetes Atlas cluster, which prevents it from joining; in that case upgrade the cluster first with kubeadm.
Afterwards, check the nodes:
kubectl get nodes
We see the following output:
NAME STATUS ROLES AGE VERSION
jetson Ready <none> 2m47s v1.19.4
pi-master1 Ready master 5d21h v1.19.4
pi-worker1 Ready <none> 2d9h v1.19.4
pi-worker2 Ready <none> 2d9h v1.19.4
Note
If a worker host has multiple network interfaces, the kubelet started by kubeadm join may register the interface holding the default route, or it may use the IP on the same subnet as the apiserver address — which I found confusing. In the registrations above, the three Raspberry Pis registered the correct internal INTERNAL-IP addresses in 192.168.6.x, but jetson registered the address of its external wireless interface in 192.168.0.x.
To fix this, explicitly configure the worker's INTERNAL-IP (see the note on specifying a Kubernetes worker node's internal IP) to avoid the confusion.
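A minimal sketch of one common way to pin the INTERNAL-IP, assuming the kubelet reads extra flags from /etc/default/kubelet as the kubeadm packages on Debian/Ubuntu do (the address below is only an example):
# on the worker node
echo 'KUBELET_EXTRA_ARGS=--node-ip=192.168.6.10' | sudo tee /etc/default/kubelet
sudo systemctl restart kubelet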
Kali Linux node (kali)¶
Kali Linux is a Debian-family distribution much like Ubuntu, so installing and managing Kubernetes on it is very similar. Note, however, what the default installation gives you:
apt install docker.io
Running docker info afterwards shows that systemd and Control Group v2 are already enabled:
...
Cgroup Driver: systemd
Cgroup Version: 2
...
WARNING: No memory limit support
WARNING: No swap limit support
WARNING: Support for cgroup v2 is experimental
With the default system configuration, however, Kubernetes cannot be installed; the kubeadm preflight check reports:
[preflight] Running pre-flight checks
        [preflight] The system verification failed. Printing the output from the verification:
KERNEL_VERSION: 5.4.83-Re4son-v8l+
CONFIG_NAMESPACES: enabled
CONFIG_NET_NS: enabled
CONFIG_PID_NS: enabled
CONFIG_IPC_NS: enabled
CONFIG_UTS_NS: enabled
CONFIG_CGROUPS: enabled
CONFIG_CGROUP_CPUACCT: enabled
CONFIG_CGROUP_DEVICE: enabled
CONFIG_CGROUP_FREEZER: enabled
CONFIG_CGROUP_PIDS: enabled
CONFIG_CGROUP_SCHED: enabled
CONFIG_CPUSETS: enabled
CONFIG_MEMCG: enabled
CONFIG_INET: enabled
CONFIG_EXT4_FS: enabled
CONFIG_PROC_FS: enabled
CONFIG_NETFILTER_XT_TARGET_REDIRECT: enabled (as module)
CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled (as module)
CONFIG_FAIR_GROUP_SCHED: enabled
CONFIG_OVERLAY_FS: enabled (as module)
CONFIG_AUFS_FS: not set - Required for aufs.
CONFIG_BLK_DEV_DM: enabled (as module)
CONFIG_CFS_BANDWIDTH: enabled
CONFIG_CGROUP_HUGETLB: not set - Required for hugetlb cgroup.
CONFIG_SECCOMP: enabled
CONFIG_SECCOMP_FILTER: enabled
DOCKER_VERSION: 20.10.5+dfsg1
DOCKER_GRAPH_DRIVER: overlay2
OS: Linux
CGROUPS_CPU: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: missing
CGROUPS_PIDS: enabled
CGROUPS_HUGETLB: missing
        [WARNING SystemVerification]: missing optional cgroups: hugetlb
        [WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR SystemVerification]: missing required cgroups: memory
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
The memory cgroup (CGROUPS_MEMORY) must be enabled, so apply the same setting as before:
Edit the Kali Linux for Raspberry Pi boot configuration /boot/cmdline.txt (it contains a single line; append to the end of that line):
... cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1 swapaccount=1
Then reboot; checking docker info again now leaves only a single note at the end:
...
WARNING: Support for cgroup v2 is experimental
Then start over with joining the node to the cluster.
Troubleshooting the Kali Linux node join¶
With the Docker configuration on the Kali node sorted out, kubeadm join succeeds, but the node stays NotReady. Check pod creation:
kubectl -n kube-system get pods
The containers on the Kali node are not starting properly:
kube-flannel-ds-pkhch 0/1 Init:0/1 0 42m 30.73.167.10 kali <none> <none>
kube-proxy-6jt64 0/1 ContainerCreating 0 42m 30.73.167.10 kali <none> <none>
Describe the pod:
kubectl -n kube-system describe pods kube-flannel-ds-pkhch
which shows the cause of the failure:
Warning FailedCreatePodSandBox 2m29s (x186 over 42m) kubelet Failed to create pod sandbox: open /run/systemd/resolve/resolv.conf: no such file or directory
Checking the Kali node confirms that the file really is missing:
# ls /run/systemd/resolve/resolv.conf
ls: cannot access '/run/systemd/resolve/resolv.conf': No such file or directory
Healthy nodes do have this file, which means Kali Linux does not enable systemd-resolved by default:
systemctl enable systemd-resolved
systemctl start systemd-resolved
Check again and the file is now there:
# ls /run/systemd/resolve/resolv.conf
/run/systemd/resolve/resolv.conf
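If the stuck pods do not recover on their own once the file exists, restarting the kubelet on the node is a reasonable extra nudge (my own suggestion; it is not always required):
systemctl restart kubelet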
- Once the kube-system pods on the Kali node are created successfully, the worker node shows up as Ready.
Arch Linux node (zcloud)¶
Install Docker:
pacman -Sy docker
Start and enable Docker:
systemctl start docker
systemctl enable docker
Checking the docker info output shows Cgroup Driver: cgroupfs rather than systemd, so switch to systemd as the cgroup manager and make sure only one cgroup manager is in use. Modify or create /etc/docker/daemon.json as before:
$ sudo cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF
Checking docker info again, two warnings remain:
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Fix them with:
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system
Note
For installing Kubernetes on Arch Linux, see the Arch Linux documentation on Kubernetes.
Install the packages:
pacman -Syu && pacman -S kubelet kubeadm kubectl
Turn off swap:
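A minimal sketch of the usual way to do that on Arch, assuming any swap comes from /etc/fstab or a systemd swap unit:
swapoff -a                          # turn off all active swap now
swapon --show                       # verify nothing is left
# also remove or comment out any swap entry in /etc/fstab, or disable the swap unit:
systemctl list-units --type=swap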
On the control-plane node, get the current join command:
kubeadm token create --print-join-command
Back on the zcloud node, run the join command:
kubeadm join 192.168.6.11:6443 --token <TOKEN> \
    --discovery-token-ca-cert-hash sha256:<HASH_TOKEN>
Note
If a node stays NotReady, check the kubelet logs on that node:
journalctl -xeu kubelet --no-pager
In my case the node could not pull images:
Mar 23 00:25:15 zcloud kubelet[3682]: E0323 00:25:15.851566 3682 pod_workers.go:191] Error syncing pod 062199e4-6351-4ff9-9e55-91db9ac8884f ("kube-proxy-jms58_kube-system(062199e4-6351-4ff9-9e55-91db9ac8884f)"), skipping: failed to "CreatePodSandbox" for "kube-proxy-jms58_kube-system(062199e4-6351-4ff9-9e55-91db9ac8884f)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-proxy-jms58_kube-system(062199e4-6351-4ff9-9e55-91db9ac8884f)\" failed: rpc error: code = Unknown desc = failed pulling image \"k8s.gcr.io/pause:3.2\": Error response from daemon: Get \"https://k8s.gcr.io/v2/\": dial tcp: lookup k8s.gcr.io: no such host"
The cause was that the host had been booted without the wireless network connected, so there was no default route to the internet and the dnsmasq instance started at boot could not resolve names. Even after bringing Wi-Fi up manually with a script, dnsmasq still failed to resolve hostnames locally; restarting dnsmasq fixed it for me. Your situation may differ, but watching the kubelet log is a good way to track down problems like this.
Adding the zcloud node again (Arch Linux)¶
After rebuilding the Kubernetes cluster, the packages from Google's repository — i.e. the control plane I built — were already at the latest v1.22.0, while Arch Linux on the zcloud node still shipped v1.21.3.
Attempting to join the node:
kubeadm join 192.168.6.11:6443 --token <TOKEN> \
    --discovery-token-ca-cert-hash sha256:<HASH_TOKEN>
fails with:
...
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to decode cluster configuration data: no kind "ClusterConfiguration" is registered for version "kubeadm.k8s.io/v1beta3" in scheme "k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/scheme/scheme.go:31"
To see the stack trace of this error execute with --v=5 or higher
So a node running an older version cannot join a newer cluster (the compatibility story here is poor).
Checking the arch linux kubeadm package shows that currently:
- 1.21.3-1 is the community (stable) build
- 1.22.0-1 is the community-testing build
Edit /etc/pacman.conf to enable the community-testing repository:
[community-testing]
Include = /etc/pacman.d/mirrorlist
Then install the newer builds from that repository (pacman does not accept apt-style pkg=version arguments; selecting the repository explicitly achieves the same result):
# refresh the package databases, including community-testing
pacman -Sy
# install the 1.22.0-1 builds from community-testing
pacman -S community-testing/kubelet community-testing/kubeadm community-testing/kubectl
To pin these versions, add the following to /etc/pacman.conf so they are not upgraded:
IgnorePkg = kubelet kubeadm kubectl
- Finally, edit /etc/pacman.conf again and comment the community-testing repository back out, so that other packages are not upgraded to testing versions.
Final result¶
The final set of nodes:
kubectl get nodes -o wide
Output:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
jetson Ready <none> 2d1h v1.22.0 30.73.165.36 <none> Ubuntu 18.04.5 LTS 4.9.201-tegra docker://20.10.7
kali Ready <none> 63m v1.22.0 30.73.167.10 <none> Kali GNU/Linux Rolling 5.4.83-Re4son-v8l+ docker://20.10.5+dfsg1
pi-master1 Ready control-plane,master 2d2h v1.22.0 192.168.6.11 <none> Ubuntu 20.04.2 LTS 5.4.0-1041-raspi docker://20.10.7
pi-worker1 Ready <none> 2d1h v1.22.0 192.168.6.15 <none> Ubuntu 20.04.2 LTS 5.4.0-1041-raspi docker://20.10.7
pi-worker2 Ready <none> 2d1h v1.22.0 192.168.6.16 <none> Ubuntu 20.04.2 LTS 5.4.0-1041-raspi docker://20.10.7
zcloud Ready <none> 21m v1.22.0 192.168.6.200 <none> Arch Linux 5.13.9-arch1-1 docker://20.10.8
Note
Note that the jetson and kali nodes registered an INTERNAL-IP on their wireless interfaces; this still needs to be corrected.
We finally have a mixed-architecture Kubernetes cluster:
kubectl get nodes --show-labels
The ARM nodes carry the label kubernetes.io/arch=arm64, while the x86 node carries kubernetes.io/arch=amd64:
NAME STATUS ROLES AGE VERSION LABELS
jetson Ready <none> 2d1h v1.22.0 beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=jetson,kubernetes.io/os=linux
kali Ready <none> 61m v1.22.0 beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=kali,kubernetes.io/os=linux
pi-master1 Ready control-plane,master 2d2h v1.22.0 beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=pi-master1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
pi-worker1 Ready <none> 2d1h v1.22.0 beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=pi-worker1,kubernetes.io/os=linux
pi-worker2 Ready <none> 2d1h v1.22.0 beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=pi-worker2,kubernetes.io/os=linux
zcloud Ready <none> 20m v1.22.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=zcloud,kubernetes.io/os=linux
When deploying applications later, the images used must match the node architecture; I will practice deploying heterogeneous applications.
Verifying the cluster¶
The ARM-based Kubernetes cluster is now deployed and can run pods and create deployments and jobs.
To verify that the cluster works correctly, go through the following steps:
- Create a new namespace
- Create a deployment
- Create a service
- Verify that the pods behind the deployment respond correctly
Note
The verification image used here is served from the Red Hat-maintained quay.io registry; images from another registry, such as Docker Hub, work just as well.
Create a namespace called kube-verify:
kubectl create namespace kube-verify
Output:
namespace/kube-verify created
Check the namespaces:
kubectl get namespaces
Output:
NAME STATUS AGE
default Active 5d21h
kube-node-lease Active 5d21h
kube-public Active 5d21h
kube-system Active 5d21h
kube-verify Active 36s
- Create a deployment in the new namespace:
cat <<EOF | kubectl create -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-verify
  namespace: kube-verify
  labels:
    app: kube-verify
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kube-verify
  template:
    metadata:
      labels:
        app: kube-verify
    spec:
      containers:
      - name: nginx
        image: quay.io/clcollins/kube-verify:01
        ports:
        - containerPort: 8080
EOF
Output:
deployment.apps/kube-verify created
The deployment requests 3 replicas (replicas: 3); each pod runs the quay.io/clcollins/kube-verify:01 image.
Check all the resources in kube-verify:
kubectl get all -n kube-verify
Output:
NAME READY STATUS RESTARTS AGE
pod/kube-verify-69dd569645-nvnhl 1/1 Running 0 2m8s
pod/kube-verify-69dd569645-s5qb5 1/1 Running 0 2m8s
pod/kube-verify-69dd569645-v9zxt 1/1 Running 0 2m8s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kube-verify 3/3 3 3 2m8s
NAME DESIRED CURRENT READY AGE
replicaset.apps/kube-verify-69dd569645 3 3 3 2m8s
There is a new deployment, deployment.apps/kube-verify, and as requested by replicas: 3 it has created three pods.
- Create a service to expose the Nginx application behind a single endpoint:
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Service
metadata:
  name: kube-verify
  namespace: kube-verify
spec:
  selector:
    app: kube-verify
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
EOF
Output:
service/kube-verify created
With the service created, check its IP address:
kubectl get -n kube-verify service/kube-verify
Output:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-verify ClusterIP 10.107.123.31 <none> 80/TCP 90s
The kube-verify service has been assigned the ClusterIP 10.107.123.31. This address is internal to the cluster: it is reachable from any node in the cluster but not from outside.
Pick any cluster node and run the following to verify that the deployment's containers are working:
curl 10.107.123.31
You should get back a complete HTML page.
Accessing the service¶
At this point the Kubernetes-on-Raspberry-Pi cluster is fully deployed and verified, but services provided by its pods are still not reachable from outside the cluster.
There are several options:
- expose the deployment (expose deployments) as a simple load-balancer-style service with an external-ip, mapping the service outside the cluster
- deploy an Ingress to manage external access to cluster services, for example an Nginx Ingress Controller
Here we take the simplest route and use expose deployments:
Check the services in the kube-verify namespace:
kubectl get services -n kube-verify
The output shows that the service currently has an internal cluster-ip but no external-ip:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-verify ClusterIP 10.107.123.31 <none> 80/TCP 19h
Our worker nodes are:
kubectl get nodes -o wide
Output:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
jetson Ready <none> 19h v1.19.4 192.168.6.10 <none> Ubuntu 18.04.5 LTS 4.9.140-tegra docker://19.3.6
pi-master1 Ready master 6d17h v1.19.4 192.168.6.11 <none> Ubuntu 20.04.1 LTS 5.4.0-1022-raspi docker://19.3.8
pi-worker1 Ready <none> 3d5h v1.19.4 192.168.6.15 <none> Ubuntu 20.04.1 LTS 5.4.0-1022-raspi docker://19.3.8
pi-worker2 Ready <none> 3d5h v1.19.4 192.168.6.16 <none> Ubuntu 20.04.1 LTS 5.4.0-1022-raspi docker://19.3.8
The question now is how to expose the service on the external wireless interface. pi-worker1's wireless interface actually has the address 192.168.0.81:
- The wireless address is assigned dynamically; if the service were exposed on it, a different address after the next reboot could conflict with other hosts on the LAN.
- My intended workaround is to use an address range I control for the demo and to put the client machines on the same range, so that anything connected to the same wireless AP can reach the service.
For now I keep the configuration simple and use the wired subnet 192.168.6.x.
Check the current service:
kubectl get service -n kube-verify
Output:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-verify ClusterIP 10.107.123.31 <none> 80/TCP 26h
Delete this service:
kubectl delete service kube-verify -n kube-verify
Then recreate it as a LoadBalancer-type exposure:
kubectl expose deployments kube-verify -n kube-verify --port=80 --protocol=TCP --target-port=8080 \
    --name=kube-verify --external-ip=192.168.6.10 --type=LoadBalancer
Output:
service/kube-verify exposed
Note
The LoadBalancer-style exposure uses the same target port as the previous service; the difference is the added EXTERNAL-IP for access from outside.
Now check again:
kubectl get service -n kube-verify
Output:
NAME          TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)        AGE
kube-verify   LoadBalancer   10.96.31.186   192.168.6.10   80:30586/TCP   61s
At first glance it looks as if --target-port=8080 did not take effect, because PORT(S) shows 80:30586/TCP. That column only displays the service port and the allocated NodePort; the targetPort is stored in the service spec and is not shown there. Accessing the address directly with curl http://192.168.6.10 returns the page as expected.
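To confirm the target port really is set, the service spec can be inspected directly; a small sketch using generic kubectl jsonpath output:
kubectl -n kube-verify get service kube-verify -o jsonpath='{.spec.ports[0].targetPort}{"\n"}'
# expected output: 8080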
The verification environment now looks like this:
kubectl get all -n kube-verify
Output:
NAME READY STATUS RESTARTS AGE
pod/kube-verify-69dd569645-q9hzc 1/1 Running 0 33h
pod/kube-verify-69dd569645-s5qb5 1/1 Running 0 2d12h
pod/kube-verify-69dd569645-v9zxt 1/1 Running 0 2d12h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-verify LoadBalancer 10.96.31.186 192.168.6.10 80:30586/TCP 32h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kube-verify 3/3 3 3 2d12h
NAME DESIRED CURRENT READY AGE
replicaset.apps/kube-verify-69dd569645 3 3 3 2d12h
References¶
- Build a Kubernetes cluster with the Raspberry Pi — the complete deployment guide. Its author, Chris Collins, wrote a whole series on building a Raspberry Pi homelab, including an NFS service on K8s; for exposing HTTP routing and reverse proxying (ingress) from Kubernetes, see Try this Kubernetes HTTP router and reverse proxy
- 解决k8s执行kubeadm join遇到could not find a JWS signature的问题