Install Kubeflow with a Single Command
Prerequisites
Default Kubernetes storage: a default StorageClass backed by a dynamic provisioner
kustomize 5.0.3 or later:

curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
kubectl
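The prerequisite versions can be verified up front. A minimal sketch, assuming kustomize v5 prints a plain `vX.Y.Z` string; the `version_ge` helper is my own addition:

```shell
# Minimal semver comparison helper (assumption: plain x.y.z strings).
version_ge() {
  # succeeds if $1 >= $2 in version sort order
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | tail -n1)" = "$1" ]
}

# Confirm kustomize is v5.0.3 or newer (kustomize v5 prints e.g. "v5.4.1")
v=$(kustomize version 2>/dev/null | tr -d 'v')
version_ge "$v" "5.0.3" && echo "kustomize OK ($v)" || echo "kustomize too old or missing"

# kubectl only needs to be on PATH and runnable
kubectl version --client
```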
Installation
Clone the repository and enter its directory:
git clone git@github.com:kubeflow/manifests.git
cd manifests
# The following single command performs the entire installation
while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
Note

The installation is so concise it deserves applause… unlike the software delivery at my company…
After the installation completes, it may take some time for all pods to become ready. Confirm with the following commands:
Check that the Kubeflow-related pods are ready:

kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n auth
kubectl get pods -n knative-eventing
kubectl get pods -n knative-serving
kubectl get pods -n kubeflow
kubectl get pods -n kubeflow-user-example-com
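Rather than re-running the checks by hand, the same readiness verification can be scripted with `kubectl wait`. A minimal sketch; the namespace list mirrors the commands above:

```shell
#!/usr/bin/env bash
# Block until every pod in the Kubeflow-related namespaces is Ready,
# mirroring the manual `kubectl get pods` checks above.
namespaces="cert-manager istio-system auth knative-eventing knative-serving kubeflow kubeflow-user-example-com"

for ns in $namespaces; do
  echo "Waiting for pods in namespace: $ns"
  # --for=condition=Ready blocks until all pods report Ready (or timeout)
  kubectl wait --for=condition=Ready pods --all -n "$ns" --timeout=600s
done
```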
备注
我在 Kubernetes集群(y-k8s) 部署的虚拟机中采用了极小化的虚拟磁盘,遇到一个尴尬的问题就是 节点压力驱逐 ,也就是磁盘空间不足导致运行Pod被驱逐。在上述Pods检测就绪发现存在问题时,通过 使用libvirt和XFS在线扩展Ceph RBD设备 实现扩容解决(离线扩展方式,并且将 /var/lib/docker
迁移到 /var/lib/containerd
)
Note that this simple deployment approach is probably only suitable for test environments. At least the deployments I have seen are all single-replica, providing no redundancy. I will study this more carefully later.
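The single-replica observation is easy to confirm by listing replica counts across all namespaces. A quick sketch using kubectl's custom-columns output:

```shell
# List every deployment with its configured replica count; most
# Kubeflow components report REPLICAS=1, i.e. no redundancy.
kubectl get deployments -A \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,REPLICAS:.spec.replicas'
```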
Troubleshooting
After resolving the disk-space shortage on the y-k8s cluster, I cleaned up the pods stuck in ContainerStatusUnknown state, then checked, namespace by namespace as above, whether the pods were running normally.
oidc-authservice pending
The output of kubectl get pods -n istio-system shows oidc-authservice-0 in Pending state, so inspect it with kubectl -n istio-system describe pods oidc-authservice-0, which outputs the following:
describe pods oidc-authservice-0 shows that scheduling fails because no matching PVC is bound:

Name:             oidc-authservice-0
Namespace:        istio-system
Priority:         0
Service Account:  authservice
Node:             <none>
Labels:           app=authservice
                  controller-revision-hash=oidc-authservice-7bd6b4b965
                  statefulset.kubernetes.io/pod-name=oidc-authservice-0
Annotations:      sidecar.istio.io/inject: false
Status:           Pending
IP:
IPs:              <none>
Controlled By:    StatefulSet/oidc-authservice
Containers:
  authservice:
    Image:      gcr.io/arrikto/kubeflow/oidc-authservice:e236439
    Port:       8080/TCP
    Host Port:  0/TCP
    Readiness:  http-get http://:8081/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      oidc-authservice-client      Secret     Optional: false
      oidc-authservice-parameters  ConfigMap  Optional: false
    Environment:  <none>
    Mounts:
      /var/lib/authservice from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-khhmb (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  authservice-pvc
    ReadOnly:   false
  kube-api-access-khhmb:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  3m30s (x48 over 4h1m)  default-scheduler  0/5 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod..
Check the PVCs:

kubectl -n istio-system get pvc

This shows authservice-pvc in Pending state, so inspect it with kubectl -n istio-system get pvc authservice-pvc -o yaml, which outputs the following:
get pvc authservice-pvc

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"authservice-pvc","namespace":"istio-system"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"10Gi"}}}}
  creationTimestamp: "2023-08-30T14:46:40Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: authservice-pvc
  namespace: istio-system
  resourceVersion: "13270831"
  uid: 425e321b-20cd-44b4-a797-28f092bfc42a
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeMode: Filesystem
status:
  phase: Pending
At first I assumed this was a simple case of Kubernetes PV and PVC binding (similar to my earlier kube-prometheus-stack persistent-volume practice), and saw it as a good opportunity to export ZFS over NFS, as in "Deploying NFS in Kubernetes".
However, a closer look at authservice-pvc reveals a difference from static pv/pvc configuration: authservice-pvc specifies no storageClassName to bind a pv to the pvc. In other words, the implementation here relies on Kubernetes Dynamic Volume Provisioning.
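Because authservice-pvc omits storageClassName, it can only bind through the cluster's default StorageClass. Whether one exists can be checked directly, a small sketch:

```shell
# A PVC with no storageClassName can only bind through the cluster's
# default StorageClass, shown with a "(default)" suffix in this output:
kubectl get storageclass

# The default is identified by this annotation on the StorageClass:
#   storageclass.kubernetes.io/is-default-class: "true"
# If no StorageClass carries it, such PVCs stay Pending indefinitely.
```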
Since I am not deploying on a cloud vendor's platform (cloud vendors usually supply a Kubernetes Container Storage Interface (CSI) driver, and once the DefaultStorageClass admission plugin is configured, PVs can be created without specifying a storage class), I have to deploy a provisioner implementation myself, and then rely on the DefaultStorageClass admission plugin to provide Dynamic Volume Provisioning for the kubeflow manifests.
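As one concrete way to do this on a test cluster (my assumption; the kubeflow manifests do not prescribe a particular provisioner), Rancher's local-path-provisioner can supply dynamic provisioning, and its StorageClass can then be marked as the cluster default:

```shell
# Install the local-path dynamic provisioner (a hypothetical choice for
# a test cluster; any provisioner/CSI driver with a StorageClass works).
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml

# Mark its StorageClass as the cluster default so that PVCs without a
# storageClassName (such as authservice-pvc) can bind.
kubectl patch storageclass local-path \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```

After the default StorageClass is in place, the Pending authservice-pvc should bind automatically and oidc-authservice-0 should schedule.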