Krew Quick Start¶
After completing the Krew installation, you can install and manage various plugins to simplify day-to-day operations:
Download the plugin index:
kubectl krew update
Explore the available krew plugins:
kubectl krew search
The output looks similar to:
NAME DESCRIPTION INSTALLED
access-matrix Show an RBAC access matrix for server resources no
accurate Manage Accurate, a multi-tenancy controller no
advise-policy Suggests PodSecurityPolicies and OPA Policies f... unavailable on darwin/arm64
advise-psp Suggests PodSecurityPolicies for cluster. unavailable on darwin/arm64
aks Interact with and debug AKS clusters unavailable on darwin/arm64
allctx Run commands on contexts in your kubeconfig no
apparmor-manager Manage AppArmor profiles for cluster. unavailable on darwin/arm64
...
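Before installing, it can be worth checking a plugin's description, version, and caveats. A minimal sketch using krew's built-in info subcommand:
# Show details for a plugin before installing it
kubectl krew info resource-capacity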
Install a plugin:
kubectl krew install resource-capacity
Upgrade installed plugins:
kubectl krew upgrade
Uninstall a plugin:
kubectl krew uninstall resource-capacity
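To verify what is currently installed (for example after an install, upgrade, or uninstall), krew provides a list subcommand:
# List installed plugins and their versions
kubectl krew list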
resource-capacity¶
resource-capacity is a very practical plugin. Once installed, you can directly inspect the cluster's resource allocation (note: this covers only the configured requests and limits). To inspect real-time cluster load, you still need to deploy metrics-server in the cluster so the data can be queried from collected metrics.
Install the plugin:
kubectl krew install resource-capacity
Use resource-capacity to check node resource usage:
# --kubeconfig selects the kubeconfig used to access the cluster; this flag must come after the resource-capacity subcommand
kubectl resource-capacity --kubeconfig kubeconfig.yaml
The output shows the allocation on each node:
NODE CPU REQUESTS CPU LIMITS MEMORY REQUESTS MEMORY LIMITS
* 75432390m (102%) 196294010m (267%) 229016019Mi (64%) 396955850Mi (111%)
10psx52 22000m (68%) 85000m (265%) 45056Mi (35%) 149248Mi (116%)
11by422 32000m (100%) 83500m (260%) 256000Mi (99%) 302580Mi (117%)
11psx52 35000m (109%) 137500m (429%) 68608Mi (53%) 193024Mi (150%)
...
Note: by default the output contains only the configured values (requests and limits), so trying to fetch real-time monitoring data at this point will fail; see below.
Once metrics-server is installed in the cluster, you can use the --util flag to get a utilization report for each node:
# --kubeconfig selects the kubeconfig used to access the cluster; this flag must come after the resource-capacity subcommand
kubectl resource-capacity --kubeconfig kubeconfig.yaml --util
Example output:
NODE CPU REQUESTS CPU LIMITS CPU UTIL MEMORY REQUESTS MEMORY LIMITS MEMORY UTIL
* 12715m (79%) 130000m (812%) 2532m (15%) 18476Mi (19%) 76447Mi (79%) 15925Mi (16%)
y-k8s-m-1 2770m (69%) 27000m (675%) 852m (21%) 2519Mi (9%) 14359Mi (52%) 5977Mi (22%)
y-k8s-m-2 2810m (70%) 30700m (767%) 884m (22%) 3580Mi (13%) 19037Mi (71%) 4798Mi (18%)
y-k8s-m-3 3265m (81%) 27400m (685%) 797m (19%) 5074Mi (19%) 16645Mi (62%) 5151Mi (19%)
y-k8s-n-1 1975m (98%) 26400m (1320%) 0m (0%) 3394Mi (44%) 16313Mi (212%) 0Mi (0%)
y-k8s-n-2 1895m (94%) 18500m (925%) 0m (0%) 3912Mi (50%) 10093Mi (131%) 0Mi (0%)
Note
If metrics-server is not deployed in the cluster, resource-capacity --util fails with:
Error getting Pod Metrics: the server could not find the requested resource (get pods.metrics.k8s.io)
For this to work, metrics-server needs to be running in your cluster
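If you hit this error, deploying metrics-server resolves it. A minimal sketch using the upstream manifest (the URL points at the latest release at the time of writing; verify it against the metrics-server project before applying):
# Deploy metrics-server from the official manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# In test clusters with self-signed kubelet certificates, you may additionally
# need to add --kubelet-insecure-tls to the metrics-server container args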
Combining --util with the --pods flag lists in detail how the pods on each node of the cluster are running:
kubectl resource-capacity --util --pods
Example output:
NODE NAMESPACE POD CPU REQUESTS CPU LIMITS CPU UTIL MEMORY REQUESTS MEMORY LIMITS MEMORY UTIL
* * * 12715m (79%) 130000m (812%) 2655m (16%) 18476Mi (19%) 76447Mi (79%) 15953Mi (16%)
y-k8s-m-1 * * 2770m (69%) 27000m (675%) 933m (23%) 2519Mi (9%) 14359Mi (52%) 6007Mi (22%)
y-k8s-m-1 knative-serving activator-57888f4455-6sbm4 400m (10%) 3000m (75%) 8m (0%) 188Mi (0%) 1624Mi (5%) 69Mi (0%)
y-k8s-m-1 kube-system calico-kube-controllers-6dfcdfb99-bdpxt 30m (0%) 1000m (25%) 3m (0%) 62Mi (0%) 245Mi (0%) 46Mi (0%)
y-k8s-m-1 kube-system calico-node-bcwb7 150m (3%) 300m (7%) 68m (1%) 62Mi (0%) 477Mi (1%) 129Mi (0%)
y-k8s-m-1 kubeflow centraldashboard-f966d7897-jrwzb 100m (2%) 2000m (50%) 6m (0%) 128Mi (0%) 1024Mi (3%) 116Mi (0%)
...
y-k8s-m-2 * * 2810m (70%) 30700m (767%) 873m (21%) 3580Mi (13%) 19037Mi (71%) 4790Mi (18%)
y-k8s-m-2 kubeflow admission-webhook-deployment-6d48f6f745-64wkz 0m (0%) 0m (0%) 1m (0%) 0Mi (0%) 0Mi (0%) 8Mi (0%)
y-k8s-m-2 kubeflow cache-server-6ff6f476c9-sbjdq 100m (2%) 2000m (50%) 5m (0%) 128Mi (0%) 1024Mi (3%) 64Mi (0%)
y-k8s-m-2 kube-system calico-node-m7mfs 150m (3%) 300m (7%) 55m (1%) 62Mi (0%) 477Mi (1%) 131Mi (0%)
y-k8s-m-2 cert-manager cert-manager-5d77b478-ttbxg 0m (0%) 0m (0%) 2m (0%) 0Mi (0%) 0Mi (0%) 22Mi (0%)
y-k8s-m-2 cert-manager cert-manager-cainjector-576655b654-m5tcs 0m (0%) 0m (0%) 2m (0%) 0Mi (0%) 0Mi (0%) 51Mi (0%)
...
Sorting¶
The --sort flag sorts the output by one of the following columns (see the example after this list):
cpu.util
cpu.request
cpu.limit
mem.util
mem.request
mem.limit
name
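For example, a simple report sorted by configured memory requests, without any other flags:
# Sort nodes by their total memory requests
kubectl resource-capacity --sort mem.request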
When sorting is combined with --util --pods, the output is ordered first by the per-node totals, and then by the pod usage within each node:
kubectl resource-capacity --util --pods --sort cpu.util
Example output:
NODE NAMESPACE POD CPU REQUESTS CPU LIMITS CPU UTIL MEMORY REQUESTS MEMORY LIMITS MEMORY UTIL
* * * 12715m (79%) 130000m (812%) 2480m (15%) 18476Mi (19%) 76447Mi (79%) 16868Mi (17%)
y-k8s-m-2 * * 2810m (70%) 30700m (767%) 873m (21%) 3580Mi (13%) 19037Mi (71%) 5160Mi (19%)
y-k8s-m-2 kube-system kube-apiserver-y-k8s-m-2 250m (6%) 0m (0%) 154m (3%) 0Mi (0%) 0Mi (0%) 1332Mi (5%)
y-k8s-m-2 kube-system calico-node-m7mfs 150m (3%) 300m (7%) 62m (1%) 62Mi (0%) 477Mi (1%) 132Mi (0%)
y-k8s-m-2 default productpage-v1-58b4c9bff8-ksq9s 100m (2%) 2000m (50%) 28m (0%) 128Mi (0%) 1024Mi (3%) 111Mi (0%)
...
y-k8s-m-1 * * 2770m (69%) 27000m (675%) 863m (21%) 2519Mi (9%) 14359Mi (52%) 6236Mi (23%)
y-k8s-m-1 istio-system prometheus-67f6764db9-ll7ch 0m (0%) 0m (0%) 217m (5%) 0Mi (0%) 0Mi (0%) 1221Mi (4%)
y-k8s-m-1 kube-system kube-apiserver-y-k8s-m-1 250m (6%) 0m (0%) 136m (3%) 0Mi (0%) 0Mi (0%) 1218Mi (4%)
y-k8s-m-1 kube-system calico-node-bcwb7 150m (3%) 300m (7%) 49m (1%) 62Mi (0%) 477Mi (1%) 132Mi (0%)
...
Deeper Output¶
A pod may contain multiple containers, for example a sidecar such as istio-proxy. In that case, combining --pods --containers drills the output down to the container level for load analysis, which is very useful for troubleshooting a cluster:
kubectl resource-capacity --util --pods --containers --sort cpu.util
The output shows the detailed per-container resource usage inside each pod (in the highlighted pod example below you can see istio-proxy and domainmapping-webhook):
NODE NAMESPACE POD CONTAINER CPU REQUESTS CPU LIMITS CPU UTIL MEMORY REQUESTS MEMORY LIMITS MEMORY UTIL
* * * * 12715m (79%) 130000m (812%) 2394m (14%) 18476Mi (19%) 76447Mi (79%) 15746Mi (16%)
y-k8s-m-2 * * * 2810m (70%) 30700m (767%) 989m (24%) 3580Mi (13%) 19037Mi (71%) 5807Mi (21%)
y-k8s-m-2 istio-system prometheus-67f6764db9-8gwlb * 0m (0%) 0m (0%) 179m (4%) 0Mi (0%) 0Mi (0%) 964Mi (3%)
y-k8s-m-2 istio-system prometheus-67f6764db9-8gwlb prometheus-server 0m (0%) 0m (0%) 179m (4%) 0Mi (0%) 0Mi (0%) 962Mi (3%)
y-k8s-m-2 istio-system prometheus-67f6764db9-8gwlb prometheus-server-configmap-reload 0m (0%) 0m (0%) 1m (0%) 0Mi (0%) 0Mi (0%) 3Mi (0%)
y-k8s-m-2 kube-system kube-apiserver-y-k8s-m-2 * 250m (6%) 0m (0%) 155m (3%) 0Mi (0%) 0Mi (0%) 1005Mi (3%)
y-k8s-m-2 kube-system kube-apiserver-y-k8s-m-2 kube-apiserver 250m (6%) 0m (0%) 155m (3%) 0Mi (0%) 0Mi (0%) 1005Mi (3%)
y-k8s-m-2 kube-system calico-node-m7mfs * 150m (3%) 300m (7%) 74m (1%) 62Mi (0%) 477Mi (1%) 132Mi (0%)
y-k8s-m-2 kube-system calico-node-m7mfs calico-node 150m (3%) 300m (7%) 74m (1%) 62Mi (0%) 477Mi (1%) 132Mi (0%)
y-k8s-m-2 default productpage-v1-58b4c9bff8-ksq9s * 100m (2%) 2000m (50%) 34m (0%) 128Mi (0%) 1024Mi (3%) 112Mi (0%)
y-k8s-m-2 default productpage-v1-58b4c9bff8-ksq9s productpage 0m (0%) 0m (0%) 22m (0%) 0Mi (0%) 0Mi (0%) 60Mi (0%)
y-k8s-m-2 default productpage-v1-58b4c9bff8-ksq9s istio-proxy 100m (2%) 2000m (50%) 13m (0%) 128Mi (0%) 1024Mi (3%) 52Mi (0%)
y-k8s-m-2 knative-serving domainmapping-webhook-758fbc96c6-4bkqk * 200m (5%) 2500m (62%) 28m (0%) 228Mi (0%) 1524Mi (5%) 74Mi (0%)
y-k8s-m-2 knative-serving domainmapping-webhook-758fbc96c6-4bkqk istio-proxy 100m (2%) 2000m (50%) 16m (0%) 128Mi (0%) 1024Mi (3%) 57Mi (0%)
y-k8s-m-2 knative-serving domainmapping-webhook-758fbc96c6-4bkqk domainmapping-webhook 100m (2%) 500m (12%) 12m (0%) 100Mi (0%) 500Mi (1%) 18Mi (0%)
...
Filtering by namespace and label¶
You can filter by namespace with the -n flag; for example, here the report is restricted to the kube-system namespace:
kubectl resource-capacity --util -n kube-system --pods --containers --sort cpu.util
Label filters are supported at several levels:
--pod-labels - filter by pod-level labels
--namespace-labels - filter by namespace-level labels
--node-labels - filter by node-level labels
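For illustration, two hedged examples (the label values app=nginx and kubernetes.io/role=node are placeholders; substitute labels that actually exist in your cluster):
# Only include pods carrying the (hypothetical) app=nginx label
kubectl resource-capacity --pod-labels app=nginx
# Only include nodes carrying a given role label (placeholder value)
kubectl resource-capacity --node-labels kubernetes.io/role=node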
Supported Output Formats¶
The default output format is table; yaml and json are also supported. Select the format on the command line with the -o flag:
yaml
json
table
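JSON output is convenient for scripting. A minimal sketch that pipes the report through jq for pretty-printing (assumes jq is installed; the exact field layout is whatever resource-capacity emits):
# Emit the utilization report as JSON and pretty-print it
kubectl resource-capacity --util -o json | jq .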