Krew Quick Start

After completing the Krew installation, you can install and manage a variety of plugins that make day-to-day operations easier:

  • Download the plugin list:

Download the krew plugin list
kubectl krew update
  • Explore the available krew plugins:

List available krew plugins
kubectl krew search

The output looks similar to:

Example output of listing available krew plugins
NAME                            DESCRIPTION                                         INSTALLED
access-matrix                   Show an RBAC access matrix for server resources     no
accurate                        Manage Accurate, a multi-tenancy controller         no
advise-policy                   Suggests PodSecurityPolicies and OPA Policies f...  unavailable on darwin/arm64
advise-psp                      Suggests PodSecurityPolicies for cluster.           unavailable on darwin/arm64
aks                             Interact with and debug AKS clusters                unavailable on darwin/arm64
allctx                          Run commands on contexts in your kubeconfig         no
apparmor-manager                Manage AppArmor profiles for cluster.               unavailable on darwin/arm64
...
  • Install a plugin:

Install the resource-capacity plugin with krew
kubectl krew install resource-capacity
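
To confirm the installation, krew can list the plugins installed locally:

List installed krew plugins
kubectl krew list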
  • Upgrade installed plugins:

Upgrade installed plugins with krew
kubectl krew upgrade
  • Uninstall a plugin:

Uninstall a plugin with krew
kubectl krew uninstall resource-capacity

resource-capacity

resource-capacity is a very practical plugin: once installed, it lets you inspect the cluster's resource allocation directly (note: only the configured resource requests and limits). To inspect the cluster's real-time load, you still need to deploy metrics-server in the cluster so that the data can be queried from collected metrics.
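
To collect that real-time load data, a common way to deploy metrics-server is the upstream release manifest (a minimal sketch; some clusters additionally need TLS adjustments such as the --kubelet-insecure-tls flag):

Deploy metrics-server from the official release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml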

  • Install the plugin:

Install the resource-capacity plugin with krew
kubectl krew install resource-capacity
  • Use resource-capacity to check the resource usage of nodes:

Check the cluster's resource usage with resource-capacity
# --kubeconfig specifies the kubeconfig (cluster access credentials) to use; note that this flag must come after the resource-capacity subcommand, not before it
kubectl resource-capacity --kubeconfig kubeconfig.yaml 
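
The same cluster can also be selected via the standard KUBECONFIG environment variable, which sidesteps the flag-position gotcha (a sketch, assuming kubeconfig.yaml is in the current directory):

Select the cluster via the KUBECONFIG environment variable
KUBECONFIG=kubeconfig.yaml kubectl resource-capacity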

The output shows the usage of each node:

Example output of checking the cluster's resource usage with resource-capacity
NODE                   CPU REQUESTS       CPU LIMITS          MEMORY REQUESTS     MEMORY LIMITS
*                      75432390m (102%)   196294010m (267%)   229016019Mi (64%)   396955850Mi (111%)
10psx52                22000m (68%)       85000m (265%)       45056Mi (35%)       149248Mi (116%)
11by422                32000m (100%)      83500m (260%)       256000Mi (99%)      302580Mi (117%)
11psx52                35000m (109%)      137500m (429%)      68608Mi (53%)       193024Mi (150%)
...

Note: the default output contains only configuration information (requests and limits), so attempting to fetch real-time monitoring information will fail; see below:

  • Once metrics-server is installed in the cluster, the --util flag produces a utilization report for each node:

resource-capacity --util reports the current runtime utilization of the cluster
# --kubeconfig specifies the kubeconfig (cluster access credentials) to use; note that this flag must come after the resource-capacity subcommand, not before it
kubectl resource-capacity --kubeconfig kubeconfig.yaml --util

Example output:

resource-capacity --util output showing the utilization of each node
NODE        CPU REQUESTS   CPU LIMITS       CPU UTIL      MEMORY REQUESTS   MEMORY LIMITS    MEMORY UTIL
*           12715m (79%)   130000m (812%)   2532m (15%)   18476Mi (19%)     76447Mi (79%)    15925Mi (16%)
y-k8s-m-1   2770m (69%)    27000m (675%)    852m (21%)    2519Mi (9%)       14359Mi (52%)    5977Mi (22%)
y-k8s-m-2   2810m (70%)    30700m (767%)    884m (22%)    3580Mi (13%)      19037Mi (71%)    4798Mi (18%)
y-k8s-m-3   3265m (81%)    27400m (685%)    797m (19%)    5074Mi (19%)      16645Mi (62%)    5151Mi (19%)
y-k8s-n-1   1975m (98%)    26400m (1320%)   0m (0%)       3394Mi (44%)      16313Mi (212%)   0Mi (0%)
y-k8s-n-2   1895m (94%)    18500m (925%)    0m (0%)       3912Mi (50%)      10093Mi (131%)   0Mi (0%)

Note

If metrics-server is not deployed in the cluster, resource-capacity --util reports the following error:

Error getting Pod Metrics: the server could not find the requested resource (get pods.metrics.k8s.io)
For this to work, metrics-server needs to be running in your cluster
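
A quick way to check whether metrics-server is serving the metrics API (a sketch; the APIService name assumes a standard metrics-server deployment):

Verify that the metrics API is registered and responding
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top nodes
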
  • Combining --util with --pods lists, in detail, the running state of the pods on each node:

resource-capacity with --util --pods outputs the running load of the pods on all nodes
kubectl resource-capacity --util --pods

Example output:

resource-capacity with --util --pods outputs the running load of the pods on all nodes
NODE        NAMESPACE                   POD                                                               CPU REQUESTS   CPU LIMITS       CPU UTIL      MEMORY REQUESTS   MEMORY LIMITS    MEMORY UTIL
*           *                           *                                                                 12715m (79%)   130000m (812%)   2655m (16%)   18476Mi (19%)     76447Mi (79%)    15953Mi (16%)

y-k8s-m-1   *                           *                                                                 2770m (69%)    27000m (675%)    933m (23%)    2519Mi (9%)       14359Mi (52%)    6007Mi (22%)
y-k8s-m-1   knative-serving             activator-57888f4455-6sbm4                                        400m (10%)     3000m (75%)      8m (0%)       188Mi (0%)        1624Mi (5%)      69Mi (0%)
y-k8s-m-1   kube-system                 calico-kube-controllers-6dfcdfb99-bdpxt                           30m (0%)       1000m (25%)      3m (0%)       62Mi (0%)         245Mi (0%)       46Mi (0%)
y-k8s-m-1   kube-system                 calico-node-bcwb7                                                 150m (3%)      300m (7%)        68m (1%)      62Mi (0%)         477Mi (1%)       129Mi (0%)
y-k8s-m-1   kubeflow                    centraldashboard-f966d7897-jrwzb                                  100m (2%)      2000m (50%)      6m (0%)       128Mi (0%)        1024Mi (3%)      116Mi (0%)
...
y-k8s-m-2   *                           *                                                                 2810m (70%)    30700m (767%)    873m (21%)    3580Mi (13%)      19037Mi (71%)    4790Mi (18%)
y-k8s-m-2   kubeflow                    admission-webhook-deployment-6d48f6f745-64wkz                     0m (0%)        0m (0%)          1m (0%)       0Mi (0%)          0Mi (0%)         8Mi (0%)
y-k8s-m-2   kubeflow                    cache-server-6ff6f476c9-sbjdq                                     100m (2%)      2000m (50%)      5m (0%)       128Mi (0%)        1024Mi (3%)      64Mi (0%)
y-k8s-m-2   kube-system                 calico-node-m7mfs                                                 150m (3%)      300m (7%)        55m (1%)      62Mi (0%)         477Mi (1%)       131Mi (0%)
y-k8s-m-2   cert-manager                cert-manager-5d77b478-ttbxg                                       0m (0%)        0m (0%)          2m (0%)       0Mi (0%)          0Mi (0%)         22Mi (0%)
y-k8s-m-2   cert-manager                cert-manager-cainjector-576655b654-m5tcs                          0m (0%)        0m (0%)          2m (0%)       0Mi (0%)          0Mi (0%)         51Mi (0%)
...

Sorting

The --sort flag supports sorting the output by one of the following columns (an example follows this list):

cpu.util
cpu.request
cpu.limit
mem.util
mem.request
mem.limit
name
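
For example, a minimal invocation sorting the node report by memory requests (any of the keys above can be substituted):

Sort the resource-capacity output by the mem.request column
kubectl resource-capacity --sort mem.request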

When combined with --util --pods, the sort is applied first to each node's aggregate usage, and then to the pods within each node:

resource-capacity with --util --pods --sort cpu.util outputs the sorted running load of the pods on all nodes
kubectl resource-capacity --util --pods --sort cpu.util

Example output:

resource-capacity with --util --pods --sort cpu.util outputs the running load of the pods on all nodes, sorted by CPU utilization
NODE        NAMESPACE                   POD                                                               CPU REQUESTS   CPU LIMITS       CPU UTIL      MEMORY REQUESTS   MEMORY LIMITS    MEMORY UTIL
*           *                           *                                                                 12715m (79%)   130000m (812%)   2480m (15%)   18476Mi (19%)     76447Mi (79%)    16868Mi (17%)

y-k8s-m-2   *                           *                                                                 2810m (70%)    30700m (767%)    873m (21%)    3580Mi (13%)      19037Mi (71%)    5160Mi (19%)
y-k8s-m-2   kube-system                 kube-apiserver-y-k8s-m-2                                          250m (6%)      0m (0%)          154m (3%)     0Mi (0%)          0Mi (0%)         1332Mi (5%)
y-k8s-m-2   kube-system                 calico-node-m7mfs                                                 150m (3%)      300m (7%)        62m (1%)      62Mi (0%)         477Mi (1%)       132Mi (0%)
y-k8s-m-2   default                     productpage-v1-58b4c9bff8-ksq9s                                   100m (2%)      2000m (50%)      28m (0%)      128Mi (0%)        1024Mi (3%)      111Mi (0%)
...
y-k8s-m-1   *                           *                                                                 2770m (69%)    27000m (675%)    863m (21%)    2519Mi (9%)       14359Mi (52%)    6236Mi (23%)
y-k8s-m-1   istio-system                prometheus-67f6764db9-ll7ch                                       0m (0%)        0m (0%)          217m (5%)     0Mi (0%)          0Mi (0%)         1221Mi (4%)
y-k8s-m-1   kube-system                 kube-apiserver-y-k8s-m-1                                          250m (6%)      0m (0%)          136m (3%)     0Mi (0%)          0Mi (0%)         1218Mi (4%)
y-k8s-m-1   kube-system                 calico-node-bcwb7                                                 150m (3%)      300m (7%)        49m (1%)      62Mi (0%)         477Mi (1%)       132Mi (0%)
...

Container-Level Output

A pod may contain multiple containers, for example sidecars such as istio-proxy. Combining --pods --containers drills the output down to the container level for load analysis, which is very useful when troubleshooting a cluster:

resource-capacity with --pods --containers shows the per-container load of each pod in detail
kubectl resource-capacity --util --pods --containers --sort cpu.util

The output shows the detailed container resource usage within each pod (looking at one pod as an example, you can see the istio-proxy and domainmapping-webhook containers):

Container-level load shown in the resource-capacity --pods --containers output
NODE        NAMESPACE                   POD                                                               CONTAINER                            CPU REQUESTS   CPU LIMITS       CPU UTIL      MEMORY REQUESTS   MEMORY LIMITS    MEMORY UTIL
*           *                           *                                                                 *                                    12715m (79%)   130000m (812%)   2394m (14%)   18476Mi (19%)     76447Mi (79%)    15746Mi (16%)

y-k8s-m-2   *                           *                                                                 *                                    2810m (70%)    30700m (767%)    989m (24%)    3580Mi (13%)      19037Mi (71%)    5807Mi (21%)
y-k8s-m-2   istio-system                prometheus-67f6764db9-8gwlb                                       *                                    0m (0%)        0m (0%)          179m (4%)     0Mi (0%)          0Mi (0%)         964Mi (3%)
y-k8s-m-2   istio-system                prometheus-67f6764db9-8gwlb                                       prometheus-server                    0m (0%)        0m (0%)          179m (4%)     0Mi (0%)          0Mi (0%)         962Mi (3%)
y-k8s-m-2   istio-system                prometheus-67f6764db9-8gwlb                                       prometheus-server-configmap-reload   0m (0%)        0m (0%)          1m (0%)       0Mi (0%)          0Mi (0%)         3Mi (0%)
y-k8s-m-2   kube-system                 kube-apiserver-y-k8s-m-2                                          *                                    250m (6%)      0m (0%)          155m (3%)     0Mi (0%)          0Mi (0%)         1005Mi (3%)
y-k8s-m-2   kube-system                 kube-apiserver-y-k8s-m-2                                          kube-apiserver                       250m (6%)      0m (0%)          155m (3%)     0Mi (0%)          0Mi (0%)         1005Mi (3%)
y-k8s-m-2   kube-system                 calico-node-m7mfs                                                 *                                    150m (3%)      300m (7%)        74m (1%)      62Mi (0%)         477Mi (1%)       132Mi (0%)
y-k8s-m-2   kube-system                 calico-node-m7mfs                                                 calico-node                          150m (3%)      300m (7%)        74m (1%)      62Mi (0%)         477Mi (1%)       132Mi (0%)
y-k8s-m-2   default                     productpage-v1-58b4c9bff8-ksq9s                                   *                                    100m (2%)      2000m (50%)      34m (0%)      128Mi (0%)        1024Mi (3%)      112Mi (0%)
y-k8s-m-2   default                     productpage-v1-58b4c9bff8-ksq9s                                   productpage                          0m (0%)        0m (0%)          22m (0%)      0Mi (0%)          0Mi (0%)         60Mi (0%)
y-k8s-m-2   default                     productpage-v1-58b4c9bff8-ksq9s                                   istio-proxy                          100m (2%)      2000m (50%)      13m (0%)      128Mi (0%)        1024Mi (3%)      52Mi (0%)
y-k8s-m-2   knative-serving             domainmapping-webhook-758fbc96c6-4bkqk                            *                                    200m (5%)      2500m (62%)      28m (0%)      228Mi (0%)        1524Mi (5%)      74Mi (0%)
y-k8s-m-2   knative-serving             domainmapping-webhook-758fbc96c6-4bkqk                            istio-proxy                          100m (2%)      2000m (50%)      16m (0%)      128Mi (0%)        1024Mi (3%)      57Mi (0%)
y-k8s-m-2   knative-serving             domainmapping-webhook-758fbc96c6-4bkqk                            domainmapping-webhook                100m (2%)      500m (12%)       12m (0%)      100Mi (0%)        500Mi (1%)       18Mi (0%)
...

Filtering by namespace and label

  • The -n flag filters by namespace; for example, filter to the kube-system namespace:

resource-capacity with -n <namespace> --pods --containers checks resource usage within the given namespace (here kube-system)
kubectl resource-capacity --util -n kube-system --pods --containers --sort cpu.util
  • Label filters are supported at the following levels (a usage sketch follows this list):

    • --pod-labels - labels at the pod level

    • --namespace-labels - labels at the namespace level

    • --node-labels - labels at the node level
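
A sketch of filtering by a pod label (the app=nginx selector here is a hypothetical example; standard Kubernetes label-selector syntax is assumed):

Filter pods by label with --pod-labels
kubectl resource-capacity --pods --pod-labels app=nginx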

Supported Output Formats

The default output format is table; yaml and json are also supported. Select the format with the -o flag (an example follows this list):

  • yaml

  • json

  • table
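
For example, a sketch that emits the utilization report as JSON (piping to jq is optional and assumes jq is installed):

Emit the resource-capacity report in JSON format
kubectl resource-capacity --util -o json | jq .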
