Updating the Prometheus Configuration of a Kubernetes Cluster

Note

When deploying DCGM-Exporter for GPU monitoring on top of "Deploying Prometheus and Grafana on a Kubernetes Cluster with Helm 3", the Prometheus configuration must be revised so that it can scrape metrics from specific nodes and ports.

After building a Prometheus monitoring stack with the Prometheus Operator (for example, "Deploying Prometheus and Grafana on a Kubernetes Cluster with Helm 3" deploys it via the kube-prometheus-stack helm chart), you can monitor your own custom services by adding additional scrape configurations.

A so-called additional scrape config uses regular expressions to find matching services, and pins down a set of services by label, annotation, namespace, or name.

Additional scrape configs are a powerful low-level mechanism that must be specified in the prometheusSpec section of the helm chart values or in the Prometheus configuration file. They are therefore well suited for implementing a platform-level monitoring mechanism while keeping control over the Prometheus installation and configuration.
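As an illustration, here is what an annotation-driven additional scrape config might look like; the job name and annotation key are hypothetical, while the __meta_kubernetes_* labels are standard Prometheus kubernetes_sd_configs metadata:

Example: selecting scrape targets by service annotation
additionalScrapeConfigs:
- job_name: annotated-services
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # Keep only endpoints whose Service carries the annotation prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # Surface the namespace and service name as ordinary labels
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: service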

Note

The practice in this article builds on my "Deploying Prometheus and Grafana on a Kubernetes Cluster with Helm 3", with the goal of "Integrating GPU Observability into Kubernetes".

"Deploying Prometheus and Grafana on a Kubernetes Cluster with Helm 3" does not include the additionalScrapeConfigs that "Integrating GPU Observability into Kubernetes" requires for GPU monitoring, so at this point prometheus cannot scrape the GPU metrics on port 9400. Below is the configMap required by "Integrating GPU Observability into Kubernetes":

configMap: adding gpu-metrics to additionalScrapeConfigs
# AdditionalScrapeConfigs allows specifying additional Prometheus scrape configurations. Scrape configurations
# are appended to the configurations generated by the Prometheus Operator. Job configurations must have the form
# as specified in the official Prometheus documentation:
# https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config. As scrape configs are
# appended, the user is responsible to make sure it is valid. Note that using this feature may expose the possibility
# to break upgrades of Prometheus. It is advised to review Prometheus release notes to ensure that no incompatible
# scrape configs are going to break Prometheus after the upgrade.
#
# The scrape configuration example below will find master nodes, provided they have the name .*mst.*, relabel the
# port to 2379 and allow etcd scraping provided it is running on all Kubernetes master nodes
#
additionalScrapeConfigs:
- job_name: gpu-metrics
  scrape_interval: 1s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - gpu-operator
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: kubernetes_node
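For reference, the relabel rule above copies each pod's node name into the kubernetes_node label, so once the job is active, per-node GPU utilization can be queried in Prometheus with (DCGM_FI_DEV_GPU_UTIL is part of dcgm-exporter's default metric set):

avg by (kubernetes_node) (DCGM_FI_DEV_GPU_UTIL)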
  • Get the currently installed Prometheus helm release:

List the currently installed prometheus helm release
helm list -A | grep prometheus

Output:

Output of listing the installed prometheus helm release
stable    default    1    2023-03-29 22:10:58.12684326 +0800 CST    deployed    kube-prometheus-stack-45.8.0    v0.63.0

dcgm-exporter deployment breakdown

Referring to "how prometheus get dcgm-exporter metrics? #106", dcgm-exporter breaks down into the following objects (a quick way to enumerate them follows the list):

  • Namespace

  • DaemonSet

  • Service

  • ServiceMonitor

  • ServiceAccount
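List the dcgm-exporter objects in a running cluster (assuming the ServiceMonitor CRD is installed)
kubectl get ds,svc,sa -A | grep dcgm
kubectl get servicemonitor -A | grep dcgm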

  • Inspect the services deployed by "Integrating GPU Observability into Kubernetes"; among them is dcgm-exporter-1680364448. Check its contents:

    kubectl get svc dcgm-exporter-1680364448 -o yaml
    

Output of the dcgm-exporter service configuration:

kubectl get svc dcgm-exporter-1680364448 -o yaml output
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: dcgm-exporter-1680364448
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2023-04-01T15:54:13Z"
  labels:
    app.kubernetes.io/component: dcgm-exporter
    app.kubernetes.io/instance: dcgm-exporter-1680364448
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: dcgm-exporter
    app.kubernetes.io/version: 2.6.10
    helm.sh/chart: dcgm-exporter-2.6.10
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
...
    manager: helm
    operation: Update
    time: "2023-04-01T15:54:13Z"
  name: dcgm-exporter-1680364448
  namespace: default
  resourceVersion: "6314410"
  selfLink: /api/v1/namespaces/default/services/dcgm-exporter-1680364448
  uid: fef9c429-4c9f-418b-ae62-c8012efc577b
spec:
  clusterIP: 10.233.18.35
  ports:
  - name: metrics
    port: 9400
    protocol: TCP
    targetPort: 9400
  selector:
    app.kubernetes.io/instance: dcgm-exporter-1680364448
    app.kubernetes.io/name: dcgm-exporter
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
  • Inspect the dcgm-exporter daemonset configuration:

    kubectl get ds dcgm-exporter-1680364448 -o yaml
    
kubectl get ds dcgm-exporter-1680364448 -o yaml output
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "1"
    meta.helm.sh/release-name: dcgm-exporter-1680364448
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2023-04-01T15:54:13Z"
  generation: 1
  labels:
    app.kubernetes.io/component: dcgm-exporter
    app.kubernetes.io/instance: dcgm-exporter-1680364448
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: dcgm-exporter
    app.kubernetes.io/version: 2.6.10
    helm.sh/chart: dcgm-exporter-2.6.10
  managedFields:
  - apiVersion: apps/v1
    fieldsType: FieldsV1
    fieldsV1:
...
    manager: helm
    operation: Update
    time: "2023-04-01T15:54:13Z"
  - apiVersion: apps/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:currentNumberScheduled: {}
        f:desiredNumberScheduled: {}
        f:numberAvailable: {}
        f:numberMisscheduled: {}
        f:numberReady: {}
        f:numberUnavailable: {}
        f:observedGeneration: {}
        f:updatedNumberScheduled: {}
    manager: kube-controller-manager
    operation: Update
    time: "2023-04-02T09:48:53Z"
  name: dcgm-exporter-1680364448
  namespace: default
  resourceVersion: "6988330"
  selfLink: /apis/apps/v1/namespaces/default/daemonsets/dcgm-exporter-1680364448
  uid: 43010398-556f-4db0-9d2a-4b544cc6d318
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: dcgm-exporter
      app.kubernetes.io/instance: dcgm-exporter-1680364448
      app.kubernetes.io/name: dcgm-exporter
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: dcgm-exporter
        app.kubernetes.io/instance: dcgm-exporter-1680364448
        app.kubernetes.io/name: dcgm-exporter
    spec:
      containers:
      - args:
        - -f
        - /etc/dcgm-exporter/dcp-metrics-included.csv
        env:
        - name: DCGM_EXPORTER_KUBERNETES
          value: "true"
        - name: DCGM_EXPORTER_LISTEN
          value: :9400
        image: nvcr.io/nvidia/k8s/dcgm-exporter:2.4.6-2.6.10-ubuntu20.04
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 9400
            scheme: HTTP
          initialDelaySeconds: 45
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        name: exporter
        ports:
        - containerPort: 9400
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 9400
            scheme: HTTP
          initialDelaySeconds: 45
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources: {}
        securityContext:
          capabilities:
            add:
            - SYS_ADMIN
          runAsNonRoot: false
          runAsUser: 0
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/kubelet/pod-resources
          name: pod-gpu-resources
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: dcgm-exporter-1680364448
      serviceAccountName: dcgm-exporter-1680364448
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /var/lib/kubelet/pod-resources
          type: ""
        name: pod-gpu-resources
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
...
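With the service and daemonset above in place, a quick spot-check that metrics are actually served on port 9400 is a port-forward plus curl; the service name and namespace are taken from the output above, so adjust them for your release:

Spot-check the dcgm-exporter metrics endpoint
kubectl -n default port-forward svc/dcgm-exporter-1680364448 9400:9400 &
# DCGM_FI_* counters should appear if the exporter is healthy
curl -s http://localhost:9400/metrics | head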

Updating the helm deployment

helm inspect values dumps the Prometheus Stack chart values
helm inspect values prometheus-community/kube-prometheus-stack > kube-prometheus-stack.values
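The dumped values file is large; a quick grep locates the block to edit (the line number varies across chart versions):

Locate additionalScrapeConfigs in the values file
grep -n "additionalScrapeConfigs" kube-prometheus-stack.values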
  • Edit the configMap to add gpu-metrics under additionalScrapeConfigs:

configMap: adding gpu-metrics to additionalScrapeConfigs (the namespace is default because of how dcgm-exporter was deployed)
# AdditionalScrapeConfigs allows specifying additional Prometheus scrape configurations. Scrape configurations
# are appended to the configurations generated by the Prometheus Operator. Job configurations must have the form
# as specified in the official Prometheus documentation:
# https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config. As scrape configs are
# appended, the user is responsible to make sure it is valid. Note that using this feature may expose the possibility
# to break upgrades of Prometheus. It is advised to review Prometheus release notes to ensure that no incompatible
# scrape configs are going to break Prometheus after the upgrade.
#
# The scrape configuration example below will find master nodes, provided they have the name .*mst.*, relabel the
# port to 2379 and allow etcd scraping provided it is running on all Kubernetes master nodes
#
additionalScrapeConfigs:
- job_name: gpu-metrics
  scrape_interval: 1s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - default
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: kubernetes_node
  • Upgrade:

Run helm upgrade on prometheus-community/kube-prometheus-stack
helm upgrade kube-prometheus-stack-1681228346 prometheus-community/kube-prometheus-stack \
  --namespace prometheus --values kube-prometheus-stack.values

I hit an error here because I had forgotten that, back in "Deploying Prometheus and Grafana on a Kubernetes Cluster with Helm 3", I had already changed the prometheus / grafana services from ClusterIP to NodePort, so the following error is reported (it can be ignored):

Error: UPGRADE FAILED: cannot patch "stable-grafana" with kind Service: Service "stable-grafana" is invalid: spec.ports[0].nodePort: Forbidden: may not be used when `type` is 'ClusterIP' && cannot patch "stable-kube-prometheus-sta-alertmanager" with kind Service: Service "stable-kube-prometheus-sta-alertmanager" is invalid: spec.ports[0].nodePort: Forbidden: may not be used when `type` is 'ClusterIP' && cannot patch "stable-kube-prometheus-sta-prometheus" with kind Service:
Service "stable-kube-prometheus-sta-prometheus" is invalid: spec.ports[0].nodePort: Forbidden: may not be used when `type` is 'ClusterIP'

Note

Updating with helm requires 2 arguments, [RELEASE] [CHART]; otherwise it errors:

Error: "helm upgrade" requires 2 arguments

Usage:  helm upgrade [RELEASE] [CHART] [flags]

Somewhat counter-intuitively, [RELEASE] must be the NAME shown by helm list, while [CHART] uses the repo_name/path_to_chart format: use prometheus-community/kube-prometheus-stack, not prometheus-community/kube-prometheus-stack-45.9.1.
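For example, with the release listed earlier (NAME stable in namespace default), the invocation pattern would be:

helm upgrade pattern using the NAME from helm list
helm upgrade stable prometheus-community/kube-prometheus-stack \
  --namespace default --values kube-prometheus-stack.values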

Note

helm upgrade pulls the chart package again, e.g. Get "https://github.com/prometheus-community/helm-charts/releases/download/kube-prometheus-stack-45.9.1/kube-prometheus-stack-45.9.1.tgz", so this method is heavyweight. I will look for a better upgrade method later.

Once the upgrade completes, the collected GPU data can be seen on the dashboards of Grafana, a general-purpose visual analytics platform; a quick Prometheus-side verification is sketched below.
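The sketch port-forwards the Prometheus service, then looks for the gpu-metrics job on the targets page or queries the HTTP API directly; the service name appears in the patch error above, so adjust the namespace and name to your installation:

Verify the gpu-metrics job from Prometheus
kubectl -n prometheus port-forward svc/stable-kube-prometheus-sta-prometheus 9090:9090 &
# The gpu-metrics job should be listed at http://localhost:9090/targets;
# a direct API query should return per-GPU utilization samples
curl -s 'http://localhost:9090/api/v1/query?query=DCGM_FI_DEV_GPU_UTIL'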

Issues

  • On hosts with complex network interfaces (Calico networking with multiple NICs), the host instance IP obtained by the scrape above may not be the interface IP you want. In practice, if the DCGM-Exporter DaemonSet uses Kubernetes hostNetwork, it consistently reports the host IP address, which reflects the deployment much more clearly; a patch sketch follows.
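A minimal sketch of switching the DaemonSet to hostNetwork by patching it directly; newer dcgm-exporter charts may expose this through values, and dnsPolicy has to change together with hostNetwork:

Patch the dcgm-exporter DaemonSet onto the host network
kubectl -n default patch ds dcgm-exporter-1680364448 --type merge \
  -p '{"spec":{"template":{"spec":{"hostNetwork":true,"dnsPolicy":"ClusterFirstWithHostNet"}}}}'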

References