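Adding Prometheus scrape configurations to kube-prometheus-stack

Note

This document collects configuration examples and will be extended over time. Detailed explanations will be added to the corresponding documents.

While working through "Configuring Kube State Metrics in a Kubernetes cluster" and "Deploying Prometheus and Grafana with integrated GPU monitoring in a Kubernetes cluster (z-k8s)", I gradually added several extra monitoring items, that is, custom metrics scrape jobs in Prometheus:

Adding additionalScrapeConfigs to kube-prometheus-stack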
    ## AdditionalScrapeConfigs allows specifying additional Prometheus scrape configurations. Scrape configurations
    ## are appended to the configurations generated by the Prometheus Operator. Job configurations must have the form
    ## as specified in the official Prometheus documentation:
    ## https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config. As scrape configs are
    ## appended, the user is responsible to make sure it is valid. Note that using this feature may expose the possibility
    ## to break upgrades of Prometheus. It is advised to review Prometheus release notes to ensure that no incompatible
    ## scrape configs are going to break Prometheus after the upgrade.
    ## AdditionalScrapeConfigs can be defined as a list or as a templated string.
    ##
    ## The scrape configurations below add three GPU-related jobs:
    ## - gpu-metrics: endpoints discovered in the nvidia-gpu namespace, labeled with the node they run on
    ## - starship-gpu-metrics: nodes labeled starship=dcgm, scraped on port 9273
    ## - agent-node-gpu: every node, scraped on port 9199 at /metrics/gpu
    ##
    additionalScrapeConfigs:
    - job_name: gpu-metrics
      scrape_interval: 1s
      metrics_path: /metrics
      scheme: http
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
          - nvidia-gpu
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: replace
        target_label: kubernetes_node
    - job_name: starship-gpu-metrics
      scrape_interval: 1s
      metrics_path: /metrics
      scheme: http
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__meta_kubernetes_node_label_starship]
        action: keep
        regex: dcgm
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:9273
    - job_name: agent-node-gpu
      scheme: http
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        target_label: instance
      - source_labels: [__meta_kubernetes_node_name]
        target_label: nodename
      - source_labels: [__meta_kubernetes_node_label_kernel]
        target_label: kernel
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:9199
      - target_label: __metrics_path__
        replacement: /metrics/gpu
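
The excerpt above is indented because it sits inside a larger values file. As a minimal sketch, assuming the standard kube-prometheus-stack chart layout in which this key lives under prometheus.prometheusSpec, a standalone override file with just one of the jobs might look like the following. The file name values.yaml is only illustrative, and the job is copied from the configuration above.

```yaml
# values.yaml (illustrative file name) for the kube-prometheus-stack chart
prometheus:
  prometheusSpec:
    # Jobs listed here are appended to the scrape configs generated by the operator.
    additionalScrapeConfigs:
    - job_name: agent-node-gpu
      scheme: http
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      # Point the target at <InternalIP>:9199 instead of the discovered node address.
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:9199
      # Scrape /metrics/gpu rather than the default /metrics path.
      - target_label: __metrics_path__
        replacement: /metrics/gpu
```

A file like this would typically be passed to helm upgrade with the -f option. Because these jobs are appended without validation by the operator, it is worth checking the resulting targets in the Prometheus UI after the change is rolled out.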

References