kube-prometheus-stack
Adding Prometheus scrape configurations
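Note
This page collects configuration examples and will be filled in over time. Detailed explanations will go into the corresponding documents.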
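While working through Configuring Kube State Metrics in a Kubernetes cluster and Deploying Prometheus and Grafana with integrated GPU monitoring in the Kubernetes cluster (z-k8s), I gradually added extra monitoring items, that is, custom metrics collected by prometheus.
Scrape targets added: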
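- Integrating GPU observability into Kubernetes: a job that scrapes GPU information from DCGM-Exporter
- Collecting the GPU metrics exported by the staragent Agent of the Alibaba Cloud Prometheus monitoring product
- Collecting the server metrics exported by the company's in-house Agent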
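In the kube-prometheus-stack Helm chart, the additionalScrapeConfigs key sits under prometheus.prometheusSpec in values.yaml; the configuration that follows shows only that key. A minimal sketch of the nesting, where the job name example-exporter and the target 10.0.0.12:9400 are hypothetical placeholders:

```yaml
# values.yaml for the kube-prometheus-stack chart (minimal sketch).
# "example-exporter" and "10.0.0.12:9400" are hypothetical placeholders.
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: example-exporter
        static_configs:
          - targets: ["10.0.0.12:9400"]
```

After running `helm upgrade` with the edited values, the Prometheus Operator appends these jobs to the Prometheus configuration it generates.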
```yaml
## AdditionalScrapeConfigs allows specifying additional Prometheus scrape configurations. Scrape configurations
## are appended to the configurations generated by the Prometheus Operator. Job configurations must have the form
## as specified in the official Prometheus documentation:
## https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config. As scrape configs are
## appended, the user is responsible to make sure it is valid. Note that using this feature may expose the possibility
## to break upgrades of Prometheus. It is advised to review Prometheus release notes to ensure that no incompatible
## scrape configs are going to break Prometheus after the upgrade.
## AdditionalScrapeConfigs can be defined as a list or as a templated string.
##
## Example: the GPU scrape jobs used in this cluster.
##
additionalScrapeConfigs:
  # DCGM-Exporter endpoints in the nvidia-gpu namespace;
  # the backing pod's node name is copied into the kubernetes_node label.
  - job_name: gpu-metrics
    scrape_interval: 1s   # very aggressive; multiplies exporter load and stored samples
    metrics_path: /metrics
    scheme: http
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - nvidia-gpu
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: replace
        target_label: kubernetes_node
  # Only nodes labelled starship=dcgm; scrapes http://<InternalIP>:9273/metrics.
  - job_name: starship-gpu-metrics
    scrape_interval: 1s
    metrics_path: /metrics
    scheme: http
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - source_labels: [__meta_kubernetes_node_label_starship]
        action: keep
        regex: dcgm
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:9273
  # Every node; scrapes http://<InternalIP>:9199/metrics/gpu and attaches
  # instance, nodename and kernel labels.
  - job_name: agent-node-gpu
    scheme: http
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        target_label: instance
      - source_labels: [__meta_kubernetes_node_name]
        target_label: nodename
      - source_labels: [__meta_kubernetes_node_label_kernel]
        target_label: kernel
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:9199
      - target_label: __metrics_path__
        replacement: /metrics/gpu
```
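To show what the relabel rules of the two role: node jobs actually do, here is a hand-written trace for one hypothetical node (name gpu-node-1, InternalIP 10.0.0.12, labels starship=dcgm and kernel=5.10; all values are invented for illustration). It is not Prometheus output, only the labels each job ends up with, written as YAML.

```yaml
# Hand-written trace, NOT Prometheus output. Hypothetical node:
#   name gpu-node-1, InternalIP 10.0.0.12, labels starship=dcgm, kernel=5.10
# Labels discovered for this node under role: node (abridged):
discovered:
  __address__: "10.0.0.12:10250"   # default: node address + kubelet port (commonly 10250)
  __meta_kubernetes_node_name: gpu-node-1
  __meta_kubernetes_node_address_InternalIP: "10.0.0.12"
  __meta_kubernetes_node_label_starship: dcgm
  __meta_kubernetes_node_label_kernel: "5.10"

starship-gpu-metrics:
  # keep: "dcgm" fully matches regex dcgm (regexes are anchored), so the
  # target survives; nodes without starship=dcgm are dropped at this step.
  # replace: (.+) captures "10.0.0.12" and ${1}:9273 becomes __address__.
  __address__: "10.0.0.12:9273"
  __metrics_path__: /metrics
  instance: "10.0.0.12:9273"       # not set explicitly, so defaults to __address__
  # scraped as http://10.0.0.12:9273/metrics

agent-node-gpu:
  # no keep rule, so every node in the cluster becomes a target
  instance: "10.0.0.12"            # copied from InternalIP, no port
  nodename: gpu-node-1
  kernel: "5.10"
  __address__: "10.0.0.12:9199"
  __metrics_path__: /metrics/gpu
  # scraped as http://10.0.0.12:9199/metrics/gpu
```

One consequence visible in the trace: agent-node-gpu sets instance explicitly, so its series carry instance="10.0.0.12" without a port, while starship-gpu-metrics series get the default instance="10.0.0.12:9273". Queries that join the two jobs on instance need to account for this difference.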