Extra runtime arguments for kube-prometheus-stack (extraArgs)

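After using helm to complete "Deploying Prometheus and Grafana with integrated GPU monitoring on the Kubernetes cluster (z-k8s)", one requirement came up: customizing the runtime arguments of kube-state-metrics (KSM):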

Customized kube-state-metrics runtime arguments
...
spec:
  ...
  template:
    ...
    spec:
      containers:
      - args:
        - --port=8080
        - --resources=certificatesigningrequests,configmaps,cronjobs...
        - --metric-labels-allowlist=nodes=[infra.cloud-atlas/node-ip,machine.cloud-atlas.io/biz-name,k8s.cloud-atlas.io/arch],pods=[sync.k8s.cloud-atlas.io/resource-type,custom.cloud-atlas.io/runtime-class]
        image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.8.2
        imagePullPolicy: IfNotPresent
        ...

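The --metric-labels-allowlist argument can be added directly with kubectl -n prometheus edit deploy kube-prometheus-stack-1681228346-kube-state-metrics, but the change is wiped out the next time "Updating the Prometheus configuration of the Kubernetes cluster" is run, so the argument has to be made persistent in the chart values.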
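
To see which arguments the Deployment currently carries (for example before and after a manual edit), the container args can be printed with a kubectl JSONPath query. The following is a minimal sketch: show_args is a hypothetical helper name, and it assumes kube-state-metrics is the first container in the pod template, as in the manifest excerpt above.

```shell
# Hypothetical helper: print the args of the first container of a Deployment.
# Assumption: the container of interest is containers[0].
# Usage: show_args <namespace> <deployment>
show_args() {
  kubectl -n "$1" get deploy "$2" \
    -o jsonpath='{.spec.template.spec.containers[0].args}'
}

# Example call (not executed here):
#   show_args prometheus kube-prometheus-stack-1681228346-kube-state-metrics
```

The output is a JSON array of strings, so a manually added --metric-labels-allowlist entry should appear in it until the next helm upgrade replaces the Deployment spec.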

A close look at kube-prometheus-stack.values shows that the prometheus-node-exporter subchart has a setting for custom runtime arguments:

kube-prometheus-stack.values: prometheus-node-exporter extraArgs parameter
## Configuration for prometheus-node-exporter subchart
##
prometheus-node-exporter:
  namespaceOverride: ""
  podLabels:
    ## Add the 'node-exporter' label to be used by serviceMonitor to match standard common usage in rules and grafana dashboards
    ##
    jobLabel: node-exporter
  releaseLabel: true
  extraArgs:
    - --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
    - --collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
  service:
    portName: http-metrics
  ...

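So it seems that the subcharts configured in kube-prometheus-stack.values can take container runtime arguments in a similar way (how this works for a pod with multiple containers is still an open question).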

Following "Using prometheus-community helm chart how can I expose custom pod labels", make the following customization:

Customizing kube-prometheus-stack.values: kube-state-metrics extraArgs parameter
## Configuration for kube-state-metrics subchart
##
kube-state-metrics:
  namespaceOverride: ""
  rbac:
    create: true
  releaseLabel: true
  extraArgs:
    - --metric-labels-allowlist=nodes=[infra.cloud-atlas/node-ip,machine.cloud-atlas.io/biz-name,k8s.cloud-atlas.io/arch],pods=[sync.k8s.cloud-atlas.io/resource-type,custom.cloud-atlas.io/runtime-class]
  prometheus:
    monitor:
      enabled: true 
...
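
The allowlist value is one long string of the form resource=[label,...],resource=[label,...], and a stray character inside it is hard to spot by eye. As a rough sanity check before upgrading, a shell function along these lines could be used. check_allowlist is a hypothetical helper, and its pattern only approximates the format seen in this document; it is not the parser that kube-state-metrics itself uses.

```shell
# Hypothetical helper: rough format check for a --metric-labels-allowlist value.
# Accepts: resource=[...] entries separated by commas, with no brackets nested
# inside the label list. This is an approximation, not the KSM parser.
# Usage: check_allowlist '<value>'   (returns 0 if the shape looks right)
check_allowlist() {
  printf '%s\n' "$1" |
    grep -Eq '^[a-z]+=\[[^][]*\](,[a-z]+=\[[^][]*\])*$'
}

# Example call (not executed here):
#   check_allowlist 'nodes=[k8s.cloud-atlas.io/arch],pods=[custom.cloud-atlas.io/runtime-class]' \
#     && echo "looks well-formed"
```

The check only looks at the overall shape, so it cannot tell whether the listed labels actually exist on the nodes or pods.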

Then run "Updating the Prometheus configuration of the Kubernetes cluster":

Running helm upgrade with prometheus-community/kube-prometheus-stack
helm upgrade kube-prometheus-stack-1681228346 prometheus-community/kube-prometheus-stack \
  --namespace prometheus --values kube-prometheus-stack.values
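
After the upgrade it can be worth confirming that the release stored the override and that the kube-state-metrics Deployment finished rolling out. The sketch below is one way to do that. after_upgrade is a hypothetical helper, and it assumes the Deployment is named <release>-kube-state-metrics, which matches the names used in this document but may differ in other setups.

```shell
# Hypothetical helper: show the user-supplied values of a release, then wait
# for the kube-state-metrics Deployment to finish rolling out.
# Assumption: the Deployment name is "<release>-kube-state-metrics".
# Usage: after_upgrade <namespace> <release>
after_upgrade() {
  ns="$1"
  release="$2"
  # Values that were passed to helm for this release (should include extraArgs)
  helm get values "$release" --namespace "$ns" --output yaml
  # Block until the new pods are ready, or give up after two minutes
  kubectl -n "$ns" rollout status deploy "${release}-kube-state-metrics" --timeout=120s
}

# Example call (not executed here):
#   after_upgrade prometheus kube-prometheus-stack-1681228346
```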

References