`kube-prometheus-stack` Alert Configuration

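As in the Prometheus application monitoring walkthrough (Prometheus监控应用实战), alert notifications are set up by configuring Prometheus alerting rules. The difference is that kube-prometheus-stack lets you supply the rules through Helm values, which simplifies the configuration. This article records what worked in practice.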

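Alert configuration entry point

The values.yaml of kube-prometheus-stack contains the following entry points for PrometheusRules:

Default PrometheusRules entry point in kube-prometheus-stack. Because a separate file, rules_disk_alert.yaml, is used here, the additionalPrometheusRulesMap: {} line has to be commented out.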

## Deprecated way to provide custom recording or alerting rules to be deployed into the cluster.
##
# additionalPrometheusRules: []
#  - name: my-rule-file
#    groups:
#      - name: my_group
#        rules:
#        - record: my_record
#          expr: 100 * my_record

## Provide custom recording or alerting rules to be deployed into the cluster.
##
#additionalPrometheusRulesMap: {}
#  rule-name:
#    groups:
#    - name: my_group
#      rules:
#      - record: my_record
#        expr: 100 * my_record

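Custom monitoring rules can be added at this entry point; the chart deploys them as PrometheusRule resources. With a separate rules file, comment out the additionalPrometheusRulesMap: {} line. With a merged configuration, add the rules directly beneath the additionalPrometheusRulesMap: line in values.yaml (remember to remove the {}).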
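For the merged variant, the relevant part of values.yaml might look roughly like the sketch below. It simply uncomments the chart's own placeholder entry shown above; rule-name, my_group and my_record are the chart's sample names, not real rules, and would be replaced by your own.

```yaml
## Provide custom recording or alerting rules to be deployed into the cluster.
##
additionalPrometheusRulesMap:    # the trailing {} has been removed
  rule-name:                     # placeholder key taken from the chart comments
    groups:
    - name: my_group             # placeholder group name
      rules:
      - record: my_record        # placeholder recording rule
        expr: 100 * my_record
```

With this layout everything lives in one values file, so no extra -f argument is needed when upgrading the release.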

Prometheus rule `DiskUsage`

Configure rules_DiskUsage.yaml:

A Prometheus rule that alerts on DiskUsage (all disk-related alerts can be placed in a group named disk_alert)

additionalPrometheusRulesMap:
  rule-name:
    groups:
    - name: disk_alert
      rules:
      - alert: DiskUsage
        expr: (node_filesystem_size_bytes{mountpoint="/"}) - (node_filesystem_free_bytes{mountpoint="/"}) >= (node_filesystem_size_bytes{mountpoint="/"}) * 0.8
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Disk usage is too high (instance {{ $labels.instance }})"
          description: "Disk usage is too high (instance {{ $labels.instance }})."

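The rule above only watches usage of the / mount point. Covering other directories means adding another rule for each one, which gets tedious. The following rule improves on this:

Comprehensive host disk space check, covering instance, device and mount point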

- alert: "HostOutOfDiskSpace"
  expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 20 and ON (instance, device, mountpoint) node_filesystem_readonly == 0
  for: 2m
  labels:
    severity: "warning"
  annotations:
    summary: "Host out of disk space (instance {{ $labels.instance }})"
    description: "Disk is almost full (< 20% left)"
    value: "{{ $value }}"

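This covers every mounted, writable filesystem and fires when usage exceeds 80% (less than 20% of the space still available).

Upgrade the Prometheus release in the Kubernetes cluster to apply the alert configuration above:

Adding the extra Prometheus alerting rules with helm upgrade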
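The HostOutOfDiskSpace rule above is shown as a bare list item. As a sketch of how it could sit in a standalone values file, the fragment below nests it in the same additionalPrometheusRulesMap structure as the DiskUsage example. The rule-name key and the disk_alert group are reused from that example, and the file name in the comment is assumed to be the one passed to helm upgrade.

```yaml
# rules_disk_alert.yaml (sketch; file name assumed)
additionalPrometheusRulesMap:
  rule-name:                     # key reused from the DiskUsage example
    groups:
    - name: disk_alert           # group for all disk-related alerts
      rules:
      - alert: HostOutOfDiskSpace
        # Fires when less than 20% of a filesystem is available;
        # the second clause skips read-only filesystems.
        expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 20 and ON (instance, device, mountpoint) node_filesystem_readonly == 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Host out of disk space (instance {{ $labels.instance }})"
          description: "Disk is almost full (< 20% left)"
          value: "{{ $value }}"
```

Because the expression has no mountpoint selector, a single rule yields one alert series per instance, device and mount point instead of one hand-written rule per directory.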

helm upgrade kube-prometheus-stack-1681228346 prometheus-community/kube-prometheus-stack \
  --namespace prometheus --values kube-prometheus-stack.values -f rules_disk_alert.yaml

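Note

Using a separate rules_disk_alert.yaml means commenting out the corresponding additionalPrometheusRulesMap: {} line, so I find editing values.yaml directly more convenient (there is only one configuration file to maintain). That said, keeping separate files is also reasonable if you want to group different kinds of configuration. It mostly comes down to your operational habits.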

References

How are Prometheus alerts configured on Kubernetes with prometheus-community/prometheus
Helm / kube-prometheus-stack: Can I create rules for exporters in values.yaml?
Prometheus: Configuring Prometheus alert rules (the alert settings described here are accurate and work well in practice)
