Configuring Alertmanager in kube-prometheus-stack
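There are two kinds of Prometheus rules to be aware of first:
Prometheus recording rules: precomputed expressions whose results can be queried without re-evaluating the original expression each time
Prometheus alerting rules: written in PromQL, they evaluate one or more expressions and fire alerts based on the results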
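To make the two rule types concrete, here is a minimal sketch of a rules file containing one of each. The file name, group name, metric (http_requests_total), and rule names are all illustrative assumptions and do not come from this deployment.

```yaml
# rules.yml (hypothetical file name; all names below are illustrative)
groups:
  - name: example_rules
    rules:
      # Recording rule: precompute the per-job request rate so that
      # dashboards can query the stored series instead of the raw expression
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      # Alerting rule: fire once a scrape target has been down for 5 minutes
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```

The severity label set on an alerting rule is what Alertmanager matchers such as 'severity = critical' operate on.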
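prometheus.yml is the main configuration file for Prometheus monitoring. It does not define all the Prometheus rules itself; instead, it names the other files that contain the actual rules. Traditionally, alerting rules and recording rules are split into separate files.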
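As a sketch of how the main configuration file names the rule files, a standalone prometheus.yml might contain something like the following. The file paths and the Alertmanager address are assumptions for illustration; in kube-prometheus-stack this file is normally generated by the Prometheus Operator rather than written by hand.

```yaml
# prometheus.yml (sketch; paths and addresses are hypothetical)
global:
  evaluation_interval: 1m        # how often rules are evaluated
rule_files:
  - "rules/recording_rules.yml"  # recording rules kept in their own file
  - "rules/alerting_rules.yml"   # alerting rules kept in their own file
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]  # where fired alerts are sent
```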
Although in the prometheus server pod ...
Defining alertmanager.config
alertmanager.config holds the Alertmanager configuration, so you can define your own receivers, although it may be more convenient to apply the configuration directly:
## Configuration for alertmanager
## ref: https://prometheus.io/docs/alerting/alertmanager/
##
alertmanager:
  ...
  ## Alertmanager configuration directives
  ## ref: https://prometheus.io/docs/alerting/configuration/#configuration-file
  ##      https://prometheus.io/webtools/alerting/routing-tree-editor/
  ##
  config:
    global:
      resolve_timeout: 5m
    inhibit_rules:
      - source_matchers:
          - 'severity = critical'
        target_matchers:
          - 'severity =~ warning|info'
        equal:
          - 'namespace'
          - 'alertname'
      - source_matchers:
          - 'severity = warning'
        target_matchers:
          - 'severity = info'
        equal:
          - 'namespace'
          - 'alertname'
      - source_matchers:
          - 'alertname = InfoInhibitor'
        target_matchers:
          - 'severity = info'
        equal:
          - 'namespace'
    route:
      group_by: ['namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'cloud_atlas_alert'
      routes:
        - receiver: 'cloud_atlas_alert'
          matchers:
            - alertname =~ "InfoInhibitor|Watchdog"
    receivers:
      - name: cloud_atlas_alert
        webhook_configs:
          - url: http://192.168.6.115:8060/dingtalk/cloud_atlas_alert/send
    templates:
      - '/etc/alertmanager/config/*.tmpl'
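The main changes from the default configuration are:
In alertmanager.config, change receiver from null to the name of the target defined in the prometheus-webhook-dingtalk configuration, here cloud_atlas_alert
Add a webhook configuration (webhook_configs) for prometheus-webhook-dingtalk. Note that the URL path includes the target (cloud_atlas_alert), so the URL must have the form http://<prometheus-webhook-dingtalk server IP>:8060/dingtalk/<target>/send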
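Once the server running prometheus-webhook-dingtalk is deployed, DingTalk notifications (in Markdown format) arrive immediately, and they can @-mention specific users (by mobile number or employee ID).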
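For the receiving side, the target named in the webhook URL has to exist in the prometheus-webhook-dingtalk configuration. The following is a minimal sketch of that file, assuming the targets layout used in the project's example configuration; the access token and mobile number are placeholders, and only the mobile-number form of @-mention is shown.

```yaml
# config.yml for prometheus-webhook-dingtalk (sketch; values are placeholders)
targets:
  # Target name: must match the <target> segment of the Alertmanager
  # webhook URL, i.e. /dingtalk/cloud_atlas_alert/send
  cloud_atlas_alert:
    # DingTalk robot webhook; substitute your own access token
    url: https://oapi.dingtalk.com/robot/send?access_token=<your-access-token>
    mention:
      # Users to @-mention, identified by mobile number (placeholder)
      mobiles: ['<mobile-number>']
```

If the target name here and the path segment in webhook_configs differ, Alertmanager posts to a path that prometheus-webhook-dingtalk does not serve, so keeping the two in sync is the first thing to check when notifications do not arrive.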
References
How are Prometheus alerts configured on Kubernetes with prometheus-community/prometheus
[kube-prometheus-stack] Alertmanager does not update secret with custom configuration options #1998: suggests a CRD-based configuration approach (not yet verified)
How to overwrite alertmanager configuration in kube-prometheus-stack helm chart
[kube-prometheus-stack] Alertmanager does not update secret with custom configuration options #1998: the values.yaml in this issue is a useful reference