prometheus-webhook-dingtalk¶
timonwong / prometheus-webhook-dingtalk is a third-party Alertmanager webhook receiver recommended by the Prometheus project for sending alert notifications through DingTalk.
Running prometheus-webhook-dingtalk with systemd¶
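Note
On systems without a container runtime, you can download the binary directly and manage starting and stopping the service with a Systemd unit, which is also very convenient.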
The official prebuilt binaries can be downloaded from the GitHub Releases page of timonwong / prometheus-webhook-dingtalk, for example the AMD64 build prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz. Extract it and move it to /opt (the rest of the configuration assumes this location):
tar xfz prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz
mv prometheus-webhook-dingtalk-2.1.0.linux-amd64 /opt/prometheus-webhook-dingtalk
Edit /etc/systemd/system/prometheus-webhook-dingtalk.service:
[Unit]
Description=prometheus-webhook-dingtalk
# After= only orders startup; Wants= actually pulls in network-online.target
Wants=network-online.target
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/opt/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk --config.file=/opt/prometheus-webhook-dingtalk/config.yml

[Install]
WantedBy=multi-user.target
Take the config.yml and template.tmpl files from the prometheus-webhook-dingtalk templates and copy both into /opt/prometheus-webhook-dingtalk (make sure config.yml points to the correct template.tmpl location), then start the service:
systemctl daemon-reload
systemctl start prometheus-webhook-dingtalk
systemctl enable prometheus-webhook-dingtalk
ss -tnl | grep 8060
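The ss check only works on the host itself. To run the same check from another machine, here is a minimal Python sketch that attempts a plain TCP connection; port_is_listening is a hypothetical helper of my own, not part of the project, and a successful connect only proves that something accepts connections on the port, not that it is prometheus-webhook-dingtalk.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake with (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, unreachable or timed out: treat all as "not listening"
        return False
```

For example, port_is_listening("127.0.0.1", 8060) should return True once the service has started.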
Running with Docker¶
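Note
The original author of timonwong / prometheus-webhook-dingtalk no longer uses DingTalk (presumably having left Alibaba), so the project documentation is not well maintained and you have to work things out from clues in the project's issues. I tried ./contrib/k8s without success, and since I did not have time to dig into it, I used the simplest approach, Docker, to meet the project's immediate needs.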
docker run -d --restart always -p 8060:8060 -v $PWD/config.yml:/etc/prometheus-webhook-dingtalk/config.yml \
timonwong/prometheus-webhook-dingtalk --config.file=/etc/prometheus-webhook-dingtalk/config.yml \
--web.listen-address=0.0.0.0:8060 --web.enable-ui --web.enable-lifecycle
For the containerd runtime, replace the docker command with nerdctl:
nerdctl run -d --restart always -p 8060:8060 -v $PWD/config.yml:/etc/prometheus-webhook-dingtalk/config.yml \
timonwong/prometheus-webhook-dingtalk --config.file=/etc/prometheus-webhook-dingtalk/config.yml \
--web.listen-address=0.0.0.0:8060 --web.enable-ui --web.enable-lifecycle
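Note
Meaning of the parameters --web.listen-address=0.0.0.0:8060 --web.enable-ui --web.enable-lifecycle:
--web.listen-address=0.0.0.0:8060: listen on all network interfaces
--web.enable-ui: enable the web UI, which makes it easy to configure templates from a web page
--web.enable-lifecycle: allow the configuration to be reloaded with curl -XPOST http://localhost:8060/-/reload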
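The reload call can also be scripted, for example from a deployment job. This is a sketch assuming the service listens on localhost:8060 and was started with --web.enable-lifecycle; reload_config is a hypothetical helper name. urllib raises urllib.error.HTTPError if the endpoint answers with an error status, which is what I would expect when lifecycle is not enabled.

```python
import urllib.request


def reload_config(base_url: str = "http://localhost:8060", timeout: float = 5.0) -> int:
    """POST to <base_url>/-/reload and return the HTTP status code."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/-/reload",
        data=b"",  # empty body; a non-None body together with method="POST" sends a POST
        method="POST",
    )
    # non-2xx responses raise urllib.error.HTTPError instead of returning
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```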
Note
The --restart always parameter used here makes nerdctl stop ineffective. The workaround is to remove the container with nerdctl rm -f XXXX.
Here I use a config.yml copied from the example config.example.yml in the timonwong / prometheus-webhook-dingtalk project and then modified:
## Request timeout
# timeout: 5s

## Uncomment following line in order to write template from scratch (be careful!)
#no_builtin_template: true

## Customizable templates path
#templates:
#  - contrib/templates/legacy/template.tmpl

## You can also override default template using `default_message`
## The following example to use the 'legacy' template from v0.3.0
#default_message:
#  title: '{{ template "legacy.title" . }}'
#  text: '{{ template "legacy.content" . }}'

## Targets, previously was known as "profiles"
targets:
  cloud_atlas_alert:
    url: https://oapi.dingtalk.com/robot/send?access_token=zzzzzzzzzzzz
    mention:
      mobiles: ['136xxxxxxxxx']
  sre_team_1:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      mobiles: ['136xxxx8827', '139xxxx8325']
  sre_team_2:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      mobiles: ['156xxxx8827', '189xxxx8325']
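Note
Here I configured three simple targets (each roughly corresponds to an on-call group). When the Alertmanager of kube-prometheus-stack is configured with matching receivers, the webhook identifies the target from the path of the URL it is called with (/dingtalk/<target>/send), and the corresponding DingTalk robot is notified.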
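To check a target without waiting for a real alert, one option is to POST an Alertmanager-style webhook body directly to the target URL. The sketch below assumes the Alertmanager webhook payload format (version "4"); the helper names, alert labels, externalURL and fingerprint are hypothetical placeholders, and I have not verified which fields the default prometheus-webhook-dingtalk template strictly requires.

```python
import json
import urllib.request
from datetime import datetime, timezone
from urllib.parse import quote


def target_url(base_url: str, target: str) -> str:
    """URL for a target defined under `targets:` in config.yml: /dingtalk/<target>/send."""
    return f"{base_url.rstrip('/')}/dingtalk/{quote(target, safe='')}/send"


def smoke_payload(alertname: str = "DingTalkSmokeTest") -> dict:
    """Minimal Alertmanager-style webhook body (version "4") with one firing alert."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    labels = {"alertname": alertname, "severity": "info"}  # hypothetical labels
    annotations = {"summary": "prometheus-webhook-dingtalk smoke test"}
    return {
        "version": "4",
        "groupKey": f'{{}}:{{alertname="{alertname}"}}',
        "status": "firing",
        "receiver": "cloud_atlas_alert",
        "groupLabels": {"alertname": alertname},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "http://alertmanager.example:9093",  # placeholder
        "alerts": [
            {
                "status": "firing",
                "labels": labels,
                "annotations": annotations,
                "startsAt": now,
                "endsAt": "0001-01-01T00:00:00Z",  # zero time: no end time yet
                "generatorURL": "",
                "fingerprint": "0000000000000000",  # placeholder
            }
        ],
    }


def post_json(url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST `payload` as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Usage, with the address from the Alertmanager receiver used later on this page: post_json(target_url("http://192.168.6.115:8060", "cloud_atlas_alert"), smoke_payload()).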
Below I configure Alertmanager in kube-prometheus-stack and add a receiver that points to this webhook.
kube-prometheus-stack configuration¶
In kube-prometheus-stack, add the corresponding receivers in the Helm values.yaml to connect Alertmanager to prometheus-webhook-dingtalk:
## Configuration for alertmanager
## ref: https://prometheus.io/docs/alerting/alertmanager/
##
alertmanager:
  ...
  ## Alertmanager configuration directives
  ## ref: https://prometheus.io/docs/alerting/configuration/#configuration-file
  ##      https://prometheus.io/webtools/alerting/routing-tree-editor/
  ##
  config:
    global:
      resolve_timeout: 5m
    inhibit_rules:
      - source_matchers:
          - 'severity = critical'
        target_matchers:
          - 'severity =~ warning|info'
        equal:
          - 'namespace'
          - 'alertname'
      - source_matchers:
          - 'severity = warning'
        target_matchers:
          - 'severity = info'
        equal:
          - 'namespace'
          - 'alertname'
      - source_matchers:
          - 'alertname = InfoInhibitor'
        target_matchers:
          - 'severity = info'
        equal:
          - 'namespace'
    route:
      group_by: ['namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'cloud_atlas_alert'
      routes:
      - receiver: 'cloud_atlas_alert'
        matchers:
          - alertname =~ "InfoInhibitor|Watchdog"
    receivers:
    - name: cloud_atlas_alert
      webhook_configs:
      - url: http://192.168.6.115:8060/dingtalk/cloud_atlas_alert/send
    templates:
    - '/etc/alertmanager/config/*.tmpl'
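The inhibit_rules above mute an alert that matches target_matchers while another firing alert matches source_matchers and both carry the same values for every label listed under equal. The following is a simplified Python model of that check, written only to help read the rules: it supports just the = and =~ matchers used here, anchors regexes at both ends as Alertmanager does, and simply skips the target alert itself when looking for a source. It is not Alertmanager's implementation, and the function names are hypothetical.

```python
import re
from typing import Dict, List

# 'name = value' or 'name =~ regex', value optionally wrapped in double quotes
_MATCHER = re.compile(r'^\s*(\w+)\s*(=~|=)\s*"?(.*?)"?\s*$')


def matches(matcher: str, labels: Dict[str, str]) -> bool:
    m = _MATCHER.match(matcher)
    if m is None:
        raise ValueError(f"unsupported matcher: {matcher!r}")
    name, op, value = m.groups()
    actual = labels.get(name, "")  # a missing label is treated as empty
    if op == "=":
        return actual == value
    return re.fullmatch(value, actual) is not None  # =~ is anchored at both ends


def is_inhibited(target: Dict[str, str], firing: List[Dict[str, str]], rule: dict) -> bool:
    """True if `target` is muted by `rule`, given the currently firing alerts."""
    if not all(matches(m, target) for m in rule["target_matchers"]):
        return False
    for source in firing:
        if source is target:
            continue  # simplification: an alert never inhibits itself
        if all(matches(m, source) for m in rule["source_matchers"]) and all(
            source.get(name, "") == target.get(name, "") for name in rule["equal"]
        ):
            return True
    return False
```

With the first rule, for instance, a severity=warning alert is muted only while a severity=critical alert with the same namespace and alertname is firing.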
Then update the Prometheus configuration of the Kubernetes cluster:
helm upgrade kube-prometheus-stack-1681228346 prometheus-community/kube-prometheus-stack \
--namespace prometheus --values kube-prometheus-stack.values
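Once the updated alertmanager.yaml takes effect, the DingTalk group robot starts receiving notifications right away (for example the always-firing Watchdog alert, which the route above sends to cloud_atlas_alert).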
Prometheus web.external-url¶
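By default, the Graph link in notifications points to the in-cluster service name of Prometheus, http://kube-prometheus-stack-1680-prometheus.prometheus:9090/graph, which usually cannot be reached from outside the cluster (you could, of course, add a DNS record for this name on the internal network). A better solution is to pass the --web.external-url parameter to Prometheus (Alertmanager has a parameter of the same name). For the kube-prometheus-stack used in the z-k8s Kubernetes cluster deployment of Prometheus and Grafana with integrated GPU monitoring, see f663fb6: the value to change is prometheus.prometheusSpec.externalUrl (yes, this reminded me that for the kube-prometheus-stack TSDB retention setting I had likewise passed the runtime parameter --storage.tsdb.retention.time=180d to Prometheus, via prometheus.prometheusSpec.retention):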
## Deploy a Prometheus instance
##
prometheus:
  enabled: true
  ...
  ## Settings affecting prometheusSpec
  ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#prometheusspec
  ##
  prometheusSpec:
    ...
    ## External URL at which Prometheus will be reachable.
    ##
    externalUrl: "http://prometheus.cloud-atlas.io:9090"
    ...
    ## How long to retain metrics
    ##
    retention: 180d
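To see why externalUrl changes the Graph link: as far as I know, Prometheus builds each alert's generatorURL by appending a /graph query for the rule expression to its external URL, in the form <externalUrl>/graph?g0.expr=<escaped expr>&g0.tab=1. The sketch below only reproduces that assumed shape; generator_url and the example expression are hypothetical.

```python
from urllib.parse import quote_plus


def generator_url(external_url: str, expr: str) -> str:
    """Rebuild the assumed shape of the Graph link Prometheus attaches to an alert."""
    # quote_plus escapes everything except letters, digits and '_.-~',
    # and encodes spaces as '+', like Go's url.QueryEscape
    return f"{external_url.rstrip('/')}/graph?g0.expr={quote_plus(expr)}&g0.tab=1"
```

For example, generator_url("http://prometheus.cloud-atlas.io:9090", "up == 0") returns "http://prometheus.cloud-atlas.io:9090/graph?g0.expr=up+%3D%3D+0&g0.tab=1", a host that is reachable from outside instead of the in-cluster service name.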
Notifying multiple groups¶
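Following issue #198 ("How do I configure sending to multiple groups?"), you can send DingTalk messages to several groups by listing one webhook_configs entry per target: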
- name: 'rx'
  webhook_configs:
  - url: 'http://monitor-alertmanager-webhook-dingtalk:8060/dingtalk/r1/send'
  - url: 'http://monitor-alertmanager-webhook-dingtalk:8060/dingtalk/r2/send'
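With many groups, writing one URL per target by hand is easy to get wrong (a missing quote, a misspelled target name). A small Python sketch can generate the receiver instead; receiver is a hypothetical helper, and the host and the r1/r2 target names come from the example above. It prints the list on a single line of JSON, which YAML parsers should accept as a flow sequence, so the line can be used as the value of receivers:.

```python
import json
from typing import Dict, List


def receiver(name: str, base_url: str, targets: List[str]) -> Dict[str, object]:
    """One Alertmanager receiver whose webhook_configs notify every listed DingTalk target."""
    base = base_url.rstrip("/")
    return {
        "name": name,
        # Alertmanager sends the notification to each webhook_configs entry
        "webhook_configs": [{"url": f"{base}/dingtalk/{t}/send"} for t in targets],
    }


if __name__ == "__main__":
    rx = receiver("rx", "http://monitor-alertmanager-webhook-dingtalk:8060", ["r1", "r2"])
    print(json.dumps([rx]))
```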
Accessing the settings page¶
prometheus-webhook-dingtalk provides a configuration page built with Node.js (enabled by --web.enable-ui); for configuring templates, see prometheus-webhook-dingtalk FAQ: configuring templates.