Prometheus rule etcdDatabaseHighFragmentationRatio
Received the following alert about etcd, the distributed key-value store:
Alerts Firing
[WARNING] etcd database size in use is less than 50% of the actual allocated storage.
Description: etcd cluster "kube-etcd": database size in use on instance 172.21.44.238:2381 is 19.18% of the actual allocated disk space, please run defragmentation (e.g. etcdctl defrag) to retrieve the unused fragmented disk space.
Graph: 📈
Details:
alertname: etcdDatabaseHighFragmentationRatio
endpoint: http-metrics
instance: 172.21.44.238:2381
job: kube-etcd
namespace: kube-system
prometheus: default/kube-prometheus-stack-1681-prometheus
service: kube-prometheus-stack-1681-kube-etcd
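At first glance this alert was puzzling: if usage is below 50%, why alert at all? And why does it tell me to run defragmentation?
While customizing kube-prometheus-stack with helm, I unpacked the community kube-prometheus-stack chart. The file templates/prometheus/rules-1.14/etcd.yaml contains the following rule: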
{{- if not (.Values.defaultRules.disabled.etcdDatabaseHighFragmentationRatio | default false) }}
    - alert: etcdDatabaseHighFragmentationRatio
      annotations:
{{- if .Values.defaultRules.additionalRuleAnnotations }}
{{ toYaml .Values.defaultRules.additionalRuleAnnotations | indent 8 }}
{{- end }}
{{- if .Values.defaultRules.additionalRuleGroupAnnotations.etcd }}
{{ toYaml .Values.defaultRules.additionalRuleGroupAnnotations.etcd | indent 8 }}
{{- end }}
        description: 'etcd cluster "{{`{{`}} $labels.job {{`}}`}}": database size in use on instance {{`{{`}} $labels.instance {{`}}`}} is {{`{{`}} $value | humanizePercentage {{`}}`}} of the actual allocated disk space, please run defragmentation (e.g. etcdctl defrag) to retrieve the unused fragmented disk space.'
        runbook_url: https://etcd.io/docs/v3.5/op-guide/maintenance/#defragmentation
        summary: etcd database size in use is less than 50% of the actual allocated storage.
      expr: (last_over_time(etcd_mvcc_db_total_size_in_use_in_bytes[5m]) / last_over_time(etcd_mvcc_db_total_size_in_bytes[5m])) < 0.5 and etcd_mvcc_db_total_size_in_use_in_bytes > 104857600
      for: 10m
      labels:
        severity: warning
      {{- if or .Values.defaultRules.additionalRuleLabels .Values.defaultRules.additionalRuleGroupLabels.etcd }}
        {{- with .Values.defaultRules.additionalRuleLabels }}
          {{- toYaml . | nindent 8 }}
        {{- end }}
        {{- with .Values.defaultRules.additionalRuleGroupLabels.etcd }}
          {{- toYaml . | nindent 8 }}
        {{- end }}
      {{- end }}
{{- end }}
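The core of the rule is this Prometheus query:
(last_over_time(etcd_mvcc_db_total_size_in_use_in_bytes[5m]) / last_over_time(etcd_mvcc_db_total_size_in_bytes[5m])) < 0.5
It computes the ratio of in-use space to total allocated space for the etcd MVCC database (the etcd_mvcc_db metrics). The alert fires when that ratio is below 50% and the in-use size is also greater than 100 MiB (104857600 bytes), with the condition holding for 10 minutes. So the alert is not complaining that usage is low: a low ratio means most of the allocated database file is free space left behind by fragmentation, which defragmentation can reclaim.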
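As a rough illustration of the condition described above, the following Python sketch re-implements the rule's comparison for a single pair of samples. The function name and the sample byte counts are hypothetical. Only the 0.5 ratio, the 104857600-byte threshold, and the 19.18% figure come from the rule and the alert. The sketch does not model the for: 10m duration or the last_over_time lookback.

```python
# Hypothetical helper that mirrors the comparison in the rule's expr
# for one pair of samples. It is not part of Prometheus or etcd.

MAX_RATIO = 0.5                 # "< 0.5" in the rule
MIN_IN_USE_BYTES = 104857600    # "> 104857600" in the rule (100 MiB)


def fragmentation_alert(in_use_bytes: int, total_bytes: int) -> bool:
    """Return True when the rule's condition holds for these samples."""
    if total_bytes <= 0:
        # PromQL division would yield NaN or +Inf here, and neither
        # is less than 0.5, so the condition cannot match.
        return False
    ratio = in_use_bytes / total_bytes
    return ratio < MAX_RATIO and in_use_bytes > MIN_IN_USE_BYTES


if __name__ == "__main__":
    # Illustrative numbers only: a 1 GiB database file of which
    # 19.18% is in use (the ratio reported in the alert above).
    total = 2**30
    in_use = int(0.1918 * total)
    print(fragmentation_alert(in_use, total))
    print(f"reclaimable bytes: {total - in_use}")
```

The second clause matters for small clusters: a database with less than 100 MiB in use never triggers the alert, however fragmented its file is.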