Installing and Running Prometheus on Kubernetes
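Note
This article walks through the manual configuration steps for running Prometheus in a Kubernetes cluster to monitor that cluster. Combined with Running Grafana on a Kubernetes Cluster, it covers routine monitoring and fault analysis for Kubernetes. A later article, Deploying Prometheus and Grafana on a Kubernetes Cluster with Helm 3, automates deployment of the whole monitoring stack.
I previously wrote Installing and Running Prometheus on Kubernetes (ARM). This article builds on that work and repeats the deployment on the Kubernetes cluster (z-k8s) of the private cloud architecture.
Prometheus publishes an official Prometheus docker image on Docker Hub, which can be used for the installation.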
Prometheus Kubernetes Manifest Files
Download the manifest files that help with the installation from bibinwilson/kubernetes-prometheus:
git clone https://github.com/bibinwilson/kubernetes-prometheus
Creating the Namespace and ClusterRole
First create a Kubernetes namespace for all monitoring components, so that the Prometheus deployment objects do not end up in the default namespace:
kubectl create namespace monitoring
Prometheus uses the Kubernetes API to read all available metrics from nodes, Pods, Deployments and so on, so an RBAC policy with read access has to be created and bound to the monitoring namespace.
Create a clusterRole.yaml:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  # Ingress is served from networking.k8s.io on Kubernetes 1.19 and later;
  # the extensions group is kept for older clusters.
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: default
  namespace: monitoring
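The rules grant the verbs ["get", "list", "watch"] on nodes, services, endpoints, pods and ingresses, and the ClusterRoleBinding binds the role to the default ServiceAccount in the monitoring namespace.
Create the role with the following command: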
kubectl create -f clusterRole.yaml
Output on success:
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
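To confirm that the binding grants what the rules above describe, the permissions can be queried with kubectl auth can-i while impersonating the ServiceAccount. This is a minimal sketch, not part of the original walkthrough: it assumes the current kubectl user is allowed to impersonate ServiceAccounts (a cluster administrator normally is), and it skips the check when kubectl is not installed.

```shell
# ServiceAccount that the ClusterRoleBinding above names as its subject.
SA="system:serviceaccount:monitoring:default"

if command -v kubectl >/dev/null 2>&1; then
  # Each call prints "yes" or "no" for the impersonated ServiceAccount.
  # can-i exits non-zero on "no", so "|| true" keeps the loop going.
  for resource in nodes services endpoints pods; do
    printf 'list %s: ' "$resource"
    kubectl auth can-i list "$resource" --as="$SA" || true
  done
  # Non-resource URL from the last rule of the ClusterRole.
  printf 'get /metrics: '
  kubectl auth can-i get /metrics --as="$SA" || true
else
  echo "kubectl not found; skipping RBAC check"
fi
```

An answer of "no" on any line suggests that the ClusterRole or the ClusterRoleBinding was not created as shown.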
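Creating a ConfigMap to Expose the Prometheus Configuration
Configuration files:
- prometheus.yml: holds all Prometheus settings, including service discovery, storage and data retention.
- prometheus.rules: contains all Prometheus alerting rules.
Exposing the Prometheus configuration through a Kubernetes ConfigMap means the Prometheus image does not have to be rebuilt whenever configuration is added or removed. Updating the ConfigMap and restarting the Prometheus Pod is enough to apply the change.
config-map.yaml contains both of the files above: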
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  labels:
    name: prometheus-server-conf
  namespace: monitoring
data:
  prometheus.rules: |-
    groups:
    - name: devopscube demo alert
      rules:
      - alert: High Pod Memory
        expr: sum(container_memory_usage_bytes) > 1
        for: 1m
        labels:
          severity: slack
        annotations:
          summary: High Memory Usage
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    rule_files:
      - /etc/prometheus/prometheus.rules
    alerting:
      alertmanagers:
      - scheme: http
        static_configs:
        - targets:
          - "alertmanager.monitoring.svc:9093"

    scrape_configs:
      - job_name: 'node-exporter'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_endpoints_name]
          regex: 'node-exporter'
          action: keep

      - job_name: 'kubernetes-apiservers'

        kubernetes_sd_configs:
        - role: endpoints
        scheme: https

        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https

      - job_name: 'kubernetes-nodes'

        scheme: https

        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

        kubernetes_sd_configs:
        - role: node

        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics

      - job_name: 'kubernetes-pods'

        kubernetes_sd_configs:
        - role: pod

        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name

      - job_name: 'kube-state-metrics'
        static_configs:
          - targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']

      - job_name: 'kubernetes-cadvisor'

        scheme: https

        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

        kubernetes_sd_configs:
        - role: node

        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

      - job_name: 'kubernetes-service-endpoints'

        kubernetes_sd_configs:
        - role: endpoints

        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name
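The __address__ rewrite in the kubernetes-pods and kubernetes-service-endpoints jobs above is the least obvious rule in this file. Prometheus joins the values of the source labels with ";" and matches the whole string against the regex, so "host[:port];annotated-port" becomes "host:annotated-port". The sketch below imitates that single rule with sed to show the effect. The function name rewrite_address is hypothetical, and the sed expression is an approximation of the Prometheus regex (sed has no non-capturing groups, so the replacement uses the third group instead of the second).

```shell
# Emulate the relabel rule: join __address__ and the port annotation with ';',
# then rewrite "host[:port];newport" to "host:newport".
rewrite_address() {
  printf '%s;%s\n' "$1" "$2" |
    sed -E 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/'
}

rewrite_address "10.244.1.3:8080" "9102"   # prints 10.244.1.3:9102
rewrite_address "10.244.1.3" "9102"        # prints 10.244.1.3:9102
```

When the port annotation is absent, the regex does not match and Prometheus leaves __address__ untouched. The sketch does not model that case.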
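Scrape configuration explained:
- kubernetes-apiservers: collects all metrics from the API server.
- kubernetes-nodes: collects all Kubernetes node metrics.
- kubernetes-pods: scrapes a Pod when its metadata carries the prometheus.io/scrape and prometheus.io/port annotations.
- kubernetes-cadvisor: collects all cAdvisor metrics.
- kubernetes-service-endpoints: scrapes the endpoints of a Service when the Service metadata carries the prometheus.io/scrape and prometheus.io/port annotations.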
prometheus.rules contains all the alerting rules.
Now run the following command to create the ConfigMap:
kubectl create -f config-map.yaml
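For the kubernetes-pods job to pick up a workload, the Pod has to carry the annotations named in the relabel rules. The following sketch writes such a manifest to a local file for inspection. The Pod name, image and port number are hypothetical placeholders, not values from this deployment.

```shell
# Write a sample Pod manifest; name, image and port are placeholders.
cat > annotated-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: demo-app                      # hypothetical name
  annotations:
    prometheus.io/scrape: "true"      # matched by the keep rule (regex: true)
    prometheus.io/port: "8080"        # replaces the port in __address__
    prometheus.io/path: "/metrics"    # optional; copied to __metrics_path__
spec:
  containers:
  - name: demo-app
    image: example.com/demo-app:latest   # hypothetical image exposing /metrics
    ports:
    - containerPort: 8080
EOF

# Show the annotations that the kubernetes-pods job looks at.
grep 'prometheus.io/' annotated-pod.yaml
```

Annotation values must be strings, which is why "true" and "8080" are quoted. After replacing the placeholder image with a real one, the file could be submitted with kubectl apply -f annotated-pod.yaml.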
Creating the Prometheus Deployment
Create prometheus-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
  namespace: monitoring
  labels:
    app: prometheus-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-server
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      containers:
        - name: prometheus
          #image: prom/prometheus
          image: prom/prometheus-linux-arm64
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus/"
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: prometheus-config-volume
              mountPath: /etc/prometheus/
            - name: prometheus-storage-volume
              mountPath: /prometheus/
      nodeSelector:
        kubernetes.io/arch: arm64
      volumes:
        - name: prometheus-config-volume
          configMap:
            defaultMode: 420
            name: prometheus-server-conf

        - name: prometheus-storage-volume
          emptyDir: {}
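Warning
No Kubernetes persistent storage volume is configured here; that will be added later. Production environments must use persistent storage.
Note
My practice environment is the one from Deploying Kubernetes on ARM, so it needs the ARM build of the Prometheus image, prom/prometheus-linux-arm64. On x86, use prom/prometheus directly.
The Pod spec of the Deployment must set nodeSelector: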
spec:
  containers:
    - name: prometheus
      #image: prom/prometheus
      image: prom/prometheus-linux-arm64
      ...
  nodeSelector:
    kubernetes.io/arch: arm64
If you run a regular x86 environment, change the configuration above to:
spec:
  containers:
    - name: prometheus
      image: prom/prometheus
      #image: prom/prometheus-linux-arm64
      ...
  #nodeSelector:
  #  kubernetes.io/arch: arm64
Create the deployment:
kubectl create -f prometheus-deployment.yaml
Then check the result:
kubectl -n monitoring get pods -o wide
Output:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
prometheus-deployment-64d4b79f85-565jn 1/1 Running 0 24h 10.244.1.3 pi-worker1 <none> <none>
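Because configuration changes only take effect after the ConfigMap is updated and the Prometheus Pod restarted, a later edit to config-map.yaml could be rolled out roughly as follows. This is a sketch under stated assumptions rather than part of the original walkthrough: the run helper and the DRY_RUN variable are hypothetical conveniences, and kubectl apply and kubectl rollout restart are swapped in for the kubectl create used earlier, since create fails on an object that already exists.

```shell
# Print each command; execute it only when DRY_RUN is unset or empty.
run() {
  echo "+ $*"
  if [ -z "${DRY_RUN:-}" ]; then
    "$@"
  fi
}

reload_prometheus_config() {
  # Push the edited ConfigMap to the cluster.
  run kubectl apply -f config-map.yaml
  # Recreate the Pod so Prometheus reads the new files on startup.
  run kubectl -n monitoring rollout restart deployment/prometheus-deployment
}

# Dry run: only prints the two kubectl commands.
DRY_RUN=1 reload_prometheus_config
```

Calling reload_prometheus_config without DRY_RUN runs both commands against the current cluster. With the emptyDir volume used above, the restart also discards the metrics collected so far.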
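Setting Up Kube State Metrics
By default the cluster does not expose many metrics about the state of its API objects. Make sure Kube state metrics is deployed so that all Kubernetes API objects, such as deployments, pods, jobs and cronjobs, can be monitored. See Configuring Kube State Metrics on a Kubernetes Cluster.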