Installing and Running Prometheus in Kubernetes

Note

This article walks through the manual configuration steps to run Prometheus in a Kubernetes cluster for cluster monitoring. Combined with Running Grafana in a Kubernetes Cluster, it covers routine monitoring and troubleshooting of the cluster. Later, Deploying Prometheus and Grafana to a Kubernetes Cluster with Helm 3 automates the deployment of the whole monitoring stack.

I previously wrote Installing and Running Prometheus in Kubernetes (ARM); this article builds on that work and redeploys onto the private-cloud Kubernetes cluster (z-k8s).

Prometheus provides an official Prometheus docker image on Docker Hub that can be used for this installation.

Prometheus Kubernetes Manifest Files

Create the Namespace and ClusterRole

  • First create a Kubernetes namespace for all monitoring components, so that the Prometheus Kubernetes objects do not end up in the default namespace:

    kubectl create namespace monitoring
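
A quick check confirms the namespace exists (nothing here is specific to this setup beyond the namespace name):

    kubectl get namespace monitoring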
    

Prometheus uses the Kubernetes API to collect all available metrics for nodes, Pods, Deployments, and so on, so an RBAC policy with read access must be created and bound to the monitoring namespace.

  • Create a clusterRole.yaml:

run_prometheus_in_k8s/clusterRole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions          # on Kubernetes 1.22+ the Ingress resource lives in networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: default
  namespace: monitoring

The verbs: ["get", "list", "watch"] in the rules grant read access to nodes, services, pods, and ingresses, and the ClusterRoleBinding binds the role to the default ServiceAccount in the monitoring namespace.

  • Create the role with the following command:

    kubectl create -f clusterRole.yaml
    

On success it prints:

clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
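
Optionally, verify the binding with kubectl's built-in authorization check (the ServiceAccount below is the default one in monitoring, as bound above); both commands should answer yes:

    kubectl auth can-i list pods --as=system:serviceaccount:monitoring:default
    kubectl auth can-i get /metrics --as=system:serviceaccount:monitoring:default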

Create a Config Map to Externalize the Prometheus Configuration

  • Configuration files:

    • prometheus.yml holds the entire Prometheus configuration: global settings, service discovery, storage, data retention, and so on

    • prometheus.rules contains all Prometheus alerting rules

By exposing the Prometheus configuration through a Kubernetes Config Map, there is no need to rebuild the Prometheus image whenever configuration is added or removed; just update the config map and restart the Prometheus Pod for the changes to take effect.
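
For example, once the Deployment defined later in this article is running, a configuration change can be rolled out roughly like this (a sketch; the Deployment name matches the manifest below):

    kubectl apply -f config-map.yaml
    kubectl -n monitoring rollout restart deployment prometheus-deployment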

config-map.yaml contains both of the configuration files described above:

run_prometheus_in_k8s/config-map.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  labels:
    name: prometheus-server-conf
  namespace: monitoring
data:
  prometheus.rules: |-
    groups:
    - name: devopscube demo alert
      rules:
      - alert: High Pod Memory
        expr: sum(container_memory_usage_bytes) > 1
        for: 1m
        labels:
          severity: slack
        annotations:
          summary: High Memory Usage
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    rule_files:
      - /etc/prometheus/prometheus.rules
    alerting:
      alertmanagers:
      - scheme: http
        static_configs:
        - targets:
          - "alertmanager.monitoring.svc:9093"

    scrape_configs:
      - job_name: 'node-exporter'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_endpoints_name]
          regex: 'node-exporter'
          action: keep

      - job_name: 'kubernetes-apiservers'

        kubernetes_sd_configs:
        - role: endpoints
        scheme: https

        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https

      - job_name: 'kubernetes-nodes'

        scheme: https

        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

        kubernetes_sd_configs:
        - role: node

        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics

      - job_name: 'kubernetes-pods'

        kubernetes_sd_configs:
        - role: pod

        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name

      - job_name: 'kube-state-metrics'
        static_configs:
          - targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']

      - job_name: 'kubernetes-cadvisor'

        scheme: https

        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

        kubernetes_sd_configs:
        - role: node

        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

      - job_name: 'kubernetes-service-endpoints'

        kubernetes_sd_configs:
        - role: endpoints

        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name
  • Scrape configuration breakdown:

    • kubernetes-apiservers collects all metrics exposed by the API servers

    • kubernetes-nodes collects all Kubernetes node metrics

    • kubernetes-pods scrapes pods whose metadata carries the prometheus.io/scrape and prometheus.io/port annotations (see the example annotations after this list)

    • kubernetes-cadvisor collects all cAdvisor metrics

    • kubernetes-service-endpoints scrapes the endpoints of services whose metadata carries the prometheus.io/scrape and prometheus.io/port annotations

  • prometheus.rules contains all the alerting rules to be fired

  • Now create the Config Map with the following command:

    kubectl create -f config-map.yaml
    

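As mentioned in the list above, the kubernetes-pods and kubernetes-service-endpoints jobs only scrape objects that opt in through annotations. A minimal illustration of such annotations on a Pod (the port value is a placeholder, adjust it to the port your application exposes metrics on):

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"       # placeholder: your application's metrics port
    prometheus.io/path: "/metrics"   # optional, /metrics is the default path

The same annotations on a Service's metadata enable scraping by the kubernetes-service-endpoints job.
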
Create the Prometheus Deployment

  • Create prometheus-deployment.yaml:

run_prometheus_in_k8s/prometheus-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
  namespace: monitoring
  labels:
    app: prometheus-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-server
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      containers:
        - name: prometheus
          #image: prom/prometheus
          image: prom/prometheus-linux-arm64
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus/"
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: prometheus-config-volume
              mountPath: /etc/prometheus/
            - name: prometheus-storage-volume
              mountPath: /prometheus/
      nodeSelector:
        kubernetes.io/arch: arm64
      volumes:
        - name: prometheus-config-volume
          configMap:
            defaultMode: 420
            name: prometheus-server-conf

        - name: prometheus-storage-volume
          emptyDir: {}

Warning

No Kubernetes persistent volume is configured here; this will be improved later. Production environments must use persistent storage.
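
For reference, a minimal sketch of what a persistent setup could look like, assuming the cluster provides a usable StorageClass (the name standard below is a placeholder):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-storage
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard   # placeholder: use a StorageClass that exists in your cluster
  resources:
    requests:
      storage: 20Gi

The emptyDir volume in the Deployment would then be replaced with:

        - name: prometheus-storage-volume
          persistentVolumeClaim:
            claimName: prometheus-storage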

Note

Note that my original environment was an ARM-architecture Kubernetes deployment, so the ARM build of the Prometheus image, prom/prometheus-linux-arm64, is used. On x86, use prom/prometheus directly.

In the ARM case, the pod's Deployment must also configure a nodeSelector:

spec:
  containers:
    - name: prometheus
      #image: prom/prometheus
      image: prom/prometheus-linux-arm64
      ...
  nodeSelector:
    kubernetes.io/arch: arm64

If you are running a regular x86 environment, revise the above configuration to:

spec:
  containers:
    - name: prometheus
      image: prom/prometheus
      #image: prom/prometheus-linux-arm64
      ...
  #nodeSelector:
  #  kubernetes.io/arch: arm64
  • Create the deployment:

    kubectl create -f prometheus-deployment.yaml
    
  • When it completes, check the pods:

    kubectl -n monitoring get pods -o wide
    

The output shows:

NAME                                     READY   STATUS    RESTARTS   AGE   IP           NODE         NOMINATED NODE   READINESS GATES
prometheus-deployment-64d4b79f85-565jn   1/1     Running   0          24h   10.244.1.3   pi-worker1   <none>           <none>
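
Before a Service or Ingress is created, the Prometheus web UI can be checked quickly with a port-forward (port 9090 matches the containerPort above):

    kubectl -n monitoring port-forward deployment/prometheus-deployment 9090:9090

Then open http://localhost:9090 locally and check Status > Targets to confirm the scrape jobs are being discovered.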

Set Up Kube State Metrics

The Kube State Metrics service provides many metrics that are not available in a default cluster setup. Make sure Kube State Metrics is deployed so that all Kubernetes API objects, such as deployments, pods, jobs, and cronjobs, can be monitored. See Configuring Kube State Metrics in a Kubernetes Cluster.

Reference