Installing and Running Prometheus on Kubernetes
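Note
This article walks through the manual configuration steps for running Prometheus on a Kubernetes cluster to monitor that cluster. Combined with "Running Grafana on a Kubernetes Cluster", it covers routine monitoring and failure analysis for the cluster. A follow-up, "Deploying Prometheus and Grafana on a Kubernetes Cluster with Helm 3", automates deployment of the whole monitoring stack.
Prometheus publishes an official Prometheus docker image on Docker Hub, which is used for this installation.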
Prometheus Kubernetes Manifest Files
Download the manifest files used in this installation from bibinwilson/kubernetes-prometheus:
git clone https://github.com/bibinwilson/kubernetes-prometheus
Create the Namespace and ClusterRole
First create a Kubernetes namespace for all monitoring components, so that the Prometheus deployment objects do not end up in the default namespace:
kubectl create namespace monitoring
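Prometheus uses the Kubernetes API to read the available metrics for nodes, Pods, Deployments, and so on. It therefore needs an RBAC policy with read access, bound to the default service account in the monitoring namespace.
- Create a clusterRole.yaml: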
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: default
  namespace: monitoring
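The rules use verbs: ["get", "list", "watch"] to grant read-only access to nodes, services, endpoints, pods, and ingresses. The ClusterRoleBinding then binds the role to the default service account in the monitoring namespace.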
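As a rough illustration of what these rules permit, the following minimal sketch models the resource rules from clusterRole.yaml as plain Python data. The is_allowed helper is hypothetical and deliberately simplified: it ignores "*" wildcards, resourceNames, and the nonResourceURLs rule, so it is not how the API server evaluates RBAC.

```python
# Resource rules copied from clusterRole.yaml (the nonResourceURLs rule is left out).
RULES = [
    {
        "apiGroups": [""],
        "resources": ["nodes", "nodes/proxy", "services", "endpoints", "pods"],
        "verbs": ["get", "list", "watch"],
    },
    {
        "apiGroups": ["extensions"],
        "resources": ["ingresses"],
        "verbs": ["get", "list", "watch"],
    },
]


def is_allowed(rules, api_group, resource, verb):
    """Return True if any rule grants `verb` on `resource` in `api_group`.

    Hypothetical, simplified model: RBAC rules are additive, so one
    matching rule is enough. Wildcards are not handled.
    """
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in rules
    )


print(is_allowed(RULES, "", "pods", "list"))      # True: read access is granted
print(is_allowed(RULES, "", "pods", "delete"))    # False: no write verbs are granted
print(is_allowed(RULES, "extensions", "ingresses", "watch"))  # True
```

On a live cluster, the authoritative check is the API server itself, for example kubectl auth can-i list pods --as=system:serviceaccount:monitoring:default once the role has been created.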
Create the role with the following command:
kubectl create -f clusterRole.yaml
The command reports success:
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
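Create a Config Map to Externalize the Prometheus Configuration
- Configuration files:
  - prometheus.yml holds all Prometheus configuration, including service discovery, storage, and data retention
  - prometheus.rules contains all Prometheus alerting rules
Because the Prometheus configuration is exposed through a Kubernetes Config Map, you do not need to rebuild the Prometheus image when you add or remove configuration. You only update the Config Map and restart the Prometheus Pod for the change to take effect.
The config-map.yaml file contains both of the configuration files above: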
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  labels:
    name: prometheus-server-conf
  namespace: monitoring
data:
  prometheus.rules: |-
    groups:
    - name: devopscube demo alert
      rules:
      - alert: High Pod Memory
        expr: sum(container_memory_usage_bytes) > 1
        for: 1m
        labels:
          severity: slack
        annotations:
          summary: High Memory Usage
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    rule_files:
      - /etc/prometheus/prometheus.rules
    alerting:
      alertmanagers:
      - scheme: http
        static_configs:
        - targets:
          - "alertmanager.monitoring.svc:9093"
    scrape_configs:
      - job_name: 'node-exporter'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_endpoints_name]
          regex: 'node-exporter'
          action: keep
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https
      - job_name: 'kubernetes-nodes'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
      - job_name: 'kube-state-metrics'
        static_configs:
          - targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
      - job_name: 'kubernetes-cadvisor'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name
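The scrape jobs explained:
- kubernetes-apiservers collects all metrics from the API server
- kubernetes-nodes collects all Kubernetes node metrics
- kubernetes-pods scrapes a pod only if its metadata carries the prometheus.io/scrape and prometheus.io/port annotations
- kubernetes-cadvisor collects all cAdvisor metrics
- kubernetes-service-endpoints scrapes the endpoints of a service only if the service metadata carries the prometheus.io/scrape and prometheus.io/port annotations
prometheus.rules contains all the rules for sending alerts.
Now run the following command to create the Config Map: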
kubectl create -f config-map.yaml
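The relabel_configs entries in the kubernetes-pods and kubernetes-nodes jobs are easier to follow with concrete values. The sketch below imitates, in simplified form, how the keep and replace actions behave: the source label values are joined with ";", the regex must match the whole joined string, and the replacement expands capture groups. The helper names and the pod's port and annotation values are hypothetical. Python's re module stands in for the RE2 engine Prometheus uses, so treat this as an approximation, not Prometheus's implementation.

```python
import re


def _joined(labels, source_labels, separator=";"):
    # Source label values are concatenated with the separator;
    # a missing label contributes an empty string.
    return separator.join(labels.get(name, "") for name in source_labels)


def relabel_keep(labels, source_labels, regex):
    """Sketch of `action: keep`: the target survives only on a full regex match."""
    return re.fullmatch(regex, _joined(labels, source_labels)) is not None


def relabel_replace(labels, source_labels, regex, replacement, target_label):
    """Sketch of `action: replace`: on a full match, set target_label."""
    match = re.fullmatch(regex, _joined(labels, source_labels))
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    # Translate Prometheus-style $1 / ${1} references into Python's \g<1> form.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    updated = dict(labels)
    updated[target_label] = match.expand(template)
    return updated


# Hypothetical pod: ports and annotation values are made up for illustration.
pod = {
    "__address__": "10.244.1.3:8080",
    "__meta_kubernetes_pod_annotation_prometheus_io_scrape": "true",
    "__meta_kubernetes_pod_annotation_prometheus_io_port": "9102",
}

# kubernetes-pods: keep only pods annotated with prometheus.io/scrape: "true".
kept = relabel_keep(
    pod, ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"], "true"
)

# kubernetes-pods: point __address__ at the annotated port.
pod = relabel_replace(
    pod,
    ["__address__", "__meta_kubernetes_pod_annotation_prometheus_io_port"],
    r"([^:]+)(?::\d+)?;(\d+)",
    "$1:$2",
    "__address__",
)

# kubernetes-nodes: build the API server proxy path from the node name.
node = relabel_replace(
    {"__meta_kubernetes_node_name": "pi-worker1"},
    ["__meta_kubernetes_node_name"],
    r"(.+)",
    "/api/v1/nodes/${1}/proxy/metrics",
    "__metrics_path__",
)

print(kept)                      # True
print(pod["__address__"])        # 10.244.1.3:9102
print(node["__metrics_path__"])  # /api/v1/nodes/pi-worker1/proxy/metrics
```

If the prometheus.io/port annotation is absent, the joined string ends in ";" with no digits, the regex does not match, and __address__ keeps its discovered value.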
Create the Prometheus Deployment
- Create prometheus-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
  namespace: monitoring
  labels:
    app: prometheus-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-server
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      containers:
        - name: prometheus
          #image: prom/prometheus
          image: prom/prometheus-linux-arm64
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus/"
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: prometheus-config-volume
              mountPath: /etc/prometheus/
            - name: prometheus-storage-volume
              mountPath: /prometheus/
      nodeSelector:
        kubernetes.io/arch: arm64
      volumes:
        - name: prometheus-config-volume
          configMap:
            defaultMode: 420
            name: prometheus-server-conf
        - name: prometheus-storage-volume
          emptyDir: {}
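Warning
This Deployment stores data in an emptyDir volume and does not configure persistent storage (see "Kubernetes Persistent Volumes"); that will be added later. A production environment must use persistent storage.
Note
My lab environment is the cluster from "Deploying Kubernetes on ARM", so it uses the ARM build of the Prometheus image, prom/prometheus-linux-arm64. On x86, use prom/prometheus directly.
In this setup, the pod template in the Deployment must also set nodeSelector so that the pod is scheduled onto an arm64 node: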
spec:
  containers:
    - name: prometheus
      #image: prom/prometheus
      image: prom/prometheus-linux-arm64
      ...
  nodeSelector:
    kubernetes.io/arch: arm64
If you are running a regular x86 environment, change the configuration above to:
spec:
  containers:
    - name: prometheus
      image: prom/prometheus
      #image: prom/prometheus-linux-arm64
      ...
  #nodeSelector:
  #  kubernetes.io/arch: arm64
Create the deployment:
kubectl create -f prometheus-deployment.yaml
Once it completes, check the pod:
kubectl -n monitoring get pods -o wide
The output looks like this:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
prometheus-deployment-64d4b79f85-565jn 1/1 Running 0 24h 10.244.1.3 pi-worker1 <none> <none>
Set Up Kube State Metrics
The kube-state-metrics service provides many metrics that are not available by default. Deploy it so that all Kubernetes API objects, such as deployments, pods, jobs, and cronjobs, can be monitored. See "Configuring Kube State Metrics on a Kubernetes Cluster".