17.10.2 TLS Encryption and Certificate Management
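In the Prometheus stack, TLS mainly needs to cover the links to and from Prometheus Server, Alertmanager, Pushgateway, the Exporters, and Grafana, so that metric transport, alert notifications, and web interfaces keep their confidentiality and integrity. Before rolling it out, decide on the issuing model (self-signed CA or enterprise CA), the scope of certificate use (server certificates, client certificates, mutual TLS), and the policy for managing the certificate lifecycle.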
[Figure: schematic of the TLS links and certificate verification]
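Key points for the certificate hierarchy and planning:
- Issue a separate certificate for each service instance or service domain name; sharing one certificate widens the impact of a key compromise and makes scaling and revocation harder
- Prefer an internal CA or enterprise PKI so that issuance and revocation are managed in one place
- Enable mutual TLS (mTLS) for Prometheus and Alertmanager so that server and client each verify the other's identity
- Keep certificate lifetimes short, and pair them with automated renewal and smooth rotation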
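The renewal half of the last point can be automated with a periodic check. The sketch below is POSIX shell and assumes GNU date (for date -d); the function names cert_days_left and needs_renewal and the 30-day threshold in the usage comment are illustrative, not part of any Prometheus component. The renewal decision uses openssl x509 -checkend, which needs no date parsing at all.

```shell
#!/bin/sh
# Sketch: certificate expiry checks for automated renewal.
# Assumptions: GNU date (date -d); function names and thresholds are illustrative.
set -eu

# Print the whole number of days left before the certificate expires.
cert_days_left() {
  local cert="$1" end_date end_epoch now_epoch
  # openssl prints e.g. "notAfter=Jun  1 12:00:00 2026 GMT"; keep the date part.
  end_date=$(openssl x509 -in "$cert" -noout -enddate | cut -d= -f2)
  [ -n "$end_date" ] || return 1   # unreadable certificate
  end_epoch=$(date -d "$end_date" +%s)
  now_epoch=$(date +%s)
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Succeed (exit 0) if the certificate expires within the given number of days.
needs_renewal() {
  local cert="$1" days="$2"
  # -checkend N exits non-zero when the certificate expires within N seconds.
  ! openssl x509 -in "$cert" -noout -checkend $(( days * 86400 )) >/dev/null
}

# usage:
#   cert_days_left /etc/prometheus/certs/prometheus.crt
#   needs_renewal /etc/prometheus/certs/prometheus.crt 30 && echo "renew now"
```

To expose the value to Prometheus, a cron job could write a line such as tls_cert_expiry_days{cert="prometheus"} 42 (a hypothetical metric name) into a .prom file in the directory passed to Node Exporter's --collector.textfile.directory flag.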
Installation and certificate generation example (OpenSSL, self-signed CA):
# 1) Prepare the directory
mkdir -p /etc/prometheus/certs && cd /etc/prometheus/certs
# 2) Generate the CA private key and certificate
openssl genrsa -out ca.key 4096
openssl req -x509 -new -nodes -key ca.key -sha256 -days 3650 \
-subj "/C=CN/ST=Beijing/L=Beijing/O=Ops/OU=CA/CN=ops-ca" \
-out ca.crt
# 3) Generate the Prometheus server key and certificate signing request (CSR)
openssl genrsa -out prometheus.key 2048
openssl req -new -key prometheus.key \
-subj "/C=CN/ST=BJ/L=BJ/O=Ops/OU=Prometheus/CN=prometheus.local" \
-out prometheus.csr
# 4) Write an extension file that contains the SAN
cat > prometheus.ext <<'EOF'
subjectAltName=DNS:prometheus.local,IP:10.0.0.10
extendedKeyUsage=serverAuth
EOF
# 5) Sign the server certificate with the CA
openssl x509 -req -in prometheus.csr -CA ca.crt -CAkey ca.key \
-CAcreateserial -out prometheus.crt -days 365 -sha256 -extfile prometheus.ext
# 6) Generate the Prometheus client certificate (used for mTLS when scraping Exporters)
openssl genrsa -out prometheus-client.key 2048
openssl req -new -key prometheus-client.key \
-subj "/C=CN/ST=BJ/L=BJ/O=Ops/OU=Prometheus/CN=prom-client" \
-out prometheus-client.csr
cat > prometheus-client.ext <<'EOF'
extendedKeyUsage=clientAuth
EOF
openssl x509 -req -in prometheus-client.csr -CA ca.crt -CAkey ca.key \
-CAcreateserial -out prometheus-client.crt -days 365 -sha256 -extfile prometheus-client.ext
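Notes on the commands:
- -subj: the certificate subject; the CN should match the name clients use, but Go-based components (Prometheus, Alertmanager, the Exporters) no longer fall back to the CN for hostname verification since Go 1.15, so only the SAN counts
- subjectAltName: the SAN field; it must list every DNS name and IP address that clients actually use to connect
- extendedKeyUsage: restricts what the certificate may be used for; issue server certificates with serverAuth and client certificates with clientAuth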
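To see the effect of extendedKeyUsage on your own OpenSSL build, the following throwaway sketch signs one serverAuth and one clientAuth certificate with a temporary CA (the names demo-ca, demo-server and demo-client are hypothetical) and asks openssl verify -purpose whether each certificate is acceptable in a given role. A certificate that carries only clientAuth should be rejected in the server role, which mirrors the check a TLS client such as Prometheus applies to a server certificate.

```shell
#!/bin/sh
# Sketch: demonstrate EKU separation with a throwaway CA (all names hypothetical).
set -eu

cd "$(mktemp -d)"

# Temporary CA, same shape as step 2 but with a shorter key and lifetime.
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out ca.crt \
  -days 1 -sha256 -subj "/CN=demo-ca" 2>/dev/null

# sign <name> <extfile line>...: new key + CSR, signed with the given extensions.
sign() {
  local name="$1"
  shift
  printf '%s\n' "$@" > "$name.ext"
  openssl req -new -newkey rsa:2048 -nodes -keyout "$name.key" \
    -subj "/CN=$name" -out "$name.csr" 2>/dev/null
  openssl x509 -req -in "$name.csr" -CA ca.crt -CAkey ca.key -CAcreateserial \
    -out "$name.crt" -days 1 -sha256 -extfile "$name.ext" 2>/dev/null
}

sign demo-server 'subjectAltName=DNS:prometheus.local,IP:10.0.0.10' \
  'extendedKeyUsage=serverAuth'
sign demo-client 'extendedKeyUsage=clientAuth'

# check <purpose> <cert>: may this certificate be used in the given role?
check() {
  if openssl verify -purpose "$1" -CAfile ca.crt "$2" >/dev/null 2>&1; then
    echo "$2 as $1: OK"
  else
    echo "$2 as $1: REJECTED"
  fi
}

check sslserver demo-server.crt   # expected: OK
check sslclient demo-client.crt   # expected: OK
check sslserver demo-client.crt   # expected: REJECTED (clientAuth only)
```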
Enabling HTTPS on Prometheus Server (web-config file):
# /etc/prometheus/web-config.yml
tls_server_config:
  cert_file: /etc/prometheus/certs/prometheus.crt
  key_file: /etc/prometheus/certs/prometheus.key
  client_ca_file: /etc/prometheus/certs/ca.crt
  # Every client (browser, Grafana, federation, curl) must now present a certificate signed by ca.crt
  client_auth_type: RequireAndVerifyClientCert
Prometheus startup parameters:
# /usr/local/bin/prometheus
/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--web.config.file=/etc/prometheus/web-config.yml \
--web.listen-address=0.0.0.0:9090
Enabling mTLS when Prometheus scrapes an Exporter:
# /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: 'node'
    scheme: https
    metrics_path: /metrics
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      cert_file: /etc/prometheus/certs/prometheus-client.crt
      key_file: /etc/prometheus/certs/prometheus-client.key
      # The target is an IP address; server_name sets SNI and the name checked against the Exporter's SAN
      server_name: nodeexporter.local
      insecure_skip_verify: false
    static_configs:
      - targets: ['10.0.0.20:9100']
Enabling TLS on Node Exporter:
# Generate the Node Exporter certificate (as for Prometheus, with CN/SAN nodeexporter.local)
# Generation steps omitted; place the files in /etc/node_exporter/certs
/usr/local/bin/node_exporter \
--web.config.file=/etc/node_exporter/web-config.yml
# /etc/node_exporter/web-config.yml
tls_server_config:
  cert_file: /etc/node_exporter/certs/nodeexporter.crt
  key_file: /etc/node_exporter/certs/nodeexporter.key
  client_ca_file: /etc/node_exporter/certs/ca.crt
  client_auth_type: RequireAndVerifyClientCert
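The omitted generation steps are steps 3 to 5 with different names. When every instance gets its own certificate, a small wrapper avoids copy-paste mistakes; the sketch below assumes it runs on the CA host in a directory that holds ca.crt and ca.key, and the function name issue_cert and its argument order are hypothetical.

```shell
#!/bin/sh
# Sketch: steps 3-5 as one function (name and arguments are hypothetical).
# Run in the CA working directory that holds ca.crt and ca.key.
set -eu

# issue_cert <name> <CN> <SAN or ""> <serverAuth|clientAuth> [days]
issue_cert() {
  local name="$1" cn="$2" san="$3" eku="$4" days="${5:-365}"
  # umask 077 in a subshell: the key file is created readable by its owner only.
  ( umask 077; openssl genrsa -out "$name.key" 2048 )
  openssl req -new -key "$name.key" -subj "/O=Ops/CN=$cn" -out "$name.csr"
  # Extension file: SAN is required for server certificates, optional for clients.
  if [ -n "$san" ]; then
    printf 'subjectAltName=%s\nextendedKeyUsage=%s\n' "$san" "$eku" > "$name.ext"
  else
    printf 'extendedKeyUsage=%s\n' "$eku" > "$name.ext"
  fi
  openssl x509 -req -in "$name.csr" -CA ca.crt -CAkey ca.key -CAcreateserial \
    -out "$name.crt" -days "$days" -sha256 -extfile "$name.ext"
  # With set -e, stop here if the new certificate does not chain to the CA.
  openssl verify -CAfile ca.crt "$name.crt"
}

# Node Exporter server certificate; the IP SAN also lets tools that connect by
# address (e.g. curl https://10.0.0.20:9100) verify it.
# issue_cert nodeexporter nodeexporter.local "DNS:nodeexporter.local,IP:10.0.0.20" serverAuth
```

Afterwards copy nodeexporter.crt, nodeexporter.key and ca.crt (not ca.key) to /etc/node_exporter/certs.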
Enabling HTTPS on Alertmanager:
# /etc/alertmanager/web-config.yml
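tls_server_config:
  cert_file: /etc/alertmanager/certs/alertmanager.crt
  key_file: /etc/alertmanager/certs/alertmanager.key
  client_ca_file: /etc/alertmanager/certs/ca.crt
  # RequestClientCert only asks for a client certificate and does not verify it; mTLS needs RequireAndVerifyClientCert
  client_auth_type: RequireAndVerifyClientCert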
/usr/local/bin/alertmanager \
--config.file=/etc/alertmanager/alertmanager.yml \
--web.config.file=/etc/alertmanager/web-config.yml
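Because the certificates come from a private CA, Prometheus also needs TLS settings for the link to Alertmanager: ca_file to trust the Alertmanager server certificate and, when Alertmanager requires client certificates, the client key pair. The sketch writes an alerting section with a heredoc, in the same style as the extension files; the address alertmanager.local:9093 (which must appear in the Alertmanager certificate's SAN) and the output file name are hypothetical, and the fragment has to be merged into prometheus.yml by hand.

```shell
#!/bin/sh
# Sketch: Prometheus-side TLS settings for sending alerts to Alertmanager.
# alertmanager.local:9093 and the output file name are hypothetical.
set -eu

cat > alerting-tls.fragment.yml <<'EOF'
alerting:
  alertmanagers:
    - scheme: https
      tls_config:
        ca_file: /etc/prometheus/certs/ca.crt
        cert_file: /etc/prometheus/certs/prometheus-client.crt
        key_file: /etc/prometheus/certs/prometheus-client.key
      static_configs:
        - targets: ['alertmanager.local:9093']
EOF
```

After merging, promtool check config /etc/prometheus/prometheus.yml can validate the result before Prometheus is reloaded.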
Setting certificate and key permissions:
chown -R prometheus:prometheus /etc/prometheus/certs
chmod 700 /etc/prometheus/certs
chmod 600 /etc/prometheus/certs/*.key
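Notes:
- chmod 600 ensures that only the service user can read the private keys
- Keeping certificates in their own directory, separate from the configuration files, simplifies auditing and rotation
- ca.key is the CA signing key; once certificates are issued, keep it on the CA host or in the PKI rather than on the Prometheus server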
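These permission rules can be re-checked periodically, for instance from a cron job. A sketch, assuming GNU stat (-c); the function name audit_cert_dir is hypothetical, and reporting a CA private key found in the directory reflects the assumption that the CA key belongs on the CA host.

```shell
#!/bin/sh
# Sketch: audit a certificate directory (GNU stat assumed; name hypothetical).
set -eu

# audit_cert_dir <dir> <owner>: print every finding, return non-zero if any.
audit_cert_dir() {
  local dir="$1" owner="$2" rc=0 f mode
  if [ "$(stat -c %a "$dir")" != "700" ]; then
    echo "BAD DIR MODE: $dir (want 700)"; rc=1
  fi
  for f in "$dir"/*.key; do
    [ -e "$f" ] || continue   # no match: the glob stays literal
    mode=$(stat -c %a "$f")
    case "$mode" in
      600|400) ;;
      *) echo "BAD MODE $mode: $f (want 600 or 400)"; rc=1 ;;
    esac
    if [ "$(stat -c %U "$f")" != "$owner" ]; then
      echo "BAD OWNER: $f (want $owner)"; rc=1
    fi
  done
  # The CA private key can sign new certificates; it should not live here.
  if [ -e "$dir/ca.key" ]; then
    echo "CA KEY PRESENT: $dir/ca.key (keep it on the CA host)"; rc=1
  fi
  return "$rc"
}

# usage: audit_cert_dir /etc/prometheus/certs prometheus
```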
Certificate verification and connectivity checks:
# Verify the certificate chain
openssl verify -CAfile /etc/prometheus/certs/ca.crt /etc/prometheus/certs/prometheus.crt
# Use curl to verify HTTPS and mTLS (no -k: it would skip server certificate verification and hide SAN or CA problems)
curl -v --cacert /etc/prometheus/certs/ca.crt \
--cert /etc/prometheus/certs/prometheus-client.crt \
--key /etc/prometheus/certs/prometheus-client.key \
https://prometheus.local:9090/metrics
Expected results:
- openssl verify prints /etc/prometheus/certs/prometheus.crt: OK
- curl returns Prometheus's own metrics in text format
Troubleshooting checklist and diagnostic commands:
- Certificate mismatch or expiry:
openssl x509 -in /etc/prometheus/certs/prometheus.crt -noout -dates -subject -issuer
- Handshake failure caused by a missing SAN:
openssl s_client -connect prometheus.local:9090 -servername prometheus.local -showcerts
- mTLS failure because the client did not present a certificate:
journalctl -u prometheus -n 50 | grep -i tls
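Two further checks cover other frequent causes of handshake failures. san_has_dns reads the SAN extension directly, because openssl x509 -checkhost falls back to the subject CN when a certificate has no DNS SAN, while Go-based components such as Prometheus and the Exporters do not. cert_key_match compares the public key in the certificate with the one derived from the private key. Both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: SAN and key/certificate consistency checks (function names hypothetical).
set -eu

# san_has_dns <cert> <name>: succeed only if the SAN lists exactly DNS:<name>.
san_has_dns() {
  openssl x509 -in "$1" -noout -text \
    | grep -A1 'X509v3 Subject Alternative Name' | tail -n 1 \
    | tr ',' '\n' | sed 's/^ *//' | grep -qxF "DNS:$2"
}

# cert_key_match <cert> <key>: succeed if both carry the same public key.
cert_key_match() {
  [ "$(openssl x509 -in "$1" -noout -pubkey)" = "$(openssl pkey -in "$2" -pubout)" ]
}

# usage:
#   san_has_dns /etc/prometheus/certs/prometheus.crt prometheus.local || echo "SAN missing"
#   cert_key_match /etc/prometheus/certs/prometheus.crt /etc/prometheus/certs/prometheus.key || echo "key mismatch"
```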
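Common errors explained:
- x509: certificate is not valid for any names / x509: certificate is valid for ..., not ...: the SAN does not contain the domain name or IP used to connect
- x509: certificate relies on legacy Common Name field, ...: the certificate has no SAN and only the CN matches; Go 1.15 and later no longer accept this
- x509: certificate signed by unknown authority: the CA in ca_file did not sign the peer's certificate
- tls: bad certificate: the client certificate is not trusted by the server's CA, or no client certificate was provided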
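Operational recommendations:
- Establish a process that covers certificate request, approval, issuance, deployment, renewal, and revocation
- Include certificates and private keys in security audits and access logging
- Distribute certificates with automation scripts and configuration management tools (Ansible/Helm)
- Regularly rehearse the contingency plans for certificate expiry and revocation so that the alerting path stays reliable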
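For the distribution and rotation scripts mentioned above, one common pattern keeps each certificate generation in its own directory and switches a current symlink with a single rename, so a process never reads a new certificate together with an old key. The layout (releases/<timestamp>, current) and the function name promote_certs are hypothetical, web-config.yml would have to reference .../current/<name>.crt, and mv -T is GNU-specific. The exporter-toolkit documentation states that the web config file is re-read on every request, so the server side of Prometheus and the Exporters should pick up the new files without a restart; verify this against the version you run.

```shell
#!/bin/sh
# Sketch: atomic certificate rotation through a symlink swap.
# Hypothetical layout: <base>/releases/<timestamp>/*.crt|*.key and
# <base>/current -> releases/<timestamp>. chown to the service user is left out.
set -eu

# promote_certs <base dir> <dir with the renewed .crt and .key files>
promote_certs() {
  local base="$1" src="$2" stamp
  stamp=$(date +%Y%m%d%H%M%S)
  mkdir -p "$base/releases/$stamp"
  cp "$src"/*.crt "$src"/*.key "$base/releases/$stamp/"
  chmod 600 "$base/releases/$stamp"/*.key
  # Build the new link next to the old one, then rename it over the old one;
  # rename(2) within one filesystem replaces the link atomically.
  # The relative target keeps the link valid if <base> is moved.
  rm -f "$base/current.tmp"
  ln -s "releases/$stamp" "$base/current.tmp"
  mv -T "$base/current.tmp" "$base/current"
}

# usage: promote_certs /etc/prometheus/certs /tmp/renewed-certs
```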
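Exercises:
1. Issue certificates for Prometheus and Node Exporter, enable mTLS, and verify it with curl.
2. Issue a certificate valid for 7 days, write a script that prints the number of days remaining, and have Prometheus collect that value as a metric.
3. Re-issue a certificate without the SAN on purpose, then read the Prometheus scrape errors in the logs and locate the cause.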