17.10.2 TLS Encryption and Certificate Management

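Enabling TLS in a Prometheus deployment mainly covers the links between Prometheus Server, Alertmanager, Pushgateway, exporters, and Grafana, protecting the confidentiality and integrity of metric transport, alert notifications, and the web UI. Before rolling it out, decide on the certificate issuance model (self-signed or enterprise CA), the scope of certificate use (server certificates, client certificates, mutual TLS), and the certificate lifecycle management policy.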

[Figure: schematic of TLS links and certificate verification]

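Certificate hierarchy and planning points:
- Issue a separate certificate for each service instance or service domain name; reusing certificates makes scaling and revocation riskier
- Prefer an internal CA or enterprise PKI so that issuance and revocation are managed in one place
- Enable mutual TLS (mTLS) for Prometheus and Alertmanager so that server and client identities are verified in both directions
- Keep certificate lifetimes short, combined with automated renewal and smooth rotation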
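
To make the short-lifetime point concrete, here is a minimal sketch of a renewal check, assuming OpenSSL is installed. The function name cert_needs_renewal and the 30-day default threshold are illustrative choices, not part of Prometheus or OpenSSL.

```shell
# cert_needs_renewal CERT [DAYS]
# Hypothetical helper: succeeds (exit 0) when CERT expires within DAYS days
# (default 30) or is already expired; fails (exit 1) otherwise.
# Note: an unreadable or invalid CERT is also reported as "needs renewal".
cert_needs_renewal() {
  local cert="$1" days="${2:-30}"
  # "-checkend N" exits 0 only if the certificate is still valid N seconds from now
  if openssl x509 -in "$cert" -noout -checkend "$(( days * 86400 ))" >/dev/null 2>&1; then
    return 1   # still valid beyond the threshold
  fi
  return 0     # expires within the threshold
}
```

A scheduled job could, for example, run `cert_needs_renewal /etc/prometheus/certs/prometheus.crt 30` and trigger re-issuance only when the call succeeds.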

Installation and certificate generation example (OpenSSL, self-signed CA):

# 1) Prepare the directory
mkdir -p /etc/prometheus/certs && cd /etc/prometheus/certs

# 2) Generate the CA private key and certificate
openssl genrsa -out ca.key 4096
openssl req -x509 -new -nodes -key ca.key -sha256 -days 3650 \
  -subj "/C=CN/ST=Beijing/L=Beijing/O=Ops/OU=CA/CN=ops-ca" \
  -out ca.crt

# 3) Generate the Prometheus server private key and certificate signing request (CSR)
openssl genrsa -out prometheus.key 2048
openssl req -new -key prometheus.key \
  -subj "/C=CN/ST=BJ/L=BJ/O=Ops/OU=Prometheus/CN=prometheus.local" \
  -out prometheus.csr

# 4) Create the extension file containing the SAN
cat > prometheus.ext <<'EOF'
subjectAltName=DNS:prometheus.local,IP:10.0.0.10
extendedKeyUsage=serverAuth
EOF

# 5) Sign with the CA
openssl x509 -req -in prometheus.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -out prometheus.crt -days 365 -sha256 -extfile prometheus.ext

# 6) Generate the Prometheus client certificate (for mTLS access to exporters)
openssl genrsa -out prometheus-client.key 2048
openssl req -new -key prometheus-client.key \
  -subj "/C=CN/ST=BJ/L=BJ/O=Ops/OU=Prometheus/CN=prom-client" \
  -out prometheus-client.csr

cat > prometheus-client.ext <<'EOF'
extendedKeyUsage=clientAuth
EOF

openssl x509 -req -in prometheus-client.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -out prometheus-client.crt -days 365 -sha256 -extfile prometheus-client.ext

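Notes on the commands:
- -subj: the certificate subject; the CN should match the domain name used for access
- subjectAltName: the SAN field; it must include every domain name/IP actually used for access
- extendedKeyUsage: restricts what the certificate may be used for; keep server and client certificates separate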
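
Because a missing SAN entry is a common cause of handshake failures, it can help to check an issued certificate before deploying it. The following is a rough sketch, assuming the usual `openssl x509 -text` output in which SAN entries appear as `DNS:<name>` and `IP Address:<ip>`. The helper name cert_has_san is hypothetical.

```shell
# cert_has_san CERT NAME
# Hypothetical helper: succeeds if NAME (a DNS name or an IP address)
# appears as an exact entry in the certificate's Subject Alternative Name.
cert_has_san() {
  local cert="$1" name="$2"
  openssl x509 -in "$cert" -noout -text \
    | grep -A1 'Subject Alternative Name' \
    | tr ',' '\n' \
    | sed 's/^ *//; s/ *$//' \
    | grep -qxF -e "DNS:$name" -e "IP Address:$name"
}
```

For the certificate issued above, `cert_has_san prometheus.crt prometheus.local` and `cert_has_san prometheus.crt 10.0.0.10` should both succeed, while a name that was not listed in prometheus.ext should fail.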

Example of enabling HTTPS on Prometheus Server (web-config file):

# /etc/prometheus/web-config.yml
tls_server_config:
  cert_file: /etc/prometheus/certs/prometheus.crt
  key_file: /etc/prometheus/certs/prometheus.key
  client_ca_file: /etc/prometheus/certs/ca.crt
  client_auth_type: RequireAndVerifyClientCert

Example Prometheus startup flags:

# /usr/local/bin/prometheus
/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --web.config.file=/etc/prometheus/web-config.yml \
  --web.listen-address=0.0.0.0:9090

Example of Prometheus scraping an exporter over mTLS:

# /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: 'node'
    scheme: https
    metrics_path: /metrics
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      cert_file: /etc/prometheus/certs/prometheus-client.crt
      key_file: /etc/prometheus/certs/prometheus-client.key
      server_name: nodeexporter.local
      insecure_skip_verify: false
    static_configs:
      - targets: ['10.0.0.20:9100']

Example of enabling TLS on Node Exporter:

# Generate the Node Exporter certificate (same procedure as for Prometheus, with CN/SAN set to nodeexporter.local)
# Generation steps omitted; place the files in /etc/node_exporter/certs

/usr/local/bin/node_exporter \
  --web.config.file=/etc/node_exporter/web-config.yml

# /etc/node_exporter/web-config.yml
tls_server_config:
  cert_file: /etc/node_exporter/certs/nodeexporter.crt
  key_file: /etc/node_exporter/certs/nodeexporter.key
  client_ca_file: /etc/node_exporter/certs/ca.crt
  client_auth_type: RequireAndVerifyClientCert
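
The omitted generation steps for the Node Exporter certificate could look roughly like the sketch below, which repeats steps 3 to 5 of the Prometheus example inside a function. The function name issue_server_cert, its argument order, and the CN-only subject are assumptions made for illustration. The CA directory is expected to contain ca.crt and ca.key as created earlier.

```shell
# issue_server_cert NAME DNS IP CA_DIR OUT_DIR
# Hypothetical helper: creates OUT_DIR/NAME.key and OUT_DIR/NAME.crt,
# signed by the CA in CA_DIR, with DNS and IP placed in the SAN.
issue_server_cert() {
  local name="$1" dns="$2" ip="$3" ca_dir="$4" out_dir="$5"
  mkdir -p "$out_dir" || return 1

  # Private key and CSR (subject reduced to the CN for brevity)
  openssl genrsa -out "$out_dir/$name.key" 2048 || return 1
  openssl req -new -key "$out_dir/$name.key" \
    -subj "/CN=$dns" -out "$out_dir/$name.csr" || return 1

  # Extension file: SAN plus server-only usage
  printf 'subjectAltName=DNS:%s,IP:%s\nextendedKeyUsage=serverAuth\n' \
    "$dns" "$ip" > "$out_dir/$name.ext" || return 1

  # Sign the CSR with the CA
  openssl x509 -req -in "$out_dir/$name.csr" \
    -CA "$ca_dir/ca.crt" -CAkey "$ca_dir/ca.key" -CAcreateserial \
    -out "$out_dir/$name.crt" -days 365 -sha256 \
    -extfile "$out_dir/$name.ext" || return 1
}

# Example call using the paths and addresses from this section:
# issue_server_cert nodeexporter nodeexporter.local 10.0.0.20 \
#   /etc/prometheus/certs /etc/node_exporter/certs
```

Only ca.crt needs to be copied to /etc/node_exporter/certs afterwards; the CA private key should stay on the signing host.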

Example of enabling HTTPS on Alertmanager:

# /etc/alertmanager/web-config.yml
tls_server_config:
  cert_file: /etc/alertmanager/certs/alertmanager.crt
  key_file: /etc/alertmanager/certs/alertmanager.key
  client_ca_file: /etc/alertmanager/certs/ca.crt
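  # RequestClientCert asks for a client certificate but neither requires nor verifies it;
  # use RequireAndVerifyClientCert to enforce mTLS
  client_auth_type: RequestClientCert

/usr/local/bin/alertmanager \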
  --config.file=/etc/alertmanager/alertmanager.yml \
  --web.config.file=/etc/alertmanager/web-config.yml

Example of setting certificate and key permissions:

chown -R prometheus:prometheus /etc/prometheus/certs
chmod 700 /etc/prometheus/certs
chmod 600 /etc/prometheus/certs/*.key

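Notes on the commands:
- chmod 600 ensures that only the service user can read the private keys
- Keeping the certificate directory separate from the configuration directory simplifies auditing and rotation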
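
As a small audit aid, the sketch below lists private keys whose mode differs from the 600 set above. The function name find_loose_keys is hypothetical, and the `*.key` naming follows the files used in this section.

```shell
# find_loose_keys DIR
# Hypothetical helper: prints every *.key file under DIR whose
# permission bits are not exactly 600.
find_loose_keys() {
  find "$1" -type f -name '*.key' ! -perm 600
}
```

An empty result from `find_loose_keys /etc/prometheus/certs` means every key file matches the expected mode. Ownership is not checked here and would need a separate test.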

Certificate verification and connectivity checks:

# Verify the certificate chain
openssl verify -CAfile /etc/prometheus/certs/ca.crt /etc/prometheus/certs/prometheus.crt

# Verify HTTPS and mTLS with curl (no -k, so the server certificate is actually checked against the CA)
curl -v --cacert /etc/prometheus/certs/ca.crt \
  --cert /etc/prometheus/certs/prometheus-client.crt \
  --key /etc/prometheus/certs/prometheus-client.key \
  https://prometheus.local:9090/metrics

Expected results:
- openssl verify returns OK
- curl returns Prometheus's own metrics text

Troubleshooting checklist and diagnostic commands:
- Certificate mismatch or expiry:

openssl x509 -in /etc/prometheus/certs/prometheus.crt -noout -dates -subject -issuer

- Handshake failure caused by a missing SAN:

openssl s_client -connect prometheus.local:9090 -servername prometheus.local -showcerts

- mTLS failure because the client did not present a certificate:

journalctl -u prometheus -n 50 | grep -i tls

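Common error messages explained:
- x509: certificate is not valid for any names: the SAN does not include the domain name/IP being accessed
- tls: bad certificate: the client certificate is not signed by a trusted CA, or no client certificate was provided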

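Operational process recommendations:
- Establish a process for certificate requests, approval, issuance, deployment, renewal, and revocation
- Bring certificates and private keys under security auditing and access logging
- Use automation scripts and configuration management tools (Ansible/Helm) to distribute certificates
- Regularly rehearse contingency plans for certificate expiry and revocation to keep the alerting path reliable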

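Exercises:
1. Issue separate certificates for Prometheus and Node Exporter, enable mTLS, and verify with curl.
2. Set the certificate validity period to 7 days, write a script that outputs the number of days remaining, and have Prometheus collect that metric.
3. Deliberately remove the SAN and reissue the certificate, then inspect the Prometheus scrape error logs and identify the cause.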