17.2.3 The Basic Configuration File prometheus.yml Explained
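prometheus.yml is the core configuration file of Prometheus. It defines scrape targets, rule files, alerting, and remote storage. Knowing its structure and fields, and how to validate it, is the foundation for deploying Prometheus and extending it later.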
1. Principle sketch: the configuration-driven scrape pipeline
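2. File location and minimal working configuration
Recommended locations:
- Binary install: /etc/prometheus/prometheus.yml
- Container install: /etc/prometheus/prometheus.yml, or /prometheus/prometheus.yml. The official prom/prometheus image reads /etc/prometheus/prometheus.yml by default and uses /prometheus as its TSDB data directory. A config placed under /prometheus must therefore be passed explicitly with --config.file.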
Minimal configuration (enough to start Prometheus and verify it):
```yaml
# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
```
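Start and verify:
```bash
/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus
```
Expected result: open http://<host>:9090/graph and run the query up{job="prometheus"}. A value of 1 means Prometheus is successfully scraping itself.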
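3. Global configuration: global
```yaml
global:
  scrape_interval: 15s      # default scrape interval for every job (built-in default: 1m)
  evaluation_interval: 15s  # how often recording/alerting rules are evaluated (built-in default: 1m)
  scrape_timeout: 10s       # per-scrape timeout (built-in default: 10s)
  external_labels:          # added when talking to external systems
    env: prod
    region: bj
```
Notes:
- scrape_timeout must not exceed scrape_interval (equal is allowed). The same check applies to per-job overrides.
- external_labels are attached to series and alerts whenever Prometheus communicates with external systems (federation, remote storage, Alertmanager). They are typically used to tell apart data coming from different Prometheus servers.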
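The values under global are only defaults, and each job can override them. The sketch below assumes a hypothetical slow exporter at 10.0.0.30:9200 whose scrapes take longer than the global 10s timeout. The job name and address are illustrative only.

```yaml
global:
  scrape_interval: 15s
  scrape_timeout: 10s

scrape_configs:
  # Hypothetical job for a slow exporter (name and address are illustrative)
  - job_name: "slow-exporter"
    scrape_interval: 60s   # overrides global.scrape_interval for this job only
    scrape_timeout: 30s    # overrides global.scrape_timeout; 30s <= 60s, so it is valid
    static_configs:
      - targets: ["10.0.0.30:9200"]
```

Setting scrape_timeout: 90s in this job would be rejected when the configuration is loaded, because it exceeds the job's 60s interval.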
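4. Rule files: rule_files
```yaml
rule_files:
  - "/etc/prometheus/rules/*.yml"   # globs are allowed; every matching file is loaded
```
After changing rules (or prometheus.yml), trigger a hot reload. The HTTP endpoint only works if Prometheus was started with --web.enable-lifecycle; otherwise, send SIGHUP to the process instead. If the new configuration is invalid, it is not applied and Prometheus keeps running with the old one.
```bash
# Requires --web.enable-lifecycle
curl -X POST http://localhost:9090/-/reload
# Alternative that works without the flag
kill -HUP "$(pidof prometheus)"
```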
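rule_files only points at files; the rules themselves live in separate YAML files. Below is a minimal sketch of one such file. It assumes the hypothetical name /etc/prometheus/rules/node.yml, which the glob above matches. The group, record, and alert names are illustrative. Rules in the file are evaluated every evaluation_interval.

```yaml
# /etc/prometheus/rules/node.yml (hypothetical file name)
groups:
  - name: node-basic            # illustrative group name
    rules:
      # Recording rule: number of healthy targets per job
      - record: job:up:sum
        expr: sum by (job) (up)
      # Alerting rule: a target has been unreachable for 5 minutes
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} (job {{ $labels.job }}) is down"
```

Validate the file with promtool check rules /etc/prometheus/rules/node.yml before reloading.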
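5. Scrape jobs: scrape_configs (with static targets)
```yaml
scrape_configs:
  - job_name: "node"
    scrape_interval: 10s     # overrides global.scrape_interval for this job
    metrics_path: /metrics   # default value, may be omitted
    scheme: http             # default value, may be omitted
    static_configs:
      - targets:
          - "10.0.0.11:9100"
          - "10.0.0.12:9100"
        labels:
          role: "web"
          env: "prod"
```
Explanation:
- job_name becomes the value of the job label on every series scraped by this job. It must be unique across all scrape_configs.
- labels are attached to every target in the same static_configs group. This makes filtering easier in queries and alert rules (for example up{role="web"}).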
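Because labels apply per static_configs group, one job can split its targets into groups with different labels. Here is a sketch that adds a hypothetical database host at 10.0.0.21:

```yaml
scrape_configs:
  - job_name: "node"
    static_configs:
      # Group 1: web servers
      - targets: ["10.0.0.11:9100", "10.0.0.12:9100"]
        labels:
          role: "web"
      # Group 2: hypothetical database host (address is illustrative)
      - targets: ["10.0.0.21:9100"]
        labels:
          role: "db"
```

All three targets share job="node", but up{job="node", role="db"} selects only the database host.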
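6. Target relabeling: relabel_configs
```yaml
# Placed inside a scrape_configs job, at the same level as static_configs
relabel_configs:
  - source_labels: [__address__]
    regex: "(.*):9100"        # the regex is fully anchored
    target_label: instance
    replacement: "$1"         # $1 = first capture group (also the default)
    # action defaults to "replace"
```
Effect: relabel_configs runs on each target's label set before it is scraped. Here the instance label changes from 10.0.0.11:9100 to 10.0.0.11. __address__ itself is not modified, so Prometheus still scrapes port 9100.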
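relabel_configs rules run in order, and besides rewriting labels they can also filter targets. Labels from static_configs are visible during relabeling. The sketch below combines the rule above with a keep rule; the keep rule is an illustrative addition, not part of the setup described above.

```yaml
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["10.0.0.11:9100", "10.0.0.12:9100"]
        labels:
          role: "web"
    relabel_configs:
      # Rule 1: strip the :9100 port from the instance label
      - source_labels: [__address__]
        regex: "(.*):9100"
        target_label: instance
        replacement: "$1"
      # Rule 2 (illustrative): only keep targets whose role label is exactly "web"
      - source_labels: [role]
        regex: "web"
        action: keep
```

Targets without role="web" (including those with no role label at all) are dropped before any scrape happens.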
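7. Metric filtering: metric_relabel_configs
```yaml
# Also placed inside a scrape_configs job
metric_relabel_configs:
  - source_labels: [__name__]
    regex: "go_.*"
    action: drop
```
Purpose: metric_relabel_configs is applied to scraped samples just before they are stored. This rule drops every go_* metric (the Go runtime metrics exposed by Go-based exporters), which lowers storage pressure. The samples are still transferred during the scrape; only storage is saved.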
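When several source_labels are listed, their values are joined with ";" (the default separator) before the regex is matched. The sketch below drops only filesystem metrics for /run mount points. It assumes node_exporter's node_filesystem_* metrics and their mountpoint label, and the regex is illustrative.

```yaml
metric_relabel_configs:
  # Joined value looks like "node_filesystem_avail_bytes;/run/user/1000"
  - source_labels: [__name__, mountpoint]
    separator: ";"                  # default value, shown for clarity
    regex: "node_filesystem_.*;/run.*"
    action: drop
```

A series for the root filesystem joins to "node_filesystem_avail_bytes;/", which does not match, so it is kept. So is every metric without a mountpoint label.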
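8. Alertmanager configuration
Prometheus evaluates the alerting rules itself and pushes firing alerts to the Alertmanager listed here, which handles grouping, silencing, and notification.
```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["10.0.0.20:9093"]
```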
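For high availability, list every Alertmanager instance. Prometheus sends alerts to all of them, and clustered Alertmanagers deduplicate the resulting notifications. The sketch below assumes a hypothetical second instance at 10.0.0.21:9093.

```yaml
alerting:
  alertmanagers:
    - timeout: 10s                 # timeout for pushing alerts (10s is the default)
      static_configs:
        - targets:
            - "10.0.0.20:9093"
            - "10.0.0.21:9093"     # hypothetical second Alertmanager
```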
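9. Remote read/write: remote_write / remote_read
remote_write streams ingested samples to a remote storage system. remote_read lets PromQL queries fetch data back from remote storage. The hostnames below are examples. remote_read only works if the endpoint implements the Prometheus remote-read API, so confirm that your backend supports it (and check its port and path) before relying on this address.
```yaml
remote_write:
  - url: "http://thanos-receive:19291/api/v1/receive"
remote_read:
  - url: "http://thanos-query:10902/api/v1/read"
```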
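By default, remote_write sends every ingested sample. The sketch below limits what leaves the server with write_relabel_configs. It assumes you only need node metrics and up remotely; the regex is illustrative.

```yaml
remote_write:
  - url: "http://thanos-receive:19291/api/v1/receive"
    # Only forward node_* metrics and up; everything else stays local
    write_relabel_configs:
      - source_labels: [__name__]
        regex: "node_.*|up"        # anchored: matches the whole metric name
        action: keep
```

Local storage is unaffected; write_relabel_configs only filters the remote copy.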
10. Complete example (self-monitoring, rules, alerting)
```yaml
# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    env: prod

rule_files:
  - "/etc/prometheus/rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["10.0.0.20:9093"]

scrape_configs:
  # Prometheus scraping itself
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # node_exporter on the web servers
  - job_name: "node"
    scrape_interval: 10s
    static_configs:
      - targets: ["10.0.0.11:9100", "10.0.0.12:9100"]
        labels:
          role: "web"
    relabel_configs:
      - source_labels: [__address__]
        regex: "(.*):9100"
        target_label: instance
        replacement: "$1"
```
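11. Commands and troubleshooting
Check the configuration before starting or reloading. The prometheus binary has no config-check flag. Use promtool, which ships in the same release archive. promtool check config also validates the rule files listed under rule_files, and a single rule file can be checked with promtool check rules <file>.
```bash
promtool check config /etc/prometheus/prometheus.yml
```
View the configuration currently in effect, to confirm it was loaded (jq is assumed to be installed; the API returns the YAML inside a JSON field):
```bash
curl -s http://localhost:9090/api/v1/status/config | jq -r '.data.yaml' | head -n 20
```
View target scrape status:
```bash
curl -s http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[] | {job: .labels.job, instance: .labels.instance, health, lastError}'
```
Common problems and how to locate them:
- Startup or promtool reports that the scrape timeout is greater than the scrape interval: lower scrape_timeout or raise scrape_interval, globally or in the affected job.
- Target health is down: read lastError in the output above, and test reachability from the Prometheus host with curl http://10.0.0.11:9100/metrics
- Rules do not take effect: confirm that the rule_files path/glob matches the files, validate them with promtool, reload, and then check http://localhost:9090/api/v1/rules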
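12. Exercises
- Add the label dc: bj to the node scrape job, and verify it with curl /api/v1/targets.
- Add metric_relabel_configs to the node job to drop process_.* metrics.
- Hot-reload the configuration with /-/reload (Prometheus must be started with --web.enable-lifecycle), and verify that the change takes effect without a restart.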
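One possible answer to the first two exercises is sketched below. It is based on the node job from the complete example, with relabel_configs omitted for brevity.

```yaml
scrape_configs:
  - job_name: "node"
    scrape_interval: 10s
    static_configs:
      - targets: ["10.0.0.11:9100", "10.0.0.12:9100"]
        labels:
          role: "web"
          dc: "bj"                 # exercise 1: new target label
    metric_relabel_configs:        # exercise 2: drop process_* metrics before storage
      - source_labels: [__name__]
        regex: "process_.*"
        action: drop
```

After a reload, run curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[].labels.dc'. The two node targets should show "bj".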