17.2.3 The Base Configuration File prometheus.yml in Detail


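prometheus.yml is the core configuration file of Prometheus. It defines scrape targets, rule files, alerting, and remote storage. Understanding its structure, its fields, and how to validate it is the foundation for deployment and later extension.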

1. Schematic: the configuration-driven scrape pipeline

[Figure: schematic of the configuration-driven scrape pipeline]

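2. File path and minimal working configuration

Recommended paths:
- Binary installation: /etc/prometheus/prometheus.yml
- Container installation: /etc/prometheus/prometheus.yml or /prometheus/prometheus.yml

Minimal configuration example (enough to start Prometheus and verify it):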

# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

Start Prometheus to verify:

/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus

Expected result: open http://<host>:9090/graph and query up{job="prometheus"}; the series should be returned with value 1.

3. Global configuration: global

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s
  external_labels:
    env: prod
    region: bj

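Notes:
- scrape_timeout must not exceed scrape_interval (the two may be equal); otherwise the configuration fails to load
- external_labels are commonly used with federation or remote storage to tell data sources apart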
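
As a minimal sketch of how these global defaults interact with per-job settings: a job may override scrape_interval and scrape_timeout, and anything it omits is inherited from global. The job name and target below are hypothetical and only serve as illustration.

```yaml
global:
  scrape_interval: 15s   # default for every job
  scrape_timeout: 10s    # default for every job

scrape_configs:
  - job_name: "fast-app"              # hypothetical job
    scrape_interval: 5s               # overrides the global 15s
    scrape_timeout: 4s                # kept below this job's own interval
    static_configs:
      - targets: ["10.0.0.30:8080"]   # hypothetical target
```

Without the per-job scrape_timeout, this job would inherit the global 10s, which exceeds its 5s interval and would be rejected when the configuration is loaded.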

4. Rule files: rule_files

rule_files:
  - "/etc/prometheus/rules/*.yml"
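
For illustration, a file matched by the glob above might look like the following sketch. The file name, group name, rule names, and thresholds are assumptions, not taken from the original setup.

```yaml
# /etc/prometheus/rules/node.yml (hypothetical file name)
groups:
  - name: node-basic
    rules:
      # Recording rule: number of targets currently up, per job
      - record: job:up:sum
        expr: sum by (job) (up)
      # Alerting rule: fires when a target has been unreachable for 5 minutes
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```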

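Hot-reload after modifying rules (the endpoint is only available when Prometheus is started with --web.enable-lifecycle):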

curl -X POST http://localhost:9090/-/reload

5. Scrape jobs: scrape_configs (with static targets)

scrape_configs:
  - job_name: "node"
    scrape_interval: 10s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - "10.0.0.11:9100"
          - "10.0.0.12:9100"
        labels:
          role: "web"
          env: "prod"

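Explanation:
- job_name becomes the job label on every series scraped by this job
- labels attaches the same labels to all targets in the group, which simplifies alerting and querying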
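
Because labels applies per target group, one job can hold several groups with different labels. The sketch below assumes a third, hypothetical host acting as a database server.

```yaml
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["10.0.0.11:9100", "10.0.0.12:9100"]
        labels:
          role: "web"
      - targets: ["10.0.0.13:9100"]   # hypothetical extra host
        labels:
          role: "db"
```

With this layout a query such as up{job="node", role="db"} selects only the second group.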

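6. relabel_configs: rewriting targets

relabel_configs is set inside a scrape job (an entry under scrape_configs) and is applied to targets before they are scraped.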

relabel_configs:
  - source_labels: [__address__]
    regex: "(.*):9100"
    target_label: instance
    replacement: "$1"

Effect: the instance label changes from 10.0.0.11:9100 to 10.0.0.11.
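
The rule above only matches port 9100. A possible generalization, sketched here under the assumption that targets are hostnames or IPv4 addresses (not bracketed IPv6), strips any port. Single quotes are used so that YAML does not try to interpret the backslash.

```yaml
relabel_configs:
  - source_labels: [__address__]
    regex: '([^:]+)(?::\d+)?'   # host, optionally followed by :port
    target_label: instance
    replacement: "$1"           # keep only the host part
```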

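7. metric_relabel_configs: filtering metrics

metric_relabel_configs is also set inside a scrape job; it is applied to scraped samples before they are stored.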

metric_relabel_configs:
  - source_labels: [__name__]
    regex: "go_.*"
    action: drop

Purpose: drop unneeded metrics to reduce storage pressure.
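
The inverse approach is an allowlist: keep only the metric families you need and discard everything else. The sketch below is one way to do that; the chosen metric prefixes are an assumption about what a node_exporter dashboard might require.

```yaml
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["10.0.0.11:9100"]
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: "node_(cpu|memory|filesystem)_.*"   # illustrative allowlist
        action: keep                               # all other metrics are dropped
```

An allowlist is stricter than a drop rule: new metrics exposed by the exporter after an upgrade are discarded until the regex is extended.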

8. Alertmanager configuration

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["10.0.0.20:9093"]
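
When Alertmanager runs as a cluster, Prometheus should list every instance so that alerts are sent to all of them. A minimal sketch, assuming a hypothetical second instance at 10.0.0.21:

```yaml
alerting:
  alertmanagers:
    - scheme: http
      timeout: 10s            # per-request timeout when pushing alerts
      static_configs:
        - targets:
            - "10.0.0.20:9093"
            - "10.0.0.21:9093"   # hypothetical second Alertmanager
```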

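9. Remote read and write: remote_write / remote_read

The URLs below are examples; point them at endpoints that actually implement the Prometheus remote write and remote read protocols.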

remote_write:
  - url: "http://thanos-receive:19291/api/v1/receive"

remote_read:
  - url: "http://thanos-query:10902/api/v1/read"
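
Remote write can be filtered and tuned per endpoint. The following sketch drops Go runtime metrics before they are sent and adjusts the send queue; the queue values are purely illustrative, not recommendations.

```yaml
remote_write:
  - url: "http://thanos-receive:19291/api/v1/receive"
    write_relabel_configs:
      - source_labels: [__name__]
        regex: "go_.*"
        action: drop                 # affects only what is sent remotely
    queue_config:
      max_samples_per_send: 2000     # illustrative batch size
      batch_send_deadline: 5s        # flush a partial batch after this long
```

Unlike metric_relabel_configs, write_relabel_configs leaves local storage untouched: the dropped series remain queryable on this Prometheus server.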

10. Complete example (self-monitoring, rules, alerting)

# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    env: prod

rule_files:
  - "/etc/prometheus/rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["10.0.0.20:9093"]

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node"
    scrape_interval: 10s
    static_configs:
      - targets: ["10.0.0.11:9100","10.0.0.12:9100"]
        labels:
          role: "web"
    relabel_configs:
      - source_labels: [__address__]
        regex: "(.*):9100"
        target_label: instance
        replacement: "$1"

11. Commands and troubleshooting

Configuration check (uses promtool, which ships alongside the prometheus binary; the path below assumes it is installed in the same directory):

/usr/local/bin/promtool check config /etc/prometheus/prometheus.yml

View the currently active configuration (to verify that it loaded successfully):

curl http://localhost:9090/api/v1/status/config | head -n 20

View target scrape status:

curl http://localhost:9090/api/v1/targets | head -n 40

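Common problems and how to locate them:
- Loading fails with an error that the scrape timeout is greater than the scrape interval: lower scrape_timeout or raise scrape_interval
- Target is DOWN: check that the port is reachable, e.g. curl http://10.0.0.11:9100/metrics
- Rules not taking effect: confirm the rule_files path is correct, then reload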

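12. Exercises

  1. Add the label dc: bj to the node scrape job and verify the label with curl /api/v1/targets.
  2. Add metric_relabel_configs to the node job to drop process_.* metrics.
  3. Hot-reload the configuration with /-/reload and verify that it takes effect without a restart.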
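
One possible solution sketch for exercises 1 and 2, based on the node job from the complete example; treat it as a starting point rather than the only correct answer.

```yaml
scrape_configs:
  - job_name: "node"
    scrape_interval: 10s
    static_configs:
      - targets: ["10.0.0.11:9100", "10.0.0.12:9100"]
        labels:
          role: "web"
          dc: "bj"              # exercise 1: extra target label
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: "process_.*"     # exercise 2: drop process_* metrics
        action: drop
```

For exercise 3, apply the change with the /-/reload request shown earlier and check /api/v1/targets again; the dc label should appear without the process having been restarted.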