17.1.1 Prometheus Overall Architecture and Data Flow

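Prometheus uses a pull model to build a closed monitoring loop of scrape → store → query → alert → visualize. This section covers the overall architecture: a schematic, an install-and-verify walkthrough, a data-flow example, troubleshooting pointers, and exercises.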

Schematic (overall architecture and data flow)

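Minimal runnable example (local install and verification)

Goal: quickly verify the "target exposes metrics → scrape → TSDB → query" segment of the data flow.

1) Start node_exporter to expose metrics: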

# Download and start node_exporter (writing to /opt usually requires root)
cd /opt
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar -xf node_exporter-1.6.1.linux-amd64.tar.gz
./node_exporter-1.6.1.linux-amd64/node_exporter --web.listen-address=":9100" &
# Expected: opening http://localhost:9100/metrics in a browser shows plain-text metrics
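
The endpoint above serves the Prometheus text exposition format, in which each metric family is introduced by a `# TYPE` comment line followed by its samples. As a rough sanity check that the exporter is returning real metrics, the sketch below counts those lines. The helper name `count_metric_families` is hypothetical and not part of node_exporter or Prometheus.

```shell
# Hypothetical helper: count metric families in exposition-format text
# read from stdin by counting the "# TYPE <name> <type>" lines.
count_metric_families() {
  grep -c '^# TYPE '
}

# Usage against the exporter started above (requires it to be running):
#   curl -s http://localhost:9100/metrics | count_metric_families
```

A count of zero suggests the response is not exposition-format text at all, for example an error page from a different service on the same port.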

2) Install and configure Prometheus:

# Download Prometheus
cd /opt
wget https://github.com/prometheus/prometheus/releases/download/v2.47.0/prometheus-2.47.0.linux-amd64.tar.gz
tar -xf prometheus-2.47.0.linux-amd64.tar.gz
cd prometheus-2.47.0.linux-amd64

# Configure the scrape target (prometheus.yml)
cat > /opt/prometheus-2.47.0.linux-amd64/prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]
EOF

# Start the Prometheus server
./prometheus --config.file=prometheus.yml --storage.tsdb.path=/opt/prometheus-data &
# Expected: http://localhost:9090/targets shows the node target as UP (allow up to one scrape_interval for the first scrape)

3) Verify with a query (PromQL):

# Query the up metric through the HTTP API
curl -s 'http://localhost:9090/api/v1/query?query=up' | head
# Expected: the sample value in the JSON result is "1", meaning the target was scraped successfully
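
The query API returns an instant vector as JSON in which each series carries a `"value": [<timestamp>, "<value>"]` pair. To read only the sample values, a minimal sketch using grep and sed follows. The helper name `extract_sample_values` is hypothetical, and the pattern matching assumes the compact single-line JSON that Prometheus emits; where `jq` is installed, `jq -r '.data.result[].value[1]'` is the more robust choice.

```shell
# Hypothetical helper: print the sample value of every series in an
# instant-query response read from stdin, one value per line.
# Assumes compact JSON with pairs shaped like "value":[<ts>,"<v>"].
extract_sample_values() {
  grep -o '"value":\[[^]]*\]' | sed -E 's/.*,"([^"]*)"\]$/\1/'
}

# Usage (requires the Prometheus server started above):
#   curl -s 'http://localhost:9090/api/v1/query?query=up' | extract_sample_values
```

Each output line that reads 1 corresponds to a target whose last scrape succeeded; 0 indicates a failed scrape.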

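Key data-flow settings and flags
- scrape_interval: 15s: scrape each target's metrics every 15 seconds.
- targets: ["localhost:9100"]: a static target; Prometheus scrapes this address directly over HTTP, using the default path /metrics.
- --storage.tsdb.path: the TSDB storage directory, which determines where data is written on disk (default: data/).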
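
To make the effect of scrape_interval concrete, the arithmetic below estimates how many samples one time series accumulates per day at a given interval. It is a back-of-the-envelope sketch: the function name `samples_per_day` is hypothetical, and it assumes every scrape succeeds and ignores the small timing jitter of real scrapes.

```shell
# Hypothetical helper: samples per series per day for a scrape interval
# given in whole seconds (86400 seconds in a day).
samples_per_day() {
  interval_seconds=$1
  echo $(( 86400 / interval_seconds ))
}

samples_per_day 15   # prints 5760
samples_per_day 5    # prints 17280
```

Shortening the interval from 15s to 5s therefore triples the number of samples written to the TSDB for every series.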

Common troubleshooting
1) Target unreachable (shown as DOWN on the Targets page)

# Check whether the target port is listening
ss -lntp | grep 9100
# Expected: the node_exporter process is listening on 9100

2) Scrape fails or times out

# Probe /metrics directly
curl -s http://localhost:9100/metrics | head
# If nothing comes back, check the firewall and whether the process is running

3) Prometheus fails to start

# Validate the configuration syntax
/opt/prometheus-2.47.0.linux-amd64/promtool check config \
  /opt/prometheus-2.47.0.linux-amd64/prometheus.yml
# Expected: output starting with SUCCESS
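
One caveat on check 1: a plain `grep 9100` also matches unrelated lines that merely contain those digits, such as port 19100 or a PID. A stricter variant is sketched below. The helper name `has_listener` is hypothetical, and it assumes the column layout of `ss -lnt`, where the fourth column is the local address and port.

```shell
# Hypothetical helper: succeed only if `ss -lnt`-style output on stdin
# contains a LISTEN socket whose local address ends in ":<port>".
has_listener() {
  awk -v port="$1" '
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}

# Usage:
#   ss -lnt | has_listener 9100 && echo "9100 is listening"
```

Anchoring the match to the end of the local-address column is what keeps a listener on 19100 from producing a false positive.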

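Exercises and questions
1) Change scrape_interval to 5s and observe how the density of data points changes.
2) Add a job that scrapes /metrics from a local application, then aggregate the up metric with sum by(job).
3) Stop node_exporter, observe how the Targets state and the up metric change, and record the failure-detection delay.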
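
As a possible starting point for exercise 2, the sketch below writes a configuration with a second job to a temporary file. The job name "app" and the address localhost:8080 are assumptions chosen for illustration; substitute whatever your application actually exposes.

```shell
# Write a two-job configuration to a temporary file for experimentation.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]
  - job_name: "app"                   # hypothetical job name
    metrics_path: /metrics            # the default, shown for clarity
    static_configs:
      - targets: ["localhost:8080"]   # assumed application address
EOF

# Validate it, then (after restarting Prometheus with this file) aggregate up per job:
#   /opt/prometheus-2.47.0.linux-amd64/promtool check config "$cfg"
#   curl -sG --data-urlencode 'query=sum by (job) (up)' http://localhost:9090/api/v1/query
```

With both targets healthy, the aggregated query should return one series per job, each with a value equal to the number of targets in that job that are up.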