17.4.4 Custom Exporters and Metric Design Guidelines

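A custom Exporter collects business or system metrics that the Prometheus ecosystem does not already cover. It typically exposes a /metrics endpoint over HTTP and returns metrics in the OpenMetrics or Prometheus text format. Prefer the official client libraries (Go, Java, Python) over a hand-rolled protocol, which raises maintenance cost.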


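Metric Types and Design Principles

  • Counter: monotonically increasing (e.g. request or error counts); it resets to zero when the process restarts.
  • Gauge: can go up or down (e.g. connection count, queue length).
  • Histogram: bucketed observations, used for distribution statistics and quantile analysis (e.g. latency).
  • Summary: quantiles computed on the client; suited to low-dimensionality, small-scale scenarios.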
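To make the Histogram entry above concrete, here is a minimal pure-Python sketch of how cumulative buckets are counted and how a quantile can be estimated from them by linear interpolation, in the spirit of PromQL's histogram_quantile. The function names `observe_all` and `estimate_quantile`, the bucket bounds, and the sample values are assumptions for illustration only; a real exporter would use the client library's Histogram type rather than this code.

```python
import math

# Hypothetical bucket upper bounds in seconds; +Inf is always the last bucket.
BOUNDS = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0, math.inf]


def observe_all(samples, bounds=BOUNDS):
    """Return cumulative counts: counts[i] = number of samples <= bounds[i]."""
    return [sum(1 for s in samples if s <= b) for b in bounds]


def estimate_quantile(q, counts, bounds=BOUNDS):
    """Estimate the q-quantile by linear interpolation inside the matching bucket."""
    total = counts[-1]
    if total == 0:
        return math.nan
    rank = q * total
    for i, (bound, count) in enumerate(zip(bounds, counts)):
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies in the +Inf bucket: report the highest finite bound.
                return bounds[i - 1]
            lower = bounds[i - 1] if i > 0 else 0.0
            prev = counts[i - 1] if i > 0 else 0
            if count == prev:
                # Empty bucket (only reachable when q is 0): nothing to interpolate.
                return lower
            return lower + (bound - lower) * (rank - prev) / (count - prev)
    return math.nan


samples = [0.05, 0.15, 0.15, 0.3, 0.4, 0.8, 1.5, 3.0]
counts = observe_all(samples)
print(counts)                          # [1, 3, 5, 6, 7, 8, 8]
print(estimate_quantile(0.5, counts))  # approximately 0.35
```

The estimate can only be as precise as the bucket layout: the median above is interpolated between 0.2 and 0.5 even though no sample is exactly 0.35, which is why bucket boundaries should bracket the latencies you care about.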

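Naming, Unit, and Label Conventions (Examples)

  • Naming: <namespace>_<subsystem>_<metric>_total (for Counters)
  • Units: use suffixes such as _seconds, _bytes, _ratio
  • Labels: keep only dimensions you aggregate over, such as service, instance, env, code (avoid user_id, request_id)

Example metric design: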
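The conventions above can be checked mechanically before a metric ships. The following is a small sketch of such a lint, assuming the classic Prometheus character rules for metric and label names; the function `lint_metric` and the deny-list of high-cardinality labels are hypothetical and not part of any Prometheus library.

```python
import re

# Classic character rules from the Prometheus data model.
METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")
LABEL_NAME_RE = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")

# Hypothetical deny-list: labels whose values are effectively unbounded.
HIGH_CARDINALITY_LABELS = {"user_id", "request_id"}


def lint_metric(name, metric_type, labels):
    """Return a list of problems; an empty list means the metric passes."""
    problems = []
    if not METRIC_NAME_RE.match(name):
        problems.append(f"invalid metric name: {name}")
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append(f"counter should end with _total: {name}")
    for label in labels:
        # Names starting with "__" are reserved for internal use.
        if not LABEL_NAME_RE.match(label) or label.startswith("__"):
            problems.append(f"invalid or reserved label name: {label}")
        elif label in HIGH_CARDINALITY_LABELS:
            problems.append(f"high-cardinality label: {label}")
    return problems


print(lint_metric("biz_order_requests_total", "counter", ["service", "env", "code"]))
print(lint_metric("biz_order_requests", "counter", ["user_id"]))
```

The first call reports no problems; the second flags both the missing _total suffix and the user_id label. A check like this could run in a unit test so that naming drift is caught at review time.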

# HELP biz_order_requests_total Total orders by result
# TYPE biz_order_requests_total counter
biz_order_requests_total{service="order",env="prod",code="2xx"} 10234
biz_order_requests_total{service="order",env="prod",code="5xx"} 12

# HELP biz_order_queue_length Current queue length
# TYPE biz_order_queue_length gauge
biz_order_queue_length{service="order",env="prod"} 37

Example: Writing a Custom Exporter in Python

1) Install

# Install dependencies
python3 -m venv /opt/exporter-venv
source /opt/exporter-venv/bin/activate
pip install prometheus-client==0.20.0

2) Write the Exporter

File: /opt/biz_exporter/biz_exporter.py

import random
import time

from prometheus_client import Counter, Gauge, start_http_server

# Counter: total number of order requests
order_total = Counter(
    'biz_order_requests_total',
    'Total orders by result',
    ['service', 'env', 'code']
)

# Gauge: current queue length
queue_len = Gauge(
    'biz_order_queue_length',
    'Current queue length',
    ['service', 'env']
)

SERVICE = "order"
ENV = "prod"

def collect():
    # Simulate business logic
    code = random.choice(["2xx", "5xx"])
    order_total.labels(service=SERVICE, env=ENV, code=code).inc()
    queue_len.labels(service=SERVICE, env=ENV).set(random.randint(0, 50))

if __name__ == "__main__":
    # Serve /metrics on port 8000, then refresh the simulated values every 2 seconds
    start_http_server(8000)
    while True:
        collect()
        time.sleep(2)

3) Run and verify

# Start the Exporter
source /opt/exporter-venv/bin/activate
python /opt/biz_exporter/biz_exporter.py

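# Verify the metrics (the client library's default python_* and process_* metrics
# may be listed first, so filter for the business metrics)
curl -s http://127.0.0.1:8000/metrics | grep biz_order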

The expected output includes:

# HELP biz_order_requests_total Total orders by result
# TYPE biz_order_requests_total counter
biz_order_requests_total{service="order",env="prod",code="2xx"} 10
biz_order_requests_total{service="order",env="prod",code="5xx"} 1

4) Add the Exporter to the Prometheus configuration

File: /etc/prometheus/prometheus.yml

scrape_configs:
  - job_name: "biz_exporter"
    static_configs:
      - targets: ["127.0.0.1:8000"]

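Reload the configuration (the HTTP reload endpoint responds only if Prometheus was started with --web.enable-lifecycle):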

curl -X POST http://127.0.0.1:9090/-/reload

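Common Installation and Runtime Troubleshooting

1) /metrics does not respond
- Check the port: ss -lntp | grep 8000
- Check the process: ps -ef | grep biz_exporter.py

2) Prometheus scrape fails (target down)
- Open http://127.0.0.1:9090/targets to check the target state
- Check the firewall: iptables -L -n or firewall-cmd --list-ports

3) Malformed metric output
- curl /metrics directly and confirm the output contains # HELP and # TYPE lines
- Check for non-numeric values and illegal label names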
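The format checks in item 3 can be partly automated. Below is a rough sketch of a checker that flags sample lines with a non-numeric value or without a preceding # TYPE declaration. The function `check_exposition` and its regular expression are assumptions for illustration; it is not a full parser of the text format (it does not validate label names or escaping, for instance).

```python
import re

# name, optional {labels}, value, optional integer timestamp
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(\s+-?\d+)?$")
# Suffixes that histogram and summary samples add to the family name.
SUFFIXES = ("_bucket", "_sum", "_count")


def check_exposition(text):
    """Return (line_number, problem) tuples for a text-format payload."""
    problems = []
    typed = set()  # metric families that have a "# TYPE" line so far
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.split(None, 3)
            if len(parts) >= 3 and parts[1] == "TYPE":
                typed.add(parts[2])
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            problems.append((lineno, "unparseable sample line"))
            continue
        name, value = match.group(1), match.group(3)
        try:
            float(value)  # also accepts NaN, +Inf and -Inf
        except ValueError:
            problems.append((lineno, f"non-numeric value: {value}"))
        family = name
        for suffix in SUFFIXES:
            if name.endswith(suffix) and name[: -len(suffix)] in typed:
                family = name[: -len(suffix)]
        if family not in typed:
            problems.append((lineno, f"no # TYPE line for {name}"))
    return problems


payload = """# TYPE biz_order_queue_length gauge
biz_order_queue_length{service="order",env="prod"} many
biz_order_requests_total{service="order",env="prod",code="2xx"} 10
"""
for lineno, problem in check_exposition(payload):
    print(lineno, problem)
```

Run against the deliberately broken payload, the sketch reports the non-numeric value on line 2 and the undeclared metric on line 3. The text returned by curl can be passed to the same function.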

Security and Isolation Example (Nginx Reverse Proxy + Basic Authentication)

# Install nginx and the htpasswd tool
yum -y install nginx httpd-tools
htpasswd -bc /etc/nginx/.htpasswd promuser promPass123

File: /etc/nginx/conf.d/exporter.conf

server {
  listen 9108;
  location /metrics {
    auth_basic "Metrics";
    auth_basic_user_file /etc/nginx/.htpasswd;
    proxy_pass http://127.0.0.1:8000/metrics;
  }
}

Restart nginx and verify:

systemctl restart nginx
curl -u promuser:promPass123 http://127.0.0.1:9108/metrics | head
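The same check can be made from Python when curl is not available. This is a minimal sketch using only the standard library; the helper names `build_metrics_request` and `fetch_head` are hypothetical, and the URL and credentials are the ones from the example above.

```python
import base64
import urllib.request


def build_metrics_request(url, user, password):
    """Build a GET request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_head(request, lines=10, timeout=5):
    """Send the request and return the first few lines of the response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return body.splitlines()[:lines]


request = build_metrics_request(
    "http://127.0.0.1:9108/metrics", "promuser", "promPass123"
)
```

With nginx running, `fetch_head(request)` should return the first lines of the metrics output, while a request built with the wrong password should raise `urllib.error.HTTPError` with status 401.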

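Exercises

1) Add a Histogram metric named biz_order_latency_seconds to the application, with bucket boundaries 0.1, 0.2, 0.5, 1, 2, 5.
2) Add a self-monitoring metric exporter_collect_errors_total to the Exporter and increment it whenever a collection fails.
3) Run the Exporter as a systemd service that starts at boot and writes its logs to /var/log/biz_exporter.log.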