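17.4.4 Custom Exporters and Metric Design Guidelines
A custom exporter collects metrics from business applications or systems that the Prometheus ecosystem does not already cover. It typically exposes a /metrics endpoint over HTTP and returns metrics in the Prometheus text format or the OpenMetrics format. Prefer the official client libraries (Go, Java, Python) over a home-grown protocol, which only raises maintenance cost.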
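Metric types and design principles
- Counter: only increases (for example request or error counts); it resets to zero when the process restarts.
- Gauge: can go up or down (for example connection count or queue length).
- Histogram: bucketed observations, used for distribution statistics and quantile analysis (for example latency).
- Summary: quantiles computed on the client side; suited to small-scale scenarios with few label dimensions.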
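As a rough illustration of the four types, the sketch below registers one metric of each kind with the Python client (prometheus_client) in a private registry and renders the exposition text. The demo_* metric names, the bucket boundaries, and the observed values are made up for illustration and are not part of the exporter built later in this section.

```python
from prometheus_client import (
    CollectorRegistry, Counter, Gauge, Histogram, Summary, generate_latest,
)

# A private registry keeps these demo metrics out of the default one.
registry = CollectorRegistry()

# Counter: only goes up; suited to totals such as requests or errors.
requests_total = Counter(
    "demo_requests_total", "Total requests (demo)", registry=registry
)
# Gauge: can go up and down; suited to current values such as queue length.
queue_length = Gauge(
    "demo_queue_length", "Current queue length (demo)", registry=registry
)
# Histogram: counts observations into buckets; quantiles are estimated at query time.
latency_hist = Histogram(
    "demo_latency_seconds", "Request latency (demo)",
    buckets=(0.1, 0.5, 1.0), registry=registry,
)
# Summary: the Python client exposes only the count and sum of observations.
task_seconds = Summary(
    "demo_task_seconds", "Task duration (demo)", registry=registry
)

requests_total.inc()        # counter goes 0 -> 1
queue_length.set(5)         # gauge set to 5 ...
queue_length.dec(2)         # ... then decreased to 3
latency_hist.observe(0.3)   # lands in the le="0.5" bucket and above
task_seconds.observe(1.5)

exposition = generate_latest(registry).decode("utf-8")
print(exposition)
```

Note that the Python client's Summary does not calculate quantiles, so for latency analysis in Python a Histogram is usually the more practical choice.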
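Naming, unit, and label conventions (examples)
- Naming: <namespace>_<subsystem>_<metric>_total (Counter)
- Units: _seconds, _bytes, _ratio
- Labels: keep only dimensions that can be aggregated, such as service, instance, env, code (avoid user_id, request_id)
Example metric design: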
# HELP biz_order_requests_total Total orders by result
# TYPE biz_order_requests_total counter
biz_order_requests_total{service="order",env="prod",code="2xx"} 10234
biz_order_requests_total{service="order",env="prod",code="5xx"} 12
# HELP biz_order_queue_length Current queue length
# TYPE biz_order_queue_length gauge
biz_order_queue_length{service="order",env="prod"} 37
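The conventions above can be checked mechanically before a metric ships. The following is a minimal sketch using only the standard library; lint_metric and the HIGH_CARDINALITY_LABELS deny-list are hypothetical helpers introduced here for illustration, not part of any Prometheus tooling.

```python
import re

# Prometheus metric and label name grammar.
METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")
LABEL_NAME_RE = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")

# Hypothetical deny-list of labels that tend to have unbounded cardinality.
HIGH_CARDINALITY_LABELS = {"user_id", "request_id"}


def lint_metric(name, metric_type, labels):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not METRIC_NAME_RE.match(name):
        problems.append(f"invalid metric name: {name}")
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append(f"counter {name} should end with _total")
    if metric_type != "counter" and name.endswith("_total"):
        problems.append(f"{metric_type} {name} should not end with _total")
    for label in labels:
        # Names starting with "__" are reserved for internal use.
        if not LABEL_NAME_RE.match(label) or label.startswith("__"):
            problems.append(f"invalid or reserved label name: {label}")
        elif label in HIGH_CARDINALITY_LABELS:
            problems.append(f"high-cardinality label: {label}")
    return problems


print(lint_metric("biz_order_requests_total", "counter", ["service", "env", "code"]))
print(lint_metric("biz_order_queue_length", "gauge", ["service", "user_id"]))
```

A check like this could run in a unit test or CI step, so that naming drift is caught before dashboards and alerts start depending on a badly named metric.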
Example: writing a custom exporter in Python
1) Installation
# Install dependencies
python3 -m venv /opt/exporter-venv
source /opt/exporter-venv/bin/activate
pip install prometheus-client==0.20.0
2) Write the exporter
File: /opt/biz_exporter/biz_exporter.py
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

# Counter: total number of order requests
order_total = Counter(
    'biz_order_requests_total',
    'Total orders by result',
    ['service', 'env', 'code']
)
# Gauge: queue length
queue_len = Gauge(
    'biz_order_queue_length',
    'Current queue length',
    ['service', 'env']
)
SERVICE = "order"
ENV = "prod"


def collect():
    # Simulated business logic
    code = random.choice(["2xx", "5xx"])
    order_total.labels(service=SERVICE, env=ENV, code=code).inc()
    queue_len.labels(service=SERVICE, env=ENV).set(random.randint(0, 50))


if __name__ == "__main__":
    start_http_server(8000)  # serve /metrics on port 8000
    while True:
        collect()
        time.sleep(2)
3) Run and verify
# Start the exporter
source /opt/exporter-venv/bin/activate
python /opt/biz_exporter/biz_exporter.py
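# Verify the metrics (the client library's built-in python_* and process_* metrics come first in the output, so filter for the business metrics)
curl -s http://127.0.0.1:8000/metrics | grep biz_order_requests
The expected output includes lines like the following (the values depend on how long the exporter has been running):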
# HELP biz_order_requests_total Total orders by result
# TYPE biz_order_requests_total counter
biz_order_requests_total{service="order",env="prod",code="2xx"} 10
biz_order_requests_total{service="order",env="prod",code="5xx"} 1
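4) Add the exporter to Prometheus
File: /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: "biz_exporter"
    static_configs:
      - targets: ["127.0.0.1:8000"]
Reload the configuration (the HTTP reload endpoint works only when Prometheus was started with --web.enable-lifecycle; otherwise send SIGHUP to the Prometheus process):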
curl -X POST http://127.0.0.1:9090/-/reload
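After the reload, it can be useful to confirm from a script that the new job is actually being scraped. The sketch below queries the Prometheus HTTP API endpoint /api/v1/targets and lists targets of a job whose health is not "up". The helper names unhealthy_targets and fetch_targets are hypothetical, and the sketch assumes Prometheus is reachable at the 127.0.0.1:9090 address used in this section.

```python
import json
import urllib.request


def unhealthy_targets(payload, job):
    """Return (scrapeUrl, lastError) for targets of `job` that are not up."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", ""), t.get("lastError", ""))
        for t in targets
        if t.get("labels", {}).get("job") == job and t.get("health") != "up"
    ]


def fetch_targets(base_url="http://127.0.0.1:9090"):
    """Query the Prometheus HTTP API for the current target list."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Usage, assuming a running Prometheus:
#   print(unhealthy_targets(fetch_targets(), "biz_exporter"))
```

An empty list means every target of the job reports as up; otherwise the lastError field usually points at the cause, such as a refused connection or a timeout.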
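Common installation and runtime troubleshooting
1) /metrics does not respond
- Check the port: ss -lntp | grep 8000
- Check the process: ps -ef | grep biz_exporter.py
2) Prometheus scrape fails (target down)
- Open http://127.0.0.1:9090/targets to see the target status
- Check the firewall: iptables -L -n or firewall-cmd --list-ports
3) Malformed metrics
- Run curl against /metrics directly and confirm that the output contains # HELP and # TYPE
- Check for non-numeric values and illegal label names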
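The format checks in item 3 can be partly automated. The following is a rough lint sketch, not a full parser of the exposition format: it flags sample lines that do not parse, values that are not numeric, and label names that break the naming grammar. The function name lint_exposition is hypothetical, and the regular expressions are simplifications that may misjudge unusual but valid input.

```python
import re

# One sample line: name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>.*)\})?"
    r"\s+(?P<value>\S+)(?:\s+-?\d+)?$"
)
# One label pair inside the braces: name="value", with escapes allowed in the value.
LABEL_RE = re.compile(r'([^=,\s]+)="((?:[^"\\]|\\.)*)"')
LABEL_NAME_RE = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")


def lint_exposition(text):
    """Return a list of (line_number, problem) for lines that look malformed."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE/comment lines are skipped
        match = SAMPLE_RE.match(line)
        if not match:
            problems.append((lineno, "unparseable sample line"))
            continue
        try:
            float(match.group("value"))  # also accepts NaN, +Inf, -Inf
        except ValueError:
            problems.append((lineno, f"non-numeric value: {match.group('value')}"))
        for label_name, _ in LABEL_RE.findall(match.group("labels") or ""):
            if not LABEL_NAME_RE.match(label_name):
                problems.append((lineno, f"illegal label name: {label_name}"))
    return problems
```

Feeding it the body returned by curl (for example by piping the output into a small wrapper script) gives line numbers to look at first; an exporter built on an official client library should normally produce no findings.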
Security and isolation example (Nginx reverse proxy + basic authentication)
# Install nginx and the htpasswd tool
yum -y install nginx httpd-tools
htpasswd -bc /etc/nginx/.htpasswd promuser promPass123
File: /etc/nginx/conf.d/exporter.conf
server {
    listen 9108;
    location /metrics {
        auth_basic "Metrics";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://127.0.0.1:8000/metrics;
    }
}
Restart and verify:
systemctl restart nginx
curl -u promuser:promPass123 http://127.0.0.1:9108/metrics | head
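The proxy protects the metrics only if the exporter itself cannot be reached directly. By default, start_http_server listens on all interfaces, so port 8000 would remain open without authentication. A minimal sketch of one way to close that gap is to bind the exporter to the loopback address; serve_on_loopback and the biz_exporter_up gauge are hypothetical names used here for illustration.

```python
from prometheus_client import Gauge, start_http_server

# Hypothetical demo metric so the endpoint has something to expose.
exporter_up = Gauge("biz_exporter_up", "Set to 1 while the exporter is serving")


def serve_on_loopback(port):
    """Expose /metrics on 127.0.0.1 only, so remote clients must go through Nginx."""
    exporter_up.set(1)
    # addr defaults to '' (all interfaces); pin it to the loopback address.
    start_http_server(port, addr="127.0.0.1")


# Usage in the exporter's main block, in place of start_http_server(8000):
#   serve_on_loopback(8000)
```

With this in place, a Prometheus server on another host would presumably scrape port 9108 instead of 8000 and supply the credentials through the basic_auth settings of its scrape job.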
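Exercises
1) Add a Histogram metric biz_order_latency_seconds to the application, with bucket boundaries 0.1, 0.2, 0.5, 1, 2, 5.
2) Add a self-monitoring metric exporter_collect_errors_total to the exporter and increment it whenever a collection fails.
3) Turn the exporter into a systemd service that starts at boot and writes its logs to /var/log/biz_exporter.log.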