17.3.5 Target Relabeling and Metadata Label Management

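Relabeling transforms and filters targets, labels, and metrics before and after a scrape: relabel_configs act on discovered targets before the scrape, and metric_relabel_configs act on the scraped samples before they are stored. It is the main mechanism for enforcing label conventions, limiting collection volume, and improving observability. This section gives a schematic of the principle, a runnable configuration, verification commands, and troubleshooting steps.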

Schematic (relabeling stages)

[Figure: relabeling stages]

Minimal runnable example (target filtering, label mapping, and cleanup)
File path: /etc/prometheus/prometheus.yml

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'k8s-pods'
    # Assumes kubernetes_sd_configs; the example is kept minimal
    kubernetes_sd_configs:
      - role: pod

    relabel_configs:
      # 1) Keep only targets in production namespaces
      - source_labels: [__meta_kubernetes_namespace]
        regex: "prod|production"
        action: keep

      # 2) Map the pod label to a business label
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
        action: replace

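      # 3) Normalize the instance identifier as <pod IP>:<container port>
      #    (if unset, instance defaults to the value of __address__)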
      - source_labels: [__meta_kubernetes_pod_ip, __meta_kubernetes_pod_container_port_number]
        separator: ":"
        target_label: instance
        action: replace

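      # 4) Drop high-cardinality metadata labels.
      #    Note: labels prefixed with __ are removed automatically after target relabeling,
      #    so this rule only matters if a later rule (e.g. labelmap) would otherwise copy them.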
      - regex: "__meta_kubernetes_pod_uid|__meta_kubernetes_pod_container_id"
        action: labeldrop

    metric_relabel_configs:
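      # 5) Keep only key metrics to reduce noise.
      #    Note: up is generated by Prometheus itself and is not subject to metric relabeling.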
      - source_labels: [__name__]
        regex: "http_requests_total|process_cpu_seconds_total|up"
        action: keep
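
An allowlist such as step 5 discards every metric that is not named, including ones that turn out to be needed later. A gentler alternative is a denylist that removes only known-noisy series. The sketch below is illustrative: the choice of Go runtime metrics as "noise" is an assumption, not something the configuration above prescribes.

```yaml
metric_relabel_configs:
  # Drop only the series considered noisy (assumed here: Go runtime internals);
  # every other scraped metric is kept.
  - source_labels: [__name__]
    regex: "go_gc_.*|go_memstats_.*"
    action: drop
```

Because relabel regexes are fully anchored, `go_gc_.*` matches only metric names that begin with `go_gc_`.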

Commands and verification workflow

# 1) Validate the configuration file syntax and rule completeness
promtool check config /etc/prometheus/prometheus.yml
# Expected: SUCCESS, or an error with the offending line number

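# 2) Reload the configuration without restarting
#    (requires Prometheus to be started with --web.enable-lifecycle; otherwise send SIGHUP to the process)
curl -X POST http://localhost:9090/-/reload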

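# 3) Inspect the current targets and their labels after relabeling (requires the Prometheus web UI)
# Open in a browser: http://localhost:9090/targets
# Under "Labels", confirm that app, instance, and similar labels appear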

Typical relabeling rules

# Target filtering: keep only targets carrying the pod label env=prod
- source_labels: [__meta_kubernetes_pod_label_env]
  regex: "prod"
  action: keep

# Path and scheme rewriting: use a uniform metrics path and protocol
- target_label: __metrics_path__
  replacement: /metrics
  action: replace

- target_label: __scheme__
  replacement: http
  action: replace

# Bulk label mapping: map __meta_kubernetes_pod_label_* to business labels
- regex: "__meta_kubernetes_pod_label_(.+)"
  action: labelmap
  replacement: "$1"
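
Bulk mapping copies every pod label verbatim, so a pod label named app or instance can collide with labels set by other rules. One way to avoid that is to add a prefix in the replacement. This is a minimal sketch; the `k8s_` prefix is a hypothetical naming choice.

```yaml
relabel_configs:
  # Copy every pod label to a target label with a k8s_ prefix
  # (the prefix is an illustrative choice, not a Prometheus convention).
  # Example: __meta_kubernetes_pod_label_app -> k8s_app
  - regex: "__meta_kubernetes_pod_label_(.+)"
    action: labelmap
    replacement: "k8s_$1"
```

With this variant, queries and alert rules would need to reference `k8s_app` rather than `app`.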

Installation and environment setup (for quick verification)

# Quick Prometheus installation, using Linux as an example
mkdir -p /opt/prometheus && cd /opt/prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.50.0/prometheus-2.50.0.linux-amd64.tar.gz
tar -zxf prometheus-2.50.0.linux-amd64.tar.gz
ln -s prometheus-2.50.0.linux-amd64 prometheus

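# Start in the foreground for easy observation
# (--web.enable-lifecycle is required for the /-/reload endpoint used above)
/opt/prometheus/prometheus/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --web.listen-address=:9090 \
  --web.enable-lifecycle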

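Troubleshooting checklist (with explicit commands)
- Missing targets
  - Check whether a keep/drop regex is too strict (relabel regexes are fully anchored):
      grep -nE "action: (keep|drop)" /etc/prometheus/prometheus.yml
  - On the Targets page, check whether the discovered labels include the expected __meta_* fields.
- Missing labels
  - Check that the labelmap regex matches. promtool validates syntax only, so also compare the regex against the discovered labels:
      promtool check config /etc/prometheus/prometheus.yml
  - Mind the rule order: rules run top to bottom, so place keep/drop first, then labelmap.
- Cardinality explosion
  - Find high-cardinality labels and clean them up:
      # Example: count the distinct values of a label (run as PromQL in the UI)
      # count(count by (pod) (up))
  - Use labeldrop to remove dynamic labels such as pod_uid and container_id.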
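
The ordering and cleanup points in the checklist can be combined into one rule sequence. The sketch below assumes the pods carry the Kubernetes-managed labels pod-template-hash and controller-revision-hash, which service discovery exposes with underscores; treat those two names as examples of rollout-dependent labels rather than a complete list.

```yaml
relabel_configs:
  # 1) Filter first: a target that fails keep is dropped immediately.
  - source_labels: [__meta_kubernetes_namespace]
    regex: "prod|production"
    action: keep
  # 2) Copy pod labels to target labels.
  - regex: "__meta_kubernetes_pod_label_(.+)"
    action: labelmap
  # 3) Remove mapped labels whose values change on every rollout (assumed examples).
  - regex: "pod_template_hash|controller_revision_hash"
    action: labeldrop
```

Placing labeldrop after labelmap matters here: the labels it removes exist only once the mapping rule has created them.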

Alerting integration example (keeping alert labels standardized)

# Map metadata to the labels that alerts require
- source_labels: [__meta_kubernetes_pod_label_team]
  target_label: team
  action: replace
- source_labels: [__meta_kubernetes_pod_label_service]
  target_label: service
  action: replace
- source_labels: [__meta_kubernetes_pod_label_severity]
  target_label: severity
  action: replace
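
Target labels set this way are attached to every series of the target, including up, so an alert fired from those series inherits team, service, and severity without repeating them in the rule. The rule file below is a minimal sketch: the group name, alert name, job selector, and 5m duration are illustrative assumptions.

```yaml
groups:
  - name: example-availability        # hypothetical group name
    rules:
      - alert: TargetDown              # hypothetical alert name
        # The resulting alert carries the team/service/severity labels of the matching series.
        expr: up{job="k8s-pods"} == 0
        for: 5m
        annotations:
          summary: "Target {{ $labels.instance }} of {{ $labels.app }} is down"
          description: "Team: {{ $labels.team }}, service: {{ $labels.service }}"
```

Alertmanager routes can then match on team or severity, provided the pods actually carry those labels; a pod without them yields alerts with the corresponding labels absent.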

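Exercises
1. Filter out all targets in the dev namespace and verify that only prod targets remain on the Targets page.
2. Add the fixed label cluster=cn-bj-1 to all targets, then query up{cluster="cn-bj-1"} in PromQL.
3. Remove the pod_uid and container_id labels and observe how TSDB cardinality changes (compare prometheus_tsdb_head_series in /metrics).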