19.6.3 Log Parsing, Normalization, and Indexing

Log parsing, normalization, and indexing form the core of a log-management system. The goal is to turn multi-source, multi-format logs into uniform, searchable, correlatable structured data, backed by efficient indexes for search and analysis. This section uses the common ELK/EFK approach as its running example and covers the principle sketch, installation, parsing and normalization, indexing and lifecycle, troubleshooting, and exercises.

Principle sketch (collect → parse → normalize → index)

[Figure: log pipeline sketch — Filebeat collection → Logstash parsing → normalization → Elasticsearch indexing]

1. Installation and basic pipeline verification (example)

The examples below target Debian/Ubuntu (on CentOS, substitute yum/dnf).

# 1) Install Elasticsearch (single-node lab; apt-key is deprecated, so store
#    the key in a keyring and reference it with signed-by)
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | \
  sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | \
  sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update
sudo apt install -y elasticsearch

# 2) Start the service and check its status
sudo systemctl enable --now elasticsearch
sudo systemctl status elasticsearch --no-pager

# 3) Install Logstash and Filebeat
sudo apt install -y logstash filebeat

# 4) Start the services
sudo systemctl enable --now logstash
sudo systemctl enable --now filebeat

Pipeline verification (confirm Elasticsearch is reachable). Note that the 8.x packages enable security (TLS plus authentication) by default; either query over HTTPS with the generated elastic password, or set xpack.security.enabled: false in /etc/elasticsearch/elasticsearch.yml for a disposable lab:

# Security disabled (lab only)
curl -s http://127.0.0.1:9200 | jq .
# Security enabled (the default); -k accepts the self-signed certificate
curl -sk -u elastic https://127.0.0.1:9200 | jq .
# Expected output: fields such as name, cluster_name, and version
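A representative response (values vary per install; version.number should match the package you installed):

{
  "name": "node-1",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "…",
  "version": {
    "number": "8.x.x",
    "build_flavor": "default"
  },
  "tagline": "You Know, for Search"
}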

2. Parsing and normalization (Grok + field standards)

2.1 Example log

Sample application log (Nginx access):

10.0.0.1 - - [10/Oct/2023:13:55:36 +0800] "GET /api/v1/user?id=1 HTTP/1.1" 200 123 "-" "curl/7.68.0" 0.045
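
This line is the stock combined format with the request latency ($request_time) appended. If your Nginx does not log latency yet, a log_format sketch (the directives and variables are standard Nginx; the format name timed_combined is arbitrary):

# In the http block of nginx.conf: combined format plus $request_time
log_format timed_combined '$remote_addr - $remote_user [$time_local] '
                          '"$request" $status $body_bytes_sent '
                          '"$http_referer" "$http_user_agent" $request_time';
access_log /var/log/nginx/access.log timed_combined;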

2.2 Logstash parsing and normalization config

File: /etc/logstash/conf.d/nginx.conf

input {
  beats {
    port => 5044
  }
}

filter {
  # Parse the raw line into named fields; :int/:float coerce types inline
  grok {
    match => {
      "message" => "%{IP:client_ip} - - \[%{HTTPDATE:timestamp}\] \"%{WORD:method} %{URIPATHPARAM:uri} HTTP/%{NUMBER:http_version}\" %{NUMBER:status:int} %{NUMBER:bytes:int} \"%{DATA:referrer}\" \"%{DATA:user_agent}\" %{NUMBER:latency:float}"
    }
    remove_field => ["message"]
  }

  # Normalize the access-log timestamp into @timestamp (stored as UTC)
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
    remove_field => ["timestamp"]
  }

  mutate {
    add_field => {
      "service" => "nginx"
      "env" => "prod"
      "log_type" => "access"
    }
    # Field-reference syntax nests the value under source.ip instead of
    # creating a top-level field literally named "source.ip"
    rename => { "client_ip" => "[source][ip]" }
  }
}

output {
  elasticsearch {
    hosts => ["http://127.0.0.1:9200"]
    index => "logs-nginx-%{+YYYY.MM.dd}"
  }
}

Commands and expectations:

# 1) Test the configuration syntax
sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t

# 2) Restart Logstash to load the pipeline
sudo systemctl restart logstash

# 3) Point Filebeat at Logstash: in /etc/filebeat/filebeat.yml, comment out the
#    default output.elasticsearch section and enable output.logstash with
#    hosts: ["127.0.0.1:5044"] (the stock file ships this block commented out)
sudo systemctl restart filebeat
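
Filebeat also needs an input before anything flows. A minimal /etc/filebeat/filebeat.yml sketch, assuming Nginx writes to /var/log/nginx/access.log (adjust the path to your environment):

filebeat.inputs:
  - type: filestream
    id: nginx-access
    paths:
      - /var/log/nginx/access.log

output.logstash:
  hosts: ["127.0.0.1:5044"]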

2.3 Field normalization essentials

  • Normalize timestamps into @timestamp with an explicit timezone
  • Recommended required fields: service, env, log_type, source.ip, status
  • Enforce types: status:int, bytes:int, latency:float (see the sample event below)
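
Applying these rules to the sample access log above yields an event shaped roughly like this (a sketch; note that 13:55:36 +0800 normalizes to 05:55:36 UTC):

{
  "@timestamp": "2023-10-10T05:55:36.000Z",
  "service": "nginx",
  "env": "prod",
  "log_type": "access",
  "source": { "ip": "10.0.0.1" },
  "method": "GET",
  "uri": "/api/v1/user?id=1",
  "http_version": "1.1",
  "status": 200,
  "bytes": 123,
  "referrer": "-",
  "user_agent": "curl/7.68.0",
  "latency": 0.045
}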

3. Index design and templates (avoiding field drift)

Create an index template to pin field types and prevent drift from inconsistent parsing:

cat <<'EOF' > /tmp/logs-template.json
{
  "index_patterns": ["logs-nginx-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    },
    "mappings": {
      "dynamic": true,
      "properties": {
        "@timestamp": {"type": "date"},
        "service": {"type": "keyword"},
        "env": {"type": "keyword"},
        "log_type": {"type": "keyword"},
        "source.ip": {"type": "ip"},
        "status": {"type": "integer"},
        "bytes": {"type": "integer"},
        "latency": {"type": "float"},
        "uri": {"type": "keyword"},
        "method": {"type": "keyword"}
      }
    }
  }
}
EOF

curl -s -X PUT http://127.0.0.1:9200/_index_template/logs-template \
  -H 'Content-Type: application/json' \
  -d @/tmp/logs-template.json | jq .

Verify with a query:

curl -s -X GET "http://127.0.0.1:9200/logs-nginx-*/_search?size=1" | jq '.hits.hits[0]._source'
# Expected: every field arrives as normalized, structured data
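
Before any data arrives, you can also preview how the template would apply to a new index via the simulate API (the date in the index name is just an example):

curl -s -X POST "http://127.0.0.1:9200/_index_template/_simulate_index/logs-nginx-2023.10.10" | jq '.template.mappings'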

4. Parse failures and quality checks (example)

Route logs that fail parsing into a quarantine (dead-letter) index for later repair:

filter {
  grok {
    match => { "message" => "..." }
    tag_on_failure => ["_grok_failed"]
  }
}

output {
  if "_grok_failed" in [tags] {
    elasticsearch { hosts => ["http://127.0.0.1:9200"] index => "logs-deadletter-%{+YYYY.MM.dd}" }
  } else {
    elasticsearch { hosts => ["http://127.0.0.1:9200"] index => "logs-nginx-%{+YYYY.MM.dd}" }
  }
}

Quality check (field missing-rate example):

curl -s -X GET "http://127.0.0.1:9200/logs-nginx-*/_search" \
  -H 'Content-Type: application/json' -d '{
    "size": 0,
    "aggs": {
      "missing_status": { "missing": { "field": "status" } }
    }
  }' | jq .
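
To turn those counts into a rate, divide the missing bucket by the total (a jq sketch over the same query; track_total_hits forces an exact total, and at least one document is assumed):

curl -s -X GET "http://127.0.0.1:9200/logs-nginx-*/_search" \
  -H 'Content-Type: application/json' -d '{
    "size": 0,
    "track_total_hits": true,
    "aggs": { "missing_status": { "missing": { "field": "status" } } }
  }' | jq '{total: .hits.total.value,
           missing: .aggregations.missing_status.doc_count,
           missing_rate: (.aggregations.missing_status.doc_count / .hits.total.value)}'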

5. Index lifecycle (ILM) and hot-cold tiering (example)

Define phase transitions with an ILM policy. Note that the legacy freeze action no longer has any effect on 8.x, so the cold phase below marks indices read-only instead:

cat <<'EOF' > /tmp/logs-ilm.json
{
  "policy": {
    "phases": {
      "hot": { "actions": { "rollover": { "max_age": "7d", "max_size": "30gb" } } },
      "warm": { "min_age": "7d", "actions": { "shrink": { "number_of_shards": 1 } } },
      "cold": { "min_age": "30d", "actions": { "freeze": {} } },
      "delete": { "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}
EOF

curl -s -X PUT http://127.0.0.1:9200/_ilm/policy/logs-ilm \
  -H 'Content-Type: application/json' -d @/tmp/logs-ilm.json | jq .
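
A policy only acts on indices that reference it. Also note that rollover never fires for date-suffixed indices such as logs-nginx-2023.10.10, because it requires a write alias or data stream; either rely on the age-based phases alone, or attach the policy plus a rollover alias through the template settings from section 3, e.g.:

    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "index.lifecycle.name": "logs-ilm",
      "index.lifecycle.rollover_alias": "logs-nginx"
    }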

6. Common troubleshooting (with commands)

1) Grok parse failures

# Tail the Logstash log
sudo tail -f /var/log/logstash/logstash-plain.log
# Watch for the _grok_failed tag and the raw message contents
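
If the dead-letter route from section 4 is in place, the failure volume is one query away:

curl -s "http://127.0.0.1:9200/logs-deadletter-*/_count" | jq .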

2) Field type conflicts

# Inspect the index mapping
curl -s http://127.0.0.1:9200/logs-nginx-*/_mapping | jq .
# If types disagree, fix the template and rebuild into a new index
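
Mappings on an existing index cannot be changed in place, so rebuild by reindexing (a sketch; the destination name is illustrative but still matches the logs-nginx-* template pattern, so it picks up the corrected template):

curl -s -X POST http://127.0.0.1:9200/_reindex \
  -H 'Content-Type: application/json' -d '{
    "source": { "index": "logs-nginx-2023.10.10" },
    "dest":   { "index": "logs-nginx-fixed-2023.10.10" }
  }' | jq .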

3) No data from Filebeat

sudo filebeat test output
sudo filebeat test config
sudo journalctl -u filebeat -n 100 --no-pager
# Check the input paths and file permissions

7. Exercises (hands-on)

1) Extend the Nginx access parsing with a request_id field (extracted from a request header or the log line) and map it as keyword in the template.
2) Route logs whose latency exceeds 1 second into a logs-slow-* index (conditional routing).
3) Re-run parsing over the failures in logs-deadletter-* and report a repair-rate statistic.