19.6.3 Log Parsing, Normalization, and Indexing
Log parsing, normalization, and indexing form the core of a log management pipeline. The goal is to turn multi-source, multi-format logs into uniform, searchable, correlatable structured data, backed by efficient indexing for search and analysis. This section walks through a typical ELK/EFK setup: a concept sketch, installation, parsing and normalization, index design and lifecycle management, troubleshooting, and exercises.
Concept Sketch (Collect → Parse → Normalize → Index)
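In this section the four stages map onto concrete components like so:

Nginx access.log → Filebeat (collect & ship) → Logstash :5044 (grok parse + normalize)
                 → Elasticsearch (index templates + ILM) → curl / Kibana (search & analysis)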
1. Installation and Basic Pipeline Verification (Example)
The examples below target Debian/Ubuntu (on CentOS, substitute yum/dnf).
# 1) Install Elasticsearch (single-node lab setup)
#    apt-key is deprecated; import the Elastic GPG key into a dedicated keyring instead
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | \
  sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | \
  sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update
sudo apt install -y elasticsearch
# 2) Start the service and check its status
sudo systemctl enable --now elasticsearch
sudo systemctl status elasticsearch --no-pager
# 3) Install Logstash and Filebeat
sudo apt install -y logstash filebeat
# 4) Start the services
sudo systemctl enable --now logstash
sudo systemctl enable --now filebeat
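Note: Elasticsearch 8.x ships with TLS and authentication enabled by default, so the plain-HTTP curl check below would be refused out of the box. For this single-node lab only (never in production), security can be switched off first; the sed assumes the installer wrote an xpack.security.enabled: true line, otherwise edit the file by hand:

# Lab only: disable auth/TLS so plain http://127.0.0.1:9200 works
sudo sed -i 's/^xpack.security.enabled: true/xpack.security.enabled: false/' /etc/elasticsearch/elasticsearch.yml
# Also set enabled: false in the auto-generated xpack.security.http.ssl block, then:
sudo systemctl restart elasticsearch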
Pipeline verification (confirm ES is reachable):
curl -s http://127.0.0.1:9200 | jq .
# Expected output: JSON containing name, cluster_name, version, and related fields
2. Parsing and Normalization (Grok + Field Standards)
2.1 Sample Log
Sample application log (Nginx access):
10.0.0.1 - - [10/Oct/2023:13:55:36 +0800] "GET /api/v1/user?id=1 HTTP/1.1" 200 123 "-" "curl/7.68.0" 0.045
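The trailing 0.045 is not part of the stock combined format; it is the request time in seconds. A sketch of a matching log_format directive (the name timed_combined is illustrative, adjust to your nginx.conf):

# /etc/nginx/nginx.conf (inside the http block): combined format plus $request_time
log_format timed_combined '$remote_addr - $remote_user [$time_local] '
                          '"$request" $status $body_bytes_sent '
                          '"$http_referer" "$http_user_agent" $request_time';
access_log /var/log/nginx/access.log timed_combined;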
2.2 Logstash Parsing and Normalization Configuration
File: /etc/logstash/conf.d/nginx.conf
input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => {
      "message" => "%{IP:client_ip} - - \[%{HTTPDATE:timestamp}\] \"%{WORD:method} %{URIPATHPARAM:uri} HTTP/%{NUMBER:http_version}\" %{NUMBER:status:int} %{NUMBER:bytes:int} \"%{DATA:referrer}\" \"%{DATA:user_agent}\" %{NUMBER:latency:float}"
    }
    remove_field => ["message"]   # drop the raw line once parsing succeeds
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
    remove_field => ["timestamp"]
  }
  mutate {
    add_field => {
      "service"  => "nginx"
      "env"      => "prod"
      "log_type" => "access"
    }
    # [source][ip] is Logstash field-reference syntax for the nested source.ip field
    rename => { "client_ip" => "[source][ip]" }
  }
}

output {
  elasticsearch {
    hosts => ["http://127.0.0.1:9200"]
    index => "logs-nginx-%{+YYYY.MM.dd}"
  }
}
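To exercise the grok pattern in isolation before wiring up Filebeat, a throwaway stdin-to-stdout pipeline works (a sketch reusing the pattern above verbatim; startup takes a while because of the JVM):

# One-off parse test: pipe the sample line through grok and print the structured event
echo '10.0.0.1 - - [10/Oct/2023:13:55:36 +0800] "GET /api/v1/user?id=1 HTTP/1.1" 200 123 "-" "curl/7.68.0" 0.045' | \
sudo /usr/share/logstash/bin/logstash -e '
input { stdin {} }
filter {
  grok { match => { "message" => "%{IP:client_ip} - - \[%{HTTPDATE:timestamp}\] \"%{WORD:method} %{URIPATHPARAM:uri} HTTP/%{NUMBER:http_version}\" %{NUMBER:status:int} %{NUMBER:bytes:int} \"%{DATA:referrer}\" \"%{DATA:user_agent}\" %{NUMBER:latency:float}" } }
}
output { stdout { codec => rubydebug } }'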
Commands and expected results:
# 1) Validate the pipeline configuration
sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t
# 2) Restart Logstash to pick up the config
sudo systemctl restart logstash
# 3) Point Filebeat at Logstash: in /etc/filebeat/filebeat.yml, comment out the
#    default output.elasticsearch section and enable
#      output.logstash:
#        hosts: ["127.0.0.1:5044"]
#    (only one output may be active at a time)
sudo systemctl restart filebeat
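Filebeat also needs an input that watches the Nginx access log, which the stock filebeat.yml leaves disabled. A minimal sketch (the path assumes the distribution default):

# /etc/filebeat/filebeat.yml (excerpt): filestream input for Nginx access logs
filebeat.inputs:
  - type: filestream
    id: nginx-access            # filestream inputs require a unique id
    enabled: true
    paths:
      - /var/log/nginx/access.log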
2.3 Field Normalization Essentials
- Normalize timestamps into @timestamp with an explicit timezone
- Recommended mandatory fields: service, env, log_type, source.ip, status
- Coerce types: status:int, bytes:int, latency:float
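After the filter stage, the sample line should land in Elasticsearch roughly as follows (a sketch; Beats metadata fields such as agent and host are omitted):

{
  "@timestamp": "2023-10-10T05:55:36.000Z",
  "source": { "ip": "10.0.0.1" },
  "method": "GET",
  "uri": "/api/v1/user?id=1",
  "http_version": "1.1",
  "status": 200,
  "bytes": 123,
  "referrer": "-",
  "user_agent": "curl/7.68.0",
  "latency": 0.045,
  "service": "nginx",
  "env": "prod",
  "log_type": "access"
}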
3. Index Design and Templates (Preventing Field Drift)
Create an index template that pins field types, so inconsistent parsing cannot drift the mapping:
cat <<'EOF' > /tmp/logs-template.json
{
  "index_patterns": ["logs-nginx-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    },
    "mappings": {
      "dynamic": true,
      "properties": {
        "@timestamp": {"type": "date"},
        "service": {"type": "keyword"},
        "env": {"type": "keyword"},
        "log_type": {"type": "keyword"},
        "source.ip": {"type": "ip"},
        "status": {"type": "integer"},
        "bytes": {"type": "integer"},
        "latency": {"type": "float"},
        "uri": {"type": "keyword"},
        "method": {"type": "keyword"}
      }
    }
  }
}
EOF
curl -s -X PUT http://127.0.0.1:9200/_index_template/logs-template \
-H 'Content-Type: application/json' \
-d @/tmp/logs-template.json | jq .
Verify with a query:
curl -s -X GET "http://127.0.0.1:9200/logs-nginx-*/_search?size=1" | jq '.hits.hits[0]._source'
# Expected: every field comes back as normalized, structured data
4. Parse Failures and Quality Checks (Example)
Route logs that fail to parse into a quarantine (dead-letter) index so they can be repaired and replayed later:
filter {
  grok {
    match => { "message" => "..." }
    tag_on_failure => ["_grok_failed"]
  }
}
output {
  if "_grok_failed" in [tags] {
    elasticsearch { hosts => ["http://127.0.0.1:9200"] index => "logs-deadletter-%{+YYYY.MM.dd}" }
  } else {
    elasticsearch { hosts => ["http://127.0.0.1:9200"] index => "logs-nginx-%{+YYYY.MM.dd}" }
  }
}
Quality check (missing-field rate, example):
curl -s -X GET "http://127.0.0.1:9200/logs-nginx-*/_search" \
  -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggs": {
    "missing_status": { "missing": { "field": "status" } }
  }
}' | jq .
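The missing rate is missing_status.doc_count divided by hits.total.value; a variant that extracts both and the ratio (track_total_hits forces an exact total):

curl -s "http://127.0.0.1:9200/logs-nginx-*/_search" \
  -H 'Content-Type: application/json' -d '{
  "size": 0,
  "track_total_hits": true,
  "aggs": { "missing_status": { "missing": { "field": "status" } } }
}' | jq '{missing: .aggregations.missing_status.doc_count,
         total: .hits.total.value,
         rate: (.aggregations.missing_status.doc_count / .hits.total.value)}'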
5. Index Lifecycle Management (ILM) and Hot/Warm Tiering (Example)
A sample policy: roll over in hot after 7 days or 30 GB, shrink in warm, mark read-only in cold, delete after 90 days. (The cold-phase freeze action was removed in Elasticsearch 8.x, so readonly is used here instead.)
cat <<'EOF' > /tmp/logs-ilm.json
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_age": "7d", "max_size": "30gb" } } },
      "warm":   { "min_age": "7d",  "actions": { "shrink": { "number_of_shards": 1 } } },
      "cold":   { "min_age": "30d", "actions": { "readonly": {} } },
      "delete": { "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}
EOF
curl -s -X PUT http://127.0.0.1:9200/_ilm/policy/logs-ilm \
-H 'Content-Type: application/json' -d @/tmp/logs-ilm.json | jq .
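Creating the policy alone does nothing; it must be referenced from the index settings. One way (a sketch) is to add it to the logs-template from section 3. Note that the rollover action additionally requires a write alias or data stream, which the daily logs-nginx-YYYY.MM.dd naming used here does not provide; the age-based warm/cold/delete phases still apply:

# Add the policy name to the template's settings block, then re-PUT the template:
#   "settings": {
#     "number_of_shards": 1,
#     "number_of_replicas": 0,
#     "index.lifecycle.name": "logs-ilm"
#   }
curl -s -X PUT http://127.0.0.1:9200/_index_template/logs-template \
  -H 'Content-Type: application/json' -d @/tmp/logs-template.json | jq .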
6. Common Troubleshooting (with Commands)
1) Grok parse failures
# Tail the Logstash log
sudo tail -f /var/log/logstash/logstash-plain.log
# Look for the _grok_failed tag and the original message content
2) Field type conflicts
# Inspect the index mapping
curl -s http://127.0.0.1:9200/logs-nginx-*/_mapping | jq .
# If types disagree, fix the template and rebuild the index
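Documents already indexed keep their old mapping; after fixing the template, existing data can be copied into a fresh index that picks up the corrected types (a sketch with a hypothetical date suffix):

curl -s -X POST http://127.0.0.1:9200/_reindex -H 'Content-Type: application/json' -d '{
  "source": { "index": "logs-nginx-2023.10.10" },
  "dest":   { "index": "logs-nginx-2023.10.10-fixed" }
}' | jq .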
3) Filebeat delivers no data
sudo filebeat test output
sudo filebeat test config
sudo journalctl -u filebeat -n 100 --no-pager
# Check the input paths and file permissions
7. Exercises (Hands-On)
1) Extend the Nginx access parsing with a request_id field (extracted from a request header or the log line) and map it as keyword in the template.
2) Route logs with latency above 1 second into a logs-slow-* index (conditional routing).
3) Re-run parsing over the entries in logs-deadletter-* and report the repair rate.