16.8.2 Log Collection and Centralized Analysis
In Kubernetes, log collection and centralized analysis bring container stdout/stderr, node system logs, and the logs of key middleware together in one place, solving the problems of scattered logs, pod rescheduling (container drift), and hard-to-reproduce failures. The goals are logs that are searchable, correlatable, and traceable, with support for alerting integration and audit requirements.
Principles and architecture (agent-based collection):
Log types and sources:
- Application container stdout/stderr (structured JSON preferred)
- Node system logs (journald, syslog)
- Kubernetes component logs (kubelet, apiserver, scheduler, controller-manager)
- Middleware logs (MySQL, Nginx, Redis, Kafka, etc.)
- Audit logs and security events
Installation example: Fluent Bit + Elasticsearch (Kubernetes)#
1) Install Elasticsearch (this example uses the official Helm chart):
helm repo add elastic https://helm.elastic.co
helm repo update
kubectl create ns logging
helm install es elastic/elasticsearch \
  -n logging \
  --set replicas=1 \
  --set minimumMasterNodes=1
2) Deploy Fluent Bit as a DaemonSet to collect container logs and ship them to ES:
# File: fluent-bit-daemonset.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: logging
---
# RBAC so the kubernetes filter below can query pod metadata from the API server
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit
rules:
- apiGroups: [""]
  resources: ["pods", "namespaces"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit
subjects:
- kind: ServiceAccount
  name: fluent-bit
  namespace: logging
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit
      containers:
      - name: fluent-bit
        image: cr.fluentbit.io/fluent/fluent-bit:2.1
        args: ["-c", "/fluent-bit/etc/fluent-bit.conf"]
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: containers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config
          mountPath: /fluent-bit/etc
      volumes:
      - name: varlog
        hostPath: { path: /var/log }
      - name: containers
        hostPath: { path: /var/lib/docker/containers }
      - name: config
        configMap: { name: fluent-bit-config }
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush        1
        Daemon       Off
        Log_Level    info
        Parsers_File parsers.conf
    [INPUT]
        Name            tail
        Path            /var/log/containers/*.log
        Parser          docker
        Tag             kube.*
        Mem_Buf_Limit   50MB
        Skip_Long_Lines On
    [FILTER]
        Name      kubernetes
        Match     kube.*
        Kube_URL  https://kubernetes.default.svc:443
        Merge_Log On
        Keep_Log  Off
    [OUTPUT]
        Name            es
        Match           kube.*
        Host            es-elasticsearch.logging.svc.cluster.local
        Port            9200
        Index           k8s-logs
        Logstash_Format On
  parsers.conf: |
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
Apply it:
kubectl apply -f fluent-bit-daemonset.yaml
kubectl -n logging get pods
Expected results:
- One Fluent Bit Pod running on every node
- k8s-logs-* indices appearing in ES
- Logs queryable from Kibana/Grafana
Structured log example (application side)#
Applications should emit JSON that includes correlation fields such as trace_id:
{"time":"2024-05-01T10:00:01.123Z","level":"INFO","trace_id":"7f9a","request_id":"abc123","namespace":"prod","pod":"web-7d9c","msg":"login success"}
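As a minimal sketch of producing such records with Python's standard library (field names follow the example above; the formatter itself is illustrative, not a project standard):

```python
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON line with the agreed field set."""
    def format(self, record):
        return json.dumps({
            # millisecond-precision UTC timestamp with a Z suffix
            "time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z",
            "level": record.levelname,
            "trace_id": getattr(record, "trace_id", ""),
            "request_id": getattr(record, "request_id", ""),
            "msg": record.getMessage(),
        })

logger = logging.getLogger("app")
handler = logging.StreamHandler(sys.stdout)  # stdout, so the node agent picks it up
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# extra= attaches the correlation fields to this one record
logger.info("login success", extra={"trace_id": "7f9a", "request_id": "abc123"})
```

Writing to stdout rather than a file keeps the application runtime-agnostic: the kubelet and the node agent handle persistence and rotation.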
Key commands and explanations#
Inspect the container log path on a node:
ls -l /var/log/containers
# Expected: *.log files whose names encode the pod and namespace
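That naming convention can be used programmatically. A sketch, assuming the `<pod>_<namespace>_<container>-<container-id>.log` layout that kubelet uses for these symlinks:

```python
import re

# kubelet names each symlink <pod>_<namespace>_<container>-<container-id>.log
LOG_NAME = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_(?P<container>.+)-(?P<cid>[0-9a-f]{64})\.log$"
)

def parse_log_filename(name: str) -> dict:
    """Extract pod/namespace/container metadata from a container log filename."""
    m = LOG_NAME.match(name)
    if not m:
        raise ValueError(f"not a container log filename: {name}")
    return m.groupdict()

# hypothetical filename built from the structured-log example above
info = parse_log_filename("web-7d9c_prod_nginx-" + "a" * 64 + ".log")
print(info["pod"], info["namespace"], info["container"])  # web-7d9c prod nginx
```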
Verify the collector is reading:
kubectl -n logging logs -l app=fluent-bit --tail=5
# Expected: log lines about records being flushed to ES / buffering activity
Check that the ES index exists:
kubectl -n logging port-forward svc/es-elasticsearch 9200:9200 &
curl -s http://127.0.0.1:9200/_cat/indices?v
# Expected: k8s-logs-* indices are listed
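The index check can be automated by parsing the tabular `_cat/indices?v` response. A sketch (the sample output below is illustrative; the code locates columns from the header row rather than hard-coding positions):

```python
def find_indices(cat_output: str, prefix: str) -> list:
    """Parse `GET /_cat/indices?v` output and return index names matching a prefix."""
    lines = cat_output.strip().splitlines()
    header = lines[0].split()
    idx_col = header.index("index")  # locate the 'index' column by name
    names = [line.split()[idx_col] for line in lines[1:]]
    return sorted(n for n in names if n.startswith(prefix))

# illustrative sample of what curl returns
sample = """health status index               uuid pri rep docs.count docs.deleted store.size pri.store.size
green  open   k8s-logs-2024.05.01 aaaa 1   0   1200       0            2mb        2mb
green  open   .kibana             bbbb 1   0   10         0            1mb        1mb"""
print(find_indices(sample, "k8s-logs-"))  # ['k8s-logs-2024.05.01']
```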
Multiline log handling example (Java stack traces)#
# Add a multiline rule to Fluent Bit (the [MULTILINE_PARSER] block belongs in parsers.conf)
[MULTILINE_PARSER]
    Name          java_multiline
    Type          regex
    Flush_Timeout 1000
    Rule "start_state" "/^\d{4}-\d{2}-\d{2}/" "cont"
    Rule "cont"        "/^\s+at\s+/"          "cont"
[INPUT]
    Name             tail
    Path             /var/log/containers/*.log
    Multiline.Parser java_multiline
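The merge behavior those two rules produce can be sketched in Python (the regexes mirror the rules above; this is an illustration of the grouping logic, not Fluent Bit's implementation):

```python
import re

START = re.compile(r"^\d{4}-\d{2}-\d{2}")  # a new record starts with a date
CONT = re.compile(r"^\s+at\s+")            # stack frames continue the current record

def merge_multiline(lines):
    """Group continuation lines (Java stack frames) under the line that started them."""
    records, current = [], None
    for line in lines:
        if START.match(line) or current is None:
            if current is not None:
                records.append(current)
            current = line
        elif CONT.match(line):
            current += "\n" + line
        else:
            records.append(current)
            current = line
    if current is not None:
        records.append(current)
    return records

logs = [
    "2024-05-01 10:00:01 ERROR NullPointerException",
    "    at com.example.Service.run(Service.java:42)",
    "    at com.example.Main.main(Main.java:10)",
    "2024-05-01 10:00:02 INFO request handled",
]
print(len(merge_multiline(logs)))  # 2: one 3-line exception record, one INFO record
```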
Troubleshooting checklist and commands#
1) Missing logs
- Check container log rotation (Docker runtime):
cat /etc/docker/daemon.json
# Look at log-opts.max-size / max-file (on containerd, check kubelet's containerLogMaxSize/containerLogMaxFiles instead)
- Inspect Fluent Bit's tail state:
kubectl -n logging exec -it ds/fluent-bit -- ls /fluent-bit/tail
2) Wrong timestamps
- Verify node NTP synchronization:
timedatectl status
chronyc tracking
3) Slow writes
- Tune batching and backend capacity:
[OUTPUT]
    Name        es
    Buffer_Size 5M
    Retry_Limit False
Exercises#
1) Deploy an nginx Pod, send it some requests, and verify its logs land in the ES index.
2) Deliberately emit a multiline exception stack trace and verify the lines are merged correctly.
3) Add a filter rule that keeps only level=ERROR logs and compare the index growth rate.
4) Add an extra field in Fluent Bit:
[FILTER]
    Name  modify
    Match kube.*
    Add   env prod
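For exercise 3, Fluent Bit's grep filter (`Regex level ERROR`) does the dropping; the logic it applies per record can be sketched in Python (record shape assumed to match the structured-log example earlier):

```python
import json

def keep_errors(lines):
    """Mimic a level filter: keep only records whose level field is ERROR."""
    kept = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable lines are dropped in this sketch
        if record.get("level") == "ERROR":
            kept.append(record)
    return kept

stream = [
    '{"level":"INFO","msg":"login success"}',
    '{"level":"ERROR","msg":"db timeout"}',
    "not json at all",
]
print([r["msg"] for r in keep_errors(stream)])  # ['db timeout']
```

Filtering at the agent, rather than in ES, is what slows index growth: dropped records never leave the node.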
Best practices#
- Standardize on structured JSON logs with a common field set (time, level, trace_id, request_id, etc.)
- Split log streams by namespace/environment
- Configure index lifecycle and retention policies (hot/cold tiering)
- Correlate logs with metrics and traces via trace_id
- Keep audit logs in a separate index with their own access controls
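The first practice can be enforced mechanically, e.g. in CI or in a sidecar check. A validator sketch (required fields come from the list above; the exact timestamp convention is an assumption):

```python
import json
from datetime import datetime

REQUIRED = ("time", "level", "trace_id", "request_id", "msg")
LEVELS = {"DEBUG", "INFO", "WARN", "ERROR"}

def validate_record(line: str) -> list:
    """Return a list of violations of the logging field standard (empty = compliant)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    problems = [f"missing field: {f}" for f in REQUIRED if f not in record]
    if record.get("level") not in LEVELS:
        problems.append(f"unknown level: {record.get('level')}")
    try:
        # assumed timestamp convention: RFC 3339 with milliseconds and a Z suffix
        datetime.strptime(record.get("time", ""), "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        problems.append("time is not ISO-8601 with ms + Z")
    return problems

good = ('{"time":"2024-05-01T10:00:01.123Z","level":"INFO",'
        '"trace_id":"7f9a","request_id":"abc123","msg":"ok"}')
print(validate_record(good))  # []
```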