16.8.2 Log Collection and Centralized Analysis

In Kubernetes, log collection and centralized analysis bring container stdout/stderr, node system logs, and key middleware logs into one place, solving the problems of scattered logs, pod rescheduling, and hard-to-reproduce failures. The goals are searchability, correlation, and traceability, with support for alert integration and audit requirements.

Principle and architecture (agent-based collection):

(Figure: agent-based log collection architecture)

Log types and sources:
- Application container stdout/stderr (structured JSON preferred)
- Node system logs (journald, syslog)
- Kubernetes component logs (kubelet, apiserver, scheduler, controller-manager)
- Middleware logs (MySQL, Nginx, Redis, Kafka, etc.)
- Audit logs and security events
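As a sketch, these sources map onto Fluent Bit input plugins: `tail` for container log files and `systemd` for journald units. The paths and unit name below are typical defaults, not requirements:

```
[INPUT]
    Name    tail
    Path    /var/log/containers/*.log
    Tag     kube.*

[INPUT]
    Name            systemd
    Systemd_Filter  _SYSTEMD_UNIT=kubelet.service
    Tag             host.kubelet
```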

Installation example: Fluent Bit + Elasticsearch (Kubernetes)

1) Install Elasticsearch (using the official Helm chart):

helm repo add elastic https://helm.elastic.co
helm repo update
kubectl create ns logging

helm install es elastic/elasticsearch \
  -n logging \
  --set replicas=1 \
  --set minimumMasterNodes=1

2) Deploy Fluent Bit as a DaemonSet to collect container logs and ship them to ES:

# File: fluent-bit-daemonset.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: logging
---
# RBAC: the kubernetes filter below queries the API server for pod metadata
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit
rules:
- apiGroups: [""]
  resources: ["pods", "namespaces"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit
subjects:
- kind: ServiceAccount
  name: fluent-bit
  namespace: logging
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit
      containers:
      - name: fluent-bit
        image: cr.fluentbit.io/fluent/fluent-bit:2.1
        args: ["-c", "/fluent-bit/etc/fluent-bit.conf"]
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: containers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config
          mountPath: /fluent-bit/etc
      volumes:
      - name: varlog
        hostPath: { path: /var/log }
      - name: containers
        hostPath: { path: /var/lib/docker/containers }
      - name: config
        configMap: { name: fluent-bit-config }
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush        1
        Daemon       Off
        Log_Level    info
        Parsers_File parsers.conf

    [INPUT]
        Name        tail
        Path        /var/log/containers/*.log
        # "docker" assumes JSON-file logs; on containerd/CRI-O use the cri parser instead
        Parser      docker
        Tag         kube.*
        Mem_Buf_Limit 50MB
        Skip_Long_Lines On

    [FILTER]
        Name        kubernetes
        Match       kube.*
        Kube_URL    https://kubernetes.default.svc:443
        Merge_Log   On
        Keep_Log    Off

    [OUTPUT]
        Name        es
        Match       kube.*
        # Service name created by the official elastic chart (clusterName-nodeGroup);
        # ES 8.x enables TLS/auth by default and may need extra tls/HTTP_User settings
        Host        elasticsearch-master.logging.svc.cluster.local
        Port        9200
        # With Logstash_Format On, the index name comes from Logstash_Prefix (k8s-logs-YYYY.MM.DD)
        Logstash_Format On
        Logstash_Prefix k8s-logs

  parsers.conf: |
    [PARSER]
        Name   docker
        Format json
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L

    [PARSER]
        # For containerd/CRI-O runtimes
        Name   cri
        Format regex
        Regex  ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z
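With the kubernetes filter above (Merge_Log On, Keep_Log Off), the application's JSON fields are lifted to the top level and pod metadata is attached under a kubernetes key. An illustrative record as it might arrive in ES (field values are examples):

```
{
  "@timestamp": "2024-05-01T10:00:01.123Z",
  "level": "INFO",
  "msg": "login success",
  "trace_id": "7f9a",
  "kubernetes": {
    "namespace_name": "prod",
    "pod_name": "web-7d9c",
    "container_name": "web",
    "host": "node-1"
  }
}
```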

Apply:

kubectl apply -f fluent-bit-daemonset.yaml
kubectl -n logging get pods

Expected result:
- Fluent Bit runs one Pod per node
- k8s-logs-* indices appear in ES
- Logs are queryable from Kibana/Grafana

Structured logging example (application side)

Applications should emit JSON that includes fields such as trace_id:

{"time":"2024-05-01T10:00:01.123Z","level":"INFO","trace_id":"7f9a","request_id":"abc123","namespace":"prod","pod":"web-7d9c","msg":"login success"}
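A minimal Python sketch of such a formatter. Field names follow the example above; trace_id/request_id are passed via `extra` here, where a real service would pull them from request context:

```python
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line matching the field convention above."""
    def format(self, record):
        entry = {
            "time": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
            "level": record.levelname,
            # In a real service these come from request context middleware
            "trace_id": getattr(record, "trace_id", None),
            "request_id": getattr(record, "request_id", None),
            "msg": record.getMessage(),
        }
        return json.dumps(entry, ensure_ascii=False)

logger = logging.getLogger("app")
handler = logging.StreamHandler(sys.stdout)  # stdout, so the node agent can tail it
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("login success", extra={"trace_id": "7f9a", "request_id": "abc123"})
```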

Key commands, explained

Inspect the container log path on a node:

ls -l /var/log/containers
# Expect: *.log files whose names contain pod/namespace information

Verify the collector is reading:

kubectl -n logging logs -l app=fluent-bit --tail=5
# Expect: messages about shipping/buffering to ES

Check that the ES index exists:

kubectl -n logging port-forward svc/elasticsearch-master 9200:9200
curl -s 'http://127.0.0.1:9200/_cat/indices?v'
# Expect: k8s-logs-* indices listed

Multi-line log handling (Java stack traces)

# Multiline rules: the MULTILINE_PARSER block belongs in the parsers file loaded via Parsers_File
[MULTILINE_PARSER]
    Name          java_multiline
    Type          regex
    Flush_Timeout 1000
    Rule      "start_state"  "/^\d{4}-\d{2}-\d{2}/"  "cont"
    Rule      "cont"         "/^\s+at\s+/"           "cont"

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Multiline.Parser  java_multiline
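The merge behavior can be illustrated with a small Python sketch whose regexes mirror the two rules above: a timestamped line opens a record, and indented "at ..." lines are folded into the record before them.

```python
import re

# These patterns mirror the Fluent Bit rules above
START = re.compile(r"^\d{4}-\d{2}-\d{2}")  # a new record starts with a date
CONT = re.compile(r"^\s+at\s+")            # a stack frame continues the record

def merge_multiline(lines):
    """Merge Java stack-trace continuation lines into the preceding record."""
    records = []
    for line in lines:
        if records and not START.match(line) and CONT.match(line):
            records[-1] += "\n" + line  # fold continuation into previous record
        else:
            records.append(line)
    return records

sample = [
    "2024-05-01 10:00:01 ERROR NullPointerException",
    "    at com.example.Service.handle(Service.java:42)",
    "    at com.example.Main.main(Main.java:10)",
    "2024-05-01 10:00:02 INFO next request",
]
```

Running `merge_multiline(sample)` collapses the four input lines into two records: one three-line exception and one ordinary log line.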

Troubleshooting checklist and commands

1) Missing logs
- Check container log rotation:

cat /etc/docker/daemon.json
# Look at log-opts.max-size / max-file

- Inspect Fluent Bit buffering (assumes the tail input's DB/state lives under /fluent-bit/tail):

kubectl -n logging exec -it ds/fluent-bit -- ls /fluent-bit/tail

2) Timestamp skew
- Check node NTP synchronization:

timedatectl status
chronyc tracking

3) Write latency
- Tune batching and backend capacity:

[OUTPUT]
    Name es
    Buffer_Size 5M
    # Retry_Limit False means retry indefinitely
    Retry_Limit False

Exercises

1) Deploy an nginx, send it requests, and verify the logs reach the ES index.
2) Write a multi-line exception stack trace on purpose and verify it is merged into one record.
3) Add a filter that keeps only level=ERROR logs and compare index growth rates.
4) Add a field in Fluent Bit:

[FILTER]
    Name modify
    Match kube.*
    Add  env  prod
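For exercise 3, one option is Fluent Bit's grep filter, which drops records unless a field matches a regex; this sketch assumes the structured field is named level:

```
[FILTER]
    Name    grep
    Match   kube.*
    Regex   level ^ERROR$
```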

Best practices

- Standardize structured JSON logs and field names (time, level, trace_id, request_id, etc.)
- Split log streams by namespace/environment
- Configure index lifecycle and retention policies (hot/cold tiering)
- Correlate with metrics and traces via trace_id
- Keep audit logs in a dedicated index with separate access control
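For the lifecycle/retention item, an illustrative ILM policy sketch; the policy name and thresholds are examples, and the request assumes the port-forwarded ES endpoint from earlier:

```
curl -X PUT 'http://127.0.0.1:9200/_ilm/policy/k8s-logs-retention' \
  -H 'Content-Type: application/json' -d '
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_age": "1d", "max_size": "20gb" } } },
      "delete": { "min_age": "7d", "actions": { "delete": {} } }
    }
  }
}'
```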