16.4.2 CoreDNS and Cluster DNS Resolution

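CoreDNS is the default cluster DNS server in Kubernetes. It resolves Service names, Pod names, and external domain names to IP addresses. For Pods using the default ClusterFirst DNS policy, the kubelet writes an /etc/resolv.conf whose nameserver entry is the ClusterIP of the kube-dns Service, which is how Pods perform service discovery and reach each other by name.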

How Resolution Works

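- Service resolution: <service>.<namespace>.svc.<cluster-domain> → ClusterIP; a headless Service (clusterIP: None) returns the IPs of its ready Pods instead
- Pod resolution: with the pods option of the kubernetes plugin enabled, <pod-ip-with-dashes>.<namespace>.pod.<cluster-domain> resolves, for example 10-244-1-5.dns-test.pod.cluster.local
- Search domains: by default <namespace>.svc.<cluster-domain>, svc.<cluster-domain>, and <cluster-domain>, combined with options ndots:5
- External resolution: names outside the cluster domain are sent to upstream DNS servers by the forward plugin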
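
The naming rules above can be sketched as two small shell helpers. This is only an illustration: svc_fqdn and pod_fqdn are hypothetical names rather than part of kubectl or CoreDNS, the cluster domain is assumed to be cluster.local, and only IPv4 Pod addresses are handled.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of kubectl or CoreDNS.
CLUSTER_DOMAIN="${CLUSTER_DOMAIN:-cluster.local}"  # assumed default cluster domain

# svc_fqdn SERVICE NAMESPACE -> <service>.<namespace>.svc.<cluster-domain>
svc_fqdn() {
    printf '%s.%s.svc.%s\n' "$1" "$2" "$CLUSTER_DOMAIN"
}

# pod_fqdn POD_IP NAMESPACE -> <ip-with-dashes>.<namespace>.pod.<cluster-domain>
pod_fqdn() {
    dashed=$(printf '%s' "$1" | tr '.' '-')   # 10.244.1.5 -> 10-244-1-5
    printf '%s.%s.pod.%s\n' "$dashed" "$2" "$CLUSTER_DOMAIN"
}

svc_fqdn web dns-test          # prints web.dns-test.svc.cluster.local
pod_fqdn 10.244.1.5 dns-test   # prints 10-244-1-5.dns-test.pod.cluster.local
```

The Pod IP 10.244.1.5 is a made-up example; substitute an address reported by kubectl get pod -o wide.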

Installation and Deployment Checks (In-Cluster Self-Test)

# 1) Confirm the coredns Deployment exists
kubectl -n kube-system get deploy coredns

# 2) Confirm the Service and its ClusterIP
kubectl -n kube-system get svc kube-dns

# 3) Inspect the Corefile configuration
kubectl -n kube-system get cm coredns -o yaml

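Expected result: the kube-dns Service exists, the CoreDNS Pods are Running (the READY column of the Deployment shows all replicas), and the Corefile contains the kubernetes and forward plugins.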

Key Corefile Configuration Example (Usable As-Is)

# Edit the CoreDNS configuration
kubectl -n kube-system edit cm coredns

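# /etc/coredns/Corefile
.:53 {
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    # Upstream resolvers; replace with addresses reachable from your cluster
    forward . 223.5.5.5 8.8.8.8
    cache 30
    # Logs every query; useful for debugging, noisy under heavy load
    log
    reload
}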

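Directive notes:
- kubernetes: watches the API server and maintains the Service/Endpoint mappings for the cluster domain
- forward: sends queries for names outside the cluster domain to the upstream DNS servers
- cache 30: caches responses for at most 30 seconds, reducing upstream requests and latency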

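Applying the change: with the reload plugin enabled, CoreDNS picks up the edited Corefile on its own once the updated ConfigMap has been synced into the Pod, which can take a minute or so. To apply it immediately, restart the Deployment: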

kubectl -n kube-system rollout restart deploy coredns
kubectl -n kube-system rollout status deploy coredns

Resolution Verification and Examples (With Expected Output)

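# 1) Create a test Service
kubectl create ns dns-test
kubectl -n dns-test create deploy web --image=nginx --port=80
kubectl -n dns-test expose deploy web --port=80

# 2) Verify resolution from a temporary Pod
# --command overrides the image entrypoint (the agnhost binary) so that a shell starts instead
kubectl -n dns-test run dnsutils -it --rm --restart=Never --image=registry.k8s.io/e2e-test-images/agnhost:2.45 --command -- sh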

Inside the Pod, run:

# Resolve the Service by its short name
nslookup web

# Resolve the fully qualified domain name
nslookup web.dns-test.svc.cluster.local

# Resolve an external domain
nslookup www.baidu.com

Expected: web resolves to the ClusterIP of the Service, and the external domain resolves to public IP addresses.

Troubleshooting and Diagnostics (With Commands and Explanations)

# 1) View the CoreDNS logs
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=100

# 2) Check the CoreDNS resource requests and limits
kubectl -n kube-system get deploy coredns -o jsonpath='{.spec.template.spec.containers[0].resources}{"\n"}'

# 3) Confirm the resolv.conf of a Pod
kubectl -n dns-test exec deploy/web -- cat /etc/resolv.conf

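Common failures and fixes
- Resolution timeouts:
  - Check the CoreDNS replica count and resources, and scale out if needed: kubectl -n kube-system scale deploy coredns --replicas=2
  - Confirm the upstream DNS is reachable. The CoreDNS image ships no shell, so query the upstream directly from a test Pod instead, for example nslookup www.baidu.com 223.5.5.5
- Service records not updating:
  - Check the Endpoints: kubectl -n dns-test get ep web
  - If the CoreDNS logs show errors reaching the API server, check the RBAC permissions of the coredns ServiceAccount and any NetworkPolicy in kube-system
- Cross-namespace resolution failures:
  - Use the fully qualified name service.ns.svc.cluster.local
  - Check the search line in the Pod's /etc/resolv.conf
- DNS load too high:
  - Add replicas, raise the cache TTL, and tune ndots so that each lookup triggers fewer search-domain expansions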
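
To see why ndots matters for DNS load, the sketch below lists the names a glibc-style resolver would try for one lookup, given an ndots value and a search list. The function name candidates is hypothetical, and the behaviour modelled here is a simplification: real resolvers stop at the first successful answer and usually issue both A and AAAA queries per candidate, and other libc implementations (musl, for instance) order their attempts differently.

```shell
#!/bin/sh
# candidates NAME NDOTS SEARCH_DOMAIN...
# Prints, in order, the names a glibc-style resolver would try (worst case).
candidates() {
    name=$1; ndots=$2; shift 2
    # A trailing dot marks the name as fully qualified: no search expansion.
    case $name in
        *.) printf '%s\n' "$name"; return 0 ;;
    esac
    # Count the dots in the name.
    only_dots=$(printf '%s' "$name" | tr -cd '.')
    dots=${#only_dots}
    # With at least ndots dots, the name is tried as-is first.
    if [ "$dots" -ge "$ndots" ]; then
        printf '%s.\n' "$name"
    fi
    # Each search domain is appended in turn.
    for domain in "$@"; do
        printf '%s.%s.\n' "$name" "$domain"
    done
    # With fewer dots than ndots, the as-is attempt comes last.
    if [ "$dots" -lt "$ndots" ]; then
        printf '%s.\n' "$name"
    fi
}

candidates www.baidu.com 5 dns-test.svc.cluster.local svc.cluster.local cluster.local
```

With ndots:5, the external name www.baidu.com (two dots) is expanded through all three search domains before it is tried as-is, so one lookup can cost four round trips. Lowering ndots in the Pod's dnsConfig, or writing external names with a trailing dot, skips those extra queries.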

Production Practices and Monitoring Examples

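# 1) Set up an HPA (example; requires metrics-server and CPU requests on the CoreDNS container)
kubectl -n kube-system autoscale deploy coredns --min=2 --max=5 --cpu-percent=60

# 2) Expose and check the metrics (requires the prometheus plugin, e.g. "prometheus :9153", in the Corefile)
# port-forward runs in the background here; alternatively run it in a second terminal
kubectl -n kube-system port-forward deploy/coredns 9153:9153 &
curl -s http://127.0.0.1:9153/metrics | head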

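Key metrics:
- coredns_dns_requests_total: total number of requests (named coredns_dns_request_count_total before CoreDNS 1.7.0)
- coredns_dns_request_duration_seconds: request latency (a histogram)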
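
As a rough way to read a counter off the /metrics endpoint, the sketch below sums every labelled series of one metric from Prometheus text-format input. sum_metric is a hypothetical helper, the sample lines are made up for illustration and carry fewer labels than real CoreDNS output, and the script assumes samples have no trailing timestamp. It uses coredns_dns_requests_total, the counter name in CoreDNS 1.7.0 and later.

```shell
#!/bin/sh
# sum_metric NAME: reads Prometheus text format on stdin and sums all samples of NAME.
sum_metric() {
    awk -v metric="$1" '
        /^#/ { next }                      # skip HELP/TYPE comment lines
        {
            series = $1
            sub(/[{].*/, "", series)       # drop the label set, keep the metric name
            if (series == metric) total += $NF
        }
        END { printf "%d\n", total }
    '
}

# Made-up sample input; real output has more labels per series.
sum_metric coredns_dns_requests_total <<'EOF'
# TYPE coredns_dns_requests_total counter
coredns_dns_requests_total{type="A"} 12
coredns_dns_requests_total{type="AAAA"} 8
coredns_dns_responses_total{rcode="NOERROR"} 20
EOF
```

Against a live cluster, the same function can be fed from the port-forward: curl -s http://127.0.0.1:9153/metrics | sum_metric coredns_dns_requests_total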

Exercises

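- Create a headless Service in the dns-test namespace and observe that nslookup returns multiple Pod IPs.
- Change the forward upstream address to an unreachable one, observe the resolution timeouts, and locate the cause in the logs.
- Change cache 30 to cache 300 and compare resolution performance and the resulting log output.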