16.9.3 Release Strategies and Rollback Mechanisms

Release strategies and rollback mechanisms revolve around being controllable, observable, and quick to revert; the goal is to reduce change risk and shorten recovery time when something goes wrong. This section focuses on native Kubernetes capabilities and shows how to put them into practice with commands, configuration, and troubleshooting examples.

(Figure: sketch of the release and rollback flow.)

1. Pre-release preparation and validation (installation and command notes)

Before releasing, make sure the cluster has the basic tooling and the required permissions:

# Install kubectl (example for Debian/Ubuntu)
sudo apt-get update && sudo apt-get install -y ca-certificates curl
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key \
  | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] \
https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /" \
| sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update && sudo apt-get install -y kubectl

# Verify connectivity and permissions
kubectl cluster-info
kubectl auth can-i update deployment -n prod

Minimal validation of the image and configuration (tying the image tag to a change record):

# Pull the image and verify the tag (avoid latest)
docker pull registry.example.com/app:v1.2.3

# Check the Deployment's current image
kubectl -n prod get deploy app -o jsonpath='{.spec.template.spec.containers[0].image}'

# Record the change cause (shown in rollout history)
kubectl -n prod annotate deploy app kubernetes.io/change-cause="release v1.2.3"
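Tags can be re-pushed, so for stricter validation you can resolve the tag to an immutable digest and deploy by digest instead; a minimal sketch, assuming the image has already been pulled locally:

# Resolve the tag to a digest reference
DIGEST=$(docker inspect --format='{{index .RepoDigests 0}}' registry.example.com/app:v1.2.3)
echo "$DIGEST"   # e.g. registry.example.com/app@sha256:...

# Deploy by digest so the running image cannot change under the same tag
kubectl -n prod set image deploy/app app="$DIGEST"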

2. Release strategies with runnable examples

2.1 Rolling update (Deployment)

Example Deployment with maxSurge/maxUnavailable and probes:

# File: k8s/app-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: prod
spec:
  replicas: 4
  revisionHistoryLimit: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # allow at most 1 extra Pod during the update
      maxUnavailable: 1   # allow at most 1 Pod to be unavailable
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: app
        image: registry.example.com/app:v1.2.3
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /live
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10

Apply the release and watch its progress:

kubectl apply -f k8s/app-deploy.yaml
kubectl -n prod rollout status deploy/app
kubectl -n prod get pods -l app=web -w
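Instead of editing the manifest, the same rolling update can also be triggered imperatively and paused mid-way to inspect the first new Pods; a small sketch (v1.2.4 is just an illustrative next tag):

# Update the image in place; this starts a new rolling update
kubectl -n prod set image deploy/app app=registry.example.com/app:v1.2.4

# Optionally pause after the first Pods are replaced, inspect, then resume
kubectl -n prod rollout pause deploy/app
kubectl -n prod get pods -l app=web
kubectl -n prod rollout resume deploy/app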

2.2 Blue-green release (Service switchover)

Switch between the old and new versions by changing the Service selector:

# File: k8s/app-blue.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
  namespace: prod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
      version: blue
  template:
    metadata:
      labels:
        app: web
        version: blue
    spec:
      containers:
      - name: app
        image: registry.example.com/app:v1.2.2
        ports:
        - containerPort: 8080
---
# File: k8s/app-green.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
  namespace: prod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
      version: green
  template:
    metadata:
      labels:
        app: web
        version: green
    spec:
      containers:
      - name: app
        image: registry.example.com/app:v1.2.3
        ports:
        - containerPort: 8080
---
# File: k8s/app-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: app-svc
  namespace: prod
spec:
  selector:
    app: web
    version: blue
  ports:
  - port: 80
    targetPort: 8080

Switch traffic to green:

kubectl apply -f k8s/app-blue.yaml -f k8s/app-green.yaml -f k8s/app-svc.yaml
kubectl -n prod patch svc app-svc -p '{"spec":{"selector":{"app":"web","version":"green"}}}'
kubectl -n prod get endpoints app-svc
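Rolling back a blue-green release is the same selector patch in the opposite direction, which is why the switchover is near-instant; a minimal sketch:

# Point the Service back at the blue Deployment
kubectl -n prod patch svc app-svc -p '{"spec":{"selector":{"app":"web","version":"blue"}}}'

# Confirm the endpoints now belong to the blue Pods
kubectl -n prod get pods -l app=web,version=blue -o wide
kubectl -n prod get endpoints app-svc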

2.3 Canary release (weighted traffic ramp-up)

Use two Deployments and two Services, splitting traffic by weight at the Ingress layer (the example is based on the NGINX Ingress controller). Note that the canary annotations only take effect on a second Ingress paired with a primary Ingress for the same host: the primary routes to the stable Service, and the canary Ingress routes to the canary Service:

# File: k8s/app-stable.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-stable
  namespace: prod
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
      track: stable
  template:
    metadata:
      labels:
        app: web
        track: stable
    spec:
      containers:
      - name: app
        image: registry.example.com/app:v1.2.2
        ports:
        - containerPort: 8080
---
# File: k8s/app-canary.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-canary
  namespace: prod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
      track: canary
  template:
    metadata:
      labels:
        app: web
        track: canary
    spec:
      containers:
      - name: app
        image: registry.example.com/app:v1.2.3
        ports:
        - containerPort: 8080
---
# File: k8s/svc-stable.yaml
apiVersion: v1
kind: Service
metadata:
  name: app-stable-svc
  namespace: prod
spec:
  selector:
    app: web
    track: stable
  ports:
  - port: 80
    targetPort: 8080
---
# File: k8s/svc-canary.yaml
apiVersion: v1
kind: Service
metadata:
  name: app-canary-svc
  namespace: prod
spec:
  selector:
    app: web
    track: canary
  ports:
  - port: 80
    targetPort: 8080
---
# File: k8s/ingress-stable.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: prod
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-stable-svc
            port:
              number: 80
---
# File: k8s/ingress-canary.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress-canary
  namespace: prod
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "20"   # 20% of traffic goes to the canary
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-canary-svc
            port:
              number: 80

Apply and observe:

kubectl apply -f k8s/app-stable.yaml -f k8s/app-canary.yaml
kubectl apply -f k8s/svc-stable.yaml -f k8s/svc-canary.yaml -f k8s/ingress-stable.yaml -f k8s/ingress-canary.yaml
kubectl -n prod describe ingress app-ingress-canary | grep -i canary
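Ramping up is just an update to the canary-weight annotation; once the canary looks healthy, promotion usually means moving the stable track to the new image and retiring the canary. A sketch of one possible sequence:

# Raise the canary share to 50%
kubectl -n prod annotate ingress app-ingress-canary \
  nginx.ingress.kubernetes.io/canary-weight="50" --overwrite

# Promote: roll the stable track to the new image, then retire the canary
kubectl -n prod set image deploy/app-stable app=registry.example.com/app:v1.2.3
kubectl -n prod rollout status deploy/app-stable
kubectl -n prod delete ingress app-ingress-canary
kubectl -n prod scale deploy app-canary --replicas=0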

3. Rollback mechanisms and commands

3.1 Application rollback (Deployment revisions)

# List the revision history
kubectl -n prod rollout history deploy/app

# Roll back to the previous revision
kubectl -n prod rollout undo deploy/app

# Roll back to a specific revision
kubectl -n prod rollout undo deploy/app --to-revision=3
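After an undo it is worth confirming that the rollout finished and which image is actually running; a quick check:

# Wait for the rollback to complete and inspect the resulting image
kubectl -n prod rollout status deploy/app
kubectl -n prod get deploy app -o jsonpath='{.spec.template.spec.containers[0].image}'

# Show the Pod template recorded for a specific revision
kubectl -n prod rollout history deploy/app --revision=3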

3.2 Configuration rollback (versioned ConfigMaps/Secrets)

Use version-suffixed ConfigMaps and reference the desired version from the Pod spec:

# File: k8s/app-config-v2.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config-v2
  namespace: prod
data:
  APP_MODE: "prod"
  FEATURE_X: "on"
---
# File: k8s/app-deploy-config.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: prod
spec:
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: app
        image: registry.example.com/app:v1.2.3
        envFrom:
        - configMapRef:
            name: app-config-v2

Roll back to v1 (app-config-v1.yaml is the previously released ConfigMap version):

kubectl -n prod apply -f k8s/app-config-v1.yaml
kubectl -n prod patch deploy app \
  -p '{"spec":{"template":{"spec":{"containers":[{"name":"app","envFrom":[{"configMapRef":{"name":"app-config-v1"}}]}]}}}}'
kubectl -n prod rollout status deploy/app
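To confirm the configuration rollback reached the running Pods, check both the Deployment spec and the environment a freshly started Pod sees; a small sketch, assuming the container image ships a standard env binary:

# The Deployment should now reference app-config-v1
kubectl -n prod get deploy app -o jsonpath='{.spec.template.spec.containers[0].envFrom[0].configMapRef.name}'

# A newly rolled Pod should expose the v1 values
kubectl -n prod exec deploy/app -- env | grep -E 'APP_MODE|FEATURE_X'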

3.3 Data rollback (example: a compatible, additive column)

Example migration scripts (prepare and rehearse them before the release):

-- File: db/migrate_v123.sql
ALTER TABLE orders ADD COLUMN discount_rate DECIMAL(5,2) DEFAULT 0;

-- Rollback script
-- File: db/rollback_v123.sql
ALTER TABLE orders DROP COLUMN discount_rate;
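How these scripts are executed depends on your migration tooling; a minimal sketch using the MySQL client directly, where the host, user, and database names are placeholders:

# Apply the forward migration before rolling out v1.2.3
mysql -h db.example.internal -u deployer -p orders < db/migrate_v123.sql

# Only drop the column if no running version still reads or writes it
mysql -h db.example.internal -u deployer -p orders < db/rollback_v123.sql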

4. Observability gates and automated stop-loss examples

Use a Prometheus rule to raise an alert and trigger a manual or automated rollback:

# File: monitoring/app-slo-rule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-slo
  namespace: monitoring
spec:
  groups:
  - name: app-slo
    rules:
    - alert: AppHighErrorRate
      expr: sum(rate(http_requests_total{job="app",code=~"5.."}[5m])) / sum(rate(http_requests_total{job="app"}[5m])) > 0.02
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "app 5xx error rate is too high"

Rollback trigger example (manual operation):

kubectl -n prod rollout undo deploy/app
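For automated stop-loss, one simple pattern is a gate-watching loop that queries Prometheus and rolls back once the threshold is breached; a minimal sketch, where the Prometheus URL is a placeholder and production setups would more commonly do this via Alertmanager webhooks or a progressive-delivery controller:

#!/usr/bin/env bash
# Poll the 5xx error-rate gate every 30s and roll back once it is breached
PROM=http://prometheus.monitoring.svc:9090
QUERY='sum(rate(http_requests_total{job="app",code=~"5.."}[5m])) / sum(rate(http_requests_total{job="app"}[5m]))'
while true; do
  rate=$(curl -s "$PROM/api/v1/query" --data-urlencode "query=$QUERY" \
    | jq -r '.data.result[0].value[1] // "0"')
  if awk -v r="$rate" 'BEGIN { exit !(r > 0.02) }'; then
    echo "error rate $rate above 2%, rolling back"
    kubectl -n prod rollout undo deploy/app
    break
  fi
  sleep 30
done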

5. Troubleshooting checklist (with commands)

1) The rollout is stuck:

kubectl -n prod describe deploy app
kubectl -n prod get pods -l app=web -o wide

Look for: ImagePullBackOff, Readiness probe failed, CrashLoopBackOff.
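Recent events and the state of the new ReplicaSet usually pinpoint why a rollout is stuck; for example:

# Recent events and ReplicaSet status for the rollout
kubectl -n prod get events --sort-by=.lastTimestamp | tail -n 20
kubectl -n prod get rs -l app=web

# If the new version is clearly bad, abort by rolling back
kubectl -n prod rollout undo deploy/app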

2) Traffic has not switched to the new version:

kubectl -n prod get svc app-svc -o yaml | grep selector -A2
kubectl -n prod get endpoints app-svc

Check whether the Service selector points at the correct labels.
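Comparing the selector with the actual Pod labels makes mismatches obvious:

# Show Pod labels so they can be compared with the Service selector
kubectl -n prod get pods -l app=web --show-labels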

3) The canary weight is not taking effect:

kubectl -n prod describe ingress app-ingress app-ingress-canary
kubectl -n ingress-nginx logs deploy/ingress-nginx-controller | tail -n 50

Confirm that the IngressClass and the canary annotations are in effect, and that the primary Ingress for the same host exists.

6. Exercises and verification

1) Hands-on rolling update:
- Change the image tag to v1.2.4 and watch rollout status and the Pod changes.
- Note the revision history and roll back to the previous revision.

2) Blue-green switchover drill:
- Deploy blue and green, then switch the Service selector.
- Verify the switch via curl http://app.example.com/version.

3) Canary ramp-up:
- Raise canary-weight from 10 → 30 → 50 and watch the error rate.
- If the error rate exceeds the threshold, run the rollback command and record how long recovery takes.

4) Configuration rollback drill:
- Switch the ConfigMap from v1 to v2 and back to v1, and verify that the configuration takes effect.

With a standardized flow of change review → pre-production validation → canary/rolling release → observability gates → full rollout → retrospective, combined with GitOps/Helm/Kustomize and Service Mesh traffic management, releases become orchestrable, auditable, and measurable.