16.9.3 Release Strategies and Rollback Mechanisms
Release strategies and rollback mechanisms are built around three goals: controllable changes, observable rollouts, and fast rollback, so as to reduce change risk and shorten recovery time. This section uses native Kubernetes capabilities as the main thread and shows how to put them into practice with commands, configuration, and troubleshooting examples.
(Figure: sketch of the release and rollback flow)
1. Pre-release Preparation and Validation (installation and commands)
Before a release, make sure the basic tooling and permissions are in place:
# Install kubectl (Debian/Ubuntu example)
sudo apt-get update && sudo apt-get install -y ca-certificates curl
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key \
  | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] \
https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /" \
  | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update && sudo apt-get install -y kubectl
# Verify connectivity and permissions
kubectl cluster-info
kubectl auth can-i update deployment -n prod
Minimal validation of the image and configuration (paired with an explicit image tag and a change record):
# Pull the image and verify the tag (avoid latest)
docker pull registry.example.com/app:v1.2.3
# Check the Deployment's current image
kubectl -n prod get deploy app -o jsonpath='{.spec.template.spec.containers[0].image}'
# Record the change cause (shown by rollout history)
kubectl -n prod annotate deploy app kubernetes.io/change-cause="release v1.2.3"
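Tags are mutable, so for stricter pinning you can resolve the image digest and deploy by digest instead; a minimal sketch (the digest placeholder must be filled in from the inspect output):
# Resolve the digest of the pulled image
docker inspect --format '{{index .RepoDigests 0}}' registry.example.com/app:v1.2.3
# Deploy by digest so a re-tagged image cannot slip in
kubectl -n prod set image deploy/app app=registry.example.com/app@sha256:<digest>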
2. Release Strategies with Runnable Examples
2.1 Rolling Update (Deployment)
Example Deployment (with maxSurge/maxUnavailable and probes):
# File: k8s/app-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: prod
spec:
  replicas: 4
  revisionHistoryLimit: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # allow at most 1 extra Pod during the update
      maxUnavailable: 1   # allow at most 1 Pod to be unavailable
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: app
        image: registry.example.com/app:v1.2.3
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /live
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
Apply the release and watch its progress:
kubectl apply -f k8s/app-deploy.yaml
kubectl -n prod rollout status deploy/app
kubectl -n prod get pods -l app=web -w
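If metrics look suspicious mid-rollout, you can pause the rollout, investigate, and then resume; the Deployment stays in a mixed-version state while paused:
kubectl -n prod rollout pause deploy/app
# ...inspect dashboards and logs...
kubectl -n prod rollout resume deploy/app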
2.2 Blue/Green Release (Service switch)
Run the old and new versions side by side and switch the Service selector between them:
# File: k8s/app-blue.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
  namespace: prod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
      version: blue
  template:
    metadata:
      labels:
        app: web
        version: blue
    spec:
      containers:
      - name: app
        image: registry.example.com/app:v1.2.2
        ports:
        - containerPort: 8080
---
# File: k8s/app-green.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
  namespace: prod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
      version: green
  template:
    metadata:
      labels:
        app: web
        version: green
    spec:
      containers:
      - name: app
        image: registry.example.com/app:v1.2.3
        ports:
        - containerPort: 8080
---
# File: k8s/app-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: app-svc
  namespace: prod
spec:
  selector:
    app: web
    version: blue
  ports:
  - port: 80
    targetPort: 8080
Deploy both colors, then switch to green:
kubectl apply -f k8s/app-blue.yaml -f k8s/app-green.yaml -f k8s/app-svc.yaml
kubectl -n prod patch svc app-svc -p '{"spec":{"selector":{"app":"web","version":"green"}}}'
kubectl -n prod get endpoints app-svc
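Rolling back is just repointing the selector at blue; keep the blue Deployment running until green has been confirmed healthy:
kubectl -n prod patch svc app-svc -p '{"spec":{"selector":{"app":"web","version":"blue"}}}'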
2.3 Canary Release (proportional traffic shifting)
Use two Deployments plus two Services, with an Ingress splitting traffic by weight (the example targets the NGINX Ingress controller; note the primary Ingress for stable traffic, shown after the canary Ingress below):
# File: k8s/app-stable.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-stable
  namespace: prod
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
      track: stable
  template:
    metadata:
      labels:
        app: web
        track: stable
    spec:
      containers:
      - name: app
        image: registry.example.com/app:v1.2.2
        ports:
        - containerPort: 8080
---
# File: k8s/app-canary.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-canary
  namespace: prod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
      track: canary
  template:
    metadata:
      labels:
        app: web
        track: canary
    spec:
      containers:
      - name: app
        image: registry.example.com/app:v1.2.3
        ports:
        - containerPort: 8080
---
# File: k8s/svc-stable.yaml
apiVersion: v1
kind: Service
metadata:
  name: app-stable-svc
  namespace: prod
spec:
  selector:
    app: web
    track: stable
  ports:
  - port: 80
    targetPort: 8080
---
# File: k8s/svc-canary.yaml
apiVersion: v1
kind: Service
metadata:
  name: app-canary-svc
  namespace: prod
spec:
  selector:
    app: web
    track: canary
  ports:
  - port: 80
    targetPort: 8080
---
# File: k8s/ingress-canary.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: prod
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "20"   # 20% of traffic goes to canary
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-canary-svc
            port:
              number: 80
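The canary annotations only take effect when a primary (non-canary) Ingress for the same host routes to the stable Service; a minimal sketch of that primary Ingress (the file and resource names are illustrative), to be applied together with the canary resources below:
# File: k8s/ingress-stable.yaml (name is illustrative)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress-stable
  namespace: prod
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-stable-svc
            port:
              number: 80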
Apply and observe:
kubectl apply -f k8s/app-stable.yaml -f k8s/app-canary.yaml
kubectl apply -f k8s/svc-stable.yaml -f k8s/svc-canary.yaml -f k8s/ingress-canary.yaml
kubectl -n prod describe ingress app-ingress | grep -i canary
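To ramp the canary up (for example 10 → 30 → 50, as in the exercises below), overwrite the weight annotation in place and re-check the error rate after each step:
kubectl -n prod annotate ingress app-ingress \
  nginx.ingress.kubernetes.io/canary-weight="30" --overwrite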
3. Rollback Mechanisms and Commands
3.1 Application Rollback (Deployment revisions)
# List the revision history
kubectl -n prod rollout history deploy/app
# Roll back to the previous revision
kubectl -n prod rollout undo deploy/app
# Roll back to a specific revision
kubectl -n prod rollout undo deploy/app --to-revision=3
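Before targeting a specific revision, check what it contains; this is where the kubernetes.io/change-cause annotation from section 1 pays off:
kubectl -n prod rollout history deploy/app --revision=3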
3.2 Configuration Rollback (versioned ConfigMap/Secret)
Use version-suffixed ConfigMaps and reference them from the Pod spec:
# File: k8s/app-config-v2.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config-v2
  namespace: prod
data:
  APP_MODE: "prod"
  FEATURE_X: "on"
---
# File: k8s/app-deploy-config.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: prod
spec:
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: app
        image: registry.example.com/app:v1.2.3
        envFrom:
        - configMapRef:
            name: app-config-v2
Roll back to v1:
kubectl -n prod apply -f k8s/app-config-v1.yaml
kubectl -n prod patch deploy app \
-p '{"spec":{"template":{"spec":{"containers":[{"name":"app","envFrom":[{"configMapRef":{"name":"app-config-v1"}}]}]}}}}'
kubectl -n prod rollout status deploy/app
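To verify the rollback landed, check the environment inside a running Pod (this assumes the variables above are what the application reads):
kubectl -n prod exec deploy/app -- env | grep -E 'APP_MODE|FEATURE_X'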
3.3 Data Rollback (example: forward-compatible fields)
Schema changes should be forward compatible: the new column below carries a DEFAULT, so the previous application version keeps working against the migrated schema, which makes rolling the application back safe before (or without) rolling back the schema. Example migration scripts (prepare and rehearse them before the release):
-- File: db/migrate_v123.sql
ALTER TABLE orders ADD COLUMN discount_rate DECIMAL(5,2) DEFAULT 0;
-- Rollback script
-- File: db/rollback_v123.sql
ALTER TABLE orders DROP COLUMN discount_rate;
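A sketch of applying and reverting the scripts with the MySQL client (host, user, and database names are illustrative; PostgreSQL users would run the same files via psql -f):
mysql -h db.example.com -u deploy -p app_db < db/migrate_v123.sql
# Only after the application rollback is confirmed and the column is unused:
mysql -h db.example.com -u deploy -p app_db < db/rollback_v123.sql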
4. Observability Gates and Automated Damage Control
Use a Prometheus rule to raise an alert, then roll back manually or automatically:
# File: monitoring/app-slo-rule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-slo
  namespace: monitoring
spec:
  groups:
  - name: app-slo
    rules:
    - alert: AppHighErrorRate
      expr: sum(rate(http_requests_total{job="app",code=~"5.."}[5m])) / sum(rate(http_requests_total{job="app"}[5m])) > 0.02
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "app 5xx error rate above 2%"
Rollback trigger example (manual):
kubectl -n prod rollout undo deploy/app
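For the automated path, a minimal sketch is a loop that polls the Alertmanager v2 API and runs the undo when the alert fires (the Alertmanager address is an assumption; production setups more commonly wire this through an Alertmanager webhook receiver or a progressive-delivery controller such as Argo Rollouts or Flagger):
#!/usr/bin/env bash
# Auto-rollback sketch; the Alertmanager address below is an assumption.
AM_URL="http://alertmanager.monitoring:9093"
while true; do
  # Count active alerts matching alertname="AppHighErrorRate"
  firing=$(curl -fsS "${AM_URL}/api/v2/alerts?active=true&filter=alertname%3D%22AppHighErrorRate%22" | jq 'length')
  if [ "${firing:-0}" -gt 0 ]; then
    kubectl -n prod rollout undo deploy/app
    break
  fi
  sleep 30
done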
5. Troubleshooting Checklist (with commands)
1) Rollout stuck:
kubectl -n prod describe deploy app
kubectl -n prod get pods -l app=web -o wide
Watch for: ImagePullBackOff, readiness probe failures, CrashLoopBackOff.
2) Traffic not reaching the new version:
kubectl -n prod get svc app-svc -o yaml | grep selector -A2
kubectl -n prod get endpoints app-svc
Check whether the Service selector points at the intended labels.
3) Canary weight not taking effect:
kubectl -n prod describe ingress app-ingress
kubectl -n ingress-nginx logs deploy/ingress-nginx-controller | tail -n 50
Confirm the IngressClass and annotations are in effect, and that the primary Ingress from section 2.3 exists.
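For any of the cases above, recent events in the namespace usually pinpoint the cause:
kubectl -n prod get events --sort-by=.lastTimestamp | tail -n 20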
6. Exercises and Verification
1) Rolling update hands-on:
- Bump the image tag to v1.2.4 and watch rollout status and the Pod turnover.
- Record the revision history, then roll back to the previous revision.
2) Blue/green switch drill:
- Deploy blue and green, then switch the Service selector.
- Verify the switch with curl http://app.example.com/version.
3) Canary ramp-up:
- Raise canary-weight 10 → 30 → 50 and watch the error rate.
- If the error rate crosses the threshold, run the rollback command and record the time taken.
4) Configuration rollback drill:
- Switch the ConfigMap from v1 to v2, then back to v1, and verify the configuration takes effect.
With a standardized flow (change review → pre-production validation → canary/rolling release → observability gate → full rollout → retrospective), combined with GitOps/Helm/Kustomize and Service Mesh traffic management, releases become orchestrable, auditable, and measurable.