12.4.5 Failover Drills and Verification Methods

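Failover drills verify three things: that master/backup switchover is reliable, that the service stays available, and that alerting makes the switchover observable. Run drills during off-peak hours, notify the people involved in advance, and have a rollback plan ready.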

[Figure: principle sketch of the failover verification path]

Pre-drill preparation (installation and baseline)
- Confirm that keepalived is installed and working (install it first if it is missing):

# CentOS/RHEL
yum -y install keepalived

# Ubuntu/Debian
apt-get update && apt-get -y install keepalived

# Verify the installation and version
keepalived --version
systemctl status keepalived
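- Back up the configuration and scripts (create the backup directory first, otherwise cp fails):
cp -a /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak
mkdir -p /etc/keepalived/backup
cp -a /etc/keepalived/*.sh /etc/keepalived/backup/
- Record a baseline on both nodes (ip addr shows where the VIP is bound; ss and curl check service connections):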
ip addr show dev eth0
ip -4 a | grep -E "10\.0\.0\.100"
ss -s
curl -s http://10.0.0.100/health
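The `grep -E "10\.0\.0\.100"` check matches substrings, so it would also succeed for an address such as `110.0.0.100`. Below is a minimal sketch of an exact check. It assumes the one-line-per-address output of `ip -o -4 addr show`, where field 4 is `ADDRESS/PREFIX`. `has_vip` is a hypothetical helper name:

```shell
# has_vip VIP: read `ip -o -4 addr show` output on stdin and exit 0 only if
# VIP is bound exactly (field 4 looks like 10.0.0.100/24).
# has_vip is a hypothetical helper, not part of keepalived or iproute2.
has_vip() {
  local vip="$1"
  awk -v vip="$vip" '
    { split($4, a, "/"); if (a[1] == vip) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage on each node:
#   ip -o -4 addr show dev eth0 | has_vip 10.0.0.100 && echo "VIP is on this node"
```

Run the same pipeline on both nodes before and after each drill. That gives an unambiguous yes/no answer per node.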

Drill methods and commands (with expected results and verification)
1. Process-level failover (stop keepalived)

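# Run on the master node
systemctl stop keepalived

# Expected: the VIP moves from the master to the backup.
# On the master, this should now print nothing:
ip -4 a | grep -E "10\.0\.0\.100"
# On the backup, this should show the VIP:
ip -4 a | grep -E "10\.0\.0\.100"

Note: systemctl stop ends the master's VRRP advertisements. On a clean shutdown keepalived sends a final advertisement with priority 0, so the backup is promoted to MASTER almost immediately instead of waiting for the master-down timeout.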
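To put a number on how long the backup takes to bind the VIP, you can poll on the backup node. Start the poll right before stopping keepalived on the master. Below is a minimal sketch. `wait_until` is a hypothetical helper, and the 0.5-second polling interval and the 15-second timeout in the usage line are arbitrary assumptions:

```shell
# wait_until TIMEOUT_S CMD [ARGS...]: rerun CMD every 0.5 s until it succeeds.
# On success prints the elapsed whole seconds (from `date +%s`, so +/- 1 s);
# returns 1 if TIMEOUT_S passes first. Hypothetical helper for the drill.
wait_until() {
  local timeout="$1" start now
  shift
  start=$(date +%s)
  until "$@"; do
    now=$(date +%s)
    if [ $((now - start)) -ge "$timeout" ]; then
      return 1
    fi
    sleep 0.5
  done
  echo $(( $(date +%s) - start ))
}

# Usage on the backup node:
#   wait_until 15 sh -c 'ip -o -4 addr show dev eth0 | grep -q " 10\.0\.0\.100/"' \
#     && echo "backup holds the VIP"
```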

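2. Network-level failover (block VRRP)
# On the BACKUP node, temporarily drop incoming VRRP (IP protocol 112; "vrrp" is its /etc/protocols name).
# An INPUT rule on the master would not help: the master's advertisements leave via OUTPUT and still reach the backup.
iptables -I INPUT -p vrrp -j DROP

# Expected: the backup becomes MASTER; check its log
journalctl -u keepalived -n 20 --no-pager

Note: with advertisements blocked, the backup's master-down timer expires and it takes over the VIP. The original master ignores the backup's lower-priority advertisements and usually stays MASTER as well. Both nodes then hold the VIP (split brain), so this drill also tests whether monitoring notices a duplicated VIP.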
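Because both nodes may claim MASTER in this drill, it helps to pull the most recent state transition out of each node's log. Below is a minimal sketch. It assumes keepalived's usual `Entering MASTER STATE` style messages, the same lines the `Entering|Leaving` filter in this section targets. `last_vrrp_state` is a hypothetical helper:

```shell
# last_vrrp_state: read keepalived log lines on stdin and print the state named
# in the last "Entering <STATE> STATE" message (MASTER, BACKUP, FAULT, ...),
# or UNKNOWN if there is none. Assumes keepalived's usual log wording.
last_vrrp_state() {
  awk '
    match($0, /Entering [A-Z]+ STATE/) {
      split(substr($0, RSTART, RLENGTH), w, " ")
      s = w[2]
    }
    END { print (s == "" ? "UNKNOWN" : s) }
  '
}

# Usage on each node:
#   journalctl -u keepalived -n 200 --no-pager | last_vrrp_state
```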

3. Health-check-triggered failover (service level)
Example health-check script (deploy it on both the master and the backup):
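cat >/etc/keepalived/check_nginx.sh <<'EOF'
#!/bin/bash
# Exit 0 if nginx answers /health with a non-error HTTP status, else exit 1.
# -f makes curl fail on HTTP >= 400; without it any HTTP response counts as healthy.
if curl -fs --connect-timeout 1 --max-time 1 http://127.0.0.1/health >/dev/null; then
  exit 0
else
  exit 1
fi
EOF
chmod +x /etc/keepalived/check_nginx.sh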

Reference it in keepalived.conf (key fragment):

vrrp_script chk_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 2
    weight -20
    fall 2
    rise 1
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 120
    track_script {
        chk_nginx
    }
    virtual_ipaddress {
        10.0.0.100/24 dev eth0
    }
}
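Whether this drill actually moves the VIP depends on the backup's priority, which the fragment does not show. With `priority 120` and `weight -20`, a failing check leaves the master at 100. If the backup is also at 100, the election is a tie, and VRRP breaks ties by the higher primary IP address rather than by the check result. Below is a minimal sketch of that arithmetic. The clamping to 1..254 is an assumption meant to mirror keepalived's handling of adjusted priorities, and the backup priorities in the usage lines are illustrative:

```shell
# effective_priority BASE WEIGHT CHECK_OK(1|0): priority after a track_script.
# Negative weight: subtract |WEIGHT| while the check fails.
# Positive weight: add WEIGHT while the check succeeds.
# Clamped to 1..254 (assumed to mirror keepalived).
effective_priority() {
  local base="$1" weight="$2" ok="$3" p="$1"
  if { [ "$weight" -lt 0 ] && [ "$ok" -eq 0 ]; } ||
     { [ "$weight" -gt 0 ] && [ "$ok" -eq 1 ]; }; then
    p=$((base + weight))
  fi
  if [ "$p" -lt 1 ]; then p=1; elif [ "$p" -gt 254 ]; then p=254; fi
  echo "$p"
}

# vrrp_winner MASTER_PRIO BACKUP_PRIO: which node should hold the VIP.
vrrp_winner() {
  if [ "$1" -gt "$2" ]; then echo master
  elif [ "$2" -gt "$1" ]; then echo backup
  else echo "tie (higher primary IP wins)"
  fi
}

# Usage (backup priorities are illustrative):
#   vrrp_winner "$(effective_priority 120 -20 0)" 100   # -> tie (higher primary IP wins)
#   vrrp_winner "$(effective_priority 120 -20 0)" 110   # -> backup
```

The same function shows why exercise 2 matters. With `weight -50` a failing master drops to 70, so any backup priority from 71 to 119 produces a clean failover.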

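Drill commands:

# Stop the service on the master to trigger failover
systemctl stop nginx

# Expected: after two consecutive check failures (fall 2) the master's priority drops
# from 120 to 100, and the backup takes over the VIP if its own priority is above 100
journalctl -u keepalived -n 100 | grep -E "Entering|Leaving|VRRP"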
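`fall 2` and `rise 1` mean the check must fail twice in a row before the weight is applied, and succeed once before it is removed. With `interval 2`, the priority therefore drops roughly 2 to 4 seconds after nginx stops. Below is a minimal sketch that replays a sequence of check exit codes under that rule. It assumes the tracked script starts in the OK state. `track_state` is a hypothetical name:

```shell
# track_state FALL RISE RC...: replay health-check exit codes (0 = success)
# and print the tracked state after each run, space-separated.
# Assumes the tracked script starts in the OK state.
track_state() {
  local fall="$1" rise="$2" state=OK fails=0 oks=0 out="" rc
  shift 2
  for rc in "$@"; do
    if [ "$rc" -eq 0 ]; then
      oks=$((oks + 1)); fails=0
      if [ "$state" = FAIL ] && [ "$oks" -ge "$rise" ]; then state=OK; fi
    else
      fails=$((fails + 1)); oks=0
      if [ "$state" = OK ] && [ "$fails" -ge "$fall" ]; then state=FAIL; fi
    fi
    out="${out:+$out }$state"
  done
  echo "$out"
}

# Usage: nginx stopped after the first check and restarted before the fifth
#   track_state 2 1 0 1 1 1 0
```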
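4. Preemption verification
# Recover the master: start nginx first so the health check passes, then keepalived
systemctl start nginx
systemctl start keepalived

# With preemption (keepalived's default), the master takes the VIP back.
# With nopreempt (only honored when the instance is configured with state BACKUP),
# the backup remains MASTER.
ip -4 a | grep -E "10\.0\.0\.100"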
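The keepalived documentation notes that `nopreempt` requires the instance's initial state to be BACKUP. A config that combines `nopreempt` with `state MASTER`, as in the fragment above, will still preempt. Below is a minimal sketch that flags that combination. It is a rough brace-counting check, not a keepalived parser. `check_nopreempt` is a hypothetical name:

```shell
# check_nopreempt: read keepalived.conf on stdin; print a warning and exit 1
# for any vrrp_instance that sets nopreempt together with "state MASTER".
# Rough brace counting only; not a full keepalived config parser.
check_nopreempt() {
  awk '
    $1 == "vrrp_instance" { inst = $2; state = ""; np = 0; depth = 0 }
    inst != "" {
      if ($1 == "state") state = $2
      if ($1 == "nopreempt") np = 1
      depth += gsub(/[{]/, "{")
      depth -= gsub(/[}]/, "}")
      if (depth == 0 && /[}]/) {
        if (np && state == "MASTER") {
          print "WARN: " inst " sets nopreempt with state MASTER"
          bad = 1
        }
        inst = ""
      }
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage:
#   check_nopreempt < /etc/keepalived/keepalived.conf
```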

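Verification and observation metrics (with command explanations)
- VIP ownership and ARP:

ip -4 a show dev eth0            # Is the VIP bound on this node?
arping -I eth0 -c 3 10.0.0.100   # Run from another host on the same subnet; the replying MAC shows which node holds the VIP
- Service availability and failover time (RTO):
time curl -s http://10.0.0.100/health   # Latency of one request only; RTO needs repeated probing during the switchover
- Logs and alerts:
journalctl -u keepalived -n 200
tail -n 200 /var/log/messages | grep -i keepalived   # RHEL/CentOS; on Ubuntu/Debian use /var/log/syslog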
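`time curl` measures a single request, not the outage window. One way to approximate RTO is to probe the VIP continuously from a client during the drill and take the longest run of failed probes. Below is a minimal sketch. The 0.2-second interval and 1-second timeout are assumptions, `probe_loop` and `longest_outage` are hypothetical names, and `date +%s.%N` requires GNU date:

```shell
# probe_loop URL COUNT: probe URL COUNT times, about every 0.2 s, printing
# "<epoch seconds> <curl exit code>" per probe (0 = HTTP status below 400).
probe_loop() {
  local url="$1" n="$2" i=0 rc
  while [ "$i" -lt "$n" ]; do
    if curl -fs -o /dev/null --max-time 1 "$url"; then rc=0; else rc=$?; fi
    printf '%s %d\n' "$(date +%s.%N)" "$rc"
    i=$((i + 1))
    sleep 0.2
  done
}

# longest_outage: read probe_loop output on stdin and print, in seconds, the
# longest gap from the first failed probe of a run to the next successful one.
# An outage still open at the end of the input is not counted.
longest_outage() {
  awk '
    $2 != 0 && start == "" { start = $1 }
    $2 == 0 && start != "" { d = $1 - start; if (d > max) max = d; start = "" }
    END { printf "%.1f\n", max + 0 }
  '
}

# Usage from a client while the drill runs:
#   probe_loop http://10.0.0.100/health 300 > probes.txt
#   longest_outage < probes.txt
```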

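Common troubleshooting (when a drill fails)
- VIP does not move: check that virtual_router_id is identical on both nodes, that priority differs in the intended direction, and whether nopreempt is set.
- No VRRP heartbeat: check that firewalls and security groups allow IP protocol 112.

iptables -S | grep -Ei -- '-p (vrrp|112)'   # With -L -n the protocol may print as 112, not vrrp
- Script does not run: check permissions, path, exit status, and SELinux.
ls -l /etc/keepalived/check_nginx.sh
/etc/keepalived/check_nginx.sh; echo "exit=$?"   # Run it by hand: 0 means healthy
getenforce     # Enforcing, Permissive, or Disabled
setenforce 0   # Temporarily switch to permissive to see whether SELinux blocks the script
setenforce 1   # Restore enforcing mode after the test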
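For the "VIP does not move" case, a quick way to compare the nodes is to extract the relevant keys from each config and diff them. `virtual_router_id` should be identical, while `priority` should differ. Below is a minimal sketch. `conf_values` is a hypothetical helper, and `/tmp/backup-keepalived.conf` is an assumed local copy of the backup's config:

```shell
# conf_values KEY: print the value of every KEY line in a keepalived.conf
# read on stdin, skipping comment lines (keepalived accepts '#' and '!').
conf_values() {
  awk -v key="$1" '$1 ~ /^[#!]/ { next } $1 == key { print $2 }'
}

# Usage: no diff output expected for virtual_router_id
#   conf_values virtual_router_id < /etc/keepalived/keepalived.conf > /tmp/a
#   conf_values virtual_router_id < /tmp/backup-keepalived.conf > /tmp/b
#   diff /tmp/a /tmp/b
```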

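Rollback and recovery

# Remove the VRRP DROP rule on the node where it was inserted
iptables -D INPUT -p vrrp -j DROP
# Restore services on the master (nginx first, so the health check passes when keepalived starts)
systemctl start nginx
systemctl start keepalived

# Confirm that the VIP and the service are stable
ip -4 a | grep -E "10\.0\.0\.100"
curl -s http://10.0.0.100/health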
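`iptables -D` removes only one matching rule. If the drill's `-I` was run more than once, VRRP stays blocked after rollback. Below is a minimal sketch that keeps deleting while `iptables -C` still finds the rule; run it on the node where the rule was inserted. The `IPTABLES` override is an assumption added only so the loop can be exercised without root. `remove_all_vrrp_drops` is a hypothetical name:

```shell
# remove_all_vrrp_drops: delete every copy of the drill rule
# "INPUT -p vrrp -j DROP" and report how many were removed.
# IPTABLES may name a stand-in command (e.g. for testing); default is iptables.
remove_all_vrrp_drops() {
  local ipt="${IPTABLES:-iptables}" n=0
  # iptables -C exits 0 while a matching rule still exists.
  while "$ipt" -C INPUT -p vrrp -j DROP 2>/dev/null; do
    "$ipt" -D INPUT -p vrrp -j DROP || return 1
    n=$((n + 1))
  done
  echo "removed $n rule(s)"
}
```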

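Exercises
1. Set the vrrp_script interval to 1 second, measure the failover time, and record the RTO.
2. Change weight -20 to -50 and observe how the threshold for moving the VIP changes.
3. Configure nopreempt (with state BACKUP on both nodes), simulate the master recovering, and verify that the VIP stays on the backup.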