12.4.3 Notification Scripts and Fault-Handling Workflow
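Keepalived runs notification scripts whenever a VRRP instance changes state. They support VIP takeover, service startup, alerting, and event logging. A notification script must be fast, idempotent, and re-entrant, and its actions must stay consistent with the health check results. This section covers a conceptual sketch, installation and preparation, a configuration example, script examples, the fault-handling workflow, and troubleshooting steps.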
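One way to keep a notification script safe to re-enter is to serialize its body with a lock, so that two state changes arriving close together do not interleave. The following is a minimal sketch, assuming `flock` from util-linux is installed; `run_locked`, `LOCK_FILE`, and the exit code 75 are illustrative choices, not Keepalived features.

```shell
#!/bin/bash
# Sketch: run a command under an exclusive lock with a bounded wait.
# LOCK_FILE and run_locked are hypothetical names chosen for this example.
LOCK_FILE="${LOCK_FILE:-/run/keepalived-notify.lock}"

run_locked() {
    # Usage: run_locked <max_wait_seconds> <command> [args...]
    local wait_s="$1"
    shift
    (
        # Wait up to wait_s seconds for an exclusive lock on file descriptor 9.
        flock -w "$wait_s" 9 || { echo "lock busy, skipping" >&2; exit 75; }
        # The lock is released automatically when this subshell exits.
        "$@"
    ) 9>"$LOCK_FILE"
}
```

A script could then wrap its main action, for example `run_locked 5 systemctl start nginx`. The bounded wait keeps the script fast even when another run holds the lock.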
Conceptual sketch (state transitions trigger scripts)
Installation and directory preparation
# Example for RHEL/CentOS
sudo yum install -y keepalived
# Directory for notification scripts
sudo mkdir -p /etc/keepalived/scripts
sudo chown root:root /etc/keepalived/scripts
sudo chmod 750 /etc/keepalived/scripts
# Log directory
sudo mkdir -p /var/log/keepalived
sudo chown root:root /var/log/keepalived
Keepalived configuration example (with notification scripts)
# /etc/keepalived/keepalived.conf
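global_defs {
    router_id LVS_NODE_1
    script_user root
    enable_script_security    # refuse root-run scripts whose path is writable by non-root users
}
vrrp_script chk_nginx {
    script "/etc/keepalived/scripts/check_nginx.sh"    # executed directly, so the file needs a shebang and the execute bit
    interval 2    # run the check every 2 seconds
    timeout 2     # a run longer than 2 seconds counts as a failure
    fall 3        # 3 consecutive failures mark the check as failed
    rise 2        # 2 consecutive successes mark it as healthy again
    # no weight is set, so a failed check moves the tracking instance to FAULT
}
vrrp_instance VI_1 {
    state BACKUP              # nopreempt requires the initial state to be BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    nopreempt                 # a recovered higher-priority node does not take MASTER back
    # preempt_delay 5         # alternative to nopreempt (delayed preemption); it has no effect while nopreempt is set
    authentication {
        auth_type PASS
        auth_pass 1111        # example value only; replace it with your own secret
    }
    virtual_ipaddress {
        10.0.0.100/24 dev eth0
    }
    track_script {
        chk_nginx
    }
    notify_master "/etc/keepalived/scripts/notify_master.sh VI_1 MASTER"
    notify_backup "/etc/keepalived/scripts/notify_backup.sh VI_1 BACKUP"
    notify_fault "/etc/keepalived/scripts/notify_fault.sh VI_1 FAULT"
}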
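Health check script example (with command explanation)
#!/bin/bash
# /etc/keepalived/scripts/check_nginx.sh
# Checks whether a TCP listener exists on port 80; a non-zero exit code counts as a failed check.
# The pattern requires whitespace after ":80" so that ports such as 8080 do not match.
ss -lnt | grep -qE ':80[[:space:]]' && exit 0
exit 1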
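A plain substring match on `:80` in `ss` output also matches ports such as `:8080`. A stricter variant compares the port field exactly. The sketch below assumes the usual `ss -lnt` column layout, where the fourth column is `Local Address:Port`; `has_listener` is a hypothetical helper name.

```shell
#!/bin/bash
# has_listener PORT
# Reads `ss -lnt` style output on stdin and succeeds only if some LISTEN
# socket has exactly PORT as its local port.
has_listener() {
    local port="$1"
    awk -v p="$port" '
        # Column 4 looks like 0.0.0.0:80 or [::]:80; the port is the last ":"-separated piece.
        $1 == "LISTEN" {
            n = split($4, parts, ":")
            if (parts[n] == p) found = 1
        }
        END { exit !found }
    '
}
```

Inside a check script this would be used as `ss -lnt | has_listener 80`, which returns 0 when the listener exists and 1 otherwise.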
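Notification script examples (idempotent, with logging and alerting)
#!/bin/bash
# /etc/keepalived/scripts/notify_master.sh
INSTANCE="$1"; STATE="$2"
VIP="10.0.0.100"
LOG="/var/log/keepalived/notify.log"
echo "$(date '+%F %T') $INSTANCE $STATE begin" >> "$LOG"
# Bind the VIP (idempotent). Keepalived adds the virtual_ipaddress entries itself on promotion,
# so this guarded add is only a safety net and is skipped when the address is already present.
if ! ip -o -4 addr show dev eth0 | grep -qF " ${VIP}/"; then
    ip addr add "${VIP}/24" dev eth0
fi
# Start the service (example: nginx); starting an already running unit is a no-op
systemctl start nginx
systemctl is-active --quiet nginx || echo "$(date '+%F %T') nginx start failed" >> "$LOG"
# Alert (example: local logger)
logger -t keepalived "[$INSTANCE] to MASTER, VIP $VIP"
echo "$(date '+%F %T') $INSTANCE $STATE done" >> "$LOG"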
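#!/bin/bash
# /etc/keepalived/scripts/notify_backup.sh
INSTANCE="$1"; STATE="$2"
VIP="10.0.0.100"
LOG="/var/log/keepalived/notify.log"
echo "$(date '+%F %T') $INSTANCE $STATE begin" >> "$LOG"
# Unbind the VIP; errors are discarded so a rerun with the address already gone is harmless
ip addr del "${VIP}/24" dev eth0 2>/dev/null
# Degrade the service (example: stop a service that accepts writes).
# Caution: chk_nginx needs a listener on port 80, so stopping nginx here moves this instance to FAULT after `fall` failed checks.
systemctl stop nginx
logger -t keepalived "[$INSTANCE] to BACKUP, VIP removed"
echo "$(date '+%F %T') $INSTANCE $STATE done" >> "$LOG"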
#!/bin/bash
# /etc/keepalived/scripts/notify_fault.sh
INSTANCE="$1"; STATE="$2"
LOG="/var/log/keepalived/notify.log"
echo "$(date '+%F %T') $INSTANCE $STATE begin" >> "$LOG"
# Stop the service; chk_nginx keeps failing until nginx is started again
systemctl stop nginx
logger -t keepalived "[$INSTANCE] to FAULT, service stopped"
echo "$(date '+%F %T') $INSTANCE $STATE done" >> "$LOG"
# Make the scripts executable
sudo chmod +x /etc/keepalived/scripts/*.sh
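Besides the three per-state directives, Keepalived has a generic `notify` directive. According to the keepalived.conf manual, that script receives the object type (`GROUP` or `INSTANCE`), the object name, the target state, and the priority as arguments; verify the exact list against your installed version. Under that assumption, a single dispatcher could replace the three scripts. This is a sketch only: `handle_state` is a hypothetical function and the `echo` lines are placeholders for real actions.

```shell
#!/bin/bash
# handle_state KIND NAME STATE [PRIORITY]
# Dispatches on the target state; each branch only prints what it would do.
handle_state() {
    local kind="$1" name="$2" state="$3" prio="${4:-unknown}"
    case "$state" in
        MASTER) echo "promote $name (priority $prio)" ;;
        BACKUP) echo "demote $name" ;;
        FAULT)  echo "fault $name" ;;
        *)      echo "ignore $name state=$state" ;;   # e.g. states added by newer versions
    esac
}

# In a real notify script the arguments would be forwarded unchanged:
#   handle_state "$@"
```

Keeping all transitions in one file makes it easier to share logging and locking code between states.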
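Fault-handling workflow (actionable steps)
- Event detection: consecutive chk_nginx failures trigger FAULT (no weight configured) or a priority reduction (negative weight configured).
- Quick triage: the notification script reads $1/$2 to determine the state and performs idempotent actions.
- Resource takeover: the MASTER node binds the VIP and brings up the service.
- Service verification: the script validates the service with systemctl is-active or curl.
- Alerting: send alerts through logger, an HTTP webhook, or email.
- Fault record: write to /var/log/keepalived/notify.log.
- Failback policy: set nopreempt or preempt_delay to limit flapping.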
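The service verification step in the workflow above can be sketched as a bounded retry loop, so that a service that needs a moment to start is not reported as failed too early. `retry` is a hypothetical helper; the attempt count and delay are arbitrary example values.

```shell
#!/bin/bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
    local attempts="$1" delay="$2" i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        "$@" && return 0
        # Sleep only between attempts, not after the last one.
        (( i < attempts )) && sleep "$delay"
    done
    return 1
}

# Example use inside a notify script (the /health path is an assumption):
#   retry 5 1 curl -fsS -o /dev/null --max-time 2 http://127.0.0.1:80/health \
#       || echo "$(date '+%F %T') health check failed" >> /var/log/keepalived/notify.log
```

Keep the total wait (attempts times delay) short, because Keepalived state handling should not be held up by slow scripts.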
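Troubleshooting and verification (with explicit commands)
# Check the keepalived service status
systemctl status keepalived
# Follow the VRRP and script execution logs
journalctl -u keepalived -f
# Verify that the VIP is bound
ip addr show dev eth0 | grep 10.0.0.100
# Simulate a fault: stop nginx to trigger the failed check
systemctl stop nginx
# Verify that notify.log records the transition
tail -f /var/log/keepalived/notify.log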
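Reading notify.log by eye works for a single failover. For repeated tests, a small check can list script runs that logged `begin` but never `done`. The sketch assumes the log line layout used by the example scripts in this section (date, time, instance, state, marker); `unfinished_runs` is a hypothetical helper.

```shell
#!/bin/bash
# unfinished_runs LOGFILE
# Prints "INSTANCE STATE" for every begin entry that has no matching done entry.
unfinished_runs() {
    awk '
        # Fields: $1 date, $2 time, $3 instance, $4 state, $5 begin|done
        $5 == "begin" { pending[$3 " " $4]++ }
        $5 == "done"  { if (pending[$3 " " $4] > 0) pending[$3 " " $4]-- }
        END { for (k in pending) if (pending[k] > 0) print k }
    ' "$1"
}
```

Running `unfinished_runs /var/log/keepalived/notify.log` prints nothing when every logged run completed.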
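Common problems and fixes:
- Script not executed: check enable_script_security and the script permissions; confirm that the script path is correct.
- Service not started after failover: check the systemctl result and the SELinux/firewall settings.
- Frequent flapping: increase preempt_delay, raise the fall count, and avoid slow operations in scripts.
- VIP binding fails: check that the interface name and the VIP/netmask configuration match.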
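For the "script not executed" case, a quick audit of the script path can help. With enable_script_security, Keepalived is documented to refuse root-run scripts when any part of the path is writable by a non-root user. The sketch below checks only the group and other write bits of each path component and does not check ownership, so treat it as a first-pass aid. It assumes GNU `stat` and `readlink -f`; both function names are hypothetical.

```shell
#!/bin/bash
# writable_by_others PATH
# Succeeds if PATH has the group-write or other-write permission bit set.
writable_by_others() {
    local mode
    mode="$(stat -c '%a' "$1")" || return 2
    # Interpret the mode as octal and test it against the 022 bits.
    (( (8#$mode & 8#022) != 0 ))
}

# audit_script_path FILE
# Walks from FILE up to /, printing every component that fails the check.
# Returns 1 if any component was reported, 0 otherwise.
audit_script_path() {
    local p bad=0
    p="$(readlink -f "$1")" || return 2
    while :; do
        if writable_by_others "$p"; then
            echo "writable by group/other: $p"
            bad=1
        fi
        [ "$p" = "/" ] && break
        p="$(dirname "$p")"
    done
    return "$bad"
}
```

For example, `audit_script_path /etc/keepalived/scripts/notify_master.sh` should print nothing for the directory layout created earlier in this section.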
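Exercises
- Extend notify_master.sh with a curl http://127.0.0.1:80/health check, and write to the log when it fails.
- Extend notify_fault.sh to send an HTTP alert (to a self-hosted webhook endpoint).
- Simulate VIP migration with ip addr del/add and check whether the records in notify.log are complete.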
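As a starting point for the webhook exercise, the sketch below builds a small JSON payload and posts it with curl. Everything about the receiver is an assumption: `WEBHOOK_URL` and its default value, the field names, and the function names are hypothetical, and the escaping handles only backslashes and double quotes.

```shell
#!/bin/bash
# Hypothetical endpoint; replace it with your own webhook receiver.
WEBHOOK_URL="${WEBHOOK_URL:-http://127.0.0.1:9000/alert}"

# json_escape STRING: escapes backslashes and double quotes only.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# alert_payload INSTANCE STATE HOST: prints a one-line JSON object.
alert_payload() {
    printf '{"instance":"%s","state":"%s","host":"%s"}' \
        "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

# send_alert INSTANCE STATE: posts the payload with a short timeout so a slow
# receiver cannot stall the notify script.
send_alert() {
    curl -fsS --max-time 3 -o /dev/null \
        -H 'Content-Type: application/json' \
        -d "$(alert_payload "$1" "$2" "$(hostname)")" \
        "$WEBHOOK_URL"
}
```

In notify_fault.sh this could be called as `send_alert "$INSTANCE" "$STATE" || logger -t keepalived "webhook alert failed"`, so that a failed alert is still recorded locally.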