7.10.2 Multi-Instance and Load-Balancing Configuration in Practice

This section focuses on deploying multiple Nginx instances on a single host or across a cluster, with load balancing and resource isolation. Giving each instance its own ports, working directories, and log paths lets several services coexist and release on independent schedules; combined with upstream load-balancing policies, this provides traffic distribution and basic failover at the application layer.

Conceptual sketch: multiple instances and load balancing#

(Figure: sketch of the relationship between multiple Nginx instances and load balancing)

Key points for multi-instance deployment#

  • Directory isolation: give each instance its own conf, logs, run, and cache directories so configurations and logs never mix.
  • Port planning: assign HTTP, HTTPS, and management ports explicitly to prevent port conflicts.
  • Process management: start and stop instances via systemd or dedicated scripts, with a separate PID file per instance.
  • Resource control: bound each instance's footprint with worker_processes, worker_connections, and CPU affinity.
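For the resource-control point above, the per-instance knobs look like the fragment below (values are illustrative, not tuning advice; worker_cpu_affinity is Linux-only):

```nginx
# Top of an instance's nginx.conf: pin two workers to CPU cores 0 and 1.
# One bitmask per worker: 0001 -> CPU0, 0010 -> CPU1.
worker_processes     2;
worker_cpu_affinity  0001 0010;

events {
    # Upper bound on simultaneous connections per worker process.
    worker_connections  1024;
}
```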

Installation and directory preparation (example)#

Installation from binary packages (substitute your distribution's own packages as needed):

# 1) Install dependencies and Nginx
yum -y install epel-release
yum -y install nginx

# 2) Create the instance directories (instances A and B)
for i in a b; do
  mkdir -p /opt/nginx-$i/{conf,logs,run,cache}
done

# 3) Copy a base configuration (using the default config as a template)
cp /etc/nginx/nginx.conf /opt/nginx-a/conf/nginx.conf
cp /etc/nginx/nginx.conf /opt/nginx-b/conf/nginx.conf
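Before copying configs onto a real host, it helps to sanity-check that the per-instance tree is complete. A minimal sketch against a throwaway root (swap /tmp/nginx-demo for /opt in production):

```python
import os

# Recreate and verify the per-instance directory tree under a demo root.
ROOT = "/tmp/nginx-demo"
INSTANCES = ["a", "b"]
SUBDIRS = ["conf", "logs", "run", "cache"]

def create_layout(root):
    """Create conf/logs/run/cache for each instance, like the shell loop above."""
    for i in INSTANCES:
        for d in SUBDIRS:
            os.makedirs(os.path.join(root, f"nginx-{i}", d), exist_ok=True)

def verify_layout(root):
    """Return the list of missing directories (empty means the layout is complete)."""
    return [
        os.path.join(f"nginx-{i}", d)
        for i in INSTANCES
        for d in SUBDIRS
        if not os.path.isdir(os.path.join(root, f"nginx-{i}", d))
    ]

create_layout(ROOT)
print(verify_layout(ROOT))  # [] when every expected directory exists
```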

Multi-instance configuration example (instance A + instance B)#

Instance A listens on 80/443 and balances to App1; instance B listens on 8080/8443 and balances to App2.
Note: paths such as pid, the log files, and client_body_temp_path must be isolated per instance.

# /opt/nginx-a/conf/nginx.conf
worker_processes  2;
pid /opt/nginx-a/run/nginx.pid;

events {
  worker_connections  1024;
}

http {
  include       /etc/nginx/mime.types;  # absolute path: a relative include resolves against the compile-time prefix
  default_type  application/octet-stream;

  access_log  /opt/nginx-a/logs/access.log;
  error_log   /opt/nginx-a/logs/error.log;

  client_body_temp_path /opt/nginx-a/cache/client_body;
  proxy_temp_path       /opt/nginx-a/cache/proxy;

  upstream app1 {
    least_conn;
    server 127.0.0.1:9001 max_fails=2 fail_timeout=5s;
    server 127.0.0.1:9002 max_fails=2 fail_timeout=5s;
  }

  server {
    listen 80;
    server_name app1.example.com;

    location / {
      proxy_pass http://app1;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
    }
  }
}
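The least_conn directive above routes each new request to the backend with the fewest active connections. A minimal Python sketch of that selection rule (a simplification: real nginx also applies server weights and failure tracking):

```python
def pick_least_conn(conns):
    """Return the backend address with the fewest active connections.

    `conns` maps backend address -> current active connection count.
    Ties go to the first backend in iteration order, mirroring the
    declaration order in the upstream block.
    """
    return min(conns, key=conns.get)

# 9001 is busy with 3 in-flight requests, 9002 with 1: the next
# request goes to 9002.
active = {"127.0.0.1:9001": 3, "127.0.0.1:9002": 1}
print(pick_least_conn(active))  # 127.0.0.1:9002
```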
# /opt/nginx-b/conf/nginx.conf
worker_processes  1;
pid /opt/nginx-b/run/nginx.pid;

events {
  worker_connections  512;
}

http {
  include       /etc/nginx/mime.types;  # absolute path: a relative include resolves against the compile-time prefix
  default_type  application/octet-stream;

  access_log  /opt/nginx-b/logs/access.log;
  error_log   /opt/nginx-b/logs/error.log;

  client_body_temp_path /opt/nginx-b/cache/client_body;
  proxy_temp_path       /opt/nginx-b/cache/proxy;

  upstream app2 {
    ip_hash;
    server 127.0.0.1:9101;
    server 127.0.0.1:9102;
  }

  server {
    listen 8080;
    server_name app2.example.com;

    location / {
      proxy_pass http://app2;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
    }
  }
}
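The ip_hash directive above pins each client to one backend for session affinity; for IPv4, nginx keys the hash on the first three octets, so a whole /24 network lands on the same server. A rough Python sketch of that keying (using CRC32 as a stand-in; nginx's internal hash function differs):

```python
import zlib

def ip_hash_pick(client_ip, backends):
    """Pick a backend keyed on the client's /24 network (first three
    IPv4 octets), so repeat requests from one client hit one server."""
    key = ".".join(client_ip.split(".")[:3])
    return backends[zlib.crc32(key.encode()) % len(backends)]

backends = ["127.0.0.1:9101", "127.0.0.1:9102"]
# Two addresses in the same /24 always map to the same backend.
print(ip_hash_pick("192.168.1.10", backends) == ip_hash_pick("192.168.1.99", backends))  # True
```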

Managing multiple instances with systemd (example)#

# /etc/systemd/system/nginx-a.service
[Unit]
Description=Nginx instance A
After=network.target

[Service]
Type=forking
ExecStart=/usr/sbin/nginx -c /opt/nginx-a/conf/nginx.conf
ExecReload=/usr/sbin/nginx -s reload -c /opt/nginx-a/conf/nginx.conf
ExecStop=/usr/sbin/nginx -s quit -c /opt/nginx-a/conf/nginx.conf
PIDFile=/opt/nginx-a/run/nginx.pid

[Install]
WantedBy=multi-user.target
# /etc/systemd/system/nginx-b.service
[Unit]
Description=Nginx instance B
After=network.target

[Service]
Type=forking
ExecStart=/usr/sbin/nginx -c /opt/nginx-b/conf/nginx.conf
ExecReload=/usr/sbin/nginx -s reload -c /opt/nginx-b/conf/nginx.conf
ExecStop=/usr/sbin/nginx -s quit -c /opt/nginx-b/conf/nginx.conf
PIDFile=/opt/nginx-b/run/nginx.pid

[Install]
WantedBy=multi-user.target

Enable and start:

systemctl daemon-reload
systemctl enable --now nginx-a nginx-b
systemctl status nginx-a
systemctl status nginx-b

Verifying load balancing#

# Start simple stub backends (two per app)
python3 -m http.server 9001 --bind 127.0.0.1 >/tmp/app1-9001.log 2>&1 &
python3 -m http.server 9002 --bind 127.0.0.1 >/tmp/app1-9002.log 2>&1 &
python3 -m http.server 9101 --bind 127.0.0.1 >/tmp/app2-9101.log 2>&1 &
python3 -m http.server 9102 --bind 127.0.0.1 >/tmp/app2-9102.log 2>&1 &

# Verify load balancing on instance A
curl -I http://127.0.0.1/
# Expected: HTTP 200, with the request logged in /opt/nginx-a/logs/access.log

# Verify load balancing on instance B
curl -I http://127.0.0.1:8080/
# Expected: HTTP 200, with the request logged in /opt/nginx-b/logs/access.log
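Per-backend request counts can be read straight from the access log if the log_format appends $upstream_addr as the last field (not configured in the examples above; an assumed addition). A sketch against sample log lines:

```python
from collections import Counter

# Sample lines where the last field is $upstream_addr; in practice,
# read /opt/nginx-a/logs/access.log written with a matching log_format.
log_lines = [
    "GET / 200 127.0.0.1:9001",
    "GET / 200 127.0.0.1:9002",
    "GET / 200 127.0.0.1:9001",
]

def tally_backends(lines):
    """Count requests per backend using the last whitespace-separated field."""
    return Counter(line.split()[-1] for line in lines)

print(tally_backends(log_lines))  # Counter({'127.0.0.1:9001': 2, '127.0.0.1:9002': 1})
```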

Common troubleshooting, with command notes#

  • Configuration syntax check:
nginx -t -c /opt/nginx-a/conf/nginx.conf
# -t: test the configuration syntax
# -c: specify the configuration file
  • Port-conflict diagnosis:
ss -lntp | grep -E '(:80|:8080)'
# Show which processes hold the ports, confirming the instances do not collide
  • PID file mismatch:
cat /opt/nginx-a/run/nginx.pid
ps -fp $(cat /opt/nginx-a/run/nginx.pid)
# Confirm the PID matches the process systemd is managing
  • Uneven load or unreachable backends:
tail -f /opt/nginx-a/logs/error.log
# Watch for "connect() failed" and "no live upstreams"

Exercises#

  1. Switch instance A's upstream to the default round-robin policy (remove least_conn) and compare the distribution against least_conn, recording which backend served each of 20 requests.
  2. Add a backend 127.0.0.1:9103 to instance B and observe the error-log entries when it fails, along with the effect of fail_timeout.
  3. Pin instance A to specific CPU cores with worker_cpu_affinity and compare the change in CPU usage in top.