7.10.2 Multi-Instance and Load-Balancing Configuration in Practice

This section focuses on deploying multiple Nginx instances on a single host or across a cluster, with load balancing and resource isolation. Giving each instance its own ports, working directories, and log paths lets several services coexist and release on independent schedules; combined with upstream load-balancing policies, this provides traffic distribution and basic failover at the application layer.

Conceptual sketch: multiple instances and load balancing#

(Figure: sketch of the relationship between multiple Nginx instances and load balancing)

Key points for multi-instance deployment#

  • Directory isolation: give each instance its own conf, logs, run, and cache directories so configurations and logs never mix.
  • Port planning: assign HTTP, HTTPS, and management ports explicitly to prevent port conflicts.
  • Process management: start and stop instances via systemd or dedicated scripts, with a separate PID file per instance.
  • Resource control: bound each instance's footprint with worker_processes, worker_connections, and CPU affinity.
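For the resource-control point above, the per-instance knobs look like the fragment below (values are illustrative, not tuning advice; worker_cpu_affinity is Linux-only):

```nginx
# Top of an instance's nginx.conf: pin two workers to CPU cores 0 and 1.
# One bitmask per worker: 0001 -> CPU0, 0010 -> CPU1.
worker_processes     2;
worker_cpu_affinity  0001 0010;

events {
    # Upper bound on simultaneous connections per worker process.
    worker_connections  1024;
}
```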

Installation and directory preparation (example)#

Installation from binary packages (substitute your distribution's own packages as needed):

# 1) Install dependencies and Nginx
yum -y install epel-release
yum -y install nginx

# 2) Create the instance directories (instances A and B)
for i in a b; do
  mkdir -p /opt/nginx-$i/{conf,logs,run,cache}
done

# 3) Copy a base configuration (using the default config as a template)
cp /etc/nginx/nginx.conf /opt/nginx-a/conf/nginx.conf
cp /etc/nginx/nginx.conf /opt/nginx-b/conf/nginx.conf
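Before copying configs onto a real host, it helps to sanity-check that the per-instance tree is complete. A minimal sketch against a throwaway root (swap /tmp/nginx-demo for /opt in production):

```python
import os

# Recreate and verify the per-instance directory tree under a demo root.
ROOT = "/tmp/nginx-demo"
INSTANCES = ["a", "b"]
SUBDIRS = ["conf", "logs", "run", "cache"]

def create_layout(root):
    """Create conf/logs/run/cache for each instance, like the shell loop above."""
    for i in INSTANCES:
        for d in SUBDIRS:
            os.makedirs(os.path.join(root, f"nginx-{i}", d), exist_ok=True)

def verify_layout(root):
    """Return the list of missing directories (empty means the layout is complete)."""
    return [
        os.path.join(f"nginx-{i}", d)
        for i in INSTANCES
        for d in SUBDIRS
        if not os.path.isdir(os.path.join(root, f"nginx-{i}", d))
    ]

create_layout(ROOT)
print(verify_layout(ROOT))  # [] when every expected directory exists
```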

Multi-instance configuration example (instance A + instance B)#

Instance A listens on 80/443 and balances to App1; instance B listens on 8080/8443 and balances to App2.
Note: paths such as pid, the log files, and client_body_temp_path must be isolated per instance.

# /opt/nginx-a/conf/nginx.conf
worker_processes  2;
pid /opt/nginx-a/run/nginx.pid;

events {
  worker_connections  1024;
}

http {
  include       /etc/nginx/mime.types;  # absolute path: a relative include resolves against the compile-time prefix
  default_type  application/octet-stream;

  access_log  /opt/nginx-a/logs/access.log;
  error_log   /opt/nginx-a/logs/error.log;

  client_body_temp_path /opt/nginx-a/cache/client_body;
  proxy_temp_path       /opt/nginx-a/cache/proxy;

  upstream app1 {
    least_conn;
    server 127.0.0.1:9001 max_fails=2 fail_timeout=5s;
    server 127.0.0.1:9002 max_fails=2 fail_timeout=5s;
  }

  server {
    listen 80;
    server_name app1.example.com;

    location / {
      proxy_pass http://app1;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
    }
  }
}
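The least_conn directive above routes each new request to the backend with the fewest active connections. A minimal Python sketch of that selection rule (a simplification: real nginx also applies server weights and failure tracking):

```python
def pick_least_conn(conns):
    """Return the backend address with the fewest active connections.

    `conns` maps backend address -> current active connection count.
    Ties go to the first backend in iteration order, mirroring the
    declaration order in the upstream block.
    """
    return min(conns, key=conns.get)

# 9001 is busy with 3 in-flight requests, 9002 with 1: the next
# request goes to 9002.
active = {"127.0.0.1:9001": 3, "127.0.0.1:9002": 1}
print(pick_least_conn(active))  # 127.0.0.1:9002
```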
# /opt/nginx-b/conf/nginx.conf
worker_processes  1;
pid /opt/nginx-b/run/nginx.pid;

events {
  worker_connections  512;
}

http {
  include       /etc/nginx/mime.types;  # absolute path: a relative include resolves against the compile-time prefix
  default_type  application/octet-stream;

  access_log  /opt/nginx-b/logs/access.log;
  error_log   /opt/nginx-b/logs/error.log;

  client_body_temp_path /opt/nginx-b/cache/client_body;
  proxy_temp_path       /opt/nginx-b/cache/proxy;

  upstream app2 {
    ip_hash;
    server 127.0.0.1:9101;
    server 127.0.0.1:9102;
  }

  server {
    listen 8080;
    server_name app2.example.com;

    location / {
      proxy_pass http://app2;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
    }
  }
}
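The ip_hash directive above pins each client to one backend for session affinity; for IPv4, nginx keys the hash on the first three octets, so a whole /24 network lands on the same server. A rough Python sketch of that keying (using CRC32 as a stand-in; nginx's internal hash function differs):

```python
import zlib

def ip_hash_pick(client_ip, backends):
    """Pick a backend keyed on the client's /24 network (first three
    IPv4 octets), so repeat requests from one client hit one server."""
    key = ".".join(client_ip.split(".")[:3])
    return backends[zlib.crc32(key.encode()) % len(backends)]

backends = ["127.0.0.1:9101", "127.0.0.1:9102"]
# Two addresses in the same /24 always map to the same backend.
print(ip_hash_pick("192.168.1.10", backends) == ip_hash_pick("192.168.1.99", backends))  # True
```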

Managing multiple instances with systemd (example)#

# /etc/systemd/system/nginx-a.service
[Unit]
Description=Nginx instance A
After=network.target

[Service]
Type=forking
ExecStart=/usr/sbin/nginx -c /opt/nginx-a/conf/nginx.conf
ExecReload=/usr/sbin/nginx -s reload -c /opt/nginx-a/conf/nginx.conf
ExecStop=/usr/sbin/nginx -s quit -c /opt/nginx-a/conf/nginx.conf
PIDFile=/opt/nginx-a/run/nginx.pid

[Install]
WantedBy=multi-user.target
# /etc/systemd/system/nginx-b.service
[Unit]
Description=Nginx instance B
After=network.target

[Service]
Type=forking
ExecStart=/usr/sbin/nginx -c /opt/nginx-b/conf/nginx.conf
ExecReload=/usr/sbin/nginx -s reload -c /opt/nginx-b/conf/nginx.conf
ExecStop=/usr/sbin/nginx -s quit -c /opt/nginx-b/conf/nginx.conf
PIDFile=/opt/nginx-b/run/nginx.pid

[Install]
WantedBy=multi-user.target

Enable and start:

systemctl daemon-reload
systemctl enable --now nginx-a nginx-b
systemctl status nginx-a
systemctl status nginx-b

Verifying load balancing#

# Start simple stub backends (two per app)
python3 -m http.server 9001 --bind 127.0.0.1 >/tmp/app1-9001.log 2>&1 &
python3 -m http.server 9002 --bind 127.0.0.1 >/tmp/app1-9002.log 2>&1 &
python3 -m http.server 9101 --bind 127.0.0.1 >/tmp/app2-9101.log 2>&1 &
python3 -m http.server 9102 --bind 127.0.0.1 >/tmp/app2-9102.log 2>&1 &

# Verify load balancing on instance A
curl -I http://127.0.0.1/
# Expected: HTTP 200, with the request logged in /opt/nginx-a/logs/access.log

# Verify load balancing on instance B
curl -I http://127.0.0.1:8080/
# Expected: HTTP 200, with the request logged in /opt/nginx-b/logs/access.log
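Per-backend request counts can be read straight from the access log if the log_format appends $upstream_addr as the last field (not configured in the examples above; an assumed addition). A sketch against sample log lines:

```python
from collections import Counter

# Sample lines where the last field is $upstream_addr; in practice,
# read /opt/nginx-a/logs/access.log written with a matching log_format.
log_lines = [
    "GET / 200 127.0.0.1:9001",
    "GET / 200 127.0.0.1:9002",
    "GET / 200 127.0.0.1:9001",
]

def tally_backends(lines):
    """Count requests per backend using the last whitespace-separated field."""
    return Counter(line.split()[-1] for line in lines)

print(tally_backends(log_lines))  # Counter({'127.0.0.1:9001': 2, '127.0.0.1:9002': 1})
```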

Common troubleshooting, with command notes#

  • Configuration syntax check:
nginx -t -c /opt/nginx-a/conf/nginx.conf
# -t: test the configuration syntax
# -c: specify the configuration file
  • Port-conflict diagnosis:
ss -lntp | grep -E '(:80|:8080)'
# Show which processes hold the ports, confirming the instances do not collide
  • PID file mismatch:
cat /opt/nginx-a/run/nginx.pid
ps -fp $(cat /opt/nginx-a/run/nginx.pid)
# Confirm the PID matches the process systemd is managing
  • Uneven load or unreachable backends:
tail -f /opt/nginx-a/logs/error.log
# Watch for "connect() failed" and "no live upstreams"

Exercises#

  1. Switch instance A's upstream to the default round-robin policy (remove least_conn) and compare the distribution against least_conn, recording which backend served each of 20 requests.
  2. Add a backend 127.0.0.1:9103 to instance B and observe the error-log entries when it fails, along with the effect of fail_timeout.
  3. Pin instance A to specific CPU cores with worker_cpu_affinity and compare the change in CPU usage in top.