Introduction
In modern micro‑service environments, traffic distribution, fault tolerance, and zero‑downtime deployments are non‑negotiable requirements. Nginx, with its event‑driven architecture and mature module ecosystem, is a go‑to solution for HTTP/HTTPS load balancing. This article walks you through a production‑ready Nginx load balancer setup, covering:
- Core concepts of Nginx load balancing
- Designing a resilient architecture
- Detailed configuration files with annotations
- Health‑checking, SSL termination, and session persistence
- Validation, performance tuning, and monitoring
By the end of the guide you will have a reproducible, version‑controlled setup ready to be deployed on any Linux‑based VM, container, or cloud instance.
Understanding Nginx as a Load Balancer
Nginx can operate in three primary load‑balancing modes:
- Round‑Robin - default algorithm, distributes requests evenly.
- Least Connections - forwards traffic to the server with the fewest active connections.
- IP Hash - hashes client IP to achieve sticky sessions (useful for stateful services).
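Each mode maps to a one‑line change in the `upstream` block. A minimal sketch (the backend addresses are placeholders):

```nginx
# Round-robin: the default, no directive needed.
upstream rr_backend {
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}

# Least connections: pick the server with the fewest active connections.
upstream lc_backend {
    least_conn;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}

# IP hash: the same client IP always lands on the same server.
upstream sticky_backend {
    ip_hash;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}
```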
Why choose Nginx?
- High concurrency with minimal memory footprint.
- Built‑in passive health checks via simple `proxy_next_upstream` logic, or active probes via the third‑party `ngx_http_upstream_check_module`.
- Seamless SSL termination, HTTP/2 support, and easy integration with service discovery tools.
Core Modules Relevant to Load Balancing
| Module | Purpose |
|---|---|
| `ngx_http_upstream_module` | Defines upstream groups and load‑balancing methods. |
| `ngx_http_ssl_module` | Handles TLS termination and certificate management. |
| `ngx_stream_core_module` | Enables TCP/UDP load balancing (useful for non‑HTTP services). |
| `ngx_http_stub_status_module` | Exposes runtime metrics for monitoring tools. |
The following sections illustrate how these modules combine to produce a robust, production‑grade architecture.
Designing a Production‑Ready Architecture
A typical high‑availability deployment consists of multiple Nginx instances behind a fail‑over layer (e.g., Keepalived, VRRP, or cloud‑native load balancer). Each Nginx node runs the same configuration, pulling backend server lists from a central source (static file, DNS, or Consul). Below is a simplified ASCII diagram:
```
+--------------------------+        +--------------------------+
|    Keepalived / VRRP     |        |    Keepalived / VRRP     |
| (Virtual IP 10.0.0.10)   |        | (Virtual IP 10.0.0.10)   |
+-----------+--------------+        +--------------+-----------+
            |                                      |
   +--------v--------+                    +--------v--------+
   |   Nginx LB #1   |                    |   Nginx LB #2   |
   +--------+--------+                    +--------+--------+
            |                                      |
            |     +---------------------------+    |
            +-----|   Upstream Service Pool   |----+
                  |  (app01, app02, app03…)   |
                  +---------------------------+
```
Key design considerations
- Virtual IP (VIP) - Guarantees a single entry point for clients. Failover is instant when Keepalived detects a node outage.
- Health Checks - Active health probes (HTTP GET /health, via NGINX Plus or a third‑party module) keep the upstream list accurate; open‑source Nginx otherwise relies on passive checks (`max_fails`/`fail_timeout`).
- SSL Termination - Centralized at the load balancer; back‑ends receive traffic over plain HTTP inside a trusted network.
- Session Persistence - Enabled with `ip_hash` or `sticky` directives for services that require stateful connections.
- Observability - Enable the stub status module and export metrics to Prometheus or Grafana.
The next section translates this blueprint into concrete configuration files.
Step‑by‑Step Configuration
Below we create a reproducible directory layout that can be version‑controlled with Git. The example assumes Ubuntu 22.04 LTS, but the same concepts apply to any modern Linux distribution.
1. Directory Structure
```bash
mkdir -p /etc/nginx/conf.d /etc/nginx/upstreams /var/log/nginx
cd /etc/nginx
```
- `conf.d/` - High‑level server blocks.
- `upstreams/` - Dedicated upstream definitions for easier reuse.
- `nginx.conf` - Global settings and module inclusions.
2. Global nginx.conf
```nginx
user  www-data;
worker_processes  auto;
error_log  /var/log/nginx/error.log warn;
pid  /var/run/nginx.pid;

events {
    worker_connections 4096;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   65;
    types_hash_max_size 2048;

    include /etc/nginx/upstreams/*.conf;  # Load upstream definitions
    include /etc/nginx/conf.d/*.conf;     # Load server blocks

    # Enable stub status for monitoring
    server {
        listen 127.0.0.1:8080;
        location /nginx_status {
            stub_status;
            allow 127.0.0.1;
            deny all;
        }
    }
}
```
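The `/nginx_status` endpoint returns a small plain‑text report. A sketch of pulling one number out of it for a shell‑based alert; the sample output below is illustrative (in practice, fetch it with `curl -s http://127.0.0.1:8080/nginx_status`):

```shell
# Illustrative stub_status output, captured as a sample string here.
status='Active connections: 3
server accepts handled requests
 1024 1024 4096
Reading: 0 Writing: 1 Waiting: 2'

# Extract the active-connection count for alerting.
active=$(printf '%s\n' "$status" | awk '/^Active connections/ {print $3}')
echo "active=$active"
```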
3. Upstream Definition (upstreams/app_upstream.conf)
```nginx
upstream app_backend {
    # Least connections to favour less-loaded nodes
    least_conn;

    # Health check parameters (requires ngx_http_upstream_check_module)
    # The example uses the third-party module - adjust according to your build.
    # check interval=3000 rise=2 fall=5 timeout=1000 type=http;
    # check_http_send "GET /health HTTP/1.0\r\n\r\n";
    # check_http_expect_alive http_2xx;

    server 10.0.1.11:80 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:80 max_fails=3 fail_timeout=30s;
    server 10.0.1.13:80 max_fails=3 fail_timeout=30s;

    # Idle keepalive connections cached per worker towards the backends
    keepalive 32;
}
```
4. Server Block (conf.d/load_balancer.conf)
```nginx
server {
    listen 80;
    listen 443 ssl http2;
    server_name www.example.com api.example.com;

    # SSL certificates - replace with your own path or use Let's Encrypt.
    ssl_certificate     /etc/ssl/certs/example.com.crt;
    ssl_certificate_key /etc/ssl/private/example.com.key;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;

    # Enforce HTTP -> HTTPS redirection
    if ($scheme = http) {
        return 301 https://$host$request_uri;
    }

    # Keepalive settings for client connections.
    # Note: the "keepalive" directive that caches idle upstream
    # connections belongs in the upstream block, not here.
    keepalive_timeout  65;
    keepalive_requests 100;

    location / {
        proxy_pass http://app_backend;
        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Required for keepalive connections to the upstream
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Timeout settings for production workloads
        proxy_connect_timeout 5s;
        proxy_send_timeout    30s;
        proxy_read_timeout    30s;
        proxy_buffering       off;

        # Enable response compression
        gzip on;
        gzip_types text/plain text/css application/json application/javascript;
    }

    # Optional health endpoint for external monitoring tools
    location /healthz {
        access_log off;
        add_header Content-Type text/plain;
        return 200 'OK';
    }
}
```
5. Enable Keepalived for VIP Failover (Optional but Recommended)
Create `/etc/keepalived/keepalived.conf` on both nodes, sharing the same `virtual_ipaddress`.
```conf
vrrp_instance VI_1 {
    state MASTER                # On the secondary node use BACKUP
    interface eth0
    virtual_router_id 51
    priority 150                # Secondary node should have a lower priority
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass secret123
    }
    virtual_ipaddress {
        10.0.0.10/24 dev eth0 label eth0:vip
    }
}
```
After installing Keepalived (apt install keepalived), start the service on both nodes. The virtual IP will float between them, providing uninterrupted client connectivity.
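Keepalived can also demote a node whose Nginx process has died, so the VIP fails over even when the host itself stays up. A minimal sketch using a `vrrp_script` tracker (the check command and weight are illustrative - adjust to your environment):

```conf
# Reduce this node's priority when nginx is not running,
# letting the BACKUP node claim the VIP.
vrrp_script chk_nginx {
    script "/usr/bin/pidof nginx"   # non-zero exit means nginx is down
    interval 2                      # run every 2 seconds
    weight -60                      # subtract from priority on failure
}

vrrp_instance VI_1 {
    # ... same settings as shown above ...
    track_script {
        chk_nginx
    }
}
```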
6. Test the Configuration
```bash
# Verify syntax
sudo nginx -t

# Reload without downtime
sudo systemctl reload nginx

# Perform a quick curl test (replace the VIP with your address).
# -k skips certificate-name validation, since we connect by IP.
curl -kI https://10.0.0.10
```
If the response includes 200 OK and the SSL certificate details, the load balancer is operational.
Testing, Monitoring, and Tuning
A production environment demands continuous verification. Below are the essential practices:
1. Automated Smoke Tests
```bash
#!/usr/bin/env bash
set -euo pipefail

VIP=10.0.0.10
ENDPOINTS=("/" "/api/v1/status" "/healthz")

for ep in "${ENDPOINTS[@]}"; do
    http_code=$(curl -k -s -o /dev/null -w "%{http_code}" "https://${VIP}${ep}")
    if [[ "$http_code" -ne 200 ]]; then
        echo "[ERROR] $ep returned $http_code"
        exit 1
    else
        echo "[OK] $ep returned 200"
    fi
done
```
Integrate this script into your CI/CD pipeline to catch regressions before a new version rolls out.
2. Prometheus Exporter via Stub Status
The stub status page is plain text, not Prometheus exposition format. Run an exporter such as the official nginx-prometheus-exporter pointed at `http://127.0.0.1:8080/nginx_status` (it serves metrics on port 9113 by default), then add a scrape target to `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: 'nginx_lb'
    static_configs:
      - targets: ['127.0.0.1:9113']   # nginx-prometheus-exporter endpoint
```
Grafana dashboards can visualize active connections, request rates, and error percentages.
3. Performance Tuning Tips
| Parameter | Recommended Value | Reason |
|---|---|---|
| `worker_processes` | auto | Leverages all CPU cores. |
| `worker_connections` | 8192 | Handles high concurrent connections. |
| `keepalive_timeout` | 65s | Balances resource usage and client latency. |
| `client_max_body_size` | 50m | Allows larger uploads (adjust per app). |
| `proxy_buffer_size` | 16k | Reduces latency for small responses. |
4. Rolling Deployments
When updating backend services, keep the load balancer configuration static and rely on draining.
```nginx
# Example of graceful draining for a server
server 10.0.1.12:80 max_fails=3 fail_timeout=30s down;
```
Set a node to `down` in the upstream block, reload Nginx, wait for existing connections to finish, then upgrade the service. Bring the node back online by removing `down`.
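Toggling the `down` flag can be scripted. The sketch below edits a scratch copy of the upstream file for demonstration; in production, point `conf` at `/etc/nginx/upstreams/app_upstream.conf` and follow the edit with `nginx -t` and a reload:

```shell
# Work on a scratch copy for demonstration purposes.
conf=$(mktemp)
cat > "$conf" <<'EOF'
upstream app_backend {
    least_conn;
    server 10.0.1.12:80 max_fails=3 fail_timeout=30s;
}
EOF

# Append "down" to the entry for 10.0.1.12 so it stops receiving new traffic.
sed -i 's/^\( *server 10\.0\.1\.12:80[^;]*\);/\1 down;/' "$conf"
grep '10\.0\.1\.12' "$conf"
```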
5. Backup and Disaster Recovery
- Store the entire `/etc/nginx` directory in a Git repository.
- Periodically snapshot the Keepalived configuration.
- Use automated configuration management tools (Ansible, Chef) to rebuild nodes within minutes.
By combining automated testing, observability, and disciplined change management, the Nginx load balancer remains a reliable front‑door for your production traffic.
FAQs
Q1: How does Nginx handle TLS hand‑off when the backend also expects HTTPS?
A: Nginx can act as a pass‑through TLS proxy using the stream module. Define a stream upstream that forwards raw TCP traffic on port 443, preserving end‑to‑end encryption. This is useful for backend services that terminate TLS themselves.
```nginx
stream {
    upstream tls_backend {
        server 10.0.1.11:443;
        server 10.0.1.12:443;
    }

    server {
        listen 443;
        proxy_pass tls_backend;
        ssl_preread on;   # Enables SNI-based routing if needed
    }
}
```
Q2: Can I use DNS for dynamic upstream server discovery?
A: Partially. Open‑source Nginx resolves hostnames in an `upstream` block only once, at startup or reload. Periodic re‑resolution of upstream `server` entries (the `resolve` parameter) is an NGINX Plus feature; it re‑queries DNS when the TTL (capped by the resolver's `valid=` setting) expires, allowing seamless scaling with services registered in Consul, etcd, or a cloud DNS.

```nginx
resolver 10.0.0.53 valid=30s;

upstream dynamic_backend {
    # "resolve" enables periodic re-resolution (NGINX Plus only).
    server app.service.local resolve;
}
```
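For open‑source builds, a common workaround is to skip the upstream block and put the hostname in a variable, which forces Nginx to re‑resolve it at request time according to the resolver's `valid` setting (the hostname and resolver address below are illustrative):

```nginx
resolver 10.0.0.53 valid=30s;

server {
    listen 80;
    location / {
        # A variable in proxy_pass triggers runtime DNS resolution.
        set $backend "app.service.local";
        proxy_pass http://$backend;
    }
}
```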
Q3: What is the difference between ip_hash and the sticky module?
A: `ip_hash` provides simple source‑IP based affinity - the same client IP always reaches the same upstream server. Cookie‑based stickiness (the `sticky` directive, available in NGINX Plus or via third‑party modules such as nginx-sticky-module-ng) stores a cookie on the client, enabling session persistence even when the client IP changes (e.g., mobile networks). Choose cookie‑based stickiness when you need truly reliable session affinity across NATs.
Q4: How do I protect the load balancer itself from DoS attacks?
A: Implement rate limiting with limit_req_zone and limit_req. Also, enable the built‑in deny/allow directives for IP whitelisting on sensitive endpoints (e.g., /admin).
```nginx
limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;

server {
    location /login {
        limit_req zone=mylimit burst=20 nodelay;
    }
}
```
Q5: Is it safe to store the SSL private key on the same host as the load balancer?
A: For most deployments, yes - the key never leaves the host's memory. However, for ultra‑high‑security environments, consider using an HSM or integrating with cloud KMS (e.g., AWS ACM) and enabling TLS termination via a sidecar proxy like Envoy.
Conclusion
Deploying a production‑ready Nginx load balancer blends solid architectural principles with meticulous configuration. By leveraging:
- Virtual IP failover (Keepalived)
- Robust upstream health checks
- SSL termination and HTTP/2 support
- Observability via stub status and Prometheus
- Automated testing and graceful deployment patterns
you create a resilient entry point that scales with your services while maintaining low latency and high throughput. Keep your configs version‑controlled, monitor key metrics, and iterate on tuning parameters as traffic patterns evolve. The result is a battle‑tested load‑balancing layer that empowers your micro‑service ecosystem to deliver reliable experiences to end users.
