PM2 Cluster Mode in Production – A Real‑World Implementation Guide
Sat Feb 28 2026 · 8 min · Intermediate

A comprehensive guide to using PM2 Cluster Mode for production‑grade Node.js deployments, covering architecture, implementation, monitoring, and troubleshooting.

#pm2 #node.js #cluster-mode #production #devops #scalability

Understanding PM2 Cluster Mode

What Is PM2 Cluster Mode?

PM2 is a production‑process manager for Node.js applications. In Cluster Mode, PM2 leverages the Node.js cluster module to spawn multiple child processes that share a single server port. Each worker runs a separate instance of your application, allowing you to fully utilize multi‑core CPUs without writing any additional clustering code.

Why Choose Cluster Mode Over Fork Mode?

| Feature | Fork Mode | Cluster Mode |
| --- | --- | --- |
| CPU Utilization | Single core per process | All cores can be used |
| Load Balancing | Manual or external | Built-in round-robin |
| Memory Isolation | Each process fully isolated | Workers are separate processes; no shared application memory |
| Restart Behaviour | Individual process restart | Per-worker restart with zero-downtime reload |

Cluster Mode is ideal for stateless HTTP services where each request can be handled independently. It also simplifies horizontal scaling within a single host.

How PM2 Manages Workers

  1. Master Process - Starts first, reads the ecosystem configuration, and forks the desired number of workers.
  2. Worker Processes - Each worker runs a copy of your Node.js app and listens on the same port.
  3. Load Balancer - PM2’s internal round‑robin algorithm distributes incoming connections across workers.
  4. Health Checks - PM2 monitors CPU, memory, and event loop latency. Unresponsive workers are automatically restarted.

Key Metrics to Watch

  • CPU % per worker - Ensure no single worker exceeds 70‑80% of a core.
  • Memory usage - Keep each worker below your container's limit to avoid OOM.
  • Event Loop Lag - Values above 100 ms usually indicate bottlenecks.

Understanding these fundamentals sets the stage for designing a resilient production architecture.

Designing a Production‑Ready Architecture

High‑Level Diagram

+-----------------------+       +----------------------------+
| Load Balancer (NGINX) | <---> | PM2 Cluster (Node.js)      |
+-----------------------+       |  - Master Process          |
                                |  - Worker #1 (CPU 0)       |
                                |  - Worker #2 (CPU 1)       |
                                |  - Worker #3 (CPU 2)       |
                                |  - Worker #4 (CPU 3)       |
                                +----------------------------+
                                              |
                                    +------------------+
                                    |  Redis (Cache)   |
                                    +------------------+
                                              |
                                    +------------------+
                                    |  PostgreSQL DB   |
                                    +------------------+

The architecture relies on a reverse proxy (NGINX) to terminate TLS, handle HTTP‑2, and provide rate‑limiting. PM2 runs as the process manager inside a Docker container, exposing the application on port 3000. Workers share the same port, so NGINX forwards traffic to a single internal endpoint.

Choosing the Number of Workers

A common rule of thumb is number_of_cores for CPU‑bound workloads, or up to number_of_cores * 2 for I/O‑bound services whose workers spend much of their time waiting. Over‑provisioning can cause context‑switch thrashing. Example for a 4‑core VM:

```bash
# Deploy 4 workers (one per core on a 4-core VM)
pm2 start ecosystem.config.js --env production --instances 4
```

Environment Configuration

Store sensitive values (DB credentials, API keys) in environment variables or a secret manager (HashiCorp Vault, AWS Secrets Manager). PM2 can reference these variables directly from the ecosystem file.

```javascript
// ecosystem.config.js
module.exports = {
  apps: [{
    name: 'api-service',
    script: 'dist/index.js',
    instances: 'max',
    exec_mode: 'cluster',
    env_production: {
      NODE_ENV: 'production',
      PORT: '3000',
      DB_HOST: process.env.DB_HOST,
      REDIS_URL: process.env.REDIS_URL
    },
    max_memory_restart: '300M',
    log_date_format: 'YYYY-MM-DD HH:mm Z',
    error_file: '/var/log/pm2/api-error.log',
    out_file: '/var/log/pm2/api-out.log'
  }]
};
```

Resilience Patterns

  1. Graceful Shutdown - Capture SIGINT/SIGTERM in your Node app to close DB connections and stop accepting new requests.
  2. Zero‑Downtime Restarts - Use pm2 reload ecosystem.config.js --env production to reload workers one‑by‑one.
  3. Health Checks - Expose /healthz endpoint that returns 200 only when DB and cache connections are healthy. Configure NGINX proxy_next_upstream to skip unhealthy workers.

Monitoring & Observability

  • PM2‑plus / Keymetrics - Provides real‑time dashboards for CPU, memory, event‑loop, and custom metrics.
  • Prometheus Exporter - Run a community PM2 exporter module (e.g. pm2-prometheus-exporter) to expose metrics for scraping.
  • Log Aggregation - Forward STDOUT/STDERR to Elasticsearch via Filebeat, or use Loki.

Implementing these patterns transforms a simple cluster into a production‑grade service capable of handling traffic spikes and automatic recovery.

Implementing and Managing PM2 in Cluster Mode

Step‑by‑Step Deployment

  1. Containerize the Application

Create a Dockerfile that installs dependencies, builds the project, and installs PM2 globally.

```dockerfile
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package*.json ./
RUN npm ci --omit=dev && npm install -g pm2
COPY ecosystem.config.js ./
EXPOSE 3000
CMD ["pm2-runtime", "ecosystem.config.js", "--env", "production"]
```

  2. Create the Ecosystem File - See the ecosystem snippet in the previous section. Save it as ecosystem.config.js.

  3. Build and Run

```bash
docker build -t myapi:latest .

docker run -d --name myapi \
  -p 3000:3000 \
  -e DB_HOST=db.example.com \
  -e REDIS_URL=redis://cache:6379 \
  myapi:latest
```

pm2-runtime ensures that the container process exits only when PM2 stops, making Docker aware of crashes.

Managing Workers at Runtime

Adding or Removing Instances

```bash
# Scale up to 8 workers
pm2 scale api-service 8

# Scale down to 4 workers
pm2 scale api-service 4
```

Zero‑Downtime Reload

```bash
pm2 reload api-service --env production
```

PM2 will reload each worker sequentially, preserving existing connections.

Checking Logs

```bash
pm2 logs api-service --lines 100
```

Monitoring via CLI

```bash
pm2 monit
```

You’ll see CPU, memory, and event‑loop lag per worker in real time.

Code Example: Graceful Shutdown

```typescript
// src/server.ts
import http from 'http';
import app from './app';

const server = http.createServer(app);
const PORT = process.env.PORT || 3000;

server.listen(PORT, () => {
  console.log(`Server listening on port ${PORT}`);
});

// Graceful termination
function shutdown(signal: string) {
  console.log(`Received ${signal}. Closing server...`);
  server.close(() => {
    console.log('HTTP server closed. Exiting process.');
    process.exit(0);
  });
  // Force exit after 10 seconds
  setTimeout(() => process.exit(1), 10000);
}

process.on('SIGINT', () => shutdown('SIGINT'));
process.on('SIGTERM', () => shutdown('SIGTERM'));
```

PM2 forwards termination signals to the master, which then relays them to all workers, ensuring a clean shutdown.
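Besides signals, PM2 can deliver shutdown over the IPC channel (pm2 start app.js --shutdown-with-message), which is useful on platforms such as Windows where signals are unreliable. A sketch of the handler:

```javascript
let shuttingDown = false;

// PM2 sends the string 'shutdown' over IPC when started with
// --shutdown-with-message; reuse the same cleanup path as the signal handlers.
process.on('message', (msg) => {
  if (msg === 'shutdown' && !shuttingDown) {
    shuttingDown = true;
    console.log('Received shutdown message from PM2');
    // e.g. server.close(...) followed by process.exit(0), as in the
    // graceful-shutdown example above
  }
});
```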

Automation with CI/CD

Integrate the Docker build and push steps into your pipeline (GitHub Actions, GitLab CI, or Jenkins). Sample GitHub Actions snippet:

```yaml
name: CI
on:
  push:
    branches: [ main ]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Node
        uses: actions/setup-node@v3
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci
      - name: Build
        run: npm run build
      - name: Build Docker image
        run: |
          docker build -t ghcr.io/${{ github.repository }}:${{ github.sha }} .
          echo ${{ secrets.GITHUB_TOKEN }} | docker login ghcr.io -u ${{ github.actor }} --password-stdin
          docker push ghcr.io/${{ github.repository }}:${{ github.sha }}
```

Deploy the new image using a rolling update in Kubernetes or a Docker Swarm service, relying on PM2’s zero‑downtime reload to keep traffic flowing.

FAQs

Frequently Asked Questions

1. When should I use PM2 Cluster Mode versus Kubernetes native scaling?

Cluster Mode shines in single‑host setups where you need to maximize CPU utilization without orchestration overhead. For multi‑node, high‑availability environments, Kubernetes provides pod‑level replication, service discovery, and built‑in health checks. You can still run PM2 inside each pod to benefit from its process‑level monitoring and graceful reload capabilities.

2. How does PM2 handle stateful connections (e.g., WebSocket) in Cluster Mode?

PM2’s internal round‑robin balances TCP connections across workers. Frames on an established WebSocket connection always stay on the worker that accepted it; stickiness matters when clients reconnect or fall back to HTTP long‑polling. A sticky‑session proxy (NGINX ip_hash or HAProxy balance source) ensures those follow‑up requests reach the same worker. Alternatively, an external message broker (Redis Pub/Sub) can synchronize state across workers.

3. What is the impact of max_memory_restart on production stability?

max_memory_restart forces PM2 to restart a worker when its RSS exceeds the configured threshold. This prevents memory leaks from degrading the host. Set the limit comfortably below your container’s memory limit (e.g., 300 MB for a 512 MB container) so PM2 restarts the worker before the kernel’s OOM killer terminates the whole container.

4. Can I mix fork and cluster modes in the same ecosystem file?

Yes. PM2 allows each app entry to define its own exec_mode. You might keep a small background job in fork mode while the main API runs in cluster mode. Just ensure that port conflicts are avoided and resource limits are appropriately set.

Conclusion

Wrapping Up

PM2 Cluster Mode offers a pragmatic path to multi‑core scalability for Node.js services without the complexity of manual clustering code. By leveraging a well‑defined architecture (an NGINX front end, PM2‑managed workers, and supporting services like Redis and PostgreSQL) you can achieve high availability, zero‑downtime updates, and robust observability.

Key takeaways:

  • Start with the right number of workers based on workload characteristics.
  • Implement graceful shutdown and health endpoints to ensure smooth rolling restarts.
  • Use PM2‑plus or Prometheus for real‑time metrics, and forward logs to a central system.
  • Integrate PM2 into your CI/CD pipeline using Docker and pm2-runtime for container‑native process management.
  • Combine PM2 with orchestration tools when scaling beyond a single host, keeping the benefits of both worlds.

When applied correctly, PM2 Cluster Mode transforms a simple Node.js application into a production‑grade service capable of handling traffic spikes, automatic recovery, and continuous deployment, all while maintaining a clean, maintainable codebase.