Scaling Node.js Applications with the Cluster Module


Node.js is widely known for its non-blocking I/O model and event-driven architecture. It excels at handling thousands of concurrent connections with minimal overhead.

But there’s a catch.

By default, a Node.js application runs its JavaScript on a single thread, so it can keep only one CPU core busy, even if your machine has 8, 16, or 32 cores sitting idle.

This blog dives deep into how the Node.js Cluster Module solves this problem without Docker, Kubernetes, or external load balancers, and how you can use it to build high-performance, resilient backend systems.


The Problem: Underutilized CPU Cores

Modern servers are multi-core by design. However, when you start a typical Node.js server, you spawn one OS process whose JavaScript runs on one main thread, so it can keep only one CPU core busy.

That means:

  • 8-core machine → ❌ 7 cores unused

  • CPU-bound routes → ❌ event loop blocked

    💡
    A CPU-bound route is an API endpoint that spends most of its time doing calculations and uses the CPU continuously.
    💡
    The event loop’s job is to pick the next task, execute it, move on to the next one, and repeat forever. Node.js runs your JavaScript on one main thread using this event loop, so while one task is running, nothing else can run.
  • High traffic → ❌ request pile-ups & timeouts
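A quick way to see a blocked event loop for yourself: schedule a 0 ms timer, then keep the main thread busy. The timer cannot fire until the synchronous work finishes. This is a minimal sketch; busyWait and measureTimerDelay are hypothetical helper names, and the 200 ms figure is arbitrary.

```javascript
// Hypothetical helper: keep the CPU busy for roughly `ms` milliseconds.
// The while loop never yields, so the event loop cannot run anything else.
function busyWait(ms) {
  const end = Date.now() + ms;
  let spins = 0;
  while (Date.now() < end) spins++;
  return spins;
}

// Hypothetical helper: resolves with how long a 0 ms timer actually waited
// when `blockMs` of synchronous work runs right after it is scheduled.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0); // "as soon as possible"
    busyWait(blockMs); // ...but only after this loop has finished
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`0 ms timer fired after ${delay} ms`);
});
```

Every request that reaches the process while a CPU-bound route is running waits in exactly the same way.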

What happens under load?

Imagine an API endpoint performing a CPU-heavy task (hashing, encryption, long loops, image processing) while many concurrent users are sending requests to it.

💡
By default, Node.js starts a single process that executes your JavaScript on a single main thread, so your code can keep at most one CPU core busy.

A single Node.js process quickly becomes a bottleneck:

  • Requests queue up

  • Event loop stalls

  • Latency explodes

  • Requests start failing
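The pile-up can be put into numbers with an idealized model (an assumption for illustration, not a measurement): n requests arrive at the same instant, each needs a fixed slice of pure CPU time, and a single-threaded process serves them one after another, so the k-th request finishes after k slices. meanLatencyMs is a hypothetical helper; its workers parameter models several such processes taking requests in turn.

```javascript
// Idealized model: `n` requests arrive together, each needs `costMs` of CPU,
// and `workers` single-threaded processes take them in round-robin order.
function meanLatencyMs(n, costMs, workers = 1) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    // Request i is job number floor(i / workers) + 1 in its process's queue,
    // so it completes after that many back-to-back CPU slices.
    const positionInQueue = Math.floor(i / workers) + 1;
    total += positionInQueue * costMs;
  }
  return total / n;
}

console.log(meanLatencyMs(4, 10));    // 25 (one process)
console.log(meanLatencyMs(4, 10, 2)); // 15 (two processes)
```

In this model, mean latency grows linearly with the number of queued requests, which is why one busy process degrades so quickly under concurrency; spreading the same queue over w processes divides it by roughly w.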


The Solution: Node.js Clustering

Node.js ships with a built-in clustering mechanism via the cluster module.

Instead of one process on one core, clustering lets you run multiple Node.js processes (workers) spread across all CPU cores.

High-level architecture

  • Primary Process

    • Manages the cluster

    • Spawns worker processes

    • Distributes incoming connections

  • Worker Processes

    • Run identical copies of your app

    • Handle requests independently

    • Are isolated from each other

This model is process-based parallelism, not threading.


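Let's code and load-test the server in both cases. The general form of the test command is below (-k turns on HTTP keep-alive; it does not take the URL as its value), followed by the exact command used in this post:

npx loadtest -n <total_requests> -c <concurrent_requests> -k <url>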
npx loadtest -n 1200 -c 400 -k http://localhost:8080/heavy

Before Clustering: Single Process Express App

Here’s a simplified Express app with a CPU-intensive route:

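import express from "express";

const app = express();
const port = 8080;

// CPU-bound route: the synchronous loop blocks the event loop until it ends,
// so this process cannot serve any other request in the meantime.
app.get("/heavy", (req, res) => {
    let total = 0;
    for (let i = 0; i < 50_000_000; i++) {
        total += i;
    }
    res.send(`Result of heavy task = ${total}`);
});

app.listen(port, () => {
    console.log("App listening on Port " + port);
});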

Run the server with node index.js and test it with the loadtest command. (The import syntax assumes "type": "module" in package.json, or an .mjs file extension.)
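To estimate how long one /heavy request monopolizes the event loop on your machine, you can time the same loop outside Express. This is a sketch: heavyTask is a hypothetical name for the route's loop pulled into a function, and the timing will vary with hardware.

```javascript
import { performance } from 'node:perf_hooks';

// The /heavy route's loop, extracted into a plain function.
// It returns n(n - 1) / 2, i.e. 1249999975000000 for the default n.
function heavyTask(n = 50_000_000) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    total += i;
  }
  return total;
}

const start = performance.now();
const result = heavyTask();
const elapsedMs = performance.now() - start;
console.log(`Result ${result} computed in ${elapsedMs.toFixed(1)} ms`);
```

Multiplying that figure by the number of concurrent requests gives a rough lower bound on how long the last request in the queue waits on a single process.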

Load Test Results (Single Process)

  • ~50% request failure

  • Mean latency ~30 seconds

  • ❌ CPU fully saturated on one core

This is not a Node.js weakness — it’s a deployment choice.


After Clustering: Using the Cluster Module

Step 1: Create the Primary Process

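import cluster from 'cluster'
import os from 'os'

const cpuCount = os.cpus().length

if (cluster.isPrimary) {
    // Runs only in the process you start with `node primary.js`
    console.log(`Primary ${process.pid} is running`)
    console.log(`Forking ${cpuCount} workers`)

    for (let i = 0; i < cpuCount; i++) {
        cluster.fork(); // spawns a new Node.js process that runs this same file
    }

    // Replace any worker that dies so the pool stays at full size
    cluster.on("exit", (worker, code, signal) => {
        console.log(`Worker ${worker.process.pid} has been killed (code: ${code}, signal: ${signal})`)
        console.log("Starting new Worker")
        cluster.fork();
    })
}
else {
    // Workers (cluster.isPrimary === false) load the unchanged Express app
    import('./index.js')
}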
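Step 1 sizes the pool with os.cpus().length. A slightly more defensive way to pick the worker count is sketched below, under stated assumptions: WORKERS is a hypothetical environment variable for overriding the count, and workerCount is a hypothetical helper name. os.availableParallelism() exists from Node 18.14 onward, so the sketch checks for it before falling back.

```javascript
import os from 'node:os';

// Decide how many workers to fork. Assumptions made for this sketch:
//  - WORKERS is a hypothetical environment variable that overrides the count;
//  - os.availableParallelism() (Node 18.14+) is preferred when it exists;
//  - os.cpus() can return an empty array on some systems, so never go below 1.
function workerCount(env = process.env) {
  const requested = Number.parseInt(env.WORKERS ?? '', 10);
  if (Number.isInteger(requested) && requested > 0) return requested;
  if (typeof os.availableParallelism === 'function') {
    return os.availableParallelism();
  }
  return Math.max(1, os.cpus().length);
}

console.log(`Forking ${workerCount()} workers`);
```

In primary.js, const cpuCount = workerCount() could replace the os.cpus().length line; the rest of Step 1 stays the same.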

Step 2: Keep index.js unchanged, except for one extra log line in the listen callback

app.listen(port, () => {
    console.log("App listening on Port " + port);
    console.log("Worker pid = ", process.pid)
})

Run the primary with node primary.js. It forks one child process per CPU core, and each of them acts as a server listening on port 8080.

Explanation:

When you run node primary.js, Node.js starts a single process, the primary, to execute the file. The primary then calls cluster.fork(), and each call spawns a brand-new Node.js process that runs primary.js again from the top. Unlike a Unix fork(), a worker does not copy the parent's memory; it simply re-executes the same entry script. The difference is that in a worker cluster.isPrimary is false, so execution falls into the else block, which imports index.js. Since index.js is the original server file from before we introduced clustering, every worker ends up running its own copy of the Express app.

Now you have N workers, each acting as an independent server, all listening on the same port 8080.

How Can All Processes Listen on the Same Port 8080?
In a Node.js cluster, all workers can listen on the same port (8080 here) without an EADDRINUSE conflict because the workers never bind the port themselves. When a worker calls listen(8080), the cluster module sends that request to the primary over an IPC channel, and the primary creates (or reuses) a single listening socket for the port. What happens next depends on the scheduling policy. With the default round-robin policy (every platform except Windows), the primary accepts each incoming connection and passes the connection handle to a worker. With the alternative policy, the primary shares the listening socket with the workers, they accept connections directly, and the operating system decides which worker gets each one. Either way, once a worker owns a connection it parses and answers the HTTP requests on its own event loop; the primary never runs your route handlers. This is what allows parallel request handling across CPU cores without a separate reverse proxy in front of the app.

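How is load balancing done? It depends on the cluster module's scheduling policy, and the default differs by platform:

Platform | Default policy | Who hands each connection to a worker
--- | --- | ---
Linux | cluster.SCHED_RR (round-robin) | The primary accepts it and passes it on
macOS | cluster.SCHED_RR (round-robin) | The primary accepts it and passes it on
Windows | cluster.SCHED_NONE | The OS: workers accept on the shared socket directly

Round-robin is the default on Linux and macOS because leaving distribution to the OS tended to pile most connections onto a few workers. You can override the default by setting cluster.schedulingPolicy before the first cluster.fork(), or with the NODE_CLUSTER_SCHED_POLICY environment variable ("rr" or "none"). Either way, the key idea is:

The primary process is NOT an HTTP reverse proxy: at most it hands raw connections to workers, and it never parses requests or runs your route handlers.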
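The cluster module exposes its connection-distribution strategy as cluster.schedulingPolicy, whose value is either cluster.SCHED_RR (the primary hands out connections round-robin) or cluster.SCHED_NONE (workers accept directly and the OS picks). The sketch below just prints which one your platform uses; describePolicy is a hypothetical helper name.

```javascript
import cluster from 'node:cluster';

// Human-readable description of the two built-in scheduling policies.
function describePolicy(policy) {
  if (policy === cluster.SCHED_RR) {
    return 'round-robin: the primary accepts connections and hands them to workers';
  }
  if (policy === cluster.SCHED_NONE) {
    return 'none: workers accept on the shared socket and the OS picks one';
  }
  return `unknown policy (${policy})`;
}

// To override the default, set this in the primary BEFORE the first
// cluster.fork(), or start node with NODE_CLUSTER_SCHED_POLICY=rr or =none:
// cluster.schedulingPolicy = cluster.SCHED_NONE;

if (cluster.isPrimary) {
  console.log(`${process.platform}: ${describePolicy(cluster.schedulingPolicy)}`);
}
```

The policy is frozen once the first worker is forked, which is why any override has to happen first.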


Load Test Results with Clustering

Same test:

  • 1200 total requests

  • 400 concurrent users

Results 🚀

  • Zero failed requests

  • Mean latency ~5 seconds

  • ~6x lower mean latency than the single-process run (~30 s → ~5 s)

  • Automatic worker recovery


Built-in Fault Tolerance (Underrated Feature)

If a worker crashes due to:

  • Out-of-memory error

  • Unexpected exception

  • Process kill

The primary process (through the cluster.on("exit") handler from Step 1):

  • Detects the exit

  • Forks a new worker instantly

  • Keeps the service alive

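This gives you self-healing behavior on a single machine without Kubernetes. The primary itself is not covered, though: if it dies, its workers exit too, so the primary still needs an external supervisor (PM2, systemd, or similar).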
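The exit handler in Step 1 re-forks unconditionally, so a worker that crashes during startup (a bad config value, a missing module) would be restarted in a tight loop. Below is a sketch of a more careful handler. Assumptions: allowRestart and watchWorkers are hypothetical names, and the limit of 5 restarts per rolling minute is arbitrary. worker.exitedAfterDisconnect is a real cluster property that is true when the primary ended the worker on purpose.

```javascript
import cluster from 'node:cluster';

// Arbitrary policy for this sketch: at most 5 restarts per rolling minute.
const MAX_RESTARTS = 5;
const WINDOW_MS = 60_000;

// Hypothetical pure helper: keep only restarts inside the window and
// report whether one more restart is allowed at time `now`.
function allowRestart(restartTimes, now, max = MAX_RESTARTS, windowMs = WINDOW_MS) {
  const recent = restartTimes.filter((t) => now - t < windowMs);
  return { allowed: recent.length < max, recent };
}

// Hypothetical replacement for the inline exit handler in the primary.
function watchWorkers() {
  let restartTimes = [];
  cluster.on('exit', (worker, code, signal) => {
    // True when the primary ended this worker on purpose
    // (worker.disconnect() or worker.kill()), so don't replace it.
    if (worker.exitedAfterDisconnect) return;

    console.log(`Worker ${worker.process.pid} died (code: ${code}, signal: ${signal})`);
    const now = Date.now();
    const { allowed, recent } = allowRestart(restartTimes, now);
    if (!allowed) {
      console.error('Too many crashes in the last minute; not restarting');
      restartTimes = recent;
      return;
    }
    restartTimes = [...recent, now];
    cluster.fork();
  });
}
```

Inside the if (cluster.isPrimary) block, calling watchWorkers() would take the place of the inline cluster.on("exit", ...) handler.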


Summary

When you run a Node.js application in cluster mode, a single primary process is responsible for managing worker processes. It never parses HTTP requests or runs your route handlers itself.

The primary process forks multiple worker processes, usually one per CPU core. Each worker runs the same application code and calls listen on the same port (for example, 8080). There is no conflict because the workers do not bind the port themselves: the primary owns the single listening socket and shares connections with the workers.

When a client connects to port 8080, the connection is handed to one of the workers: by the primary in round-robin order (the default everywhere except Windows), or by the OS when workers accept on the shared socket directly. Each worker has its own event loop and handles its requests independently, allowing true parallelism across CPU cores.

This design ensures:

  • No separate reverse proxy in front of the app

  • No HTTP parsing or route handling in the primary process

  • Built-in connection distribution (round-robin by default, OS-level on Windows)

  • Improved throughput and fault isolation

In short, clients send requests to a single port, connections are spread across the worker processes, and the primary process focuses on process management and reliability.

Cluster Module vs Worker Threads

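Feature | Cluster | Worker Threads
--- | --- | ---
Parallelism | Multiple processes | Multiple threads in one process
Memory | Fully isolated | Separate heap per thread; can share a SharedArrayBuffer
Best for | HTTP servers | CPU-heavy tasks
Crash isolation | ✅ Yes | Partial: an uncaught error surfaces as an 'error' event on the Worker, but a fatal crash takes down the whole process
IPC | Message passing | postMessage, plus optional shared memory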

Rule of thumb

  • HTTP scaling → Cluster

  • CPU-heavy computation → Worker Threads

They solve different problems. Learn about worker threads in this blog's post Multithreading in Node.js: Fixing CPU Bottlenecks with Worker Threads.
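For the worker-threads side of the rule of thumb, here is a sketch of the /heavy loop moved off the main thread with Node's built-in node:worker_threads module. sumInWorker is a hypothetical helper name, and passing the worker's source as a string with eval: true is only a shortcut to keep the example in one file; a real app would usually put the worker code in its own file.

```javascript
import { Worker } from 'node:worker_threads';

// Source for the worker thread: the same loop as the /heavy route.
// With eval: true this string runs as a CommonJS script, so require works.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  let total = 0;
  for (let i = 0; i < workerData.n; i++) total += i;
  parentPort.postMessage(total);
`;

// Hypothetical helper: run the loop in a new thread and resolve with the sum.
function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject); // an uncaught throw inside the thread lands here
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// The main event loop stays free while the thread does the CPU work.
sumInWorker(50_000_000).then((total) => {
  console.log(`Result of heavy task = ${total}`);
});
```

A route handler could await sumInWorker(...) and send the result. Spawning a fresh thread per request has its own cost, so production code typically reuses a pool of worker threads instead.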


Production Shortcut: PM2

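PM2 is a production-grade process manager that wraps clustering for you. You point it at index.js directly, because PM2 itself takes the role of primary.js, and -i max starts one instance per available CPU: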

pm2 start index.js -i max

What PM2 gives you:

  • Automatic clustering

  • Log management

  • Process restarts

  • Zero-downtime reloads

  • Monitoring dashboard

Under the hood → it still uses the cluster module.
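The same setup can live in a PM2 ecosystem file instead of command-line flags. A minimal sketch, assuming the app name "api" (a placeholder) and the index.js entry point from this post; exec_mode and instances are standard PM2 app options. Because this post's package.json uses "type": "module", the file may need a .cjs extension to be loaded; check the PM2 documentation for your version.

```javascript
// ecosystem.config.js: roughly equivalent to `pm2 start index.js -i max`.
// The app name "api" is a placeholder.
module.exports = {
  apps: [
    {
      name: 'api',
      script: './index.js',
      exec_mode: 'cluster', // PM2 cluster mode, built on Node's cluster module
      instances: 'max',     // one instance per available CPU
    },
  ],
};
```

You would start it with pm2 start ecosystem.config.js and later roll out new code with pm2 reload api, which restarts instances one at a time.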


When Should You Use Cluster?

✅ Multi-core servers
✅ High traffic APIs
✅ CPU-heavy routes
✅ No container orchestration

When Should You Not Use Cluster?

❌ Serverless environments
❌ Stateless background jobs only


Feel free to comment and share. Happy coding :)