Production-Grade Error Handling in Node.js


A complete guide to building stable, debuggable, and resilient Node.js applications


Why Error Handling in Node.js Is Hard (and Why Most Apps Get It Wrong)

Error handling in Node.js is deceptively complex.

Unlike simple scripts, production Node.js applications run:

  • For long periods

  • Across multiple environments (local, staging, production)

  • With concurrency, async operations, and distributed systems

  • Under load, with real users, real money, and real consequences

A single unhandled error can:

  • Crash the entire process

  • Corrupt application state

  • Leak sensitive data

  • Cause cascading failures across services

Yet most Node.js apps still rely on:

throw "Something went wrong";
console.log(err);

That approach does not scale.

This post walks through an error-handling architecture that has held up in real production systems.


Core Philosophy of Production Error Handling

Before code, let’s set the mindset.

A production-grade error handling system should:

  1. Distinguish expected vs unexpected errors

  2. Preserve debugging context

  3. Centralize error processing

  4. Fail fast on programmer errors

  5. Fail gracefully on operational errors

  6. Be observable (logs, metrics, alerts)

  7. Never leave the system in an inconsistent state

Everything below flows from these principles.


The Biggest Anti-Pattern: Throwing Primitive Errors

❌ Bad Practice

throw "Invalid user";
throw 404;
throw true;

Why this is terrible:

  • No stack trace

  • No error type

  • No metadata

  • Impossible to standardize

  • Impossible to monitor reliably

This leads to low-quality, unmaintainable, and brittle code.
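
To make the first two points concrete, here is a small sketch of what a catch block actually receives in each case. The helper name inspectThrown is made up for this illustration.

```javascript
// Compare what a catch block receives from a primitive throw vs. an Error.
function inspectThrown(fn) {
  try {
    fn();
  } catch (err) {
    return {
      type: typeof err,
      isError: err instanceof Error,
      hasStack: typeof err?.stack === "string",
    };
  }
  return null;
}

const fromString = inspectThrown(() => {
  throw "Invalid user";
});
const fromError = inspectThrown(() => {
  throw new Error("Invalid user");
});

console.log(fromString); // { type: 'string', isError: false, hasStack: false }
console.log(fromError); // { type: 'object', isError: true, hasStack: true }
```

A thrown string gives the handler nothing to log beyond the text itself, which is why monitoring tools cannot group or trace it.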


Slightly Better, Still Not Enough: Native Error

⚠️ Okay Practice

throw new Error("User not found");

What improves:

  • Stack trace exists

  • Consistent structure

What’s still missing:

  • Error codes

  • HTTP status

  • Operational vs programmer distinction

  • Contextual metadata

  • Consistent API responses

In production, this is not sufficient.


The Foundation: Custom AppError Class

A better approach is to extend JavaScript’s built-in Error class.

✅ Production-Grade AppError

class AppError extends Error {
  constructor({
    name = "AppError",
    message,
    statusCode = 500,
    errorCode,
    isOperational = true,
    details = {}
  }) {
    super(message);

    this.name = name;
    this.statusCode = statusCode;
    this.errorCode = errorCode;
    this.isOperational = isOperational;
    this.details = details;

    Error.captureStackTrace(this, this.constructor);
  }
}

Why this matters

This single abstraction unlocks:

  • Consistent error structure

  • Safer process decisions

  • Better logging

  • Cleaner API responses

  • Easier monitoring & alerting
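
As a rough usage sketch, this is how calling code might throw and inspect an AppError. The class is repeated so the snippet runs on its own; findUser, the NotFoundError name, and the USER_NOT_FOUND code are hypothetical.

```javascript
class AppError extends Error {
  constructor({ name = "AppError", message, statusCode = 500, errorCode, isOperational = true, details = {} }) {
    super(message);
    this.name = name;
    this.statusCode = statusCode;
    this.errorCode = errorCode;
    this.isOperational = isOperational;
    this.details = details;
    Error.captureStackTrace(this, this.constructor);
  }
}

// Hypothetical lookup that always fails, for illustration only.
function findUser(id) {
  throw new AppError({
    name: "NotFoundError",
    message: "User not found",
    statusCode: 404,
    errorCode: "USER_NOT_FOUND",
    details: { id },
  });
}

let caught;
try {
  findUser(42);
} catch (err) {
  caught = err;
}

console.log(caught instanceof AppError); // true
console.log(caught instanceof Error); // true
console.log(caught.statusCode, caught.errorCode); // 404 USER_NOT_FOUND
console.log(caught.details); // { id: 42 }
```

Because the error is still an instance of Error, existing tooling that expects native errors keeps working.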


Operational vs Programmer Errors (Critical Concept)

Operational Errors

Errors we expect and handle:

  • Invalid input

  • Database connection failure

  • Network timeout

  • Authentication failure

throw new AppError({
  name: "ValidationError",
  message: "Email is invalid",
  statusCode: 400,
  errorCode: "INVALID_EMAIL",
  isOperational: true
});

Programmer Errors

Errors that indicate broken code:

  • Undefined variables

  • Logic bugs

  • Invalid assumptions

  • Memory corruption

throw new AppError({
  name: "InvariantViolation",
  message: "User must exist here",
  isOperational: false
});

👉 Rule:

  • Operational errors = handle & respond

  • Programmer errors = log & crash
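
One way to encode that rule is a small predicate that trusts an error only when it explicitly says it is operational. This is a sketch, and the names isTrustedError and decideAction are made up here. Note that native errors such as TypeError carry no isOperational flag, so they fall on the "crash" side by default.

```javascript
// Trust an error only if it is a real Error that is explicitly marked operational.
function isTrustedError(err) {
  return err instanceof Error && err.isOperational === true;
}

function decideAction(err) {
  return isTrustedError(err) ? "respond" : "crash";
}

const invalidEmail = Object.assign(new Error("Email is invalid"), { isOperational: true });
const bug = new TypeError("Cannot read properties of undefined");

console.log(decideAction(invalidEmail)); // respond
console.log(decideAction(bug)); // crash
console.log(decideAction("Invalid user")); // crash
```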


Centralized Error Handling (Non-Negotiable)

❌ The Wrong Way

Putting everything inside middleware:

app.use((err, req, res, next) => {
  console.log(err);
  sendEmail(err);
  writeToFile(err);
  res.status(500).send("Error");
});

This becomes:

  • Coupled

  • Unreadable

  • Impossible to test

  • Hard to evolve


✅ The Right Way: Error Handler Service

errorHandler.js

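// Assumed project modules: a structured logger and a metrics client.
const logger = require("./logger");
const metrics = require("./metrics");

function errorHandler(err) {
  logger.error({
    name: err.name,
    message: err.message,
    stack: err.stack,
    details: err.details
  });

  metrics.increment("errors.total");

  // Anything not explicitly marked operational (including native errors
  // such as TypeError) is treated as a programmer error.
  if (!err.isOperational) {
    // Critical failure: exit and let the process manager restart us.
    // In real code, flush logs and close connections before exiting.
    process.exit(1);
  }
}

module.exports = errorHandler;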

Express Middleware

app.use((err, req, res, next) => {
  errorHandler(err);

  res.status(err.statusCode || 500).json({
    success: false,
    error: {
      code: err.errorCode,
      message: err.message
    }
  });
});

Why this architecture works

  • Single responsibility

  • Reusable across contexts

  • Works for HTTP, background jobs, workers

  • Easier to test

  • Safer process lifecycle management
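
To illustrate the "reusable" and "easier to test" points, here is a sketch of the same idea built as a factory, so the logger and the fatal action can be swapped for fakes. createErrorHandler, runJob, and onFatal are names invented for this example; in production onFatal would typically end in process.exit(1).

```javascript
// Factory so the logger and the fatal-exit action can be replaced in tests.
function createErrorHandler({ logger, onFatal }) {
  return function handleError(err) {
    logger.error({ name: err.name, message: err.message, stack: err.stack });
    if (err.isOperational !== true) onFatal(err);
  };
}

// The same handler wrapped around a background job instead of an HTTP request.
async function runJob(job, handleError) {
  try {
    return await job();
  } catch (err) {
    handleError(err);
    return undefined;
  }
}

// Usage with in-memory fakes, the way a unit test might wire it up.
const logged = [];
const fatal = [];
const handleError = createErrorHandler({
  logger: { error: (entry) => logged.push(entry) },
  onFatal: (err) => fatal.push(err),
});

const jobDone = runJob(async () => {
  throw Object.assign(new Error("Queue timeout"), { isOperational: true });
}, handleError);

jobDone.then(() => {
  console.log(logged.length, fatal.length); // 1 0
});
```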


Uncaught Exceptions: When the Process Is Corrupted

The Reality

An uncaught exception means:

Your application state is unreliable

Continuing execution is dangerous.


Graceful Handling Strategy

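process.on("uncaughtException", (err) => {
  errorHandler(err);
  // State is unreliable after an uncaught exception, even if the error
  // itself was marked operational, so always exit.
  process.exit(1);
});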

Inside errorHandler:

  • Log error

  • Send alerts

  • Flush logs

  • Close DB connections

  • Exit process if untrusted
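
The checklist above could be wired up roughly like this. It is a sketch under assumptions: the cleanup steps and the exit function are passed in (so the function can be tested without killing the process), and the 5 second default timeout is an arbitrary choice.

```javascript
// Run cleanup steps in order with an upper time bound, then exit.
async function shutdown({ steps, exit, timeoutMs = 5000 }) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });

  const cleanup = (async () => {
    for (const step of steps) {
      try {
        await step();
      } catch {
        // A failing cleanup step must not prevent the exit.
      }
    }
    return "clean";
  })();

  // Whichever finishes first wins: full cleanup or the deadline.
  const outcome = await Promise.race([cleanup, timeout]);
  clearTimeout(timer);
  exit(1);
  return outcome;
}

// Possible wiring (flushLogs and closeDb are hypothetical helpers):
// process.on("uncaughtException", (err) => {
//   logger.error(err);
//   shutdown({ steps: [flushLogs, closeDb], exit: process.exit });
// });
```

The timeout matters because a corrupted process may hang while closing a connection, and a hung process is worse than one that exits with unflushed logs.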

Why exit is GOOD

  • Process managers and orchestrators (PM2, Docker restart policies, Kubernetes) will restart the process

  • Clean memory

  • Known state

  • Faster recovery

Crashing fast is safer than limping forever


Unhandled Promise Rejections (Async Hell)

The Problem

async function getUser() {
  await db.query("INVALID SQL");
}

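No try/catch here, and if no caller catches the rejection either, it becomes an unhandled rejection. Before Node.js 15 that produced only a warning and the process kept running in an undefined state; since Node.js 15 the default is to crash the process.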


Global Handling

process.on("unhandledRejection", (reason) => {
  throw reason;
});

This forwards async failures into uncaughtException, giving you:

  • One centralized failure path

  • One restart strategy

  • One alerting pipeline
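
One caveat: a rejection reason is not guaranteed to be an Error (Promise.reject("boom") is legal), so rethrowing it as-is can bring back the primitive-throw problem. A possible refinement, sketched here with a made-up helper toError, wraps non-Error reasons first. The cause option assumes a reasonably recent Node.js version.

```javascript
const { inspect } = require("node:util");

// Wrap non-Error rejection reasons so the central handler always gets a stack trace.
function toError(reason) {
  if (reason instanceof Error) return reason;
  return new Error(`Non-Error rejection: ${inspect(reason)}`, { cause: reason });
}

process.on("unhandledRejection", (reason) => {
  // Rethrowing routes the failure into the uncaughtException path.
  throw toError(reason);
});

console.log(toError("boom").message); // Non-Error rejection: 'boom'
```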


Input Validation: Stop Errors at the Boundary

Many errors originate from client input.

Validate Early (Data Ingress)

Using schema validation:

  • Reject invalid input immediately

  • Avoid downstream failures

  • Reduce security risks

Example with Joi:

const Joi = require("joi");

const schema = Joi.object({
  email: Joi.string().email().required(),
  password: Joi.string().min(8).required()
});

On validation failure:

throw new AppError({
  name: "ValidationError",
  message: "Invalid request body",
  statusCode: 400,
  errorCode: "INVALID_INPUT"
});
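
To keep per-field information for the client, the validation error can be mapped into the AppError's details. The sketch below assumes the error object returned by schema.validate(body, { abortEarly: false }), which carries a details array of { message, path } entries; fromJoiError is a made-up helper name, and a stand-in object is used so the snippet runs without Joi installed.

```javascript
// Condensed copy of AppError so the snippet runs on its own.
class AppError extends Error {
  constructor({ name = "AppError", message, statusCode = 500, errorCode, isOperational = true, details = {} }) {
    super(message);
    Object.assign(this, { name, statusCode, errorCode, isOperational, details });
  }
}

// Convert a Joi-style validation error into an operational AppError.
function fromJoiError(joiError) {
  return new AppError({
    name: "ValidationError",
    message: "Invalid request body",
    statusCode: 400,
    errorCode: "INVALID_INPUT",
    details: {
      fields: joiError.details.map((d) => ({ path: d.path.join("."), message: d.message })),
    },
  });
}

// Stand-in with the same shape as a Joi error (assumption for illustration).
const fakeJoiError = {
  details: [
    { path: ["email"], message: '"email" must be a valid email' },
    { path: ["password"], message: '"password" length must be at least 8 characters long' },
  ],
};

const validationError = fromJoiError(fakeJoiError);
console.log(validationError.details.fields.map((f) => f.path)); // [ 'email', 'password' ]
```
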

Logging: console.log Is Not Logging

Why console.log fails in production

  • No log levels

  • No structure

  • No rotation

  • Can block the event loop (writes are synchronous for some destinations)

  • No aggregation

Use Structured Logging

Good loggers provide:

  • JSON output

  • Log levels

  • Stack traces

  • Non-blocking writes

Logs should include:

  • requestId

  • userId (if available)

  • errorCode

  • stack trace

  • environment

  • timestamp
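
As a sketch of what such an entry might look like, the function below assembles those fields into one JSON line. buildErrorLog and the shape of the context argument are assumptions for this example; a real logger library would normally add the level and timestamp itself.

```javascript
// Build one structured log entry from an error plus request context.
function buildErrorLog(err, context = {}) {
  return {
    level: "error",
    timestamp: new Date().toISOString(),
    environment: process.env.NODE_ENV ?? "development",
    requestId: context.requestId,
    userId: context.userId,
    errorCode: err.errorCode,
    name: err.name,
    message: err.message,
    stack: err.stack,
  };
}

const sampleError = Object.assign(new Error("Email is invalid"), { errorCode: "INVALID_EMAIL" });
const entry = buildErrorLog(sampleError, { requestId: "req-123" });

// One JSON object per line is easy for log aggregators to parse.
console.log(JSON.stringify(entry));
```

Fields that are undefined (userId here) are simply dropped by JSON.stringify, so anonymous requests still produce valid entries.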


Monitoring & Observability

Logging alone is not enough.

You need:

  • Error rates

  • Latency

  • Uptime

  • Crash frequency

What to monitor

  • 5xx error spikes

  • Uncaught exceptions

  • Restart frequency

  • Slow endpoints

  • Memory usage

Logs + metrics + traces = observability
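
For a feel of what "5xx error spikes" means in numbers, here is a deliberately tiny in-memory sketch. ErrorRateTracker is a made-up class; a real setup would export counters to a metrics system rather than keep them in process memory.

```javascript
// Track the share of responses that were server errors (5xx).
class ErrorRateTracker {
  constructor() {
    this.total = 0;
    this.serverErrors = 0;
  }

  record(statusCode) {
    this.total += 1;
    if (statusCode >= 500) this.serverErrors += 1;
  }

  rate() {
    return this.total === 0 ? 0 : this.serverErrors / this.total;
  }
}

const tracker = new ErrorRateTracker();
[200, 200, 500, 404].forEach((status) => tracker.record(status));

console.log(tracker.rate()); // 0.25
```

Client errors such as 404 count toward the total but not toward the alerting signal, since they usually indicate bad requests rather than a failing service.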


Error Handling in Tests (Often Ignored)

Most tests cover only: ✅ happy paths

Production systems fail in: ❌ unhappy paths

You should test:

  • Invalid inputs

  • DB failures

  • Network timeouts

  • Authorization errors

  • Logging side-effects

Example:

expect(() => service.execute()).toThrow(AppError);

If errors aren’t tested, they aren’t reliable.
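
Note that the example above only catches synchronous throws. For async code the rejection has to be awaited; in Jest that would be await expect(service.execute()).rejects.toThrow(AppError). The sketch below shows the same distinction using Node's built-in assert module so it runs without a test framework. The service object and the DB_UNAVAILABLE code are hypothetical.

```javascript
const nodeAssert = require("node:assert/strict");

// Condensed copy of AppError so the snippet runs on its own.
class AppError extends Error {
  constructor({ name = "AppError", message, statusCode = 500, errorCode, isOperational = true, details = {} }) {
    super(message);
    Object.assign(this, { name, statusCode, errorCode, isOperational, details });
  }
}

// Hypothetical service under test.
const service = {
  executeSync() {
    throw new AppError({ message: "Email is invalid", statusCode: 400, errorCode: "INVALID_EMAIL" });
  },
  async execute() {
    throw new AppError({ message: "DB unavailable", statusCode: 503, errorCode: "DB_UNAVAILABLE" });
  },
};

// Synchronous throw: wrap the call in a function.
nodeAssert.throws(() => service.executeSync(), AppError);

// Async rejection: assert.rejects returns a promise that must be awaited.
const asyncCheck = nodeAssert.rejects(
  () => service.execute(),
  (err) => err instanceof AppError && err.errorCode === "DB_UNAVAILABLE"
);

asyncCheck.then(() => console.log("error paths verified"));
```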


API Error Documentation

Your API is incomplete without documented errors.

For each endpoint:

  • Possible error codes

  • HTTP status

  • Meaning

  • Recovery strategy

This helps:

  • Frontend teams

  • External clients

  • Debugging

  • Support engineers


Beyond the Video: Advanced Production Practices

1. Error Correlation IDs

Attach a unique ID per request to trace errors across services.
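
A minimal sketch of this, assuming Node's AsyncLocalStorage is used to carry the ID through async calls. The helper names withRequestId and currentRequestId are invented for the example; in an HTTP server the incoming ID would usually come from a request header.

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");
const { randomUUID } = require("node:crypto");

const requestContext = new AsyncLocalStorage();

// Run fn with a request ID, reusing the caller's ID if one was sent.
function withRequestId(incomingId, fn) {
  const requestId = incomingId ?? randomUUID();
  return requestContext.run({ requestId }, fn);
}

// Readable from anywhere inside the async call chain, e.g. the logger.
function currentRequestId() {
  return requestContext.getStore()?.requestId;
}

const seen = withRequestId("req-abc", async () => {
  await new Promise((resolve) => setTimeout(resolve, 1)); // stand-in for async work
  return currentRequestId();
});

seen.then((id) => console.log(id)); // req-abc
```

Passing the same ID on to downstream services lets one search term pull up every log line for a failing request.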

2. Error Normalization Layer

Convert third-party errors into AppError consistently.
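
A sketch of such a layer, assuming the third-party error is a Node.js system error with a code such as ECONNREFUSED. The names normalizeError, DependencyError, and UPSTREAM_UNAVAILABLE are made up for this example.

```javascript
// Condensed copy of AppError so the snippet runs on its own.
class AppError extends Error {
  constructor({ name = "AppError", message, statusCode = 500, errorCode, isOperational = true, details = {} }) {
    super(message);
    Object.assign(this, { name, statusCode, errorCode, isOperational, details });
  }
}

const NETWORK_CODES = new Set(["ECONNREFUSED", "ECONNRESET", "ETIMEDOUT"]);

// Map anything thrown by a dependency onto the AppError shape.
function normalizeError(err) {
  if (err instanceof AppError) return err;

  if (err instanceof Error && NETWORK_CODES.has(err.code)) {
    return new AppError({
      name: "DependencyError",
      message: "Upstream service unavailable",
      statusCode: 503,
      errorCode: "UPSTREAM_UNAVAILABLE",
      details: { cause: err.code },
    });
  }

  // Unknown shape: treat as a programmer error so it is not silently served.
  return new AppError({
    name: "UnexpectedError",
    message: "Unexpected error",
    isOperational: false,
    details: { original: String(err?.message ?? err) },
  });
}

const refused = Object.assign(new Error("connect ECONNREFUSED"), { code: "ECONNREFUSED" });
console.log(normalizeError(refused).errorCode); // UPSTREAM_UNAVAILABLE
console.log(normalizeError(new TypeError("x is undefined")).isOperational); // false
```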

3. Retry Strategies

Only retry idempotent operations.
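
A small sketch of a retry helper with exponential backoff. The function name retry and its options are assumptions, and the caller is responsible for only passing operations that are safe to repeat (a read, or a write guarded by an idempotency key).

```javascript
// Retry an async operation with exponential backoff between attempts.
async function retry(operation, { attempts = 3, baseDelayMs = 100, isRetryable = () => true } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      // Stop on the final attempt or when the error is not worth retrying.
      if (attempt === attempts || !isRetryable(err)) break;
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Usage: a flaky read that succeeds on the third attempt.
let readAttempts = 0;
const readResult = retry(
  async () => {
    readAttempts += 1;
    if (readAttempts < 3) throw new Error("temporary failure");
    return "ok";
  },
  { attempts: 3, baseDelayMs: 10 }
);

readResult.then((value) => console.log(value, readAttempts)); // ok 3
```

The isRetryable hook is where operational errors such as timeouts can be separated from errors that will never succeed, such as validation failures.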

4. Circuit Breakers

Prevent cascading failures when dependencies go down.
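
A simplified sketch of the pattern: after a number of consecutive failures the breaker "opens" and rejects calls immediately, then lets a trial call through once a cool-down has passed. The class, its option names, and the defaults are assumptions for this example; it also lets every call through after the cool-down rather than a single probe, which production libraries handle more carefully.

```javascript
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetAfterMs = 10000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now; // injectable clock, mainly for tests
    this.failures = 0;
    this.openedAt = null;
  }

  async call(operation) {
    if (this.openedAt !== null && this.now() - this.openedAt < this.resetAfterMs) {
      // Open: fail fast without touching the struggling dependency.
      throw new Error("Circuit open: dependency call skipped");
    }
    // Closed, or cool-down elapsed (half-open): try the real call.
    try {
      const result = await operation();
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) this.openedAt = this.now();
      throw err;
    }
  }
}

// Usage (fetchProfile is a hypothetical dependency call):
// const breaker = new CircuitBreaker({ failureThreshold: 5 });
// const profile = await breaker.call(() => fetchProfile(userId));
```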

5. Fail Fast on Startup

Crash if:

  • Env vars missing

  • DB unreachable

  • Migrations incomplete
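
For the first item, a startup check might look like the sketch below. assertRequiredEnv is a made-up helper, and DATABASE_URL and JWT_SECRET are placeholder variable names.

```javascript
// Throw at startup if any required environment variable is missing or empty.
function assertRequiredEnv(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
}

// Demonstration with an explicit env object instead of process.env.
try {
  assertRequiredEnv(["DATABASE_URL", "JWT_SECRET"], { DATABASE_URL: "postgres://localhost/app" });
} catch (err) {
  console.error(err.message); // Missing required environment variables: JWT_SECRET
  // In a real entry point this is where process.exit(1) would go.
}
```

Reporting every missing variable at once saves a round of deploy, crash, fix for each one.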


Final Mental Model

Think of error handling as infrastructure, not a feature.

A good Node.js system:

  • Treats errors as data

  • Handles expected failures gracefully

  • Crashes intentionally on corruption

  • Is observable and debuggable

  • Recovers automatically

The best Node.js apps don’t avoid crashes — they control them.