Production-Grade Error Handling in Node.js
A complete guide to building stable, debuggable, and resilient Node.js applications
Why Error Handling in Node.js Is Hard (and Why Most Apps Get It Wrong)
Error handling in Node.js is deceptively complex.
Unlike simple scripts, production Node.js applications run:
For long periods
Across multiple environments (local, staging, production)
With concurrency, async operations, and distributed systems
Under load, with real users, real money, and real consequences
A single unhandled error can:
Crash the entire process
Corrupt application state
Leak sensitive data
Cause cascading failures across services
Yet most Node.js apps still rely on:
throw "Something went wrong";
console.log(err);
That approach does not scale.
This post walks through a battle-tested error-handling architecture used in real production systems.
Core Philosophy of Production Error Handling
Before code, let’s set the mindset.
A production-grade error handling system should:
Distinguish expected vs unexpected errors
Preserve debugging context
Centralize error processing
Fail fast on programmer errors
Fail gracefully on operational errors
Be observable (logs, metrics, alerts)
Never leave the system in an inconsistent state
Everything below flows from these principles.
The Biggest Anti-Pattern: Throwing Primitive Errors
❌ Bad Practice
throw "Invalid user";
throw 404;
throw true;
Why this is terrible:
No stack trace
No error type
No metadata
Impossible to standardize
Impossible to monitor reliably
This leads to low-quality, unmaintainable, and brittle code.
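To make the first bullet concrete, here is a minimal sketch that inspects what a catch block actually receives. The helper name describeThrown is made up for this illustration.

```javascript
// Hypothetical helper: runs fn and reports what the catch block received.
function describeThrown(fn) {
  try {
    fn();
  } catch (thrown) {
    return {
      type: typeof thrown,
      isError: thrown instanceof Error,
      hasStack: typeof (thrown && thrown.stack) === "string",
    };
  }
  return null; // fn did not throw
}

const fromString = describeThrown(() => { throw "Invalid user"; });
const fromError = describeThrown(() => { throw new Error("Invalid user"); });

console.log(fromString); // { type: 'string', isError: false, hasStack: false }
console.log(fromError);  // { type: 'object', isError: true, hasStack: true }
```

A thrown string carries no stack at all, so a log line built from it cannot tell you where the failure happened.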
Slightly Better, Still Not Enough: Native Error
⚠️ Okay Practice
throw new Error("User not found");
What improves:
Stack trace exists
Consistent structure
What’s still missing:
Error codes
HTTP status
Operational vs programmer distinction
Contextual metadata
Consistent API responses
In production, this is not sufficient.
The Foundation: Custom AppError Class
The correct approach is to extend JavaScript’s Error object.
✅ Production-Grade AppError
class AppError extends Error {
  constructor({
    name = "AppError",
    message,
    statusCode = 500,
    errorCode,
    isOperational = true,
    details = {}
  }) {
    super(message);
    this.name = name;
    this.statusCode = statusCode;
    this.errorCode = errorCode;
    this.isOperational = isOperational;
    this.details = details;
    Error.captureStackTrace(this, this.constructor);
  }
}
Why this matters
This single abstraction unlocks:
Consistent error structure
Safer process decisions
Better logging
Cleaner API responses
Easier monitoring & alerting
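As a rough usage sketch, this is how a caller might throw and then inspect an AppError. The findUser function and the USER_NOT_FOUND code are assumptions made up for this example, and the class is repeated in compact form only so the snippet runs on its own.

```javascript
// Compact copy of the AppError class from this article.
class AppError extends Error {
  constructor({ name = "AppError", message, statusCode = 500, errorCode, isOperational = true, details = {} }) {
    super(message);
    Object.assign(this, { name, statusCode, errorCode, isOperational, details });
    Error.captureStackTrace(this, this.constructor);
  }
}

// Hypothetical lookup, used only for illustration.
function findUser(users, id) {
  const user = users.get(id);
  if (!user) {
    throw new AppError({
      name: "NotFoundError",
      message: "User not found",
      statusCode: 404,
      errorCode: "USER_NOT_FOUND",
      details: { id },
    });
  }
  return user;
}

let caught;
try {
  findUser(new Map(), 42);
} catch (err) {
  caught = err;
}

console.log(caught instanceof AppError);          // true
console.log(caught instanceof Error);             // true
console.log(caught.statusCode, caught.errorCode); // 404 USER_NOT_FOUND
```

Because the error is still a real Error, existing tooling that expects a message and a stack keeps working, while the extra fields give handlers something structured to act on.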
Operational vs Programmer Errors (Critical Concept)
Operational Errors
Errors we expect and handle:
Invalid input
Database connection failure
Network timeout
Authentication failure
throw new AppError({
  name: "ValidationError",
  message: "Email is invalid",
  statusCode: 400,
  errorCode: "INVALID_EMAIL",
  isOperational: true
});
Programmer Errors
Errors that indicate broken code:
Undefined variables
Logic bugs
Invalid assumptions
Memory corruption
throw new AppError({
  name: "InvariantViolation",
  message: "User must exist here",
  isOperational: false
});
👉 Rule:
Operational errors = handle & respond
Programmer errors = log & crash
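One way to turn that rule into code is a small predicate that decides whether an error can be trusted. This is a sketch under assumptions: the name isTrustedError is made up here, and treating every non-AppError value (a native TypeError, for example) as untrusted is a design choice, not something the rule strictly dictates.

```javascript
// Compact copy of the AppError class from this article.
class AppError extends Error {
  constructor({ name = "AppError", message, statusCode = 500, errorCode, isOperational = true, details = {} }) {
    super(message);
    Object.assign(this, { name, statusCode, errorCode, isOperational, details });
    Error.captureStackTrace(this, this.constructor);
  }
}

// Hypothetical helper: only errors we created and explicitly marked as
// operational count as "expected". Everything else is treated as a bug.
function isTrustedError(err) {
  return err instanceof AppError && err.isOperational === true;
}

const invalidEmail = new AppError({ message: "Email is invalid", statusCode: 400, errorCode: "INVALID_EMAIL" });
const invariant = new AppError({ name: "InvariantViolation", message: "User must exist here", isOperational: false });

console.log(isTrustedError(invalidEmail));                  // true
console.log(isTrustedError(invariant));                     // false
console.log(isTrustedError(new TypeError("x is not a function"))); // false
```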
Centralized Error Handling (Non-Negotiable)
❌ The Wrong Way
Putting everything inside middleware:
app.use((err, req, res, next) => {
  console.log(err);
  sendEmail(err);
  writeToFile(err);
  res.status(500).send("Error");
});
This becomes:
Coupled
Unreadable
Impossible to test
Hard to evolve
✅ The Right Way: Error Handler Service
errorHandler.js
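// `logger` and `metrics` are the application's own logging and metrics clients.
function errorHandler(err) {
  logger.error({
    name: err.name,
    message: err.message,
    errorCode: err.errorCode,
    stack: err.stack,
    details: err.details
  });
  metrics.increment("errors.total");
  if (!err.isOperational) {
    // Critical failure: exit so the process manager can start a clean process.
    // If the logger buffers output, flush it before exiting.
    process.exit(1);
  }
}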
Express Middleware
app.use((err, req, res, next) => {
  errorHandler(err);
  res.status(err.statusCode || 500).json({
    success: false,
    error: {
      code: err.errorCode,
      message: err.message
    }
  });
});
Why this architecture works
Single responsibility
Reusable across contexts
Works for HTTP, background jobs, workers
Easier to test
Safer process lifecycle management
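To illustrate the "works for background jobs" point, here is a hedged sketch of a wrapper that routes failures from any async task into one central handler. The names withErrorHandling and sendWelcomeEmail are invented for this example, and the errorHandler below is a stand-in that records errors instead of logging them, so the snippet runs without a logger.

```javascript
// Stand-in for the central errorHandler: records errors instead of logging.
const handled = [];
function errorHandler(err) {
  handled.push({ name: err.name, message: err.message });
}

// Hypothetical wrapper: any async task gets the same failure path as HTTP.
function withErrorHandling(task, handler = errorHandler) {
  return async (...args) => {
    try {
      return await task(...args);
    } catch (err) {
      handler(err);
      return undefined; // the failure has been reported; the caller is not interrupted
    }
  };
}

// Hypothetical background job that always fails.
const sendWelcomeEmail = withErrorHandling(async (userId) => {
  throw new Error(`SMTP unavailable for user ${userId}`);
});

const jobDone = sendWelcomeEmail(7).then(() => {
  console.log(handled.length); // 1
});
```

The job code never touches logging or metrics directly, which is the same separation the Express middleware gets from calling the handler service.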
Uncaught Exceptions: When the Process Is Corrupted
The Reality
An uncaught exception means:
Your application state is unreliable
Continuing execution is dangerous.
Graceful Handling Strategy
process.on("uncaughtException", (err) => {
  errorHandler(err);
});
Inside errorHandler:
Log error
Send alerts
Flush logs
Close DB connections
Exit process if untrusted
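The steps above could be sketched roughly as follows. Everything here is an assumption for illustration: the function name shutdownAfterFatal, the injected logger.flush, alert, and closeResources hooks, and the 5 second timeout. Your logger and database client will have their own shutdown APIs.

```javascript
// Hypothetical fatal-error shutdown. All collaborators are injected so the
// sketch does not depend on any particular logger or database client.
async function shutdownAfterFatal(err, {
  logger,
  closeResources,
  alert = async () => {},
  exit = process.exit,
  timeoutMs = 5000,
}) {
  // Safety net: force the exit if cleanup hangs. unref() stops this timer
  // from keeping the process alive on its own.
  const timer = setTimeout(() => exit(1), timeoutMs);
  timer.unref();
  try {
    logger.error({ name: err.name, message: err.message, stack: err.stack });
    await alert(err);       // send alerts
    await logger.flush();   // flush logs
    await closeResources(); // close DB connections and similar handles
  } catch (cleanupErr) {
    // A failing cleanup step must not prevent the exit.
    console.error("Cleanup failed during shutdown:", cleanupErr);
  } finally {
    clearTimeout(timer);
    exit(1);
  }
}
```

It would typically be called from the uncaughtException listener, for example process.on("uncaughtException", (err) => shutdownAfterFatal(err, deps)), where deps is whatever object holds your real logger and cleanup function.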
Why exit is GOOD
A process manager or orchestrator (PM2, Docker restart policies, Kubernetes) will restart the process
Clean memory
Known state
Faster recovery
Crashing fast is safer than limping forever
Unhandled Promise Rejections (Async Hell)
The Problem
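async function getUser() {
  await db.query("INVALID SQL");
}
No try/catch (and no .catch() at the call site) → unhandled rejection. Before Node.js 15 this only logged a warning: silent failure → undefined state. Since Node.js 15 the default is to crash the process with an uncaught exception.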
Global Handling
process.on("unhandledRejection", (reason) => {
  throw reason;
});
This forwards async failures into uncaughtException, giving you:
One centralized failure path
One restart strategy
One alerting pipeline
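One caveat: a promise can reject with any value, not only an Error, so rethrowing the raw reason can reintroduce the primitive-throw problem. A possible refinement, sketched here with a made-up helper name toError, wraps non-Error reasons before rethrowing.

```javascript
const { inspect } = require("node:util");

// Hypothetical helper: guarantees the value we rethrow is a real Error.
function toError(reason) {
  if (reason instanceof Error) return reason;
  // inspect() is used instead of String() because it cannot throw on odd values.
  const err = new Error(`Promise rejected with non-Error value: ${inspect(reason)}`);
  err.reason = reason; // keep the original value for debugging
  return err;
}

process.on("unhandledRejection", (reason) => {
  // Throwing inside this listener surfaces the failure as an uncaughtException.
  throw toError(reason);
});
```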
Input Validation: Stop Errors at the Boundary
Many production errors start with invalid client input.
Validate Early (Data Ingress)
Using schema validation:
Reject invalid input immediately
Avoid downstream failures
Reduce security risks
Example with Joi:
const Joi = require("joi");

const schema = Joi.object({
  email: Joi.string().email().required(),
  password: Joi.string().min(8).required()
});
On validation failure:
throw new AppError({
  name: "ValidationError",
  message: "Invalid request body",
  statusCode: 400,
  errorCode: "INVALID_INPUT"
});
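Putting the schema and the error together might look like the sketch below. The function name validateOrThrow is an assumption, and the schema is passed in as a parameter (rather than imported) so the snippet does not need Joi installed to run. It relies only on Joi's documented schema.validate(value, options) call, which returns an object with error and value, where error.details lists each problem with a path and a message.

```javascript
// Compact copy of the AppError class from this article.
class AppError extends Error {
  constructor({ name = "AppError", message, statusCode = 500, errorCode, isOperational = true, details = {} }) {
    super(message);
    Object.assign(this, { name, statusCode, errorCode, isOperational, details });
    Error.captureStackTrace(this, this.constructor);
  }
}

// Hypothetical helper: `schema` is expected to be a Joi schema.
function validateOrThrow(schema, payload) {
  // abortEarly: false collects every problem instead of stopping at the first.
  const { error, value } = schema.validate(payload, { abortEarly: false });
  if (error) {
    throw new AppError({
      name: "ValidationError",
      message: "Invalid request body",
      statusCode: 400,
      errorCode: "INVALID_INPUT",
      details: {
        fields: error.details.map((d) => ({ path: d.path.join("."), message: d.message })),
      },
    });
  }
  return value; // the validated (and possibly normalized) payload
}
```

In a route handler this would be called as const body = validateOrThrow(schema, req.body); before any business logic runs, so invalid input never reaches the database layer.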
Logging: console.log Is Not Logging
Why console.log fails in production
No log levels
No structure
No rotation
No async safety (writes can be synchronous and block the event loop, depending on the destination)
No aggregation
Use Structured Logging
Good loggers provide:
JSON output
Log levels
Stack traces
Async safety
Logs should include:
requestId
userId (if available)
errorCode
stack trace
environment
timestamp
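As an illustration of those fields, here is a minimal sketch that builds one JSON log line by hand. The function name buildErrorLog and the shape of the context argument are assumptions. A real structured logger would produce something similar for you.

```javascript
// Hypothetical helper: assembles the fields listed above into one log entry.
function buildErrorLog(err, context = {}) {
  return {
    level: "error",
    timestamp: new Date().toISOString(),
    environment: process.env.NODE_ENV || "development",
    requestId: context.requestId,
    userId: context.userId, // left undefined when no user is known
    errorCode: err.errorCode,
    name: err.name,
    message: err.message,
    stack: err.stack,
  };
}

const exampleError = Object.assign(new Error("Email is invalid"), { errorCode: "INVALID_EMAIL" });

// One JSON object per line is easy for log aggregators to parse.
// JSON.stringify drops fields whose value is undefined (userId here).
const logLine = JSON.stringify(buildErrorLog(exampleError, { requestId: "req-123" }));
console.log(logLine);
```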
Monitoring & Observability
Logging alone is not enough.
You need:
Error rates
Latency
Uptime
Crash frequency
What to monitor
5xx error spikes
Uncaught exceptions
Restart frequency
Slow endpoints
Memory usage
Logs + metrics + traces = observability
Error Handling in Tests (Often Ignored)
Most tests cover only: ✅ happy paths
Production systems fail in: ❌ unhappy paths
You should test:
Invalid inputs
DB failures
Network timeouts
Authorization errors
Logging side-effects
Example:
expect(() => service.execute()).toThrow(AppError);
If errors aren’t tested, they aren’t reliable.
API Error Documentation
Your API is incomplete without documented errors.
For each endpoint:
Possible error codes
HTTP status
Meaning
Recovery strategy
This helps:
Frontend teams
External clients
Debugging
Support engineers
Beyond the Video: Advanced Production Practices
1. Error Correlation IDs
Attach a unique ID per request to trace errors across services.
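A possible sketch using only Node built-ins (AsyncLocalStorage and crypto.randomUUID). The helper names runWithRequestId and currentRequestId are assumptions, as is the idea of reusing an incoming ID when the caller already sent one.

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");
const { randomUUID } = require("node:crypto");

const requestContext = new AsyncLocalStorage();

// Hypothetical helper: runs fn with a request ID visible to everything it awaits.
function runWithRequestId(incomingId, fn) {
  const requestId = incomingId || randomUUID();
  return requestContext.run({ requestId }, fn);
}

// Hypothetical helper: readable from loggers and error handlers without
// threading the ID through every function signature.
function currentRequestId() {
  const store = requestContext.getStore();
  return store ? store.requestId : undefined;
}

// Example: the ID survives across an await.
const lookup = runWithRequestId("req-123", async () => {
  await new Promise((resolve) => setTimeout(resolve, 1));
  return currentRequestId();
});
lookup.then((id) => console.log(id)); // req-123
```

In Express this could be wired as a middleware that calls runWithRequestId(req.headers["x-request-id"], next). The header name is a common convention rather than a standard.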
2. Error Normalization Layer
Convert third-party errors into AppError consistently.
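A minimal sketch of such a layer, under stated assumptions: the function name normalizeError, the DEPENDENCY_* error codes, and the choice to mark unknown errors as non-operational are all illustrative. ECONNREFUSED and ETIMEDOUT are real Node.js system error codes.

```javascript
// Compact copy of the AppError class from this article.
class AppError extends Error {
  constructor({ name = "AppError", message, statusCode = 500, errorCode, isOperational = true, details = {} }) {
    super(message);
    Object.assign(this, { name, statusCode, errorCode, isOperational, details });
    Error.captureStackTrace(this, this.constructor);
  }
}

// Illustrative mapping from low-level error codes to API-facing errors.
const KNOWN_CODES = new Map([
  ["ECONNREFUSED", { statusCode: 503, errorCode: "DEPENDENCY_UNAVAILABLE", message: "Upstream service unavailable" }],
  ["ETIMEDOUT", { statusCode: 504, errorCode: "DEPENDENCY_TIMEOUT", message: "Upstream service timed out" }],
]);

// Hypothetical normalizer: always returns an AppError.
function normalizeError(err) {
  if (err instanceof AppError) return err;
  const known = err ? KNOWN_CODES.get(err.code) : undefined;
  const normalized = known
    ? new AppError({ name: "DependencyError", ...known, isOperational: true })
    : new AppError({ name: "UnexpectedError", message: "Unexpected error", errorCode: "UNEXPECTED", isOperational: false });
  normalized.cause = err; // keep the original for logs; never send it to clients
  return normalized;
}
```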
3. Retry Strategies
Only retry idempotent operations.
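A small retry helper with exponential backoff might look like this. It is a sketch: the name retry, its option names, and the default values are assumptions. The helper cannot know whether an operation is idempotent, so that judgment stays with the caller.

```javascript
// Hypothetical helper: retries a failing async operation with exponential backoff.
// Only pass operations that are safe to repeat (reads, idempotent writes).
async function retry(operation, { retries = 3, baseDelayMs = 100, shouldRetry = () => true } = {}) {
  let attempt = 0;
  for (;;) {
    try {
      return await operation(attempt);
    } catch (err) {
      // Give up when attempts are exhausted or the error is not retryable.
      if (attempt >= retries || !shouldRetry(err)) throw err;
      const delayMs = baseDelayMs * 2 ** attempt; // 100, 200, 400, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      attempt += 1;
    }
  }
}
```

A typical shouldRetry would return true for timeouts and connection resets and false for validation or authentication errors, since repeating those cannot succeed.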
4. Circuit Breakers
Prevent cascading failures when dependencies go down.
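To show the idea, here is a deliberately small circuit breaker. It is a sketch, not a production implementation: the class name, option names, and defaults are assumptions, and it allows more than one trial call while half-open, which a hardened version would restrict.

```javascript
// Minimal illustrative circuit breaker around one async action.
class CircuitBreaker {
  constructor(action, { failureThreshold = 3, resetTimeoutMs = 10000, now = Date.now } = {}) {
    this.action = action;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injectable clock, handy for tests
    this.failures = 0;
    this.openedAt = null;
  }

  // closed: calls pass through. open: calls are rejected immediately.
  // half-open: the reset timeout has elapsed, so a trial call is allowed.
  get state() {
    if (this.openedAt === null) return "closed";
    return this.now() - this.openedAt >= this.resetTimeoutMs ? "half-open" : "open";
  }

  async call(...args) {
    if (this.state === "open") {
      throw new Error("Circuit open: dependency call skipped");
    }
    try {
      const result = await this.action(...args);
      this.failures = 0; // any success closes the circuit
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many failures in a row, (re)opens the circuit.
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

While the circuit is open, callers fail in microseconds instead of piling up behind a dependency that is already struggling.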
5. Fail Fast on Startup
Crash if:
Env vars missing
DB unreachable
Migrations incomplete
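For the first item, a startup check can be as small as the sketch below. The function name assertRequiredEnv and the variable names DATABASE_URL and JWT_SECRET are made up for illustration.

```javascript
// Hypothetical startup check: throws before the server starts listening.
function assertRequiredEnv(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
}

// Example with an explicit env object so the snippet behaves the same everywhere.
let startupError = null;
try {
  assertRequiredEnv(["DATABASE_URL", "JWT_SECRET"], { DATABASE_URL: "postgres://localhost/app" });
} catch (err) {
  startupError = err;
}

console.log(startupError.message); // Missing required environment variables: JWT_SECRET
```

In a real entry point you would call assertRequiredEnv with no second argument and let the thrown error stop the process, so a misconfigured deployment fails immediately instead of at the first request.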
Final Mental Model
Think of error handling as infrastructure, not a feature.
A good Node.js system:
Treats errors as data
Handles expected failures gracefully
Crashes intentionally on corruption
Is observable and debuggable
Recovers automatically
The best Node.js apps don’t avoid crashes — they control them.