engineering · backend · architecture · indie dev

Errors Are Data

20 March 2026

The most common error-handling pattern I see in production codebases is the empty catch block. Something throws, you catch it, you return null or an empty array or nothing at all. The system keeps running. Nobody knows anything went wrong.

This feels defensive. It isn't. It's how silent failures happen.

Errors are information

A 404 tells you someone followed a broken link. A validation error tells you what users are actually sending. A timeout tells you where your dependencies are slow. An unexpected null tells you an assumption in your code is wrong.

None of these are embarrassments to be hidden. They're signals about what your system is doing in the real world, under real conditions, with real users. The instinct to suppress them is the instinct to fly blind.

Two kinds of errors

Most error-handling problems come from conflating two fundamentally different things.

Operational errors are expected. A file isn't found. A user submits invalid data. A third-party API returns a 503. These are part of normal system behavior. You handle them gracefully, return a meaningful response, and move on. They're not bugs — they're cases your system should know how to manage.

Programmer errors are not expected. A null reference where null should never exist. An assertion that should always hold. A code path that should never be reached. These aren't cases to handle gracefully — they're cases to surface loudly, because something is wrong with the code itself.

Catch blocks that treat both the same way are the root of most observability problems. Operational errors deserve handlers. Programmer errors deserve noise.
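One way to encode that distinction is a dedicated error class for the operational case, so a boundary handler can tell the two apart. A minimal sketch; the `OperationalError` class and `handleBoundaryError` function are illustrative names, not from the post:

```typescript
// Operational errors are expected and get a graceful response.
// Anything else is treated as a programmer error and rethrown.
class OperationalError extends Error {
  constructor(message: string, readonly statusCode: number = 400) {
    super(message);
    this.name = "OperationalError";
  }
}

function handleBoundaryError(err: unknown): { status: number; body: string } {
  if (err instanceof OperationalError) {
    // Expected case: return a meaningful response and move on.
    return { status: err.statusCode, body: err.message };
  }
  // Unexpected case: surface loudly instead of handling "gracefully".
  throw err;
}
```

The point of the `instanceof` check is that the default path is noise: only errors you explicitly marked as operational get the quiet treatment.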

What to log, what to propagate, what to ignore

For operational errors at system boundaries — API endpoints, webhook handlers, external integrations — I log at info level with enough context to understand what happened: the input, the error, the response sent back. Not a stack trace. Just the facts.

For programmer errors, I log everything. Stack trace, request context, relevant state. Then I let the error propagate, or let the process crash if it's a background job. A process that dies loudly is easier to fix than one that continues in a corrupt state.

There's a third category: errors I expect to happen at a known rate and don't need to act on. A retry that succeeds on the second attempt. A cache miss. These I log at debug level or skip entirely. The signal-to-noise ratio in your logs matters — if everything is an error, nothing is.
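The three categories above can be sketched as a routing function: expected noise to debug, operational errors to info with facts but no stack trace, programmer errors to error with everything. The `logFor` function and `LogEntry` shape are assumptions for illustration:

```typescript
// Route an error to a log level based on which category it falls in.
// The level choices mirror the text; the logger shape is an assumption.
type Level = "debug" | "info" | "error";

interface LogEntry {
  level: Level;
  message: string;
  context?: Record<string, unknown>;
  stack?: string;
}

function logFor(
  kind: "expected-noise" | "operational" | "programmer",
  err: Error,
  context?: Record<string, unknown>,
): LogEntry {
  switch (kind) {
    case "expected-noise": // retries that succeed, cache misses
      return { level: "debug", message: err.message };
    case "operational": // the input, the error, the response. No stack trace.
      return { level: "info", message: err.message, context };
    case "programmer": // everything, then let it propagate
      return { level: "error", message: err.message, context, stack: err.stack };
  }
}
```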

The silent failure trap

The most dangerous errors aren't the ones that crash your server. They're the ones that don't surface at all.

A function that returns null instead of throwing, a catch block that swallows an exception and returns a default — these let corrupt state travel deep into the system. By the time you notice something is wrong, the original error is gone. What you're left with is a symptom somewhere far removed from the cause.

The fix is to validate early and fail at the boundary. Don't pass half-formed data into your system and hope it sorts itself out downstream. Surface the problem where it originates.
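Validating at the boundary can look like a parse function that either returns known-good data or throws right there. A sketch under assumptions: the `SignupInput` shape and its checks are hypothetical, not from the post:

```typescript
// Validate at the boundary: either return well-formed data or fail
// here, where the error originates, instead of deep in the system.
interface SignupInput {
  email: string;
  age: number;
}

function parseSignup(raw: unknown): SignupInput {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("signup payload must be an object");
  }
  const { email, age } = raw as Record<string, unknown>;
  if (typeof email !== "string" || !email.includes("@")) {
    throw new Error("signup payload needs a valid email");
  }
  if (typeof age !== "number" || !Number.isInteger(age) || age < 0) {
    throw new Error("signup payload needs a non-negative integer age");
  }
  return { email, age }; // past this point, the data is known-good
}
```

Everything downstream of `parseSignup` can trust its input, which is the opposite of letting half-formed data travel and hoping it sorts itself out.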

Observability without infrastructure

You don't need Datadog on day one. What you need is structured logs with enough context to reproduce the problem: an error ID you can search for, the input that triggered it, and a timestamp. That's most of what a proper observability stack gives you, at the cost of a few lines of logging code.
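That kind of log can be a few lines of code. A minimal sketch, assuming a plain JSON-lines log; the `logError` function and field names are illustrative:

```typescript
// Emit one structured, searchable log line per failure: an error ID,
// the input that triggered it, and a timestamp. Field names are
// illustrative, not a standard schema.
function logError(err: Error, input: unknown): string {
  const errorId =
    `err_${Date.now().toString(36)}_${Math.random().toString(36).slice(2, 8)}`;
  const entry = {
    errorId, // searchable handle: show it to the user, grep for it later
    timestamp: new Date().toISOString(),
    message: err.message,
    input, // the payload that triggered the failure
  };
  console.error(JSON.stringify(entry)); // one JSON line per error
  return errorId;
}
```

Returning the ID means the same handle can go back to the caller in the error response, so a bug report maps straight to a log line.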

I've debugged production issues in Clover, in Barba Studio, in integrations with the Italian SDI — almost always by reading logs. Not dashboards, not distributed traces. Logs with enough context to understand what happened.

When errors mean something

Treating errors as data rather than exceptions to suppress changes how you build. You start thinking about what each failure mode tells you. You log with intent instead of logging for coverage. You notice patterns — certain errors clustering around certain inputs, certain integrations failing at certain rates.

The system doesn't get quieter. It gets more legible. And a legible system is one you can actually fix.