
How I Think About Database Design

12 March 2026

Every schema decision you make early in a project has a half-life of years. You can rename a variable in ten seconds. Changing how a core entity is structured in your database — after it's in production, with real data, with queries built around it — is expensive in a way that most other technical decisions aren't.

This makes database design one of the few areas where it's worth thinking carefully before you write the first line of code. Not overengineering — thinking. There's a difference.

Start with queries, not entities

The natural instinct when designing a database is to start with the things in your domain. A barbershop app has barbers, clients, services, appointments. A messaging app has users, rooms, messages. You draw boxes and relationships between them. It feels like good design.

The problem is that this approach optimizes for how you think about the data, not how you'll access it. And the database doesn't care how you think — it only knows how you query.

The question I ask before touching a schema is: what are the ten most common queries this application will run? Not hypothetically — specifically. "Get all upcoming appointments for a given barber, sorted by time." "Get the last 50 messages in a room, with sender names." "Get total revenue per client for the current month."

If you can write those queries down, you can design a schema that serves them. If you can't, you're guessing — and guessing at the schema level is expensive to correct.
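One way to make this concrete is to write the queries down as plain data before any schema exists. The sketch below expresses two of the example queries as MongoDB-style filter and sort objects. The `AccessPattern` type and the field names (`barberId`, `startsAt`, `roomId`, `sentAt`) are assumptions made for illustration, not taken from a real codebase.

```typescript
// A hypothetical record of one access pattern: which collection it hits,
// how it filters, and how it sorts. Plain data, no driver required.
interface AccessPattern {
  description: string;
  collection: string;
  filter: Record<string, unknown>;
  sort?: Record<string, 1 | -1>;
  limit?: number;
}

// "Get all upcoming appointments for a given barber, sorted by time."
function upcomingAppointments(barberId: string, now: Date): AccessPattern {
  return {
    description: "Upcoming appointments for a barber, sorted by time",
    collection: "appointments",
    filter: { barberId, startsAt: { $gte: now } },
    sort: { startsAt: 1 },
  };
}

// "Get the last 50 messages in a room."
function recentMessages(roomId: string, limit = 50): AccessPattern {
  return {
    description: "Most recent messages in a room",
    collection: "messages",
    filter: { roomId },
    sort: { sentAt: -1 },
    limit,
  };
}

// The fields a pattern filters and sorts on are the first index candidates.
function fieldsTouched(p: AccessPattern): string[] {
  const all = Object.keys(p.filter).concat(Object.keys(p.sort ?? {}));
  return all.filter((f, i) => all.indexOf(f) === i);
}
```

Listing patterns this way makes it obvious which fields each collection must expose at the top level, and which combinations of fields will need an index.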

The normalization trap

Relational database theory pushes toward normalization: each piece of data in one place, relationships expressed through joins. It's intellectually clean. In practice, it often produces schemas that are painful to query.

A fully normalized schema for a messaging application might have users, rooms, memberships, messages, and attachments as separate tables. Every message read requires joining across three or four of them. For an app where message reads happen constantly and at scale, you're paying that join cost on every request.

Denormalization isn't laziness. It's a deliberate tradeoff: you accept redundancy in your stored data to make your reads faster and simpler. In a document database like MongoDB, this is often the right default. Embedding a sender's name and avatar directly in a message document means one read instead of a join. The cost is that if a user changes their name, you have to update it in multiple places — but users change their names rarely. Reads happen on every page load.
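As a rough sketch of that tradeoff, a message document might carry a small copy of the sender's display fields, and the rare rename becomes a fan-out update. The shape and field names below are hypothetical. `renameFanOut` only builds the filter and update objects that an `updateMany`-style call would take, and `applyRename` is an in-memory stand-in that shows the effect.

```typescript
// Hypothetical message shape with a denormalized copy of the sender's
// display fields, so rendering a message needs no second lookup.
interface MessageDoc {
  _id: string;
  roomId: string;
  text: string;
  sentAt: Date;
  sender: { userId: string; name: string; avatarUrl: string };
}

// The rare write path: when a user renames, every embedded copy must change.
// Returns the filter/update pair one would pass to an updateMany-style call.
function renameFanOut(userId: string, newName: string) {
  return {
    filter: { "sender.userId": userId },
    update: { $set: { "sender.name": newName } },
  };
}

// In-memory stand-in for that update, to show the effect on stored documents.
// Returns how many documents were actually modified.
function applyRename(messages: MessageDoc[], userId: string, newName: string): number {
  let modified = 0;
  for (const m of messages) {
    if (m.sender.userId === userId && m.sender.name !== newName) {
      m.sender.name = newName;
      modified++;
    }
  }
  return modified;
}
```

The read path touches one document. The write path touches every message the user ever sent, which is acceptable only because renames are rare.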

Model for the common case, not the edge case.

Embedding vs. referencing

In a document database, the central design question is whether to embed related data or reference it. The rule I use: embed if you always read the data together; reference if you read it independently.

In Barba Studio, an appointment always includes the service details — duration, name, price. I embed those at the time of booking rather than referencing the service document. This means that if a barber changes a service's price later, existing appointments keep the price they were booked at. That's actually the correct behavior: a booking is a snapshot of what was agreed, not a live reference to current pricing.
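The snapshot idea can be sketched as a copy made at booking time. The types and field names here (`priceCents`, `durationMinutes`, `serviceId`) are assumptions for illustration, not the actual Barba Studio schema.

```typescript
// Hypothetical shapes; field names are illustrative.
interface Service {
  _id: string;
  name: string;
  durationMinutes: number;
  priceCents: number;
}

interface Appointment {
  barberId: string;
  clientId: string;
  startsAt: Date;
  // A copy taken at booking time, not a reference to the live service.
  service: { serviceId: string; name: string; durationMinutes: number; priceCents: number };
}

// Copies the service fields field by field, so later edits to the
// Service object cannot leak into an existing appointment.
function bookAppointment(barberId: string, clientId: string, startsAt: Date, service: Service): Appointment {
  return {
    barberId,
    clientId,
    startsAt,
    service: {
      serviceId: service._id,
      name: service.name,
      durationMinutes: service.durationMinutes,
      priceCents: service.priceCents,
    },
  };
}
```

Keeping `serviceId` alongside the copied fields still allows reporting by service, while the price and duration stay frozen at what was agreed.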

In Clover and Argan, messages reference room and user IDs rather than embedding full documents. You often query messages without needing room or user metadata. Embedding would mean fetching that metadata on every message even when you don't need it — and in a high-volume messaging app, that adds up.

The decision is always about access patterns. When in doubt, ask: will I ever need this data without the other?
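For the referencing side, a minimal sketch: messages store only IDs, and user metadata is resolved in one batched lookup when a caller actually needs it. The shapes and helper names are hypothetical, not the real Clover or Argan code.

```typescript
// Hypothetical shapes: a message holds ids, not embedded documents.
interface MessageRef {
  _id: string;
  roomId: string;
  senderId: string;
  text: string;
  sentAt: Date;
}

interface UserSummary {
  _id: string;
  name: string;
}

// Collects the distinct sender ids on a page of messages, so user metadata
// can be fetched in one batched lookup (for example with a filter like
// { _id: { $in: ids } }) only when it is needed.
function distinctSenderIds(messages: MessageRef[]): string[] {
  const ids = messages.map((m) => m.senderId);
  return ids.filter((id, i) => ids.indexOf(id) === i);
}

// The join happens in application code: attach a name to each message from
// the already-fetched users. Unknown senders get a placeholder.
function withSenderNames(messages: MessageRef[], users: UserSummary[]): Array<MessageRef & { senderName: string }> {
  const byId = new Map<string, string>();
  users.forEach((u) => byId.set(u._id, u.name));
  return messages.map((m) => ({ ...m, senderName: byId.get(m.senderId) ?? "(unknown)" }));
}
```

Callers that only count or paginate messages never pay for the user lookup. Callers that render them pay for one extra query per page rather than one per message.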

Indexes are the design

Indexes are usually treated as a performance optimization — something you add later when queries get slow. I think about them differently: the indexes are the schema. If you don't know which fields you'll query and filter on, you don't know your design.

In practice, I define indexes when I write the collection setup, not after. In the db.ts module I use to initialize MongoDB connections, index creation runs on startup. If I find myself adding a new index to a collection that's been in production for months, I treat it as more than a performance fix: it's a signal that I didn't fully understand the access patterns when I designed the schema.
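A sketch of what startup index creation could look like, with the definitions kept as data next to the schema. This is not the actual db.ts module. `IndexableCollection` is a deliberately narrow, assumed interface (the official Node.js driver's `Collection.createIndex` takes a key spec and options of this general shape), and the collection names, field names, and index names are illustrative.

```typescript
// Minimal structural interface for the one method used here. It is an
// assumption for this sketch, narrower than the real driver's types.
interface IndexableCollection {
  createIndex(keys: Record<string, 1 | -1>, options?: { name?: string; unique?: boolean }): Promise<string>;
}

interface IndexDef {
  collection: string;
  keys: Record<string, 1 | -1>;
  options?: { name?: string; unique?: boolean };
}

// Declared next to the schema, so each index documents an access pattern.
const INDEXES: IndexDef[] = [
  // "Upcoming appointments for a given barber, sorted by time"
  { collection: "appointments", keys: { barberId: 1, startsAt: 1 }, options: { name: "barber_upcoming" } },
  // "Last 50 messages in a room"
  { collection: "messages", keys: { roomId: 1, sentAt: -1 }, options: { name: "room_recent" } },
];

// Called once at startup. getCollection is a hypothetical accessor, for
// example (name) => db.collection(name) when using the real driver.
async function ensureIndexes(
  getCollection: (name: string) => IndexableCollection,
  defs: IndexDef[] = INDEXES,
): Promise<string[]> {
  const created: string[] = [];
  for (const def of defs) {
    created.push(await getCollection(def.collection).createIndex(def.keys, def.options));
  }
  return created;
}
```

Because creating an index that already exists with the same definition is a no-op on the server, running this on every startup is safe, and the list doubles as documentation of the queries the schema was designed for.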

Every query that can't use an index does a full collection scan: the server reads every document to find the few that match. That's invisible at low volume and catastrophic at scale. Thinking about indexes early is cheap. Building them retroactively on a production collection with millions of documents, while your application is running, is not.
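Collection scans can be caught before they hurt by inspecting a query's explain output for a `COLLSCAN` stage. The helper below is a sketch that walks a plan tree. `PlanStage` is a loose, assumed type covering only the fields the walker reads, and real explain output has more fields and varies by server version.

```typescript
// A loose type for one node of an explain plan. Only the fields the
// walker reads are listed; everything else is ignored.
interface PlanStage {
  stage?: string;
  inputStage?: PlanStage;
  inputStages?: PlanStage[];
  queryPlan?: PlanStage;
}

// Returns true if any stage in the plan tree is a full collection scan.
function hasCollectionScan(plan: PlanStage): boolean {
  if (plan.stage === "COLLSCAN") return true;
  const children: PlanStage[] = [];
  if (plan.inputStage) children.push(plan.inputStage);
  if (plan.inputStages) children.push(...plan.inputStages);
  if (plan.queryPlan) children.push(plan.queryPlan);
  return children.some((child) => hasCollectionScan(child));
}
```

In use, one would pass it the winning plan from a query's explain result, for instance in a test that fails whenever one of the application's core queries stops being served by an index.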

What IoT taught me

IoT applications deal with sensor readings: temperature, humidity, pressure, timestamped and continuous. This kind of data has structure that's forced by its nature, not by schema choices.

Sensor data is append-only. You never update a past reading — it happened, it's immutable. Queries are almost always range-based: "all readings from this sensor between these two timestamps." There's no meaningful join across entities. The schema almost designs itself once you internalize the access pattern.
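That access pattern can be sketched in a few lines: one immutable document per reading, one range filter, and one compound index. The `Reading` shape and the field names `sensorId` and `ts` are assumptions for illustration. `readingsInRange` is an in-memory equivalent of the filter, included only to make the semantics explicit.

```typescript
// Hypothetical reading shape: one immutable document per measurement.
interface Reading {
  sensorId: string;
  ts: Date;
  temperatureC: number;
}

// The dominant query: one sensor, half-open time range [from, to).
function rangeFilter(sensorId: string, from: Date, to: Date) {
  return { sensorId, ts: { $gte: from, $lt: to } };
}

// The compound index that serves that filter: the equality field first,
// then the range field.
const readingsIndex: Record<string, 1 | -1> = { sensorId: 1, ts: 1 };

// In-memory equivalent of the filter, sorted by time ascending.
function readingsInRange(all: Reading[], sensorId: string, from: Date, to: Date): Reading[] {
  return all
    .filter((r) => r.sensorId === sensorId && r.ts.getTime() >= from.getTime() && r.ts.getTime() < to.getTime())
    .sort((a, b) => a.ts.getTime() - b.ts.getTime());
}
```

The half-open range means adjacent windows never double-count a reading that falls exactly on a boundary.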

Working with time-series data changed how I think about general schema design. Most data has a natural structure if you look at how it actually moves through the system. Appointments are created and then mostly read. Messages are appended and queried by recency. Invoices are written once and occasionally updated. Understanding the write/read/update ratio for each entity is as important as understanding the relationships between them.

The hidden cost of schemaless

MongoDB is often described as schemaless, which creates the impression that you can figure out structure later. This is a trap.

What "schemaless" actually means is that the database won't enforce structure unless you configure it to, so in practice your application has to. Early in a project that feels like freedom. A year into production, with documents that have accumulated inconsistencies across versions of your code, it feels like debt: field names that changed, optional fields that became required, nested structures that were refactored but not migrated.
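A minimal sketch of application-side enforcement: a runtime check at the boundary where documents enter or leave the database. The `MessageV2` shape and the `schemaVersion` field are assumptions for illustration. A validation library would do the same job with less code.

```typescript
// Hypothetical current shape of a message document.
interface MessageV2 {
  schemaVersion: 2;
  roomId: string;
  senderId: string;
  text: string;
  sentAt: Date;
}

// Runtime check at the boundary. Returns a list of problems rather than
// throwing, so a caller can log and decide what to do with bad documents.
function validateMessage(doc: unknown): string[] {
  if (typeof doc !== "object" || doc === null) return ["not an object"];
  const d = doc as Record<string, unknown>;
  const problems: string[] = [];
  if (d["schemaVersion"] !== 2) problems.push("schemaVersion must be 2");
  for (const field of ["roomId", "senderId", "text"]) {
    if (typeof d[field] !== "string") problems.push(field + " must be a string");
  }
  if (!(d["sentAt"] instanceof Date)) problems.push("sentAt must be a Date");
  return problems;
}

// Type guard built on the same check, for use in typed code paths.
function isMessageV2(doc: unknown): doc is MessageV2 {
  return validateMessage(doc).length === 0;
}
```

Running a check like this over a sample of production documents is also a cheap way to measure how much drift has already accumulated.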

I run migrations on MongoDB. Not with a framework — just explicit scripts that I version and run as part of deployments. They're less formalized than SQL migrations but the discipline is the same: every structural change to the data is deliberate, tracked, and applied in sequence.
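The core of such a script runner can be sketched as follows. It is not the actual deployment tooling. The `Migration` type, the numeric `version`, and the `recordApplied` callback are hypothetical, and `Db` is left generic so the sketch does not depend on a driver.

```typescript
// A hypothetical migration: a version number that fixes the order, a name
// for logs, and the change itself.
interface Migration<Db> {
  version: number;
  name: string;
  up(db: Db): Promise<void>;
}

// Pure planning step: which migrations still need to run, in order.
function pendingMigrations<Db>(all: Migration<Db>[], applied: number[]): Migration<Db>[] {
  return all
    .filter((m) => applied.indexOf(m.version) === -1)
    .sort((a, b) => a.version - b.version);
}

// Applies pending migrations one at a time and records each version only
// after it succeeds, so a failure stops the sequence at a known point.
async function runMigrations<Db>(
  db: Db,
  all: Migration<Db>[],
  applied: number[],
  recordApplied: (version: number) => Promise<void>,
): Promise<number[]> {
  const ran: number[] = [];
  for (const m of pendingMigrations(all, applied)) {
    await m.up(db);
    await recordApplied(m.version);
    ran.push(m.version);
  }
  return ran;
}
```

In a real deployment the applied versions would typically live in a small collection of their own, read before the run and appended to by `recordApplied`.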

Schemaless means the schema lives in your code. That's fine, as long as you treat it seriously.

Schema design is a series of predictions about how your application will grow. You'll get some wrong. The goal isn't to get them all right upfront — it's to make the decisions deliberately enough that when you're wrong, you know why, and you know what to change.

Start with the queries. Design for reads. Know your indexes. The rest follows.