Advanced: Storage and Indexing

Storage engines answer a simple question with many tradeoffs: how do we write data durably and read it back efficiently?

Logs

An append-only log is one of the simplest durable structures. Writes append new records rather than updating data in place. This makes writes sequential and efficient, and it creates a history that can be replayed.

The challenge is lookup. If records are only appended, the system needs indexes and compaction to avoid scanning everything.

Hash Indexes

A hash index maps a key to the location of its latest value in a file. It is simple and fast for point lookups, but it does not support range queries well and requires index state to fit in memory or be carefully paged.

SSTables and LSM Trees

Sorted String Tables store immutable sorted segments. New writes go to an in-memory structure, then flush to disk as sorted files. Background compaction merges files and discards overwritten values.

This is the basis of LSM-tree storage engines. They are usually strong for high write throughput and sequential disk access, but reads may check multiple structures unless helped by Bloom filters and compaction.

B-Trees

B-trees update fixed-size pages in place and keep data sorted by key. They are the traditional default for many relational databases.

They are strong for reads, range queries, and predictable indexing behavior. Their write path is more random than log-structured designs, but mature implementations use write-ahead logs and careful page management to provide durability.

OLTP vs OLAP

Online transaction processing systems serve user-facing reads and writes with low latency. Online analytical processing systems scan large datasets for reporting, aggregation, and exploration.

Column-oriented storage is often better for analytics because queries commonly read a few columns across many rows. Row-oriented storage is often better for transactional workloads that read or update complete records.

Practical Takeaway

Storage engines are shaped by workload:

high write throughput often favors log-structured designs
point lookups need efficient key indexes
range scans need sorted structures
analytics often favor columnar layout and compression
durability usually depends on logs, fsync behavior, snapshots, and recovery protocols