๐๏ธ Database
Some general concepts and ideas about databases and how to choose the right one.
๐ณ Decision Treeโ
Categories by Workloadsโ
- OLAP: Write Heavy
- OLTP: Read Havy
- HTAP: A mix of both the above
Storage Modelsโ
- NSM: Store all attributes of a tuple in the same page
- DSM: Store each attribute of a tuple in different page
- PAX: A mix of both the above
ACIDโ
- Atomicity: All or nothing. If a transaction fails, the database is rolled back to its previous state.
- Consistency: If the transaction starts in a consistent state, it ends in a consistent state.
- Isolation: Transactions are isolated from each other. Changes made by other transactions are not visible to the current transaction.
- Durability: Once a transaction is committed, it is permanent. Permanent means its effects are there even if the system crashes.
CAPโ
Consistent, Available, and Partition Tolerant
Serializabilityโ

Isolation Levelsโ
Different isolation levels may suffer from different types of anomalies:
- Dirty Read: A transaction may reads uncommitted changes from another transaction.
- Unrepeatable Read: A transaction reads the same row twice but gets different values because another transaction modified and committed it in between.
- Lost Updates: Two transactions read the same data and update it, but one overwrites the otherโs changes, losing data.
- Phantom Reads: A transaction reads a set of rows, but when it reads again, new rows appear because another transaction inserted or deleted data.
Level | Dirty Read | Unrepeatable Read | Lost Updates | Phantom Reads | Usage |
---|---|---|---|---|---|
Serializable | No | No | No | No | SQLite's Default1 |
Repeatable Read | No | No | No | Maybe | MySQL's Default |
Read Committed | No | Maybe | Maybe | Maybe | PostgreSQL & Oracle's Default |
Read Uncommitted | Maybe | Maybe | Maybe | Maybe | Almost Never, but MySQL supports it |
Indexesโ
- Clustered index: Data is being stored in the same order as the index, and can be used to retrieve data directly
- Non-clustered index: Data is stored separately from the index, and the index contains pointers to the data
Miscellaneousโ
- Choosing page size:
- Many large reads: large page size
- Many small writes: small page size
Aggregationโ
- Sorting: Sort the data and then aggregate
- Hashing: Hash the data and then group by hash
Hashing is always more efficient unless the data is already sorted beforehand (e.g. with ORDER BY
).
Referencesโ
Footnotesโ
-
In practice, it behaves like Read Committed. โฉ