🗃️ Database
Some general concepts and ideas about databases and how to choose the right one.
🌳 Decision Tree
Categories by Workloads
- OLAP: Write Heavy
- OLTP: Read Havy
- HTAP: A mix of both the above
Storage Models
- NSM: Store all attributes of a tuple in the same page
- DSM: Store each attribute of a tuple in different page
- PAX: A mix of both the above
ACID
- Atomicity: All or nothing. If a transaction fails, the database is rolled back to its previous state.
- Consistency: If the transaction starts in a consistent state, it ends in a consistent state.
- Isolation: Transactions are isolated from each other. Changes made by other transactions are not visible to the current transaction.
- Durability: Once a transaction is committed, it is permanent. Permanent means its effects are there even if the system crashes.
CAP
Consistent, Available, and Partition Tolerant
Serializability
Isolation Levels
Level | Dirty Read | Unrepeatable Read | Lost Updates | Phantom |
---|---|---|---|---|
Serializable | No | No | No | No |
Repeatable Read | No | No | No | Maybe |
Read Committed | No | Maybe | Maybe | Maybe |
Read Uncommitted | Maybe | Maybe | Maybe | Maybe |
Indexes
- Clustered index: Data is being stored in the same order as the index, and can be used to retrieve data directly
- Non-clustered index: Data is stored separately from the index, and the index contains pointers to the data
Miscellaneous
- Choosing page size:
- Many large reads: large page size
- Many small writes: small page size
Aggregation
- Sorting: Sort the data and then aggregate
- Hashing: Hash the data and then group by hash
Hashing is always more efficient unless the data is already sorted beforehand (e.g. with ORDER BY
).