Skip to main content

🗃️ Database

Some general concepts and ideas about databases and how to choose the right one.

🌳 Decision Tree

Categories by Workloads

  • OLAP: Write Heavy
  • OLTP: Read Havy
  • HTAP: A mix of both the above

Storage Models

  • NSM: Store all attributes of a tuple in the same page
  • DSM: Store each attribute of a tuple in different page
  • PAX: A mix of both the above

ACID

  • Atomicity: All or nothing. If a transaction fails, the database is rolled back to its previous state.
  • Consistency: If the transaction starts in a consistent state, it ends in a consistent state.
  • Isolation: Transactions are isolated from each other. Changes made by other transactions are not visible to the current transaction.
  • Durability: Once a transaction is committed, it is permanent. Permanent means its effects are there even if the system crashes.

CAP

Consistent, Available, and Partition Tolerant

Serializability

A diagram titled 'Universe of Schedules,' displaying nested relationships among different types of database schedules. The outermost box is labeled 'All Schedules,' containing subsets: 'View Serializable,' 'Conflict Serializable,' and 'Serial.' 'No Cascading Aborts' and 'Strong Strict 2PL' overlap some of these categories. The hierarchy highlights how certain scheduling properties are stricter than others.

Isolation Levels

LevelDirty ReadUnrepeatable ReadLost UpdatesPhantom
SerializableNoNoNoNo
Repeatable ReadNoNoNoMaybe
Read CommittedNoMaybeMaybeMaybe
Read UncommittedMaybeMaybeMaybeMaybe

Indexes

  • Clustered index: Data is being stored in the same order as the index, and can be used to retrieve data directly
  • Non-clustered index: Data is stored separately from the index, and the index contains pointers to the data

Miscellaneous

  • Choosing page size:
    • Many large reads: large page size
    • Many small writes: small page size

Aggregation

  • Sorting: Sort the data and then aggregate
  • Hashing: Hash the data and then group by hash

Hashing is always more efficient unless the data is already sorted beforehand (e.g. with ORDER BY).

References