Skip to main content

๐Ÿ—ƒ๏ธ Database

Some general concepts and ideas about databases and how to choose the right one.

๐ŸŒณ Decision Treeโ€‹

Categories by Workloadsโ€‹

  • OLAP: Write Heavy
  • OLTP: Read Havy
  • HTAP: A mix of both the above

Storage Modelsโ€‹

  • NSM: Store all attributes of a tuple in the same page
  • DSM: Store each attribute of a tuple in different page
  • PAX: A mix of both the above

ACIDโ€‹

  • Atomicity: All or nothing. If a transaction fails, the database is rolled back to its previous state.
  • Consistency: If the transaction starts in a consistent state, it ends in a consistent state.
  • Isolation: Transactions are isolated from each other. Changes made by other transactions are not visible to the current transaction.
  • Durability: Once a transaction is committed, it is permanent. Permanent means its effects are there even if the system crashes.

CAPโ€‹

Consistent, Available, and Partition Tolerant

Serializabilityโ€‹

A diagram titled 'Universe of Schedules,' displaying nested relationships among different types of database schedules. The outermost box is labeled 'All Schedules,' containing subsets: 'View Serializable,' 'Conflict Serializable,' and 'Serial.' 'No Cascading Aborts' and 'Strong Strict 2PL' overlap some of these categories. The hierarchy highlights how certain scheduling properties are stricter than others.

Isolation Levelsโ€‹

Different isolation levels may suffer from different types of anomalies:

  • Dirty Read: A transaction may reads uncommitted changes from another transaction.
  • Unrepeatable Read: A transaction reads the same row twice but gets different values because another transaction modified and committed it in between.
  • Lost Updates: Two transactions read the same data and update it, but one overwrites the otherโ€™s changes, losing data.
  • Phantom Reads: A transaction reads a set of rows, but when it reads again, new rows appear because another transaction inserted or deleted data.
LevelDirty ReadUnrepeatable ReadLost UpdatesPhantom ReadsUsage
SerializableNoNoNoNoSQLite's Default1
Repeatable ReadNoNoNoMaybeMySQL's Default
Read CommittedNoMaybeMaybeMaybePostgreSQL & Oracle's Default
Read UncommittedMaybeMaybeMaybeMaybeAlmost Never, but MySQL supports it

Indexesโ€‹

  • Clustered index: Data is being stored in the same order as the index, and can be used to retrieve data directly
  • Non-clustered index: Data is stored separately from the index, and the index contains pointers to the data

Miscellaneousโ€‹

  • Choosing page size:
    • Many large reads: large page size
    • Many small writes: small page size

Aggregationโ€‹

  • Sorting: Sort the data and then aggregate
  • Hashing: Hash the data and then group by hash

Hashing is always more efficient unless the data is already sorted beforehand (e.g. with ORDER BY).

Referencesโ€‹

Footnotesโ€‹

  1. In practice, it behaves like Read Committed. โ†ฉ