Link Search Menu Expand Document

Databases

Relational databases

Relational databases strictly enforce a schema and relationships between entities.

ACID

ACID describes the functionality a good relational database should provide in transactional interaction:

  • Atomic means that transactions containing a failed operation will be rolled back, allowing for "all or nothing" operations.
  • Consistent: consistency with the schema is maintained at each transaction commit, and the database only moves from valid state to valid state when applying transactions. At no point is the database in an invalid/corrupt state.
  • Isolation allows concurrent execution of multiple separate transactions. Internally the database should queue these so that readers see a serially applied set of changes.
  • Durable: durability is the promise that once the data has been stored in the database it will remain so. Changes should be committed to persistent storage.

Non-relational databases

BASE

  • Basically Available guarantees availability.
  • Soft State allows the state of the dataset to change, even without input (allowing eviction).
  • Eventual Consistency commits to keeping the data consistent after a period of time with no writes.

Key-value stores

Principles

Indexing

Database indexes are data structures which improve the speed of data access operations on database tables, at the cost of maintaining an additional copies of selected columns (keys) against direct links to the complete data row.

Indexing architectures fit into two categories:

  • Clustered indexes alter the data block to dictate the sequence of the results, so it's possible to have only one clustered index.
  • Non-clustered indexes provide logical ordering over data in an arbitrary order. A separate index tree contains the index keys in sorted order, with values specifying pointers to the record.

Replication

Replication makes copies of the database for high availability (through failover), or for providing read-only copies of the database for querying.

Synchronous vs asynchronous

Sharding

Sharding introduces additional complexity to the application to deal with performance limitations of the underlying storage. The storage is partitioned into a number of smaller shards, and records are routed to storage based on some form of index or hash function. A common hash function for this use case is Ketama.


Children

  1. Elastic
  2. etcd