- Cardinality describes the number of possible values a column may have.
- Dimensionality describes the number of columns a record may have.
Relational databases strictly enforce a schema and relationships between entities.
- SQL Server
ACID describes the functionality a good relational database should provide in transactional interaction:
- Atomic means that transactions containing a failed operation will be rolled back, allowing for "all or nothing" operations.
- Consistent: consistency with the schema is maintained at each transaction commit, and the database only moves from valid state to valid state when applying transactions. At no point is the database in an invalid/corrupt state.
- Isolation allows concurrent execution of multiple separate transactions. Internally the database should queue these so that readers see a serially applied set of changes.
- Durable: durability is the promise that once the data has been stored in the database it will remain so. Changes should be committed to persistent storage.
- Basically Available guarantees availability.
- Soft State allows the state of the dataset to change, even without input (allowing eviction).
- Eventual Consistency commits to keeping the data consistent after a period of time with no writes.
Database systems typically target one of the following use cases:
- OLTP systems are optimised for transactional workloads, where data is frequently read and written.
- OLAP systems are optimised for reporting and data science workloads, where data is read more frequently than it's written, and is typically written in large batches.
Database indexes are data structures which improve the speed of data access operations on database tables, at the cost of maintaining an additional copies of selected columns (keys) against direct links to the complete data row.
Indexing architectures fit into two categories:
- Clustered indexes alter the data block to dictate the sequence of the results, so it's possible to have only one clustered index.
- Non-clustered indexes provide logical ordering over data in an arbitrary order. A separate index tree contains the index keys in sorted order, with values specifying pointers to the record.
Replication makes copies of the database for high availability (through failover), or for providing read-only copies of the database for querying.
Synchronous vs asynchronous
Sharding introduces additional complexity to the application to deal with performance limitations of the underlying storage. The storage is partitioned into a number of smaller shards, and records are routed to storage based on some form of index or hash function. A common hash function for this use case is Ketama.