Amazon Redshift

Redshift is a data warehouse designed to hold data used for analysis. It uses a PostgreSQL-like engine and offers native redundancy. Scaling up or out is possible, but requires user intervention.

Concepts

  • Clusters represent individual data warehouses.
  • Databases house the data.
  • Users are granted access to databases. By default only an initial administrative user is created.
  • Schemas provide namespaces within a database, and can in turn contain tables, views and functions.
  • Data shares allow sharing live data across Redshift clusters, AWS accounts, and AWS regions (preview).
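
As a rough illustration of how users, schemas and grants fit together, the sketch below builds the SQL statements one might run as the administrative user to onboard a read-only analyst. The helper `onboarding_sql` and the names `analyst`, `reporting` and the password are hypothetical placeholders, not anything defined by Redshift; the statements themselves would be run through whatever SQL client is in use.

```python
def onboarding_sql(user: str, schema: str, password: str) -> list[str]:
    """Return SQL statements that create a user and grant read access to one schema.

    Hypothetical helper: it only builds strings and does not connect to a cluster.
    """
    return [
        # Create the database user (only an admin user exists by default).
        f"CREATE USER {user} PASSWORD '{password}';",
        # Create the namespace that will hold tables, views and functions.
        f"CREATE SCHEMA IF NOT EXISTS {schema};",
        # USAGE lets the user reference objects inside the schema.
        f"GRANT USAGE ON SCHEMA {schema} TO {user};",
        # SELECT on existing tables gives read-only access.
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};",
    ]


# Placeholder values for illustration only.
onboarding_statements = onboarding_sql("analyst", "reporting", "ChangeMe123")
for stmt in onboarding_statements:
    print(stmt)
```

Note that the final grant only covers tables that already exist in the schema when it is run.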

Spectrum

Spectrum allows running queries directly against external data (files in S3 buckets). Queries are executed on separate Redshift servers that are independent of the cluster, so compute-intensive operations can be offloaded to Spectrum (potentially to thousands of nodes) to save on cluster resources.

To use Spectrum, an external schema must be created to house the external tables. External table definitions can be sourced from either Athena or the Glue data catalog.
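
The external schema step can be sketched as follows. This is a minimal example that only builds the `CREATE EXTERNAL SCHEMA` statement as a string; the helper `spectrum_schema_sql`, the schema and database names, and the IAM role ARN are all hypothetical placeholders, and the role is assumed to already have access to the catalog and the S3 data.

```python
def spectrum_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """Build a CREATE EXTERNAL SCHEMA statement backed by the data catalog.

    Hypothetical helper: it only returns SQL text and does not execute it.
    """
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        # Creates the catalog database if it is not there yet.
        f"CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


# Placeholder values for illustration only.
spectrum_statement = spectrum_schema_sql(
    schema="spectrum_demo",
    catalog_db="spectrum_db",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleSpectrumRole",
)
print(spectrum_statement)
```

Once the schema exists, external tables defined in it can be queried like local tables, for example `SELECT ... FROM spectrum_demo.some_table`, where `some_table` is again a placeholder.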

Spectrum databases may be viewed in Athena.

Federated queries

Federated queries allow querying live data from RDS and Aurora without copying it.
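
Federated queries are also set up through an external schema, this time pointing at the remote database. The sketch below builds such a statement for a PostgreSQL-compatible source; the helper `federated_schema_sql`, the endpoint, the role ARN and the secret ARN are hypothetical placeholders, and the secret is assumed to hold the remote database credentials.

```python
def federated_schema_sql(
    schema: str,
    source_db: str,
    source_schema: str,
    endpoint: str,
    iam_role_arn: str,
    secret_arn: str,
    port: int = 5432,
) -> str:
    """Build a CREATE EXTERNAL SCHEMA statement for a PostgreSQL-compatible source.

    Hypothetical helper: it only returns SQL text and does not execute it.
    """
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM POSTGRES\n"
        f"DATABASE '{source_db}' SCHEMA '{source_schema}'\n"
        f"URI '{endpoint}' PORT {port}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SECRET_ARN '{secret_arn}';"
    )


# Placeholder values for illustration only.
federated_statement = federated_schema_sql(
    schema="orders_live",
    source_db="shop",
    source_schema="public",
    endpoint="example-db.cluster-abc123.us-east-1.rds.amazonaws.com",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleFederatedRole",
    secret_arn="arn:aws:secretsmanager:us-east-1:123456789012:secret:example-secret",
)
print(federated_statement)
```

Tables in the remote schema can then be joined against local Redshift tables in a single query, with the data read live from the source.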

Limits

  • 8 PB of storage
