AWS Glue provides serverless data integration spanning schema definition and ETL.

Data Catalog

Glue Data Catalog is a managed metadata store that is compatible with the Apache Hive metastore.

  • Databases define logical groups of table definitions.
  • Tables hold metadata about the underlying data, including its schema.
  • Crawlers run classifiers against the source data to infer its schema and create the table metadata. A crawler can handle data partitioned across multiple files when it is pointed at a directory path with a trailing /.
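
As a rough illustration of how the pieces above fit together, the sketch below builds the request for a crawler that targets an S3 directory and writes its tables into a catalog database. The parameter names (Name, Role, DatabaseName, Targets, S3Targets, Path) are those of the Glue CreateCrawler API as exposed by boto3; the helper build_crawler_request, the bucket, the role ARN, and the database name are all hypothetical placeholders. The helper only normalises the path and assembles the request, so it runs without AWS access.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for a Glue create_crawler call (hypothetical helper)."""
    # Point the crawler at a directory rather than a single object by
    # making sure the path ends with a trailing slash.
    if not s3_path.endswith("/"):
        s3_path += "/"
    return {
        "Name": name,
        "Role": role_arn,          # IAM role the crawler assumes
        "DatabaseName": database,  # catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


# All values below are placeholders, not real resources.
request = build_crawler_request(
    name="example-sales-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="example_db",
    s3_path="s3://example-bucket/sales",
)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   glue = boto3.client("glue")
#   glue.create_crawler(**request)
#   glue.start_crawler(Name=request["Name"])
```

Keeping the request construction separate from the API call makes the path handling easy to check before anything is created in an account.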