Macie

Amazon Macie uses pattern matching and machine learning to discover, classify, and protect data in S3 buckets without human intervention. Macie supports AWS Organizations, allowing central management of security policy and automatic enrolment of newly added accounts in the Macie service.

Concepts

  • Jobs are either one-time or scheduled. Their configuration specifies the set of S3 buckets to cover and a sampling depth (the percentage of eligible objects to analyse). Once created, jobs are immutable.
  • Custom data identifiers allow creating patterns (using regular expressions) to identify sensitive data specific to an organisation.
  • Data identifiers define the patterns Macie should attempt to identify. The platform maintains many managed identifiers, including those needed for compliance with common regulatory frameworks. Customers can define up to 100 of their own, 30 of which can be added to a given job at no additional charge.
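
As a rough local illustration of how a custom data identifier behaves, the sketch below pairs a regular expression with nearby keywords. It approximates the idea only and is not Macie's implementation. The `EMP-` pattern, the keyword list, and the 50-character window are all assumptions made for the example.

```python
import re

# Hypothetical identifier: employee IDs such as "EMP-123456".
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")
# Assumed keywords that must appear shortly before a match.
KEYWORDS = ("employee", "staff id")
# Assumed proximity window, in characters.
MAX_MATCH_DISTANCE = 50


def find_sensitive(text: str) -> list[str]:
    """Return pattern matches that have a keyword shortly before them."""
    hits = []
    lowered = text.lower()
    for match in EMPLOYEE_ID.finditer(text):
        start = max(0, match.start() - MAX_MATCH_DISTANCE)
        window = lowered[start:match.start()]
        if any(keyword in window for keyword in KEYWORDS):
            hits.append(match.group())
    return hits
```

Requiring a keyword near the match is one way to cut false positives: a bare `EMP-123456` in a shipping manifest is ignored, while the same string after "Employee record:" is reported.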

Job configuration

The scope of the job can be limited to objects matching a specific set of criteria including tags, modification time, file extension, and size.
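
The scoping criteria could be expressed roughly as follows when creating a job through the API. This is a sketch that only builds the request. The field names follow the Macie `CreateClassificationJob` request shape as recalled from the API reference and should be checked against it before use. The job name, tag, extensions, and the helper `build_job_request` are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def build_job_request(account_id, buckets, modified_after, max_bytes):
    """Build keyword arguments for a one-time, scoped classification job."""
    # Assumed timestamp format: ISO 8601 in UTC with a trailing "Z".
    modified = modified_after.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    scope_terms = [
        # File extension (values are hypothetical).
        {"simpleScopeTerm": {"key": "OBJECT_EXTENSION", "comparator": "EQ",
                             "values": ["csv", "json"]}},
        # Modification time.
        {"simpleScopeTerm": {"key": "OBJECT_LAST_MODIFIED_DATE", "comparator": "GT",
                             "values": [modified]}},
        # Object size in bytes.
        {"simpleScopeTerm": {"key": "OBJECT_SIZE", "comparator": "LT",
                             "values": [str(max_bytes)]}},
        # Object tag (key and value are hypothetical).
        {"tagScopeTerm": {"key": "TAG", "comparator": "EQ", "target": "S3_OBJECT",
                          "tagValues": [{"key": "classification", "value": "unreviewed"}]}},
    ]
    return {
        "clientToken": str(uuid.uuid4()),
        "jobType": "ONE_TIME",
        "name": "scoped-discovery-example",  # hypothetical job name
        "samplingPercentage": 100,
        "s3JobDefinition": {
            "bucketDefinitions": [{"accountId": account_id, "buckets": list(buckets)}],
            # Every term under "and" must match for an object to be included.
            "scoping": {"includes": {"and": scope_terms}},
        },
    }
```

With boto3 installed and credentials configured, the result would be passed on as `boto3.client("macie2").create_classification_job(**request)`.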

What's evaluated during discovery

Macie considers both configuration and usage patterns in its risk evaluations:

  • The S3 bucket's policies:
    • Whether public access is enabled.
    • Whether stored blobs are encrypted.
    • What sharing is configured with users outside of the account or owning organisation, including via replication.
  • Deep inspection of blob contents, for supported types.

Alerting

Findings can be browsed in the Macie console, where they can be filtered by finding type and grouped by bucket, type, or job. Findings are also published to Security Hub and to Amazon EventBridge (formerly CloudWatch Events).
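
The same kind of filtering is available programmatically. The sketch below builds filter criteria for the Macie `ListFindings` operation. The field names (`resourcesAffected.s3Bucket.name`, `severity.description`, `type`) are recalled from the API reference and should be verified there. The helper `build_finding_criteria` is hypothetical.

```python
def build_finding_criteria(bucket_name, severities=("High",), finding_type=None):
    """Build a findingCriteria structure filtering by bucket, severity and type."""
    criterion = {
        # Only findings that affect this bucket.
        "resourcesAffected.s3Bucket.name": {"eq": [bucket_name]},
        # Only findings with one of these severity labels.
        "severity.description": {"eq": list(severities)},
    }
    if finding_type is not None:
        # Optionally narrow to one finding type.
        criterion["type"] = {"eq": [finding_type]}
    return {"criterion": criterion}
```

With boto3 available, the criteria would be used as `boto3.client("macie2").list_findings(findingCriteria=criteria)`, which returns finding IDs. The details can then be fetched with `get_findings`.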

Automated responses can be implemented using a compute service (e.g. Lambda) and the Macie API.
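
A minimal sketch of such a response, written as a Lambda handler that receives a Macie finding as an event. The event shape (`source`, `detail.severity.description`, `detail.resourcesAffected`) is an assumption based on the published finding event format. The action names `quarantine`, `log`, and `ignore` are hypothetical, and the handler only decides what to do rather than calling any AWS API.

```python
# Severity labels that should trigger a response (an assumed policy).
HIGH_SEVERITY = {"High"}


def lambda_handler(event, context):
    """Decide how to respond to a Macie finding delivered as an event."""
    if event.get("source") != "aws.macie":
        return {"action": "ignore", "reason": "not a Macie event"}

    detail = event.get("detail", {})
    severity = detail.get("severity", {}).get("description")
    resources = detail.get("resourcesAffected", {})
    bucket = resources.get("s3Bucket", {}).get("name")
    key = resources.get("s3Object", {}).get("key")

    if severity in HIGH_SEVERITY and bucket:
        # A real handler might tag the object or tighten bucket access here.
        return {"action": "quarantine", "bucket": bucket, "key": key}
    # Everything else is only recorded.
    return {"action": "log", "bucket": bucket, "key": key}
```

Keeping the decision separate from the side effects makes the policy easy to test locally before wiring it to real remediation calls.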

A superset of the finding data (the detailed sensitive data discovery results) can also be written to an S3 bucket.

Macie Classic

Macie Classic was delivered using an external console, and is no longer available to new users.

The service had a number of limitations:

  • It processed only the first 20 MB of each blob.
  • There was no support for custom data identifiers, limiting users to just the identifiers available as part of the service.
  • CloudFormation couldn't be used to manage deployments.
  • Deployments were limited to just two regions as the service wasn't deployed elsewhere.

Anomaly detection via CloudTrail events has been moved to GuardDuty. The data required to operate the new service is fetched entirely from S3, removing the dependency on CloudTrail.

