Batch
AWS Batch is a batch computing platform built on ECS (though it can also schedule jobs on EKS). It's designed to optimise for both time and cost, and supports Spot Instances to reduce costs further.
Use cases
Batch is well-suited to:
- Machine learning.
- Post-trade analytics.
- High-performance computing.
- Rendering, transcoding, or other long-running file format conversion.
Concepts
- The Scheduler evaluates submitted jobs against the compute environments assigned to their queues, and decides when and where each job runs. Scheduling can either place one container per instance or bin-pack multiple containers onto an instance.
- Jobs are containerised workloads, run in approximately the order they were submitted. They can depend on one another, including on the successful completion of specific child jobs within an array job.
- Job Queues hold jobs until they're scheduled onto a compute resource, and retain them for 24 hours afterwards.
- Compute Environments come in two forms:
  - Managed environments are defined by business requirements such as budget or filesystem, and are launched and scaled by the Batch platform. Their instances can be Spot Instances allocated against a bid.
  - Unmanaged environments let the customer launch and manage their own resources, which must run the ECS agent.
- Job Definitions allow templating jobs to reduce duplication of job properties. Their properties can be overridden at job submission time.
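Job definitions and submit-time overrides can be sketched as request payloads. This is a minimal sketch, assuming the dicts are passed as keyword arguments to boto3's Batch client (`register_job_definition` and `submit_job`); nothing here calls AWS, and the names `render-frame`, `render-queue` and the image URI are hypothetical placeholders.

```python
"""Sketch: a job definition template plus a job that overrides part of it.
No AWS calls are made; the dicts mirror RegisterJobDefinition / SubmitJob
request fields. All names and the image URI are hypothetical."""

from typing import Any, Dict, List, Optional


def job_definition_request(name: str, image: str) -> Dict[str, Any]:
    """Properties shared by every job submitted against this definition."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image,
            "command": ["python", "render.py", "--quality", "draft"],
            "resourceRequirements": [
                {"type": "VCPU", "value": "2"},
                {"type": "MEMORY", "value": "4096"},  # MiB
            ],
        },
    }


def submit_job_request(job_name: str, queue: str, definition: str,
                       command: Optional[List[str]] = None) -> Dict[str, Any]:
    """Submit a job from the template, optionally overriding its command."""
    request: Dict[str, Any] = {
        "jobName": job_name,
        "jobQueue": queue,
        "jobDefinition": definition,
    }
    if command is not None:
        # Only overridden fields are sent; everything else comes from the definition.
        request["containerOverrides"] = {"command": command}
    return request


if __name__ == "__main__":
    definition = job_definition_request("render-frame", "my-registry/render:latest")
    job = submit_job_request("render-final", "render-queue", "render-frame",
                             command=["python", "render.py", "--quality", "final"])
    print(definition["jobDefinitionName"], job["containerOverrides"]["command"])
    # With AWS credentials configured, these could be sent with:
    #   import boto3
    #   batch = boto3.client("batch")
    #   batch.register_job_definition(**definition)
    #   batch.submit_job(**job)
```

Keeping the override optional means most submissions stay a three-field request, with the template carrying image and resource settings.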
Array jobs
Array jobs schedule many child jobs from a single job specification, typically one per element of an array of input. Up to 10,000 child jobs can be submitted this way.
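Both sides of an array job fit in a short sketch. The submitter requests a size through `arrayProperties`, and each child reads its own 0-based index from the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable that Batch sets, using it to pick one input. The input file names, queue and definition are hypothetical, and the index is set by hand to simulate one child locally.

```python
"""Sketch of an array job: one SubmitJob request for N children, and the
per-child logic that selects an input by index. No AWS calls are made;
file, queue and definition names are hypothetical."""

import os
from typing import Any, Dict, List


def array_job_request(job_name: str, queue: str, definition: str,
                      size: int) -> Dict[str, Any]:
    # AWS Batch accepts array sizes from 2 to 10,000.
    if not 2 <= size <= 10_000:
        raise ValueError("array size must be between 2 and 10,000")
    return {
        "jobName": job_name,
        "jobQueue": queue,
        "jobDefinition": definition,
        "arrayProperties": {"size": size},
    }


def input_for_this_child(inputs: List[str]) -> str:
    """Runs inside a child container: select the input matching our index."""
    index = int(os.environ["AWS_BATCH_JOB_ARRAY_INDEX"])
    return inputs[index]


if __name__ == "__main__":
    files = ["scene-000.blend", "scene-001.blend", "scene-002.blend"]
    print(array_job_request("render-scenes", "render-queue", "render-frame", len(files)))
    os.environ["AWS_BATCH_JOB_ARRAY_INDEX"] = "1"  # simulate child 1 locally
    print(input_for_this_child(files))
```

Passing only a size keeps the request small; the mapping from index to input lives in the container, so the input list can be large without bloating the submission.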
Dependencies
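- Straight: `dependsOn` with a `jobId`.
- 1:1 (`N_TO_N`): each child of an array job waits for the child with the same index in another array job.
- Sequential (`SEQUENTIAL`): the children of an array job run one after another, starting at index 0.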
- End-to-end.
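The straight, 1:1 and sequential dependency types can be sketched as a hypothetical pipeline expressed only as SubmitJob request dicts. It assumes the 1:1 case maps to the `N_TO_N` type (with both arrays the same size) and that `SEQUENTIAL` is given without a `jobId`. The end-to-end case isn't modelled. Job IDs are placeholders for the `jobId` field each real `submit_job` response returns.

```python
"""Hypothetical pipeline built as SubmitJob request dicts (no AWS calls).
Names, queue, definitions and job IDs are placeholders."""

from typing import Any, Dict, List, Optional


def on_job(job_id: str) -> Dict[str, str]:
    # Straight dependency: wait for the whole upstream job (all children if it's an array job).
    return {"jobId": job_id}


def n_to_n(array_job_id: str) -> Dict[str, str]:
    # 1:1 dependency: child i waits only for child i of the upstream array job.
    return {"jobId": array_job_id, "type": "N_TO_N"}


def sequential() -> Dict[str, str]:
    # Given without a jobId on an array job: its children run in index order from 0.
    return {"type": "SEQUENTIAL"}


def submit_request(name: str, queue: str, definition: str,
                   depends_on: Optional[List[Dict[str, str]]] = None,
                   array_size: Optional[int] = None) -> Dict[str, Any]:
    request: Dict[str, Any] = {"jobName": name, "jobQueue": queue, "jobDefinition": definition}
    if depends_on:
        request["dependsOn"] = depends_on
    if array_size is not None:
        request["arrayProperties"] = {"size": array_size}
    return request


if __name__ == "__main__":
    # Real IDs would come from the "jobId" field of each submit_job response.
    split_id, render_id = "split-job-id", "render-job-id"
    pipeline = [
        submit_request("split", "render-queue", "split-def", array_size=100),
        submit_request("render", "render-queue", "render-def",
                       depends_on=[n_to_n(split_id)], array_size=100),
        submit_request("merge", "render-queue", "merge-def",
                       depends_on=[on_job(render_id)]),
        submit_request("ordered-steps", "render-queue", "step-def",
                       depends_on=[sequential()], array_size=5),
    ]
    for req in pipeline:
        print(req["jobName"], req.get("dependsOn", []))
```

The `merge` step shows a fan-in: a plain dependency on an array job waits for every one of its children.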
Job states
- `SUBMITTED` means the job has been accepted into the queue but not yet evaluated for execution.
- `PENDING` indicates the job is waiting for its dependencies to complete.
- `RUNNABLE` means the job has been evaluated by the scheduler, has no outstanding dependencies, and is ready to be scheduled onto a host.
- `STARTING` means the job has been scheduled onto a host and its container is being initiated.
- `RUNNING` indicates that the job's execution is in progress.
- `SUCCEEDED` indicates that the job completed successfully (exit code 0).
- `FAILED` indicates the job has failed all of its available attempts, for example by exiting with a non-zero code.
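The lifecycle above can be read as a state machine. This is a simplified model for illustration, not the service's definitive transition table: it assumes any non-terminal job may end in `FAILED` (for example, a failed dependency or a cancellation), and it leaves out retries, which would send a failed attempt back to `RUNNABLE`.

```python
"""Simplified model of the AWS Batch job lifecycle. The transition table is
an assumption for illustration: retries are omitted, and any non-terminal
state is allowed to move to FAILED."""

from enum import Enum
from typing import Dict, FrozenSet, List


class JobState(Enum):
    SUBMITTED = "SUBMITTED"
    PENDING = "PENDING"
    RUNNABLE = "RUNNABLE"
    STARTING = "STARTING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


S = JobState
NEXT: Dict[JobState, FrozenSet[JobState]] = {
    # PENDING only if the job has unmet dependencies; otherwise straight to RUNNABLE.
    S.SUBMITTED: frozenset({S.PENDING, S.RUNNABLE, S.FAILED}),
    S.PENDING: frozenset({S.RUNNABLE, S.FAILED}),
    S.RUNNABLE: frozenset({S.STARTING, S.FAILED}),
    S.STARTING: frozenset({S.RUNNING, S.FAILED}),
    S.RUNNING: frozenset({S.SUCCEEDED, S.FAILED}),
    S.SUCCEEDED: frozenset(),  # terminal
    S.FAILED: frozenset(),     # terminal
}


def is_terminal(state: JobState) -> bool:
    """A state with no outgoing transitions is terminal."""
    return not NEXT[state]


def is_valid_history(history: List[JobState]) -> bool:
    """True if history starts at SUBMITTED and every step is an allowed transition."""
    if not history or history[0] is not S.SUBMITTED:
        return False
    return all(b in NEXT[a] for a, b in zip(history, history[1:]))


if __name__ == "__main__":
    happy = [S.SUBMITTED, S.RUNNABLE, S.STARTING, S.RUNNING, S.SUCCEEDED]
    print(is_valid_history(happy), is_terminal(happy[-1]))
```

A check like `is_terminal` is the condition a polling loop would use to stop waiting on a job.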