Engine version upgrades

Minor version upgrades can be performed automatically by the platform in the next maintenance window. Major upgrades, which may require migration of data, require manual intervention and are queued for the next maintenance window unless applied immediately.

Preparation

Define some shell variables we'll use in a moment:

region=us-east-1
instance_class=db.t2.micro
engine=postgresql
engine_version_current=9.6.1
engine_version_desired=10.4

Check that it's possible to upgrade from the current version to the desired version:

aws rds describe-db-engine-versions \
    --region $region \
    --engine $engine \
    --engine-version $engine_version_current | \
  jq -e "any(.DBEngineVersions[] | select(.Engine == '$engine').ValidUpgradeTarget[].EngineVersion; . == '$engine_version_desired'"

And that the instance class meets the minimum requirements for the desired version:

aws rds describe-orderable-db-instance-options \
    --region $region \
    --db-instance-class $instance_class \
    --engine $engine \
    --engine-version $engine_version_desired | \
  jq -e 'any(.OrderableDBInstanceOptions)'

Testing connectivity

When timing the upgrade to understand impact...

PostgreSQL:

while true; do
  date
  PGPASSWORD=master-password pg_isready -h endpoint -U master-username
done

Approaches

In-place

In-place upgrades require:

  1. Creating a new parameter group against the target engine version. Customised parameters can be identified and copied over using the parameter comparison in the Console, or by diffing DescribeDBParameters output.
  2. Snapshotting the instance, which may be skipped if you've configured the backup retention period correctly as AWS will take an automatic snapshot before performing the upgrade. Belt and braces, and all that. Be sure to wait for its completion.
  3. Kicking off the upgrade with ModifyDBInstance.
  4. If keeping resource names the same is desired (e.g. for use with Terraform), recreating the parameter group currently using the previous engine family against the new engine family, ,modifying the instance to use it, rebooting the instance to complete applying parameters, and then deleting the intermediate parameter group.

Instance migration

Such migrations typically require preventing applications from writing to the database to data loss during cutover to the new instance, and such this approach will increase the length of the maintenance window but simplify rollback, if it's necessary.

  1. If necessary, stop the application or prevent its connections.
  2. Snapshot the database instance, and await completion.
  3. Restore a new database instance from the snapshot and await completion.
  4. Upgrade the newly restored database instance and await completion.
  5. Rename the current instance to free up the name, rename the new instance to the original instance's name; or change connection strings.
  6. Test!
  7. If necessary, restart the application or allow connections.

Logical replication

Logical replication allows replicating transactions applied to the "old" engine version as the queries executed against the primary are also executed on secondaries. This creates an increased risk of drift due to data type differences, sequence differences, schema changes, etc., but greatly reduces the size of the required maintenance window to just the cutover period.

You also need to ensure that a replication identity is available for all schema items to be replicated. This will usually be the primary key, but CDC may be used in lieu of a PK (ALTER TABLE x REPLICATION IDENTITY FULL).

Broadly, the steps are:

  1. Extract the schema from the current instance.
  2. Apply the schema to the new instance.
  3. Manually copy over schema from existing DB to new DB.
  4. Create a replication slot/publisher on the existing node.
  5. Create a replication subscription on the new node.
  6. Allow the replication to catch up (keep an eye on replication counts; it'll take a while).
  7. During the maintenance window, stop the application or prevent its connections if necessary.
  8. Drop the subscription on the new node.
  9. Update any connection strings to point to the new instance.
  10. If necessary, restart the application or allow connections.
  11. Decommission the old instance.

References