Linux

Encrypting the OS disk

During encryption, Azure requires the VM to have 2GB of free memory plus enough additional memory to hold the data currently stored on the OS disk. This is because Azure copies all of the data from the OS disk into a RAM disk and switches to that volume, which lets it write to the original OS disk without the platform having to provision an additional disk image. Because of this design choice, a VM undergoing disk encryption will almost certainly lose data if it is interrupted before completion; please heed Microsoft's warnings to limit access to systems before commencing encryption.
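
As a rough pre-flight check, the memory requirement can be estimated on the VM before enabling encryption. This is a minimal sketch, not part of any Azure tooling: the function names are hypothetical, it assumes the requirement is 2 GiB plus the used space on `/`, and it assumes `MemAvailable` in /proc/meminfo is a fair proxy for "free memory".

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of the Azure agent.

# Memory needed in KiB: used space on the OS disk ($1, KiB) plus 2 GiB.
ade_required_kb() {
    echo $(( $1 + 2 * 1024 * 1024 ))
}

# Succeeds when available memory ($2, KiB) covers the requirement
# for the given used disk space ($1, KiB).
ade_mem_ok() {
    [ "$2" -ge "$(ade_required_kb "$1")" ]
}

# Gather both figures from the running system and report the result.
ade_check_host() {
    used_kb=$(df -Pk / | awk 'NR == 2 { print $3 }')
    avail_kb=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
    if ade_mem_ok "$used_kb" "$avail_kb"; then
        echo "ok: ${avail_kb} KiB available, $(ade_required_kb "$used_kb") KiB needed"
    else
        echo "insufficient: ${avail_kb} KiB available, $(ade_required_kb "$used_kb") KiB needed"
        return 1
    fi
}
```

Calling `ade_check_host` on the VM prints the comparison; a non-zero return suggests resizing the VM or freeing disk space first.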

The OS disk is encrypted using LUKS, not plain mode (the crypttab entry and header dump below both show this). To obtain the key for decrypting the root filesystem from the initramfs, Azure ships /usr/sbin/azure_crypt_key.sh (/lib/cryptsetup/scripts/azure_crypt_key.sh on the initramfs), which finds and mounts the /mnt/azure_bek_disk volume and prints the key. This is configured as the keyscript for the osencrypt volume entry in /etc/crypttab:

osencrypt /dev/sda1 luks,discard,header=/boot/luks/osluksheader,keyscript=/usr/sbin/azure_crypt_key.sh

Also of note is that the LUKS header for the OS disk is kept in /boot/luks/osluksheader rather than at the beginning of the partition. The reason for this is unclear, but in practice it just means that cryptsetup commands must include the --header option:

$ sudo cryptsetup luksDump --header /boot/luks/osluksheader osencrypt
LUKS header information for /boot/luks/osluksheader

Version:        1
Cipher name:    aes
Cipher mode:    xts-plain64
Hash spec:      sha256
Payload offset: 0
MK bits:        256
MK digest:      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
MK salt:        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
                00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
MK iterations:  68624
UUID:           00000000-0000-0000-0000-000000000000

Key Slot 0: ENABLED
        Iterations:             1097984
        Salt:                   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
                                00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        Key material offset:    8
        AF stripes:             4000
Key Slot 1: DISABLED
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED
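
The options in the osencrypt crypttab entry can be pulled out with a few lines of awk, for example to reuse the header path in later cryptsetup commands. This is a sketch: `crypttab_opt` is a hypothetical helper, and it assumes the option list is the last whitespace-separated field of the entry.

```shell
# Print the value of option $2 from crypttab line $1 (hypothetical helper).
# Flags without a value, such as "luks", print "yes".
crypttab_opt() {
    printf '%s\n' "$1" | awk -v want="$2" '{
        n = split($NF, opts, ",")
        for (i = 1; i <= n; i++) {
            eq = index(opts[i], "=")
            key = eq ? substr(opts[i], 1, eq - 1) : opts[i]
            val = eq ? substr(opts[i], eq + 1) : "yes"
            if (key == want) { print val; exit }
        }
    }'
}

entry='osencrypt /dev/sda1 luks,discard,header=/boot/luks/osluksheader,keyscript=/usr/sbin/azure_crypt_key.sh'
crypttab_opt "$entry" header
crypttab_opt "$entry" keyscript
```

On a live system the entry could come from `grep '^osencrypt' /etc/crypttab` instead of the literal string.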

Encrypting data disks

Data disks receive randomly generated UUIDs as device mapper names.
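
To tell data disk mappings apart from osencrypt, the names under /dev/mapper can be filtered for UUID-shaped entries. This is a minimal sketch with hypothetical helper names; it assumes the names use the usual 8-4-4-4-12 hexadecimal form.

```shell
# Succeeds when $1 looks like a UUID in the usual 8-4-4-4-12 hex form.
is_uuid_name() {
    printf '%s\n' "$1" |
        grep -Eqi '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Read mapping names on stdin, one per line, and print the UUID-shaped ones.
uuid_mappings() {
    while IFS= read -r name; do
        if is_uuid_name "$name"; then printf '%s\n' "$name"; fi
    done
}
```

For example, `ls /dev/mapper | uuid_mappings` would list the candidate data disk mappings.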

Status messages

The extension provides two values in its detailed status:

  • "os" contains the status of the OS disk.
  • "data" contains the status of all of the attached data disks.

Each key will contain one of the following values:

  • "EncryptionInProgress" indicates that the extension is currently performing the initial encryption.
  • "Encrypted" indicates that the extension has performed the initial encryption of the appropriate volumes.
  • "NotMounted" indicates that data disk encryption was enabled but no mountpoints for the disks could be found. This should be considered a failure.
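
When scripting against these values, it can help to turn them into exit codes so that "NotMounted" is treated as a failure. This is a sketch; `ade_status_rc` is a hypothetical helper, and any value not listed above falls through to a separate code rather than being guessed at.

```shell
# Map an extension status value to an exit code (hypothetical helper):
#   0 = encrypted, 1 = failure, 2 = still in progress, 3 = unrecognised value
ade_status_rc() {
    case "$1" in
        Encrypted)            return 0 ;;
        NotMounted)           return 1 ;;
        EncryptionInProgress) return 2 ;;
        *)                    return 3 ;;
    esac
}
```

A caller might write `ade_status_rc "$data_status" || echo "data disks need attention"`, where `$data_status` holds the value of the "data" key.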

Recovering encrypted disks from the initramfs shell

If a machine fails to boot because it can't find its root filesystem, it'll drop you into a BusyBox shell. This shell is quite feature-limited and is intended to provide just enough tooling to nurse the machine back to health. If this happens, you'll see:

Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... Trying to get the key from disks ...
> Trying device: sda ...
Success loading keyfile!
[   14.175943] NET: Registered protocol family 38
cryptsetup (osencrypt): unknown fstype, bad password or options?
Trying to get the key from disks ...
> Trying device: sda ...
Success loading keyfile!
cryptsetup (osencrypt): unknown fstype, bad password or options?
Trying to get the key from disks ...
> Trying device: sda ...
Success loading keyfile!
cryptsetup (osencrypt): unknown fstype, bad password or options?
cryptsetup (osencrypt): maximum number of tries exceeded
cryptsetup: going to sleep for 60 seconds...
done.
Begin: Running /scripts/local-premount ... [   82.809403] Btrfs loaded, crc32c=crc32c-intel
Scanning for Btrfs filesystems
[   82.891681] blk_update_request: I/O error, dev fd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   82.902586] floppy: error 10 while reading block 0
done.
Begin: Waiting for root file system ... Begin: Running /scripts/local-block ... Trying to get the key from disks ...
> Trying device: sda ...
Success loading keyfile!
cryptsetup (osencrypt): unknown fstype, bad password or options?
Trying to get the key from disks ...
> Trying device: sda ...
Success loading keyfile!
cryptsetup (osencrypt): unknown fstype, bad password or options?
Trying to get the key from disks ...
> Trying device: sda ...
Success loading keyfile!
cryptsetup (osencrypt): unknown fstype, bad password or options?
cryptsetup (osencrypt): maximum number of tries exceeded
cryptsetup: going to sleep for 60 seconds...
[  456.427728] blk_update_request: I/O error, dev fd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  456.442101] floppy: error 10 while reading block 0
mdadm: No arrays found in config file or automatically
done.
[  456.503735] blk_update_request: I/O error, dev fd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  456.519904] floppy: error 10 while reading block 0
mdadm: No arrays found in config file or automatically
[snipped lots of repetition]
Gave up waiting for root file system device.  Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT!  /dev/mapper/osencrypt does not exist.  Dropping to a shell!


BusyBox v1.27.2 (Ubuntu 1:1.27.2-2ubuntu3.2) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)

At this point, we need to make the root device (/dev/mapper/osencrypt) available, then exit the shell to enable the system to continue booting.

We first need to obtain the key:

$ /lib/cryptsetup/scripts/azure_crypt_key.sh
Trying to get the key from disks ...
> Trying device: sda ...
[snipped key]Success loading keyfile!

The debugging information is printed to stderr; to suppress it, redirect stderr to the null device:

$ /lib/cryptsetup/scripts/azure_crypt_key.sh 2>/dev/null
[snipped key]

The one-liner to attempt to recover is as follows:

/lib/cryptsetup/scripts/azure_crypt_key.sh | cryptsetup luksOpen --key-file - --header /boot/luks/osluksheader /dev/disk/azure/root-part1 osencrypt && exit

If this happens on every boot, it's likely that the agent has incorrectly assumed that /dev/sda1 is the root device; apply the `crypttab` workaround (see the "Root file system not on sda1" child page).
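
The recovery one-liner can also be split into explicit steps so that a missing header, device, or key script is reported before cryptsetup runs. This is a sketch for the initramfs shell: `recover_osencrypt` is a hypothetical helper, and its defaults are simply the paths used in the one-liner above.

```shell
# Recovery one-liner with pre-flight checks (hypothetical helper).
# Optional arguments: header path, root device, key script.
recover_osencrypt() {
    header=${1:-/boot/luks/osluksheader}
    device=${2:-/dev/disk/azure/root-part1}
    keyscript=${3:-/lib/cryptsetup/scripts/azure_crypt_key.sh}

    [ -f "$header" ]    || { echo "header not found: $header" >&2; return 1; }
    [ -b "$device" ]    || { echo "not a block device: $device" >&2; return 2; }
    [ -x "$keyscript" ] || { echo "key script not executable: $keyscript" >&2; return 3; }

    # The key goes to stdout; the script's progress messages go to stderr.
    "$keyscript" 2>/dev/null |
        cryptsetup luksOpen --key-file - --header "$header" "$device" osencrypt
}
```

After defining the function, `recover_osencrypt && exit` behaves like the one-liner; passing a different device as the second argument is one way to test whether the root device was guessed wrongly.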

Enabling from the Portal

From an Azure VM, select the Disks blade and click on the Encryption menu item to see options for configuring the disk encryption extensions in the UI. You can then query the settings it generates, e.g. for use in Terraform states, using either the `az` CLI:

$ az vm extension show --resource-group my-rg --vm-name my-vm --name AzureDiskEncryptionForLinux
{
  "autoUpgradeMinorVersion": false,
  "forceUpdateTag": null,
  "id": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg/providers/Microsoft.Compute/virtualMachines/my-vm/extensions/AzureDiskEncryptionForLinux",
  "instanceView": null,
  "location": "ukwest",
  "name": "AzureDiskEncryptionForLinux",
  "protectedSettings": null,
  "provisioningState": "Succeeded",
  "publisher": "Microsoft.Azure.Security",
  "resourceGroup": "my-rg",
  "settings": {
    "EncryptionOperation": "EnableEncryption",
    "KekVaultResourceId": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg/providers/Microsoft.KeyVault/vaults/my-vault",
    "KeyEncryptionAlgorithm": "RSA-OAEP",
    "KeyEncryptionKeyURL": "https://my-vault.vault.azure.net/keys/my-kek/00000000000000000000000000000000",
    "KeyVaultResourceId": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg/providers/Microsoft.KeyVault/vaults/my-vault",
    "KeyVaultURL": "https://my-vault.vault.azure.net/",
    "VolumeType": "All"
  },
  "tags": {},
  "type": "Microsoft.Compute/virtualMachines/extensions",
  "typeHandlerVersion": "1.1",
  "virtualMachineExtensionType": "AzureDiskEncryptionForLinux"
}

Or the `Az` PowerShell module:

> Get-AzVMExtension -ResourceGroupName my-rg -VMName simple -Name AzureDiskEncryptionForLinux

ResourceGroupName       : my-rg
VMName                  : simple
Name                    : AzureDiskEncryptionForLinux
Location                : ukwest
Etag                    : {}
Publisher               : Microsoft.Azure.Security
ExtensionType           : AzureDiskEncryptionForLinux
TypeHandlerVersion      : 1.1
Id                      : /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg/providers/Microsoft.Compute/virtualMachines/simple/extensions/AzureDiskEncryptionForLinux
PublicSettings          : {
                            "EncryptionOperation": "EnableEncryption",
                            "KekVaultResourceId": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg/providers/Microsoft.KeyVault/vaults/my-vault",
                            "KeyEncryptionAlgorithm": "RSA-OAEP",
                            "KeyEncryptionKeyURL": "https://my-vault.vault.azure.net/keys/my-kek/00000000000000000000000000000000",
                            "KeyVaultResourceId": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg/providers/Microsoft.KeyVault/vaults/my-vault",
                            "KeyVaultURL": "https://my-vault.vault.azure.net/",
                            "VolumeType": "All"
                          }
ProtectedSettings       :
ProvisioningState       : Succeeded
Statuses                :
SubStatuses             :
AutoUpgradeMinorVersion : False
ForceUpdateTag          :
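
If only the settings object is needed, the `az` CLI's global `--query` argument (a JMESPath expression) can trim the output down to it. This is a minimal sketch; the wrapper name `ade_settings` is hypothetical and the resource names are the placeholders used above.

```shell
# Print only the "settings" object of the disk encryption extension.
# Arguments: resource group, VM name. Requires a logged-in az CLI.
ade_settings() {
    az vm extension show \
        --resource-group "$1" \
        --vm-name "$2" \
        --name AzureDiskEncryptionForLinux \
        --query settings \
        --output json
}
```

For example, `ade_settings my-rg my-vm > settings.json` would save the settings for comparison against what the Portal generated.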

Troubleshooting installation

Attempt to unmount /oldroot failed

If you see errors like the following:

  • Attempt to unmount /oldroot failed with error
  • Command umount /oldroot failed with return code 32
  • umount: /oldroot: target is busy.

It likely means there are filesystems mounted below /oldroot that the encryption agent can't unmount. See below for sample output:

[AzureDiskEncryption] 6197: [Info] Attempt #6 to unmount /oldroot failed with error: Command umount /oldroot failed with return code 32
stdout:
stderr:
umount: /oldroot: target is busy.
, stack trace: Traceback (most recent call last):
  File "/var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-1.1.0.46/main/oscrypto/ubuntu_1604/Ubuntu1604EncryptionStateMachine.py", line 170, in start_encryption
    self.retry_unmount_oldroot()
  File "/var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-1.1.0.46/transitions/transitions/core.py", line 222, in trigger
    return self.machine.process(f)
  File "/var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-1.1.0.46/transitions/transitions/core.py", line 526, in process
    return trigger()
  File "/var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-1.1.0.46/transitions/transitions/core.py", line 247, in _trigger
    if t.execute(event):
  File "/var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-1.1.0.46/transitions/transitions/core.py", line 145, in execute
    machine.callback(func, event_data)
  File "/var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-1.1.0.46/transitions/transitions/core.py", line 518, in callback
    func(*event_data.args, **event_data.kwargs)
  File "/var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-1.1.0.46/main/oscrypto/ubuntu_1604/Ubuntu1604EncryptionStateMachine.py", line 114, in on_enter_state
    super(Ubuntu1604EncryptionStateMachine, self).on_enter_state()
  File "/var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-1.1.0.46/main/oscrypto/OSEncryptionStateMachine.py", line 65, in on_enter_state
    self.state_objs[self.state].enter()
  File "/var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-1.1.0.46/main/oscrypto/ubuntu_1604/encryptstates/UnmountOldrootState.py", line 130, in enter
    self.command_executor.Execute('umount /oldroot', True)
  File "/var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-1.1.0.46/main/CommandExecutor.py", line 84, in Execute
    raise Exception(msg)
Exception: Command umount /oldroot failed with return code 32
stdout:
stderr:
umount: /oldroot: target is busy.

To troubleshoot, we need a shell session on the machine, and we don't want the extension trying to modify the machine while we work. Uninstall the VM extension and restart the VM, then SSH in.

First, check the list of mounted filesystems. A minimal set should look like the following:

$ mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=32898384k,nr_inodes=8224596,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=6584172k,mode=755)
/dev/sdb1 on / type ext4 (rw,relatime,discard)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)
none on /tmp/tmproot type tmpfs (rw,relatime)
tmpfs on /run/user/1007 type tmpfs (rw,nosuid,nodev,relatime,size=6584172k,mode=700,uid=1007,gid=1007)

If you see paths like /snap/<name>/<number> you'll need to completely remove these in order to proceed. First, list the installed snaps:

sudo snap list

You'll be able to remove each of these except core, which we'll handle in a moment:

sudo snap remove <name>

Finally, uninstall Snap itself:

sudo apt purge snapd
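
To confirm that nothing is left mounted under /snap (or, during encryption, under /oldroot), the `mount` output can be filtered by mount point. This is a sketch; `mounts_under` is a hypothetical helper and it assumes mount points contain no spaces.

```shell
# Read `mount` output on stdin and print mount points at or below prefix $1.
mounts_under() {
    awk -v p="$1" '{
        mp = $3    # line format: "<device> on <mountpoint> type <fs> (<options>)"
        if (mp == p || index(mp, p "/") == 1) print mp
    }'
}
```

For example, `mount | mounts_under /snap` should print nothing once snapd has been purged.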

No module named parted

It seems that if APT is busy when the disk encryption process starts, the agent can fail to install its dependencies. You'll see a crash just before it exits:

[AzureDiskEncryption] 3240: [Error] Failed to enable the extension with error: No module named parted, stack trace: Traceback (most recent call last):
  File "/var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-1.1.0.46/main/handle.py", line 2072, in daemon
    daemon_encrypt()
  File "/var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-1.1.0.46/main/handle.py", line 1803, in daemon_encrypt
    from oscrypto.ubuntu_1604 import Ubuntu1604EncryptionStateMachine
  File "/var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-1.1.0.46/main/oscrypto/ubuntu_1604/__init__.py", line 16, in <module>
    from Ubuntu1604EncryptionStateMachine import *
  File "/var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-1.1.0.46/main/oscrypto/ubuntu_1604/Ubuntu1604EncryptionStateMachine.py", line 35, in <module>
    from encryptstates import *
  File "/var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-1.1.0.46/main/oscrypto/ubuntu_1604/encryptstates/__init__.py", line 33, in <module>
    from SplitRootPartitionState import *
  File "/var/lib/waagent/Microsoft.Azure.Security.AzureDiskEncryptionForLinux-1.1.0.46/main/oscrypto/ubuntu_1604/encryptstates/SplitRootPartitionState.py", line 26, in <module>
    import parted
ImportError: No module named parted

Toward the beginning of the log, you'll see that the dependency installation failed:

[AzureDiskEncryption] 8448: [Info] Installing pre-requisites
[AzureDiskEncryption] 8448: [Info] Executing: apt-get update
[AzureDiskEncryption] 8448: [Info] Executing: apt-get install -y at cryptsetup-bin lsscsi python-parted python-six procps psmisc
[AzureDiskEncryption] 8448: [Info] Command apt-get install -y at cryptsetup-bin lsscsi python-parted python-six procps psmisc failed with return code 100
stdout:
Reading package lists...
Building dependency tree...
Reading state information...
at is already the newest version (3.1.20-3.1ubuntu2).
lsscsi is already the newest version (0.28-0.1).
python-six is already the newest version (1.11.0-2).
python-parted is already the newest version (3.11.1-1ubuntu2).
cryptsetup-bin is already the newest version (2:2.0.2-1ubuntu1.1).
procps is already the newest version (2:3.3.12-3ubuntu1.2).
psmisc is already the newest version (23.1-1ubuntu0.1).
The following packages were automatically installed and are no longer required:
  linux-azure-cloud-tools-5.0.0-1020 linux-azure-cloud-tools-5.0.0-1022
  linux-azure-cloud-tools-5.0.0-1023 linux-azure-cloud-tools-5.0.0-1025
  linux-azure-headers-5.0.0-1020 linux-azure-headers-5.0.0-1022
  linux-azure-headers-5.0.0-1023 linux-azure-headers-5.0.0-1025
  linux-azure-tools-5.0.0-1020 linux-azure-tools-5.0.0-1022
  linux-azure-tools-5.0.0-1023 linux-azure-tools-5.0.0-1025
Use 'apt autoremove' to remove them.

stderr:
E: Archives directory /var/cache/apt/archives/partial is missing. - Acquire (2: No such file or directory)

We'll help it along by manually installing the dependencies:

sudo apt-get install -y at cryptsetup-bin lsscsi python-parted python-six procps psmisc
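
The stderr output in the log above points at a missing /var/cache/apt/archives/partial directory. If the manual install fails the same way, recreating that directory before retrying may help. This is a sketch with hypothetical helper names; the `python2` interpreter name is an assumption based on the Python 2 style imports in the traceback.

```shell
# Create directory $1 if it is missing (default: APT's partial archives dir).
ensure_apt_partial() {
    dir=${1:-/var/cache/apt/archives/partial}
    [ -d "$dir" ] || mkdir -p "$dir"
}

# Succeeds when the parted module can be imported by interpreter $1.
parted_importable() {
    "${1:-python2}" -c 'import parted' 2>/dev/null
}
```

Run `ensure_apt_partial` as root, repeat the apt-get install, then use `parted_importable` to check that the module the agent failed on is now available.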

Children
  1. No space left on device
  2. Root file system not on sda1