Lessons Learned From Using Azure Versioning and Soft-Delete

Recently we experienced a blobs-deletion incident in Managed Service. More specifically, blob entries associated with active databases were erroneously selected for deletion, assuming they were corresponding to dropped databases.

Lessons Learned From Using Azure Versioning and Soft-Delete

During the recovery process of the deleted blob entries we encountered an unexpected behavior for Azure containers not being able to restore the versioned blob entries that were deleted. The way we configured our Azure containers was by setting the Blob Versioning policy for seven days, meaning Azure containers were keeping the older versions of the blob entries for as long as seven days.

We were expecting this behavior would allow us to recover the accidentally deleted blob entries — but this wasn’t the case. In this document, we tested and verified all the options around Soft-Delete and Versioning provided by Azure ensuring we have reached the correct configuration of our bottomless and backup buckets — without risking any data loss when experiencing incidents that cause accidental blob deletions.

versioningVersioning

This is what a standard configuration looks like for the storage accounts we own and manage as SingleStore:

We chose to use Versioning and are setting a lifecycle management rule with the name versionrule with the following properties to cleanup old versioned blob entities:

The next step is the setting up the container under the storage account using the following configuration:

After creating the container, we’ll upload two small text files to simulate the following scenarios:

  • ‘hello.txt’: File that is not going to be overwritten and will be deleted
  • ‘Helloversioned.txt’ : File that is going to be overwritten by another file with the same name and then will get deleted

We’ll once again upload the ‘helloversioned.txt’ file, overwriting the existing one and modifying its contents to become larger:

Waiting for one day to pass, we see the original version of the ‘helloversioned.txt’ has been cleaned up by Azure:

Now, we can go ahead and delete both ‘hello.txt’ and ‘helloversioned.txt’ files in attempt to try to recover them:

In this scenario, we notice Azure keeps the versions of the blob entries we deleted, marking them as not being active. This also gives us the ability to restore those previous versions of the blob entries.  So, we’ll now adjust the policy to delete the blob entries after seven days. We triggered the delete on August 16, 2024 at 22:55 Eastern Time.

We are expecting those non-current versioned blob entries to be present to the container for seven days (until August 23, 2024 at 22:55 Eastern Time). From that point onward, they will be at risk of being deleted at any point. Checking on August 22, 2024 at 11:32 Eastern Time we notice the previous versions of the entries have been deleted. This means Azure is tracking the time for deleting previous versions from the time they have been created, which can lead to data-loss incidents (or false sense of having the ability to recover data):

soft-deleteSoft-delete

This is what a configuration looks like for a storage account that only has the Soft-Delete functionality enabled:

We choose to use the Blob soft delete option (setting it to one day for the purposes of this exercise) and do not need to set a lifecycle management rule. The next step is the setting up the container under the storage account using the following configuration (similar to the Versioning section):

After creating the container, we’ll upload one small text file (‘hello.txt’) to simulate the scenario of the file getting deleted and restored. Notice when we activate the option of soft-delete in the storage account, a default container with the name $logs is created that cannot be deleted (its lifecycle is the same as the lifecycle of the storage account it belongs to). First, we overwrite the file with a new one that has more contents to verify  the container is not keeping track of the old versions of the file:

After deleting the ‘helios.txt’ file we see t it still appears in the in the "Show active and deleted blobs" view:

We can see the file still appears in the container even after it has been deleted. We can restore the file by choosing to Undelete it:

We now delete the the blob entry (modifying the soft-delete period to four days):

We triggered the delete on August 18, 2024 at 02:50 Eastern Time. We are expecting those non-current versioned blob entries to be present to the container for three days (until August 21, 2024 at 02:50 Eastern Time) and from that point onward they will be at risk of being deleted at any point.

Something else we noticed is that if we try to upload a new file with the same name (example hello.txt) then Azure always assumes there is one file (no versioning) which means that the file will report immediately being in Active status. If the new file with the same name had different contents, that means that it permanently lost the original file. Last but not least, we have noticed the retention period for each blob entry is estimated only during the time the blob entry has been deleted — and is not affected if the soft-delete retention period of the storage account changes in the future (like the case with the Untitled.txt entry we deleted in the same container under the same storage account after we briefly modified the soft-delete retention period to be seven days, then  resetting it back to four days).

versioning-soft-deleteVersioning + soft-delete

This is what a configuration looks like for a storage account that has Versioning and  Soft-Delete functionality enabled:

We also set a lifecycle management rule with the name versionrule that will cleanup versioned objects one day after their creation:

Here are the contents of the container:

When we delete a blob entry (in our case hello.txt) the blob entry is marked with a Previous version status. Note: Azure hasn’t created a new version of the blob entry, it is the existing version of the blob being hidden from being able to be read or listed as we traverse the contents of the container. The entry appears as follows:

Being at that state, the blob entry (which technically represents one of the Previous version entries for the blob key hello.txt) will eventually get deleted from the lifecycle management rule if one has been defined for the storage account — in our scenario, we expect the hello.txt previous version to be deleted at any point from the one-day day lifecycle management rule we have in place). The unique thing about versioning and soft-delete being activated at the same time for a storage account is  we can now be ahead and delete this non-active versioned instance of our blob entry:

As displayed on the previous screenshot, the hello.txt. file with the “Last Modified” timestamp being 2024-08-18 03:17:32 AM undergoes the process from being active to getting deleted (recognized with the "previous version" status from Azure). This Previous Version entry was explicitly deleted, causing it to have a retention period of three days and not be affected by the lifecycle management rule we have introduced for the mgsoftdeleteversioning storage account. The reason why there is a second version of the same file listed with a  “Last Modified” timestamp being 2024-08-17 11:34:06 AM is because we did this process once more for the hello.txt blob entry before capturing the screenshots.

Then,we had to explicitly undelete this version (second entry in second screenshot) and undelete once more from the "active and deleted blobs" view of the container (first screenshot) which caused Azure to create a brand new version of the restored file with the “Last Modified” timestamp being 2024-08-18 03:17:32 AM. At 2024-08-18 15:44 PM we notice there are none previously associated with the helios.txtblob key, and there is only the delete marker associated with the version we deleted explicitly:

Apart from tracking the deletion process for the hello.txt file, we will also upload a new file Untitled.txt and we will only delete it so as to verify if the lifecycle management rule deletes the versioned file promoting it for soft-deletion:

On timestamp 2024-08-20 12:50 PM we notice that the Untitled.txt versioned file has been deleted due to the versionrule lifecycle management deleting versioned entries after one day.

We also see the Untitled.txt file has now been deleted, meaning it has a delete marker assigned to it. The Untitled.txt file will get deleted completely after three days (which you can see in the preceding screenshot).

conclusion-and-proposed-changesConclusion and proposed changes

Based on the scenarios previously explored, we can see Azure Versioning does not behave as expected, since it ends up deleting old versioned entries of a blob key based on their creation date and not their last modification date. This can result in situations where blob entries that were created and active at one point before a long amount of time (e.g. a year) can be picked up and cleaned up after one day, where they become ‘previous versions’ of the blob key. This means we are left with the following two options:

  1. Soft-delete only. When the bucket is defined with the soft-delete policy only, it is guaranteed that the deleted blob entry will be permanently erased after the period of time associated with this policy. However, if we are to use a soft-delete-only configuration, we lose track of previous versions associated with a blob key in cases where overwrites of files with the same blob key occur.
  2. Soft-delete and versioning. This is the complete configuration that makes an Azure bucket behave in the same fashion as an S3 bucket. With this configuration, Azure keeps track of all the versions associated with a blob key — meaning that with overwrites we don’t lose track of previous files. Last but not least, after the previous versions associated with a blob key get deleted, they are marked for soft-deletion, which means they will be permanently erased after the period of time associated with the soft-delete policy.

Our recommendation is to modify Azure shared and managed storage accounts to enable versioning, setting up a lifecycle management rule with the name versionrule deleting previous versions associated with a blob key after one day, and setting the soft-delete policy with an expiration period of eight days. That means previous versions associated with the blob key that were deleted are permanently erased after seven days.


Share