Avoiding Calamitous Data Migrations

In the world of Continuous Delivery, data migrations are a looming challenge. A data migration is simply a change to the structure of your live data, along with the code that depends on it. The problem is a chicken-and-egg scenario: do you push out the new code first, or change the data first? Classically, changes like these are made in a maintenance window, but with continuous delivery you don't want to wait for a window, or you have no maintenance windows at all.

This is the root of data migration challenges, and most people simply give up trying to automate them, falling back to ad-hoc changes in production (yikes!). The following guidelines will help you manage data in a continuous delivery world.

1. Data Changes Are Code

Just as your code follows a delivery pipeline that is testable, automated, and hands-free, so should your data changes.
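
For instance, here is a minimal sketch of a migration runner that treats data changes as versioned, automated code, written in Python against sqlite3. The migrations directory layout and the schema_version table are illustrative assumptions, not any specific tool's convention:

    # Apply versioned SQL migration scripts in order, exactly once each.
    import sqlite3
    from pathlib import Path

    def apply_migrations(db_path: str, migrations_dir: str = "migrations") -> None:
        conn = sqlite3.connect(db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS schema_version (version TEXT PRIMARY KEY)"
        )
        applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
        # Scripts are named so lexical order is version order, e.g. 0001_add_x.sql.
        for script in sorted(Path(migrations_dir).glob("*.sql")):
            if script.stem not in applied:
                conn.executescript(script.read_text())
                # Record each script after applying it so reruns are idempotent.
                conn.execute("INSERT INTO schema_version VALUES (?)", (script.stem,))
                conn.commit()
        conn.close()

Because the same runner executes in every environment, a migration is tested on its way to production just like any other commit.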

2. Ad-Hoc Is for Dunces

When you just need to fix a value, it is tempting to run an ad-hoc query and make the correction, but avoid this at all costs. Not only does it circumvent your process, it is unauditable and highly prone to error. Whenever possible, use your code's interfaces, APIs, and abstractions. This keeps your business logic in one place and is simply the most stable, lowest-risk way of making data changes.
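
As a sketch of what this looks like in practice, the correction below goes through a hypothetical service class rather than a raw UPDATE; OrderService and its tables are stand-ins for whatever abstractions your codebase already exposes:

    import sqlite3

    class OrderService:
        """Hypothetical application-level interface for order data."""

        def __init__(self, conn: sqlite3.Connection):
            self.conn = conn

        def correct_shipping_address(self, order_id: int, address: str) -> None:
            # Validation and audit logging live here, so a one-off data fix
            # gets the same safeguards as any other caller.
            if not address.strip():
                raise ValueError("address must not be empty")
            with self.conn:  # single transaction
                self.conn.execute(
                    "UPDATE orders SET shipping_address = ? WHERE id = ?",
                    (address, order_id),
                )
                self.conn.execute(
                    "INSERT INTO audit_log (entity, entity_id, action) VALUES (?, ?, ?)",
                    ("order", order_id, "address_corrected"),
                )

    # The fix becomes a reviewed, auditable script, not an ad-hoc query:
    # OrderService(conn).correct_shipping_address(4711, "1 New Street")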

3. Multi-Version Concurrency

Your code and database schema should be written so that two versions of the code can run in parallel. This requires more work, but it makes deployments more stable and lets you migrate data granularly.
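
A minimal sketch of what that dual-version support can look like: the reader prefers the new representation but falls back to the old one, and the writer maintains both during the overlap. The full_name / first_name / last_name columns are illustrative assumptions:

    def read_display_name(row: dict) -> str:
        # New schema: a single full_name column.
        if row.get("full_name"):
            return row["full_name"]
        # Old schema: separate first_name / last_name columns.
        return f"{row.get('first_name', '')} {row.get('last_name', '')}".strip()

    def write_display_name(row: dict, name: str) -> None:
        # During the overlap, write both representations so whichever
        # code version reads the row next finds a value it understands.
        row["full_name"] = name
        first, _, last = name.partition(" ")
        row["first_name"] = first
        row["last_name"] = last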

4. Multi-Step Changes

To reduce risk (though not necessarily to help with backwards compatibility), use a multi-step process for making changes. First, submit a migration that adds the new data. Then submit a migration that transforms the data, followed by a migration that cleans up the old data. Somewhere in this process you also push new code that supports both versions of the data.
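
Sketched as three separate migrations (an "expand / transform / contract" sequence; the table and column names are illustrative assumptions):

    # Step 1 (expand): add the new column; old code simply ignores it.
    STEP_1 = "ALTER TABLE users ADD COLUMN full_name TEXT"

    # -- deploy code that reads/writes both representations here --

    # Step 2 (transform): backfill the new column from the old data.
    STEP_2 = (
        "UPDATE users SET full_name = first_name || ' ' || last_name "
        "WHERE full_name IS NULL"
    )

    # Step 3 (contract): once nothing reads the old columns, drop them.
    STEP_3 = [
        "ALTER TABLE users DROP COLUMN first_name",
        "ALTER TABLE users DROP COLUMN last_name",
    ]

Each step can fail or be rolled back independently, which is where the risk reduction comes from.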

5. Small, Frequent Changes

As with any continuous change, avoid large batches of commits. Where possible, prefer single statements over complex ones, and adjust one table at a time.
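
The difference, sketched with hypothetical tables: the batch on top couples two tables into one deploy, while the list below ships the same work as four independent, single-statement migrations:

    # Risky: one migration touching two tables; if any statement fails,
    # everything after it is in doubt.
    BIG_BANG = """
    ALTER TABLE orders ADD COLUMN status TEXT;
    UPDATE orders SET status = 'shipped' WHERE shipped_at IS NOT NULL;
    ALTER TABLE invoices ADD COLUMN status TEXT;
    UPDATE invoices SET status = 'paid' WHERE paid_at IS NOT NULL;
    """

    # Preferred: one table, one statement, one deploy at a time.
    SMALL_STEPS = [
        "ALTER TABLE orders ADD COLUMN status TEXT",
        "UPDATE orders SET status = 'shipped' WHERE shipped_at IS NOT NULL",
        "ALTER TABLE invoices ADD COLUMN status TEXT",
        "UPDATE invoices SET status = 'paid' WHERE paid_at IS NOT NULL",
    ]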

6. Understand the Risk

Data migrations and schema changes should be categorized as high-risk or low-risk. Continual data changes typically work best for low-risk changes; high-risk changes may still need to be handled differently. A simple distinction (a sketch of a triage step follows the list):

  • High-risk changes are those with the potential for data loss or corruption if done wrong, and/or which might fail because of differences in data content (such as foreign key changes). These changes should be carefully considered, reviewed, and planned, and may not fit well into a continual process.

  • Low-risk changes can use multi-step changes and the other techniques described here, allowing them to move through the standard pipeline to production with continuous delivery.
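
A rough sketch of a triage step that flags migrations for extra review before they enter the pipeline; the keyword heuristic is an illustrative assumption and no substitute for human judgment:

    # Statements that can lose data or fail on data content get flagged.
    HIGH_RISK_MARKERS = ("DROP", "TRUNCATE", "DELETE", "FOREIGN KEY", "NOT NULL")

    def classify_migration(sql: str) -> str:
        text = sql.upper()
        if any(marker in text for marker in HIGH_RISK_MARKERS):
            return "high-risk"   # plan and review; may not suit the pipeline
        return "low-risk"        # eligible for the standard delivery pipeline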

7. Understand the Scope (Atomic / Compatible)

Backwards-compatible changes are coded to support the prior version and the current version concurrently. Atomic changes require taking the prior version completely offline, making the change, and then bringing the new version online. Atomic changes can only be done by hand, during standard maintenance windows.
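
An example of an atomic change in this sense, sketched with hypothetical names: an in-place column type rewrite (PostgreSQL syntax) that no intermediate code version can straddle, so it only runs with the old version offline:

    ATOMIC_CHANGE = """
    -- In-place type change: old code binds user_id as an integer and
    -- breaks the moment this runs; there is no overlap state, so it
    -- happens in a maintenance window between shutdown and redeploy.
    ALTER TABLE sessions ALTER COLUMN user_id TYPE TEXT USING user_id::text;
    """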