MongoDB Schema Migrations

MongoDB collections do not have schema definitions the way tables in the SQL world do. Still, when the structure of application objects changes, the persisted data must be migrated accordingly. Incremental migration is one way to deal with changes to this implicit schema: the migration code is integrated into the application, and every document is upgraded the next time it is loaded and then saved with the new schema. This way the application can deal with all schema versions, and database upgrades cause no downtime.

Let's look at how to implement incremental migration in MongoDB with C#.

Out of the box

There is a documented, simple and straightforward way to roll out incremental upgrades with the official C# driver: implement the ISupportInitialize interface on data objects and include an ExtraElements property. This covers the case where document properties were renamed. When properties are added to or removed from documents, no migration is needed, thanks to the schemaless design of MongoDB. It does not help, however, when the type of an element has changed.
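
The out-of-the-box approach can be sketched roughly as follows. This is a minimal illustration, assuming a Customer whose old documents stored a single Name element that is now split into two properties. The class and its property names are made up for the example; ISupportInitialize comes from System.ComponentModel, and BsonExtraElements and BsonDocument come from the official driver.

```csharp
using System;
using System.ComponentModel;
using System.Linq;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

// Illustrative data object (an assumption for this sketch).
public class Customer : ISupportInitialize
{
    public ObjectId Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }

    // Elements without a matching property (such as the old "Name")
    // are collected here instead of causing a deserialization error.
    [BsonExtraElements]
    public BsonDocument ExtraElements { get; set; }

    // Called before the document's elements are read; nothing to prepare.
    public void BeginInit() { }

    // Called after all elements have been read: upgrade the old shape.
    public void EndInit()
    {
        BsonValue name;
        if (ExtraElements == null || !ExtraElements.TryGetValue("Name", out name))
        {
            return; // the document already has the new shape
        }

        // Split on whitespace and ignore empty tokens.
        var parts = name.AsString.Split(
            (char[]) null, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length > 0)
        {
            LastName = parts.Last();
            FirstName = string.Join(" ", parts.Take(parts.Length - 1));
        }

        // Drop the obsolete element so it is not written back on save.
        ExtraElements.Remove("Name");
    }
}
```

Note that the migration logic lives inside the data object itself here, which is exactly what becomes inconvenient once the number of schema changes grows.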

Extra Machinery

More intricate migration use cases can demand some extra machinery. A growing codebase, frequent deployments and constantly changing data structures are examples where investing more time in migration infrastructure pays off. The two most important changes here are isolating the migration code from the domain objects and coupling each migration to a particular application version.

This can be done with a couple of extensions to the C# driver:

[Migration(typeof (MigrationTo_1_0))]
private class Customer
{
    public ObjectId Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

private class MigrationTo_1_0 : IMigration<Customer>
{
    public Version To
    {
        get { return new Version(1, 0); }
    }

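    public void Upgrade(Customer obj,
                        IDictionary<string, object> extraElements)
    {
        // Documents older than v1.0 carry a single "Name" element, which
        // ends up in extraElements because Customer no longer maps it.
        var fullName = (string) extraElements["Name"];

        // The last whitespace-separated token becomes LastName
        // (Last() requires System.Linq); the rest becomes FirstName.
        obj.LastName = fullName.Split().Last();
        obj.FirstName = fullName
                .Substring(0, fullName.Length - obj.LastName.Length)
                .Trim();
        // Drop the obsolete element so it is not written back on save.
        extraElements.Remove("Name");
    }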
}

In this example Customer does not need to implement the ISupportInitialize interface and does not have an ExtraElements property. MigrationTo_1_0 splits Name into the FirstName and LastName properties introduced in v1.0 of the app. We could also attach a MigrationTo_2_0 to Customer if v2.0 brings further relevant changes. Since incremental migration cannot guarantee that the whole database has already been migrated to v1.0, both MigrationTo_1_0 and MigrationTo_2_0 must be retained. We keep the migrations for all schema versions in a structured way and delete old ones as soon as we know that all documents in the database have been upgraded.

The migration class implements this interface:

public interface IMigration<T>
{
    Version To { get; }
    void Upgrade(T obj, IDictionary<string, object> extraElements);
}

The To property and the Upgrade method should be enough to couple a migration to the application version in which the schema change took place. (I wonder if there is a use case for additional From and Downgrade members… Mobile apps?)
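
To illustrate how the To property can drive the upgrade, here is a minimal sketch of selecting and applying the pending migrations for one object. MigrationRunner and LambdaMigration are hypothetical helpers written for this illustration; they are not part of the driver, and the real serializer may organize this differently. The sketch assumes that the version stored with the document and the current application version are both known.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IMigration<T>
{
    Version To { get; }
    void Upgrade(T obj, IDictionary<string, object> extraElements);
}

// Hypothetical convenience class: wraps a delegate as a migration.
public sealed class LambdaMigration<T> : IMigration<T>
{
    private readonly Action<T, IDictionary<string, object>> upgrade;

    public LambdaMigration(Version to,
                           Action<T, IDictionary<string, object>> upgrade)
    {
        To = to;
        this.upgrade = upgrade;
    }

    public Version To { get; private set; }

    public void Upgrade(T obj, IDictionary<string, object> extraElements)
    {
        upgrade(obj, extraElements);
    }
}

// Hypothetical helper showing one way to pick the migrations to run.
public static class MigrationRunner
{
    public static void Apply<T>(
        T obj,
        IDictionary<string, object> extraElements,
        Version storedVersion,
        Version currentVersion,
        IEnumerable<IMigration<T>> migrations)
    {
        // Only migrations newer than the stored version and not newer
        // than the running application apply, oldest first.
        var pending = migrations
            .Where(m => m.To > storedVersion && m.To <= currentVersion)
            .OrderBy(m => m.To);

        foreach (var migration in pending)
        {
            migration.Upgrade(obj, extraElements);
        }
    }
}
```

With migrations to 1.0, 2.0, 3.0 and 5.0 registered, an object stored at version 1.0 and loaded by application version 4.0 would pass through the 2.0 and 3.0 upgrades, in that order.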

To activate the migration logic, the MigrationSerializationProvider must be registered:

BsonSerializer.RegisterSerializationProvider(
    new MigrationSerializationProvider());

Now every class annotated with MigrationAttribute serializes an additional 64-bit element containing the schema version. During deserialization, every document whose version is lower than the current assembly version is upgraded.
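
One way a four-part version could fit into a single 64-bit element is to give each component 16 bits. The sketch below is only an assumption about a possible encoding, not a description of what the serializer actually writes; VersionCodec is a made-up name. A useful property of this layout is that the packed values sort in the same order as the versions themselves.

```csharp
using System;

// Hypothetical encoding: Major | Minor | Build | Revision, 16 bits each.
public static class VersionCodec
{
    public static long ToInt64(Version v)
    {
        // Major is limited to 15 bits so that the result stays non-negative
        // and numeric comparison matches version comparison.
        return (Part(v.Major, 0x7FFF, "Major") << 48)
             | (Part(v.Minor, 0xFFFF, "Minor") << 32)
             | (Part(v.Build, 0xFFFF, "Build") << 16)
             |  Part(v.Revision, 0xFFFF, "Revision");
    }

    public static Version FromInt64(long value)
    {
        // Always yields a four-part version, so 1.0 comes back as 1.0.0.0.
        return new Version(
            (int) ((value >> 48) & 0xFFFF),
            (int) ((value >> 32) & 0xFFFF),
            (int) ((value >> 16) & 0xFFFF),
            (int) (value & 0xFFFF));
    }

    private static long Part(int component, int max, string name)
    {
        if (component > max)
        {
            throw new ArgumentOutOfRangeException(
                name, "Version component does not fit into the packed layout.");
        }
        // Undefined components are reported as -1 by System.Version.
        return Math.Max(component, 0);
    }
}
```

Storing the version as a number rather than a string also makes it straightforward to query for documents below a given version.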

The good news is that the overhead of using the MigrationSerializationProvider is negligibly small if no migration is applied.

The bad news is that making this work requires some changes to the C# driver.

Custom migration capability is just a tiny addition to the BsonClassMapSerializer. The BsonClassMapSerializer is currently not extensible, so I added a couple of extension points to make my code work. The alternative, implementing a completely new serializer that does 90% of the same job as the existing one, would be overkill.

Further Notes

Having documents with several different schema versions in the database is not a big problem. The same migration code can be integrated into a small nightly job that upgrades the whole database. There is still no downtime, and afterwards the upgrades for older versions can be safely removed.
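
Such a nightly job can be very small, because loading a document already runs the migrations and saving it writes the new schema. The sketch below deliberately hides the database behind two hypothetical delegates, loadAll and save, which would wrap whatever driver calls the application already uses for reading and writing a collection; NightlyUpgradeJob is likewise a made-up name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical job: re-saves every document so that it is stored
// with the current schema version.
public static class NightlyUpgradeJob
{
    public static int Run<T>(Func<IEnumerable<T>> loadAll, Action<T> save)
    {
        var processed = 0;
        foreach (var document in loadAll())
        {
            // Deserialization (inside loadAll) has already upgraded the
            // object; saving persists it in the new shape.
            save(document);
            processed++;
        }
        return processed;
    }
}
```

In practice the loading side could be restricted to documents whose stored version is lower than the current one, so that up-to-date documents are not rewritten every night.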

Complete source code for the article is available online.

6 Comments on “MongoDB Schema Migrations”

  1. noam says:

    Hi,
    I have a quick question.
    What would be the solution if, for example, we already have 4 migration changes and are now at version 4, but one object is still at version 1? How would the framework you propose handle the incremental changes to go from version 1->2->3->4?

    Thanks,
    Noam

    • darkiri says:

      Hi Noam,

      every type that has a MigrationAttribute applied will store an additional _v element with the object version. So we will know which upgrades should be applied the next time the object is deserialized. For an object in version 1 it will be all migrations with 1 < migration.To <= 4.

      If you _already_ have objects of different versions in the database, then you need some other version-dependent element, or any other information that could help to identify the object version.
      Hope it helps.

      Regards,
      Kirill

      • noam says:

        Hi Kirill,
        Thanks for your quick reply.
        Have you thought about conducting the version checks at the DAO level? This way we don’t need to make or request any changes to the MongoDB driver.

        Noam

      • darkiri says:

        Hi Noam,
        the changes on the driver make three things possible: serialize and deserialize version field, handle the case where object has no ExtraElements defined and apply migrations. This could also be done in data objects using driver-provided ISupportInitialize interface, just the ExtraElements property must be always present. I could imagine something like MigratableObject base class that would implement the ISupportInitialize and has code similar to my BsonMigrationSerializer. If you are ok that data objects must inherit a base class like MigratableObject and must have ExtraElements, I could help with code snippets for it.
        Regards,
        Kirill

      • noam says:

        Hi Kirill,
        That’s great, it will be a cleaner solution for me until you are able to convince the driver team to add the changes you are proposing.

        I will update my data objects accordingly.

        Thanks,
        Noam

      • darkiri says:

        Hi Noam,
        Here is a code snippet on how you could use migrations framework with ISupportInitialize: https://gist.github.com/darkiri/5486994

        The idea is to inherit all db objects from the MigratableObject. I have just typed it into gist and haven’t tried, so be prepared to adjust it accordingly if something does not work.

        Two notes on the code:
        – extraction of applied migrations out of attributes is costly. That’s why it should not be done in the objects; it should be done once per application start somewhere in the infrastructure.
        – Version field will be serialized as a string (BsonMigratableSerializer stores it as a long field)

        Feel free to ask if you have any questions.
        Regards,
        Kirill

