MongoDB C# Driver Review – Query Serialization

We already know how BSON serialization works. In an earlier article I focused on the serialization of custom classes, which is an important aspect of populating the database with data. But data is not the only thing that must be serialized. It should also be possible to formulate a search query, a map-reduce or an aggregate command, and there must be a way to serialize this information to a BSON document and hand it over to a MongoDB server.

Let us take the search query as an example. The user-facing API is self-explanatory:

var query = Query.And(Query.GTE("x", 3), Query.LTE("x", 10));
// { "x" : { "$gte" : 3, "$lte" : 10 } }
var cursor = collection.Find(query);

Using the Query class is not the only available option. Another one is the LINQ provider.

The And, GTE and LTE methods used in the code snippet are defined this way:

public static class Query
{
    public static IMongoQuery And(params IMongoQuery[] queries) {...}
    public static IMongoQuery GTE(string name, BsonValue value) {...}
    public static IMongoQuery LTE(string name, BsonValue value) {...}
    //...
}

IMongoQuery is a marker interface. Since all builder methods in Query return it, and the methods that build compound queries also accept it as a parameter, we can easily build query trees by combining several calls.
The implementation returned by the builder is a QueryDocument, which is just a BsonDocument. This means that passing GTE and LTE to the And query builds a BsonDocument composed of other documents. Binary and JSON serialization for BsonDocuments is provided out of the box (see BsonDocumentSerializer).
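
To make the idea concrete, here is a minimal, self-contained sketch of a marker interface backed by a document type. It does not use the driver: IQuery, QueryDoc and Q are hypothetical stand-ins for IMongoQuery, QueryDocument and Query, and the merge rule in And is an assumption derived only from the JSON shown in the first snippet.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the driver's types, for illustration only.
public interface IQuery { }   // marker interface: no members at all

public sealed class QueryDoc : Dictionary<string, object>, IQuery
{
    // Renders JSON-like text; nested QueryDoc values are rendered recursively.
    public override string ToString()
    {
        var parts = this.Select(kv => "\"" + kv.Key + "\" : " + kv.Value);
        return "{ " + string.Join(", ", parts) + " }";
    }
}

public static class Q
{
    public static IQuery GTE(string name, int value) { return Op(name, "$gte", value); }
    public static IQuery LTE(string name, int value) { return Op(name, "$lte", value); }

    // Combines clauses; operators on the same field end up in one sub-document.
    public static IQuery And(params IQuery[] queries)
    {
        var result = new QueryDoc();
        foreach (QueryDoc query in queries)
        {
            foreach (var field in query)
            {
                object existing;
                if (!result.TryGetValue(field.Key, out existing))
                {
                    existing = new QueryDoc();
                    result[field.Key] = existing;
                }
                foreach (var op in (QueryDoc)field.Value)
                {
                    ((QueryDoc)existing)[op.Key] = op.Value;
                }
            }
        }
        return result;
    }

    // Builds { name : { op : value } }.
    private static IQuery Op(string name, string op, int value)
    {
        return new QueryDoc { { name, new QueryDoc { { op, value } } } };
    }
}
```

Calling Q.And(Q.GTE("x", 3), Q.LTE("x", 10)) yields a single document with one "x" entry holding both operators. Because every builder returns the marker type and And accepts it, calls nest freely, and the result is still an ordinary document that a document serializer can handle.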

BsonDocument is the DOM that represents … ehm … BSON documents. A BsonDocument consists of BsonElements, each of which is a name/value pair. The name is an arbitrary string and the value is a BsonValue, which can be any serializable value (or another BsonDocument).

When the query is ready, it goes into the MongoCollection, and somewhere it must be serialized and sent over the wire. But the MongoCollection accepts just an IMongoQuery and knows little about serialization.

The first option here is to use the BsonDocumentWrapper:

// somewhere in the MongoCollection
public virtual long Count(IMongoQuery query)
{
    var command = new CommandDocument
    {
        { "count", _name },
        { "query", BsonDocumentWrapper.Create(query), query != null }
    };
    var result = RunCommand(command);
    return result.Response["n"].ToInt64();
}

Here the wrapper is serialized, not the query directly. The wrapper is an IBsonSerializable and simply serializes the underlying query using the normal BsonSerializer.Serialize:

// this class is a wrapper for an object that we intend to serialize as a BsonValue
// it is a subclass of BsonValue so that it may be used where a BsonValue is expected
// this class is mostly used by MongoCollection and MongoCursor when supporting generic query objects
public class BsonDocumentWrapper : BsonValue, IBsonSerializable
{
    [Obsolete("Serialize was intended to be private and will become private in a future release.")]
    public void Serialize(BsonWriter bsonWriter, Type nominalType, IBsonSerializationOptions options)
    {
        BsonDocumentWrapperSerializer.Instance.Serialize(bsonWriter, nominalType, this, options);
    }
    // ....
}

// somewhere in BsonDocumentWrapperSerializer.Serialize
BsonSerializer.Serialize(bsonWriter, wrapper.WrappedNominalType, wrapper.WrappedObject, null);

But that is not all. The wrapper is also a BsonValue, which means that it can be added as a node to a BsonDocument. Now we can use the query as a sub-element or sub-query. This is especially useful when running commands that consist of several queries. FindAndModify is a good example:

    var command = new CommandDocument
    {
        { "findAndModify", _name },
        { "query", BsonDocumentWrapper.Create(query), query != null },
        { "sort", BsonDocumentWrapper.Create(sortBy), sortBy != null },
        { "update", BsonDocumentWrapper.Create(update, true) },
        { "fields", BsonDocumentWrapper.Create(fields), fields != null },
        { "new", true, returnNew },
        { "upsert", true, upsert}
    };

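By the way, the code that builds a CommandDocument looks a bit like JSON, which is kind of nice. This syntactic sugar is possible because the CommandDocument, like any BsonDocument, is an IEnumerable and has specialized Add methods, so C# collection initializers can be used. The three-argument form adds the element only if the condition in the last argument is true.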
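
The language feature behind this sugar can be shown without the driver. The following sketch uses a hypothetical MiniDocument class (not the driver's CommandDocument) to show that a type only needs to implement IEnumerable and expose suitable Add overloads for the brace syntax to compile.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical miniature of a command document; not the driver's class.
public class MiniDocument : IEnumerable<KeyValuePair<string, object>>
{
    private readonly List<KeyValuePair<string, object>> _elements =
        new List<KeyValuePair<string, object>>();

    public int Count { get { return _elements.Count; } }

    // Called for initializer entries of the form { name, value }.
    public void Add(string name, object value)
    {
        _elements.Add(new KeyValuePair<string, object>(name, value));
    }

    // Called for entries of the form { name, value, condition }:
    // the element is added only when the condition holds.
    public void Add(string name, object value, bool condition)
    {
        if (condition)
        {
            Add(name, value);
        }
    }

    public bool Contains(string name)
    {
        return _elements.Exists(e => e.Key == name);
    }

    public IEnumerator<KeyValuePair<string, object>> GetEnumerator()
    {
        return _elements.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class MiniCommands
{
    // Mirrors the shape of the Count command above: the "query" element
    // is skipped entirely when no query is given.
    public static MiniDocument Count(string collectionName, object query)
    {
        return new MiniDocument
        {
            { "count", collectionName },
            { "query", query, query != null }
        };
    }
}
```

The compiler translates each braced entry into a call to the Add overload with the matching number of arguments, which is why optional parts of a command can be declared inline instead of with if statements.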

So the BsonDocumentWrapper enables a query to be added to a BsonDocument as a BsonValue, and it is used mostly to run commands.

The second trick to serialize a query is this extension method:

public static BsonDocument ToBsonDocument(
            this object obj,
            Type nominalType,
            IBsonSerializationOptions options)
{
    if (obj == null)
    {
        return null;
    }

    var bsonDocument = obj as BsonDocument;
    if (bsonDocument != null)
    {
        return bsonDocument; // it's already a BsonDocument
    }

    var convertibleToBsonDocument = obj as IConvertibleToBsonDocument;
    if (convertibleToBsonDocument != null)
    {
        return convertibleToBsonDocument.ToBsonDocument(); // use the provided ToBsonDocument method
    }

    // otherwise serialize into a new BsonDocument
    var document = new BsonDocument();
    using (var writer = BsonWriter.Create(document))
    {
        BsonSerializer.Serialize(writer, nominalType, obj, options);
    }
    return document;
}

This is a small helper, similar to ToJson and ToBson, that with some casting produces a BsonDocument from an arbitrary object. The difference from the BsonDocumentWrapper is that the wrapper is a BsonValue, not a document, while query.ToBsonDocument() produces a standalone BsonDocument, which can also be used, for example, to merge several documents together:

// MongoCollection.MapReduce
var command = new CommandDocument
{
    { "mapreduce", _name },
    { "map", map },
    { "reduce", reduce }
};
command.AddRange(options.ToBsonDocument());

And finally, there is a third option for serializing the query. The query can simply be handed over to Update, Delete or MongoQueryMessage and serialized there using BsonSerializer.Serialize into an existing BsonBuffer.

Conclusion

For composing queries there is the Query class, intended to be used as a public API and backed by a QueryBuilder. Everything produced by Query is an IMongoQuery, a marker interface with two implementations: QueryDocument, which is a BsonDocument and can be used anywhere a BsonDocument can be used, and QueryWrapper, which is just an IBsonSerializable. In addition, queries are supported by the BsonDocumentWrapper, which can serve as a BsonValue.

For other operations there are several marker interfaces that follow the same pattern. Just to name a few:

  • IMongoUpdate (with Update/UpdateBuilder, UpdateDocument and UpdateWrapper)
  • IMongoGroupBy (with GroupBy/GroupByBuilder, GroupByDocument and GroupByWrapper)
  • IMongoFields (with Fields/FieldsBuilder, FieldsDocument and FieldsWrapper)

I do not really understand why the *Wrapper objects are needed if there is always a *Document, since the wrappers are just IBsonSerializable, same as the *Documents. Only the BsonDocumentWrapper is notable for being a BsonValue.

BSON Serialization with MongoDB C# Driver

The MongoDB C# Driver consists of two parts:

  • BSON Serialization support
  • The Driver itself

In this post we will have a look at the most important components of BSON serialization and how it works under the covers. So let’s pull the Git repository and drill into the code.

High-Level API: ToJson and ToBson

Going top-down, the high-level serialization API consists of two handy extension methods: ToJson and ToBson. They can be used on an arbitrary object and hide the complexity of the underlying machinery.

There is an extensive set of unit tests for the C# Driver, and most of my code snippets are based on those tests:

[Test]
public void TestToJson()
{
    var c = new C { N = 1, Id = ObjectId.Empty };
    var json = c.ToJson();
    Assert.That(json, Is.EqualTo(
        "{ \"N\" : 1, \"_id\" : ObjectId(\"000000000000000000000000\") }"));
}
[Test]
public void TestToBson()
{
    var c = new C { N = 1, Id = ObjectId.Empty };
    var bson = c.ToBson();
    var expected = new byte[] { 29, 0, 0, 0, 16, 78, 0, 1, 0, 0, 0, 7, 95, 105, 100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
    Assert.IsTrue(expected.SequenceEqual(bson));
}

Looking inside the implementation, we immediately see that these are just thin wrappers around a couple of other classes, with a call to BsonSerializer.Serialize as the central point:

// ToBson
using (var buffer = new BsonBuffer())
{
    using (var bsonWriter = BsonWriter.Create(buffer, settings))
    {
        BsonSerializer.Serialize(bsonWriter, nominalType, obj, options);
    }
    return buffer.ToByteArray();
}
// ToJson
using (var stringWriter = new StringWriter())
{
    using (var bsonWriter = BsonWriter.Create(stringWriter, settings))
    {
        BsonSerializer.Serialize(bsonWriter, nominalType, obj, options);
    }
    return stringWriter.ToString();
}

We’ll see later what the purpose of BsonWriter is.
For deserialization there are no extension methods, so one needs to use the BsonSerializer directly:

// BSON
var fromBson = BsonSerializer.Deserialize<C>(bsonBytes);
Assert.AreEqual(1, fromBson.N);
// JSON
var fromJson = BsonSerializer.Deserialize<C>(jsonString);
Assert.AreEqual(1, fromJson.N);

This is pretty much it: a couple of straightforward functions that can cover 80% of use cases. For the other 20% we need to understand how it works underneath.

There is also an API for (de)serialization in the form of a DOM, aka BsonDocument. Although a BsonDocument is something completely different from a raw BSON byte stream, its serialization is implemented using the same design concepts: dependency injection in action.

Middle-Level: BsonReader, BsonWriter and BsonSerializer

Stepping one level down, we get to BsonReader and BsonWriter. These are actually class families with specific implementations for three related formats: BSON, JSON and BsonDocument. Browsing through the code, it is not difficult to identify their responsibility: to (de)serialize individual elements while moving through the incoming or outgoing buffer, much like System.IO.BinaryReader or System.Xml.XmlReader. This means, for example, that the BsonBinaryReader/Writer pair implements the BSON specification for individual elements, and JsonReader/Writer do the same for JSON, including Mongo-related extensions like ObjectIds and the $-notation for field names.

[Test]
public void TestRegularExpressionStrict()
{
    var json = "{ \"$regex\" : \"pattern\", \"$options\" : \"imxs\" }";
    using (var bsonReader = BsonReader.Create(json))
    {
        Assert.AreEqual(BsonType.RegularExpression, bsonReader.ReadBsonType());
        var regex = bsonReader.ReadRegularExpression();
        Assert.AreEqual("pattern", regex.Pattern);
        Assert.AreEqual("imxs", regex.Options);
        Assert.AreEqual(BsonReaderState.Done, bsonReader.State);
    }
    var settings = new JsonWriterSettings { OutputMode = JsonOutputMode.Strict };
    Assert.AreEqual(json, BsonSerializer.Deserialize<BsonRegularExpression>(new StringReader(json)).ToJson(settings));
}

The responsibility of BsonSerializer in this context is to orchestrate the individual calls to the readers and writers during serialization and to compose the result.
All in all, the whole high-level process could be drawn this way:

[Figure: BSON Serialization overview]
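
The split between a format-agnostic serializer and format-specific writers can also be sketched in a few lines. This is not the driver's code: MiniWriter, MiniJsonWriter, MiniDomWriter, Point and PointSerializer are hypothetical names, and the writer API is reduced to three methods for illustration.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical writer abstraction, loosely modeled on the role of BsonWriter.
public abstract class MiniWriter
{
    public abstract void WriteStartDocument();
    public abstract void WriteInt32(string name, int value);
    public abstract void WriteEndDocument();
}

// Target 1: JSON text.
public sealed class MiniJsonWriter : MiniWriter
{
    private readonly StringBuilder _sb = new StringBuilder();
    private bool _first;

    public override void WriteStartDocument() { _sb.Append("{ "); _first = true; }

    public override void WriteInt32(string name, int value)
    {
        if (!_first) _sb.Append(", ");
        _sb.Append('"').Append(name).Append("\" : ").Append(value);
        _first = false;
    }

    public override void WriteEndDocument() { _sb.Append(" }"); }

    public override string ToString() { return _sb.ToString(); }
}

// Target 2: an in-memory DOM (here just a dictionary).
public sealed class MiniDomWriter : MiniWriter
{
    public readonly Dictionary<string, int> Document = new Dictionary<string, int>();

    public override void WriteStartDocument() { Document.Clear(); }
    public override void WriteInt32(string name, int value) { Document[name] = value; }
    public override void WriteEndDocument() { }
}

public sealed class Point
{
    public int X;
    public int Y;
}

public static class PointSerializer
{
    // Knows the shape of Point, but nothing about the output format:
    // the writer is injected by the caller.
    public static void Serialize(MiniWriter writer, Point p)
    {
        writer.WriteStartDocument();
        writer.WriteInt32("X", p.X);
        writer.WriteInt32("Y", p.Y);
        writer.WriteEndDocument();
    }
}
```

Passing a MiniJsonWriter produces the text { "X" : 1, "Y" : 2 } for a point (1, 2), while passing a MiniDomWriter fills a dictionary with the same two entries. The serializer code is identical in both cases, which is the point of injecting the writer.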

Low-Level: Serializers and Conventions

Stepping down once again, we see the individual components under BsonSerializer:

[Figure: BSON Serialization classes]

BsonSerializer contains a collection of serialization providers that can be used to look up a particular serializer. A serializer looks something like this:

[Figure: IBsonSerializer]

It is hard to overlook the naming ambiguity between BsonSerializer and IBsonSerializer. Nevertheless, the two serve very different purposes. The first is a static class and the central point of the whole serialization logic, while the second is an interface with numerous implementations for a whole range of types and normally should not be used directly.

From the definition of IBsonSerializer we can identify its purpose: to create an object of a specified type using a particular BsonReader, and vice versa. So the control flow is as follows:

  • BsonSerializer is called to (de)serialize a specific type using a particular reader or writer
  • It then asks a serialization provider (a registry of particular serializers) whether there is a serializer registered for the requested type
  • If there is one, the serializer drives the actual low-level process, orchestrating calls to readers and writers

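
The lookup steps above can be sketched as a chain of providers that are asked in order. All names in this sketch (IMiniSerializer, IMiniProvider, MiniSerializerRegistry and the concrete classes) are hypothetical, the interfaces are far smaller than the driver's, and the cache is an assumption added for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified shapes; the driver's real interfaces have more members.
public interface IMiniSerializer
{
    string Serialize(object value);
}

public interface IMiniProvider
{
    // Returns null when this provider has no serializer for the type.
    IMiniSerializer GetSerializer(Type type);
}

public sealed class Int32Serializer : IMiniSerializer
{
    public string Serialize(object value) { return ((int)value).ToString(); }
}

public sealed class FallbackSerializer : IMiniSerializer
{
    public string Serialize(object value)
    {
        return "{ \"_t\" : \"" + value.GetType().Name + "\" }";
    }
}

// Knows a fixed set of built-in types.
public sealed class PrimitiveProvider : IMiniProvider
{
    public IMiniSerializer GetSerializer(Type type)
    {
        return type == typeof(int) ? new Int32Serializer() : null;
    }
}

// Accepts every type; plays the role of the catch-all provider.
public sealed class FallbackProvider : IMiniProvider
{
    public IMiniSerializer GetSerializer(Type type) { return new FallbackSerializer(); }
}

public static class MiniSerializerRegistry
{
    private static readonly List<IMiniProvider> Providers =
        new List<IMiniProvider> { new PrimitiveProvider(), new FallbackProvider() };

    private static readonly Dictionary<Type, IMiniSerializer> Cache =
        new Dictionary<Type, IMiniSerializer>();

    // Asks each provider in order and remembers the first answer.
    public static IMiniSerializer LookupSerializer(Type type)
    {
        lock (Cache)
        {
            IMiniSerializer serializer;
            if (Cache.TryGetValue(type, out serializer))
            {
                return serializer;
            }
            foreach (var provider in Providers)
            {
                serializer = provider.GetSerializer(type);
                if (serializer != null)
                {
                    Cache[type] = serializer;
                    return serializer;
                }
            }
            throw new InvalidOperationException("No serializer found for " + type.FullName);
        }
    }

    // Entry point: pick a serializer for the runtime type, then delegate to it.
    public static string Serialize(object value)
    {
        return LookupSerializer(value.GetType()).Serialize(value);
    }
}
```

With this arrangement MiniSerializerRegistry.Serialize(42) is handled by the primitive provider, while any other object falls through to the catch-all one. Ordering the providers from specific to general is what lets a fallback exist without shadowing the built-in serializers.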
There are two predefined serialization providers: BsonDefaultSerializationProvider and BsonClassMapSerializationProvider. The default provider is always consulted first and delivers serializers for most native .NET types and specialized BSON types (like ObjectId or JavaScript). If there is no predefined serializer for the requested type, the ClassMap provider is used to engage the BsonClassMapSerializer. This is a very powerful facility for handling user-defined types. The most important aspect here is the configuration of object-to-BSON mappings.

The mapping is handled by the BsonClassMap, which contains all metadata for the requested type, such as serializable member names and their order, the id field and id generation strategy, discriminator fields for polymorphic types and lots more. It works out of the box with reasonable behavior, but is also highly customizable:

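// The type argument is missing in the original snippet; Employee is an assumed class name.
BsonClassMap.RegisterClassMap<Employee>(cm =>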
{
    cm.MapIdProperty(e => e.EmployeeId);
    cm.MapProperty(e => e.FirstName).SetElementName("fn");
    cm.MapProperty(e => e.LastName).SetElementName("ln");
    cm.MapProperty(e => e.DateOfBirth).SetElementName("dob").SetSerializer(new DateOfBirthSerializer());
    cm.MapProperty(e => e.Age).SetElementName("age");
});

It is nice to see that the implementation of all customization concepts is not gathered in a single place but distributed over individual components, aka conventions. Every convention is responsible for one mapping aspect and can be applied to the BsonClassMap to update the current configuration. (For example, the MemberNameElementNameConvention can be applied to a MemberMap of a BsonClassMap to set the corresponding BSON element name, which is the same as the class member name unless overridden by the AttributeConvention using a BsonElementAttribute.)

class TestClass
{
      [BsonElement("fn")]
      public string FirstName;
}
[Test]
public void TestOptsInMembers()
{
    var convention = AttributeConventionPack.Instance;
    var classMap = new BsonClassMap<TestClass>();
    new ConventionRunner(convention).Apply(classMap);

    Assert.AreEqual(1, classMap.DeclaredMemberMaps.Count());
    Assert.AreEqual("fn", classMap.GetMemberMap("FirstName").ElementName);
}
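
How conventions layer on top of each other can be sketched independently of the driver. In the following example every type (MiniElementAttribute, MiniMemberMap, IMiniConvention, the two conventions, MiniConventionRunner and Person) is hypothetical; it only illustrates the idea that each convention adjusts one aspect of a member map and that later conventions may override earlier ones.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute playing the role of BsonElementAttribute.
[AttributeUsage(AttributeTargets.Field | AttributeTargets.Property)]
public sealed class MiniElementAttribute : Attribute
{
    public MiniElementAttribute(string name) { Name = name; }
    public string Name { get; private set; }
}

// Holds the mapping decisions for one class member.
public sealed class MiniMemberMap
{
    public MiniMemberMap(MemberInfo member) { Member = member; }
    public MemberInfo Member { get; private set; }
    public string ElementName { get; set; }
}

public interface IMiniConvention
{
    void Apply(MiniMemberMap map);
}

// Default: the element name equals the member name.
public sealed class MemberNameConvention : IMiniConvention
{
    public void Apply(MiniMemberMap map) { map.ElementName = map.Member.Name; }
}

// Override: an explicit attribute wins over the default name.
public sealed class AttributeConvention : IMiniConvention
{
    public void Apply(MiniMemberMap map)
    {
        var attribute = (MiniElementAttribute)Attribute.GetCustomAttribute(
            map.Member, typeof(MiniElementAttribute));
        if (attribute != null)
        {
            map.ElementName = attribute.Name;
        }
    }
}

public static class MiniConventionRunner
{
    // Applies the conventions in the given order to every public instance field
    // and returns member name -> element name.
    public static Dictionary<string, string> Map(Type type, params IMiniConvention[] conventions)
    {
        var result = new Dictionary<string, string>();
        foreach (var field in type.GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            var map = new MiniMemberMap(field);
            foreach (var convention in conventions)
            {
                convention.Apply(map);
            }
            result[field.Name] = map.ElementName;
        }
        return result;
    }
}

public class Person
{
    [MiniElement("fn")]
    public string FirstName;
    public string LastName;
}
```

Running MiniConventionRunner.Map(typeof(Person), new MemberNameConvention(), new AttributeConvention()) maps FirstName to "fn" and leaves LastName as "LastName". Keeping each rule in its own small class means a new naming rule can be added without touching the existing ones.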

Conclusion

The whole class structure is very powerful. You can do pretty much everything you want by plugging into it at an appropriate place:

  • Manipulate the serialization process using IBsonSerializationOptions
  • Fine-tune the BSON structure through manual configuration of BsonClassMap
  • Implement and register your own conventions for user-defined types
  • Implement a serializer for your very special types and register it with your own serialization provider

All in all, it is very nice to see good separation of concerns in action, with few exceptions.
