Migration from Rigid Tables to Flexible and Dynamic BSON Documents

This article discusses the key topics involved in converting relational tables to MongoDB BSON documents.

Dynamic BSON Documents (The Document Model)

Much of the data we use today has complex structures that can be modeled and represented more efficiently using JSON (JavaScript Object Notation) documents, rather than tables.

MongoDB stores JSON documents in a binary representation called BSON (Binary JSON). BSON encoding extends the popular JSON representation to include additional data types such as int, long and floating point.

With sub-documents and arrays, JSON documents also align with the structure of objects at the application level. This makes it easy for developers to map the data used in the application to its associated document in the database.

By contrast, trying to map the object representation of the data to the tabular representation of an RDBMS slows down development. Adding Object Relational Mappers (ORMs) can create additional complexity by reducing the flexibility to evolve schemas and to optimize queries to meet new application requirements.

The project team should start the schema design process by considering the application’s requirements, modeling the data in a way that takes advantage of the document model’s flexibility. In a schema migration, it may be tempting to simply mirror the relational database’s flat schema in the document model. However, this approach negates the advantages enabled by the document model’s rich, embedded data structures. For example, data that belongs to a parent-child relationship in two RDBMS tables would commonly be collapsed (embedded) into a single document in MongoDB.

In the diagram below, the RDBMS uses the “Pers_ID” field to JOIN the “Person” table with the “Car” table to enable the application to report each car’s owner. Using the document model, embedded sub-documents and arrays effectively pre-JOIN data by combining related fields in a single data structure. Rows and columns that were traditionally normalized and distributed across separate tables can now be stored together in a single document, eliminating the need to JOIN separate tables when the application has to retrieve complete records.


Modeling the same data in MongoDB enables us to create a schema in which we embed an array of sub-documents for each car directly within the Person document.

{
  first_name: "Paul",
  surname: "Miller",
  city: "London",
  location: [45.123, 47.232],
  cars: [
    { model: "Bentley", year: 1973, value: 100000, .... },
    { model: "Rolls Royce", year: 1965, value: 330000, .... }
  ]
}
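Because the document mirrors an application-level object, the application can work with it directly, without an ORM mapping layer. A minimal plain-JavaScript sketch, restating the example document above (the elided fields are dropped here so the object is valid):

```javascript
// The embedded document maps directly to a plain application object.
const person = {
  first_name: "Paul",
  surname: "Miller",
  city: "London",
  location: [45.123, 47.232],
  cars: [
    { model: "Bentley", year: 1973, value: 100000 },
    { model: "Rolls Royce", year: 1965, value: 330000 }
  ]
};

// Related data is navigated with ordinary property access -- no JOIN needed.
const totalValue = person.cars.reduce((sum, car) => sum + car.value, 0);
console.log(`${person.first_name} owns ${person.cars.length} cars worth ${totalValue}`);
```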

In this simple example, the relational model consists of only two tables. (In reality most applications will need tens, hundreds or even thousands of tables.) This approach does not reflect the way architects think about data, nor the way in which developers write applications. The document model enables data to be represented in a much more natural and intuitive way.

To further illustrate the differences between the relational and document models, consider the example of a blogging platform in the diagram below. In this example, the application relies on the RDBMS to join five separate tables in order to build the blog entry. With MongoDB, all of the blog data is contained within a single document, linked with a single reference to a user document that contains both blog and comment authors.
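The single-document shape of such a blog entry can be sketched as follows. The field names here are illustrative assumptions, not taken from the article’s diagram:

```javascript
// Hypothetical single-document blog entry: the post body, tags, and
// comments that an RDBMS would spread across five tables live together,
// with one reference (author_id) pointing at a separate user document.
const blogPost = {
  title: "Migrating to the Document Model",
  author_id: "user123",          // single reference to a user document
  body: "Much of the data we use today has complex structures...",
  tags: ["mongodb", "bson", "schema"],
  comments: [                    // embedded, so no JOIN is needed to display them
    { author_id: "user456", text: "Great post!", posted: "2016-01-15" },
    { author_id: "user789", text: "Very helpful.", posted: "2016-01-16" }
  ]
};

console.log(`"${blogPost.title}" has ${blogPost.comments.length} comments`);
```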


Other Advantages of the Document Model

In addition to making it more natural to represent data at the database level, the document model also provides performance and scalability advantages:

  • The complete document can be accessed with a single call to the database, rather than having to JOIN multiple tables to respond to a query. The MongoDB document is physically stored as a single object, requiring only a single read from memory or disk. On the other hand, RDBMS JOINs require multiple reads from multiple physical locations.
  • As documents are self-contained, distributing the database across multiple nodes (a process called sharding) becomes simpler and makes it possible to achieve massive horizontal scalability on commodity hardware. The DBA no longer needs to worry about the performance penalty of executing cross-node JOINs (should they even be possible in the existing RDBMS) to collect data from different tables.

Joining Collections for Data Analytics

Typically it is most advantageous to take a denormalized data modeling approach for operational databases: the efficiency of reading or writing an entire record in a single operation outweighs any modest increase in storage requirements. However, there are cases where normalizing data can be beneficial, especially when data from multiple sources needs to be blended for analysis. MongoDB 3.2 adds that capability with the $lookup stage in the MongoDB Aggregation Framework.

The Aggregation Framework is a pipeline for data aggregation modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms them into aggregated results; each stage transforms the documents as they pass through. While not offering as rich a set of join operations as some RDBMSs, $lookup provides a left outer equi-join, which is convenient for a range of analytics use cases. A left outer equi-join matches and embeds documents from the “right” collection in documents from the “left” collection.

As an example, if the left collection contains order documents from a shopping cart application, then the $lookup operator can match the product_id references from those documents and embed the matching product details from the products collection.
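The left-outer-equi-join semantics described above can be sketched in plain JavaScript over in-memory arrays. The collection and field names (orders, products, product_id) are illustrative assumptions; the comment shows the shape of the corresponding MongoDB 3.2 pipeline stage:

```javascript
// Plain-JavaScript sketch of $lookup's left outer equi-join semantics.
// The equivalent MongoDB 3.2 aggregation stage would be:
//   { $lookup: { from: "products", localField: "product_id",
//                foreignField: "_id", as: "product_docs" } }

const orders = [
  { _id: 1, product_id: "p1", qty: 2 },
  { _id: 2, product_id: "p9", qty: 1 }   // no matching product
];
const products = [
  { _id: "p1", name: "Widget", price: 9.99 }
];

// Left outer equi-join: every "left" (orders) document is kept; the
// matching "right" (products) documents are embedded as an array,
// which is empty when there is no match.
function lookup(left, right, localField, foreignField, as) {
  return left.map(doc => ({
    ...doc,
    [as]: right.filter(r => r[foreignField] === doc[localField])
  }));
}

const joined = lookup(orders, products, "product_id", "_id", "product_docs");
console.log(JSON.stringify(joined, null, 2));
```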