When working with Amazon DocumentDB, optimizing performance is key to ensuring that your application runs efficiently at scale. One of the most effective ways to achieve better performance is by using bulk operations, such as bulkWrite(), instead of performing individual operations like insertOne() or updateOne() one at a time.
In this article, we’ll compare individual operations with bulk operations in Amazon DocumentDB and explain why using bulk operations can drastically improve performance.
What is Amazon DocumentDB?
Amazon DocumentDB is a fully managed document database service that is compatible with MongoDB. This means you can use MongoDB’s APIs, including its drivers and commands, to interact with DocumentDB, making it easy to migrate from or integrate with existing MongoDB applications.
In this post, we’ll focus on one key MongoDB command: bulkWrite(), which allows you to perform multiple operations in a single request, reducing network latency and improving throughput.
Individual Operations: Inefficient for Multiple Documents
Let’s start by discussing how to perform operations individually. Suppose we need to insert multiple documents into a collection. If you insert each document one by one, each operation will require a separate network round-trip to the server.
Here’s how you would do that with individual operations using the insertOne() method:
Code Example: Individual Insert Operations
const { MongoClient } = require('mongodb');

async function insertDocumentsIndividually() {
  const client = new MongoClient('your-docdb-connection-string');
  try {
    await client.connect();
    const collection = client.db('testdb').collection('testcoll');
    const docs = [
      { _id: 1, name: 'Alice' },
      { _id: 2, name: 'Bob' },
      { _id: 3, name: 'Carol' }
    ];
    for (const doc of docs) {
      await collection.insertOne(doc); // One network call per document
      console.log(`Inserted ${doc.name}`);
    }
  } finally {
    await client.close();
  }
}

insertDocumentsIndividually();
Explanation:
- The code above performs one insertOne() operation per document.
- Each insertOne() makes a separate network round-trip to the Amazon DocumentDB server.
- For three documents, this means three network calls to the database.
While this approach works, it is inefficient when you need to insert many documents. Every request carries the overhead of a network round-trip, and that overhead adds up quickly and hurts performance as your dataset grows.
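To make that overhead concrete, here is a back-of-the-envelope sketch. The helper functions and the 2 ms round-trip latency are illustrative assumptions, not measured DocumentDB numbers; actual latency depends on your network and instance configuration:

```javascript
// Hypothetical helpers: estimate how many round-trips a workload needs,
// and how long the application spends waiting on the network.
function roundTrips(numDocs, batchSize) {
  return Math.ceil(numDocs / batchSize);
}

function networkWaitMs(numDocs, batchSize, latencyMs) {
  return roundTrips(numDocs, batchSize) * latencyMs;
}

// 10,000 documents at an assumed 2 ms round-trip latency:
console.log(networkWaitMs(10000, 1, 2));    // individual inserts: 20000 ms of waiting
console.log(networkWaitMs(10000, 1000, 2)); // batches of 1,000: 20 ms of waiting
```

Even with a generous latency assumption, the document-at-a-time approach spends orders of magnitude more time waiting on the wire than a batched one.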
Bulk Operations: Efficient, Reduced Latency
To improve performance, you can use the bulkWrite() method. This method allows you to group multiple operations (inserts, updates, and deletes) into a single request. By reducing the number of network round-trips, you can significantly increase throughput and reduce latency.
Here’s how to perform the same insertions using bulkWrite():
Code Example: Bulk Insert Operations
const { MongoClient } = require('mongodb');

async function bulkInsertDocuments() {
  const client = new MongoClient('your-docdb-connection-string');
  try {
    await client.connect();
    const collection = client.db('testdb').collection('testcoll');
    const operations = [
      { insertOne: { document: { _id: 1, name: 'Alice' } } },
      { insertOne: { document: { _id: 2, name: 'Bob' } } },
      { insertOne: { document: { _id: 3, name: 'Carol' } } }
    ];
    const result = await collection.bulkWrite(operations);
    console.log('Bulk write result:', result);
  } finally {
    await client.close();
  }
}

bulkInsertDocuments();
Explanation:
- With bulkWrite(), we group all the insert operations into a single request.
- The result is one network call instead of three, significantly reducing network overhead.
- This approach is much more efficient when inserting multiple documents.
Why Bulk Operations Are Faster
Using bulkWrite() reduces the number of round-trips to the server, which is the primary factor behind the slower performance of individual operations. When you send a single bulkWrite() request with multiple operations, the database processes all of them in one go, and you only wait for the response once.
This becomes even more apparent when scaling up. If you need to insert hundreds or thousands of documents, performing each insert individually would be very inefficient. Bulk operations allow you to achieve high throughput by minimizing network latency.
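At that scale, a common pattern is to cap each bulkWrite() call at a fixed batch size rather than sending one enormous request. Here is a minimal sketch; the chunk helper and the 1,000-document default batch size are assumptions for illustration, and you should tune the batch size for your workload:

```javascript
// Pure helper: split an array into fixed-size chunks.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Insert documents in bulkWrite() batches instead of one insertOne() per document.
async function bulkInsertInBatches(collection, docs, batchSize = 1000) {
  for (const batch of chunk(docs, batchSize)) {
    const operations = batch.map((doc) => ({ insertOne: { document: doc } }));
    await collection.bulkWrite(operations); // one round-trip per batch
  }
}
```

Inserting 10,000 documents this way costs 10 round-trips instead of 10,000.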
Mixed Bulk Operations: Inserts, Updates, and Deletes
One of the strengths of bulkWrite() is that it supports mixed operations. This means you can insert, update, and delete documents in a single request, further optimizing your workflow.
Here’s an example that combines insert, update, and delete operations in one bulkWrite() call:
Code Example: Mixed Bulk Operations
const operations = [
  { insertOne: { document: { _id: 4, name: 'Dave' } } },
  { updateOne: { filter: { _id: 2 }, update: { $set: { name: 'Bobby' } }, upsert: true } },
  { deleteOne: { filter: { _id: 3 } } }
];

await collection.bulkWrite(operations);
Explanation:
- The operations array includes three different actions: an insert (insertOne), an update (updateOne), and a delete (deleteOne).
- These operations are sent to DocumentDB in a single network request, reducing the overhead of multiple round-trips for each operation type.
This flexibility is particularly useful when performing a mix of different operations on your data and further optimizes both performance and code simplicity.
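When mixing operations, it often helps to build the operations array from your application's own change descriptions, and to decide whether one failure should stop the batch. By default, bulkWrite() runs operations in order and stops at the first error; passing { ordered: false } lets independent operations continue. A sketch, assuming the Node.js MongoDB driver (the change-description shape and the toBulkOp helper are hypothetical, invented for this example):

```javascript
// Hypothetical helper: translate a simple change description into the
// operation format that bulkWrite() expects.
function toBulkOp(change) {
  switch (change.type) {
    case 'insert':
      return { insertOne: { document: change.doc } };
    case 'update':
      return { updateOne: { filter: change.filter, update: { $set: change.set }, upsert: true } };
    case 'delete':
      return { deleteOne: { filter: change.filter } };
    default:
      throw new Error(`Unknown change type: ${change.type}`);
  }
}

async function applyChanges(collection, changes) {
  // ordered: false lets independent operations continue past a failure,
  // e.g. a duplicate-key error on one insert.
  const result = await collection.bulkWrite(changes.map(toBulkOp), { ordered: false });
  return { inserted: result.insertedCount, modified: result.modifiedCount };
}
```

Use ordered: false only when the operations are truly independent; if a later operation depends on an earlier one, keep the default ordered behavior.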
Conclusion
When working with Amazon DocumentDB, using bulk operations instead of individual operations offers significant performance improvements. The key benefits include:
- Reduced network overhead: Bulk operations reduce the number of round-trips between your application and the database.
- Improved throughput: By grouping operations, the database can process them more efficiently.
- Flexibility: bulkWrite() allows a combination of different operations (inserts, updates, deletes) in a single request.
For applications dealing with large amounts of data or frequent updates, using bulkWrite() is a must. It helps you save time, improve performance, and ensure your application scales efficiently.