MongoDB integration summary

Stitch’s MongoDB integration replicates data using the PyMongo 3.8.0 driver.

MongoDB feature snapshot

A high-level look at Stitch's MongoDB (v2) integration, including release status, useful links, and the features supported in Stitch.

STITCH

  • Release status: Deprecated on July 27, 2023
  • Supported by: Stitch
  • Stitch plan: Standard
  • Supported versions: 2.6 through 5.0
  • API availability: Not available
  • Singer GitHub repository: singer-io/tap-mongodb

CONNECTION METHODS

  • SSH connections: Supported
  • SSL connections: Supported

REPLICATION SETTINGS

  • Anchor Scheduling: Supported
  • Advanced Scheduling: Supported
  • Table-level reset: Supported
  • Configurable Replication Methods: Supported

REPLICATION METHODS

  • Log-based Replication: Supported
  • Key-based Replication: Supported
  • Full Table Replication: Supported

DATA SELECTION

  • Table selection: Supported
  • Column selection: Supported
  • View replication: Unsupported
  • Select all: Supported, with prerequisites

TRANSPARENCY

  • Extraction Logs: Supported
  • Loading Reports: Supported
Connecting MongoDB

MongoDB setup requirements

To set up MongoDB in Stitch, you need:

  • Privileges in MongoDB that allow you to create/manage users. This is required to create the Stitch database user.

  • If using Log-based Incremental Replication, the userAdmin or userAdminAnyDatabase role. This is required to configure the database server for OpLog.

  • A MongoDB server that uses Auth mode. Auth mode requires every user who connects to Mongo to have a username and password. These credentials must be validated before the user will be granted access to the database.

  • A MongoDB database using a version between 2.6 and 5.0. While older versions may be connected to Stitch, we may not be able to provide support for issues that arise due to unsupported versions.

    We recommend always keeping your version current as a best practice. If you encounter connection issues or other unexpected behavior, verify that your MongoDB version is one supported by Stitch.

  • If using SSL, your server must require SSL connections. Note: SSL isn’t required to connect a MongoDB database to Stitch.
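
The version requirement above can be checked before you start the setup. The following is a minimal Python sketch, assuming you already have the server's version string (for example, from `db.version()` in the Mongo shell). The function names are illustrative and are not part of Stitch.

```python
from typing import Tuple

# Supported range taken from this guide: 2.6 through 5.0 (inclusive).
MIN_SUPPORTED = (2, 6)
MAX_SUPPORTED = (5, 0)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '4.4.18'."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def is_supported_version(version: str) -> bool:
    """Check whether a MongoDB version falls in the range this guide lists."""
    return MIN_SUPPORTED <= parse_major_minor(version) <= MAX_SUPPORTED
```

Only the major and minor components are compared, so any 5.0.x patch release counts as supported in this sketch.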


Step 1: Configure database connection settings

In this step, you’ll configure the database server to allow traffic from Stitch to access it. There are two ways to connect your database:

  • A direct connection will work if your database is publicly accessible.
  • An SSH tunnel is required if your database isn’t publicly accessible. This method uses a publicly accessible instance, or an SSH server, to act as an intermediary between Stitch and your database. The SSH server will forward traffic from Stitch through an encrypted tunnel to the private database.

Follow the instructions for the option you’re using.

Direct connection

For the connection to be successful, you’ll need to configure your firewall to allow access from our IP addresses.

The IP addresses you’ll whitelist depend on the Data pipeline region your account is in.

  1. Sign into your Stitch account, if you haven’t already.
  2. Click User menu (your icon) > Edit User Settings and locate the Data pipeline region section to verify your account’s region.
  3. Locate the list of IP addresses for your region.
  4. Whitelist the appropriate IP addresses.

SSH tunnel

  1. Follow the steps in the Setting up an SSH Tunnel for a database connection guide to set up an SSH tunnel for MongoDB.
  2. Complete the remaining steps in this guide after the SSH setup is complete.
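
For a direct connection, it can help to confirm that a given source address matches your firewall allowlist. The sketch below uses Python's standard `ipaddress` module. The addresses are placeholders from the 203.0.113.0/24 documentation range, not Stitch's real IP addresses, and `is_allowed` is a hypothetical helper.

```python
import ipaddress

# Placeholder entries only: 203.0.113.0/24 is reserved for documentation.
# Replace them with the Stitch IP addresses listed for your region.
ALLOWED_SOURCES = ["203.0.113.10/32", "203.0.113.20/32"]


def is_allowed(source_ip, allowed=ALLOWED_SOURCES):
    """Return True if source_ip matches any allowlisted address or network."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(entry) for entry in allowed)
```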

Step 2: Create a Stitch database user

Step 2.1: Connect to your database

  1. Connect to your MongoDB server.
  2. Navigate to the authentication database. In this example, we’re using admin:

    mongo "mongodb://<username>@<database-host>:<port>/?authSource=admin"
    

    Replace <username>, <database-host>, and <port> with your MongoDB username, database host address, and the port used by the database, respectively.

    Note: If you’re connecting an Atlas-based instance, the authentication database will always be admin.
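
If you build the connection string in a script, the username should be percent-encoded so that characters such as `@` or `:` don't break the URI. This is a small Python sketch of the string shown above; `build_mongo_uri` is an illustrative name, and the default port and authentication database are taken from this guide's example.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, host, port=27017, auth_source="admin"):
    """Build a connection string shaped like the one in Step 2.1."""
    # Percent-encode values so reserved characters stay unambiguous.
    return "mongodb://{}@{}:{}/?authSource={}".format(
        quote_plus(username), host, port, quote_plus(auth_source)
    )
```

For example, `build_mongo_uri("stitch", "db.example.com")` returns `mongodb://stitch@db.example.com:27017/?authSource=admin`.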

Step 2.2: Create the Stitch user

Next, you’ll create the Stitch user, set a password, and assign roles. This guide uses the built-in readAnyDatabase role, but you can use or create another role as long as it assigns the same privileges.

Use the command below that matches the version of your MongoDB database.

Create the user, using the addUser command for MongoDB versions 2.4 through 2.6. Replace <password> with a password:

use admin
db.addUser(
  {
    user: "stitch",
    pwd: "<password>",
    roles: ["readAnyDatabase"]
  }
)

Create the user, using the createUser command for MongoDB versions 3.0 through 3.2. Replace <password> with a password:

use admin
db.createUser(
  {
    user: "stitch",
    pwd: "<password>",
    roles: ["readAnyDatabase"]
  }
)

For versions 3.4 and above, the readAnyDatabase role doesn’t include the local database. Create the user, granting the additional read role on the local database:

use admin
db.createUser(
  {
    user: "stitch",
    pwd: "<password>",
    roles: ["readAnyDatabase", {role: "read", db: "local"} ]
  }
)

Stitch requires the following database user privileges to connect to and replicate data from a MongoDB database:

  • readAnyDatabase: Required to read data from databases in the cluster.
  • read: Required to read from the local database. Note: You only need to explicitly grant this role if you’re using MongoDB version 3.4 or greater.
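
The version-dependent choices in this step can be summarized in code. The Python sketch below returns which shell helper to call and the user document to pass to it, following the version ranges in this guide. `stitch_user_command` is a hypothetical helper; treating 3.3 like 3.0 through 3.2 is an assumption, since the guide doesn't mention that release.

```python
def stitch_user_command(version, password):
    """Return (shell_helper, user_document) for a given MongoDB version."""
    major, minor = (int(part) for part in version.split(".")[:2])
    roles = ["readAnyDatabase"]
    if (major, minor) >= (3, 4):
        # From 3.4 on, readAnyDatabase doesn't include the local database.
        roles.append({"role": "read", "db": "local"})
    # This guide uses addUser for 2.4 through 2.6 and createUser from 3.0.
    helper = "createUser" if (major, minor) >= (3, 0) else "addUser"
    return helper, {"user": "stitch", "pwd": password, "roles": roles}
```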

Step 3: Configure Log-based Incremental Replication

While Log-based Incremental Replication is the most accurate and efficient method of replication, using this replication method may, at times, require manual intervention or impact the source database’s performance. Refer to the Log-based Incremental Replication documentation for more info.

You can also use one of Stitch’s other Replication Methods, which don’t require any database configuration. Replication Methods can be changed at any time.

Step 3.1: Create a replica set

In this step, you’ll edit the /etc/mongod.conf file to add a replica set. A replica set is a group of mongod processes that maintain the same dataset.

  1. Start the MongoDB instance:

    mongod --port 27017
    
  2. Connect to the Mongo shell as a root user:

    mongo --port 27017 -u <root_username> -p <password> --authenticationDatabase admin
    
  3. Navigate to the /etc/mongod.conf file.

  4. In /etc/mongod.conf, uncomment replication and define the following configuration options. Note: As /etc/mongod.conf is a protected file, you may need to assume sudo to edit it.

    replication:
       replSetName: "rs0"
       oplogSizeMB: <integer>
    
    • replSetName: The name for the replica set. In this example, we used rs0. Use the rs.status() command to return this replica set’s name going forward.
    • oplogSizeMB: The maximum size, in megabytes, for the oplog. If undefined, MongoDB will use the default size - refer to MongoDB’s docs for more info.

      When the oplog reaches this size, MongoDB will automatically remove log entries to maintain the maximum oplog size. If Stitch is unable to replicate all of a table’s log entries before they age out, Stitch will re-replicate the table in full to ensure records aren’t missing. Refer to the Log-based Incremental guide for more info and examples.

      Note: If you’re using an existing replica set and want to change its maximum size, use the replSetResizeOplog command.

  5. Save the changes.
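
When choosing a value for oplogSizeMB, a rough estimate of how long entries will stay in the oplog can be useful. The following Python sketch divides the oplog size by an average write rate that you measure yourself. The function names and the `safety_factor` margin are assumptions for illustration, and real retention depends on your workload.

```python
def estimated_oplog_window_hours(oplog_size_mb, write_rate_mb_per_hour):
    """Approximate hours of history the oplog holds at a steady write rate."""
    if write_rate_mb_per_hour <= 0:
        raise ValueError("write_rate_mb_per_hour must be positive")
    return oplog_size_mb / write_rate_mb_per_hour


def covers_replication_interval(window_hours, interval_minutes, safety_factor=2.0):
    """Check that the window exceeds the replication interval with a margin."""
    return window_hours * 60 >= interval_minutes * safety_factor
```

For example, a 1024 MB oplog written at about 256 MB per hour holds roughly 4 hours of entries, which comfortably covers a 60-minute replication frequency.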

Step 3.2: Initiate the replica set

Next, you’ll restart the instance and initiate the replica set.

  1. Restart mongod with the configuration file:

    sudo mongod --auth --config /etc/mongod.conf
    
  2. Connect to the Mongo shell as a root user, replacing <root_username> and <password> with the root user’s username and password:

    mongo --port 27017 -u <root_username> -p <password> --authenticationDatabase admin
    
  3. Initiate the replica set, replacing <host_address> with the IP address or endpoint used by the mongod instance:

    rs.initiate({_id: "rs0", members: [{_id: 0, host: "<host_address>:27017"}]})
    

    If successful, you’ll receive a response similar to the following:

    { "ok" : 1 }
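
The document passed to rs.initiate() above has a simple shape that can be generated if you script your setup. This Python sketch builds the same structure; `replica_set_config` is a hypothetical helper, and the host value in the usage note is a placeholder.

```python
def replica_set_config(name, hosts):
    """Build a replica set configuration document for the given hosts."""
    # Each member gets a sequential _id, starting at 0 as in the example above.
    members = [{"_id": index, "host": host} for index, host in enumerate(hosts)]
    return {"_id": name, "members": members}
```

Calling `replica_set_config("rs0", ["10.0.0.5:27017"])` yields the single-member document used in this step, with the placeholder address in place of `<host_address>`.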
    

Step 3.3: Verify OpLog setup and access

Lastly, you’ll verify that the Stitch user can read from the OpLog.

  1. Disconnect from the Mongo shell.

  2. Reconnect as the Stitch database user you created in Step 2. Replace <stitch_username> and <password> with the Stitch user’s username and password, respectively:

    mongo --port 27017 -u <stitch_username> -p <password> --authenticationDatabase admin
    
  3. Switch to the local database:

    use local
    
  4. View oplog rows:

    db.oplog.rs.find()
    

    If successful, records from the oplog similar to the following will be returned:

    { "ts" : Timestamp(1524038245, 63), "t" : NumberLong(1), "h" : NumberLong("-596019791399272412"), "v" : 2, "op" : "i", "ns" : "stitchTest.customers", "ui" : UUID("0e623d9c-722c-41d5-a5e6-83947cc2466e"), "wall" : ISODate("2018-04-18T07:57:25.065Z"), "o" : { "_id" : 100, "name" : "Finn" } }
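
To make the oplog record above easier to read, the sketch below interprets a few of its fields in Python. It assumes the entry has already been loaded as a plain dictionary. `describe_oplog_entry` is an illustrative helper, and it is not how Stitch itself processes the oplog.

```python
# Operation codes MongoDB writes to the "op" field of an oplog entry.
OP_NAMES = {"i": "insert", "u": "update", "d": "delete", "c": "command", "n": "no-op"}


def describe_oplog_entry(entry):
    """Summarize an oplog entry: operation, namespace parts, and document _id."""
    # "ns" is "<database>.<collection>"; collection names may contain dots.
    database, _, collection = entry["ns"].partition(".")
    # Updates carry the target _id in "o2"; inserts and deletes carry it in "o".
    id_source = entry.get("o2") or entry.get("o") or {}
    return {
        "operation": OP_NAMES.get(entry["op"], "unknown"),
        "database": database,
        "collection": collection,
        "document_id": id_source.get("_id"),
    }
```

Applied to the sample record, this reports an insert into the customers collection of the stitchTest database for the document with `_id` 100.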
    

Step 4: Connect Stitch

In this step, you’ll complete the setup by entering the database’s connection details and defining replication settings in Stitch.

Step 4.1: Define the database connection details

  1. If you aren’t signed into your Stitch account, sign in now.
  2. On the Stitch Dashboard page, click the Add Integration button.

  3. Locate and click the MongoDB icon.
  4. Fill in the fields as follows:

    • Integration Name: Enter a name for the integration. This is the name that will display on the Stitch Dashboard for the integration; it’ll also be used to create the schema in your destination.

      For example, the name “Stitch MongoDB” would create a schema called stitch_mongodb in the destination. Note: The schema name cannot be changed after the integration is saved.

    • Host (Endpoint): Enter the host address (endpoint) used by the MongoDB instance. This could be a network address such as 192.68.0.1, or a server endpoint like dbname.hosting-provider.com.

    • Port: Enter the port used by the MongoDB instance. The default is 27017.

    • Username: Enter the Stitch MongoDB database user’s username.

    • Password: Enter the password for the Stitch MongoDB database user.

    • Authentication Database: Enter the name of the Stitch user’s authentication database. This is the database where the Stitch user was initially created. Stitch will find all the databases you gave the Stitch user access to; this database is needed only to complete the connection.

      Note: If you’re connecting an Atlas-based MongoDB instance, this must be the admin database. See Step 2: Create a Stitch database user for more info on this requirement.

    • Replica Set: Optional. The name of the replica set you created in Step 3.1 to be used for Log-based Incremental Replication. If needed, you can return the replica set name using the rs.status() command. The replica set name will be returned under key set:

      "set" : "Name of your replica set",
      ...
      
    • Include MongoDB database names in destination tables: Checking this setting will include schema names from the source database in the destination table name - for example: <source_schema_name>__<collection_name>.

      Stitch loads all selected replicated tables to a single schema, preserving only the collection name. If two collections canonicalize to the same name - even if they’re in different source databases or schemas - name collision errors can arise. Checking this setting can prevent these issues.

      Note: This setting cannot be changed after the integration is saved. Additionally, this setting may create table names that exceed your destination’s limits. For more info, refer to the Database Integration Table Name Collisions guide.
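
The naming behavior described for the Integration Name and Include MongoDB database names in destination tables settings can be illustrated with a short Python sketch. The canonicalization rule here (lowercase, with non-alphanumeric runs replaced by underscores) is an assumption based on the single "Stitch MongoDB" example above, and all function names are hypothetical.

```python
import re
from collections import defaultdict


def canonicalize(name):
    """Lowercase a name and replace non-alphanumeric runs with '_' (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def destination_table_name(database, collection, include_database):
    """Return the table name, optionally prefixed as <database>__<collection>."""
    table = canonicalize(collection)
    if include_database:
        return "{}__{}".format(canonicalize(database), table)
    return table


def find_collisions(collections, include_database):
    """Return destination names that more than one source collection maps to."""
    by_name = defaultdict(list)
    for database, collection in collections:
        name = destination_table_name(database, collection, include_database)
        by_name[name].append((database, collection))
    return {name: sources for name, sources in by_name.items() if len(sources) > 1}
```

With the setting off, a Customers collection in one database and a customers collection in another both map to `customers`. With it on, the names stay distinct because each is prefixed with its database name.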

Step 4.2: Define the SSH connection details

If you’re using an SSH tunnel to connect your MongoDB database to Stitch, you’ll also need to define the SSH settings. Refer to the Setting up an SSH Tunnel for a database connection guide for assistance with completing these fields.

  1. Click the SSH Tunnel checkbox.

  2. Fill in the fields as follows:

    • SSH Host: Enter the public IP address or hostname of the server Stitch will SSH into.

    • SSH Port: Enter the SSH port on your server (22 by default).

    • SSH User: Enter the Stitch Linux (SSH) user’s username.

Step 4.3: Define the SSL connection details

Click the Connect using SSL checkbox if you’re using an SSL connection. Note: The database must support and allow SSL connections for this setting to work correctly.

Step 4.4: Define Log-based Replication setting

In the Log-based Replication section, you can set this as the integration’s default Replication Method.

When enabled, tables that are set to replicate will use Log-based Incremental Replication by default. If you don’t want a table to use Log-based Incremental Replication, you can change it in the Table Settings page for that table.

If this setting isn’t enabled, you’ll have to select a Replication Method for each table you set to replicate.
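
The way the integration default interacts with per-table settings can be expressed as a small decision function. This Python sketch is only an illustration of the rules described above; the function name and the string labels are assumptions, not Stitch identifiers.

```python
from typing import Optional


def effective_replication_method(table_setting: Optional[str],
                                 log_based_default: bool) -> Optional[str]:
    """Resolve the Replication Method a table would use.

    Returns None when no method applies yet, meaning one must be selected.
    """
    if table_setting is not None:
        # A method chosen in the table's settings overrides the default.
        return table_setting
    return "Log-based Incremental" if log_based_default else None
```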

Step 4.5: Create a replication schedule

In the Replication Frequency section, you’ll create the integration’s replication schedule. An integration’s replication schedule determines how often Stitch runs a replication job, and the time that job begins.

MongoDB integrations support the following replication scheduling methods:

  • Anchor Scheduling
  • Advanced Scheduling

To keep your row usage low, consider setting the integration to replicate less frequently. See the Understanding and Reducing Your Row Usage guide for tips on reducing your usage.

Step 4.6: Save the integration

When finished, click Check and Save.

Stitch will perform a connection test to the MongoDB database; if successful, a Success! message will display at the top of the screen. Note: This test may take a few minutes to complete.

Step 5: Select data to replicate

The last step is to select the collections and fields you want to replicate.

Note: If a replication job is currently in progress, new selections won’t be used until the next job starts.

For MongoDB integrations, you can select:

  1. Individual collections and fields

  2. All collections and fields

Follow the instructions for the selection method you’re using.

Individual collections and fields

  1. In the Integration Details page, click the Collections to Replicate tab.
  2. Locate a collection you want to replicate.
  3. Click the checkbox next to the collection’s name. A blue checkmark means the collection is set to replicate.
  4. On the page that displays, click the Collection Settings button.
  5. In the Collection Settings page:
    1. Define the collection’s Replication Method, or skip this step if you want to use the integration’s default method.
    2. If using Key-based Incremental Replication, select a Replication Key.
    3. Optional: Select or exclude fields by entering a projection query in the Fields to Replicate section. Refer to the Selecting MongoDB Fields Using Projection Query guide for instructions and examples.
    4. When finished, click Update Settings.
  6. Repeat this process for every collection you want to replicate.
  7. Click the Finalize Your Selections button at the bottom of the page to save your data selections.

All collections and fields

  1. Click into the integration from the Stitch Dashboard page.
  2. Click the Collections to Replicate tab.
  3. Navigate to the collection level, selecting any databases and/or schemas that contain collections you want to replicate.
  4. In the list of collections, click the box next to the Collection Names column.
  5. In the menu that displays, click Track All Collections (Except Views).
  6. Click the Finalize Your Selections button at the bottom of the page to save your data selections.
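
The projection query used to select or exclude fields follows MongoDB's projection rules. As a rough illustration, this Python sketch applies a projection to a single document, handling top-level fields only. It mimics general MongoDB behavior rather than Stitch's own validation, and `apply_projection` is a hypothetical helper.

```python
def apply_projection(document, projection):
    """Apply a top-level MongoDB-style projection to one document."""
    id_flag = projection.get("_id")
    include_id = True if id_flag is None else bool(id_flag)
    flags = {f: bool(v) for f, v in projection.items() if f != "_id"}
    if len(set(flags.values())) > 1:
        raise ValueError("cannot mix inclusion and exclusion (except for _id)")
    if flags:
        inclusion = all(flags.values())
    else:
        # Only _id (or nothing) was given: {"_id": 1} counts as an inclusion.
        inclusion = bool(id_flag) if id_flag is not None else False
    if inclusion:
        # Inclusion mode: keep only the listed fields.
        result = {k: v for k, v in document.items() if k in flags}
    else:
        # Exclusion mode: keep everything except the listed fields.
        result = {k: v for k, v in document.items()
                  if k not in flags and k != "_id"}
    if include_id and "_id" in document:
        result["_id"] = document["_id"]
    return result
```

A projection of `{"name": 1, "email": 1}` keeps those two fields plus `_id`, while `{"password": 0}` keeps every field except `password`.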

Initial and historical replication jobs

After you finish setting up MongoDB, its Sync Status may show as Pending either on the Stitch Dashboard or in the Integration Details page.

For a new integration, a Pending status indicates that Stitch is in the process of scheduling the initial replication job for the integration. This may take some time to complete.

Free historical data loads

The first seven days of replication, beginning when data is first replicated, are free. Rows replicated from the new integration during this time won’t count towards your quota. Stitch offers this as a way of testing new integrations, measuring usage, and ensuring historical data volumes don’t quickly consume your quota.

MongoDB replication

MongoDB Replication Keys

Unlike Replication Keys for other database integrations, those for MongoDB have special considerations due to MongoDB functionality. For example: MongoDB allows multiple data types in a single field, which can cause records to be skipped during replication.

Refer to the MongoDB Replication Keys guide before you define the Replication Keys for your collections, as incorrectly defining Replication Keys can cause data discrepancies.
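
Before choosing a Replication Key, you may want to check whether the candidate field holds more than one data type. The Python sketch below counts the types seen in a sample of documents. It uses Python types as a stand-in for BSON types, and the function names are illustrative.

```python
from collections import Counter


def replication_key_types(documents, key):
    """Count the type names seen for `key` across a sample of documents."""
    return Counter(type(doc[key]).__name__ for doc in documents if key in doc)


def has_mixed_types(documents, key):
    """Return True if `key` holds more than one type in the sample."""
    return len(replication_key_types(documents, key)) > 1
```

A field that stores integers in some documents and strings in others would be flagged here, which is the kind of field this section warns can lead to skipped records.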

Heavily nested data and destination column limits

MongoDB documents can contain heavily nested data, meaning an attribute can contain many other attributes.

If your destination doesn’t natively support nested data structures, Stitch will de-nest them to load them into the destination. Depending on how deeply nested the data is and the per table column limit of the destination, Stitch may encounter issues when loading heavily nested data.

Refer to the Nested Data Structures guide for more info and examples.
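
To get a feel for how nesting multiplies columns, the following Python sketch flattens nested objects into column names and compares the count with a destination limit. The `__` separator and both function names are assumptions for illustration, and arrays are left untouched in this sketch.

```python
def flatten(document, separator="__", prefix=""):
    """Flatten nested dictionaries into a single level of column names."""
    columns = {}
    for key, value in document.items():
        name = prefix + key
        if isinstance(value, dict):
            # Recurse, carrying the parent path as a prefix.
            columns.update(flatten(value, separator, name + separator))
        else:
            columns[name] = value
    return columns


def exceeds_column_limit(document, column_limit):
    """Return True if the flattened document needs more columns than allowed."""
    return len(flatten(document)) > column_limit
```

A document with an `address` object that contains a `geo` object would produce columns such as `address__city` and `address__geo__lat`, so each level of nesting adds to the column count.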


