Full Table Replication is a replication method in which all rows in a table - including new, updated, and existing - are replicated during every replication job. In this guide, we’ll cover:
How Full Table Replication works
Tables that use Full Table Replication are replicated in full during each replication job. Regardless of whether a record is new or simply modified, all records in the table will be selected for extraction.
If Full Table Replication were a SQL query, it would look like this:
SELECT column_you_selected_1,
column_you_selected_2,
[...]
FROM schema.table
When Full Table Replication should be used
Full Table Replication may be a good fit if:
- Records are hard deleted from the source.
- The table doesn’t contain a suitable column for Key-based Incremental Replication.
- Log-based Incremental Replication is unavailable for the source.
- For MongoDB-backed database integrations: The
_id
field contains only one data type. Refer to the Limitations section for more info.
Limitations of Full Table Replication
Before you select Full Table Replication as the Replication Method for a table, you should be aware of the limitations this method can have. Being aware of these limitations can help prevent data discrepancies and ensure your data is replicated in the most efficient manner possible.
The limitations of Full Table Replication are:
Limitation 1: Can cause latency
How large a source table is - that is, how many records the table contains - can affect how quickly Stitch is able to extract data from a source.
In the case of large tables using Full Table Replication, Stitch can only extract data as quickly as it is returned. This means that if a database or SaaS application returns data slowly, especially for a large table, latency in the replication process may increase. This is more probable with tables using Full Table Replication.
Limitation 2: Increased row consumption
Tables using Full Table Replication are replicated fully during every replication job, regardless of whether individual records were updated or not.
The more records a table contains, the more rows that will count towards usage. When paired with a high Replication Frequency, a single table can quickly consume an entire month’s row quota.
For example: A table contains 10,000 records and is using Full Table Replication. The integration’s Replication Frequency is every 30 minutes. The table below shows the number of rows replicated for the table per job as well as the total number used since the first job:
Job name | Start time | Rows replicated this job | Total rows replicated |
Job 1 | 00:00 | 10,000 | 10,000 |
Job 2 | 00:30 | 10,000 | 20,000 |
Job 3 | 01:00 | 10,000 | 30,000 |
Job 4 | 01:30 | 10,000 | 40,000 |
Job 5 | 02:00 | 10,000 | 50,000 |
If the integration were to continue replicating every 30 minutes until 11:59:59, this table would use 480,000 rows in 24 hours. Depending on the Stitch plan you’re using, this type of usage can quickly use up your row allotment.
Limitation 3: Unavailable for some integrations
Currently, Full Table Replication is unavailable for version v11-01-2016 of Stitch’s MongoDB integration. This version of MongoDB only supports Key-based Incremental Replication.
Full Table Replication is supported for all other versions of Stitch’s database and SaaS integrations.
Limitation 4: Multiple data types in the _id field can cause discrepancies (MongoDB)
Full Table Replication works a little differently for MongoDB-backed database integrations. For MongoDB, Stitch uses the _id
field in MongoDB collections as a pseudo-Replication Key.
During the replication job, Stitch first determines the current maximum value of the _id
column. Next, Stitch queries for documents with _id
values that are less than or equal to the current maximum _id
value.
For example: A query for Full Table Replication for a MongoDB collection would look like this:
SELECT field_you_selected_1,
field_you_selected_2,
[...]
FROM schema.collection
WHERE _id <= [max _id value]
The MongoDB integration functions this way to ensure that replication for the table can resume if a replication job is interrupted or doesn’t finish before the extraction job time limit.
Because MongoDB ranks BSON data types, this affects how the maximum _id
value is determined. As a result, if multiple data types are present in the _id
column, discrepancies may occur. Refer to the Missing Mongo data due to multiple data types guide for more info and examples.
Related | Troubleshooting |
Questions? Feedback?
Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.