Understanding and Reducing Your Row Usage

In this guide, we’ll cover the basics of Stitch billing, how to check and understand your usage, and some tips for keeping your row count low.

In this guide, we’ll cover:

The basics of Stitch billing
Understanding your row usage
How to check your row usage in Stitch
Some tips for reducing your usage

How does Stitch billing work?

Much like the data part of a cell phone plan, each Stitch plan is allotted a certain number of replicated rows per month. For detailed info on pricing and what’s included in each plan, refer to the pricing page on our website.

What's a replicated row?

Stitch counts the following as a ‘replicated row’:

A new row, or a never-before-replicated row replicated through Stitch,
An updated row, or an existing row that’s been changed,
A sub-row created from de-nesting nested data structures, and
A copy of an existing row. For example: Rows in tables that are replicated fully during each replication job or rows replicated as a result of resetting Replication Keys.

Understand your usage

Source rows ≠ replicated rows
Impacts on usage
Usage examples

Source rows ≠ replicated rows

When viewing the number of replicated rows in Stitch, you may be surprised by the totals. You may ask yourself:

“How did Stitch replicate this many rows? There aren’t that many in my source or destination!”

We understand that this can be confusing. Keep in mind that row usage in Stitch is the total number of replicated rows. This means that the number of rows in the source won’t necessarily be equal to your row usage in Stitch.

Because Stitch counts updated rows, copies of existing rows, and rows created from de-nesting towards your total usage, the total of replicated rows and the total number of rows in your data sources or destination may not be equal.

Take the following example. As you can see in the second replication job, the total replicated rows reported by Stitch are cumulative, or the total rows replicated across all replication jobs.

Example demonstrating how source, Stitch-reported, and destination rows are not always equal.

Impacts on usage

The number of rows Stitch replicates is directly impacted by:

The number of tables set to replicate. The more tables that you select, the potential higher number of rows.
The Replication Methods used by the replicating tables. Tables using Full Table Replication can increase your row usage.
Integrations’ replication schedules. Integrations scheduled to replicate on a frequent basis can lead to increased row usage.
The volume and structure of the data in the selected tables. Some Stitch destinations - like Redshift and PostgreSQL - will break apart nested records and count each sub-record as a row. Refer to the Nested data structures guide for more info and examples.

Usage examples

Next, we’ll look at some examples of how certain factors can affect row usage in Stitch.

When looking at the examples, note the differences between the source rows and row totals reported in Stitch.

In each tab is an example of how certain factors can affect row usage in Stitch.

For these examples, assume that:

The tables are using Full Table Replication. This means the table is replicated in full during every replication job.
That Extraction has completed without issue for every scheduled replication job.

In this example, we’ll look at how different replication frequencies can affect the total number of replicated rows.

Below are the total number of replicated rows for a table with 100 rows using Full Table Replication:

Replication Frequency	30 minutes	1 hour	6 hours	12 hours	24 hours
Daily total	4,800	2,400	400	200	100
Billing period total	144,000	72,000	12,000	6,000	3,000

As you can see, slightly reducing the Replication Frequency can greatly reduce the number of replicated rows overall.

While this example only demonstrates row usage for a single table, think about how row usage will increase when there are multiple tables like this one set to replicate.

Note: This example is applicable only to destinations that don’t natively support nested data, such as Amazon Redshift or PostgreSQL-based destinations.

In this example, we’ll look at how data structured using JSON arrays can affect the total number of replicated rows.

For destinations that don’t natively support storing nested data, Stitch will “de-nest”, or normalize, complex JSON structures into relations. For JSON arrays, data is unpacked into subtables, where each sub-record is counted as a replicated row.

Source record in JSON format

When Stitch extracts data from a source, it's done by putting the data into JSON format. Below is a sample record from a table named people. Note the best_friends array:

{
   "id":1,
   "name":"Finn",
   "type":"human",
   "best_friends":[
      {
         "id":2,
         "name":"Jake",
         "type":"dog"
      },
      {
         "id":3,
         "name":"Bubblegum",
         "type":"princess"
      },
      {
         "id":4,
         "name":"BMO",
         "type":"robot"
      }
   ]
}

When Stitch loads this record into the destination, two tables will be created: people and people__best_friends. This is due to Stitch unpacking the JSON arrays.

Top-level: people

The top-level table will contain a single record:

id	name	type
1	Finn	human

Subtable: people__best_friends

The subtable will contain three records, one for each item in the best_friends array:

_sdc_source_key_id	_sdc_level_0_id	id	name	type
1	0	2	Jake	dog
1	1	3	Bubblegum	princess
1	2	4	BMO	robot

Row usage totals

This example assumes the table is using Full Table Replication. Below are the total number of replicated rows for the people (one record) and people__best_friends (three records) tables:

	30 minutes	1 hour	6 hours	12 hours	24 hours
Daily total	192	96	16	8	4
Billing period total	5,760	2,880	480	240	120

This example demonstrates that while there is only one record in the source, the number of rows replicated and loaded through Stitch will be four due to de-nesting.

While the example totals may not appear to be significant, think about this as it might relate to real data. Tables can contain dozens or hundreds of records, which will exponentially increase overall row usage.

Check your row usage

Row usage for all integrations
Row usage for an integration
Row usage reset date

Row usage for all integrations

On the Stitch Dashboard page, you can view the total number of replicated rows for all of your integrations for the current billing period:

Row usage graph on the Stitch Dashboard page for all integrations

Row usage for an integration

To take a closer look at an individual integration’s usage for the current billing period, click on the integration to open the Integration Details page, and check out the Rows Loaded Over Time section:

Rows loaded over time for an integration

Row usage reset date

The reset date - or the day your row count will reset to zero - can be found in the Your Usage section of your Billing page, accessed by clicking the User menu (your icon) > Billing:

Row usage reset date in the Your Usage section of the Stitch Billing page

Reduce your usage

While you can change your plan at any time to accommodate your data volume needs, below are some tried-and-true tips for reducing your row usage and staying within your plan’s row allotment:

Identify high usage integrations
Reduce Replication Frequencies
Use an incremental Replication Method
De-select unnecessary data
Pause integrations

Identify high usage integrations

For many of Stitch’s integrations, row usage shouldn’t be an issue. We attempt to use Key-based Incremental Replication for SaaS integrations whenever possible.

There are, however, times when high row usage may be unavoidable. For example:

Data contains many nested structures. This is applicable to row usage only when a destination doesn’t natively support nested JSON structures.
Source generates large amounts of data
Integration has a high number of tables using Full Table Replication
Integration doesn’t currently support table selection
Integration uses an for extraction

Click below to display integrations known to be heavy row users, and the potential reasons for their increased usage. If you’re using any of these integrations, you can use the remaining tips in this section to keep your usage down.

Integrations with known high row usage

To find out more about your SaaS integrations' data structure and replication methods, we recommend checking out our extensive SaaS integration docs. Every SaaS integration has detailed info about the tables Stitch will replicate and the methods used to do so.

Integration	Reasons
Any database	High data volume
Club Speed (v1)	High number of tables using Full Table Replication
Codat (v1)	High number of tables using Full Table Replication
Desk (v15-10-2015)	No table selection
Facebook Ads (v1)	Data contains many nested structures High data volume Integration uses an attribution window of 28 days for extraction
Google Ads (v1)	Integration uses an attribution window of 30 days for extraction
Google Ads (AdWords) (v1)	Integration uses an attribution window of 30 days for extraction
Google Analytics (AdWords) (v05-12-2017)	No table selection Integration uses an attribution window of 15 days for extraction
Google ECommerce (v15-10-2015)	No table selection Integration uses an attribution window of 15 days for extraction
HubSpot (v3)	Data contains many nested structures
JIRA (v2)	Data contains many nested structures High data volume High number of tables using Full Table Replication
Listrak (v1)	High number of tables using Full Table Replication
LivePerson (v1)	High number of tables using Full Table Replication
Microsoft Advertising (v2)	Integration uses an attribution window of 30 days for extraction
Mixpanel (v1)	Data contains many nested structures High data volume Integration uses an attribution window of 5 days for extraction
MongoDB (v2)	Data contains many nested structures High data volume
MongoDB (v3)	Data contains many nested structures High data volume
MongoDB Atlas (v2)	Data contains many nested structures High data volume
MongoDB Atlas (v3)	Data contains many nested structures High data volume
NetSuite (v2)	High data volume High number of tables using Full Table Replication
Pepperjam (v1)	Data contains many nested structures High data volume Integration uses an attribution window of 28-30 days for extraction
Shopify (v3)	Data contains many nested structures
Stripe (v3)	Data contains many nested structures Integration uses an attribution window of 600 days for extraction
Xero (v1)	Data contains many nested structures
Zendesk Support (v2)	Data contains many nested structures High data volume

Reduce Replication Frequencies

Generally, the more often an integration is scheduled to replicate, the higher the number of rows Stitch replicates for the inetgration.

If you’re able to get by without the freshest data, consider changing your integrations’ Replication Frequency to something less frequent. For example: Every hour or six hours.

Keep in mind that the Replication Frequency setting applies to the entire integration, not individual tables. This is especially important if there are a lot of tables that use Full Table Replication in the integration.

Use an incremental Replication Method

For integrations that support Replication Method configuration, we recommend using either Key-based or Log-based Incremental Replication whenever possible.

De-select unnecessary data

Note: This is only applicable to the integrations that support table and/or column selection.

To keep your row count down and your destination tidy, you can also de-select any tables or columns you don’t need.

For example: If a column contains nested data, additional sub-rows may be created to accommodate loading the data to certain destination types. This will increase the total row count, as Stitch counts sub-rows towards usage. If this column is no longer needed, you could de-select it and lower your usage.

Pause integrations

If all else fails, you can temporarily pause the integration to keep from going over your row limit.

Note: Pausing an integration will only prevent the extraction of additional records. Loading will continue for records that have been extracted prior to the pause.

For example: If there are records currently in Preparing when an integration is paused, Stitch will continue to load these records, complete the current replication job, and count them towards your usage.

Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.