For each integration that you add to Stitch, a schema specific to that integration will be created in your destination. The integration’s schema is where all the data Stitch replicates from the data source will be stored.
In this guide, we’ll cover:
Schema terminology
In Stitch, we use the term schema
to refer to a location in a destination where integration data is loaded. Depending on the destination you’re using, this might mean different things.
For example:
- In destinations that are traditional databases, data is loaded to integration-specific schemas.
- In Google BigQuery destinations, data is loaded to integration-specific datasets.
- In Amazon S3 destinations, data is loaded into integration-specific folders. The integration names and S3 Object Keys determine the exact location in the S3 bucket.
Integration schema names
Creating schema names
When you create an integration, you’re asked to provide a name for the integration. This name is used to create the integration’s schema in your destination.
For example: An Integration Name of Recurly Stitch
would create a schema named recurly_stitch
in the destination.
If you want the schema name to be different than the display name when initially creating an integration:
-
Click the Use a different schema name? link below the Integration Name field:
-
Then, enter a schema name in the Integration Schema Name field that displays:
- Complete the integration setup.
- Save and create the integration.
Changing schema names
After an integration is initially saved and created, its schema name can’t be changed.
Changing an integration’s schema name requires you to create a new integration and re-replicate all historical data.
Re-using schema names
Schema names from deleted integrations can be re-used. However, if a naming collision occurs (two schema names canonicalize to the same name) the destination may reject the data. This is because deleting an integration in Stitch won’t delete that integration’s schema or data from your destination.
Note: Free historical loads are allowed only once for each integration namespace, or schema name. Refer to the Billing FAQ for more info and examples.
Integration schema composition
Integration schemas created by Stitch will contain two types of tables:
Integration tables
The tables that Stitch creates in an integration schema depends on whether the integration supports data selection.
If the integration supports data selection, the integration schema will contain only the tables (and columns, if column selection is supported) that you set to replicate.
Otherwise, all available tables and columns in the integration will be replicated to your destination.
Refer to the Integration table schemas section below more info on how individual integration tables are structured.
Stitch system tables
In addition to the integration tables, Stitch will create additional tables in the integration schema. These tables are prepended with _sdc
.
Every integration schema will contain an _sdc_rejected
table, which serves as the integration’s log for data loading issues. Refer to the Rejected records system table documentation for more info.
If using a Google BigQuery (v2) or Microsoft Azure Synapse Analytics destination, every integration schema will also contain a table named _sdc_primary_keys
. This table contains the Primary Keys for the tables in the integration schema. Refer to the Primary Keys system table documentation for more info.
Integration table schemas
The integration tables in the schema contain the replicated data from tables set to replicate. Note: If you de-select a table from replication, doing so won’t remove that table’s data from your destination.
Data storage
How your data is stored in the schemas, tables, and columns created by Stitch depends on a few things:
- How data is structured in that particular data source (For example: Use of nested data structures),
- Any changes you make to the data source (For example: adding/removing a column),
- Stitch-specific data handling rules (For example: Entirely
NULL
columns), and - How your destination handles data (For example: Columns with mixed data types, nested data structures)
Stitch will encounter dozens of scenarios when replicating and loading your data. Familiarizing yourself with these scenarios and the nuances of your destination will enable you to better understand your data’s structure and efficiently troubleshoot if issues arise.
To learn more about how handles these scenarios, check out the Data Loading guide for your destination.
Stitch system (_sdc) columns
In addition to the columns set to replicate in these tables, there are also a few columns prepended with _sdc
. Stitch uses these columns to replicate your data. Don’t remove these columns, as doing so will cause replication issues in Stitch.
For descriptions of the system columns used by Stitch, refer to the System tables and columns guide.
Related | Troubleshooting |
Questions? Feedback?
Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.