Applicable only to Google BigQuery (v2) and Microsoft Azure Synapse Analytics destinations, the _sdc_primary_keys
table contains a list of the tables in an integration schema and the columns those tables use as Primary Keys. Every integration schema created by Stitch will contain this table.
In this guide, we’ll cover:
Applicable destinations
This guide is applicable to the following destinations:
Usage in replication
Because some destinations don’t have native support for Primary Keys, Stitch uses the _sdc_primary_keys
table to store Primary Key information and de-dupe data during loading incrementally-replicated tables.
De-duplicating data only applies to tables using an Incremental Replication Method and destinations configured to use Upsert loading. This ensures that only the most recent version of a record is loaded into the table.
Tables using Full Table Replication are not de-duped, but loaded in full during each replication job.
Determining Primary Keys
Depending on the data source type, Primary Keys are determined in one of two ways:
-
For database integrations, Primary Keys are defined by you in the source database. These will usually be columns with a Primary Key constrant or some other equivalent, depending on the type of database being used.
Note: For database views you set to replicate in Stitch, the Primary Key will be the field you define for the view during setup.
-
For SaaS integrations, Primary Keys are pre-defined by Stitch. Refer to the schema documentation for your SaaS integration for info on the Primary Keys Stitch uses for specific tables.
Primary Keys table creation
Every schema created by a Stitch integration contains a _sdc_primary_keys
table, which Stitch uses to store the Primary Key data for every table set to replicate.
Note: Stitch will create the _sdc_primary_keys
even if none of the tables in the integration have a Primary Key. Primary Key data will be added to the table when and if a table is replicated that has a defined Primary Key. This means it’s possible to have an empty _sdc_primary_keys
table.
Primary Keys table schema
The _sdc_primary_keys
table contains the following columns:
Column name | Data type | Description |
table_name | STRING |
The name of the table in the integration schema. Example data:emails
|
column_name | STRING |
The name of the column used as the table’s Primary Key. If a table uses multiple columns as a composite Primary Key, there will be a row for each column the table uses. Example data:email_id
|
ordinal_position | INTEGER |
Note: This column is only applicable to Microsoft Azure Synapse Analytics destinations. When Stitch receives data, Primary Keys are provided in an array. For example: The value of this column corresponds to the index of the column in the array Stitch receives. Example data:0
|
Example tables
For every column a table uses as a Primary Key, the _sdc_primary_keys
table will contain a row containing the table’s name, the name of the column, and for Microsoft Azure Synapse Analytics destinations, the column’s position in the Primary Key array Stitch receives.
When Stitch loads data for the emails
table, it will reference these records in _sdc_primary_keys
to de-duplicate the data. This will ensure that only the most recent version of a record exists in the emails
table.
Example table: Google BigQuery (v2)
In Google BigQuery (v2) destinations, the Primary Key data for the emails
table will look like this in _sdc_primary_keys
:
table_name | column_name |
emails | email_id |
emails | updated_at |
emails | customer_id |
Example table: Microsoft Azure Synapse Analytics
In Microsoft Azure Synapse Analytics destinations, the Primary Key data for the emails
table will look like this in _sdc_primary_keys
:
table_name | column_name | ordinal_position |
emails | email_id | 0 |
emails | updated_at | 1 |
emails | customer_id | 2 |
Effects of Primary Key changes
Replication issues can arise if Primary Keys in the source change, or if data in the _sdc_primary_keys
is incorrectly altered or removed.
Along with being unable to load data, Stitch will surface the following error if this occurs:
Primary Keys for table do not match Primary Keys of incoming data
If you receive this error, you should reset the table(s) mentioned in the error. This will queue a full re-replication of the table, which will ensure Primary Keys are correctly captured and used to de-dupe data when loading.
Related | Troubleshooting |
Questions? Feedback?
Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.