From time to time, Stitch will encounter data that can’t be loaded losslessly into the destination table in your destination. When this happens, Stitch may have to alter the structure of the table in order to successfully load the data.
Reasons for table structural changes
Stitch may need to perform table alterations for several reasons, including:
VARCHAR
data of varying widths,- Multiple data types in a source table column, and
- Column additions or removals in the source table
In this guide are examples of how Stitch will behave in each of these scenarios for each currently supported destination type.
Examples in this guide
The examples in this guide will use an example table named customers
to demonstrate Stitch’s behavior for each scenario.
Excluding the _sdc
columns, Stitch determines this is the structure of the customers
table:
Column name | Data type |
id | BIGINT |
name | STRING |
age | BIGINT |
has_magic | BOOLEAN |
VARCHAR column widening
To preserve your destination’s performance and reduce disk usage, Stitch uses the smallest possible VARCHAR
column when storing string data.
For example: If the maximum width of a string column across all records is currently 127, Stitch will type the destination column as VARCHAR(128)
.
As string data can vary in width, Stitch will take different actions to accommodate the data, depending on the destination in use. See below for an example.
VARCHAR column widening example: First replication job
During the first replication job, Stitch extracts the following records for the customers
table:
id (BIGINT) | name (STRING) | age (BIGINT) | has_magic (BOOLEAN) |
1 | Finn | 15 | false |
2 | Jake | 7 | true |
VARCHAR column widening example: Second replication job
During the next replication job, Stitch extracts the records in the table below. In this example, the name
column contains data that exceeds its previously known width:
id (BIGINT) | name (STRING) | age (BIGINT) | has_magic (BOOLEAN) |
3 | PrincessBubblegumWhoIsTheCurrentIncarnationOfTheCandyElementalAndRulesOverTheCandyKingdomWasBornFromTheMotherGumAfterTheMushroomWarAndHasARivalWhoIsAVampireNamedMarcellineTheVampireQueen | 16 | true |
4 | BMO | 11 | false |
VARCHAR column widening example: New table structure
How Stitch loads the data depends on the type destination being used. Click the tabs below to see how accommodating this data works for each destination.
When Stitch detects string data that exceeds its previous maximum width in Azure Synapse Analytics destinations, Stitch will automatically widen the column to accommodate the data.
In this example, the name
column would have originally been NVARCHAR(128)
. Stitch will widen the column to NVARCHAR(256)
.
The structure of the customers
table would be altered to the following:
Column name | Data type |
id [pk] | BIGINT |
name | NVARCHAR(256) |
age | BIGINT |
has_magic | BOOLEAN |
When Stitch detects string data that exceeds its previous maximum width in BigQuery destinations, Stitch will not widen the column.
All string data is stored as TEXT
in BigQuery destinations, which do not require setting a width.
The structure of the customers
would remain unchanged:
Column name | Data type |
id [pk] | BIGINT |
name | TEXT |
age | BIGINT |
has_magic | BOOLEAN |
When Stitch detects string data that exceeds its previous maximum width in Panoply and Redshift destinations, Stitch will automatically widen the column to accommodate the data.
In this example, the name
column would have originally been VARCHAR(128)
. Stitch will widen the column to VARCHAR(256)
, which requires dropping and re-creating the column.
Note: Dropping and re-creating columns will affect views dependent on the column.
The structure of the customers
table would be altered to the following:
Column name | Data type |
id [pk] | BIGINT |
name | VARCHAR(256) |
age | BIGINT |
has_magic | BOOLEAN |
When Stitch detects string data that exceeds its previous maximum width in PostgreSQL destinations, Stitch will not widen the column.
All string data is stored as TEXT
in PostgreSQL destinations, which do not require setting a width.
The structure of the customers
would remain unchanged:
Column name | Data type |
id [pk] | BIGINT |
name | TEXT |
age | BIGINT |
has_magic | BOOLEAN |
When Stitch detects string data that exceeds its previous maximum width in Snowflake destinations, Stitch will automatically widen the column to accommodate the data.
In this example, the name
column would have originally been VARCHAR(128)
. Stitch will widen the column to VARCHAR(256)
.
The structure of the customers
table would be altered to the following:
Column name | Data type |
id [pk] | BIGINT |
name | VARCHAR(256) |
age | BIGINT |
has_magic | BOOLEAN |
Columns with mixed data types
Stitch requires that there only be one data type per column to properly type, load, and store data. If a column contains multiple data types, Stitch will create additional columns and append the data type to the column name. See below for an example.
Mixed data types example: First replication job
During the first replication job, the following rows are extracted for the customers
table:
id (BIGINT) | name (STRING) | age (BIGINT) | has_magic (BOOLEAN) |
1 | Finn | 15 | false |
2 | Jake | 7 | true |
Mixed data types example: Second replication job
During the next replication job, the following rows are extracted:
id (BIGINT) | name (STRING) | age (DOUBLE) | has_magic (STRING) |
3 | Bubblegum | 16.0 | yes |
4 | BMO | 11 | false |
Stitch will detect that the data types in these newly replicated rows differ than the ones from the initial replication job. In this case:
age
was originally aBIGINT
, but can sometimes be a decimalhas_magic
was originally aBOOLEAN
, but can sometimes be a string
Mixed data types example: New table structure
To accommodate the data, Stitch will create a new column for the newly detected data type and store the data for that data type in the new column.
How columns are named as a result of “splitting” mixed data types depends on the type of destination being used. Click the tabs below to see how accommodating this data works for each destination.
When mixed data types are detected in the same column in Azure Synapse Analytics destinations:
- Original columns: The column name remains as-is. The column’s data type will be the first data type Stitch detected for the column. For example:
age
will only storeBIGINT
data. If a different data type is detected forage
, it will be stored in an additional column and this column will benull
. - Additional columns: The column’s data type is appended to the column name. Only data of this type will be stored in this column. For example:
age__fl
will only store float/double data. If a different data type is detected forage__fl
, it will be stored in the correctly typed column and this column will benull
.
The structure of the customers
table would be altered to the following:
Column name | Data type |
id [pk] | BIGINT |
name | TEXT |
age | BIGINT |
age__fl | FLOAT |
has_magic | BOOLEAN |
has_magic__st | TEXT |
When mixed data types are detected in the same column in BigQuery destinations:
- Original columns: The column name remains as-is. The column’s data type will be the first data type Stitch detected for the column. For example:
age
will only storeBIGINT
data. If a different data type is detected forage
, it will be stored in an additional column and this column will benull
. - Additional columns: The column’s data type is appended to the column name. Only data of this type will be stored in this column. For example:
age__nu
will only store numeric data. If a different data type is detected forage__nu
, it will be stored in the correctly typed column and this column will benull
.
The structure of the customers
table would be altered to the following:
Column name | Data type |
id [pk] | BIGINT |
name | TEXT |
age | BIGINT |
age__nu | NUMERIC |
has_magic | BOOLEAN |
has_magic__st | TEXT |
When mixed data types are detected in the same column in Panoply and Redshift destinations:
- Original columns: The column is re-named to include the original (first detected) data type and only data of the original type will be stored in this column. For example:
age__bi
will only storeBIGINT
data. If a different data type is detected forage__bi
, it will be stored in an additional column and this column will benull
. - Additional columns: The column’s data type is appended to the column name. Only data of this type will be stored in this column. For example:
age__do
will only store double data. If a different data type is detected forage__do
, it will be stored in the correctly typed column and this column will benull
.
The structure of the customers
table would be altered to the following:
Column name | Data type |
id [pk] | BIGINT |
name | TEXT |
age__bi | BIGINT |
age__do | DOUBLE |
has_magic__bo | BOOLEAN |
has_magic__st | VARCHAR |
When mixed data types are detected in the same column in PostgreSQL destinations:
- Original columns: The column is re-named to include the original (first detected) data type and only data of the original type will be stored in this column. For example:
age__bi
will only storeBIGINT
data. If a different data type is detected forage__bi
, it will be stored in an additional column and this column will benull
. - Additional columns: The column’s data type is appended to the column name. Only data of this type will be stored in this column. For example:
age__do
will only store double data. If a different data type is detected forage__do
, it will be stored in the correctly typed column and this column will benull
.
The structure of the customers
table would be altered to the following:
Column name | Data type |
id [pk] | BIGINT |
name | TEXT |
age | BIGINT |
age__do | DOUBLE |
has_magic | BOOLEAN |
has_magic__st | TEXT |
When mixed data types are detected in the same column in Snowflake destinations:
- Original columns: The column name remains as-is. The column’s data type will be the first data type Stitch detected for the column. For example:
age
will only storeBIGINT
data. If a different data type is detected forage
, it will be stored in an additional column and this column will benull
. - Additional columns: The column’s data type is appended to the column name. Only data of this type will be stored in this column. For example:
age__fl
will only store numeric data. If a different data type is detected forage__fl
, it will be stored in the correctly typed column and this column will benull
.
The structure of the customers
table would be altered to the following:
Column name | Data type |
ID [pk] | BIGINT |
name | TEXT |
age | BIGINT |
age__fl | FLOAT |
has_magic | BOOLEAN |
has_magic__st | TEXT |
Adding and removing columns
-
Adding columns: When a new column is added to a source table and selected for replication, Stitch will append the column to the end of the destination table.
For Key-based Incremental tables, data for the column will be replicated onward from the saved Replication Key . Default
NULLs
will be placed in existing rows unless:- A historical backfill in the source updated the records’ Replication Key values, or
- A table-level reset is performed and a full re-replication is queued
-
Removing columns: When a column is removed in the source or de-selected from replication, Stitch will place default
NULLs
in the column going forward. Columns will not be removed from the destination.
Related | Troubleshooting |
Questions? Feedback?
Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.