Understanding Table Structural Changes

From time to time, Stitch will encounter data that can’t be loaded losslessly into the destination table in your destination. When this happens, Stitch may have to alter the structure of the table in order to successfully load the data.

Reasons for table structural changes

Stitch may need to perform table alterations for several reasons, including:

VARCHAR data of varying widths,
Multiple data types in a source table column, and
Column additions or removals in the source table

In this guide are examples of how Stitch will behave in each of these scenarios for each currently supported destination type.

Examples in this guide

The examples in this guide will use an example table named customers to demonstrate Stitch’s behavior for each scenario.

Excluding the _sdc columns, Stitch determines this is the structure of the customers table:

Column name	Data type
id	BIGINT
name	STRING
age	BIGINT
has_magic	BOOLEAN

VARCHAR column widening

To preserve your destination’s performance and reduce disk usage, Stitch uses the smallest possible VARCHAR column when storing string data.

For example: If the maximum width of a string column across all records is currently 127, Stitch will type the destination column as VARCHAR(128).

As string data can vary in width, Stitch will take different actions to accommodate the data, depending on the destination in use. See below for an example.

VARCHAR column widening example: First replication job

During the first replication job, Stitch extracts the following records for the customers table:

id (BIGINT)	name (STRING)	age (BIGINT)	has_magic (BOOLEAN)
1	Finn	15	false
2	Jake	7	true

VARCHAR column widening example: Second replication job

During the next replication job, Stitch extracts the records in the table below. In this example, the name column contains data that exceeds its previously known width:

id (BIGINT)	name (STRING)	age (BIGINT)	has_magic (BOOLEAN)
3	PrincessBubblegumWhoIsTheCurrentIncarnationOfTheCandyElementalAndRulesOverTheCandyKingdomWasBornFromTheMotherGumAfterTheMushroomWarAndHasARivalWhoIsAVampireNamedMarcellineTheVampireQueen	16	true
4	BMO	11	false

VARCHAR column widening example: New table structure

How Stitch loads the data depends on the type destination being used. Click the tabs below to see how accommodating this data works for each destination.

When Stitch detects string data that exceeds its previous maximum width in Azure Synapse Analytics destinations, Stitch will automatically widen the column to accommodate the data.

In this example, the name column would have originally been NVARCHAR(128). Stitch will widen the column to NVARCHAR(256).

The structure of the customers table would be altered to the following:

Column name	Data type
id [pk]	BIGINT
name	NVARCHAR(256)
age	BIGINT
has_magic	BOOLEAN

When Stitch detects string data that exceeds its previous maximum width in BigQuery destinations, Stitch will not widen the column.

All string data is stored as TEXT in BigQuery destinations, which do not require setting a width.

The structure of the customers would remain unchanged:

Column name	Data type
id [pk]	BIGINT
name	TEXT
age	BIGINT
has_magic	BOOLEAN

When Stitch detects string data that exceeds its previous maximum width in Panoply and Redshift destinations, Stitch will automatically widen the column to accommodate the data.

In this example, the name column would have originally been VARCHAR(128). Stitch will widen the column to VARCHAR(256), which requires dropping and re-creating the column.

Note: Dropping and re-creating columns will affect views dependent on the column.

The structure of the customers table would be altered to the following:

Column name	Data type
id [pk]	BIGINT
name	VARCHAR(256)
age	BIGINT
has_magic	BOOLEAN

When Stitch detects string data that exceeds its previous maximum width in PostgreSQL destinations, Stitch will not widen the column.

All string data is stored as TEXT in PostgreSQL destinations, which do not require setting a width.

The structure of the customers would remain unchanged:

Column name	Data type
id [pk]	BIGINT
name	TEXT
age	BIGINT
has_magic	BOOLEAN

When Stitch detects string data that exceeds its previous maximum width in Snowflake destinations, Stitch will automatically widen the column to accommodate the data.

In this example, the name column would have originally been VARCHAR(128). Stitch will widen the column to VARCHAR(256).

The structure of the customers table would be altered to the following:

Column name	Data type
id [pk]	BIGINT
name	VARCHAR(256)
age	BIGINT
has_magic	BOOLEAN

Columns with mixed data types

Stitch requires that there only be one data type per column to properly type, load, and store data. If a column contains multiple data types, Stitch will create additional columns and append the data type to the column name. See below for an example.

Mixed data types example: First replication job

During the first replication job, the following rows are extracted for the customers table:

id (BIGINT)	name (STRING)	age (BIGINT)	has_magic (BOOLEAN)
1	Finn	15	false
2	Jake	7	true

Mixed data types example: Second replication job

During the next replication job, the following rows are extracted:

id (BIGINT)	name (STRING)	age (DOUBLE)	has_magic (STRING)
3	Bubblegum	16.0	yes
4	BMO	11	false

Stitch will detect that the data types in these newly replicated rows differ than the ones from the initial replication job. In this case:

age was originally a BIGINT, but can sometimes be a decimal
has_magic was originally a BOOLEAN, but can sometimes be a string

Mixed data types example: New table structure

To accommodate the data, Stitch will create a new column for the newly detected data type and store the data for that data type in the new column.

How columns are named as a result of “splitting” mixed data types depends on the type of destination being used. Click the tabs below to see how accommodating this data works for each destination.

When mixed data types are detected in the same column in Azure Synapse Analytics destinations:

Original columns: The column name remains as-is. The column’s data type will be the first data type Stitch detected for the column. For example: age will only store BIGINT data. If a different data type is detected for age, it will be stored in an additional column and this column will be null.
Additional columns: The column’s data type is appended to the column name. Only data of this type will be stored in this column. For example: age__fl will only store float/double data. If a different data type is detected for age__fl, it will be stored in the correctly typed column and this column will be null.

The structure of the customers table would be altered to the following:

Column name	Data type
id [pk]	BIGINT
name	TEXT
age	BIGINT
age__fl	FLOAT
has_magic	BOOLEAN
has_magic__st	TEXT

When mixed data types are detected in the same column in BigQuery destinations:

Original columns: The column name remains as-is. The column’s data type will be the first data type Stitch detected for the column. For example: age will only store BIGINT data. If a different data type is detected for age, it will be stored in an additional column and this column will be null.
Additional columns: The column’s data type is appended to the column name. Only data of this type will be stored in this column. For example: age__nu will only store numeric data. If a different data type is detected for age__nu, it will be stored in the correctly typed column and this column will be null.

The structure of the customers table would be altered to the following:

Column name	Data type
id [pk]	BIGINT
name	TEXT
age	BIGINT
age__nu	NUMERIC
has_magic	BOOLEAN
has_magic__st	TEXT

When mixed data types are detected in the same column in Panoply and Redshift destinations:

Original columns: The column is re-named to include the original (first detected) data type and only data of the original type will be stored in this column. For example: age__bi will only store BIGINT data. If a different data type is detected for age__bi, it will be stored in an additional column and this column will be null.
Additional columns: The column’s data type is appended to the column name. Only data of this type will be stored in this column. For example: age__do will only store double data. If a different data type is detected for age__do, it will be stored in the correctly typed column and this column will be null.

The structure of the customers table would be altered to the following:

Column name	Data type
id [pk]	BIGINT
name	TEXT
age__bi	BIGINT
age__do	DOUBLE
has_magic__bo	BOOLEAN
has_magic__st	VARCHAR

When mixed data types are detected in the same column in PostgreSQL destinations:

Original columns: The column is re-named to include the original (first detected) data type and only data of the original type will be stored in this column. For example: age__bi will only store BIGINT data. If a different data type is detected for age__bi, it will be stored in an additional column and this column will be null.
Additional columns: The column’s data type is appended to the column name. Only data of this type will be stored in this column. For example: age__do will only store double data. If a different data type is detected for age__do, it will be stored in the correctly typed column and this column will be null.

The structure of the customers table would be altered to the following:

Column name	Data type
id [pk]	BIGINT
name	TEXT
age	BIGINT
age__do	DOUBLE
has_magic	BOOLEAN
has_magic__st	TEXT

When mixed data types are detected in the same column in Snowflake destinations:

Original columns: The column name remains as-is. The column’s data type will be the first data type Stitch detected for the column. For example: age will only store BIGINT data. If a different data type is detected for age, it will be stored in an additional column and this column will be null.
Additional columns: The column’s data type is appended to the column name. Only data of this type will be stored in this column. For example: age__fl will only store numeric data. If a different data type is detected for age__fl, it will be stored in the correctly typed column and this column will be null.

The structure of the customers table would be altered to the following:

Column name	Data type
ID [pk]	BIGINT
name	TEXT
age	BIGINT
age__fl	FLOAT
has_magic	BOOLEAN
has_magic__st	TEXT

Adding and removing columns

Note: This section is not applicable to Log-based Incremental Replication. This section only applies when Key-based Incremental or Full Table Replication is being used. Refer to the Log-based Incremental Replication guide for info about schema changes and Log-based Incremental Replication.

Adding columns: When a new column is added to a source table and selected for replication, Stitch will append the column to the end of the destination table.

For Key-based Incremental tables, data for the column will be replicated onward from the saved Replication Key . Default NULLs will be placed in existing rows unless:
1. A historical backfill in the source updated the records’ Replication Key values, or
2. A table-level reset is performed and a full re-replication is queued
Removing columns: When a column is removed in the source or de-selected from replication, Stitch will place default NULLs in the column going forward. Columns will not be removed from the destination.

Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.

Related	Troubleshooting
Syncing Data Syncing New Columns Replication Methods Replication Keys Resetting Replication Keys	Data Discrepancy Guide Missing Objects Missing Columns & NULL Values Non-Replicating Data & Unsupported Data Types Destination & Integration Compatibility