Learn how Stitch will load data from your integrations into Stitch’s Databricks Delta Lake on AWS destination.
In this guide, we’ll cover data loading scenarios involving:
- Object identifiers in the destination, including naming limitations and transformations
- Various data types, including how data is typed and structured in the destination
- Schema changes in the source or structural changes in the destination
Primary Key scenarios
Scenarios involving Primary Key columns.
IF |
A table without a Primary Key is replicated. |
THEN |
|
IF |
A table with a single Primary Key is replicated. |
THEN |
|
IF |
A table with multiple Primary Keys is replicated. |
THEN |
|
IF |
The table’s Primary Key(s) is/are changed. |
THEN |
Changing a table’s Primary Key(s) is only permitted if the table is using Full Table Replication. If Primary Key columns are changed for a table using Key-based or Log-based Incremental Replication, Stitch will stop processing data for the table. |
AND |
The following error will display in the Notifications tab in Stitch: |
FIX IT |
For tables using Key-based or Log-based Incremental Replication:
|
IF |
A Primary Key column in a source contains multiple data types. |
THEN |
To accommodate data of varying types, Stitch will create multiple columns to ensure data is loaded with the correct type. In the destination, this will look like the column has been “split”. To ensure data is loaded correctly, a Primary Key column may only contain a single data type. For example: Stitch initially detected
Column splits will result in |
AND |
The following error will display in the Notifications tab in Stitch: |
FIX IT |
Verify the data type(s) for the Primary Key column in the source. If it contains multiple data types, you’ll need to ensure that the column only contains values of one data type. Note: If the table is using Key-based or Log-based Incremental Replication, you’ll also need to do the following:
|
IF |
You remove the Primary Key column(s) for a table in Databricks Delta. |
THEN |
Changing a table’s Primary Key(s) is not permitted in Databricks Delta. If Primary Key columns are changed, Stitch will stop processing data for the table. |
AND |
The following error will display in the Notifications tab in Stitch: |
FIX IT |
For tables using Key-based or Log-based Incremental Replication:
|
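The requirement that a Primary Key column contain only a single data type can be checked in the source before replication. The following is a minimal sketch, not Stitch's implementation: it assumes source rows are available as Python dictionaries, and the helper names `primary_key_types` and `mixed_type_keys` are hypothetical.

```python
from collections import defaultdict


def primary_key_types(records, key_columns):
    """Return {column: set of Python type names} seen in each Primary Key column."""
    seen = defaultdict(set)
    for record in records:
        for column in key_columns:
            # A missing key is reported as NoneType, which also counts as a type here.
            seen[column].add(type(record.get(column)).__name__)
    return dict(seen)


def mixed_type_keys(records, key_columns):
    """Return the Primary Key columns that hold more than one data type."""
    types = primary_key_types(records, key_columns)
    return sorted(col for col, names in types.items() if len(names) > 1)


# Hypothetical sample rows: "id" holds both integers and a string.
rows = [
    {"id": 1, "name": "a"},
    {"id": "2", "name": "b"},
    {"id": 3, "name": "c"},
]

print(mixed_type_keys(rows, ["id"]))  # ['id']
```

Any column this check reports would need to be cleaned up in the source so that it holds values of one data type.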
Replication Key scenarios
Scenarios involving Replication Keys and how data is loaded as a result.
IF |
A table using Key-based Incremental Replication is replicated where the Replication Key column contains |
THEN |
|
Object naming scenarios
Scenarios involving object identifiers in the destination, including naming limitations and transformations.
IF |
A table name contains more characters than allowed by Databricks Delta. |
THEN |
Databricks Delta will reject all data for the table. |
AND |
The following error will display in the Notifications tab in Stitch:
Rejected records will be logged in the |
FIX IT |
If possible, change the table name in the source so that it does not exceed Databricks Delta’s limit of 78 characters.
Use the |
IF |
A column name contains more characters than allowed by Databricks Delta. |
THEN |
Databricks Delta will reject columns with names that exceed the column character limit. Other columns in the table will persist to Databricks Delta. |
AND |
The following error will display in the Notifications tab in Stitch:
Rejected records will be logged in the |
FIX IT |
If possible, change the column name in the source so that it does not exceed Databricks Delta’s limit of 122 characters.
Use the |
IF |
Two columns are replicated that canonicalize to the same name. |
THEN |
For example: A table containing both
Databricks Delta will reject the records and create a log for the rejected records in the |
AND |
The following error will display in the Notifications tab in Stitch:
Rejected records will be logged in the |
FIX IT |
If possible, rename one of the columns in the source so that both column names will be unique when replicated to Databricks Delta.
Use the |
IF |
A column is replicated that has a mixed-case name. |
THEN |
Databricks Delta will convert letters to lowercase. For example:
|
IF |
A column is replicated that has a name with spaces. |
THEN |
Databricks Delta will convert spaces to underscores. For example:
|
IF |
A column is replicated with a name that contains unsupported special characters. |
THEN |
Databricks Delta will convert special characters to underscores. For example:
|
IF |
A column is replicated with a name that begins with a non-letter. |
THEN |
Databricks Delta will conserve the non-letter characters and prefix the name with an underscore. For example:
|
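The naming rules in this section can be approximated to spot problem columns before they are replicated. The sketch below is an illustration under stated assumptions, not the actual transformation logic: it assumes that every character other than a lowercase ASCII letter, digit, or underscore counts as an unsupported special character, that the underscore prefix is applied after the other substitutions, and that length limits apply to the transformed name. The function names are hypothetical.

```python
import re

# Limits stated in this guide.
MAX_TABLE_NAME_LENGTH = 78
MAX_COLUMN_NAME_LENGTH = 122


def canonicalize(name):
    """Approximate the column name transformations described above."""
    lowered = name.lower()                          # mixed case -> lowercase
    cleaned = re.sub(r"[^a-z0-9_]", "_", lowered)   # spaces and special characters -> underscores
    if cleaned and not cleaned[0].isalpha():
        cleaned = "_" + cleaned                     # keep a leading non-letter, prefix an underscore
    return cleaned


def find_collisions(names):
    """Group source names that canonicalize to the same destination name."""
    groups = {}
    for name in names:
        groups.setdefault(canonicalize(name), []).append(name)
    return {canon: originals for canon, originals in groups.items() if len(originals) > 1}


def too_long(names, limit):
    """Return the source names whose canonical form exceeds the given limit."""
    return [name for name in names if len(canonicalize(name)) > limit]


# Hypothetical source column names.
columns = ["CustomerId", "customerid", "order date", "order!total", "123abc"]

print([canonicalize(c) for c in columns])
# ['customerid', 'customerid', 'order_date', 'order_total', '_123abc']
print(find_collisions(columns))
# {'customerid': ['CustomerId', 'customerid']}
print(too_long(["a" * 123], MAX_COLUMN_NAME_LENGTH) != [])
# True
```

Running a check like this against a source schema shows which columns would collide or exceed a limit, so they can be renamed in the source first.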
Table scenarios
Scenarios involving table creation and modification in the destination.
IF |
A table contains entirely |
THEN |
No table is created in Databricks Delta. At least one column must have a non- |
IF |
A table arrives with more columns than Databricks Delta allows. |
THEN |
Databricks Delta doesn’t have a column limit for tables. Data will continue to load. |
Data typing scenarios
Scenarios involving various data types, including how data is typed and structured in the destination.
IF |
Stitch detects multiple data types for a single column. |
THEN |
To accommodate data of varying types, Stitch will create multiple columns to ensure data is loaded with the correct type. In the destination, this will look like the column has been “split”. For example: Stitch first detected that
Note: If the column is used as a Primary Key for the table, this scenario will result in a loading error. Refer to the Primary Key scenarios section for more info and examples. |
IF |
Data is replicated to Databricks Delta that is nested, containing many top-level properties and potentially nested sub-properties. |
THEN |
Nested data structures (JSON arrays and objects) will be loaded intact into a |
IF |
A |
THEN |
Databricks Delta will store |
IF |
|
THEN |
No widening will occur. |
IF |
A column containing date data with timezone info is replicated to Databricks Delta. |
THEN |
Databricks Delta will store the value as |
IF |
A column contains timestamp data that is outside Databricks Delta’s supported range. |
THEN |
Databricks Delta will reject the records that fall outside the supported range. |
AND |
The following error will display in the Notifications tab in Stitch:
Rejected records will be logged in the |
FIX IT |
To resolve the error, offending values in the source must be changed to be within Databricks Delta’s timestamp range.
Use the |
IF |
A column contains integer data. |
THEN |
Databricks Delta will store integer data as |
IF |
A column contains integer data that is outside Databricks Delta’s supported range. |
THEN |
Databricks Delta will reject the records that fall outside the supported range. |
AND |
The following error will display in the Notifications tab in Stitch:
Rejected records will be logged in the |
FIX IT |
To resolve the error, offending values in the source must be changed to be within Databricks Delta’s limit for integers.
Use the |
IF |
A column contains decimal data. |
THEN |
Databricks Delta will store decimal data as |
IF |
A column contains decimal data that is outside Databricks Delta’s supported range. |
THEN |
Databricks Delta will reject the records that fall outside the supported range. |
AND |
The following error will display in the Notifications tab in Stitch:
Rejected records will be logged in the |
FIX IT |
To resolve the error, offending values in the source must be changed to be within Databricks Delta’s limit for decimals.
Use the |
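To make the column “split” behavior described in this section more concrete, the sketch below simulates it on a handful of rows. It is an illustration only: the `<column>__<type>` naming and the `split_column` helper are hypothetical and do not reflect the suffixes Stitch actually uses in Databricks Delta.

```python
def split_column(records, column):
    """Split a column holding several data types into one column per type.

    Rows are returned unchanged when the column holds a single type.
    The '<column>__<type name>' naming is an assumption for illustration.
    """
    types_seen = []
    for record in records:
        value = record.get(column)
        if value is not None and type(value) not in types_seen:
            types_seen.append(type(value))

    if len(types_seen) <= 1:
        return [dict(record) for record in records]

    result = []
    for record in records:
        row = {key: value for key, value in record.items() if key != column}
        # Every row gets all split columns; only the one matching its type is filled.
        for detected in types_seen:
            row[f"{column}__{detected.__name__}"] = None
        value = record.get(column)
        if value is not None:
            row[f"{column}__{type(value).__name__}"] = value
        result.append(row)
    return result


# Hypothetical rows: the same column arrives first as a boolean, then as a string.
rows = [
    {"id": 1, "order_confirmed": True},
    {"id": 2, "order_confirmed": "yes"},
]

for row in split_column(rows, "order_confirmed"):
    print(row)
# {'id': 1, 'order_confirmed__bool': True, 'order_confirmed__str': None}
# {'id': 2, 'order_confirmed__bool': None, 'order_confirmed__str': 'yes'}
```

Each value lands in the column that matches its type, which is why a split column cannot serve as a Primary Key.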
Schema change scenarios
Scenarios involving schema changes in the source or structural changes in the destination.
IF |
A new column is added to a table already set to replicate. |
THEN |
If the column has at least one non-
Note: If the table is using either Key-based or Log-based Incremental Replication, backfilled values for the column will only be replicated if:
Refer to the Tracking new columns in an already replicating table guide for more info and examples. |
IF |
A new column is added by you to a Stitch-generated table in Databricks Delta. |
THEN |
Columns may be added to tables created by Stitch as long as they are nullable, meaning columns don’t have |
IF |
A column is deleted at the source. |
THEN |
How a deleted column is reflected in Databricks Delta depends on the Replication Method used by the table:
|
IF |
You remove a column from a Stitch-replicated table in your destination. |
THEN |
The result of deleting a column from a Stitch-generated table depends on the type of column being removed:
|
Destination changes
Scenarios involving modifications made to the destination, such as the application of workload/performance management features or user privilege changes.
IF |
Partitioning is applied to Stitch-generated tables in the destination. |
THEN |
Stitch will respect the partitioning application. |
IF |
Clustering is applied to Stitch-generated tables in the destination. |
THEN |
Stitch will respect the clustering application. |