Snowflake is a SQL data warehouse built from the ground up for the cloud, designed with a patented new architecture to handle today’s and tomorrow’s data and analytics.
A fully-managed SaaS data warehouse solution, Snowflake runs on Amazon Web Services cloud infrastructure: AWS EC2 virtual compute instances are used for compute needs, while S3 is utilized for persistent data storage.
This guide serves as a reference for version 1 of Stitch’s Snowflake destination.
Details and features
Stitch features
High-level details about Stitch’s implementation of Snowflake, such as supported connection methods, availability on Stitch plans, etc.
| Feature | Details |
| --- | --- |
| Release status | Released |
| Stitch plan availability | All Stitch plans |
| Stitch supported regions | Operating regions determine the location of the resources Stitch uses to process your data. Learn more. |
| Supported versions | Not applicable |
| Connect API availability | Supported. This version of the Snowflake destination can be created and managed using Stitch’s Connect API. Learn more. |
| SSH connections | Unsupported. Stitch does not support using SSH tunnels to connect to Snowflake destinations. |
| SSL connections | Supported. Stitch will attempt to use SSL to connect by default. No additional configuration is needed. |
| VPN connections | Unsupported. Virtual Private Network (VPN) connections may be implemented as part of a Premium plan. Contact Stitch Sales for more info. |
| Static IP addresses | Supported. This version of the Snowflake destination has static IP addresses that can be whitelisted. |
| Default loading behavior | Upsert |
| Nested structure support | Supported |
Destination details
Details about the destination, including object names, table and column limits, reserved keywords, etc.
Note: Exceeding the limits noted below will result in loading errors or rejected data.
| Limit | Value |
| --- | --- |
| Maximum record size | 20MB |
| Table name length | 255 characters |
| Column name length | 251 characters |
| Maximum columns per table | None |
| Maximum table size | None |
| Maximum tables per database | 100,000 |
| Case sensitivity | Insensitive |
| Reserved keywords | Refer to the Reserved keywords documentation. |
Snowflake pricing
Snowflake pricing is based on two factors: the volume of data stored in your Snowflake destination and the amount of compute usage (the time the server runs), measured in seconds.
Snowflake offers two types of plans, each with varying levels of access and features: On Demand plans, which are commitment-free and usage-based, and Capacity plans, which guarantee discounted pricing. Learn more about Snowflake plans and pricing here.
Snowflake warehouse sizes
Snowflake data warehouses come in a range of sizes - X-Small, Large, and 3X-Large, for example - which determine how many servers comprise each cluster in a warehouse.
While the size of a warehouse can impact the time required to execute queries, bigger doesn’t always mean better. Warehouse size is directly tied to the number of credits used, which will directly impact your Snowflake costs. Learn more about Snowflake warehouse sizes here.
To help you select the warehouse size that fits your needs and budget, check out Snowflake’s Warehouse Considerations guide before getting started.
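As a rough sketch, a warehouse's size is set when it is created and can be changed later in Snowflake SQL. The warehouse name `stitch_wh` below is hypothetical; substitute your own:

```sql
-- Create an X-Small warehouse (the smallest size, one server per cluster).
CREATE WAREHOUSE stitch_wh WAREHOUSE_SIZE = 'XSMALL';

-- Resize later if queries need more compute.
-- Remember: credit usage scales with warehouse size.
ALTER WAREHOUSE stitch_wh SET WAREHOUSE_SIZE = 'MEDIUM';
```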
Automated warehouse management
To reduce usage, you can automate the management of your Snowflake warehouse: suspend it when there’s been no activity for a specified period of time, and automatically resume it when activity returns. Note that these settings apply to the entire warehouse and not to individual clusters.
Enabling these settings depends on your workload and availability needs. Learn more about the Auto Suspend and Auto Resume features here.
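As a sketch, both settings can be applied with a single `ALTER WAREHOUSE` statement. The warehouse name `stitch_wh` and the 300-second threshold are placeholders; tune them to your workload:

```sql
ALTER WAREHOUSE stitch_wh SET
  AUTO_SUSPEND = 300   -- suspend after 300 seconds of inactivity
  AUTO_RESUME = TRUE;  -- resume automatically when a query arrives
```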
Note: Stitch will only ever impact your Snowflake usage when loading data.
Replication
Replication process overview
A Stitch replication job consists of three stages:
Step 1: Data extraction
Stitch requests and extracts data from a data source. Refer to the System overview guide for a more detailed explanation of the Extraction phase.
Step 2: Preparation
During this phase, the extracted data is buffered in Stitch’s durable, highly available internal data pipeline and readied for loading. Refer to the System overview guide for a more detailed explanation of the Preparation phase.
Step 3: Loading
Stitch loads the data into Snowflake.
Loading behavior
By default, Stitch will use Upsert loading when loading data into Snowflake.
If the conditions for Upsert loading aren’t met, data will be loaded using Append-Only loading.
Refer to the Understanding loading behavior guide for more info and examples.
Primary Keys
Stitch requires Primary Keys to de-dupe incrementally replicated data. When tables are created in the destination, Stitch will apply Primary Key constraints to columns used as Primary Keys. Primary Key constraints require that column values be unique and not null.
Note: Primary Keys cannot be changed in the destination.
Incompatible sources
No compatibility issues have been discovered between Snowflake and Stitch's integration offerings.
See all destination and integration incompatibilities.
Transformations
System tables and columns
Stitch will create the following tables in each integration’s dataset:
Additionally, Stitch will insert system columns (prepended with `_sdc`) into each table.
Data typing
Stitch converts data types only where needed to ensure the data is accepted by Snowflake. In the table below are the data types Stitch supports for Snowflake destinations, and the Stitch types they map to.
- Stitch type: The Stitch data type the source type was mapped to. During the Extraction and Preparing phases, Stitch identifies the data type in the source and then maps it to a common Stitch data type.
- Destination type: The destination-compatible data type the Stitch type maps to. This is the data type Stitch will use to store data in Snowflake.
- Notes: Details about the data type and/or its allowed values in the destination, if available. If a range is available, values that exceed the noted range will be rejected by Snowflake.
| Stitch type | Destination type | Notes |
| --- | --- | --- |
| BIGINT | NUMBER | |
| BOOLEAN | BOOLEAN | |
| DATE | TIMESTAMP_TZ | |
| DOUBLE | FLOAT | |
| FLOAT | FLOAT | |
| INTEGER | NUMBER | |
| NUMBER | DECIMAL | |
| STRING | VARCHAR | |
JSON structures
When Stitch replicates source data containing objects or arrays, Stitch will load the data intact into a `VARIANT` column. This is a Snowflake data type that can contain semi-structured data like JSON arrays and objects.
You can then use Snowflake’s functions for semi-structured data to parse the data. Refer to Snowflake’s documentation for more info.
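As a sketch, assuming a hypothetical table `orders` whose `VARIANT` column `data` holds JSON like `{"customer": {"id": 1}, "items": [{"sku": "A1"}, {"sku": "B2"}]}`, Snowflake's colon notation and `FLATTEN` can extract the nested values:

```sql
-- Colon notation traverses nested objects; ::NUMBER casts the result.
SELECT data:customer:id::NUMBER AS customer_id
FROM orders;

-- LATERAL FLATTEN expands an array into one row per element.
SELECT f.value:sku::STRING AS sku
FROM orders,
     LATERAL FLATTEN(input => data:items) f;
```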
Column names
Column names in Snowflake:
- Must contain only letters (a-z, A-Z), numbers (0-9), or underscores (`_`)
- Must begin with a letter or underscore
- Must be less than the maximum length of 251 characters. Columns that exceed this limit will be rejected by Snowflake.
- Must not be prefixed or suffixed with any of Snowflake’s or Stitch’s reserved keyword prefixes or suffixes
Stitch will perform the following transformations to ensure column names adhere to the rules imposed by Snowflake:
| Transformation | Source column | Destination column |
| --- | --- | --- |
| Convert lowercase and mixed case to uppercase | `CuStOmErId` or `customerid` | `CUSTOMERID` |
| Replace spaces with underscores | `CUSTOMER ID` | `CUSTOMER_ID` |
| Remove special characters | `CUSTOMER#ID` or `!CUSTOMERID` | `CUSTOMERID` and `CUSTOMERID` |
| Remove non-letter leading characters | `4CUSTOMERID` or `_CUSTOMERID` | `CUSTOMERID` |
Timezones
Snowflake will store the value as `TIMESTAMP_TZ(9)` and express it as UTC.
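If you need timestamps rendered in a local zone at query time, Snowflake's `CONVERT_TIMEZONE` function can do the conversion. A minimal sketch, assuming a hypothetical table `orders` with a `TIMESTAMP_TZ` column `created_at`:

```sql
-- Render a stored UTC timestamp in Eastern Time for reporting.
SELECT created_at,
       CONVERT_TIMEZONE('America/New_York', created_at) AS created_at_et
FROM orders;
```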
Compare destinations
Not sure if Snowflake is the data warehouse for you? Check out the Choosing a Stitch Destination guide to compare each of Stitch’s destination offerings.
Related | Troubleshooting |
Questions? Feedback?
Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.