Amazon Redshift is a fully managed, cloud-based data warehouse. Because Redshift is built for online analytic processing (OLAP) and business intelligence applications, it excels at executing large-scale analytical queries and typically outperforms traditional, row-based relational databases such as MySQL and PostgreSQL for those workloads.
This guide serves as a reference for version 2 of Stitch’s Amazon Redshift destination.
Details and features
Stitch features
High-level details about Stitch’s implementation of Amazon Redshift, such as supported connection methods, availability on Stitch plans, etc.
Release status | Released |
Stitch plan availability | All Stitch plans |
Stitch supported regions | Operating regions determine the location of the resources Stitch uses to process your data. Learn more. |
Supported versions | Not applicable |
Connect API availability | Supported. This version of the Amazon Redshift destination can be created and managed using Stitch’s Connect API. Learn more. |
SSH connections | Supported. Stitch supports using SSH tunnels to connect to Amazon Redshift destinations. |
SSL connections | Supported. Stitch will attempt to use SSL to connect by default. No additional configuration is needed. |
VPN connections | Unsupported. Virtual Private Network (VPN) connections may be implemented as part of a Premium plan. Contact Stitch Sales for more info. |
Static IP addresses | Supported. This version of the Amazon Redshift destination has static IP addresses that can be whitelisted. |
Default loading behavior | Upsert |
Nested structure support | Unsupported |
Destination details
Details about the destination, including object names, table and column limits, reserved keywords, etc.
Note: Exceeding the limits noted below will result in loading errors or rejected data.
Maximum record size | 20MB |
Table name length | 127 characters |
Column name length | 115 characters |
Maximum columns per table | 1,600 |
Maximum table size | None |
Maximum tables per database | 100,000 |
Case sensitivity | Insensitive |
Reserved keywords | Refer to the Reserved keywords documentation. |
Encodings, SORT, and DIST keys
To improve your query performance, you can apply encodings, SORT, and DIST keys to Stitch-created tables in your Amazon Redshift destination. Your settings will remain intact even when new data is loaded.
Refer to the Encodings, SORT, and DIST Keys guide for application instructions.
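As an illustration of what applying these settings can look like in SQL, the statements below are a minimal sketch. The schema, table, and column names (my_integration.orders, customer_id, created_at, status) are hypothetical; refer to the guide linked above for Stitch-specific instructions.

```sql
-- A minimal sketch, assuming a Stitch-created table named my_integration.orders
-- with customer_id, created_at, and status columns (all hypothetical names).

-- Distribute rows across nodes by customer_id to co-locate related rows.
ALTER TABLE my_integration.orders ALTER DISTSTYLE KEY DISTKEY customer_id;

-- Sort blocks by created_at so range-restricted scans read fewer blocks.
ALTER TABLE my_integration.orders ALTER COMPOUND SORTKEY (created_at);

-- Apply a compression encoding to an individual column.
ALTER TABLE my_integration.orders ALTER COLUMN status ENCODE zstd;
```

Note that changing a table’s distribution key redistributes its rows, which can take time on large tables.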
Amazon Redshift pricing
Currently, Amazon Redshift pricing is based on an hourly rate that varies depending on the type and number of nodes in a cluster. Check out Amazon’s pricing page for an in-depth look at their current plan offerings.
So, what’s a node? A node is a single computer that participates in a cluster. Your Amazon Redshift cluster can have one to many nodes; the more nodes, the more data it can store and the faster it can process queries. Amazon currently offers four different types of nodes, each of which has its own CPU, RAM, storage capacity, and storage drive type.
The type and number of nodes you choose when creating your cluster depend on your needs and dataset. We do, however, recommend a multi-node configuration to provide data redundancy.
For some guidance on choosing the right number of nodes for your cluster, check out Amazon’s Determining the Number of Nodes guide.
Replication
Replication process overview
A Stitch replication job consists of three stages:
Step 1: Data extraction
Stitch requests and extracts data from a data source. Refer to the System overview guide for a more detailed explanation of the Extraction phase.
Step 2: Preparation
During this phase, the extracted data is buffered in Stitch’s durable, highly available internal data pipeline and readied for loading. Refer to the System overview guide for a more detailed explanation of the Preparation phase.
Step 3: Loading
Stitch loads the data into Amazon Redshift.
Loading behavior
By default, Stitch will use Upsert loading when loading data into Amazon Redshift.
If the conditions for Upsert loading aren’t met, data will be loaded using Append-Only loading.
Refer to the Understanding loading behavior guide for more info and examples.
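Stitch manages loading for you, but conceptually an upsert into Amazon Redshift is commonly performed as a staged merge keyed on the table’s Primary Key. The sketch below is only an illustration of that general pattern, not Stitch’s internal implementation; the staging and production table names and the id column are hypothetical.

```sql
-- Illustration of a typical Redshift upsert (staged merge); not Stitch's
-- internal implementation. Table and column names are hypothetical.
BEGIN;

-- Remove rows in the production table that are being replaced.
DELETE FROM prod.orders
USING stage.orders
WHERE prod.orders.id = stage.orders.id;

-- Insert the new and updated versions of those rows.
INSERT INTO prod.orders
SELECT * FROM stage.orders;

COMMIT;
```

With Append-Only loading, by contrast, the DELETE step is skipped: records are only ever inserted, so a table can contain multiple versions of the same row.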
Primary Keys
Stitch requires Primary Keys to de-dupe incrementally replicated data. To ensure Primary Key data is available, Stitch creates a primary_keys table comment. The comment is a JSON object whose primary_keys property is an array of strings containing the names of the table’s Primary Key columns.
For example: A table comment for a table with a single Primary Key:
'{"primary_keys":["id"]}'
And a table comment for a table with a composite Primary Key:
'{"primary_keys":["event_id","created_at"]}'
Note: Removing or incorrectly altering Primary Key table comments can lead to replication issues.
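If you want to verify the comment on a Stitch-created table, you can read it from the PostgreSQL-style system catalogs that Amazon Redshift exposes. A minimal sketch, assuming a hypothetical my_integration.orders table:

```sql
-- Reads the table-level comment (objsubid = 0) for a Stitch-created table.
-- Schema and table names are hypothetical.
SELECT d.description
FROM pg_catalog.pg_description d
JOIN pg_catalog.pg_class c ON c.oid = d.objoid
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE d.objsubid = 0
  AND n.nspname = 'my_integration'
  AND c.relname = 'orders';
-- Expected output for a single Primary Key: {"primary_keys":["id"]}
```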
Incompatible sources
The Amazon Redshift destination has reported incompatibilities with some of Stitch's integrations. Refer to the table below for more info.
Integration | Version | Level | Reason |
MongoDB | ANY | | Flattening nested JSON structures may result in tables with columns that exceed Amazon Redshift’s 1,600 column limit. Learn more. |
MongoDB Atlas | v1 | | Flattening nested JSON structures may result in tables with columns that exceed Amazon Redshift’s 1,600 column limit. Learn more. |
See all destination and integration incompatibilities.
Transformations
System tables and columns
Stitch will create its own system tables in each integration’s schema. Additionally, Stitch will insert system columns (prepended with _sdc) into each table.
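To see which system columns Stitch has added to a given table, you can query Redshift’s column metadata. A minimal sketch, assuming a hypothetical my_integration.orders table:

```sql
-- Lists the Stitch system columns (prefixed with _sdc) on a table.
-- Schema and table names are hypothetical.
SELECT column_name
FROM svv_columns
WHERE table_schema = 'my_integration'
  AND table_name = 'orders'
  AND LEFT(column_name, 4) = '_sdc';
```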
Data typing
Stitch converts data types only where needed to ensure the data is accepted by Amazon Redshift. The table below lists the data types Stitch supports for Amazon Redshift destinations and the destination types they map to.
- Stitch type: The Stitch data type the source type was mapped to. During the Extraction and Preparation phases, Stitch identifies the data type in the source and then maps it to a common Stitch data type.
- Destination type: The destination-compatible data type the Stitch type maps to. This is the data type Stitch will use to store data in Amazon Redshift.
- Notes: Details about the data type and/or its allowed values in the destination, if available. If a range is available, values that exceed the noted range will be rejected by Amazon Redshift.
Stitch type | Destination type | Notes |
BIGINT | BIGINT | |
BOOLEAN | BOOLEAN | |
DATE | TIMESTAMP | |
DOUBLE | DOUBLE | |
FLOAT | DOUBLE | |
INTEGER | INTEGER | |
NUMBER | DECIMAL | |
STRING | STRING | |
JSON structures
Amazon Redshift destinations don’t have native support for nested data structures. To ensure nested data can be loaded, Stitch will flatten objects and arrays into columns and subtables, respectively. For more info and examples, refer to the Handling nested data structures guide.
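As a rough illustration of what this flattening can produce (all names and types below are hypothetical; see the Handling nested data structures guide for Stitch’s exact rules):

```sql
-- Illustration only. A source record such as
--   {"id": 1, "shipping": {"city": "Boston"}, "items": [{"sku": "a1"}]}
-- is flattened so that nested object keys become columns on the top-level
-- table and the array is denested into a subtable keyed back to the parent.
CREATE TABLE orders (
    id             BIGINT,
    shipping__city VARCHAR(256)    -- flattened from the nested "shipping" object
);

CREATE TABLE orders__items (
    _sdc_source_key_id BIGINT,     -- reference back to the parent record's id
    sku                VARCHAR(256)
);
```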
Column names
Column names in Amazon Redshift:
- Must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_)
- Must begin with a letter
- Must not exceed the maximum length of 115 characters. Columns that exceed this limit will be rejected by Amazon Redshift.
- Must not be prefixed or suffixed with any of Amazon Redshift’s or Stitch’s reserved keyword prefixes or suffixes
Stitch will perform the following transformations to ensure column names adhere to the rules imposed by Amazon Redshift:
Transformation | Source column | Destination column |
Convert uppercase and mixed case to lowercase | CuStOmErId or CUSTOMERID | customerid |
Remove special characters | customer#id or !CuStOMerID | customerid and customerid |
Remove non-letter leading characters | 4customerid or _customerid | customerid |
Timezones
Amazon Redshift will store the value as TIMESTAMP WITHOUT TIME ZONE. Timezone information is not retained; values are stored and expressed as UTC.
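For example (a sketch with made-up values), a source value with an offset is converted to its UTC equivalent before loading:

```sql
-- Illustration: a source value of 2023-06-01T09:30:00-05:00 would be stored
-- as the UTC wall-clock time below, in a TIMESTAMP column with no offset.
SELECT '2023-06-01 14:30:00'::timestamp AS loaded_value;
```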
Compare destinations
Not sure if Amazon Redshift is the data warehouse for you? Check out the Choosing a Stitch Destination guide to compare each of Stitch’s destination offerings.