Amazon DynamoDB (v1) | Stitch Documentation

Amazon DynamoDB extraction is supported by Stitch
This integration is powered by Singer's Amazon DynamoDB tap and certified by Stitch. Check out and contribute to the repo on GitHub.

For support, contact Stitch Support.

Amazon DynamoDB integration summary

Stitch’s Amazon DynamoDB integration replicates data using version 1.9.57 of Boto3, the AWS SDK for Python.

Amazon DynamoDB feature snapshot

A high-level look at Stitch's Amazon DynamoDB (v1) integration, including release status, useful links, and the features supported in Stitch.

STITCH
Release status	Released on January 6, 2020
Supported by	Stitch
Stitch plan	Standard
Supported versions	n/a
API availability	Available
Singer GitHub repository	singer-io/tap-dynamodb

CONNECTION METHODS
SSH connections	Unsupported
SSL connections	Unsupported

REPLICATION SETTINGS
Anchor Scheduling	Supported
Advanced Scheduling	Supported
Table-level reset	Supported
Configurable Replication Methods	Supported

REPLICATION METHODS
Log-based Replication	Supported
Key-based Replication	Unsupported
Full Table Replication	Supported

DATA SELECTION
Table selection	Supported
Column selection	Supported
View replication	Supported
Select all	Unsupported

TRANSPARENCY
Extraction Logs	Supported
Loading Reports	Supported

Connecting Amazon DynamoDB

Amazon DynamoDB setup requirements

To set up Amazon DynamoDB in Stitch, you need:

An Amazon Web Services (AWS) account. Signing up is free; go to https://aws.amazon.com to create an account if you don’t have one already.
Permissions in AWS Identity and Access Management (IAM) that allow you to create policies, create roles, and attach policies to roles. These are required to grant Stitch access to Amazon DynamoDB.
If you’re using Log-based Incremental Replication, streams must be enabled in Amazon DynamoDB for every table you want to replicate using this method, and each stream must use the New Image or New and Old Images option in AWS. Refer to the Replication section for more info.

Step 1: Retrieve your Amazon Web Services account ID

Sign in to your Amazon Web Services (AWS) account.
Click the user menu, located between the bell and Global menus in the top-right corner of the page.
Click My Account.
In the Account Settings section of the page, locate the Account Id field.

Keep this handy; you’ll need it to complete the setup.
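If you already have AWS credentials configured for Boto3, you can also look up the account ID programmatically. The sketch below assumes Boto3 is installed when used against AWS; `get_caller_identity` is a real STS operation whose response includes an `Account` field, while `looks_like_account_id` and `current_account_id` are hypothetical helpers, not part of Stitch or AWS.

```python
import re

# AWS account IDs are 12-digit strings, e.g. "123456789012".
_ACCOUNT_ID_RE = re.compile(r"\d{12}")


def looks_like_account_id(value: str) -> bool:
    """Hypothetical helper: sanity-check an account ID before pasting it into Stitch."""
    return _ACCOUNT_ID_RE.fullmatch(value.strip()) is not None


def current_account_id(sts_client) -> str:
    """Return the account ID for the credentials the STS client was built with.

    `sts_client` is assumed to behave like boto3.client("sts").
    """
    return sts_client.get_caller_identity()["Account"]
```

For example, `current_account_id(boto3.client("sts"))` returns the 12-digit ID of the account that owns the credentials Boto3 resolves, so run it with credentials from the account that contains your DynamoDB tables.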

Step 2: Configure Log-based Incremental Replication

Note: Skip this step if you’re not planning to use Log-based Incremental Replication.

While Log-based Incremental Replication is the most accurate and efficient method of replication, using this replication method may, at times, require manual intervention or impact the source database’s performance. Refer to the Log-based Incremental Replication documentation for more info.

You can also use one of Stitch’s other Replication Methods, which don’t require any database configuration. Replication Methods can be changed at any time.

For every table you want to replicate using Log-based Incremental Replication, you’ll need to enable Amazon DynamoDB Streams. Each stream must use the New Image or New and Old Images option in AWS, or replication will fail.

Refer to Amazon’s documentation for instructions on enabling and configuring streams.
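As an alternative to the console, streams can be enabled with DynamoDB’s UpdateTable operation. This minimal sketch only builds the request arguments, so it runs without AWS access; `enable_stream_kwargs` is a hypothetical helper, and `orders` is a placeholder table name.

```python
# Stream view types Stitch accepts for Log-based Incremental Replication.
ACCEPTED_VIEW_TYPES = {"NEW_IMAGE", "NEW_AND_OLD_IMAGES"}


def enable_stream_kwargs(table_name: str, view_type: str = "NEW_AND_OLD_IMAGES") -> dict:
    """Build keyword arguments for UpdateTable that turn on a table's stream.

    Raises ValueError for view types (such as KEYS_ONLY or OLD_IMAGE) that
    Stitch can't use.
    """
    if view_type not in ACCEPTED_VIEW_TYPES:
        raise ValueError(f"Stitch requires NEW_IMAGE or NEW_AND_OLD_IMAGES, got {view_type!r}")
    return {
        "TableName": table_name,
        "StreamSpecification": {"StreamEnabled": True, "StreamViewType": view_type},
    }
```

With Boto3 installed and credentials configured, you’d pass the result to `boto3.client("dynamodb").update_table(**enable_stream_kwargs("orders"))`. If a table already has a stream, check its view type instead of trying to enable it again.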

Step 3: Add Amazon DynamoDB as a Stitch data source

If you aren’t signed in to your Stitch account, sign in now.
On the Stitch Dashboard page, click the Add Integration button.
Locate and click the Amazon DynamoDB icon.
Fill in the fields as follows:
- Integration Name: Enter a name for the integration. This is the name that will display on the Stitch Dashboard for the integration; it’ll also be used to create the schema in your destination.
  
  For example, the name “Stitch Amazon DynamoDB” would create a schema called stitch_amazon_dynamodb in the destination. Note: The schema name cannot be changed after the integration is saved.
- AWS Account ID: Paste the AWS account ID you retrieved in Step 1.
- AWS Region: Select the AWS region where your Amazon DynamoDB tables reside. For example: US East (N. Virginia)
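This page doesn’t spell out the exact naming rule, so the following is only an approximation that reproduces the “Stitch Amazon DynamoDB” example: lowercase the name and turn each run of spaces or punctuation into a single underscore. `approximate_schema_name` is hypothetical; always confirm the actual schema name in your destination.

```python
import re


def approximate_schema_name(integration_name: str) -> str:
    """Approximate the destination schema name derived from an integration name.

    Assumed rule (not Stitch's published algorithm): lowercase the name,
    collapse each run of characters other than ASCII letters and digits into
    one underscore, and trim underscores from both ends.
    """
    lowered = integration_name.lower()
    return re.sub(r"[^a-z0-9]+", "_", lowered).strip("_")
```

For example, `approximate_schema_name("Stitch Amazon DynamoDB")` returns `"stitch_amazon_dynamodb"`.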

Step 4: Create a replication schedule

Replication schedules affect the time Extraction begins, not the time data is loaded. Refer to the Replication Scheduling documentation for more information.

In the Replication Frequency section, you’ll create the integration’s replication schedule. An integration’s replication schedule determines how often Stitch runs a replication job, and the time that job begins.

Note: If you’re using Log-based Incremental Replication, keep in mind that Amazon DynamoDB retains stream records for only 24 hours. To ensure you don’t lose data, set the integration’s Replication Frequency to an interval shorter than 24 hours. For example: 12 hours.

If Stitch identifies a stream that has aged out, Stitch will automatically reset the table and queue a full re-replication.
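The 24-hour limit can be expressed as a simple check. This sketch assumes you want the interval, plus an optional margin for jobs that start late or run long, to stay strictly under the retention window; `frequency_is_safe` and its `safety_margin` parameter are hypothetical, not Stitch settings.

```python
from datetime import timedelta

# DynamoDB keeps stream records for 24 hours.
STREAM_RETENTION = timedelta(hours=24)


def frequency_is_safe(frequency: timedelta, safety_margin: timedelta = timedelta(0)) -> bool:
    """Return True if the replication interval plus an optional margin stays
    strictly under the 24-hour stream retention window."""
    return frequency + safety_margin < STREAM_RETENTION
```

For example, `frequency_is_safe(timedelta(hours=12), timedelta(hours=6))` returns True, while a 24-hour frequency fails the check even with no margin.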

Amazon DynamoDB integrations support the following replication scheduling methods:

Replication Frequency
Anchor Scheduling
Advanced Scheduling using Cron (Advanced or Premium plans only)

To keep your row usage low, consider setting the integration to replicate less frequently. See the Understanding and Reducing Your Row Usage guide for tips on reducing your usage.

Step 5: Grant access to Amazon DynamoDB using AWS IAM

Note: To complete this step, you must have permissions in AWS Identity and Access Management (IAM) that allow you to create and modify IAM policies and roles.

Next, Stitch will display a Configure Your DynamoDB Integration page. This page contains the info you need to grant Stitch access to your Amazon DynamoDB tables, which is accomplished via an IAM policy and role.

Note: Saving the integration before you’ve completed the steps below will result in connection errors.

Step 5.1: Create an IAM policy
Step 5.2: Create an IAM role for Stitch
Step 5.3: Check and save the connection in Stitch

Step 5.1: Create an IAM policy

An IAM policy is a JSON document, written in AWS’s access policy language, that manages permissions to Amazon DynamoDB resources.

The auto-generated Stitch IAM policy grants the permissions listed below.

Amazon DynamoDB permissions

Permission name	Description
dynamodb:ListTables	Required to list table names associated with the current account.
dynamodb:DescribeStream	Required to return information about a stream, including the current status of the stream, its Amazon Resource Name (ARN), the composition of its shards, and its corresponding Amazon DynamoDB table.
dynamodb:ListStreams	Required to obtain the stream ARNs for the tables associated with the current account.
dynamodb:DescribeTable	Required to obtain information about the current account's tables.
dynamodb:GetRecords	Required to return records from a shard.
dynamodb:Scan	Required to scan tables during Full Table Replication.
dynamodb:GetShardIterator	Required to obtain shard iterators, which are used to read stream records during Log-based Incremental Replication.

To create the IAM policy:

In AWS, navigate to the IAM service by clicking the Services menu and typing IAM.
Click IAM once it displays in the results.
On the IAM home page, click Policies in the menu on the left side of the page.
Click Create Policy.
In the Create Policy page, click the JSON tab.
Select everything currently in the text field and delete it.
In the text field, paste the IAM policy from the Configure Your DynamoDB Integration page in Stitch.
Click Review policy.
On the Review Policy page, give the policy a name. For example: stitch_amazon_dynamodb
Click Create policy.
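For reference, a policy granting the seven actions from the permissions table has roughly the shape built below. This is an illustrative sketch, not the policy Stitch generates, so always paste the policy from the Configure Your DynamoDB Integration page. The `"*"` resource and the `build_policy_document` helper are assumptions made for illustration.

```python
import json

# The seven actions listed in the Amazon DynamoDB permissions table.
STITCH_DYNAMODB_ACTIONS = [
    "dynamodb:ListTables",
    "dynamodb:DescribeStream",
    "dynamodb:ListStreams",
    "dynamodb:DescribeTable",
    "dynamodb:GetRecords",
    "dynamodb:Scan",
    "dynamodb:GetShardIterator",
]


def build_policy_document(resource: str = "*") -> str:
    """Return an IAM policy document, as JSON text, allowing the actions above.

    Resource "*" is an assumption for illustration; Stitch's generated policy
    is authoritative.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": STITCH_DYNAMODB_ACTIONS,
                "Resource": resource,
            }
        ],
    }
    return json.dumps(policy, indent=2)
```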

Step 5.2: Create an IAM role for Stitch

Required permissions

To complete this step, you need the following AWS IAM permissions: iam:CreateRole and iam:AttachRolePolicy. Refer to Amazon’s documentation for more info.

Roles can’t be used for multiple integrations

If you’re creating multiple Amazon DynamoDB integrations, you’ll need to complete this step for each integration you’re connecting.

The Role Name Stitch uses to connect to the Amazon resource is unique to the integration. Attempting to re-use a role for multiple integrations will cause connection errors.

In this step, you’ll create an IAM role for Stitch and attach the IAM policy from the previous step. Using a dedicated role ensures that Stitch’s activity is identifiable in AWS logs and audits.

To create the role, you’ll need the Account ID, External ID, and Role Name values provided on the Stitch Configure Your DynamoDB Integration page.

In AWS, navigate to the IAM Roles page.
Click Create Role.
On the Create Role page:
1. In the Select type of trusted entity section, click the Another AWS account option.
2. In the Account ID field, paste the Account ID from Stitch. Note: This isn’t your AWS account ID from Step 1; this is the Account ID that displays in Stitch on the Configure Your DynamoDB Integration page.
3. In the Options section, check the Require external ID box.
4. In the External ID field that displays, paste the External ID from the Stitch Configure Your DynamoDB Integration page.
5. Click Next: Permissions.
On the Attach permissions page:
1. Search for the policy you created in the previous step.
2. Once located, check the box next to it in the table.
3. Click Next: Tags.
If you want to enter any tags, do so on the Add tags page. Otherwise, click Next: Review.
On the Review page:
1. In the Role name field, paste the Role Name from the Stitch Configure Your DynamoDB Integration page.
  
  Remember: Role names are unique to the Stitch Amazon DynamoDB integration they’re created for. Attempting to use the same role for multiple integrations will cause connection errors.
2. Enter a description in the Role description field. For example: Stitch role for Amazon DynamoDB integration.
3. Click Create role.
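Choosing Another AWS account and Require external ID gives the role a trust policy. The sketch below builds a document of that general shape, allowing the Stitch account to call `sts:AssumeRole` only when it presents the matching external ID. The `build_trust_policy` helper and the placeholder values are hypothetical; use the Account ID and External ID shown in Stitch.

```python
import json


def build_trust_policy(stitch_account_id: str, external_id: str) -> str:
    """Return a cross-account trust policy, as JSON text, that lets the given
    account assume the role only when it supplies the matching external ID."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{stitch_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The external ID check guards against the confused-deputy problem.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

If you prefer scripting to the console, Boto3’s IAM client exposes `create_role(RoleName=..., AssumeRolePolicyDocument=...)` and `attach_role_policy(RoleName=..., PolicyArn=...)`, which correspond to the console steps above.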

Step 5.3: Check and save the connection in Stitch

Note: Saving the integration before you’ve completed the IAM policy and role steps will result in connection errors.

After you’ve created the IAM policy and role, return to Stitch and click Check and Save.

Step 6: Select data to replicate

Is an object missing or not replicating? Verify that the object meets the requirements for selection and replication.

The last step is to select the tables and columns you want to replicate.

Note: If a replication job is currently in progress, new selections won’t be used until the next job starts.

For Amazon DynamoDB integrations, you can select:

Individual tables and columns
Database views

To select individual tables and columns:

In the Integration Details page, click the Tables to Replicate tab.
Locate a table you want to replicate.
Click the checkbox next to the table’s name. A blue checkmark means the table is set to replicate.
On the page that displays, click the Table Settings button.
In the Table Settings page:
1. Define the table’s Replication Method.
2. Optional: Select or exclude fields by entering a projection expression in the Fields to Replicate section. Refer to the Selecting DynamoDB Fields Using Projection Expression guide for instructions and examples.
3. When finished, click Update Settings.
Repeat this process for every table you want to replicate.
Click the Finalize Your Selections button at the bottom of the page to save your data selections.
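To get a feel for what a projection expression selects, the sketch below applies a simple one to a plain Python dict. It’s a local simulation for illustration only, not how Stitch or DynamoDB evaluate expressions: it handles comma-separated attribute names and dotted map paths, but not list indexes such as `tags[0]` or `#name` placeholders. `apply_projection` and the sample item are hypothetical.

```python
from typing import Any, Dict, List

_MISSING = object()  # sentinel meaning "attribute not present"


def _lookup(item: Dict[str, Any], keys: List[str]) -> Any:
    """Follow a dotted path through nested dicts, or return _MISSING."""
    value: Any = item
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return _MISSING
        value = value[key]
    return value


def apply_projection(item: Dict[str, Any], expression: str) -> Dict[str, Any]:
    """Locally mimic a simple projection expression such as "id, address.city".

    Attributes that don't exist are omitted from the result.
    """
    result: Dict[str, Any] = {}
    for raw_path in expression.split(","):
        keys = [part.strip() for part in raw_path.strip().split(".")]
        value = _lookup(item, keys)
        if value is _MISSING:
            continue
        # Rebuild only the requested branch of any nested maps.
        target = result
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = value
    return result
```

Given the item `{"id": "1", "name": "A", "address": {"city": "Paris", "zip": "75001"}}`, `apply_projection(item, "id, address.city")` returns `{"id": "1", "address": {"city": "Paris"}}`.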

Setting a database view to replicate is similar to selecting a table, with a few differences. Refer to the Replicating Database Views guide for detailed instructions.

At a high level, you’ll need to complete the following to select a database view:

Initial and historical replication jobs

After you finish setting up Amazon DynamoDB, its Sync Status may show as Pending on the Stitch Dashboard or the Integration Details page.

For a new integration, a Pending status indicates that Stitch is in the process of scheduling the initial replication job for the integration. This may take some time to complete.

Initial replication jobs with Anchor Scheduling

If using Anchor Scheduling, an initial replication job may not kick off immediately. This depends on the selected Replication Frequency and Anchor Time. Refer to the Anchor Scheduling documentation for more information.
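As a rough model, assume jobs start on a grid of times spaced one Replication Frequency apart and aligned to the Anchor Time. The sketch below finds the first grid time at or after a given moment; `next_anchored_run` is hypothetical, and Stitch’s actual start times may lag behind the grid.

```python
from datetime import datetime, timedelta


def next_anchored_run(anchor: datetime, frequency: timedelta, now: datetime) -> datetime:
    """Return the earliest time >= now of the form anchor + k * frequency.

    k may be negative, so the grid extends before the anchor as well as after it.
    """
    if frequency <= timedelta(0):
        raise ValueError("frequency must be positive")
    elapsed = now - anchor
    # Ceiling division on timedeltas: -(-a // b) rounds up instead of down.
    k = -((-elapsed) // frequency)
    return anchor + k * frequency
```

For example, with an anchor of midnight and a 12-hour frequency, an integration saved at 05:30 would, under this model, start its first job at 12:00.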

Free historical data loads

The first seven days of replication, beginning when data is first replicated, are free. Rows replicated from the new integration during this time won’t count towards your quota. Stitch offers this as a way of testing new integrations, measuring usage, and ensuring historical data volumes don’t quickly consume your quota.

Replication will continue after the seven days are over. If you’re no longer interested in this source, be sure to pause or delete the integration to prevent unwanted usage.
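The seven-day window is straightforward date arithmetic. This sketch assumes the window is half-open, starting at the moment data is first replicated and ending exactly seven days later; how Stitch treats the exact boundary isn’t stated here, and `in_free_period` is a hypothetical helper.

```python
from datetime import datetime, timedelta

FREE_PERIOD = timedelta(days=7)


def in_free_period(first_replicated_at: datetime, replicated_at: datetime) -> bool:
    """True if a row replicated at `replicated_at` falls inside the window
    [first_replicated_at, first_replicated_at + 7 days)."""
    return first_replicated_at <= replicated_at < first_replicated_at + FREE_PERIOD
```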

Amazon DynamoDB replication

Details about Log-based Incremental Replication via Amazon DynamoDB streams
Details about Full Table Replication using scans and eventually consistent reads
Details about expected delays in Amazon DynamoDB replication

Log-based Incremental Replication

Stitch’s Amazon DynamoDB integration uses Amazon DynamoDB Streams to perform Log-based Incremental Replication. To use Log-based Incremental Replication, streams must be enabled on every table in Amazon DynamoDB you want to replicate using this Replication Method.

Refer to Amazon’s documentation for instructions on enabling streams for Amazon DynamoDB tables.

Note: The Manage Stream option must be one of the following, or replication will be unsuccessful:

New Image
New and Old Images
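You can check a table’s stream settings before selecting Log-based Incremental Replication. The sketch below inspects the dict returned by a Boto3 DynamoDB client’s `describe_table` call, whose `Table.StreamSpecification` field reports whether a stream is enabled and its view type (`NEW_IMAGE`, `NEW_AND_OLD_IMAGES`, `KEYS_ONLY`, or `OLD_IMAGE`). `stream_problem` is a hypothetical helper.

```python
from typing import Optional

# View types that Stitch accepts for Log-based Incremental Replication.
ACCEPTED_VIEW_TYPES = {"NEW_IMAGE", "NEW_AND_OLD_IMAGES"}


def stream_problem(describe_table_response: dict) -> Optional[str]:
    """Return None if the table's stream suits Log-based Incremental Replication,
    otherwise a short description of the problem.

    Expects the dict returned by a Boto3 DynamoDB client's describe_table().
    """
    spec = describe_table_response.get("Table", {}).get("StreamSpecification")
    if not spec or not spec.get("StreamEnabled"):
        return "stream is not enabled"
    view_type = spec.get("StreamViewType")
    if view_type not in ACCEPTED_VIEW_TYPES:
        return f"view type {view_type} is not NEW_IMAGE or NEW_AND_OLD_IMAGES"
    return None
```

For example, `stream_problem(boto3.client("dynamodb").describe_table(TableName="orders"))` returns None when the `orders` table (a placeholder name) is ready.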

Note: DynamoDB retains stream records for only 24 hours. To ensure you don’t lose data, set the integration’s Replication Frequency to an interval shorter than 24 hours. For example: 12 hours. If Stitch identifies a stream that has aged out, Stitch will automatically reset the table and queue a full re-replication.

Full Table Replication

To perform Full Table Replication with Stitch’s Amazon DynamoDB integration, Stitch uses scans to return data. A scan returns data by reading every item in a table. Because queries require you to specify a partition key (hash key) value, Stitch uses scans to simplify setup and replication. For more information about scans, refer to Amazon’s documentation.

Additionally, Stitch’s Amazon DynamoDB integration uses only eventually consistent reads from your selected Amazon DynamoDB tables. Note: This means recent writes may not appear in your replicated data right away, because an eventually consistent read may not reflect the results of a recently completed write. Later reads will return the latest records. For more information on Amazon DynamoDB read consistency, refer to Amazon’s documentation.
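The Full Table pattern described above, a paginated scan with eventually consistent reads, looks roughly like this with Boto3’s low-level client. It’s a sketch of the general technique, not the tap’s actual code; `scan_all_items` is hypothetical, and the client is passed in so the loop can be exercised without AWS access.

```python
from typing import Any, Dict, Iterator, Optional


def scan_all_items(client: Any, table_name: str,
                   projection: Optional[str] = None) -> Iterator[Dict[str, Any]]:
    """Yield every item in a table by following Scan pagination.

    `client` is assumed to behave like boto3.client("dynamodb"), so items come
    back in DynamoDB's typed format, e.g. {"id": {"S": "1"}}.
    """
    # ConsistentRead=False requests eventually consistent reads (DynamoDB's default).
    kwargs: Dict[str, Any] = {"TableName": table_name, "ConsistentRead": False}
    if projection:
        kwargs["ProjectionExpression"] = projection
    while True:
        page = client.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = last_key  # resume where the last page ended
```

Each Scan call returns at most 1 MB of data, which is why the loop keeps following `LastEvaluatedKey` until it’s absent.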

Replication delays

Stitch can’t replicate new data from an Amazon DynamoDB stream until the stream shard containing that data is closed in your account. This can delay the replication of new data, as it becomes available only after the shard has been closed. Forcing an extraction in Stitch won’t replicate new data unless the shard is closed.
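To see which shards of a stream are still open, you can inspect a DynamoDB Streams DescribeStream response: a closed shard’s `SequenceNumberRange` includes an `EndingSequenceNumber`, while an open shard’s doesn’t. The sketch below, built around the hypothetical helper `split_shards`, sorts shard IDs accordingly. It handles a single response page only; DescribeStream paginates large shard lists via `LastEvaluatedShardId`.

```python
from typing import Any, Dict, List


def split_shards(describe_stream_response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group shard IDs from a DescribeStream response into closed and open shards."""
    groups: Dict[str, List[str]] = {"closed": [], "open": []}
    for shard in describe_stream_response["StreamDescription"].get("Shards", []):
        seq_range = shard.get("SequenceNumberRange", {})
        # Only closed shards report the last sequence number they contain.
        state = "closed" if "EndingSequenceNumber" in seq_range else "open"
        groups[state].append(shard["ShardId"])
    return groups
```

With Boto3, the response would come from `boto3.client("dynamodbstreams").describe_stream(StreamArn=...)`, using the stream ARN reported for the table.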
