Stitch’s Import API allows you to push arbitrary data from a source to your Stitch account.
In this guide, we’ll cover what you need to know about structuring and typing the data you send to the Import API:
Endpoints in this guide
The Import API has two endpoints that accept and persist data to your Stitch destination:
| | Batch | Push |
|---|---|---|
| Resource URL | /v2/import/batch | /v2/import/push |
| Accepts multiple records | Supported. Requests may contain multiple records, as long as all records have the same structure and are destined for the same table. | Supported. Requests may contain multiple records, whether destined for the same table or different tables. |
| Accepts multiple tables | Unsupported. Requests may only contain data for a single table at a time. Data for multiple tables must be sent via a request for each table. | Supported. Requests may contain data for multiple tables. |
| Enforces data types | Supported. Enforced using JSON schema. Records that violate the schema will result in a 400 response and an error. | Unsupported. Data types are assigned according to the value’s JSON data type. No strict enforcement occurs. Refer to the Push endpoint data typing section for more info. |
| Requires Primary Keys | Optional. Defined using the key_names property. Note: Tables without Primary Keys will be loaded using Append-Only loading. | Required. Defined using the key_names property. |
Note: We recommend using the Batch endpoint for sending data to the Import API. The Push endpoint is mentioned only as a comparison to the Batch endpoint and as a reference for existing Import API scripts.
Structuring guidelines
General guidelines
When developing your Import API script, you should keep these general guidelines in mind:
- Field names shouldn’t include reserved words. This includes the keywords reserved by Stitch and by your destination. For example: Fields shouldn’t contain _sdc, a Stitch system prefix.
- Fields should contain one data type per field. This affects not only how data is typed in your destination, but the resulting structure of destination tables. Refer to the Data typing section for more info. A sketch of a pre-push check for both guidelines follows this list.
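For example, a lightweight pre-push check can catch both issues before any data reaches Stitch. This is a minimal sketch, not part of the Import API: the check_records function and the sample records are hypothetical, and the reserved-word check only covers the _sdc prefix, not your destination’s full keyword list.
# Minimal sketch of a pre-push check for the guidelines above.
# check_records and the sample records are illustrative only.
def check_records(records):
    """Flag fields that use the reserved _sdc prefix or mix data types."""
    field_types = {}
    problems = []
    for record in records:
        for field, value in record.items():
            # Guideline 1: field names shouldn't use the _sdc prefix reserved by Stitch.
            if field.startswith("_sdc"):
                problems.append(f"Field '{field}' uses the reserved _sdc prefix")
            # Guideline 2: each field should only ever contain one data type.
            field_types.setdefault(field, set()).add(type(value).__name__)
    for field, types in field_types.items():
        if len(types) > 1:
            problems.append(f"Field '{field}' contains multiple data types: {sorted(types)}")
    return problems

print(check_records([
    {"id": 1, "cost": 3.14},
    {"id": 2, "cost": "10"},            # cost switches from float to str
    {"_sdc_batched_at": "2019-08-13"},  # reserved prefix
]))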
Guidelines for request bodies
We recommend using the Batch endpoint to send data to the Import API. As such, this section only contains the request body requirements for the Batch endpoint.
Request bodies sent to the Batch endpoint must be valid JSON and adhere to the following:
- Must contain data only for a single table.
- Must contain all required arguments.
- Cannot exceed 20MB in size.
- Each data object in the request body must contain between 1 and 20,000 records.
- Each data object in the request body cannot exceed 10,000 individual data points. A sketch of a pre-flight check for these limits follows this list.
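As a rough illustration, the sketch below assembles a Batch endpoint request body and checks it against these limits before sending it. The base URL, the Bearer authorization header, and the send_batch helper name are assumptions for the example; confirm the exact request format against the Batch endpoint reference.
# Sketch: pre-flight checks for the Batch endpoint request body limits above.
# The URL, auth scheme, and helper name are assumptions, not confirmed specifics.
import json
import requests

MAX_BYTES = 20 * 1024 * 1024  # request bodies cannot exceed 20MB
MAX_RECORDS = 20000           # each data object: between 1 and 20,000 records

def send_batch(body, token):
    record_count = len(body["messages"])  # one record per message in this sketch
    if not 1 <= record_count <= MAX_RECORDS:
        raise ValueError(f"{record_count} records; must be between 1 and {MAX_RECORDS}")

    payload = json.dumps(body)
    if len(payload.encode("utf-8")) > MAX_BYTES:
        raise ValueError("Request body exceeds 20MB; split it into smaller batches")

    response = requests.post(
        "https://api.stitchdata.com/v2/import/batch",  # assumed base URL
        headers={
            "Authorization": f"Bearer {token}",        # assumed auth scheme
            "Content-Type": "application/json",
        },
        data=payload,
    )
    response.raise_for_status()
    return response.json()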
Defining tables and Primary Keys
Tables
Tables are dynamically generated based on the table_name specified in Import API requests. All tables pushed using the same API access token will be created in the same schema in your destination. You can find the name of the schema for your Import API integration by logging into Stitch.
Generally, we recommend creating one table for each type of record you want to push to the Import API. For example: If you have customer and product data, you should create two tables - one for customers and one for products.
Every record pushed to a table should have the same structure. For example: If a customers table contains customer_id, name, and email fields, every customer record pushed into this table should contain those fields.
Note: The Import API doesn’t support methods for specifically creating or deleting a table. If you need to delete a table, you should drop it in your destination and prevent any new data for the table from being pushed to the Import API. Any data accepted by Stitch will still be processed, even if the destination table has been dropped.
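Because the Batch endpoint only accepts data for one table per request, pushing the customers and products data described above means sending one request per table. The sketch below illustrates the pattern; send_batch is the hypothetical helper from the earlier sketch, and the schemas are abbreviated.
# Illustrative only: one Batch endpoint request per table.
# send_batch is the hypothetical helper sketched earlier; schemas are abbreviated.
customers_body = {
    "table_name": "customers",
    "key_names": ["customer_id"],
    "schema": {"properties": {"customer_id": {"type": "integer"},
                              "name": {"type": "string"},
                              "email": {"type": "string"}}},
    "messages": [{"action": "upsert", "sequence": 1565880017,
                  "data": {"customer_id": 1, "name": "Finn", "email": "finn@example.com"}}],
}

products_body = {
    "table_name": "products",
    "key_names": ["product_id"],
    "schema": {"properties": {"product_id": {"type": "integer"},
                              "name": {"type": "string"}}},
    "messages": [{"action": "upsert", "sequence": 1565880017,
                  "data": {"product_id": 10, "name": "Enchiridion"}}],
}

# Two tables, two requests -- data for both cannot be combined into one call.
for body in (customers_body, products_body):
    send_batch(body, token="<IMPORT_API_TOKEN>")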
Primary Keys
While Primary Keys are optional when using the Batch endpoint, they will determine how Stitch loads data for the table:
- For tables with Primary Keys, Stitch will use Primary Key columns to de-dupe data during the Loading phase of the replication process. This ensures that only the most recent version of a record is loaded into the destination.
- If a table doesn’t have a Primary Key, or if the destination only supports Append-Only loading, records will be appended to the end of the table as new rows. Existing rows will not be updated. Refer to the Understanding loading behavior guide for more info and examples.
A table’s Primary Keys are defined using the key_names property in the Batch endpoint. For example:
{
"key_names":[
"id"
],
"table_name":"customers",
"schema":{
"properties":{
"id":{
"type":"integer"
},
"name":{
"type":"string"
},
"age":{
"type":"integer"
},
"has_magic":{
"type":"boolean",
},
"modified_at":{
"type":"string",
"format":"date-time"
}
}
},
"messages":[
{
"action":"upsert",
"sequence":1565880017,
"data":{
"id":1,
"name":"Finn",
"age":15,
"has_magic":false,
"modified_at":"2018-04-30T17:00:00Z"
}
}
]
}
If you choose to define Primary Keys, keep the following in mind:
- Every record in a table must have a Primary Key.
- Primary Key columns should only contain a single data type.
- Primary Keys cannot be null.
- Primary Key values must be unique. For composite keys, the combination of values must be unique across all records in the table.
  For example: Let’s assume that event_id, app_id, and created_at are the Primary Keys for the table containing these records:
  Example of unique composite Primary Keys
  [
     {
        "event_id":1,
        "app_id":1,
        "created_at":"2019-08-20T00:00:00+00:00"
     },
     {
        "event_id":2,
        "app_id":1,
        "created_at":"2019-08-20T00:00:00+00:00"
     }
  ]
  While app_id and created_at have identical values in both records, the event_id makes each record unique.
- Every column in the key_names property must be present in both the request’s Schema object and in every record for the table. For example:
  Example of defined composite Primary Keys
  {
     "key_names":[
        "id",
        "created_at"
     ]
  }
  In this case, the Schema object must contain id and created_at properties. Every record must also contain these properties, or the Import API will return the following error (a sketch of a pre-push check for this requirement follows this list):
  Missing key property error
  {
     "error":"Record is missing key property <KEY_NAME>"
  }
Data typing
How data is typed depends on what endpoint you’re using to push data to the Import API:
Note: We recommend using the Batch endpoint for sending data to the Import API. The Push endpoint is mentioned only as a comparison to the Batch endpoint and as a reference for existing Import API scripts.
Batch (/v2/import/batch) endpoint data typing
When using the Batch endpoint, Stitch will assign data types based on the JSON schema in the Schema object in the request.
For example: This is the schema for a table named customers:
{
"schema":{
"properties":{
"id":{
"type":"integer"
},
"name":{
"type":"string"
},
"age":{
"type":"number"
},
"has_magic":{
"type":"boolean"
},
"modified_at":{
"type":"string",
"format":"date-time"
}
}
}
}
A record sent to the Import API for the customers table could look like this:
{
"action":"upsert",
"sequence":1565880017,
"data":{
"id":1,
"name":"Finn",
"age":15,
"has_magic":false,
"modified_at":"2018-04-30T17:00:00Z"
}
}
This data point would create a table similar to the following, depending on the data types used by your destination:
id (integer) | name (string) | age (integer) | has_magic (boolean) | modified_at (timestamp) |
---|---|---|---|---|
1 | Finn | 15 | false | 2018-04-30T17:00:00Z |
Records sent to the Import API must adhere to the JSON schema for the table that contains them, or the API will return a 400 response and an error similar to the following:
{
"error": "Record 0 did not conform to schema: #/<FIELD_NAME>: expected: <DATA_TYPE>, found: <DATA_TYPE>"
}
Refer to the Errors section of the Batch endpoint documentation for a list of errors and their causes.
JSON schemas in the Batch endpoint
The schema specified in a request’s Schema object must be a valid JSON schema. The Batch endpoint uses jsonschema 2.6.0, a JSON Schema implementation for Python, to validate JSON schemas.
Walking through creating a JSON schema is outside the scope of this guide, but the official Understanding JSON Schema reference is a good resource for getting started. When you’re ready, you can use jsonschema.net to test and validate your own schemas.
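For example, you can validate records locally with the same library before sending them. This sketch assumes the jsonschema package is installed and reuses the customers schema from above; the failing record is contrived to show the kind of mismatch the Batch endpoint rejects.
# Sketch: validate a record against the customers schema locally before pushing.
# Requires the jsonschema package (the same implementation the Batch endpoint uses).
from jsonschema import ValidationError, validate

schema = {
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "age": {"type": "number"},
        "has_magic": {"type": "boolean"},
        "modified_at": {"type": "string", "format": "date-time"},
    }
}

record = {"id": 1, "name": "Finn", "age": "15"}  # age sent as a string by mistake

try:
    validate(instance=record, schema=schema)
except ValidationError as error:
    # Roughly the mismatch the Batch endpoint would reject with a 400 response.
    print(f"Record did not conform to schema: {error.message}")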
Push (/v2/import/push) endpoint data typing
When using the Push endpoint, Stitch will type the data based on the value’s JSON data type. The Import API doesn’t infer data types on its own.
As JSON doesn’t explicitly enforce data types, all data typing needs to be handled within your data source and Import API script.
For example:
{
"id":1,
"cost":3.14,
"tax":"1.00"
"modified_at":"2019-08-13T21:25:03+0000"
}
This data point would create a table similar to the following, depending on the data types used by your destination:
id (integer) | cost (double) | tax (string) | modified_at (string) |
---|---|---|---|
1 | 3.14 | 1.00 | 2019-08-13T21:25:03+0000 |
Consider the modified_at field in the example. Even though this field contains an ISO 8601 formatted timestamp, the Import API won’t type this column as a timestamp in the destination. This is because it’s being sent as a JSON string.
Because JSON on its own doesn’t allow for defining data types, consider using the Batch endpoint instead. This endpoint accepts a JSON schema and will enforce the data types it declares for each field.
Changed data type handling in the Push endpoint
The Push endpoint considers data types on a record-by-record basis. If a field’s data type changes from one record to the next, all data types received via the Push endpoint will be used when the data is loaded. In the destination, this will look like a field’s values have been “split” between columns.
For example: Consider the cost values for each of the following records:
[
   {
      "id":1,
      "cost":3.14,  // number
      "tax":"1.00",
      "modified_at":"2019-08-13T21:25:03+0000"
   },
   {
      "id":2,
      "cost":10,  // integer
      "tax":"2.45",
      "modified_at":"2019-08-13T21:34:14+0000"
   },
   {
      "id":3,
      "cost":5.61,  // number
      "tax":".55",
      "modified_at":"2019-08-13T21:35:04+0000"
   }
]
As a result of the cost values changing between records, the destination table would look like this:
id (integer) | cost (double) | cost__it (integer) | tax (string) | modified_at (string) |
---|---|---|---|---|
1 | 3.14 | null | 1.00 | 2019-08-13T21:25:03+0000 |
2 | null | 10 | 2.45 | 2019-08-13T21:34:14+0000 |
3 | 5.61 | null | .55 | 2019-08-13T21:35:04+0000 |
To prevent this from occurring, each field should only ever contain a single data type. You can resolve column splits by:
- Using a view in your destination to coerce the data types
- Fixing the issue in the source, enforcing data typing, dropping the destination table, and re-pushing all historical data to Stitch. The table will be re-created with the correct data types. A sketch of the source-side fix follows this list.
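For the source-side fix, the change usually amounts to coercing the field to a single type in your Import API script before the data is pushed. A minimal sketch, using the cost example above:
# Sketch: coerce cost to one numeric type at the source so the Push endpoint
# always receives the same JSON data type for the field.
records = [
    {"id": 1, "cost": 3.14, "tax": "1.00"},
    {"id": 2, "cost": 10, "tax": "2.45"},   # the integer here caused the column split
    {"id": 3, "cost": 5.61, "tax": ".55"},
]

for record in records:
    record["cost"] = float(record["cost"])  # serialize cost as a JSON number every time

print(records)  # cost is now 3.14, 10.0, 5.61 -- one type, one destination column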
Enforcing data types in the Push endpoint
The only way to enforce data types using the Push endpoint is to use Transit, as JSON on its own doesn’t allow for defining data types. You can use Transit libraries in your Import API script to specify data types for various fields.
Otherwise, we recommend using the Batch endpoint. This endpoint accepts a JSON schema and will enforce the data types it declares for each field.
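As a rough illustration, the sketch below uses the transit-python library to encode a record whose modified_at value is a real datetime rather than a string. The library choice (and a Python 3 compatible fork, if needed), the record contents, and how the encoded payload is then submitted are all assumptions; follow the Push endpoint reference for the exact request format.
# Sketch: Transit-encode a record so modified_at carries a timestamp type
# instead of a plain string. Assumes a transit-python install that supports
# Python 3; the record contents are illustrative only.
from datetime import datetime, timezone
from io import StringIO

from transit.writer import Writer

record = {
    "id": 1,
    "cost": 3.14,
    "modified_at": datetime(2019, 8, 13, 21, 25, 3, tzinfo=timezone.utc),
}

buffer = StringIO()
writer = Writer(buffer, "json")  # Transit's JSON encoding
writer.write(record)

# The Transit payload preserves the timestamp type for modified_at.
print(buffer.getvalue())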