Stitch’s Import API allows you to push arbitrary data from a source to your Stitch account.
In this guide, we’ll cover what you need to know about structuring and typing the data you send to the Import API:
Endpoints in this guide
The Import API has two endpoints that accept and persist data to your Stitch destination:
| | Batch | Push |
|---|---|---|
| Resource URL | /v2/import/batch | /v2/import/push |
| Accepts multiple records | Supported. Requests may contain multiple records, as long as all records have the same structure and are destined for the same table. | Supported. Requests may contain multiple records, whether destined for the same table or different tables. |
| Accepts multiple tables | Unsupported. Requests may only contain data for a single table at a time. Data for multiple tables must be sent via a request for each table. | Supported. Requests may contain data for multiple tables. |
| Enforces data types | Supported. Enforced using JSON schema. Records that violate the schema will result in a 400 response and an error. | Unsupported. Data types are assigned according to the value’s JSON data type. No strict enforcement occurs. Refer to the Push endpoint data typing section for more info. |
| Requires Primary Keys | Optional. Defined using the key_names property. Note: Tables without Primary Keys will be loaded using Append-Only loading. | Required. Defined using the key_names property. |
Note: We recommend using the Batch endpoint for sending data to the Import API. The Push endpoint is mentioned only as a comparison to the Batch endpoint and as a reference for existing Import API scripts.
Structuring guidelines
General guidelines
When developing your Import API script, you should keep these general guidelines in mind:
- Field names shouldn’t include reserved words. This includes the keywords reserved by Stitch and by your destination. For example: Fields shouldn’t contain _sdc, a Stitch system prefix.
- Fields should contain one data type per field. This affects not only how data is typed in your destination, but the resulting structure of destination tables. Refer to the Data typing section for more info. A sketch of a pre-push check for both guidelines follows this list.
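For example, a lightweight pre-push check can catch both issues before any data reaches Stitch. This is a minimal sketch, not part of the Import API: the check_records function and the sample records are hypothetical, and the reserved-word check only covers the _sdc prefix, not your destination’s full keyword list.
# Minimal sketch of a pre-push check for the guidelines above.
# check_records and the sample records are illustrative only.
def check_records(records):
    """Flag fields that use the reserved _sdc prefix or mix data types."""
    field_types = {}
    problems = []
    for record in records:
        for field, value in record.items():
            # Guideline 1: field names shouldn't use the _sdc prefix reserved by Stitch.
            if field.startswith("_sdc"):
                problems.append(f"Field '{field}' uses the reserved _sdc prefix")
            # Guideline 2: each field should only ever contain one data type.
            field_types.setdefault(field, set()).add(type(value).__name__)
    for field, types in field_types.items():
        if len(types) > 1:
            problems.append(f"Field '{field}' contains multiple data types: {sorted(types)}")
    return problems

print(check_records([
    {"id": 1, "cost": 3.14},
    {"id": 2, "cost": "10"},            # cost switches from float to str
    {"_sdc_batched_at": "2019-08-13"},  # reserved prefix
]))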
Guidelines for request bodies
We recommend using the Batch endpoint to send data to the Import API. As such, this section only contains the request body requirements for the Batch endpoint.
Request bodies sent to the Batch endpoint must be valid JSON and adhere to the following:
- Must contain data only for a single table.
- Must contain all required arguments.
- Cannot exceed 20MB in size.
- Each data object in the request body must contain between 1 and 20,000 records.
- Each data object in the request body cannot exceed 10,000 individual data points. A sketch of a pre-flight check for these limits follows this list.
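As a rough illustration, the sketch below assembles a Batch endpoint request body and checks it against these limits before sending it. The base URL, the Bearer authorization header, and the send_batch helper name are assumptions for the example; confirm the exact request format against the Batch endpoint reference.
# Sketch: pre-flight checks for the Batch endpoint request body limits above.
# The URL, auth scheme, and helper name are assumptions, not confirmed specifics.
import json
import requests

MAX_BYTES = 20 * 1024 * 1024  # request bodies cannot exceed 20MB
MAX_RECORDS = 20000           # each data object: between 1 and 20,000 records

def send_batch(body, token):
    record_count = len(body["messages"])  # one record per message in this sketch
    if not 1 <= record_count <= MAX_RECORDS:
        raise ValueError(f"{record_count} records; must be between 1 and {MAX_RECORDS}")

    payload = json.dumps(body)
    if len(payload.encode("utf-8")) > MAX_BYTES:
        raise ValueError("Request body exceeds 20MB; split it into smaller batches")

    response = requests.post(
        "https://api.stitchdata.com/v2/import/batch",  # assumed base URL
        headers={
            "Authorization": f"Bearer {token}",        # assumed auth scheme
            "Content-Type": "application/json",
        },
        data=payload,
    )
    response.raise_for_status()
    return response.json()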
Defining tables and Primary Keys
Tables
Tables are dynamically generated based on the table_name specified in Import API requests. All tables pushed using the same API access token will be created in the same schema in your destination. You can find the name of the schema for your Import API integration by logging into Stitch.
Generally, we recommend creating one table for each type of record you want to push to the Import API. For example: If you have customer and product data, you should create two tables - one for customers and one for products.
Every record pushed to a table should have the same structure. For example: If a customers table contains customer_id, name, and email fields, every customer record pushed into this table should contain those fields.
Note: The Import API doesn’t support methods for specifically creating or deleting a table. If you need to delete a table, you should drop it in your destination and prevent any new data for the table from being pushed to the Import API. Any data accepted by Stitch will still be processed, even if the destination table has been dropped.
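Because the Batch endpoint only accepts data for one table per request, pushing the customers and products data described above means sending one request per table. The sketch below illustrates the pattern; send_batch is the hypothetical helper from the earlier sketch, and the schemas are abbreviated.
# Illustrative only: one Batch endpoint request per table.
# send_batch is the hypothetical helper sketched earlier; schemas are abbreviated.
customers_body = {
    "table_name": "customers",
    "key_names": ["customer_id"],
    "schema": {"properties": {"customer_id": {"type": "integer"},
                              "name": {"type": "string"},
                              "email": {"type": "string"}}},
    "messages": [{"action": "upsert", "sequence": 1565880017,
                  "data": {"customer_id": 1, "name": "Finn", "email": "finn@example.com"}}],
}

products_body = {
    "table_name": "products",
    "key_names": ["product_id"],
    "schema": {"properties": {"product_id": {"type": "integer"},
                              "name": {"type": "string"}}},
    "messages": [{"action": "upsert", "sequence": 1565880017,
                  "data": {"product_id": 10, "name": "Enchiridion"}}],
}

# Two tables, two requests -- data for both cannot be combined into one call.
for body in (customers_body, products_body):
    send_batch(body, token="<IMPORT_API_TOKEN>")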
Primary Keys
While Primary Keys are optional when using the Batch endpoint, they will determine how Stitch loads data for the table:
- For tables with Primary Keys, Stitch will use Primary Key columns to de-dupe data during the Loading phase of the replication process. This ensures that only the most recent version of a record is loaded into the destination.
- If a table doesn’t have a Primary Key, or if the destination only supports Append-Only loading, records will be appended to the end of the table as new rows. Existing rows will not be updated. Refer to the Understanding loading behavior guide for more info and examples.
A table’s Primary Keys are defined using the key_names property in the Batch endpoint. For example:
{
"key_names":[
"id"
],
"table_name":"customers",
"schema":{
"properties":{
"id":{
"type":"integer"
},
"name":{
"type":"string"
},
"age":{
"type":"integer"
},
"has_magic":{
"type":"boolean",
},
"modified_at":{
"type":"string",
"format":"date-time"
}
}
},
"messages":[
{
"action":"upsert",
"sequence":1565880017,
"data":{
"id":1,
"name":"Finn",
"age":15,
"has_magic":false,
"modified_at":"2018-04-30T17:00:00Z"
}
}
]
}
If you choose to define Primary Keys, keep the following in mind:
- Every record in a table must have a Primary Key.
- Primary Key columns should only contain a single data type.
- Primary Keys cannot be null.
- Primary Key values must be unique. For composite keys, the combination of values must be unique across all records in the table.
  For example: Let’s assume that event_id, app_id, and created_at are the Primary Keys for the table containing these records:
  Example of unique composite Primary Keys
  [
     {
        "event_id":1,
        "app_id":1,
        "created_at":"2019-08-20T00:00:00+00:00"
     },
     {
        "event_id":2,
        "app_id":1,
        "created_at":"2019-08-20T00:00:00+00:00"
     }
  ]
  While app_id and created_at have identical values in both records, the event_id makes each record unique.
- Every column in the key_names property must be present in both the request’s Schema object and in every record for the table. For example:
  Example of defined composite Primary Keys
  {
     "key_names":[
        "id",
        "created_at"
     ]
  }
  In this case, the Schema object must contain id and created_at properties. Every record must also contain these properties, or the Import API will return the following error (a sketch of a pre-push check for this requirement follows this list):
  Missing key property error
  {
     "error":"Record is missing key property <KEY_NAME>"
  }
Data typing
How data is typed depends on what endpoint you’re using to push data to the Import API:
Note: We recommend using the Batch endpoint for sending data to the Import API. The Push endpoint is mentioned only as a comparison to the Batch endpoint and as a reference for existing Import API scripts.
Batch (/v2/import/batch) endpoint data typing
When using the Batch endpoint, Stitch will assign data types based on the JSON schema in the Schema object in the request.
For example: This is the schema for a table named customers:
{
"schema":{
"properties":{
"id":{
"type":"integer"
},
"name":{
"type":"string"
},
"age":{
"type":"number"
},
"has_magic":{
"type":"boolean"
},
"modified_at":{
"type":"string",
"format":"date-time"
}
}
}
}
A record sent to the Import API for the customers table could look like this:
{
"action":"upsert",
"sequence":1565880017,
"data":{
"id":1,
"name":"Finn",
"age":15,
"has_magic":false,
"modified_at":"2018-04-30T17:00:00Z"
}
}
This data point would create a table similar to the following, depending on the data types used by your destination:
id (integer) | name (string) | age (integer) | has_magic (boolean) | modified_at (timestamp) |
---|---|---|---|---|
1 | Finn | 15 | false | 2018-04-30T17:00:00Z |
Records sent to the Import API must adhere to the JSON schema for the table that contains them, or the API will return a 400 response and an error similar to the following:
{
"error": "Record 0 did not conform to schema: #/<FIELD_NAME>: expected: <DATA_TYPE>, found: <DATA_TYPE>"
}
Refer to the Errors section of the Batch endpoint documentation for a list of errors and their causes.
JSON schemas in the Batch endpoint
The schema specified in a request’s Schema object must be a valid JSON schema. The Batch endpoint uses jsonschema 2.6.0, a JSON Schema implementation for Python, to validate JSON schemas.
Walking through creating a JSON schema is outside the scope of this guide, but the official Understanding JSON Schema reference is a good resource for getting started. When you’re ready, you can use jsonschema.net to test and validate your own schemas.
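For example, you can validate records locally with the same library before sending them. This sketch assumes the jsonschema package is installed and reuses the customers schema from above; the failing record is contrived to show the kind of mismatch the Batch endpoint rejects.
# Sketch: validate a record against the customers schema locally before pushing.
# Requires the jsonschema package (the same implementation the Batch endpoint uses).
from jsonschema import ValidationError, validate

schema = {
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "age": {"type": "number"},
        "has_magic": {"type": "boolean"},
        "modified_at": {"type": "string", "format": "date-time"},
    }
}

record = {"id": 1, "name": "Finn", "age": "15"}  # age sent as a string by mistake

try:
    validate(instance=record, schema=schema)
except ValidationError as error:
    # Roughly the mismatch the Batch endpoint would reject with a 400 response.
    print(f"Record did not conform to schema: {error.message}")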
Push (/v2/import/push) endpoint data typing
When using the Push endpoint, Stitch will type the data based on the value’s JSON data type. The Import API doesn’t infer data types on its own.
As JSON doesn’t explicitly enforce data types, all data typing needs to be handled within your data source and Import API script.
For example:
{
"id":1,
"cost":3.14,
"tax":"1.00"
"modified_at":"2019-08-13T21:25:03+0000"
}
This data point would create a table similar to the following, depending on the data types used by your destination:
id (integer) | cost (double) | tax (string) | modified_at (string) |
---|---|---|---|
1 | 3.14 | 1.00 | 2019-08-13T21:25:03+0000 |
Consider the modified_at field in the example. Even though this field contains an ISO 8601 formatted timestamp, the Import API won’t type this column as a timestamp in the destination. This is because it’s being sent as a JSON string.
Because JSON on its own doesn’t allow for defining data types, consider using the Batch endpoint instead. This endpoint accepts a JSON schema and will enforce the data types it declares for each field.
Changed data type handling in the Push endpoint
The Push endpoint considers data types on a record-by-record basis. If a field’s data type changes from one record to the next, all data types received via the Push endpoint will be used when the data is loaded. In the destination, this will look like a field’s values have been “split” between columns.
For example: Consider the cost values for each of the following records:
[
   {
      "id":1,
      "cost":3.14,  // number
      "tax":"1.00",
      "modified_at":"2019-08-13T21:25:03+0000"
   },
   {
      "id":2,
      "cost":10,  // integer
      "tax":"2.45",
      "modified_at":"2019-08-13T21:34:14+0000"
   },
   {
      "id":3,
      "cost":5.61,  // number
      "tax":".55",
      "modified_at":"2019-08-13T21:35:04+0000"
   }
]
As a result of the cost values changing between records, the destination table would look like this:
id (integer) | cost (double) | cost__it (integer) | tax (string) | modified_at (string) |
---|---|---|---|---|
1 | 3.14 | null | 1.00 | 2019-08-13T21:25:03+0000 |
2 | null | 10 | 2.45 | 2019-08-13T21:34:14+0000 |
3 | 5.61 | null | .55 | 2019-08-13T21:35:04+0000 |
To prevent this from occurring, each field should only ever contain a single data type. You can resolve column splits by:
- Using a view in your destination to coerce the data types
- Fixing the issue in the source, enforcing data typing, dropping the destination table, and re-pushing all historical data to Stitch. The table will be re-created with the correct data types. A sketch of the source-side fix follows this list.
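For the source-side fix, the change usually amounts to coercing the field to a single type in your Import API script before the data is pushed. A minimal sketch, using the cost example above:
# Sketch: coerce cost to one numeric type at the source so the Push endpoint
# always receives the same JSON data type for the field.
records = [
    {"id": 1, "cost": 3.14, "tax": "1.00"},
    {"id": 2, "cost": 10, "tax": "2.45"},   # the integer here caused the column split
    {"id": 3, "cost": 5.61, "tax": ".55"},
]

for record in records:
    record["cost"] = float(record["cost"])  # serialize cost as a JSON number every time

print(records)  # cost is now 3.14, 10.0, 5.61 -- one type, one destination column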
Enforcing data types in the Push endpoint
The only way to enforce data types using the Push endpoint is to use Transit, as JSON on its own doesn’t allow for defining data types. You can use Transit libraries in your Import API script to specify data types for various fields.
Otherwise, we recommend using the Batch endpoint. This endpoint accepts a JSON schema and will enforce the data types it declares for each field.
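As a rough illustration, the sketch below uses the transit-python library to encode a record whose modified_at value is a real datetime rather than a string. The library choice (and a Python 3 compatible fork, if needed), the record contents, and how the encoded payload is then submitted are all assumptions; follow the Push endpoint reference for the exact request format.
# Sketch: Transit-encode a record so modified_at carries a timestamp type
# instead of a plain string. Assumes a transit-python install that supports
# Python 3; the record contents are illustrative only.
from datetime import datetime, timezone
from io import StringIO

from transit.writer import Writer

record = {
    "id": 1,
    "cost": 3.14,
    "modified_at": datetime(2019, 8, 13, 21, 25, 3, tzinfo=timezone.utc),
}

buffer = StringIO()
writer = Writer(buffer, "json")  # Transit's JSON encoding
writer.write(record)

# The Transit payload preserves the timestamp type for modified_at.
print(buffer.getvalue())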