To ensure that selected streams include the fields Stitch requires for replication, and that the selected fields are compatible with one another, Stitch enforces field selection and compatibility rules. Learn about the metadata properties that control field inclusion in the Connect API.
Field types
Stitch requires two types of fields for stream replication: Primary Keys and, when applicable, Replication Keys.
Primary Key fields
To accurately replicate data for a stream, Stitch requires the stream’s Primary Key information. A Primary Key is a column or set of columns that uniquely identifies a record.
Depending on the source and stream type, this is handled in one of several ways.
Database sources
For database sources, Stitch typically queries the database’s information schema to determine the Primary Key fields, then stores the Primary Key field names as a list in the stream’s table-key-properties metadata property:
{
  "selected": null,
  "stream_id": 2289176,
  "tap_stream_id": "demni2mf59dt10-heroku-orders",
  "stream_name": "orders",
  "metadata": {
    "database-name": "demni2mf59dt10",
    "selected": null,
    "replication-method": null,
    "is-view": false,
    "row-count": 447,
    "schema-name": "heroku",
    "table-key-properties": [
      "id"
    ]
  }
}
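As a minimal sketch, assuming you already have stream objects shaped like the example above (for instance, parsed from a Connect API response), the Primary Key fields for database tables can be read directly from table-key-properties. The helper name primary_keys_by_stream is hypothetical.

```python
import json

# Stream objects shaped like the example above. In practice these would come
# from a Connect API response; they're inlined here for illustration.
streams = json.loads("""
[
  {
    "stream_name": "orders",
    "metadata": {"is-view": false, "table-key-properties": ["id"]}
  }
]
""")


def primary_keys_by_stream(streams: list[dict]) -> dict[str, list[str]]:
    """Hypothetical helper: map each stream name to its table-key-properties.

    A missing or null table-key-properties value yields an empty list.
    """
    return {
        s["stream_name"]: list(s.get("metadata", {}).get("table-key-properties") or [])
        for s in streams
    }


print(primary_keys_by_stream(streams))  # {'orders': ['id']}
```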
Database views
For database views, the stream’s metadata will contain an is-view property with a value of true:
{
  "selected": true,
  "stream_id": 2375830,
  "tap_stream_id": "demni2mf59dt10-public-customer_view",
  "stream_name": "customer_view",
  "metadata": {
    "database-name": "demni2mf59dt10",
    "selected": true,
    "is-view": true,
    "replication-key": "updated_at",
    "replication-method": "INCREMENTAL",
    "row-count": 56,
    "schema-name": "public",
    "table-key-properties": [],
    "view-key-properties": [
      "id"
    ]
  }
}
Primary Key information must be provided in the view-key-properties metadata property when the stream is selected for replication.
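A client can catch a missing view Primary Key before sending a selection. The following is a minimal sketch, assuming stream objects shaped like the customer_view example above; check_view_selection is a hypothetical helper, and Stitch performs its own validation regardless.

```python
def check_view_selection(stream: dict) -> None:
    """Hypothetical client-side check: raise ValueError if a selected
    database view has no Primary Key fields in view-key-properties."""
    md = stream.get("metadata", {})
    if md.get("is-view") is True and md.get("selected") is True:
        if not md.get("view-key-properties"):
            raise ValueError(
                f"{stream.get('stream_name')}: view-key-properties must list "
                "the Primary Key fields when a view is selected"
            )


customer_view = {
    "stream_name": "customer_view",
    "metadata": {
        "selected": True,
        "is-view": True,
        "table-key-properties": [],
        "view-key-properties": ["id"],
    },
}
check_view_selection(customer_view)  # no error: view-key-properties is set
```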
SaaS sources
For SaaS sources, Primary Keys are typically hard-coded in the Singer tap backing the source. The Primary Key field names are stored as a list in the stream’s table-key-properties metadata property:
{
  "selected": null,
  "stream_id": 2288758,
  "tap_stream_id": "custom_collections",
  "stream_name": "custom_collections",
  "metadata": {
    "forced-replication-method": "INCREMENTAL",
    "selected": null,
    "table-key-properties": [
      "id"
    ],
    "valid-replication-keys": [
      "updated_at"
    ]
  }
}
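The three cases (database tables, database views, and SaaS streams) can be folded into a single lookup. This is a sketch, assuming stream objects shaped like the examples in this section; primary_key_fields is a hypothetical name. It relies on views being the only streams with is-view set to true, so everything else falls back to table-key-properties.

```python
def primary_key_fields(stream: dict) -> list[str]:
    """Hypothetical helper: return a stream's Primary Key field names.

    Database views keep them in view-key-properties; database tables and
    SaaS streams keep them in table-key-properties.
    """
    md = stream.get("metadata", {})
    key = "view-key-properties" if md.get("is-view") is True else "table-key-properties"
    return list(md.get(key) or [])


custom_collections = {
    "stream_name": "custom_collections",
    "metadata": {
        "forced-replication-method": "INCREMENTAL",
        "table-key-properties": ["id"],
        "valid-replication-keys": ["updated_at"],
    },
}
customer_view = {
    "stream_name": "customer_view",
    "metadata": {"is-view": True, "table-key-properties": [], "view-key-properties": ["id"]},
}
print(primary_key_fields(custom_collections))  # ['id']
print(primary_key_fields(customer_view))       # ['id']
```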
Replication Key fields
If a stream’s replication-method is INCREMENTAL, an appropriate field must be set as the stream’s Replication Key. Replication Keys are columns used to identify new and updated data for replication. They are typically integer, datetime, or timestamp columns and are required to use Key-based Incremental Replication.
Like Primary Keys, Replication Keys are handled in one of several ways depending on the source type.
Database sources
For database sources, a valid Replication Key must be provided using the replication-key metadata property when the stream is selected.
{
  "selected": null,
  "stream_id": 2289176,
  "tap_stream_id": "demni2mf59dt10-heroku-orders",
  "stream_name": "orders",
  "metadata": {
    "database-name": "demni2mf59dt10",
    "selected": null,
    "replication-method": null,
    "is-view": false,
    "row-count": 447,
    "schema-name": "heroku",
    "table-key-properties": [
      "id"
    ]
  }
}
Note: This also applies to database views if the stream’s replication-method is set to INCREMENTAL.
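The sketch below prepares stream-level metadata for Key-based Incremental Replication on a database stream. It assumes the stream object shape shown above; with_incremental_replication is a hypothetical helper that only builds the metadata and doesn’t send any request, and updated_at is a hypothetical column, since the orders example doesn’t list its columns.

```python
import copy


def with_incremental_replication(stream: dict, replication_key: str) -> dict:
    """Hypothetical helper: return a copy of the stream whose stream-level
    metadata selects it for Key-based Incremental Replication.

    The input stream is not modified.
    """
    if not replication_key:
        raise ValueError("replication-key is required when replication-method is INCREMENTAL")
    updated = copy.deepcopy(stream)
    md = updated.setdefault("metadata", {})
    md["selected"] = True
    md["replication-method"] = "INCREMENTAL"
    md["replication-key"] = replication_key
    return updated


orders = {
    "stream_name": "orders",
    "metadata": {"selected": None, "replication-method": None, "is-view": False,
                 "table-key-properties": ["id"]},
}
# "updated_at" is a hypothetical column name used for illustration.
selected_orders = with_incremental_replication(orders, "updated_at")
print(selected_orders["metadata"]["replication-key"])  # updated_at
```

Copying the stream rather than mutating it keeps the original metadata available if the request needs to be rebuilt.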
SaaS sources
For SaaS sources, Replication Keys are hard-coded in the Singer tap backing the source. The valid Replication Key field names are stored as a list in the stream’s valid-replication-keys metadata property:
{
  "selected": null,
  "stream_id": 2288758,
  "tap_stream_id": "custom_collections",
  "stream_name": "custom_collections",
  "metadata": {
    "forced-replication-method": "INCREMENTAL",
    "selected": null,
    "table-key-properties": [
      "id"
    ],
    "valid-replication-keys": [
      "updated_at"
    ]
  }
}
Note: When selecting fields in SaaS streams that have a valid-replication-keys property, you must explicitly set the stream’s replication-key to one of the fields listed in valid-replication-keys. Selecting that field for replication won’t automatically set it as the stream’s Replication Key.
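Because selecting the field alone isn’t enough, a client has to set replication-key itself. The following is a minimal sketch, assuming the custom_collections stream shape above; with_saas_replication_key is a hypothetical helper that validates the choice against valid-replication-keys and returns updated metadata without calling the API.

```python
import copy


def with_saas_replication_key(stream: dict, replication_key: str) -> dict:
    """Hypothetical helper: return a copy of a SaaS stream with replication-key
    set explicitly to one of its valid-replication-keys."""
    md = stream.get("metadata", {})
    valid = md.get("valid-replication-keys") or []
    if replication_key not in valid:
        raise ValueError(f"{replication_key!r} is not one of valid-replication-keys {valid}")
    updated = copy.deepcopy(stream)
    # Selecting the field doesn't set it as the Replication Key, so set it here.
    updated["metadata"]["selected"] = True
    updated["metadata"]["replication-key"] = replication_key
    return updated


custom_collections = {
    "stream_name": "custom_collections",
    "metadata": {
        "forced-replication-method": "INCREMENTAL",
        "selected": None,
        "table-key-properties": ["id"],
        "valid-replication-keys": ["updated_at"],
    },
}
print(with_saas_replication_key(custom_collections, "updated_at")["metadata"]["replication-key"])
# updated_at
```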
Field selection rules
Stitch requires a stream’s Primary Key and Replication Key fields to be selected to replicate data successfully and accurately.
To ensure these required fields are included in a stream’s field inclusion list, Stitch enforces field selection rules.
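Stitch enforces these rules itself, but a client may want to warn users before sending a selection. Below is a minimal sketch, assuming stream-level metadata shaped like the examples above and a set of field names the user intends to select; missing_required_fields is a hypothetical name, and name is a hypothetical field used only in the usage line.

```python
def missing_required_fields(stream_metadata: dict, selected_fields: set[str]) -> set[str]:
    """Hypothetical helper: return required fields that aren't in selected_fields.

    Required fields are the Primary Keys (table-key-properties or
    view-key-properties) plus the replication-key, when one is set.
    """
    required = set(stream_metadata.get("table-key-properties") or [])
    required |= set(stream_metadata.get("view-key-properties") or [])
    replication_key = stream_metadata.get("replication-key")
    if replication_key:
        required.add(replication_key)
    return required - selected_fields


view_metadata = {
    "is-view": True,
    "replication-key": "updated_at",
    "table-key-properties": [],
    "view-key-properties": ["id"],
}
print(sorted(missing_required_fields(view_metadata, {"name"})))  # ['id', 'updated_at']
```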
Metadata in field selection
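Field selection rules are shaped by three metadata properties in a Field-level Metadata object:
inclusion (STRING, READ-ONLY): Indicates when a field will be included. Possible values are automatic (the field is always replicated), available (the field is replicated only when it’s selected, either explicitly or by default), and unsupported (the field can’t be replicated).
selected-by-default (BOOLEAN, READ-ONLY): Indicates whether a field will be selected by default. Possible values are true and false.
selected (BOOLEAN): Indicates whether a field should be selected. Possible values are true and false.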
Field selection metadata combinations
Below are the possible combinations of metadata field values and whether a field will be replicated with each combination.
Note: A * in the table indicates any possible value (null, true, or false) for that metadata field.
inclusion | selected | selected-by-default | replicated? |
automatic | * | * | Yes |
unsupported | * | * | No |
available | true | null | Yes |
available | true | true | Yes |
available | true | false | Yes |
available | false | null | No |
available | false | true | No |
available | false | false | No |
available | null | true | Yes |
available | null | false | No |
available | null | null | No |
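The selection logic can be expressed as one small function. This is a sketch, assuming the Singer metadata semantics: automatic fields are always replicated, unsupported fields never are, and for available fields an explicit selected value takes precedence over selected-by-default, with null meaning the property isn’t set. The function name is_replicated is hypothetical.

```python
from typing import Optional


def is_replicated(
    inclusion: str,
    selected: Optional[bool],
    selected_by_default: Optional[bool],
) -> bool:
    """Hypothetical helper: decide whether a field is replicated from its
    field-level metadata. None stands in for a null (unset) value."""
    if inclusion == "automatic":
        return True  # always replicated, whatever selected says
    if inclusion == "unsupported":
        return False  # can never be replicated
    if inclusion != "available":
        raise ValueError(f"unknown inclusion value: {inclusion!r}")
    if selected is not None:
        return selected  # an explicit choice wins over the default
    return selected_by_default is True  # unset: fall back to the default


print(is_replicated("available", None, True))   # True
print(is_replicated("available", False, True))  # False
```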
Field compatibility rules
While all fields are subject to field selection rules, some fields are also subject to field compatibility rules. This means that certain fields can’t be selected together in a single stream.
These restrictions are set by the source and primarily affect SaaS sources such as Microsoft Advertising (formerly Bing Ads), Google Analytics, and Google AdWords.
Field exclusion metadata
If a field is subject to compatibility rules, its Field-level Metadata object will contain a fieldExclusions property. This property is a list of arrays, each of which is the breadcrumb of an incompatible field.
For example, below is the field-level metadata for the DeviceOS field in the Microsoft Advertising (formerly Bing Ads) ad_group_performance_report stream:
{
  "metadata": {
    "fieldExclusions": [
      ["properties", "ExactMatchImpressionSharePercent"],
      ["properties", "ImpressionLostToAdRelevancePercent"],
      ["properties", "ImpressionLostToBidPercent"],
      ["properties", "ImpressionLostToBudgetPercent"],
      ["properties", "ImpressionLostToExpectedCtrPercent"],
      ["properties", "ImpressionLostToRankPercent"],
      ["properties", "ImpressionSharePercent"]
    ],
    "inclusion": "available"
  }
}
This indicates that when the DeviceOS field is selected, the fields listed in its fieldExclusions property can’t also be selected.
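To check a proposed selection against these rules, compare each selected field’s fieldExclusions with the other selected breadcrumbs. The sketch below assumes each field-level entry is an object with breadcrumb and metadata keys, as in Singer catalogs (the example above shows only the metadata part), and trims the DeviceOS exclusions to two entries for brevity. exclusion_conflicts is a hypothetical helper.

```python
def exclusion_conflicts(
    field_entries: list[dict], selected: set[tuple[str, ...]]
) -> set[frozenset]:
    """Hypothetical helper: return pairs of selected breadcrumbs that violate
    fieldExclusions. Each pair is a frozenset, so A/B and B/A count once."""
    conflicts = set()
    for entry in field_entries:
        crumb = tuple(entry.get("breadcrumb", []))
        if crumb not in selected:
            continue
        for excluded in entry.get("metadata", {}).get("fieldExclusions", []):
            other = tuple(excluded)
            if other in selected:
                conflicts.add(frozenset({crumb, other}))
    return conflicts


device_os = {
    "breadcrumb": ["properties", "DeviceOS"],
    "metadata": {
        "inclusion": "available",
        "fieldExclusions": [
            ["properties", "ImpressionSharePercent"],
            ["properties", "ImpressionLostToBidPercent"],
        ],
    },
}
selected = {("properties", "DeviceOS"), ("properties", "ImpressionSharePercent")}
print(len(exclusion_conflicts([device_os], selected)))  # 1
```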
Google Analytics field compatibility
Google Analytics sources are an exception to the previous section. Fields in this source are still subject to compatibility rules, but their field-level metadata won’t contain a fieldExclusions property.
To determine what fields are compatible, we recommend using Google’s Dimensions and Metrics Explorer before sending field selection requests to the API.
Field exclusion violations
The Connect API may allow you to select fields that violate field exclusion/compatibility rules, but doing so will likely result in extraction job failures.
To avoid this, Stitch recommends taking fieldExclusions into account, where present, when building your own application. For Google Analytics sources, we recommend using Google’s Dimensions and Metrics Explorer to determine field compatibility.
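One way to take fieldExclusions into account before sending a request is to build the selection greedily: walk the requested fields in priority order and skip any field that clashes with one already accepted. This is a sketch under the same assumption as before (field-level entries with breadcrumb and metadata keys); select_compatible is a hypothetical helper, and greedy ordering is just one possible policy. Fields without fieldExclusions, such as Google Analytics fields, always pass, so this check can’t catch their incompatibilities.

```python
def select_compatible(
    field_entries: list[dict], wanted: list[list[str]]
) -> tuple[list[tuple[str, ...]], list[tuple[str, ...]]]:
    """Hypothetical helper: keep wanted fields in priority order, rejecting any
    field that either lists, or is listed by, an already accepted field."""
    exclusions = {
        tuple(e.get("breadcrumb", [])): {
            tuple(x) for x in e.get("metadata", {}).get("fieldExclusions", [])
        }
        for e in field_entries
    }
    accepted: list[tuple[str, ...]] = []
    rejected: list[tuple[str, ...]] = []
    for crumb in map(tuple, wanted):
        clash = any(
            other in exclusions.get(crumb, set()) or crumb in exclusions.get(other, set())
            for other in accepted
        )
        if clash:
            rejected.append(crumb)
        else:
            accepted.append(crumb)
    return accepted, rejected


device_os = {
    "breadcrumb": ["properties", "DeviceOS"],
    "metadata": {"fieldExclusions": [["properties", "ImpressionSharePercent"]]},
}
accepted, rejected = select_compatible(
    [device_os],
    [["properties", "DeviceOS"], ["properties", "ImpressionSharePercent"]],
)
print(accepted)  # [('properties', 'DeviceOS')]
print(rejected)  # [('properties', 'ImpressionSharePercent')]
```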