This guide is applicable to:
- Destinations configured to use Append-Only Loading, or
- Tables configured to use Append-Only Loading, such as Google Ads’ Report tables
When data is loaded using Append-Only Loading, existing records aren’t updated, but instead appended to tables as new rows. This means that as time goes on, tables will contain different versions of the same record, reflecting how the record has changed over time.
While data stored this way can provide insights and historical details, sometimes you may just want the latest version of a record. In this guide, we’ll cover:
-
A querying strategy that retrieves the latest version of every record
-
How to simplify querying by creating a view in your destination
Before using this guide
Before using this guide, note that:
- You may need to modify the queries in this guide to use them yourself
-
Stitch Support’s expertise lies in replicating data, and as such does not provide data analysis or querying assistance. We can, however, help with data discrepancies.
If you’d like assistance with analysis or business intelligence solutions, we recommend reaching out to one of our analytics partners.
Using system columns to identify record versions
Every table created by Stitch contains columns prepended with _sdc
. These are system columns created and used by Stitch to load data into your destination.
For this guide, we’ll focus on just two columns:
Column name | Description |
_sdc_sequence |
A Unix epoch (in nanoseconds) that indicates the order in which data points were considered for loading. Stitch uses this column’s values in a few places to correctly order rows for loading, but it can be also used to retrieve the latest version of a record from an Append-Only table. This is the primary column our strategy will use. |
_sdc_batched_at |
Timestamp indicating when Stitch loaded the batch the record was a part of into the destination. Our strategy will use this column as a "tie breaker." |
Retrieving the latest version of every record
Let’s take a look at an example. Assume we have an orders
table that contains:
- A Primary Key of
id
, - The system
_sdc
columns added by Stitch, and - Other order attribute columns
Only using _sdc_sequence
If you wanted to create a snapshot of the latest version of this table, you could run a query like this using _sdc_sequence
:
SELECT DISTINCT orders.*
FROM [stitch-analytics-bigquery-123:ecommerce.orders] orders
INNER JOIN (
SELECT id,
MAX(_sdc_sequence) AS sequence
FROM [stitch-analytics-bigquery-123:ecommerce.orders]
GROUP BY id
) latest_orders
ON orders.id = latest_orders.id
AND orders._sdc_sequence = latest_orders.sequence
Here’s what’s happening in this query:
- The subquery retrieves a list of every record’s Primary Key and maximum
_sdc_sequence
value. - The outer query selects distinct versions of the latest version of every record.
- Lastly, the outer query joins the table to the list retrieved by the subquery, which makes all other columns available for querying.
Using _sdc_batched_at as a tie breaker
If only using _sdc_sequence
doesn’t yield the desired results, we recommend using _sdc_batched_at
as a “tie breaker”:
SELECT DISTINCT orders.*
FROM [stitch-analytics-bigquery-123:ecommerce.orders] orders
INNER JOIN (
SELECT id,
MAX(_sdc_sequence) AS sequence,
MAX(_sdc_batched_at) as batched_at
FROM [stitch-analytics-bigquery-123:ecommerce.orders]
GROUP BY id
) latest_orders
ON orders.id = latest_orders.id
AND orders._sdc_sequence = latest_orders.sequence
AND orders._sdc_batched_at = latest_orders.batched_at
The _sdc_batched_at
value indicates the time that Stitch loaded the batch containing the record into the destination. Selecting a record’s maximum _sdc_batched_at
and _sdc_sequence
values excludes versions of the record from older batches from the results.
Create views in your destination
To make this easier, you can turn queries like the one above into a view. We recommend this approach because a view will encapsulate all the logic and simplify the process of querying against the latest version of your data.
Refer to the documentation for your destination for more info on creating views:
Related | Troubleshooting |
Questions? Feedback?
Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.