Google BigQuery pricing is based on usage instead of fixed pricing, unlike traditional relational databases and other cloud solutions. Because of this, it can be difficult to estimate the cost of using a BigQuery destination with Stitch.
Stitch employs a number of different operations across both Google Cloud Storage (GCS) and BigQuery as part of the replication process, which can impact your costs in several areas of Google Cloud Platform (GCP). In this article, we’ll cover:
Google Cloud Storage costs
Before your data is loaded into BigQuery, Stitch’s replication engine will replicate, process, and prepare data from your various integrations and temporarily move it into a Google Cloud Storage (GCS) bucket. There isn’t a cost for moving data into a GCS bucket, but there may be operational costs for handling data placed there.
This is dependent on the version of Stitch’s BigQuery destination you’re using:
v1 (Append-Only) | v2 (Upsert and Append-Only) |
Minimal cost to you The GCS bucket used to stage data for loading is automatically created by Stitch, but owned by you. As a result, there will be minimal costs for the Class A and B API calls Stitch makes during the replication process. Files placed in GCS are deleted immediately after data is loaded into BigQuery, so storage costs for the GCS bucket should be minimal. |
No cost to you Stitch owns and manages the GCS bucket used to stage data for loading to BigQuery. As a result, there are no GCS costs for using this version of Stitch’s BigQuery destination. |
Refer to Google’s documentation for more info on Google’s Cloud Storage pricing model.
BigQuery costs
BigQuery pricing has two categories:
-
Storage pricing, which is based on the amount of data stored in BigQuery
-
Query pricing, which is based on the amount of data processed by each executed query
Refer to Google’s BigQuery pricing documentation for more info.
Note: The following sections only cover the specific ways Stitch can impact BigQuery costs.
Storage pricing
The cost of storing your data in BigQuery is dependent on how much data is replicated into the destination.
BigQuery storage costs are impacted by the amount of data you replicated and the loading behavior used by the destination.
v1 (Append-Only) | v2 (Upsert and Append-Only) |
Loading behavior is predefined; all data is loaded in an Append-Only manner. Existing rows aren’t updated. Multiple versions of a row can exist in a table, creating a log of how a row changed over time. Tables size will increase after every replication job, and increase storage costs as a result. |
Loading behavior is defined by you during destination setup:
|
Query pricing
During loading, Stitch will run queries to move your data from the GCS bucket to BigQuery.
BigQuery query costs are impacted by the loading behavior used by the destination.
v1 (Append-Only) | v2 (Upsert and Append-Only) |
No cost to you The method Stitch uses to load data in an Append-Only manner is considered free under BigQuery’s pricing model. |
Costs are subject to BigQuery Data Manipulation Language (DML) pricing Google charges for queries based on the number of bytes processed, or bytes read. Additionally, Google imposes a minimum processing requirement of 10MB per table referenced by the query, and with a minimum 10MB data processed per query. This is applicable to both Upsert and Append-Only loading behavior. Note: As of November 18, 2024, this information is correct. Refer to Google’s pricing documentation for the most up-to-date information, and let us know if something is outdated. |
Refer to Google’s documentation for more info on BigQuery query pricing.
Cost considerations and management tools
When selecting the loading behavior for your BigQuery destination, we recommend keeping your BigQuery budget in mind. Understanding how Stitch can impact your costs will allow you to make an informed decision.
The loading behavior you select is dependent on both your data needs and BigQuery budget. While we can’t recommend one behavior type over another, we do recommend using Append-Only loading if BigQuery costs will be a concern. This recommendation is based on how Google bills for the queries Stitch uses to load using Upsert.
Additionally, Google provides tools in the GCP console that allow you to monitor and manage your costs:
Related | Troubleshooting |
Questions? Feedback?
Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.