What is Data Integration? Examples and Use Cases
Data integration is the process of consolidating data from different sources. Data integration is often a prerequisite to other processes including analysis, reporting, and forecasting.
Data integration vs. application integration vs. ETL
Data integration is often confused with application integration and ETL/ELT. While they are closely related, there are important distinctions between the three terms.
Data integration is a process where data from many sources goes to a single centralized location, which is often a data warehouse. The end location needs to be flexible enough to handle lots of different kinds of data at potentially large volumes. Data integration is ideal for powering analytical use cases.
Application integration involves moving data back and forth between individual applications to keep them in sync. Typically, each individual application has a particular way it emits and accepts data, and this data moves in smaller volumes. Application integration is ideal for powering operational use cases. One example is ensuring that a customer support system has the same customer records as the accounting system.
ETL stands for extract, transform, and load. This refers to the process of extracting data from source systems, transforming it into a different structure or format, and loading it into a destination. Data integration and application integration are two types of ETL.
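The three ETL steps can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the CSV content, the `users` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real destination.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an export from a source system.
SOURCE_CSV = """email,signup_date
Ada@Example.com,2023-01-05
grace@example.com,2023-02-11
"""


def extract(raw_csv):
    """Extract: read rows out of the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows):
    """Transform: lowercase the emails and keep only the fields we need."""
    return [(row["email"].strip().lower(), row["signup_date"]) for row in rows]


def load(records, conn):
    """Load: write the transformed records into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS users (email TEXT, signup_date TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", records)
    conn.commit()


def run_etl(raw_csv, conn):
    """Run all three steps and return the emails now in the destination."""
    load(transform(extract(raw_csv)), conn)
    return conn.execute("SELECT email FROM users ORDER BY email").fetchall()


if __name__ == "__main__":
    destination = sqlite3.connect(":memory:")  # stand-in for a data warehouse
    print(run_etl(SOURCE_CSV, destination))
```

In an ELT variant, the load step would run before the transform step, and the transformation would happen inside the destination.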
Data integration example
Let's take the example of a company called See Food, Inc. (SFI). SFI's product is a mobile app where users can take pictures of different items and identify whether the item in the picture is, or is not, a hot dog. SFI uses a lot of tools to run its business:
- Facebook Ads and Google Ads to acquire new users
- Google Analytics to track events on its website and in its mobile app
- MySQL database to store user information and image metadata (e.g. hot dog or not hot dog)
- Marketo to send marketing email and nurture leads
- Zendesk to perform customer support
- NetSuite for accounting and financial tracking
Each of those applications has a silo of information about SFI's operations. For SFI to get a 360-degree view of the business, all of that data needs to be combined in one place. That process is data integration.
Data integration ROI
Getting a 360-degree view sounds nice, but before undertaking any data integration project, it's important to understand what the return on investment will be. Your use case will vary, but here's an example of the value data integration can bring.
Suppose SFI is considering increasing its advertising budget, but it's not sure if it should spend more on Facebook or Google. It could ask whether the cost of acquisition is lower on Facebook or Google, but that overlooks whether the two channels attract different kinds of users. Some additional questions the company might want to ask are:
- Do users from Facebook post more photos of hot dogs?
- Do users from Google file more customer support tickets?
- Which users are more likely to refer friends?
Each of these questions can be combined and further segmented by individual campaign and by variation in ad creative. These questions can only be answered when the data is integrated.
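As a rough sketch of what answering one of these questions could look like once the data sits in one place, the example below joins acquisition data with support tickets. Everything here is assumed for illustration: the `users` and `tickets` tables, their columns, and the sample rows are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3


def build_sample_warehouse():
    """Create a tiny stand-in warehouse with made-up, illustrative rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table combining app users with their acquisition channel.
    conn.execute("CREATE TABLE users (user_id INTEGER, channel TEXT)")
    # Hypothetical table of support tickets replicated from the support tool.
    conn.execute("CREATE TABLE tickets (ticket_id INTEGER, user_id INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "facebook"), (2, "facebook"), (3, "google"), (4, "google")],
    )
    conn.executemany(
        "INSERT INTO tickets VALUES (?, ?)",
        [(10, 1), (11, 3), (12, 3), (13, 4)],
    )
    return conn


def tickets_per_user_by_channel(conn):
    """Average number of support tickets per user, grouped by channel."""
    query = """
        SELECT u.channel,
               COUNT(t.ticket_id) * 1.0 / COUNT(DISTINCT u.user_id)
        FROM users AS u
        LEFT JOIN tickets AS t ON t.user_id = u.user_id
        GROUP BY u.channel
        ORDER BY u.channel
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(tickets_per_user_by_channel(build_sample_warehouse()))
```

The join is only possible because both tables share a `user_id`. While the data stays in separate tools, there is no single place to run a query like this.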
In-house data integration
Note — This section is for companies that are comfortable writing code and using the command line. If that's not for you, skip to the section on simple data integration with Stitch.
If you have software engineers on your team, you may want to initiate an in-house data integration project. Software engineers who specialize in building the systems that transmit data throughout a company are often called data engineers.
While you can start your data integration project from scratch, it's often helpful to build on an open source project to save time. One option is the Singer open source ETL project. Singer uses reusable components for pulling from data sources (taps) and sending to destinations (targets). That means if you need a source that hasn't been built before, you only need to build a single tap, and it will automatically work with all of the existing targets.
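Taps and targets can be mixed freely because they communicate through a shared message format: a tap writes JSON messages, one per line, and a target reads them. The sketch below imitates that idea in plain Python without using any Singer library. The `run_tap` and `run_target` functions, the `users` stream, and the sample rows are hypothetical, and only the `SCHEMA` and `RECORD` message types are shown.

```python
import io
import json
import sys


def run_tap(rows, out=sys.stdout):
    """Hypothetical tap: emit a SCHEMA message, then one RECORD per row."""
    schema_message = {
        "type": "SCHEMA",
        "stream": "users",
        "schema": {
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": "string"},
            }
        },
        "key_properties": ["id"],
    }
    out.write(json.dumps(schema_message) + "\n")
    for row in rows:
        record_message = {"type": "RECORD", "stream": "users", "record": row}
        out.write(json.dumps(record_message) + "\n")


def run_target(lines):
    """Hypothetical target: collect the records of each stream in memory."""
    loaded = {}
    for line in lines:
        message = json.loads(line)
        if message["type"] == "RECORD":
            loaded.setdefault(message["stream"], []).append(message["record"])
    return loaded


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for the pipe between tap and target
    run_tap([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}], out=buffer)
    print(run_target(buffer.getvalue().splitlines()))
```

Because the target only depends on the message format, the same `run_target` would work with any other tap that emits messages of this shape.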
Here's a quick guide to pulling data from GitHub. (You can access the code and more details here.)
First, create a virtual environment and install the GitHub tap.
> virtualenv -p python3 venv
> source venv/bin/activate
> pip install tap-github
Then create a configuration file named config.json that contains your GitHub access token and the path to the repository from which you want data. It should look something like this:
{
  "access_token": "your-access-token",
  "repository": "singer-io/tap-github"
}
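If you would rather not paste the token into a file by hand, one option is to generate config.json from an environment variable. This is a small sketch under assumptions: the `write_tap_config` helper and the `GITHUB_TOKEN` variable name are hypothetical conveniences, not part of the tap itself. Only the two config keys shown above come from the example.

```python
import json
import os


def write_tap_config(path, repository, token_env="GITHUB_TOKEN"):
    """Write a config file with the two keys used in the example above.

    The token is read from an environment variable (name is an assumption)
    so that it does not have to be typed into the file manually.
    """
    token = os.environ.get(token_env)
    if not token:
        raise RuntimeError(f"Set the {token_env} environment variable first")
    config = {"access_token": token, "repository": repository}
    with open(path, "w") as handle:
        json.dump(config, handle, indent=2)
    return config


if __name__ == "__main__":
    write_tap_config("config.json", "singer-io/tap-github")
```

The resulting file still contains the token in plain text, so keep it out of version control.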
Finally, run the application:
tap-github --config config.json
Simple data integration with Stitch
Stitch is a cloud data integration service that connects to today’s most popular business tools — including Salesforce, Facebook Ads, and more than 100 others — and automatically replicates the raw data to a data warehouse. There's no code to write — with just a few clicks, Stitch will extract your data from wherever it lives and get it ready to be analyzed, understood, and acted upon. And Stitch automatically keeps your data up to date.
Stitch offers a free 14-day trial. Give it a try today!