Data integration is the process of combining data from multiple, disparate sources into a single destination, typically a data warehouse. It's a key step in turning raw data into insights that drive better, faster decision-making.
Data integration is a broad topic, with an abundance of related concepts, so it's easy to get confused. Let's begin by defining some terms.
Data integration: The process of moving different sets of data from databases, SaaS platforms, and other sources into a single centralized repository. Businesses choose to integrate data for a variety of reasons, including reducing data silos, updating legacy systems, and producing more thorough business intelligence (BI).

Data replication: The practice of putting copies of data in different locations, generally to improve data availability or system resilience. In a data analytics environment, organizations replicate data from transactional databases to analytical repositories to facilitate BI reporting.

Data migration: A one-time event in which data is moved from one platform to another, and the destination becomes the new source of the data.

Data ingestion: The process of a database accepting data from another source. Before data can be used for BI, it must be ingested.

Extraction: The process of targeting and retrieving data from a source in order to begin moving it to a new destination, often one designed to support online analytical processing (OLAP).

Transformation: The process of changing data as or after it is moved from a source to a destination. Businesses may need to transform data when different database systems store data in different formats, or when data must be changed from a source format into a different format at a destination (see the sketch that follows this glossary).

Loading: The process of delivering data into a destination database after the data has been extracted from a source. Data may be transformed prior to loading (ETL) or after loading (ELT).

ETL (extract, transform, load): The process of extracting data from a source, transforming it, and loading it into a new destination.

ELT (extract, load, transform): Similar to ETL, but the data is transformed after it has been delivered to its destination, which is usually a cloud data warehouse.
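To make the transformation step concrete, here is a minimal Python sketch of the kind of reshaping an integration pipeline performs: it converts records from a hypothetical source layout (US-style date strings, amounts in cents) into the layout a destination table expects. The field names and formats are assumptions for illustration only, not tied to any particular tool.

```python
from datetime import datetime

def transform(record: dict) -> dict:
    """Reshape one source record into the destination's expected format.

    The (hypothetical) source stores dates as 'MM/DD/YYYY' strings and
    amounts in cents; the destination expects ISO 8601 dates and dollars.
    """
    return {
        "order_id": record["id"],
        "order_date": datetime.strptime(record["date"], "%m/%d/%Y").date().isoformat(),
        "amount_usd": record["amount_cents"] / 100,
    }

source_row = {"id": 42, "date": "07/04/2019", "amount_cents": 1999}
print(transform(source_row))
# {'order_id': 42, 'order_date': '2019-07-04', 'amount_usd': 19.99}
```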
In the data integration process, a tool extracts data from a source, transforms it as necessary for analysis, and loads it into physical or virtual storage. The sequence of the last two steps may vary, depending on the destination: on-premises or in the cloud.
ETL — extract, transform, load — is effective for integrating data to on-premises data centers. On-premises hardware has a limited ability to scale when processing demands spike, so it makes sense to do as much processing as possible in the data pipeline. Also, security concerns or regulatory requirements may make it impossible to store raw data in a destination, so the tool must transform the data before loading it to the destination.
The downside to ETL is that the transformation process can be resource-intensive, slowing the data integration process.
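As a rough sketch of how the three ETL steps chain together when transformation happens before loading, the example below uses two in-memory SQLite databases to stand in for a transactional source and an on-premises warehouse. The table names and cleanup rules are invented for the illustration and don't reflect any specific ETL product.

```python
import sqlite3

# --- Extract: pull raw rows from the source (an operational database). ---
def extract(source_conn):
    return source_conn.execute("SELECT id, email, signup_ts FROM users").fetchall()

# --- Transform: clean and reshape rows *before* loading (the 'T' in ETL). ---
def transform(rows):
    return [
        (user_id, email.strip().lower(), signup_ts[:10])  # normalize email, keep date only
        for user_id, email, signup_ts in rows
    ]

# --- Load: write the transformed rows into the analytics destination. ---
def load(dest_conn, rows):
    dest_conn.executemany("INSERT INTO dim_users VALUES (?, ?, ?)", rows)
    dest_conn.commit()

# Hypothetical in-memory databases stand in for a production source and warehouse.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (id INTEGER, email TEXT, signup_ts TEXT)")
source.execute("INSERT INTO users VALUES (1, ' Alice@Example.COM ', '2019-03-01T08:30:00')")

dest = sqlite3.connect(":memory:")
dest.execute("CREATE TABLE dim_users (id INTEGER, email TEXT, signup_date TEXT)")

load(dest, transform(extract(source)))
print(dest.execute("SELECT * FROM dim_users").fetchall())
# [(1, 'alice@example.com', '2019-03-01')]
```

Because the transform step runs inside the pipeline, the destination only ever receives cleaned rows, which is also what makes ETL attractive when raw data can't be stored in the destination for security or compliance reasons.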
When the destination is a cloud-native data warehouse like Amazon Redshift, Google BigQuery, Snowflake, or Microsoft Azure SQL Data Warehouse, ELT is a better approach. Organizations can transform their raw data whenever their use case requires it, rather than as a step in the data pipeline.
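To contrast the two approaches, here is a minimal ELT sketch: raw rows are loaded untouched into a staging table, and the transformation is expressed as SQL that the warehouse itself runs whenever analysts need it. SQLite stands in for the cloud warehouse here, and the table and view names are made up for the example.

```python
import sqlite3

# A local SQLite database stands in for a cloud data warehouse in this sketch;
# in practice the destination would be Redshift, BigQuery, Snowflake, etc.
warehouse = sqlite3.connect(":memory:")

# 1. Extract + Load: land the raw source rows as-is in a staging table.
warehouse.execute("CREATE TABLE raw_orders (id INTEGER, status TEXT, amount_cents INTEGER)")
warehouse.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "SHIPPED", 1250), (2, "cancelled", 800), (3, "Shipped", 3000)],
)

# 2. Transform later, inside the warehouse, with SQL run on demand.
warehouse.execute("""
    CREATE VIEW shipped_orders AS
    SELECT id, LOWER(status) AS status, amount_cents / 100.0 AS amount_usd
    FROM raw_orders
    WHERE LOWER(status) = 'shipped'
""")

print(warehouse.execute("SELECT * FROM shipped_orders").fetchall())
# [(1, 'shipped', 12.5), (3, 'shipped', 30.0)]
```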
Data integration was a $6.56 billion global industry in 2017, and that figure could reach $18.45 billion by 2025, according to the report "Global Data Integration Market – Size, Outlook, Trends and Forecasts (2019 – 2025)" by Envision Intelligence, a market research firm.
Data integration tools and technologies are designed to turn raw data into insights. Business reports and dashboards tell decision-makers what's happening in their business, and may also shed light on why it's happening.
Data is a valuable enterprise resource, but its usefulness is limited unless the data is current, accurate, and available to everyone in the organization who would benefit from analyzing it.
Enterprise data is often siloed in organizational departments: financial data stays in the finance department, and marketing data stays in the marketing department. Data integration busts these silos by making data available to more decision-makers, who can use BI tools to gain insight into improving operations and products, saving money, and discovering hidden opportunities. By assembling and analyzing data from multiple sources, business leaders can discover trends and patterns that might have been impossible to find before.
Data on sales, expenses, and other business activities is in constant motion. Every second, transactional systems generate and store new data, and organizations add new data sources all the time. Integration brings in fresh data for analysis as it's needed.
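One common way pipelines keep up with sources that never stop changing is incremental extraction: persist a high-water mark, such as the latest updated_at value already copied, and pull only rows newer than it on each run. The sketch below illustrates the idea with an in-memory list standing in for a source table; the column names and data are hypothetical.

```python
# Hypothetical source rows; in practice these would come from a transactional database.
source_rows = [
    {"id": 1, "total": 40, "updated_at": "2019-06-01T10:00:00"},
    {"id": 2, "total": 55, "updated_at": "2019-06-02T09:30:00"},
    {"id": 3, "total": 12, "updated_at": "2019-06-03T14:45:00"},
]

def extract_incremental(rows, high_water_mark):
    """Return only rows changed since the last successful sync,
    plus the new high-water mark to persist for the next run."""
    fresh = [r for r in rows if r["updated_at"] > high_water_mark]
    new_mark = max((r["updated_at"] for r in fresh), default=high_water_mark)
    return fresh, new_mark

# The first run copies everything; later runs copy only what changed since the bookmark.
batch, bookmark = extract_incremental(source_rows, high_water_mark="2019-06-01T23:59:59")
print(len(batch), bookmark)
# 2 2019-06-03T14:45:00
```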
As you think about the data integration process, begin with a checklist to ensure that you don’t overlook any business or technical considerations. Here is an example of a basic checklist:
This checklist is not exhaustive, and every organization will have unique considerations. But using a similar checklist to begin the data integration process helps get stakeholders on the same page, and may spur conversations that generate useful ideas.
Building your own data integration solution may seem tempting if you're facing a smaller project with few data sources. In reality, however, creating an ETL platform from scratch requires no small amount of software development expertise, and it adds another in-house application that needs ongoing maintenance.

It doesn't have to be this way. Stitch is a cloud ETL service that provides connectors from more than 90 sources to the most popular data warehouse destinations, making it easy to extract your data and load it into an EDW in a flash. Our approach is simple, straightforward, and ready to go right out of the box. Give Stitch a try, on us: set up a free trial in minutes and get your data into an EDW today.