While there's no single definition of the term "big data," most definitions describe a large dataset, measured in terabytes, petabytes, exabytes, or even zettabytes, and made up of at least several thousand discrete components. An organization can mine and analyze this data to discover patterns or anomalies that lead to insights on which it can base decisions.
Many people don't realize, however, that handling big data isn't just an issue for search engines, media companies, and e-commerce businesses. Nearly every industry generates and collects big data, analyzes it, and uses it as the basis for business decisions that can improve operations, customer satisfaction, and productivity.
In 2001, META Group (now part of Gartner) analyst Doug Laney formalized the concept of big data in a report that predicted "leading enterprises will increasingly use a centralized data warehouse to define a common business vocabulary that improves internal and external collaboration."
Laney's formulation characterized big data by three "V's": velocity, which refers to the speed at which data is processed; volume, which refers to the amount of data in a dataset; and variety, which refers to the different types of data a dataset contains. Since then, pundits have proposed additional big data V's, including veracity, which refers to the accuracy of a dataset, and value, meaning the ability of a dataset to fulfill a given goal.
In the years since Laney's report, corporate IT infrastructures have grown, due in part to the widespread adoption of the internet for e-commerce and social media. Organizations generated and processed large volumes of data as part of their ordinary operations, and many businesses realized they could use that data to better understand their own operations and their customers' needs.
Soon, specialized tools for storing and working with big data, such as Hadoop and Spark, appeared, as did new approaches to storing data, such as NoSQL databases and in-memory databases. Today, we see the migration of big data workflows to the cloud, where it's easy and cost-efficient to scale tools and processes as big data gets bigger by the day.
Big data can impact nearly every imaginable business goal. Here are a few specific ways organizations use big data today:
Banking: Analysis of big data helps banks fight fraud by detecting unusual account or payment activity (see the sketch after this list).
Government: Government agencies examine big data to discover patterns. For instance, the IRS uses big data to uncover tax underpayments, while the City of Boston uses it to find and fix potholes.
Health care: An analysis of big data can help doctors and researchers interpret the results of medical interventions or experiments, and help predict patients' risks for certain types of diseases.
Insurance: The price of automobile insurance is usually based on factors such as the driver's age, location, credit score, claims history, and type of vehicle. But insurers that offer usage-based insurance (UBI) policies can use telematics to access a digital history of a vehicle, including automobile diagnostics and crash avoidance systems, and capture actual driving data via onboard sensors, cameras, and built-in tracking devices. A connected car provides streams of disparate data, including velocity, turns, braking, weather, and road conditions, along with distracted-driving information. The big data generated by telematics lets insurers build data that reflects actual driving behavior into their business operations. Analysts also predict that self-driving cars will generate massive amounts of data; according to a Barclays analyst, a single self-driving car could generate 100GB of data every second.
Law enforcement: Police departments use real-time data and software that integrates, analyzes, and shares otherwise hidden clues from myriad law enforcement data sources in order to anticipate and possibly prevent crimes.
Legislation: Instead of writing laws based on ideas of how people _should_ behave, lawmakers and lawyers can analyze big data to help assess how people _actually_ behave. For example, datasets from court records can help reveal which aspects of a given law are most frequently broken, or how difficult a certain law is for ordinary citizens to understand. In these ways, big data can facilitate the creation of laws that are more effective and easier to enforce.
Manufacturing: Recent studies indicate that, for many manufacturers, unplanned factory downtime can cost a company as much as $260,000 an hour, so predictive maintenance is critical. IoT-based predictive maintenance employs machine learning algorithms to forecast potential risks and predict when equipment is likely to fail.
Retail: When feedback is anecdotal, it's difficult to extrapolate a single customer's views across all existing or potential customers. But with the help of big data, companies can analyze customer experience systematically, collecting survey responses from thousands of customers and identifying trends within the responses.
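To make the banking example above more concrete, here is a minimal sketch of the kind of anomaly detection that can flag unusual payment activity. The account history, threshold, and function name are hypothetical; production fraud detection relies on far richer features and models.

```python
from statistics import mean, stdev

def flag_unusual_transactions(history, incoming, z_threshold=3.0):
    """Flag incoming amounts that deviate sharply from an account's history.

    history     -- past transaction amounts for one account (hypothetical data)
    incoming    -- new amounts to screen
    z_threshold -- how many standard deviations counts as "unusual"
    """
    mu, sigma = mean(history), stdev(history)
    flagged = []
    for amount in incoming:
        z = abs(amount - mu) / sigma if sigma else 0.0
        if z > z_threshold:
            flagged.append((amount, round(z, 1)))
    return flagged

# Hypothetical account history and incoming payments
past_amounts = [42.10, 55.00, 38.75, 61.20, 47.90, 52.30, 44.60, 58.10]
print(flag_unusual_transactions(past_amounts, [49.99, 1250.00, 53.40]))
# -> only the 1250.00 payment is flagged as unusual
```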
Big data is powerful, but collecting it, storing it, and leveraging it can be difficult for organizations. They face challenges with big data in several areas:
Integrating disparate data: Integrating data from different sources can be challenging, even when an organization uses an ETL tool and a cloud data warehouse as a base for analysis. Businesses must adopt a data strategy that addresses the integration and consolidation of disparate data sources.
Ensuring data quality: Large datasets tend to have issues with duplicate, missing, and inaccurate data, which makes it difficult to derive accurate insights. Data professionals can use data quality tools, including deduplication tools that address duplicates in the data source (a simple example is sketched after this list), and they may use automated tools to minimize the risk of human error when moving data between systems.
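As a rough illustration of the deduplication step mentioned above, the sketch below collapses duplicate customer records on a normalized email key and keeps the most recently updated copy. The field names and records are hypothetical; dedicated data quality tools also handle fuzzy matching, survivorship rules, and missing values.

```python
def dedupe_by_email(records):
    """Keep one record per normalized email address, preferring the newest copy.

    records -- list of dicts with hypothetical "email" and "updated_at" fields
    """
    best = {}
    for rec in records:
        key = rec["email"].strip().lower()  # normalize so case and whitespace don't create false duplicates
        if key not in best or rec["updated_at"] > best[key]["updated_at"]:
            best[key] = rec
    return list(best.values())

# Hypothetical customer records pulled from two source systems
customers = [
    {"email": "Ana@Example.com",  "name": "Ana",    "updated_at": "2023-01-10"},
    {"email": "ana@example.com ", "name": "Ana R.", "updated_at": "2023-06-02"},
    {"email": "bo@example.com",   "name": "Bo",     "updated_at": "2023-03-15"},
]
print(dedupe_by_email(customers))  # two unique customers remain, with Ana's newer record kept
```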
Using data analytics and business intelligence (BI) tools with big data has the potential to improve customer experience, increase retention and sales, and optimize back-end processes for managing inventory and labor. Organizations need three basic components to make the most of big data: a way to integrate data from disparate sources, a data warehouse to store it, and analytics or BI tools to derive insights from it.
Stitch can help with the data integration component. It's an easy-to-use ETL tool for replicating data from more than 100 databases and SaaS platforms to cloud data warehouses, centralized and ready for analytics and BI solutions. Take advantage of big data analytics with Stitch today.