The biggest and most sophisticated software companies on the planet—famously Facebook and Twitter, but others such as Etsy and Spotify as well—have built their own data pipelines, but they've done so by dedicating dozens to hundreds of the best engineers in the world to the problem.
These companies built their own pipelines because, when they first invested in this technology, it was a competitive differentiator for them. No software vendor offered an alternative. No one else had the engineering prowess to do what they did in 2009, 2010, and 2011, and this data sophistication is how they monetized their huge user bases.
Today, there's no compelling reason to build your own pipeline. We believe that the vast majority of online businesses are trying to solve the same data challenges over and over again, spending far too much valuable time and energy doing it, and that they would be far better served by buying a data pipeline than by building their own from scratch.
Ten years ago, it wasn't uncommon for an online business to build their own CRM. This wasn't an unreasonable choice back then—enterprise solutions were a poor fit for many growing companies and Salesforce.com and others hadn't yet provided an effective alternative. But today, businesses don't just decide to build their own CRM. It's too much work, it's too expensive, and the alternatives are just too good. We believe this same transition is happening today for data pipelines.
When setting out to purchase a data pipeline, start by answering three important questions about your data needs:
Every aspect of your organization now has data associated with it, and this data lives in different systems: customer data, transactional data, product usage, web clickstream, advertising, email marketing, CRM, accounting, operational data, and more. Both the breadth of the data you'll need to consolidate and the specific sources that are your highest priority will shape how you set up your data pipeline.
One area where we see leaders get tripped up is assuming that they need a big-bang approach, waiting to roll out a data pipeline until it incorporates all important organizational data. We suggest an incremental approach instead: prioritize the data sources with the highest immediate return and get those up and running, then grow the universe of data you consolidate one source at a time.
This approach requires that you build a comprehensive list of the sources you will eventually want to consolidate before making any purchase decision. Even if a given source isn't a high priority today, you need to be sure that your platform of choice will support it when you get there.
Beyond consolidating all of the data that is relevant to your organization, there are several other criteria you should use when evaluating tools:
Once you've done your research, it's time to make a decision, and that decision will have a significant impact on your organization. Here's our summary of the considerations for your build-vs-buy decision:
Consideration | Build | Buy |
---|---|---|
Technical Control | Higher | Lower |
Cost of Ownership | Higher | Lower |
Development Resources | Internal | External |
Time to Value | Slower | Faster |
Risk of Failure | Higher | Lower |
Analytical Functionality | Lower | Higher |
By this point it's no secret that we strongly discourage you from building your own data pipeline. The cost-benefit equation of this technical decision simply doesn't add up today the way it did even a few years ago. There are plenty of areas in your data strategy where you absolutely should roll up your sleeves and get technical; we caution against doing that here. Building a data pipeline is hard work, and it takes you away from what should be your core focus: growing your business.