Snowflake is a cloud-native platform that eliminates the need for separate data warehouses, data lakes, and data marts, allowing secure data sharing across the organization. Its platform sits on public clouds and allows organizations to easily unify and connect to a single copy of all their data.
In 2020, Snowflake unveiled the Snowflake Data Cloud as the next step in its effort to help organizations simplify data management and get more value from their data. It creates an ecosystem of businesses and organizations that can share and consume shared data and data services.
The Snowflake Data Cloud addresses common data challenges for businesses, such as access, availability, and performance. It aims to democratize data and break down data silos to improve business performance.
Snowflake is built on top of the Amazon Web Services, Microsoft Azure, and Google Cloud infrastructure. There's no hardware or software to select, install, configure, or manage, so it's ideal for organizations that don't want to dedicate resources for setup, maintenance, and support of in-house servers. And data can be moved easily into Snowflake using an ETL solution like Stitch.
But what sets Snowflake apart is its architecture and data sharing capabilities. The Snowflake architecture allows storage and compute to scale independently, so customers can use and pay for storage and computation separately. The sharing functionality makes it easy for organizations to quickly share governed and secure data in real time.
The Snowflake Data Cloud supports multiple data workloads, including data warehouses, data lakes, data engineering, data science, and data applications across cloud providers. Its architecture delivers real-time, near-unlimited storage and computing to concurrent users.
The Snowflake architecture consists of three layers, each of which is independently scalable: storage, compute, and cloud services. Because each layer scales on its own, the architecture stays flexible as data volumes and workloads grow.
Snowflake decouples the storage and compute functions, so organizations with high storage demands but less need for CPU cycles (or vice versa) don't have to pay for an integrated bundle that covers both. Users can scale up or down as needed and pay only for the resources they use. Storage is billed by terabytes stored per month, and computation is billed on a per-second basis.
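To make the separate billing concrete, here is a minimal sketch of how the two charges add up for a storage-heavy and a compute-heavy workload. The `Rates` values are hypothetical placeholders, not Snowflake's actual prices, which vary by edition, region, and contract; the function and field names are also just for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical prices; real rates depend on edition, region, and contract."""
    storage_per_tb_month: float
    compute_per_second: float


def monthly_bill(tb_stored: float, compute_seconds: float, rates: Rates) -> dict:
    """Return storage and compute charges as separate line items."""
    storage = tb_stored * rates.storage_per_tb_month
    compute = compute_seconds * rates.compute_per_second
    return {"storage": storage, "compute": compute, "total": storage + compute}


# Placeholder rates chosen only to make the arithmetic easy to follow.
rates = Rates(storage_per_tb_month=25.0, compute_per_second=0.001)

# Storage-heavy workload: 100 TB stored, one hour of compute in the month.
storage_heavy = monthly_bill(tb_stored=100, compute_seconds=3600, rates=rates)

# Compute-heavy workload: 1 TB stored, 50 hours of compute in the month.
compute_heavy = monthly_bill(tb_stored=1, compute_seconds=50 * 3600, rates=rates)

print(storage_heavy)
print(compute_heavy)
```

Because the two line items are computed independently, the storage-heavy workload pays almost nothing for compute, and the compute-heavy workload pays almost nothing for storage.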
The database storage layer holds all data loaded into Snowflake, including structured and semi-structured data. Snowflake automatically manages all aspects of how the data is stored: organization, file size, structure, compression, metadata, and statistics. This storage layer runs independently of compute resources.
Snowflake’s compute layer is made up of virtual warehouses that execute data processing tasks required for queries. Each virtual warehouse (or cluster) can access all the data in the storage layer, then work separately, so the warehouses do not share — or compete for — compute resources. This enables nondisruptive, automatic scaling, which means that while queries are running, compute resources can scale without the need to redistribute or rebalance the data in the storage layer.
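Finally, Snowflake’s cloud services layer uses ANSI SQL and coordinates the entire system. It eliminates the need for manual data warehouse management and tuning. Services in this layer include authentication, infrastructure management, metadata management, query parsing and optimization, and access control.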
Snowflake is built specifically for the cloud, and it's designed to address many of the problems found in older, hardware-based data warehouses, such as limited scalability, data transformation issues, and delays or failures due to high query volumes. Here are five ways Snowflake can benefit your business:
The elastic nature of the cloud means if you want to load data faster, or run a high volume of queries, you can scale up your virtual warehouse to take advantage of extra compute resources. Afterward, you can scale down the virtual warehouse and pay for only the time you used.
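As a rough sketch of what scaling up and back down can look like, the helper below builds the `ALTER WAREHOUSE ... SET WAREHOUSE_SIZE` statements you would send to Snowflake before and after a heavy load. The warehouse name `load_wh` and the helper itself are hypothetical, the size list is a subset chosen for illustration, and actually executing the statements requires a Snowflake connection that is not shown here.

```python
import re

# A subset of warehouse sizes, kept short for illustration.
SIZES = ("XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE")

# Conservative check for unquoted identifiers so names are safe to interpolate.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def resize_warehouse_sql(warehouse: str, size: str) -> str:
    """Build an ALTER WAREHOUSE statement that changes the warehouse size."""
    if not _IDENTIFIER.fullmatch(warehouse):
        raise ValueError(f"unsupported warehouse name: {warehouse!r}")
    size = size.upper()
    if size not in SIZES:
        raise ValueError(f"unsupported warehouse size: {size!r}")
    return f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = {size}"


# Scale up before a large load, then scale back down afterward.
scale_up = resize_warehouse_sql("load_wh", "large")
scale_down = resize_warehouse_sql("load_wh", "xsmall")

print(scale_up)
print(scale_down)
```

Validating the name and size before building the string keeps the helper from producing malformed or unsafe SQL if the inputs come from configuration or user input.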
You can combine structured and semi-structured data for analysis and load it into the cloud database without the need for conversion or transformation into a fixed relational schema first. Snowflake automatically optimizes how the data is stored and queried.
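One way to picture querying semi-structured data without a fixed schema: assuming JSON documents were loaded into a single VARIANT column, Snowflake's path syntax (`column:key.subkey::type`) can pull out fields at query time. The sketch below only builds such a query string; the table `raw_events`, the column `payload`, the field names, and the helper functions are all hypothetical.

```python
import re

# Conservative check so every name is safe to interpolate into SQL text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(name: str) -> str:
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def variant_field(column: str, path: list, cast: str) -> str:
    """Build a path expression such as payload:customer.name::string."""
    if not path:
        raise ValueError("path must contain at least one key")
    keys = ".".join(_check(key) for key in path)
    return f"{_check(column)}:{keys}::{_check(cast)}"


def build_query(table: str, column: str, fields: dict) -> str:
    """Build a SELECT that extracts typed fields from a VARIANT column.

    `fields` maps an output alias to a (path, cast) pair.
    """
    selected = ", ".join(
        f"{variant_field(column, path, cast)} AS {_check(alias)}"
        for alias, (path, cast) in fields.items()
    )
    return f"SELECT {selected} FROM {_check(table)}"


query = build_query(
    "raw_events",
    "payload",
    {
        "customer_name": (["customer", "name"], "string"),
        "purchase_total": (["purchase", "total"], "number"),
    },
)
print(query)
```

The raw JSON stays as loaded; the shape of the result is decided by the query, so new fields in the source data do not require changing a table schema first.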
With a traditional data warehouse and a large number of users or use cases, you could experience concurrency issues (such as delays or failures) when too many queries compete for resources.
Snowflake addresses concurrency issues with its unique multi-cluster architecture: Queries from one virtual warehouse never affect the queries from another, and each virtual warehouse can scale up or down as required. Data analysts, engineers, and scientists can get what they need, when they need it, without waiting for other loading and processing tasks to complete.
Snowflake's architecture enables data sharing among Snowflake Data Cloud users. It also allows organizations to seamlessly share data with any data consumer — whether they are a Snowflake customer or not — through reader accounts that can be created directly from the user interface. This functionality allows the provider to create and manage a Snowflake account for a consumer.
Snowflake is distributed across availability zones of the platform on which it runs (AWS, Google Cloud, or Azure) and is designed to operate continuously and tolerate component and network failures with minimal impact on customers. It is SOC 2 Type II certified, and additional levels of security, such as support for PHI data for HIPAA customers and encryption across all network communications, are available.
The Snowflake Data Cloud is ideal for data science, data engineering, and data analytics teams as they source and share data for business intelligence, product development, and other business decision making. It’s easy to use and supports citizen users in several ways:
Snowflake uses SQL and features APIs for Python, Java, and other programming languages. It is versatile and can connect to leading applications and systems to support data management across all industries. Always working to be more inclusive and useful to a wider audience, Snowflake has also created a new developer experience, Snowpark.
Snowpark is a developer experience that enables developers to write code in their preferred language and run it directly on Snowflake. It exposes interfaces in Python, Scala, and Java to supplement Snowflake’s original SQL interface and to support a wider range of developers in building the applications and solutions they need. Snowpark is often seen as a machine learning and data science framework that combines the power of SQL with the flexibility of Python; it can be used to train machine learning models.
Snowflake offers the Snowflake Marketplace, powered by Snowflake Data Sharing, which enables organizations to securely offer, discover, consume, and share live, governed data and data services at scale while eliminating the cost and latency often associated with traditional marketplaces. Data can be shared internally across business units and departments, and externally with partners and customers. Snowflake customers can access datasets from Zillow, Weather Source, Epsilon, FactSet, and SafeGraph, among numerous other major SaaS providers.
To load data into a Snowflake data repository, companies often use an extract, transform, load (ETL) process. Having the right ETL tool can make this process easier and more efficient. Stitch is a simple, powerful ETL service built for developers. It connects to your first-party data sources and replicates that data to Snowflake, bringing your data ecosystem into one repository. Using Stitch to extract and load data makes migration simple, and users can run transformations on data stored within Snowflake.
As a Snowflake Partner, we make it easy to connect with Stitch from the Snowflake Partner Connect Portal. New users get a free 14-day trial, during which they can move an unlimited amount of data from more than 140 data sources, including popular platforms such as Google Analytics, Google Ads, Shopify, Salesforce, and Stripe.