Azure SQL Data Warehouse – Elasticity and Cost Control for Cloud Data Warehousing
Is there anything that Azure SQL Data Warehouse doesn’t do for users moving data and analytics into the cloud? As a scalable database service that can process massive volumes of data of any kind, SQL Data Warehouse offers fast provisioning, even faster compute scaling, cost savings by paying only for storage used and compute needed, and innovative security. And if users require additional data processing and analytics options, SQL Data Warehouse is compatible with Azure Machine Learning and Data Factory, as well as SQL Server tools and Power BI.
In particular, the Azure SQL Data Warehouse service:
- Offers rapid elasticity to handle batch-oriented data warehouse workloads. Provisioning can be done in a few minutes and compute power can be scaled in seconds.
- Dovetails with the SQL Server ecosystem. For example, many SQL Server Transact (T-SQL) commands can be used to develop on top of SQL Data Warehouse.
- Improves cost control. Storage and compute resources can be scaled independently of each other. While data stays intact, compute resources can be paused and only resumed and paid for again when they are needed. As a result, storage can be grown or shrunk without affecting compute, while compute can be grown or shrunk without moving data.
- Offers real time analytics. Power BI and Excel are complemented by solutions from popular BI partners, such as Looker Data Sciences, Tableau Software, and Qlik Technologies.
- Provides fault tolerance. The SQL Data Warehouse service is based on the Azure platform and benefits from automatic backups for fault tolerance, as well as easy deployment and efficient maintenance.
- Detects threats and facilitates auditing. Machine learning integrated into Azure SQL Data Warehouse identifies the patterns of your workload and alerts you to potential breaches before they happen. Seamless integration with Azure Active Directory also enables single sign-on and makes sure the right users access the right data.
- Enables distributed analytics over hybrid data sources. Within SQL Data Warehouse, Microsoft has also integrated Polybase, a technology to enable analytics over multiple data resources, including Hadoop clusters. This means you can run a single T-SQL query over a combination of relational and non-relational data.
Inside SQL Data Warehouse
As a distributed database system, the SQL Data Warehouse service uses massively parallel processing (MPP.) Working over multiple nodes, the service divides and conquers to tackle complex queries and workloads:
- Control node. This is the front end that manages and optimizes queries. It is powered by and acts like SQL Database. Your T-SQL query to the SQL Data Warehouse is then converted by the Control node into separate queries running in parallel on the Compute nodes (see below.)
- Compute Nodes. As their name suggests, these nodes, working in parallel, run your query and pass the results back to the Control node, which assembles and returns the final result to you.
- Storage. Your data is kept in Azure Blob storage, which is stable, replicated, and low cost. Azure storage is completely scalable and so, therefore, is SQL Data Warehouse. The Compute nodes read and write directly from and to the blob storage.
- Data Movement Service (DMS). This service moves data from one node to another for join and aggregation operations. DMS is a Windows service, not an Azure service, and is not accessed directly when using SQL Data Warehouse, but is tapped automatically for the data movements needed to run queries in parallel.
Easy Cost Control and Creation
Being able to turn compute on or off instantly is a powerful way to control costs. Consumption is measured in Data Warehouse Units (DWUs.) The number of DWUs can be modulated by simply moving a slider in the Azure portal, or by using T-SQL and PowerShell commands. Microsoft suggests an initial setting of 400 DWUs, while recommending the use of Azure SQL Data Warehouse for data sets of 1 TB or more (petabyte volumes are handled too), to leverage the true scalability and power available. You can also predict performance changes for a given DWU level before altering the size of your data warehouse.
Creation of an SQL Data Warehouse is also very simple, using your account in the Azure Portal. After the basic information of server selection, database name, performance (number of DWUs), subscription to be billed, resource group, and source selection has been filled in, the SQL Data Warehouse is created and displayed in your Azure Portal dashboard a few minutes afterwards.
By Jason Milgram, Director Software Development & Microsoft Azure MVP, Champion Solutions Group / MessageOps