So you’ve heard about Azure Data Factory, or ADF, and the idea of automating a large chunk of your data processing in the cloud intrigues you. However, you’re feeling a little daunted by the corporate PR and tech-marketing language that goes with it. For example, what on earth is the “data lineage” that is apparently one of the big selling points for ADF, and should you be getting some for your own business? Relax. The heavy lifting has now been done for you. Read on for an overview of Azure Data Factory that won’t go blowing any mental fuses.

What Does Azure Data Factory Do?

In a nutshell, ADF takes in the data you need from different sources in the cloud and on your own systems, brings the data together, organizes its processing, and hands you back an ‘authoritative and trustworthy’ version. Is this a round journey to nowhere? Not at all. If you are going to work with data to develop business-critical models, forecasts, and insights, your data must be sufficiently complete and consistent. Otherwise your results will be shaky or downright wrong. The preparation of diverse and potentially huge amounts of data takes the right amount of computing power organized the right way. ADF handles this organization as a cloud service. It sends your data through its “pipelines” in the cloud, calling on different cloud resources to clean it and shape it for the next step.

Then What Happens?

The answer to this will depend on what you’re trying to achieve. Let’s suppose you have the bright idea of linking sales promotions in your national chain of supermarkets to the weather. For example, you want to know if you should stock up more on barbecue sets or lawnmowers over a sunny summer weekend. To figure out how weather conditions drive different product sales, you may have to analyze tons of historical sales and weather data. Azure Data Factory will have already prepared all the data so that it can be piped into the next processing stages: analysis (which finds the links between sales and the weather) and then reporting (which tells you what those links are).
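
To give a feel for what that analysis stage might look like once the data has been prepared, here is a minimal sketch in plain Python. It is not part of ADF. The sample records, the field names and the function `average_units_by_weather` are all hypothetical, invented purely for illustration, and a real analysis would use far more data and more careful statistics.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, hand-made records standing in for prepared sales and weather data.
sales = [
    {"date": "day-1", "product": "barbecue set", "units": 40},
    {"date": "day-2", "product": "barbecue set", "units": 12},
    {"date": "day-3", "product": "barbecue set", "units": 44},
    {"date": "day-4", "product": "barbecue set", "units": 10},
]
weather = {"day-1": "sunny", "day-2": "rainy", "day-3": "sunny", "day-4": "rainy"}


def average_units_by_weather(sales_rows, weather_by_date):
    """Join sales to weather on the date, then average units per (product, weather)."""
    grouped = defaultdict(list)
    for row in sales_rows:
        condition = weather_by_date.get(row["date"])
        if condition is None:
            # No weather record for this date, so the sale cannot be linked.
            continue
        grouped[(row["product"], condition)].append(row["units"])
    return {key: mean(units) for key, units in grouped.items()}


# "Analysis" finds the link; the loop below is a very small "report".
report = average_units_by_weather(sales, weather)
for (product, condition), avg in sorted(report.items()):
    print(f"{product} on {condition} days: {avg} units on average")
```

The point of the sketch is only that this step is easy once the sales and weather records share a clean, consistent date field, which is exactly the kind of preparation described above.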

What Does ADF Do that Other Microsoft Solutions Don’t?

Microsoft also offers another data integration solution known as SQL Server Integration Services, or SSIS for short. SSIS is used to produce ‘authoritative and trustworthy’ versions of data held in Microsoft SQL Server databases. It offers ‘Extract, Transform and Load’ (ETL) capabilities. These look similar to the way pipelines in Azure Data Factory take data from different sources (Extract), get it cleaned and shaped (Transform), and produce a version for the next stage of data analysis (Load). ADF, however, does two more things. It handles additional data sources that SQL Server cannot. It also tags and tracks the data from different sources. This is important for reliability and for security, and is known as “data lineage” (such a medieval name for a cutting-edge principle!).
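
For readers who like to see an idea in code, the Extract, Transform and Load pattern can be sketched in a few lines of plain Python. This is an illustration of the general pattern only, not the ADF or SSIS API. The functions `extract`, `transform` and `load`, the sample records and the `source` tag (a very rough stand-in for the idea of tracking where data came from) are all assumptions made up for this example.

```python
# Illustrative ETL sketch in plain Python. This is NOT the ADF or SSIS API;
# every name and sample record here is hypothetical.

def extract():
    """Extract: gather raw records from two made-up sources."""
    cloud_rows = [
        {"store": " Leeds ", "sales": "120"},
        {"store": "York", "sales": "n/a"},
    ]
    on_prem_rows = [{"store": "leeds", "sales": "80"}]
    # Tag each record with its source so its origin stays visible later on.
    return ([dict(row, source="cloud") for row in cloud_rows]
            + [dict(row, source="on_prem") for row in on_prem_rows])


def transform(rows):
    """Transform: tidy up store names and drop rows whose sales are unusable."""
    cleaned = []
    for row in rows:
        try:
            sales = int(row["sales"])
        except ValueError:
            continue  # e.g. "n/a" cannot be used in later analysis
        cleaned.append({
            "store": row["store"].strip().title(),
            "sales": sales,
            "source": row["source"],
        })
    return cleaned


def load(rows, destination):
    """Load: append the cleaned rows to a destination (a list stands in for a table)."""
    destination.extend(rows)
    return destination


warehouse = load(transform(extract()), [])
for row in warehouse:
    print(row)
```

Each function does one job and hands its result to the next, which is the same shape as a pipeline: the output of one step becomes the input of the following one.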

Show Me the Numbers

While your mileage may vary, Microsoft offers an interesting case study concerning its customer Milliman and the use of Azure Data Factory. Milliman provides services to the insurance industry. The company needs to pull in and refine terabytes of data every day, from which to produce models and analyses. Milliman was already using other Azure cloud services, but initially wanted to build its own solution to “stitch together” the different data sources and process them. The cost, however, was high and the deployment time long. A switch to ADF brought IT cost reductions of 30 percent and a reporting time reduction of 95 percent. As a bonus, the company was also able to double its supported insurer customer base.

Is Azure Data Factory for You?

“Try before you buy” is good advice. Remember too that ADF and SSIS (see above) are not necessarily better or worse than each other. SSIS also has features that ADF does not. In general, however, if you are dealing with large quantities of data (big data), and if the cloud and Azure in particular are sources, destinations or both for that data, Azure Data Factory is worth taking for a trial run.
