This is the first time I am delivering this session. It still has that new presentation smell!
Azure Data Factory – ADF – is a cloud data engineering solution. ADF version 2 sports a snappy web GUI (graphical user interface) and supports the SSIS Integration Runtime (IR) – or “SSIS in the Cloud.”
Attend this session to learn: – How to build a “native ADF” pipeline; – How to lift and shift SSIS to the Azure Data Factory integration Runtime; and – ADF Design Patterns to execute and monitor pipelines and packages.
A few years ago I had a conversation with Scott Currie, the CEO of Varigence and inventor of Biml. Scott is one of the smartest people I know and I know a lot of very smart people. Whenever I have the opportunity to communicate with the very smart people I know, I often ask them – based upon their knowledge of me and what I do and any plans I have shared – for advice on stuff I should focus upon moving forward. Scott’s monosyllabic reply to this question?
Over the course of the past couple years I’ve shared – on this blog, even – some of my experiences learning more about Apache Spark. While I expected a little resistance, I was a little surprised by the… intensity… of some of the private messages and emails I received. As I replied to the intense people (and there were only a few), I could stop learning stuff right now, today, and continue working with SSIS for the next decade or so. I believe the same goes for anyone working with SSIS today.
I’m not going to stop learning stuff.
I love SSIS. But I don’t use SSIS because I love it. I love SSIS because it solves a particularly difficult piece of the story of enterprise data: data integration and data engineering. I feel a similar way about writing (and I’ve blogged recently about writing). I write because I like to write, not for clicks or branding or any of the other benefits I glean from writing. I consider those benefits cool, but side-effects of my desire to write.
The same goes for SSIS: I love SSIS because it solves a problem that I want to solve. I hope that makes sense.
How can you learn more about Spark? There are bunches of videos out there on YouTube. YouTube is crowd-sourced and free, which makes it awesome. The quality of YouTube training is crowd-sourced and free, which is a challenge.
I learned a lot from edX. I’m a fan of how edX approaches MOOC (online education). They offer lots of courses for free, which means you can grow your knowledge by investing only time. It’s tough to beat $free. If you want, you can pay for a certificate which you can then add to your online resume or LinkedIn profile like I added this one.
My advice? Spend some time learning. Pick your favorite learning platform and jump in. If you have SQL Server and / or SSIS experience, you already know a lot about the problem enterprises are trying to solve with data engineering tools. That’s a good place to be, but not required.
SSIS Framework Community Edition is free and open source. You may know can use SSIS Framework Community Edition to execute a collection of SSIS packages using a call to a single stored procedure passing a single parameter. But did you know you can also use it to execute a collection of SSIS packages in Azure Data Factory SSIS Integration Runtime? You can!
In this free webinar, Andy discusses and demonstrates SSIS Framework Community Edition – on-premises and in the cloud.
Join SSIS author, BimlHero, consultant, trainer, and blogger Andy Leonard at noon EDT Thursday 20 Sep 2018 as he demonstrates using Biml to make an on-premises copy of an Azure SQL DB.
I demonstrate a cool SSIS Catalog Browser feature that helps ADF developers configure the Execute SSIS Package activity.
To see it in action, download SSIS Catalog Browser – it’s one of the free utilities available at DILM Suite. Connect to the instance of Azure SQL DB that hosts an Azure Data Factory SSIS Integration Runtime Catalog, select the SSIS Package you desire to execute using the Execute SSIS Package activity, and then copy the Catalog Path from the Catalog Browser status message:
Paste that value into the Package Path property of the Execute SSIS Package activity:
You can rinse and repeat – Catalog Browser surfaces Environment paths as well:
It all started when GoDaddy created a DMZ for SQL Server databases. I found this functionality in 2008 and asked myself, “Self, how might we use this?”
Since That Time…
There have been two major iterations of AndyWeather. I use weather data collected during the first iteration for training purposes at SSIS Academy and when delivering training to Enterprise Data & Analytics customers.
The setup of the second iteration is fairly straightforward:
The Acurite Weather Station consists of an instrument pack plus a base station. The instruments collect weather measurements and transmit them to the base station.
The base station is connected to an older e-Machine running Windows 7 Ultimate (32-bit) on 2GB RAM.
An Acurite application interfaces with the base station and the application stores data locally in a single CSV file.
I wrote a very simple C# console application named “abt” (an acronym for “Azure Blob Transfer”) to transfer the CSV file to Azure Blog Storage.
An Azure Data Factory pipeline that loads an Azure SQL DB staging table.
The AndyWeather website which reads the latest weather data from the Azure SQL DB staging table.
I wrote another very simple C# application named “awt” (an acronym for “AndyWeather Tweets”) that tweets updates to the @AndyWeather twitter account.
Acurite Weather Station
The latest iteration began in early 2018 when I purchased an updated package of instruments and a new base station made by Acurite. So far, I like this station a lot. It was less expensive than the previous station and appears more rugged (again, so far – time will tell).
I recently relocated the weather station to improve connectivity between the instruments and the base station. I recorded a Data Driven *DataPoint* about it:
(Pay no attention to the exploding pecans in the background…)
I intentionally use an under-powered PC for the server. Why? I want to learn how the base station – and then everything downstream of the base station – responds to busy server conditions. This is Engineering 101 stuff and I’ve learned a lot:
I love this old machine!
The Acurite people maintain an application for communicating with base stations:
The PC Connect application allows me to configure how and when weather data is collected from the base station – which collects measurements from the instruments. The application lets me configure the units-of-measure and file location – and I can even share my weather data with Weather Underground. How cool is that?
The Azure Blob Transfer Console Application
The Azure Blob Transfer (abt) application is a very simple console application written in C#. It picks up the CSV file containing weather data stored by the Acurite PC Connect application and writes the file to an Azure Blob Storage container:
The CSV file in Azure Blob Storage is overwritten each time abt successfully executes. You can download a copy of the abt solution here.
Azure Data Factory Pipeline
An Azure Data Factory (ADF) pipeline calls a stored procedure that first truncates a staging table in a Azure SQL DB using a Stored Procedure activity, followed by a Copy Data activity that copies the weather data from the CSV file in Azure Blob Storage to an Azure SQL DB staging table:
At the time of this writing, ADF version 2 is current.
You can download the ARM template for the pipeline here.
The AndyWeather Website
The AndyWeather website has been around since the days of the first iteration of AndyWeather – the one that stored data in a SQL Server instance hosted at GoDaddy’s DMZ. It’s fairly straightforward code, which helps it perform fairly on desktops and mobile devices:
The biggest performance hit comes from executing the stored procedure against an Azure SQL DB, which can sometimes take 5-10 seconds to complete.
The AndyWeather Tweets Console Application
I snagged some C# code and a TwitterAPI class from a project named called TweetSharp to help build the awt console application:
It makes me happy every time I see a tweet from @AndyWeather:
I tell people, “It’s just a dumb little app,” but I really had fun building it. I learned a bunch, too!
The AndyWeather IoT solution uses hybrid technology – on-premises instruments and servers, combined with cloud services – to deliver weather data to a website and Twitter account. It’s accessible from social media and the web from desktops and mobile devices.
Just so you know, this isn’t everything I’ve built using the AndyWeather instruments. There’s a bunch more – some of which is still in the experimental phase. I’ll share more as time permits. But I want you all to know, I consider Azure a great big cyber-playground!
I am excited to announce a brand new course (it still has that new course smell) from Brent Ozar Unlimited and honored to deliver it! This one-day, live, online course is titled Fundamentals of Azure Data Factory and it’s designed to introduce you to Azure Data Factory (ADF).
There will be demos. Live demos. Lots of live demos!
Azure Data Factory, or ADF, is an Azure PaaS (Platform-as-a-Service) that provides hybrid data integration at global scale. Use ADF to build fully managed ETL in the cloud – including SSIS. Join Andy Leonard – author, blogger, and Chief Data Engineer at Enterprise Data & Analytics – as he demonstrates practical Azure Data Factory use cases.
In this course, you’ll learn:
The essentials of Azure Data Factory (ADF)
Developing, testing, scheduling, monitoring, and managing ADF pipelines
Lifting and shifting SSIS to ADF SSIS Integration Runtime (Azure-SSIS)
ADF design patterns
Data Integration Lifecycle Management (DILM) for the cloud and hybrid data integration scenarios
To know if you’re ready for this class, look for “yes” answers to these questions:
Do you want to learn more about cloud data integration in Azure Data Factory?
Is your enterprise planning to migrate its data, databases, data warehouse(s), or some of them, to the cloud?
In my experience, people either love or hate triggers in Transact-SQL. As a consultant, I get to see some interesting solutions involving triggers. In one case, I saw several thousand lines of T-SQL in a single trigger. My first thought was, “I bet there’s a different way to do that.” I was right. The people who’d written that trigger was unaware there was another way to solve the problem.
I see that a lot.
Software and database developers do their level best to deliver the assigned task using the tools available plus their experience with the tools. I don’t see malice in bad code; I see an opportunity to mature.
Granted, when I’m called in as a consultant I get paid by the hour. That’s not the case for many folks.
One ETL-Related Use For Triggers
Change detection is a sub-science of data engineering / data integration / ETL (Extract, Transform, and Load). One way to detect changes in a source is to track the last-updated datetime for the rows loaded by an ETL load process.
The next time the load process executes, your code grabs that value from the previous execution and executes a query that returns all rows inserted into or updated in the source since that time.
Simple, right? Not so fast.
Where to Manage Last-Updated
How the value of the last-updated datetime is managed is crucial. I’ve had arguments discussions with many data professionals about the best way to manage these values. My answer? Triggers.
“Why Triggers, Andy?”
I’m glad you asked. That is an excellent question!
Imagine the following scenario: Something unforeseen occurs with the data in the database. In our example, assume it’s something small – maybe impacting a single row of data. The most cost-effective manner to manage the update?
How is the last-updated value managed? It could be managed in a stored procedure. It could be managed in dynamic T-SQL hard-coded inside a class library for the application. If so – and given a data professional is now manually writing an UPDATE T-SQL statement to correct the issue, there is an opportunity to forget to update the last-updated column.
But if the last-updated column is managed by a trigger, it no longer matters if the data professional performing the update remembers or forgets, the trigger will fire and set the last-updated column to the desired value.
Change detection works and all is right with the world.
Troubleshooting triggers is… tough. There are some neat ways to troubleshoot built right into tools like SQL Server Management Studio (which is free). For example, in SSMS you can Debug T-SQL:
I like triggers for the things triggers do best, but triggers can definitely be misused. Choose wisely.
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.