“There I was…” happy as a clam, rolling right along in my free webinar titled “Introduction to Lifting and Shifting SSIS to the Cloud” (you can view the recording if you missed it [registration required]). I opened Catalog Browser to show off this free utility from DILM Suite and… well, guess what my code did?
“Exactly what you told it to do, Andy!”
I found a bug in Catalog Browser – and in the next version of Catalog Compare (they share the same Catalog visualization code base). I fixed the bug this fine Saturday morning and I can best describe it by showing you the previous and current versions side-by-side (click to enlarge):
As you can see from the SQL Server instance textboxes, I am connected to the same SSIS Catalog in both images. v0.6.2.0 had a bug in the way it surfaced connection references. In my defense, the logic for connection references is among the most complex functionality in Catalog Browser.
Something very similar to the left image happened during my webinar. I was using Catalog Browser to display the Values Everywhere feature – which I like a lot – and I got duplication in some of the nodes. Not just any duplication either, mind you, it was very… multiple-ish – maybe even exponential-y – duplication. As a result, I was surprised during my presentation and even thought for a minute the ADF Integration Runtime Catalog had been updated since my previous tests.
Then I thought about it some more and could not recall testing Values Everywhere with multiple connection reference mappings. That bugged me. So I decided to fix it post haste.
It’s fixed in both the latest release of Catalog Browser and the upcoming release of Catalog Compare.
Catalog Browser is free. It’s part of the DILM Suite (DILM stands for Data Integration Lifecycle Management) which is a collection of mostly free – and some open source – utilities that support managing SSIS in the enterprise. I wrote about several DILM Suite solutions in my latest book:
As I mentioned, Catalog Browser connects to all versions of the SSIS Catalog. At the time of this writing, “all versions” includes 2012, 2014, 2016, 2017, and Azure Data Factory Integration Runtime.
Yesterday, when I logged into my Azure Portal to set up an Azure Data Factory Integration Runtime for my presentation, the balance for my Azure free subscription (which allows me $100USD per month) was sitting around $82USD. This morning, after leaving a demo instance of ADFIR running overnight, my balance has dropped:
I absolutely love the cloud! This is why I’ve been delivering webinars about the cloud and blogging and presenting about Azure Data Factory.
I would just like to clarify a few things:
The Cloud Is Not Free
This much is true. And while the cloud can save enterprises – especially smaller enterprises – lots of cash on “burst-y” workloads via scale-on-demand, the cloud can actually cost more money when unneeded services are accidentally left running.
The Cloud Will Not Fix Bad Code
In a previous consulting life, a customer contacted us and asked for an evaluation of their architecture. They were attempting to scale the business and encountering… obstacles. A team looked over their software and database designs and recommended a rewrite of their custom code. We supplied a proposal (including an expensive estimate) to deliver the re-architecture, redesign, and rewriting of their code.
They felt the price was too high.
We understood, so we countered with a proposal to coach their team to deliver the same result. This would cost less, the outcome would be the same, and their team would grok the solution (because their team would build the solution). The catch? It would take longer.
They felt this solution would take too long.
We understood, so we offered to help in any way they would allow. The company sought help from other firms. A competing consulting firm offered to migrate their code and databases to a cloud provider. The company liked this solution and embarked on a migration.
The migration succeeded. It took longer than expected. The company hemorrhaged cash and clientele throughout the process, and is now a fraction of its previous size in people, clients, cash flow, and profit.
The lesson? Migrating to the cloud wasn’t a bad move for them. Ignoring the advice of experienced architects was a bad move for them.
Did this client need to move to the cloud? Perhaps. More than anything, this client needed to pay off a tremendous amount of technical debt. By the time we were called, they were well into technical inflation and slipping into a technical depression.
Yes, the cost of a fix was high. No, the cost of a fix wasn’t as high as the price they have paid for not fixing their code. And…
… the cloud did not fix their bad code.
This was heartbreaking to witness. We could have helped them. Why were we unable to help them? We told them the truth; our competition told them what they wanted to hear.
See, bad contracting companies have a huge sales force that goes out and asks customers, “Hey, what problems are you having? Sure, we can solve those. Just sign here.” They make impossible promises about their – well, YOUR – capabilities and timelines.
Brent is right. This is why I ask, “Are you getting the best technical help available? Or are you being sold technology by the best sales people?” There is sometimes a difference between consultants who are good at sales (or who hire good sales people) and consultants who are good at technology.
I’m going to take that one step further:
The very best technologist – the one most able to solve your problem for the lowest cost – may be someone who personally irritates you and who charges an hourly rate higher – perhaps multiples higher – than the competition.
Allow me to be (more) blunt:
Did the company in the (true) account above save money?
Did the company in the (true) account above believe they were going to save money?
The client did the math. I’m writing this post to beg you to also do the math. I’m writing this post to ask you to buy consulting services (if you need them) and not just be sold consulting services (if you need them). I’m writing this post to let you know that the cloud is awesome. But the cloud is not free.
“How Much Does Azure Cost?”
I hear this question a lot and it is difficult to answer. You can certainly browse over to the Azure Pricing Page to learn more. The difficulty is estimating how many __U’s you are going to use (see what I did there?).
How does an enterprise lift and shift SSIS projects from SSIS Catalogs on-premises to the Azure Data Factory Integration Runtime?
What are the best practices?
Do SSIS lifecycle management tools exist to support SSIS lift and shift?
Join Andy Leonard – SSIS author, trainer, and consultant – to learn the answers to these questions. In this webinar, Andy discusses and demonstrates migrating SSIS to the Azure Data Factory Integration Runtime.
You may want to hire Enterprise Data & Analytics to help your enterprise lift and shift SSIS to Azure Data Factory because we grok ADF Integration Runtime.
You can connect to the Azure Data Factory (ADF) dashboard here. After you connect…
1. Navigate to the Author Page
The Author page is indicated by the pencil icon on the left side of the ADF dashboard:
On the Author page you may add ADF resources by clicking the “+” symbol beside the search textbox. The dropdown menu contains three items at the time of this writing: Pipeline, Dataset, and Copy Data. Select Pipeline:
2. Drag an Execute SSIS Package Activity Onto the Surface
Drag an Execute SSIS Package activity onto the pipeline surface, then click it to edit it using the tabs shown below the Azure Data Factory Integration Runtime (ADFIR) surface. Configure the Execute SSIS Package activity name on the General tab:
3. Configure the SSIS Package to Execute
Configure the Execute SSIS Package activity’s SSIS Package on the General tab. Set the Catalog Package Path property to <folder>/<project>/<package> as shown below (click to enlarge):
I can hear you thinking, “But Andy, what if I don’t know the Package Path?”
Fear not. The next release of Catalog Browser – a free utility from the DILM Suite (Data Integration Lifecycle Management Suite) – displays the Catalog Package Path when you click on an SSIS Package:
Please note: Catalog Browser displays backslashes in the Catalog Package Path and ADFIR’s Execute SSIS Package activity expects forward slashes.
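The conversion is mechanical – swap the slash direction. A minimal sketch (the helper function name is mine, not part of Catalog Browser or ADF):

```python
def to_adf_package_path(catalog_path: str) -> str:
    """Convert a Catalog Browser-style path (backslashes) to the
    <folder>/<project>/<package> form the Execute SSIS Package
    activity expects (forward slashes)."""
    return catalog_path.replace("\\", "/")

# Example: a path copied from Catalog Browser
print(to_adf_package_path(r"Test\LiftAndShift\Package.dtsx"))
# -> Test/LiftAndShift/Package.dtsx
```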
The next version of Catalog Browser is scheduled for release on or before 1 Jul 2018.
That’s it. A quick Debug execution will verify the ADF pipeline executes:
Use SQL Server Management Studio (SSMS) to connect to the server hosting your Azure Data Factory Integration Runtime. View the All Executions report and confirm the execution happened and succeeded:
In this webinar I demonstrate and discuss migrating SSIS projects from on-premises to the Azure Data Factory Integration Runtime. In addition I cover some best practices and demonstrate some tools of the trade.
What if I told you, “You can quickly and easily deploy Integration Runtime SSIS projects to Azure Data Factory?” You can! As shown in the image, you can simply right-click the SSIS project name in SQL Server Data Tools (SSDT) Solution Explorer and click Deploy to deploy the project to the Azure Data Factory (ADF) Integration Runtime. You can also start the Integration Services Deployment Wizard by navigating to an ISPAC file using Windows Explorer and double-clicking it.
In Part 0 we learned how to create and configure an Azure Data Factory Integration Runtime. In this post we connect to ADFIR, deploy an SSIS project, and examine options for viewing the contents of the ADFIR SSIS Catalog.
First, Deploying SSIS
Follow the Wizard… the Integration Services Deployment Wizard, that is. Connect to the Destination and create a Catalog folder if needed:
Next comes Validation. Be sure to address any errors that appear here:
Review is next:
And then, finally, Deploying Integration Runtime SSIS projects is as easy as deploying SSIS projects to an SSIS Catalog on-premises:
There’s really very little difference between deploying an SSIS project to an on-premises instance of an SSIS Catalog and deploying an SSIS project to an instance of the ADF Integration Runtime.
That. Is. Awesome!
What Happens When You Deploy Integration Runtime SSIS Projects?
Remember all the waiting back in Part 0? ADF was creating a copy of the SSISDB database – which contains all the goodies for the SSIS Catalog. That’s right – there’s a real, live copy of an SSIS Catalog stored in the database on the server you created in Part 0. You can view it in the very same way you would view an SSIS Catalog that resides on-premises: using the Integration Services Catalogs node in Object Explorer in SQL Server Management Studio (SSMS).
Viewing the Catalog in SQL Server Management Studio (SSMS)
Open SSMS and begin connecting to a database:
Before you click Connect, click the Connection Properties tab and change the “Connect to Database” property to SSISDB:
You should now be able to view the ADF SSIS Integration Runtime using the SSMS Object Explorer’s Integration Services Catalogs node:
If you do not change the “Connect to Database” property to SSISDB, you can still connect to the Azure SQL DB instance, but you will not see the Integration Services Catalogs node. Instead, you will see nodes for Databases and Security:
All is not lost. Simply reconnect – this time remembering to change the “Connect to Database” property to SSISDB on the Connection Properties tab.
Viewing the Catalog with SSIS Catalog Browser
You can also use SSIS Catalog Browser (it’s free!) from DILM Suite to surface contents of the SSIS Integration Runtime. Simply connect to the ADFIR server:
Catalog Browser surfaces rich SSIS Catalog metadata in a unified view:
Viewing the Catalog with SSIS Catalog Compare (v3 Preview)
One last thing…
SSIS Catalog Compare version 3 is in Preview (want a copy? Contact me!) and facilitates direct lift and shift from on-premises SSIS Catalogs to ADF Integration Runtime SSIS Catalogs:
Let’s face it: All the cool kids are playing in the cloud. And why wouldn’t they be? Microsoft Azure gets new features every week… sometimes every day! One shiny feature is Azure Data Factory Integration Runtime (ADFIR) which is new in Azure Data Factory version 2. In this post I will show you one way to create the Integration Runtime.
Connect to Azure and Create a SQL DB
First, connect to the Azure Portal. Click the “Create a resource” link and add a new SQL DB and server. I like to create a new resource group as well because a resource group allows me to group together a bunch of related resources. It’s also easier to tear down once I’m done. (Who wants to pay extra for leaving things running after you’re done tinkering? Anyone? Bueller? Bueller?)
Create an Azure Data Factory
Next, click the “Create a resource” link, select Integration from the topics on the left, and then click Data Factory to create an Azure Data Factory:
Walk through the configuration steps, selecting unique names where required. I tend to choose “lesser” options when they are presented. Again, I do not wish to pay for resources unless and until I need them.
Once your Azure Data Factory is configured you can browse to the Author & Monitor site:
When you reach the ADF Author & Monitor site, click the image and link to “Configure SSIS Integration Runtime”:
Configuring the SSIS Integration Runtime
There are three steps to configure the SSIS Integration Runtime:
1) Set the Name, Location, Node size and number (click to enlarge):
2) Configure SQL Server settings (click to enlarge):
3) Configure Maximum Executions per Node and Advanced Connections settings (click to enlarge):
At the time of this writing it takes 20-30 minutes for the Azure Data Factory Integration Runtime to be created:
It will finally start. Promise:
That’s it! You’ve configured Azure Data Factory’s Integration Runtime.
See Part 1 to learn more about deploying SSIS projects to your shiny new ADFIR!
I use VirtualBox for my hypervisor engine. I started using it years ago when it was the first free hypervisor I found that supported 64-bit guests. In this post I share some thoughts about testing SSIS using a test virtual machine (VM).
There are other awesome hypervisor platforms out there. In setting up a virtual machine (VM) for SSIS testing, I do nothing specific to VirtualBox. You can use Hyper-V or VMWare or AWS or Azure or some other platform.
What you use is not nearly as important as using it.
After creating a virtual machine and installing an operating system, I set up test instances of SQL Server:
Which instances? I set up Dev, Test, QA, and Prod.
A Note About Dev
“SSIS developers require sysadmin or near-sysadmin permissions in the Data Integration Development environment.” – Andy Leonard, circa 2009
Every time I make that statement in a presentation or class I see the DBAs in the room react. Usually they cringe. Bold DBAs respond with questions – legitimate questions such as, “What if the developers destroy the Data Integration Development environment?” “What if they do something wrong?” “What if they do something stupid?”
These are legitimate questions that every DBA should be asking.
The answer – as uncomfortable as it is – is: “The DBA should fix it for them.” Why? Because…
SSIS Development is Software Development
Developers should not deploy their own code. This is a best practice for a good reason. SSIS developers should package SSIS projects for deployment and hand off said packaging to DBAs or a Release Management Team for deployment – especially to Production.
This is the first reason my SSIS Test VM includes four tiers: Dev, Test, QA, and Prod.
“Not the First…”
Ideally we want the person deploying SSIS to Production to practice the deployment prior to deploying to Production. A DBA or Release Team person deploying SSIS to Production without practice is a recipe for rollback. The Pareto Principle applies: if a first pass catches 80% of deployment issues, adding just a single deployment test raises coverage from 80% to 96% (the practice run catches 80% of the 20% that remains).
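That Pareto arithmetic compounds with each practice run. A quick sketch (the 80% catch rate is the assumption from the principle, not a measured figure):

```python
def coverage_after(passes: int, catch_rate: float = 0.8) -> float:
    """Fraction of deployment issues caught after `passes` practice runs,
    assuming each pass catches `catch_rate` of the remaining issues."""
    return 1.0 - (1.0 - catch_rate) ** passes

print(round(coverage_after(1), 2))   # 0.8
print(round(coverage_after(2), 2))   # 0.96  <- one added deployment test
print(round(coverage_after(3), 3))   # 0.992
```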
“Not the First…” (Not a Copy/Paste Error)
The SSIS developer needs to practice deployment using the deployment package. I just checked my FitBit and it’s 2018. We are well beyond the time of “It works on my machine” being an acceptable excuse. It needs to work on the customer’s machine.
“Who’s our customer, Andy?”
As an SSIS developer, you should also be testing deployments. This is the reason there’s more than a single instance – a single tier – on my SSIS test VM. It’s why your SSIS test VM should also have more than one tier and why you need to be a sysadmin in your Dev tier.
If you are not free to fail you are also not free to succeed.
All software is tested. Some intentionally.
Failure is going to happen. Is it going to be you failing? Or will it be your customer or the business?
Test. Create the deployment package. Deploy it. Then test the deployment. Fix any errors. Then test the deployment again. Then fix any errors again. Then test the deployment again. Then test the deployment again. (Again, not a typo…)
“Is It Possible to Test Too Much, Andy?”
SSIS Catalog Compare is designed to assist DBAs and Release Management Teams with SSIS code promotion. Check out this short (1:20) video to learn more:
“Great Andy, But What If I Need To Load In Parallel Using A Framework?”
Enter the SSIS Design Pattern named the Controller Pattern. A Controller is an SSIS package that executes other SSIS packages. Controllers can be serial or parallel, or combinations of both. I’ve seen (great and talented) SSIS architects design frameworks around specific Controllers – building a Controller for each collection of loaders related to a subject area.
There’s nothing wrong with those solutions.
SSIS Framework Community Edition ships with a generic metadata-driven serial controller named Parent.dtsx which is found in the Framework SSIS solution.
Specific Controller Pattern Design
A specific Controller can appear as shown here (click to enlarge):
This controller achieves parallel execution. One answer to the question, “How do I execute packages in parallel?” is to build a specific controller like this one.
It’s simple and straightforward, and it uses the out-of-the-box Execute Package Task.
“All Executions” Catalog Report is… misleading…
“How is the All Executions Report misleading, Andy?”
I’m glad you asked. If you build and deploy a project such as the SerialControllers SSIS project shown here – and then execute the SerialController.dtsx package – the All Executions report lists only a single package execution: SerialController.dtsx (click to enlarge):
We see one and only one execution listed in the All Executions report. If we click on the link to view the Overview report we see each package listed individually:
The All Executions report accurately reflects an important aspect of the execution of the SerialController.dtsx SSIS package. The execution of this package – and the packages called by SerialController.dtsx – share the same Execution ID value. This is not necessarily a bad thing, but it is something of which to be aware.
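You can see the same thing from T-SQL instead of the report. A hedged sketch against the Catalog’s built-in views (the WHERE clause value is a placeholder for your controller package): child packages started by the Execute Package Task log their events under the parent’s Execution ID.

```sql
-- Packages that produced events under a single Catalog execution.
-- Children started via Execute Package Task share the parent's execution_id.
SELECT DISTINCT
       e.execution_id,
       em.package_name
FROM SSISDB.catalog.executions AS e
JOIN SSISDB.catalog.event_messages AS em
  ON em.operation_id = e.execution_id
WHERE e.package_name = N'SerialController.dtsx'
ORDER BY e.execution_id, em.package_name;
```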
Specific Controller Design in SSIS Framework Community Edition
A specific Controller built using SSIS Framework Community Edition can appear as shown here:
This controller uses Execute SQL Tasks instead of Execute Package Tasks. The T-SQL in the Execute SQL Tasks calls a stored procedure named custom.execute_catalog_package that is part of SSIS Framework Community Edition.
One answer to the question, “How do I execute packages in parallel using SSIS Framework Community Edition?” is to build a Controller.
The SSIS Catalog All Executions report is accurate
The All Executions report is no longer misleading. If you build and deploy a project such as the SerialControllersInFrameworkCE SSIS project shown here – and then execute the SerialControllerInFrameworkCE.dtsx package – the All Executions report lists each package execution (click to enlarge):
We now see one execution listed in the All Executions report for each package. As before, All Executions accurately reflects an important aspect of the execution of the SerialControllerInFrameworkCE.dtsx SSIS package: The execution of the Controller and each child package now have distinct Execution ID values.
When using specific Controllers with an SSIS Framework it’s common to create a single-package SSIS Application that simply starts the Controller, and then let the Controller package call the child packages. Parent.dtsx in SSIS Framework Community Edition is a generic metadata-driven Controller, but it doesn’t mind executing specific Controllers one bit!
Once Upon A Time…
Not too long ago, Kent Bradshaw and I endeavored to add automated parallel package execution to our Commercial and Enterprise SSIS Frameworks. We achieved our goal, but the solution added so much complexity to the Framework and its associated metadata that we opted to not market the implemented solution.
Why? Here are some reasons:
Starting SSIS packages in parallel is very easy to accomplish in the SSIS Catalog. The SYNCHRONIZED execution parameter is False by default. That means we could build a controller package similar to the SerialControllerInFrameworkCE.dtsx SSIS package – with precedence constraints between each Execute SQL Task, even – and the SSIS Catalog would start the packages in rapid succession. In some scenarios – such as the scenario discussed in this post (from which the current post was derived) – this then becomes a race condition engine.
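For context, SYNCHRONIZED is a system execution parameter on the Catalog’s built-in stored procedures. A sketch of starting one package (folder, project, and package names are placeholders) – with SYNCHRONIZED left at its default of False, start_execution returns immediately, which is exactly how a controller becomes a race condition engine:

```sql
DECLARE @execution_id BIGINT;

EXEC SSISDB.catalog.create_execution
     @folder_name  = N'MyFolder',
     @project_name = N'MyProject',
     @package_name = N'Child1.dtsx',
     @execution_id = @execution_id OUTPUT;

-- SYNCHRONIZED defaults to False (0): start_execution returns immediately.
-- Setting it to True (1) makes start_execution block until the package completes.
EXEC SSISDB.catalog.set_execution_parameter_value
     @execution_id,
     @object_type     = 50,  -- system parameter
     @parameter_name  = N'SYNCHRONIZED',
     @parameter_value = 1;

EXEC SSISDB.catalog.start_execution @execution_id;
```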
A Race Condition Engine?
Yes. Because controlling only when packages start is not enough to effectively manage race conditions. To mitigate the race condition described in this post I need to make sure the dimension loaders complete before starting the fact loaders. A (much simplified) Controller for such a process could appear as shown here (click to enlarge):
I grabbed this screenshot after the dimension loader and some (vague) pre-operations process have completed in parallel but while the fact loader is still executing. Please note the combination of the Sequence Container and precedence constraint which ensure the fact loader does not start executing until the dimension loader execution is complete. The sequence container creates a “virtual step” whereby all tasks within the container must complete before the sequence container evaluates the precedence constraint. Since each task inside this container starts an SSIS package (and since the SYNCHRONIZED execution parameter is set to True by default in SSIS Framework Community Edition), nothing downstream of this container can begin executing until everything inside the container has completed executing. This is how we avoid the race condition scenario described earlier.
How does one automate this process in a framework?
It’s not simple.
The method Kent and I devised was to create and operate upon metadata used to define and configure a “virtual step.” In SSIS Framework Community Edition the Application Packages table is where we store the Execution Order attribute. We reasoned if two Application Package entities shared the same value for Execution Order, then the associated package(s) (I’m leaving out some complexity in the design here, but imagine executing the same package in parallel with itself…) compose a virtual step.
In a virtual step packages would start together, execute, and not proceed to the next virtual step – which could be another serial package execution or another collection of packages executing in parallel in yet another virtual step – until all packages in the current virtual step had completed execution. Here, again, I gloss over even more complexity regarding fault tolerance. Kent and I added metadata to configure whether a virtual step should fail if an individual package execution failed.
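The virtual-step grouping itself is simple, even if the execution machinery is not. A rough sketch (the metadata shape and names here are mine, not the Framework’s actual tables): group Application Packages by Execution Order, start each group together, and do not advance until the whole group completes.

```python
from itertools import groupby

# Hypothetical Application Packages metadata: (execution_order, package_name)
application_packages = [
    (10, "DimCustomer.dtsx"),
    (10, "DimProduct.dtsx"),   # same order -> same virtual step (parallel)
    (20, "FactSales.dtsx"),    # next virtual step waits for step 10
]

def virtual_steps(packages):
    """Yield (execution_order, [package names]) virtual steps in order."""
    ordered = sorted(packages, key=lambda p: p[0])
    for order, group in groupby(ordered, key=lambda p: p[0]):
        yield order, [name for _, name in group]

for order, names in virtual_steps(application_packages):
    # In the real framework: start these in parallel, wait for all to finish,
    # then (depending on fault-tolerance metadata) proceed or fail the step.
    print(order, names)
```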
This was but one of our designs (we tried three). We learned managing execution dependency in a framework is not trivial. We opted instead to share the Controller pattern.
We Learned Something Else
While visiting a client who had deployed the Controller Pattern, we noticed something. The client used a plotter to print large copies of Controller control flows and post them on the walls outside his cubicle.
When we saw this we got it.
The tasks in the Controller’s control flow were well-named. They were, one could say, self-documenting. By posting updated versions of the Controller control flows whenever the design changed, the data engineer was informing his colleagues of changes to the process.
He didn’t need to explain what had changed. It was obvious to anyone stopping by his cubicle for a few minutes. Brilliant!
In this post I wrote about some data integration theory. I also answered a question I regularly receive about performing parallel loads using SSIS Framework Community Edition. I finally covered some of the challenges of automating a solution to manage parallel execution of SSIS packages in a Framework.
Note: much of this material was shared in an earlier post. I broke the Controller Pattern out into its own post because the other one was really too long.