In my first post in this series, SSIS, Docker, and Windows Containers, Part 0 – Getting Started, I covered:
- Installing docker
- Choosing a Container OS
- Pulling a test image
- Pulling a Windows Container with SQL Server installed
- interacting with with your instantiated container
In this post I reveal my ultimate goal and then travel down two rabbit holes to explain:
1. How I got here; and
2. One way to get answers to questions no one wants to hear.
My Ultimate Goal
My ultimate goal is to use one or more Windows containers running SQL Server to test SSIS. I can hear some of you thinking,..
My response? “Cuz.”
And, “I do what I want!”
Seriously, since I hopped the fence from developer to data I’ve dreamed of the day when I could practice lifecycle management with data-stuff like I used to practice lifecycle management with software development.
I recognize the obstacles. The greatest obstacle (in my humble opinion) is software is mostly stateless these days (these days started with Object-Oriented Programming and include its descendants). Stateless development solves lots of engineering problems in lifecycle management, and by “solves a lot of engineering problems” I mean some engineering problems simply don’t exist so lifecycle management for stateless stuff can simply ignore “a lot of engineering problems.”
A database, on the other hand, is all about that state. When it comes to managing lifecycle for a stateful platform – like a database – ACID gets dumped on many lifecycle management tools and solutions (see what I did there?).
Is it possible to manage a data-related lifecycle using stateless tools? Yes. But here there be obstacles. Let’s look at on use case:
Use Case 0
Your team releases an update that adds one column to one table. The column receives data. Someone realizes there’s a bug and determines a rollback is the solution. The web and middle-tier teams successfully rollback the release using functionality built into the release-management / DevOps enterprise tool.
What happens to the column in the table?
- Is it dropped?
- Is the data deleted?
- Since this isn’t a post about data lifecycle management, let’s not even consider what happens if the data is considered personal-identifying-information or some tuple otherwise subject to privacy regulations.
You see the problem? The answer is always going to be, “It depends.”
This is the problem I am trying to solve, specifically for data integration.
I want to be able to easily deploy and rollback data integration deployments. But to accomplish this, I need more metadata (data about the data) and a bunch more automation. Why? My stuff has state – by definition.
If you have not yet figured this out, Microsoft is a software development shop. They’re people trying to use software to solve business problems for customers – just like you and me. Just like you and me, they have schedules and deadlines and competition and a need to generate a profit. in other words, Microsoft is a business.
When some redneck from Farmville Virginia starts asking all sorts of out-of-scope questions, he sometimes hears, “That’s not supported, Andy,” in response. Ask me how I know.
So it helps to be
stubborn persistent. Persistence sometimes takes the form of thinking of another approach. When I began probing for ways to test SSIS in containers, my ultimate goal was to have the tests simulate Production as closely as possible. In 2019, Production SSIS means executing SSIS in the SSIS Catalog.
Is it possible to execute Production SSIS from the file system or the MSDB database in 2019? It is. I’ve not found a good use case for executing SSIS outside the Catalog, though – not in 2019. I am not trying to hurt anyone’s feelings, I promise. I’m just saying that the SSIS Catalog has been out for north of six years at the time of this writing. Plus: The only way to execute Azure-SSIS packages is via an SSIS Catalog in the cloud (again, at the time of this writing).
At Enterprise Data & Analytics we help clients migrate from file- / MSDB-based SSIS execution to the SSIS Catalog (both on-premises and Azure-SSIS). Some clients realize the benefits of SSIS Catalog-based execution and call us for help with training and consulting. Others ask for help performance-tuning SSIS, and we almost always recommend using the SSIS Catalog because SSIS packages almost always execute faster when executed from the Catalog. When do we not recommend using the SSIS Catalog? Some enterprises have designed their data environments in such a way that using the SSIS Catalog would require a major shift. The technical portion of the shift would be relatively straightforward, but – often – the cultural shift is more disruptive. Not all enterprises are willing to make such a cultural shift; to endure the disruption.
Getting Around “Not Supported”
When I started asking questions like, “Is it possible to run the SSIS Catalog in a Windows container?” the response was, “That’s not supported.”
Fine. What is supported? Running the SSIS Catalog on SQL Server installed on Windows Server Core. I was trying to solve many of the same problems in a container that would need to be solved running an SSIS Catalog on Core. It turned out many of these problems had indeed already been solved.
If you can make it to SQL Saturday #813 – Boston BI Edition on 30 Mar 2019, Liz Baron and Sreeja Pullagura are delivering an information-packed session titled SSIS DevOps with Azure Containers on this very topic. I’ve had the privilege and honor to work with Liz, Sreeja, and the awesome data engineering team at Teach For America – a worthy non-profit.
I predict Sreeja’s and Liz’s presentation is going to be a session that moves the needle for Data Integration Lifecycle Management (DILM).
My next several blog posts on this topic – sharing some of my efforts and what I’ve learned about executing SSIS using containers – will begin appearing 31 Mar 2019.