I… tweeted? X’d?… this message on 11 Jul 2024. As promised, here’s the public post.
If you want to skip some history mixed with the description of EDNA’s SSIS framework, feel free to go to the section labeled “Imagine My Surprise…”
Some Back Story
I have a long and storied history with frameworks. My ETL frameworks odyssey began with developing a solution to manage connections in Data Transformation Services (DTS) so that identical DTS packages could run in multiple enterprise data integration lifecycle management tiers (Dev, Test, Prod, etc.).
Metadata-driven frameworks appealed to my engineering / instrumentation heart because they:
- Abstract away some of the complexity of very complex solutions
- Facilitate enterprise data engineering operations via rapid root-cause diagnosis on failure
Critics of metadata-driven frameworks point out the complexity of framework solutions. These critics are not wrong: execution frameworks are complex. The hope and goal is for the framework to absorb enough of the complexity of the data engineering solutions it manages to more than offset the complexity the framework itself adds. That’s not always the case. As business people liked to say in the early 2020s, “the juice may not be worth the squeeze.”
The most common reason one should not employ a data engineering framework is enterprise data engineering solution size. If your enterprise data engineering consists of a few dozen packages or pipelines, you very likely do not need a framework to help manage them. The number of packages / pipelines is a fair order-of-magnitude measurement, but complexity, code re-use, and parameterization should be considerations, as well. I often share this when presenting – publicly or privately – on the topic:
“If you have a small number of pipelines or packages, you probably don’t need the overhead of maintaining a metadata-driven data engineering framework. If you have thousands of pipelines and / or packages? You likely already have some solution performing the function of a metadata-driven data engineering framework (or you desperately need one)!”
Enterprise Data & Analytics Sells Frameworks
I founded Enterprise Data & Analytics in September 2015 primarily to market SSIS Frameworks and a suite of Data Integration Lifecycle Management tools and utilities. I supported my coding habit with consulting and training, a practice I continue to this day. I remain committed to reducing friction between enterprises and their data engineering, and I believe frameworks are a good way – perhaps one of the best ways – to accomplish this goal.
Since our initial offerings of SSIS frameworks, EDNA (as we lovingly refer to our organization) has branched into Azure Data Factory frameworks. At the time of this writing, we are testing Fabric frameworks. The goals remain the same: reduce the friction by abstracting away complexity while facilitating enterprise data engineering ops.
Fail Application on Package Failure
Our SSIS framework feature named Fail Application on Package Failure is integral to SSIS Application workflow management. In our SSIS framework, we define a collection of SSIS Packages as an SSIS Application. One attribute of an SSIS Application Package is Fail Application on Package Failure.
An example is an SSIS Application that loads fact data for a star schema data warehouse. When you load a star schema fact, you want to make sure all related star schema dimension data is loaded first. If a dimension load process fails, you want the application to stop before loading the fact, which is dependent on the dimension data.
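To make that concrete, here is a minimal sketch of the kind of check an orchestrator’s Script Task can make when a child package finishes. The variable names below are illustrative placeholders, not the actual names or code in EDNA’s framework:

```csharp
// This snippet belongs inside the ScriptMain class SSDT generates for a Script Task,
// which supplies the Dts object and the ScriptResults enum.
// The SSIS variable names are hypothetical placeholders for framework metadata values.
public void Main()
{
    bool childPackageFailed = (bool)Dts.Variables["User::ChildPackageFailed"].Value;
    bool failApplicationOnPackageFailure = (bool)Dts.Variables["User::FailApplicationOnPackageFailure"].Value;

    if (childPackageFailed && failApplicationOnPackageFailure)
    {
        // Stop the SSIS Application here so dependent packages (e.g., fact loads) never run.
        Dts.Events.FireError(0, "Orchestrator",
            "A child package failed and Fail Application on Package Failure is set.",
            string.Empty, 0);
        Dts.TaskResult = (int)ScriptResults.Failure;
        return;
    }

    // Otherwise, let the SSIS Application continue with the next Application Package.
    Dts.TaskResult = (int)ScriptResults.Success;
}
```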
Application Restartability
One powerful feature of our SSIS frameworks is “Restart SSIS Applications at Failed Application Package.” Application restartability is driven by Fail Application on Package Failure functionality. An example will help explain:
Suppose your data warehouse load window (or “maintenance window”) is 4 hours each night, your data warehouse load process (SSIS Application) consists of 85 SSIS packages, and the current load time is averaging 3.5 hours per night. Imagine the 3.5 hour load process hits a snag 2 hours into a nightly load on package 67 of 85. The Fail Application on Package Failure bit is set for the package performing that part of the load process. When the package fails, the process stops. You’re on call and your phone alerts you that there’s been an issue. You spring from slumber at 2:00 AM, connect to the VPN, and begin troubleshooting. You isolate the root cause in 10 minutes, correct it in another 10 minutes, and restart the data warehouse load SSIS Application. Because your enterprise SSIS framework Application is configured to restart SSIS Applications at the Failed Application Package, the load process restarts at package 67 of 85.
Because of application restartability, the 2 hours of previous loading does not re-execute.
Package 67 of 85 succeeds this time and the data warehouse load completes without interruption – and on time.
Your support team doesn’t have to enable a flag to warn customers their data availability is delayed.
No Service Level Agreements (SLAs) are challenged.
Everyone wins.
This is one advantage of using a well-designed metadata-driven framework to manage data engineering.
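For the curious, the core of “restart at the failed Application Package” is simple: skip every package that already succeeded and resume at the one that failed. Here is an illustrative sketch; the package list and restart index are stand-ins for what a framework would actually read from its execution-log metadata:

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: resume an 85-package Application at the package that failed.
// In a real framework, the package list and restartIndex come from execution-log metadata.
class RestartSketch
{
    static void Main()
    {
        var packages = new List<string>();
        for (int i = 1; i <= 85; i++)
        {
            packages.Add($"Package{i:000}.dtsx"); // packages in execution order
        }

        int restartIndex = 66; // zero-based index of package 67 of 85, the one that failed

        // Packages 1-66 already succeeded, so they are not re-executed.
        for (int i = restartIndex; i < packages.Count; i++)
        {
            Console.WriteLine($"Executing {packages[i]} ({i + 1} of {packages.Count})");
            // ... execute the child package here ...
        }
    }
}
```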
Imagine My Surprise…
…when one of our earliest SSIS framework customers reached out to share – in the very nicest way possible – that Fail Application on Package Failure in our SSIS framework was “not working.” My response was, “Great Scott!”
The customer is running SQL Server 2016. That’s fine for EDNA’s SSIS frameworks because we have framework versions that work with several versions of SSIS and SQL Server from 2012 forward. I maintain a collection of virtual servers built using the tools available for each version, including the operating systems and versions of Visual Studio, SSIS, and SQL Server.
I had the customer send me related SSIS packages for testing. I fired up my SQL Server 2016 VM and began troubleshooting. What I found scared the dickens out of me: some of the .Net code in the Script Task that manages Fail Application on Package Failure functionality in our framework orchestrator SSIS package – Parent.dtsx – was missing.
“No problemo,” I thought. I copied that code from another version of Parent.dtsx and redeployed the package.
It didn’t work.
I’m a little embarrassed to share that I spent parts of several days trying to figure out what was going on.
I’m proud to say I did not give up.
If you don’t give up, you will find it (imagine that sentence spoken like this).
The Issue
SSIS 2016 Package Deploy ate the .Net code in my SSIS Script Task. How did I learn this? It took time and stubborn persistence.
I kept seeing results that, frankly, made absolutely no sense. I would make a change, deploy it, test an execution (using a suite of SSIS framework test Applications that we ship with the product), and get exactly the same results as before the change. I had no idea what was going on. For days.
I kept narrowing the problem down, reducing the factors to shrink the scope. I (finally) realized the problem wasn’t in the SSIS Framework orchestrator package (Parent.dtsx) itself. I built a simple, single-package SSIS project named InformationTest. The single package was named Information.dtsx and initially contained a single Script Task.
On a lark, I deployed Information.dtsx using Package Deploy and then exported the SSIS project from the SSMS Object Explorer Integration Services Catalogs node.
I used 7-Zip to view the archive (ispac files are zipped). If you don’t want to use 7-Zip, rename the file by changing (or adding) the “.zip” extension.
I then extracted Information.dtsx to the file system and opened it using Notepad++.
I deployed the InformationTest SSIS project and the code worked as designed, raising an Information Event when the Information.dtsx package was executed from the SSIS Catalog (again, outside the SSIS framework). The Messages report confirmed the .Net code in the Script Task was firing by displaying the opening line to Big Time.
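For context, the .Net code involved is tiny. A minimal sketch of a Script Task body that fires an Information event looks something like this (it lives inside the ScriptMain class SSDT generates; the message text is a placeholder, not the actual line Information.dtsx displays):

```csharp
public void Main()
{
    bool fireAgain = true;

    // Raise an Information event so the execution shows up on the Catalog's Messages report.
    Dts.Events.FireInformation(1001, "Information.dtsx", "Script Task .NET code executed.",
                               string.Empty, 0, ref fireAgain);

    Dts.TaskResult = (int)ScriptResults.Success;
}
```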
Viewing the Information.dtsx SSIS package XML in Notepad++, there be code here.
Problem Solved (Kinda)
I found the root cause. Or, at least, I found a root cause. I found enough to formulate a working theory about what may have happened to the .Net code in the Fail Application on Package Failure script task in the Parent.dtsx SSIS package. That made me feel better than I felt when I thought I’d sent a solution to a client with missing code.
Because we had SQL Server 2016 instances available, we tested the issue against SQL Server 2016 SP1, SP2, and SP3. We saw similar results for Package Deployment in each service pack.
The workaround was redeploying using Project Deployment instead of Package Deployment; re-testing confirmed the expected behavior. After days of confusion, I felt like a steely-eyed missile man.
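If you prefer to script the Project Deployment workaround rather than deploy from Visual Studio, here is a sketch using the SSIS Catalog managed API (Microsoft.SqlServer.Management.IntegrationServices). The server, folder name, project name, and file path are placeholders:

```csharp
using System.Data.SqlClient;
using System.IO;
using Microsoft.SqlServer.Management.IntegrationServices;

// Sketch: deploy an .ispac using the Project Deployment model.
// Server, folder, project name, and path are placeholders for illustration.
class DeployProjectSketch
{
    static void Main()
    {
        using (var connection = new SqlConnection(
            "Data Source=localhost;Initial Catalog=SSISDB;Integrated Security=SSPI;"))
        {
            var ssis = new IntegrationServices(connection);
            CatalogFolder folder = ssis.Catalogs["SSISDB"].Folders["Framework"];

            // Project Deployment sends the entire .ispac, Script Task code included.
            byte[] projectBytes = File.ReadAllBytes(@"C:\Deploy\InformationTest.ispac");
            folder.DeployProject("InformationTest", projectBytes);
        }
    }
}
```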
I sent a note to friends at Microsoft.
I X’d and posted on LinkedIn. I did my due diligence in the note and in those posts.
I’m sending this detailed explanation first to paid subscribers of my Substack newsletter because y’all support me, and that means something to me.
I’m making the blog post public in 20 days.
A Closing Product Ownership Thought
Product bug reports and negative feedback happen from time to time, and when they do, it is always my fault.
I can hear some of you thinking, “Why is it always your fault, Andy?” That’s an excellent question. I’m glad you asked!
It’s my fault because I own the product.
I own the product when customers love it; I own the product when customers hate it.
I own the product when it Just Works; I own the product when it is “not working.”
When you own a product, you don’t get to pick when you own it and when you don’t. That ship leaves the station the instant anyone stands up and declares, “This is my product.” (Yes, I mixed a metaphor. It’s my revenge on LLM owners for training using my blog and not compensating me for it. But I digress…)
It is my hope that, eventually, someone who was or is as puzzled as I was gains an understanding of (at least) some of what was going on when they battled this very issue. Perhaps more of you will identify with and / or be inspired by the power of beating your head against the monitor until you figure out the problem you are trying to solve.
My advice? Make the problem give up before you do.
Peace. :{>