DILM Tiers for Enterprise Data Integration

“How many Data Integration Lifecycle Management tiers (DILM tiers) do I need to manage enterprise data integration, Andy?” If I had a nickel for every time I am asked this question, I would have a lot of nickels. My answer is, “Four.”

Why Four?

I’m glad you asked. Before I answer, I have a question for you: Have you ever been told by a developer, “It works on my machine.”? Do you know why it works on the developer’s machine?

Defaults.

Developers build software that contains configurations information. In the first iteration of building the software, most developers do not externalize – or parameterize – all configuration values. Instead, they hard-code (at least) some of these values into the application, usually as variable defaults.

I can hear you thinking…

“But This Is Azure Data Factory and SSIS, Andy”

And? Oh, you ask because SSIS stands for “SQL Server Integration Services.” I get that. It’s important you understand this:

SQL Server Integration Services suffers somewhat from its name which contains the name of a popular relational database engine (SQL Server). But SSIS is not a database, and SSIS development is not database development (and most definitely not database administration).

Similar arguments may be made for Azure Data Factory (ADF).

SSIS  – and ADF – development is software development.

Software Best Practices Apply

Because ADF development and SSIS development are software development, software development best practices apply. For example, ADF and SSIS developers should source-control and test their ADF solutions and SSIS projects and packages.

Perhaps testing is that thing you do in Production right after deployment. I have this pithy saying:

While I freely admit this statement is pithy, it is also true. The remainder of this quote is even pithier: “Some software is tested by your soon-to-be-former largest customer.”

I don’t want you to lose customers – especially your largest customer. That’s why I advocate you test your software by developing it on one Data Integration Lifecycle Management – or DILM – tier and then promoting the code to at least two other DILM tiers before deploying to the Production tier.

Tiers. Or Tears. Your Call.

So which tiers do I recommend? I’m glad you asked! I recommend:

  1. Development
  2. Test
  3. Pre-production
  4. Production

Development

You must build the software somewhere. Wherever you build the software is your development environment.

Now let’s take a step back and describe the development environment for an SSIS developer. An SSIS development DILM tier shares some characteristics. In Dev, SSIS developers can:

  • Create SSIS projects and packages
  • Deploy SSIS projects and packages
  • Overwrite existing SSIS projects and packages
  • Delete existing SSIS projects and packages
  • Execute SSIS packages
  • Manage (deploy, overwrite, delete) SSIS configurations
  • Manage (deploy, overwrite, delete) SSIS metadata
  • Create databases, tables, and files

The most important part of the SSIS Development DILM tier? SSIS developers can perform each action listed above without help from anyone else in the enterprise.

I can hear you thinking, “What if the SSIS developers blow something up, Andy? Huh? HUH?!?” Please permit me to ask another question: What happens when things go awry or amiss in Production? Isn’t the answer to that question, “I fix it,” or “Our DBAs fix it”? If that’s true in Production…

Then you or the DBAs fix it.
In Dev.

In fact (and I know this is a crazy thought) you could actually use a more-open Development tier to test and – perhaps, even – improve (script? automate?) break-fix procedures for Production. After all, developers are notoriously excellent testers (I write this as a developer: I can break things like nobody’s business!).

What if the Dev tier becomes a place for you to test your recovery procedures for Production? If your HA/DR procedures work in Development where developers have too much access, then they should work in Production where the environment has been locked down. Am I correct?

Over the years I have found myself making this statement to enterprises: “You have an environment named ‘Development’ but it is not a Development environment.” How do you know it’s a Development environment? If the developers can destroy it, it’s Dev.

One last thing before I kick this soapbox back under my desk. Consider that developers will be inconvenienced by the destruction of their Dev environment. It may slow them down. Heck, it may cost them a deadline. If that happens, you can take the opportunity to educate them, sharing what happened and why you believe it happened. Maybe they won’t do the dumb thing that burned down Dev again… Maybe.

Would having developers that better understand databases make your job a little easier in the long run?

One other last thing about Dev environments: Please do not go all passive-aggressive (without much passive) and respond, “Well then, the SSIS developers can just manage their development servers without DBA help.” If you think that way, do you also think DBAs should develop SSIS solutions? I’ve met a handful of DBAs who excel at both jobs, but they are few and far-between. I encourage people to work together as teams, with everyone sharing their knowledge with others as much as possible.

I like it when everyone grows together, and grows together.

That’s teamwork.
It makes the dream work.

Test

Once the software works on the SSIS developer’s machine, they need another environment to which they can deploy the SSIS project. Why? Remember those defaults? SSIS packages will always execute in Dev because default values are aimed at local resources for which the SSIS developer has authorization and access.

Which resources? Required access (CRUD – create, read, update, delete) to files and directories, databases, servers, etc. No SSIS developer would claim an SSIS package is ready unless and until she or he successfully executes the package in the SQL Server Data Tools (SSDT) debugger. But a successful SSDT debugger execution does not a completed SSIS package make.

To suss out missing configurations (externalization, parameterization), one needs to move the software to another lifecycle management tier and execute it there. Only then can the developer be sure they’ve externalized required defaults.

In Test, SSIS developers can:

  • Deploy SSIS projects and packages
  • Overwrite existing SSIS projects and packages
  • Delete existing SSIS projects and packages
  • Execute SSIS packages
  • Manage (deploy, overwrite, delete) SSIS configurations
  • Manage (deploy, overwrite, delete) SSIS metadata
  • Create databases, tables, and files

I can hear you thinking, “But Andy, this is almost the same list of permissions as Dev!”
Yes.

This part of the work must be completed. As professionals, we get to decide when and where the work gets done. Remember, it can always be done in Production. And it will be done in Production unless we are intentional about doing this work on another DILM tier.

Someone needs to do this part of the work. Do you want the DBAs to do it? Do you want to do it? Or would you rather have the person most familiar with the code – the SSIS developer – do it? They are not going to catch everything that needs to be externalized, either; not on this pass. But the Pareto Principle informs us that they will catch 80% of the missing externalization.

Deployment to the DILM Test tier serves as the first deployment test. I want the SSIS developers bundling up SSIS deployments. Why? It’s good practice and practice makes perfect.

On Estimation…

One last point regarding software testing: When estimating a software project I begin by estimating how much time it will take to build the software. After this step the calculation for how much time is required to test the software is really easy: It’s the same amount of time required to develop the software.

Andy, are you telling me it takes just as long to test software as to develop it?

Yes. Yes I am.

“Cutting Testing to Save Time”

You can change this relationship by “cutting testing to save time.” But you can never shorten testing, only lengthen it. Decades of software development experience inform me that the costs of testing software increase by an order of magnitude for each tier one puts off testing. The most expensive environment in which to test is Production.

Cutting testing never saves time.

Or cutting testing does save time; the time it takes you to deliver broken software. I can hear some of you thinking, “But can’t SSIS developers just develop better SSIS packages with fewer bugs?” The answer is the same as the answer to the physics question: “Can’t light just be either a wave or a particle, and not both?” Physics says no, light is both a wave and a particle. Andy says the same physics apply to software development along with the Theory of Constraints which tells us losses accumulate, gains do not.

As my friend Aaron Lowe used to say (before he passed away earlier in 2018 – man, I miss him, and others… we lost too many SQL Family members this year…), “Math is hard.”

You must include the cost of lost customers in the calculation.

Remember, all software is tested…

Pre-Production

The Pre-production (QA or UAT or whatever you call the environment sitting just this side of Production) environment should be locked down tighter than a tick. The security in place in this pre-Production tier should be identical to the security in the Production tier. Optimally, all hardware capability and storage capacity in Production will match Pre-production.

If you understand the dynamics of SSIS Data Flow Task or ADF Mapping Data Flow internals, it’s possible to test performance with less data on sub-optimal hardware. If you ask me to do it, I’m going to let you know there’s a possibility we will miss something. I’ll be able to tell you for sure after we run some performance test executions in Production (and we will need to do that to be sure, and that will cost both time and money, so how much money are you saving when you save money by buying sub-optimal hardware and storage for Pre-production? The answer is, “Not as much as you think.”).

The nice thing about ADF is you can manage costs by scaling up when needed, and then scaling down when not needed. Scaling is one of the coolest features of Azure, and it is extremely helpful for testing non-Production loads using Production capacities.

Deployment to Pre-production should be performed by the same team – optimally by the same individual – that will be performing the deployment to Production. Why? We do not want deployment to Production to be their first experience with deploying this SSIS project. The deployment to Pre-production is another deployment test, this time for the Production release management person. Perhaps this is a DBA. Perhaps this is a Release Management team member. Regardless, this step is their opportunity to practice the deployment to Production.

I can hear you thinking, “What’s the big deal, Andy?” I’m glad you asked. The big deal is: No matter how you execute SSIS in your enterprise, there’s more to the solution than merely the SSIS packages. If you’re running from the file system (and most enterprises execute SSIS from the file system), there are configurations stored in dtsConfig files or database tables or the SSIS Catalog that also need to be promoted. Some of the values – such as connection manager properties – need to be edited in these dtsConfig files, database tables, or SSIS Catalog at each DILM tier.

The same holds for deployments to the SSIS Catalog. And… if you are deploying (or planning to deploy) SSIS to the Azure Data Factory (ADF) SSIS Integration Runtime (Azure-SSIS), your options are deployment to an SSIS Catalog or Azure Blob Storage (or msdb or SSIS Package Store on SQL Managed Instance) where the same rules apply. Externalization is accomplished via Catalog Environments, References, and Literals in the SSIS Catalog. In Azure Data Factory, variable and parameter values may be externalized in metadata data stores – files in blob storage, metadata databases in Azure SQL, or Azure Key Vaults. Same stuff, different location. But both dtsConfig files and Catalog Environments must be managed. (Note: for SSIS Catalog deployments, SSIS Catalog Compare can help.)

Since editing will need to occur when it’s time to deploy to Production, I prefer the deploy-er practice the editing along with the deployment. I ask the developer to send the SSIS and scripts or files for managing external configurations metadata to the deploy-er, and I request these configuration artifacts be set up for the Test DILM tier.

Production

If you’ve taken my advice and marched SSIS development through this process or a similar process, the Pareto Principle says you’ve identified at least 96% of the bugs prior to deployment to Production. My experience bears this out; about 4% of my Production deployments fail because some parameter slipped through the cracks walking through Dev, Test, and Pre-production.

I can hear you thinking, “How does that happen, Andy?” Excellent question (and I am glad you asked!). parameters slip through the cracks because Dev and Test may be sharing data sources and/or destinations. It may be possible – desirable, even – to access Development data from the Test tier and vice versa.

Isolation

Remember: I want Pre-production to be identical to Prod, which means I want it equally isolated. Is Production behind a firewall? Is it impossible (without poking a hole in the firewall) to write from Test to Prod? If so, then it should also be impossible to write from Test to Pre-prod and from Pre-prod to Prod. If Prod has a firewall, Pre-production should have a firewall.

If your enterprise sports this architecture you can beat Pareto’s 4% prediction by 3.2% (an additional 80% of bugs will be caught) and experience a deployment-to-Production success rate of 99.2% (a failure rate of 0.8%).

To quote Foghorn Leghorn, “figures don’t lie.”

Conclusion

Is this the only way to properly manage ADF and SSIS in an enterprise? Goodness, no.

You can add more tiers to your Data Integration Lifecycle as needed. You can technically get by with three tiers (Dev, Test, Prod) but I don’t recommend it unless you’re the person wearing all the hats. Even then, the more times you test deployment, the more bugs you will catch prior to Production, and the fewer bugs you will deploy to Production.

It’s always easier to deploy more bugs but it is never less-expensive and it never saves time. Plus it may cost you your soon-to-be former largest customer.

Andy Leonard

andyleonard.blog

Christian, husband, dad, grandpa, Data Philosopher, Data Engineer, Azure Data Factory, SSIS guy, and farmer. I was cloud before cloud was cool. :{>

2 thoughts on “DILM Tiers for Enterprise Data Integration

  1. Hi Andy,

    This is the article I have been looking for! Glad I finally found it. I am still fairly new to SSIS development, or any development at all, but am wondering if you could point me to some other resources (for a newb) that explain the “typical” work flow from dev to prod using SSDT and Azure DevOps Pipelines (with GIT source control), and how it might differ for a team of one. Thanks! 😀

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.