Join me For Expert SSIS Training!

I’m honored to announce Expert SSIS – a course from Enterprise Data & Analytics!

The next delivery is 01-02 Apr 2019, 9:00 AM – 4:30 PM ET.

If you sign up by 31 Jan, you save money. Right now the course is on sale!

Data integration is the foundation of data science, business intelligence, and enterprise data warehousing. This instructor-led training class is specifically designed for SQL Server Integration Services (SSIS) professionals responsible for developing, deploying, and managing data integration at enterprise-scale.

You will learn to improve data integration with SSIS by:

  • Building faster data integration.
  • Making data integration execution more manageable.
  • Building data integration faster.

Agenda

  1. SSIS Design Patterns for Performance – how to build SSIS packages that execute and load data faster by tuning SSIS data flows and implementing performance patterns.
  2. SSIS Deployment, Configuration, Execution, and Monitoring – the “Ops” part of DevOps with SSIS using out-of-the-box tools and the latest utilities.
  3. Automation – how to use Business Intelligence Markup Language (Biml) to improve SSIS quality and reduce development time.

Did someone mention a sale?

Yep! The course is currently on sale until 31 Jan 2019!

I hope to see you there!

PS – Want to Learn More About Azure Data Factory?

The 2:00 AM Test

I’m working with a collection of clients these days providing data integration, cloud, and DevOps advice. I love architecture work! My clients love the results. Everyone wins.

Kent Bradshaw and I are tag-teaming with a couple clients who are converting data integration from other platforms to SSIS. In a recent meeting Kent mentioned “the 2:00 AM test.”

“What’s the 2:00 AM Test, Andy?”

I’m glad you asked! The 2:00 AM test is a question developers should ask themselves when designing solutions. The question?

“Will this make sense at 2:00 AM? When I have – or someone else has – been awakened by a job or application failure? Will I be able to look at this and intuit stuff about how this is supposed to work?”

Future You Will Thank You

The reason the 2:00 AM test is important is because later – at some point in the future when you’ve long-since stopped thinking about the stuff you’re thinking about when you designed this solution – you (or someone) will be awakened at 2:00 AM by some failure. Even if it’s you, there’s a good chance you won’t remember all the nuances of the design. Even if you do, you may not remember why you designed things the way you did.

So…?

Be like Hansel and Gretel. Leave yourself some breadcrumbs. What kind of breadcrumbs?

Comments

Code comments are the best way to record what you are thinking at the time you are thinking it. Consider the lifetime of a software solution. It may take an hour, a day, week, or month to build; but it may be in production for years. Are you going to remember why you made each decision you made? Are you going to be able to recall – years later – why you zigged instead of zagging?

Call out anything not readily accessible, anything invisible on the initial view of the immediate coding surface. If you’re writing VB or C#, include a comment explaining – at a minimum – where to learn more about classes and objects not readily available. If you used NuGet to import stuff, will it kill you to copy and paste the command into a comment?

In SSIS, if you’re using variables or parameters or event handlers, add annotation to the control flow or data flow task.

Leave yourself a note.

Practice Good Naming

My friend Joel has a t-shirt based on this tweet:

One the two hard things is naming things. I suggest descriptive names for stuff. Which stuff? In VB or C#, variables, methods, and classes for starters. For SSIS I heartily recommend Jamie Thomson’s iconic post titled Suggested Best Practises and naming conventions.

Good naming conventions promote self-documenting code.

Small Chunks

Coders on all platforms develop a sense of a “good size.” An easy trap to fall into is attempting to “boil the ocean.” Break things up. Practice separation of concerns. Write decoupled code.

You will find decoupled, small, function-of-work code is simpler to test, easier to maintain, and promotes code reuse. 

Conclusion

Future you will thank for incorporating these best practices in your coding, regardless of your software development platform.

Want to Learn More About Azure Data Factory?

From me?

I am honored to announce Getting Started with Azure Data Factory – a course from Enterprise Data & Analytics!

The next delivery is 04 Mar 2019, 9:00 AM – 4:30 PM ET.

If you sign up by 31 Jan, you save money. Right now the course is on sale!

Azure Data Factory, or ADF, is an Azure PaaS (Platform-as-a-Service) that provides hybrid data integration at global scale. Use ADF to build fully managed ETL in the cloud – including SSIS. Join Andy Leonard – authorblogger, and Chief Data Engineer at Enterprise Data & Analytics – as he demonstrates practical Azure Data Factory use cases.

In this course, you’ll learn:

  • The essentials of ADF
  • Developing, testing, scheduling, monitoring, and managing ADF pipelines
  • Lifting and shifting SSIS to ADF SSIS Integration Runtime (Azure-SSIS)
  • ADF design patterns
  • Data Integration Lifecycle Management (DILM) for the cloud and hybrid data integration scenarios

Did someone mention a sale?

Yep! The course is currently on sale until 31 Jan 2019!

I hope to see you there!

PS – Join me For Expert SSIS Training!

How May I Serve You?

2018 is drawing to a close.

Some readers recently shared requests for more blog posts on certain topics. I thought I’d put the question to my audience (both of you, hi Mom!) to gather your thoughts. Feel free to leave a comment or reach out via email.

Here are some topics I plan to blog about in 2019, Lord willing:

  • SSIS (of course)
  • Azure Data Factory
  • Azure DevOps
  • Data Integration Testing
  • Data Integration Lifecycle Management (DILM)
  • Containers

What do you think? Good list? 

If you were the boss of me, which topics would you like to see – or see more of? Any additions to this list?

Merry Christmas!
*<:{>

SSIS, Docker, and Windows Containers, Part 0 – Getting Started

To begin tinkering with SSIS in containers, you first need to install Docker. There are some prerequisites. I will not exhaust the prerequisites here. I strongly suggest you familiarize yourself with the requirements for Docker prior to attempting to install the software.

Since I use a PC (a Lenovo P51S) running Windows 10 Pro, I chose to use Docker for Windows.

I Needed a HyperVisor

I chose to run Docker for Windows with Hyper-V:

One reason I upgraded to Windows 10 was to work with containers. I read posts and articles that stated I could use Docker for Windows with VirtualBox, and I had been using VirtualBox for a long time. When I started using VirtualBox, it was the only HyperVisor that was:

  1. Free; and
  2. Supported 64-bit guests.

I decided to switch to Hyper-V, though, and rebuilt my demo and development virtual machines in Hyper-V.

IWOMM; YMMV (It works on my machine; your mileage may vary…)  😉

Choose a Container OS Platform

Once Docker is installed you need to decide whether to work with Linux or Windows containers, but you can switch anytime:

One nice thing about switching is it’s fairly painless, as one may glean from the message that displays when I click “Switch to Linux containers…”:

Docker for Windows with Windows Containers

The cool kids are using Linux containers, especially the cool SQL Server kids. I’ve tinkered a little with SQL Server on Linux. I’m interested in SSIS, though, so I’ve been focusing on running Windows containers in Docker for Windows.

Getting Started

Containers are like lightweight virtual machines. They typically consume less disk space. Containers spin up relatively quickly. A developer can configure a container, persist it in some state, and then turn it off. It’s faster and lighter than working with a VM. There are other benefits that we will get to later in this series.

After installing Docker, your next step is pulling an image.

Pulling an Image

I can hear you thinking, “Umm, Andy… what’s an image?” I’m glad you asked. From the Docker Glossary:

An Image is an ordered collection of root filesystem changes and the corresponding execution parameters for use within a container runtime. An image typically contains a union of layered filesystems stacked on top of each other. An image does not have state and it never changes.

I can now hear you thinking, “Umm… what?” I think of an image as a pre-configured container. What’s configured in the pre-configuration? Well, the operating system and other software I may want to use.

For example, the hello-world image is relatively small and will test your installation of docker. In the animated gif below, I show how I:

  • Search for images named “hello-world”
  • Locate the name of an image labeled “Official” and named “hello-world”
  • Pull the “hello-world” image (sped up – my internet is not that fast out here in Farmville…)
  • Re-pull the “hello-world” image to show how things look when the image is up to date.
  • Run the image

(click to open, enlarged, in a new tab)

As stated earlier, the hello-world image is a test.

Searching, Pulling, and Starting a Windows Container with SQL Server Installed

To search for, pull, and start a Windows container with SQL Server installed and configured, execute the following Docker commands:

  1. docker search mssql
  2. docker pull microsoft/mssql-server-windows-developer
  3. docker run -d -p 1433:1433 -e sa_password=$up3r$3cr3t -e ACCEPT_EULA=Y –name mySqlServerContainer microsoft/mssql-server-windows-developer

If all goes well, you will see the container id – a hexadecimal number such as:

2d06a9a756b7b75bb0e388173bdd6925ba712e8843848610b5f5276a69f4bf19

Command Line Switches

#3 above has a lot of switches on that command line. Here’s what they mean:

  • -d == detach, which tells docker to run the container in the background and print the container id
  • -p == publish list, which publishes a container’s port(s) to the host
    • I use -p to map the container’s port 1433 (SQL Server uses port 1433) to my laptop’s port 1433
  • -e == env list, which sets a container’s environment variable(s) to a value (or values).
    • I use -e to set the sa_password and ACCEPT_EULA environment variables
  • –name == assigns a name to the container
  • The final argument is the name of the image

By the way, you can get help on any command in docker by typing docker <command> — help. To complete the list above, I typed:

docker run –help

Interacting with Your Newly-Started Container

If you saw that long hex number, your SQL Server Windows container started. It’s a brand new container – it still has that new container smell.

There are a few ways to interact with this container. Let’s look at one, PowerShell.

Connect to PowerShell in mySqlServerContainer using the following command:

docker exec -it mySqlServerContainer powershell

Once connected, you can use PowerShell to execute all sorts of commands, such as:

dir
(or ls):

Cool? I think so too.

Conclusion

I’m going to stop here because this post is long enough and this is enough to get you started using SQL Server in a Windows container. That was my goal today.

Enjoy!

Packaging SSIS Catalog Deployments

I love the SSIS Catalog. It’s an elegant piece of data integration engineering and I cannot say enough positive things about it. Packaging SSIS Catalog deployments can be tricky, though.

The SSIS Catalog is a framework. Frameworks manage execution, configuration, and logging; and the SSIS Catalog handles each task with grace. Like I said, I love it!

But…

(You knew there was a “but…” coming, didn’t you?)

A Tale of Two Audiences

There are two audiences for the SSIS Catalog, two groups of consumers:

  1. Administrators
  2. Developers
  3. Release Managers

I listed three. Because…

Administrators

(click to enlarge)

SSIS is often administered by database administrators, or DBAs. I admire DBAs. It’s often a thankless job – more like a collection of jobs all rolled into one (and subsequently paid as if its one job…).

I believe the SSIS Catalog interface presented to DBAs in SQL Server Management Studio is sufficient.

My complaint is the SSIS administrator has to expand a handful of nodes in Object Explorer and then right-click to open the SSIS project configuration window and then double-click each referenced SSIS Catalog environment to determine which value is configured for use when an SSIS package is executed.

Click the screenshot above to see what I mean. Configuring SSIS Catalog deployments in SSMS is challenging. I find it… clunky. Once I understood all the windows, what they meant and how to configure an SSIS package and project deployed to the SSIS Catalog, this made sense. But – in my opinion – this interface works against comprehension.

Does this interface work, though? It certainly does. When I teach people how to use the SSIS Catalog, I show them how to use the Object Explorer interface provided in SSMS.

(click to enlarge)

I don’t stop there, however, because I built one solution to the problem. I call my solution SSIS Catalog Browser. If you click to enlarge this image you will note I am viewing the configuration of the same parameter displayed in the SSMS image above. I find this interface cleaner.

Do administrators still need to understand how to configure SSIS Catalog deployments and SSIS packages and projects deployed to the SSIS Catalog? You bet. There is no substitute for understanding. SSIS Catalog Browser surfaces the same metadata displayed in the SSMS Object Explorer. The only difference is Catalog Browser is easier to navigate – in my opinion.

Best of all, Catalog Browser is free.

Developers and Release Managers

SSIS developers and release managers (DevOps release teams) need more functionality. As I wrote in DILM Tiers for the SSIS Enterprise, an enterprise should have a minimum of four Data Integration Lifecycle Management (DILM) tiers to manage enterprise data integration with SSIS. Those tiers need to be:

  1. Development – an environment where SSIS developers build SSIS packages and projects. SSIS developers need permission / rights / roles to utterly destroy the database instances in Dev. If the SSIS developers lack this ability, you have “an environment named Development but not a Development environment.” There is a difference.
  2. Test or Integration – an environment where SSIS developers have permission / rights / roles to deploy, configure, execute, and view logs related to SSIS packages and projects.
  3. UAT or QA (User Acceptance Testing or Quality Assurance or any environment other than Production, Test, or Development) – an environment that mimics Production in security, permission / rights / roles. Developers may (or may not) have read access to logs, source, and destination data. SSIS administrators (DBAs or DevOps / Release teams) own this environment. The person performing the deployment to Production should perform the deployment to UAT / QA / Whatever because I do not want the Production deployment to be the very first time this person deploys and configures this SSIS project.
  4. Production.

I architect data integration environments in this manner to support DILM (Data Integration Lifecycle Management) with SSIS, as I wrote in Data Integration Lifecycle Management with SSIS.

Viewing the contents of an SSIS Catalog is not enough functionality to manage releases. Why, then, do I include developers? Because…

SSIS developers create the initial SSIS Catalog deployments in the DILM DevOps cycle.

I cannot overemphasize this point. Developers need an environment where they are free to fail to build SSIS. They aren’t free to succeed, in fact, unless and until they are free to fail.

Have you ever heard a developer state, “It works on my machine.”? Do you know why it works on their machine? Defaults. They coded it up using default values. The defaults have to work or the developer will not pass the code along to the next tier.

How does an SSIS developer know they’ve forgotten to parameterize values?
How do they figure this out?

It’s impossible to test for missing parameters in the Development environment.

The answer is: SSIS developers must deploy the SSIS project to another environment – an environment separate and distinct from the Development environment – to test for missing parameterization.

To review: SSIS developers need a Development environment (not merely an environment named Dev) and they need a different environment to which they can deploy, configure, execute, and monitor logs.

Error Elimination

Having the SSIS developers build and script the SSIS Catalog deployments eliminates 80% of deployment configuration errors (according to Pareto…).

Having SSIS administrators practice deployment to UAT / QA / Whatever eliminates 80% of the remaining errors.

Math tells me an enterprise practicing DILM in this manner will experience a 4% deployment error rate. (Want to knock that down to 0.8%? Add another tier.)

Packaging SSIS Deployment

I will not go into functionality missing from the SSMS Object Explorer Integration Services Catalogs node (nor the underlying .Net Framework assemblies). I will simply state that some functionality that I believe should be there is not there.

(click to enlarge)

I don’t stop there, however, because I built one solution to the problem. I call my solution SSIS Catalog Compare. If you click the image to enlarge it, you will see a treeview that surfaces SSIS Catalog artifacts in the same way as SSIS Catalog Browser (they share a codebase). You will also see the results of a comparison operation, and the user initiating the packaging of an SSIS folder deployment by scripting the folder and its contents.

The result is a file system folder that contains T-SQL script files and ISPAC files for each SSIS Catalog configuration artifact:

  • Folder
  • SSIS Project
  • Environment
  • Reference
  • Configured Literals

You can use SSIS Catalog Compare to lift and shift SSIS Catalog objects from any environment to any other environment – or from any DILM tier to any other DILM tier – provided you have proper access to said environments and tiers.

This includes lifting and shifting SSIS to the Azure Data Factory SSIS Integration Runtime, also know as Azure-SSIS.

Zip up the contents of this file system folder, attach it to a ticket, and let your enterprise DevOps process work for data integration.

Conclusion

The SSIS Catalog is a framework, and a pretty elegant framework at that. Some pieces are clunky and other pieces are missing.

DILM Suite helps.

Biml, Book Reviews, and Metadata-Driven Frameworks

I occasionally (rarely) read reviews at Amazon of books I’ve written. If I learn of a complaint regarding the book I often try to help. If a reader experiences difficulty with demos, I often offer to help, and sometimes I meet with readers to work through some difficulty related to the book (as I offered here). About half the time, there’s a problem with the way the book explains an example or the sample code; the other half the time the reader does not understand what is written.

I own both cases. As a writer it’s my job to provide good examples that are as easy to understand as possible. Sometimes I fail.

In very rare instances, I feel the review misrepresents the contents of a book –  enough to justify clarification. This review afforded one such opportunity. None of what I share below is new to regular readers of my blog. I chose to respond in a direct manner because I know and respect the author of the review, and believe he will receive my feedback in the manner in which it was intended – a manner (helpful feedback) very similar to the manner in which I believe his review was intended (also helpful feedback).

Enjoy.

My Reply to This Review of The Biml Book:

Thank you for sharing your thoughts in this review.

I maintain successful technology solutions are a combination of three factors:

  1. The problem we are trying to solve;
  2. The technology used to solve the problem; and
  3. The developer attempting to use a technology to solve the problem.

I believe any technologist, given enough time and what my mother refers to as “gumption” (a bias for action combined with a will – and the stubbornness, er… tenacity – to succeed), can solve any problem with any technology.

I find your statement, “It doesn’t really teach BIML but rather teaches a specific optional framework for those who want to do everything with BIML rather than increase productivity for repetitive patterns” inaccurate. Did you read the entire book? I ask that question because I most often encounter readers who state the precise opposite of the opinion expressed in this review (some of your fellow reviewers express the opposite opinion here, even). I disagree with your statement, but I understand your opinion in light of other opinions expressed to me by readers during my past decade+ of writing several books. I share one example in this blog post. The short version: we often get more – or different – things from reading than we realize.

There is a section in The Biml Book – made up of 3 of the 20 chapters and appendices – that proposes a basic metadata-driven Biml framework which is the basis of a metadata-driven Biml framework in production in several large enterprises. I wrote those 3 chapters and the basic metadata-driven Biml framework in question. Varigence has also produced a metadata-driven Biml framework called BimlFlex – based upon similar principles – which has been deployed at myriad enterprises. The enterprise architects at these organizations have been able to improve code quality and increase productivity, all while decreasing the amount of time required to bring data-related solutions to market. They do not share your opinion, although they share at least some of the problems (you mentioned ETL) you are trying to solve.

Am I saying Biml – or the framework about which I wrote or even BimlFlex – is the end-all-be-all for every SSIS development effort? Goodness no! In fact, you will find disclaimers included in my writings on Biml and SSIS frameworks. I know because I wrote the disclaimers.

Misalignment on any of the three factors for successful technology solutions – the problem, the technology, and/or the developer – can lead to impedance mismatches in application and implementation. For example, Biml is not a good solution for some classes of ETL or data integration (points 1 and 2). And learning Biml takes about 40 hours of tenacious commitment (which goes to point 3). This is all coupled with a simple fact: data integration is hard. SSIS does a good job as a generic, provider-driven solution – but most find SSIS challenging to learn and non-intuitive (I did when I first started using it). Does that mean SSIS is not worth the effort to learn? Goodness, no! It does mean complaints about simplifying the learning process are to be expected and somewhat rhetorical.

Architects disagree. We have varying amounts of experience. We have different kinds of experience. Some architects have worked only as lone-wolf consultants or as members of single-person teams. Others have worked only as leaders of small teams of ETL developers. Rarer still are enterprise architects who are cross-disciplined and experienced as both lone-wolf consultants and managers of large teams in independent consulting firms and large enterprises. Less-experienced architects sometimes believe they have solved “all the things” when they have merely solved “one of the things.” Does this make them bad people? Goodness, no. It makes them less-experienced, though, and helps identify them as such. Did the “one thing” need solving? Goodness, yes. But enterprise architecture is part science and part art. Understanding the value of a solution one does not prefer – or the value of a solution that does not apply to the problem one is trying to solve – lands squarely in the art portion of the gig.

Regarding the science portion of the gig: engineers are qualified to qualitatively pronounce any solution is “over-engineered.” Non-engineers are not qualified to make any such determination, in my opinion.

By way of example: I recently returned from the PASS Summit. Although I did not attend the session, I know an SSIS architect delivered a session in which he made dismissing statements regarding any and all metadata-driven frameworks related to ETL with SSIS. If I didn’t attend his session, how do I know about the content of the presentation? A number of people in attendance approached me after the session to share their opinion that the architect, though he shared some useful information, does not appreciate the problems faced by most enterprise architects – especially enterprise architects who lead teams of ETL developers.

My advice to all enterprise architects and would-be enterprise architects is: Don’t be that guy.

Instead, be the enterprise architect who continues to examine a solution until underlying constraints and drivers and reasons the solution was designed the way it was designed are understood. Realize and recognize the difference between “That’s wrong,” “I disagree,” and “I prefer a different solution.” One size does not fit all.

Finally, some of the demos and Biml platforms were not working at the time you wrote this review. Many have been addressed since that time. In the future, updates to SSIS, Biml, and ETL best practices will invalidate what this team of authors wrote during the first half of 2017. As such, the statements, “And much of the online information is outdated and no longer works. It’s a shame someone hasn’t found a way to simplify the learning process.” is a tautology that defines progress. Learning is part of the job of a technologist. I encourage us all – myself most of all – to continue learning.

Peace.

Catalog Browser v0.7.8.0

I’ve been making smaller, more incremental changes to SSIS Catalog Browser – a free utility from the Data Integration Lifecycle Management suite (DILM Suite).

You can use SSIS Catalog Browser to view SSIS Catalog contents on a unified surface. Catalog Browser works with SSIS Catalogs on-premises and Azure Data Factory SSIS Integration Runtime, or Azure SSIS. It’s pretty cool and the price ($0 USD) is right!

The latest change is a version check that offers to send you to the page to download an update. You will find this change starting with version 0.7.7.0.  Version 0.7.8.0 includes a slightly better-formatted version-check message. As I said, smaller, more incremental changes.

Enjoy!

Want a Free #DILM Book? See Me at the #PASSsummit 2018!

I’ll be at the PASS Summit 2018 next week. I’m delivering a full-day precon Monday, presenting Faster SSIS and participating in the BI & Data Visualization Panel Wednesday, and Enterprise Data & Analytics is exhibiting Wednesday through Friday.

“You Mentioned a Free Book…”

Oh. Yeah. That.

I will be giving away free copies of my latest book: Data Integration Life Cycle Management with SSIS: A Short Introduction by Example starting Wednesday! Would you like to score a free copy? You’ll have to catch me carrying them in the Washington State Convention Center.

They are free while supplies last.

Where Can You Find Andy at the PASS Summit 2018?

Precon!

Monday 5 Nov 2018, I’m delivering a full-day pre-conference session titled Intelligent Data Integration with SSIS. I’m going to cover  everything listed at that link but there is an update about my precon content:

There will be Azure Data Factory content and demos!

Why this addition? Two reasons:

  1. My presentation titled Faster SSIS was selected for Wednesday, 7 Nov 2018 at 1:30 PM PT in room 612. I usually include the three Faster SSIS demos in my precon. This time, you can just view the Faster SSIS session to see those demos. I’ve added some Azure Data factory to the precon in place of these three demos!
  2. I am participating in a panel discussion titled BI & Data Visualization Panel Wednesday, 7 Nov 2018 at 3:15 PM PT in room TCC Yakima 1.

Enterprise Data & Analytics is Exhibiting!

That’s right, you can find me in the Exhibition Hall! Enterprise Data & Analytics is exhibiting at the PASS Summit 2018!

If you are attending the PASS Summit 2018, please catch me for a free book while supplies last. If I’m out of physical copies, come see me anyway… we will work something out. Promise!

:{>

DILM Tiers for the SSIS Enterprise

“How many Data Integration Lifecycle Management tiers (DILM tiers) do I need to deploy SSIS, Andy?” If I had a nickel for every time I am asked this question, I would have a lot of nickels. My answer is, “Four.”

Why Four?

I’m glad you asked. Before I answer, I have a question for you: Have you ever been told by a developer, “It works on my machine.”? Do you know why it works on the developer’s machine?

Defaults.

Developers build software that contains configurations information. In the first iteration of building the software, most developers do no externalize – or parameterize – all configuration values. Instead, they hard-code these values into the application, usually as variable defaults.

I can hear you thinking…

“But This Is SSIS, Andy”

And? Oh, you ask because SSIS stands for “SQL Server Integration Services.” I get that. It’s important you understand this:

SSIS development is software development.

SQL Server Integration Services suffers somewhat from its name which contains the name of a popular relational database engine (SQL Server). But SSIS is not a database, and SSIS development is not database development (and most definitely not database administration).

SSIS development is software development.

Software Best Practices Apply

Because SSIS development is software development, software development best practices apply. For example, SSIS developers should source-control and test their SSIS code.

Perhaps testing is that thing you do in Production right after deployment. I have this pithy saying:

While I freely admit this statement is pithy, it is also true. The remainder of this quote is even pithier: “Some software is tested by your soon-to-be-former largest customer.”

I don’t want you to lose customers – especially your largest customer. That’s why I advocate you test your software by developing it on one Data Integration Lifecycle Management – or DILM – tier and then promoting the code to at least two other DILM tiers before deploying to the Production tier.

Tiers. Or Tears. Your Call.

So which tiers do I recommend? I’m glad you asked! i recommend:

  1. Development
  2. Test
  3. QA
  4. Production

Development

You must build the software somewhere. Wherever you build the software is your development environment.

Now let’s take a step back and describe the development environment for an SSIS developer. An SSIS development DILM tier shares some characteristics. In Dev, SSIS developers can:

  • Deploy SSIS projects and packages
  • Overwrite existing SSIS projects and packages
  • Delete existing SSIS projects and packages
  • Execute SSIS packages
  • Manage (deploy, overwrite, delete) SSIS configurations
  • Manage (deploy, overwrite, delete) SSIS metadata
  • Create databases, tables, and files

The most important part of the SSIS Development DILM tier? SSIS developers can perform each action listed above without help from anyone else in the enterprise.

I can hear you thinking, “What if the SSIS developers blow something up, Andy? Huh? HUH?!?” Let me ask another question: What happens when things go awry or amiss in Production? Isn’t the answer to that question, “I fix it.”? If that’s true in Production…

Then you fix it.
In Dev.

In fact (and I know this is a crazy thought) you could actually use a more-open Development tier to test and – perhaps, even – improve break-fix procedures for Production. After all, developers are notoriously excellent testers (I write this as a developer: I can break things like nobody’s business!).

What if the Dev tier becomes a place for you to test your recovery procedures for Production? If your HA/DR procedures work in Development where developers have too much access, then they should work in Production where the environment has been locked down. Am I correct?

Over the years I have found myself making this statement to enterprises: “You have an environment named Development but it is not a Development environment.” How do you know it’s a Development environment? If the developers can destroy it, it’s Dev.

One last thing before I kick this soapbox back under my desk. Consider that developers will be inconvenienced by the destruction of their Dev environment. It may slow them down. Heck, it may cost them a deadline. If that happens, you can take the opportunity to educate them, sharing what happened and why you believe it happened. Maybe they won’t do the dumb thing that burned down Dev again… Maybe.

Would having developers that better understand databases make your job a little easier in the long run?

Test

Once the software works on the SSIS developer’s machine, they need another environment to which they can deploy the SSIS project. Why? Remember those defaults? SSIS packages will always execute in Dev because default values are aimed at local resources for which the SSIS developer has authorization and access.

Which resources? Required access (CRUD – create, read, update, delete) to files and directories, databases, servers, etc. No SSIS developer would claim an SSIS package is ready unless and until she or he successfully executes the package in the SQL Server Data Tools (SSDT) debugger. But a successful SSDT debugger execution does not a completed SSIS package make.

To suss out missing configurations (externalization, parameterization), one needs to move the software to another lifecycle management tier and execute it there. Only then can the developer be sure they’ve externalized required defaults.

In Test, SSIS developers can:

  • Deploy SSIS projects and packages
  • Overwrite existing SSIS projects and packages
  • Delete existing SSIS projects and packages
  • Execute SSIS packages
  • Manage (deploy, overwrite, delete) SSIS configurations
  • Manage (deploy, overwrite, delete) SSIS metadata
  • Create databases, tables, and files

I can hear you thinking, “But Andy, this is the same list of permissions as Dev!”
Yes.

This part of the work must be completed. As professionals, we get to decide when and where the work gets done. Remember, it can always be done in Production. And it will be done in Production unless we are intentional about doing this work on another DILM tier.

Someone needs to do this part of the work. Do you want to do it? Or would you rather have the person most familiar with the code – the SSIS developer – do it? They are not going to catch everything that needs to be externalized, either; not on this pass. But the Pareto Principle informs us that they will catch 80% of the missing externalization.

Deployment to the DILM Test tier serves as the first deployment test. I want the SSIS developers bundling up SSIS deployments. Why? It’s good practice and practice makes perfect.

On Estimation…

One last point regarding software testing: When estimating a software project I begin by estimating how much time it will take to build the software. After this step the calculation for how much time is required to test the software is really easy: It’s the same amount of time required to develop the software.

Andy, are you telling me it takes just as long to test software as to develop it?

Yep.

“Cutting Testing to Save Time”

You can change this relationship by “cutting testing to save time.” But you can never shorten testing, only lengthen it. Decades of software development experience inform me that the costs of testing software increase by an order of magnitude for each tier of testing. The most expensive environment in which to test is Production.

Cutting testing never saves time.

Or cutting testing does save time; the time it takes you to deliver broken software. I can hear some of you thinking, “But can’t SSIS developers just develop better SSIS packages with fewer bugs?” The answer is the same as the answer to the physics question: “Can’t light just be either a wave or a particle, and not both?” Physics says no, light is both a wave and a particle. Andy says the same physics apply to software development along with the Theory of Constraints which tells us losses accumulate, gains do not.

As my friend Aaron Lowe used to say (before he passed away earlier in 2018 – man, I miss him, and others… we lost too many SQL Family members this year…), “Math is hard.”

You must include the cost of lost customers in the calculation.

Remember, all software is tested…

QA

The QA (or UAT or whatever you call the environment sitting just this side of Production) environment should be locked down tighter than a tick. The security in place in this pre-Production tier should be identical to the security in the Production tier. Optimally, all hardware capability and storage capacity in Production will match QA.

If you understand the dynamics of SSIS Data Flow Task internals, it’s possible to test performance with less data on sub-optimal hardware. If you ask me to do it, I’m going to let you know there’s a possibility we will miss something. I’ll be able to tell you for sure after we run some performance test executions in Production (and we will need to do that to be sure, and that will cost both time and money, so how much money are you saving when you save money by buying sub-optimal hardware and storage for QA?).

Deployment to QA should be performed by the same team – optimally by the same individual – that will be performing the deployment to Production. Why? We do not want deployment to Production to be their first experience with deploying this SSIS project. The deployment to QA is another deployment test, this time for the Production release management person. Perhaps this is a DBA. Perhaps this is a Release Management team member. Regardless, this step is their opportunity to practice the deployment to Production.

I can hear you thinking, “What’s the big deal, Andy?” I’m glad you asked. The big deal is: No matter how you execute SSIS in your enterprise, there’s more to the solution than merely the SSIS packages. If you’re running from the file system (and most enterprises execute SSIS from the file system), there are configurations stored in dtsConfig files or database tables that also need to be promoted. Some of the values – such as connection string components – need to be edited in these dtsConfig files at each DILM tier.

The same holds for deployments to the SSIS Catalog. And… if you are deploying (or planning to deploy) SSIS to the Azure Data Factory (ADF) SSIS Integration Runtime, your only option is deployment to an SSIS Catalog. Externalization is accomplished via Catalog Environments, References, and Literals in the SSIS Catalog. Same stuff, different location. But both dtsConfig files and Catalog Environments must be managed.

Since editing will need to occur when it’s time to deploy to Production, I prefer the deploy-er practice the editing along with the deployment. I ask the developer to send the SSIS and scripts or files for managing external configurations metadata to the deploy-er, and I request these configuration artifacts be set up for the Test DILM tier.

Production

If you’ve taken my advice and marched SSIS development through this process or a similar process, the Pareto Principle says you’ve identified at least 96% of the bugs prior to deployment to Production. My experience bears this out; about 4% of my Production deployments fail because some parameter slipped through the cracks walking through Dev, Test, and QA.

I can hear you thinking, “How does that happen, Andy?” Excellent question (and I am glad you asked!). parameters slip through the cracks because Dev and Test may be sharing data sources and/or destinations. It may be possible – desirable, even – to access Development data from the Test tier and vice versa.

Isolation

Remember: I want QA to be identical to Prod, which means I want it equally isolated. Is Production behind a firewall? Is it impossible (without poking a hole in the firewall) to write from Test to Prod? If so, then it should also be impossible to write from Test to QA and from QA to Prod. If Prod has a firewall, QA should have a firewall.

If your enterprise sports this architecture you can beat Pareto’s 4% prediction by 3.2% (an additional 80% of bugs will be caught) and experience a deployment-to-Production success rate of 99.2% (a failure rate of 0.8%).

To quote Foghorn Leghorn, “figures don’t lie.”

Conclusion

Is this the only way to properly manage SSIS in an enterprise? Goodness, no.

You can add more tiers to your Data Integration Lifecycle as needed. You can technically get by with three tiers (Dev, Test, Prod) but I don’t recommend it unless you’re the person wearing all the hats. Even then, the more times you test deployment, the more bugs you will catch prior to Production, and the fewer bugs you will deploy to Production.

It’s always easier to deploy more bugs but it is never less-expensive and it never saves time. Plus it may cost you your soon-to-be former largest customer.