Mean Time to Identify Failure

While managing a team of 40 ETL developers, I wanted to track lots of metrics. Some of the things I wanted to track were technical, like SSIS package execution times. Some metrics were people-centric. 

Andy’s First Rule of Statistics states:

You can use statistics for anything about people, except people.

Andy – circa 2005

It was important to me to track how long it took the on-call person to identify the problem. I didn’t use the information to beat on-call people over the head. I used the information to measure the results of several experiments for displaying metadata about the failure.

Reports For The Win

You may be as shocked by this as I was; reports helped a lot more than I anticipated. Before I deployed the reports, Mean Time to Identify Failure was tracking just under 30 minutes. After I deployed the reports, Mean Time to Identify Failure fell to 5 minutes.

As I said, I was shocked. There were mitigating circumstances. The on-call team members were getting more familiar with the information SSIS produces when it logs an error. They were gaining experience, seeing similar errors more than once.

I accounted for growing familiarity by narrowing the time window I examined. The least-impressive metrics put the reduction at 18 minutes to 5 minutes.
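
If you want to reproduce this measurement in your own shop, here is a minimal sketch of the kind of query I might use. The table and column names ([log].ApplicationInstance, [log].OnCallAcknowledgement, and friends) are hypothetical stand-ins, not the actual custom-logging tables we built:

    /* Hedged sketch: Mean Time to Identify Failure, in minutes, by week.
       Table and column names are hypothetical stand-ins for custom-logging tables. */
    SELECT
        DATEPART(YEAR, ai.EndDateTime) AS FailureYear
      , DATEPART(WEEK, ai.EndDateTime) AS FailureWeek
      , AVG(DATEDIFF(MINUTE, ai.EndDateTime, ack.AcknowledgedDateTime)) AS MeanMinutesToIdentify
    FROM [log].ApplicationInstance AS ai
    JOIN [log].OnCallAcknowledgement AS ack
      ON ack.ApplicationInstanceID = ai.ApplicationInstanceID
    WHERE ai.ApplicationInstanceStatus = 'Failed'
    GROUP BY DATEPART(YEAR, ai.EndDateTime)
           , DATEPART(WEEK, ai.EndDateTime)
    ORDER BY FailureYear, FailureWeek;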

Pictures…


Before I built and deployed the dashboard for SSIS Application Instances (like the one pictured at the top of this post), on-call people would query custom-logging tables we built to monitor enterprise data integration. The queries to return Application Instance log data were stored where everyone could reach them. In fact, I used the same queries as sources for this report.
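
Here is a hedged sketch of the sort of query involved – both the query an on-call person would run and the source query behind the report. Again, the table and column names are hypothetical stand-ins for our custom-logging tables:

    /* Hedged sketch: failed Application Instances and their failed package executions.
       Hypothetical table and column names. */
    SELECT
        ai.ApplicationInstanceID
      , ai.ApplicationName
      , ai.StartDateTime
      , ai.EndDateTime
      , pi.PackageName
      , pi.PackageInstanceStatus
      , pi.ErrorDescription
    FROM [log].ApplicationInstance AS ai
    LEFT JOIN [log].PackageInstance AS pi
      ON pi.ApplicationInstanceID = ai.ApplicationInstanceID
     AND pi.PackageInstanceStatus = 'Failed'
    WHERE ai.ApplicationInstanceStatus = 'Failed'
    ORDER BY ai.StartDateTime DESC;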

A funny thing happened when I deployed the reports. Each week, one or more on-call people would ping me and tell me how much they liked the reports. Even though the data was the same, the presentation was different. A picture with a little color goes a long way.

The image at the beginning of this section – the SSIS Framework Task Instance Report – is displayed when a user clicks the Failed link shown in the initial screenshot. This design received the most comments from the on-call team members. The most common comment was, “I click the Failed link and it takes me to details about the failure.” The reports were passing The 2:00 AM Test.

SSIS Framework Applications

If you’ve read this far and wondered, “Andy, what’s an SSIS Application?”, here’s the answer: an SSIS Application is a construct I came up with to describe a collection of SSIS Packages configured to execute in a specific order. An application is a way to group SSIS packages for execution. You can get a sense of how our frameworks work – especially the application execution functionality – by checking out the SSIS Framework Community Edition at DILM Suite (DILM == Data Integration Lifecycle Management).
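
To make the “group of packages, executed in a specific order” idea concrete, here is a minimal metadata sketch in T-SQL. It is in the spirit of the SSIS Framework Community Edition but is not its actual schema – the table and column names here are hypothetical:

    /* Hypothetical framework metadata: an Application groups Packages and
       assigns each an execution order. Sketch only. */
    IF SCHEMA_ID(N'cfg') IS NULL
        EXEC (N'CREATE SCHEMA cfg;');

    CREATE TABLE cfg.Applications
    (
        ApplicationID   INT IDENTITY(1,1) CONSTRAINT PK_Applications PRIMARY KEY
      , ApplicationName NVARCHAR(255) NOT NULL UNIQUE
    );

    CREATE TABLE cfg.Packages
    (
        PackageID   INT IDENTITY(1,1) CONSTRAINT PK_Packages PRIMARY KEY
      , FolderName  NVARCHAR(128) NOT NULL
      , ProjectName NVARCHAR(128) NOT NULL
      , PackageName NVARCHAR(260) NOT NULL
    );

    CREATE TABLE cfg.ApplicationPackages
    (
        ApplicationPackageID INT IDENTITY(1,1) CONSTRAINT PK_ApplicationPackages PRIMARY KEY
      , ApplicationID  INT NOT NULL CONSTRAINT FK_ApplicationPackages_Applications
                           REFERENCES cfg.Applications (ApplicationID)
      , PackageID      INT NOT NULL CONSTRAINT FK_ApplicationPackages_Packages
                           REFERENCES cfg.Packages (PackageID)
      , ExecutionOrder INT NOT NULL
    );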


An Application Instance is an instance of execution of an SSIS Application. An Application Instance is made up of Package Instances. The relationship between applications and packages appears straightforward: an application is a collection of packages; parent-child; one-to-many. But it’s not quite that simple. Our SSIS Frameworks facilitate patterns that execute the same package multiple times, sometimes in parallel! We can also create packages that perform utility functions – such as ArchiveFile.dtsx – and call them from multiple applications. When you do the Boolean algebra, the relationship between applications and packages is many-to-many.
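
Continuing the hypothetical sketch above, the many-to-many relationship falls out of the bridge table. For example, this query lists every application that calls ArchiveFile.dtsx and where the package sits in each application’s execution order:

    /* Hedged sketch: one package, many applications (and vice versa). */
    SELECT
        a.ApplicationName
      , p.PackageName
      , ap.ExecutionOrder
    FROM cfg.ApplicationPackages AS ap
    JOIN cfg.Applications AS a
      ON a.ApplicationID = ap.ApplicationID
    JOIN cfg.Packages AS p
      ON p.PackageID = ap.PackageID
    WHERE p.PackageName = N'ArchiveFile.dtsx'
    ORDER BY a.ApplicationName, ap.ExecutionOrder;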

Our SSIS Frameworks are SSIS Catalog-integrated. They even work with the SSIS Integration Runtime that’s part of Azure Data Factory, Azure-SSIS. 

Dashboards… Evolved

While the Reporting Services dashboard was neat when it was released back in the day, the cool kids now play with Power BI. At DILM Suite you will also find a free – albeit basic – Power BI dashboard that surfaces many of the same metrics using even better reporting technology: the Basic SSIS Catalog Dashboard in Power BI.
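
If you want to roll your own metrics before (or instead of) downloading a dashboard, the raw data is sitting in the SSIS Catalog’s catalog.executions view in SSISDB. A small example – status 4 means failed, 7 means succeeded:

    /* Failed SSIS Catalog executions over the last 7 days. */
    SELECT
        e.execution_id
      , e.folder_name
      , e.project_name
      , e.package_name
      , e.start_time
      , e.end_time
      , e.status          -- 4 = failed, 7 = succeeded
    FROM SSISDB.catalog.executions AS e
    WHERE e.status = 4
      AND e.start_time > DATEADD(DAY, -7, SYSDATETIMEOFFSET())
    ORDER BY e.start_time DESC;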

I’ve not yet collected Mean Time to Identify Failure metrics using the Basic SSIS Catalog Dashboard in Power BI. Perhaps you can be the first.

Enjoy!

Delivering Intelligent Data Integration through SSIS in Dallas 31 May 2019!

I am honored to announce I will be delivering Intelligent Data Integration through SSIS – a day-long pre-conference session at SQL Saturday 841 – Dallas – on 31 May 2019!

Description:

What is Intelligent Data Integration? SSIS (SQL Server Integration Services) packages developed using tried and true design patterns, built to participate in a DevOps enterprise practicing DILM (Data Integration Lifecycle Management), produced using Biml (Business Intelligence Markup Language) and executed using an SSIS Framework.

Attend a day of training focused on intelligent data integration delivered by an experienced SSIS consultant who has also led an enterprise team of several ETL developers during multiple projects that spanned 2.5 years (and delivered).

Attendees will learn:

  • a holistic approach to data integration design.
  • a methodology for enterprise data integration that spans development through operational support.
  • how automation changes everything – including data integration with SSIS.

Topics include:

  1. SSIS Design Patterns
  2. Executing SSIS in the Enterprise
  3. Custom SSIS Execution Frameworks
  4. DevOps and SSIS
  5. Biml, Biml Frameworks, and Tools

Prerequisites: Familiarity with SQL Server Integration Services (SSIS).

Continental Breakfast, Lunch and Afternoon Snacks are provided.

I hope to see you there!

:{>

Lift and Shift is Not a Strategy…

Lift and shift is a bridge.

If your enterprise is considering migrating enterprise data and data integration workloads from on-premises to the cloud, lifting and shifting is not the end of the story. Lifting and shifting is, rather, the mechanism – the conversion project – that positions your enterprise to leverage the economies of scale afforded by cloud technology.

Andy’s Lift and Shift FAQ

“Should We Lift and Shift?”

I hear this question often and my response is, “When it makes sense to do so, yes, you should lift and shift.” That leads to the next question, which is…

“How Do We Know It’s Time to Lift and Shift?”

My engineer-y response to this question is, “You will know it’s time to lift and shift to the cloud when you want to leverage functionality available in the cloud that is not (or not yet) available in on-premises versions of the enterprise platform(s) in use in your enterprise.”

“What Are Some Advantages of Migrating to the Cloud?”

The biggest advantage of lifting and shifting enterprise data to the cloud is the ability to efficiently scale operations. By efficiently, I mean quickly and easily – especially when compared to the time and expense (don’t forget opportunity cost when calculating expense) to scale up systems on-premises.

The ability to scale up and scale down on demand is a huge advantage for some business models which experience “spike-y” demand for operations at different times of the year, quarter, or month. But even if that’s not the case, all data scales. It’s very handy to be able to connect to the Azure Portal and move a slider (as opposed to purchasing and provisioning more hardware…).

There’s a brand new (in my opinion) “knob” exposed by cloud-enabled efficient scaling. As I wrote in my post titled Time and Money in the Cloud:

Let’s say you pay $100 to incrementally load your data warehouse and the load takes 24 hours to execute at the scale you’ve selected in the cloud. Prior to thinking in DTUs, engineers and business people would think, “That’s just the way it is. If I want more or faster, I need to pay for more or faster.” But DTU math doesn’t quite work that way. Depending on your workload and DTU pricing at the time (FULL DISCLOSURE: DTU PRICING CHANGES REGULARLY!), you may be able to spend that same $100 on more compute capabilities and reduce the amount of time required to load the same data into the same data warehouse to minutes instead of hours…

The fact that the cost/performance curve can be altered in seconds instead of months meta-changes everything.

“Are There Disadvantages of Migrating to the Cloud?”

It depends. (You knew that was coming, didn’t you?)

Enterprise Data & Analytics helps enterprises migrate data, data integration, lifecycle management, and DevOps to the cloud. In some cases (~30%), the enterprises spend a little more money in the near term. There are three reasons for this:

  1. When upgrading, it’s always a good idea to operate new systems in tandem with existing systems. In a lift and shift scenario, this means additional expenses for cloud operations while maintaining the expense of on-premises operations. As cloud operations are validated, on-premises operations are shut off; thereby reducing operating expenses. In truth, though, this dynamic (and expense) exists whether one is lifting and shifting to the cloud or simply upgrading systems on-premises.
  2. “Standing on the bridge” (more in a bit) can cost more than remaining either on-premises or lifting and shifting the entire enterprise workload to the cloud.
  3. Regulatory requirements – including privacy and regulations about which data is allowed to leave nation-states – will constrain many industries, especially government agencies and NGOs (non-governmental organizations) who interact heavily with government agencies.

Standing On The Bridge

One option we at Enterprise Data & Analytics consider when assisting enterprises in lift and shift engagements is something we call “standing on the bridge.” 

Standing on the bridge is present in each lift and shift project. It’s one strategy for implementing hybrid data management, which almost every enterprise in the cloud today has implemented. Hybrid means part of the enterprise data remains on-premises and part of the enterprise data is lifted and shifted to the cloud. 

Hybrid is implemented for a variety of reasons which include:

  • Mitigating regulatory concerns; and
  • As part of the normal progression of lifting and shifting enterprise data and data workloads to the cloud.

Standing on the bridge for too long is a situation to avoid. 

“How Do We Avoid Standing on the Bridge For Too Long?”

Planning. Planning is how an enterprise avoids standing on the bridge too long. Your enterprise wants advice from experienced professionals to shepherd the lift and shift operation. 

Enterprise Data & Analytics can help.

Helpful Tools

Enterprise Data & Analytics has been delivering, and writing and speaking about Data Integration Lifecycle Management for years. 

We’ve built helpful tools and utilities that are available at the DILM Suite. Most of the DILM Suite tools are free and some are even open source. 

Enterprise Data & Analytics Welcomes Shannon Lowder!

I am honored and thrilled to welcome Shannon Lowder (@shannonlowder | blog | LinkedIn) to the Enterprise Data & Analytics team!

Shannon is a data engineer, data scientist, BimlHero (though not listed on the page at the time of this writing), and shepherd of the Biml Interrogator open source project. If you use Biml to generate SSIS projects that load flat files, you need Biml Interrogator.

Shannon, Kent Bradshaw, and I are also co-authoring a book tentatively titled Frameworks. (Confession: Kent and Shannon are mostly done… I am slacking…)

Shannon brings a metric ton of experience to serve our awesome clients. He has years of experience in data analytics, serving recently in the role of enterprise Lead Data Scientist. Shannon’s experience spans supply chain management, manufacturing, finance, and insurance.

In addition to his impressive data skills, Shannon is an accomplished .Net developer with enterprise senior developer experience (check out Biml Interrogator for a sample of his coding prowess).

Shannon is a regular speaker at SQL Saturday events, presenting on topics that include Business Intelligence, Biml, and data integration automation. He is a gifted engineer with experience in framework design, data integration patterns, and Azure who possesses a knack for automation. Shannon is an avid “tinkerer” who enjoys learning. He has experience implementing Azure Machine Learning and applying AI to predictive analytics using sources classified as Big Data. Shannon is also a practitioner of data integration DevOps with SSIS. In other words, he fits right in with our team here at Enterprise Data & Analytics!

As Shannon writes on his LinkedIn profile:

I am a data guy with a passion for partnering with clients to solve their database and technology issues. Over my career, I’ve played all the roles: database developer, administrator, business intelligence developer, and architect, and now consultant. I’m the guy you call in when you have the impossible problem and everyone tells you it cannot be solved. I automate solutions in order to free your current staff to do the higher value tasks. I bring solutions outside of traditional relational database solutions, in order to find the shortest path between you and your goals.

As an accomplished Microsoft SQL data professional, recognized BimlHero, and practicing Data Scientist, I’m the resource you need to extract the most value from your data.

I’m humbled and thankful and excited to watch Enterprise Data & Analytics continue to (quietly) grow – adding cool people (another announcement is forthcoming) and service offerings like Data Concierge. It’s very cool to watch!

Welcome Shannon! I’m honored to work with you, my brother and friend.

For more information, please contact Enterprise Data & Analytics!

Join me For Expert SSIS Training!

I’m honored to announce Expert SSIS – a course from Enterprise Data & Analytics!

The next delivery is 01-02 Apr 2019, 9:00 AM – 4:30 PM ET.

Data integration is the foundation of data science, business intelligence, and enterprise data warehousing. This instructor-led training class is specifically designed for SQL Server Integration Services (SSIS) professionals responsible for developing, deploying, and managing data integration at enterprise-scale.

You will learn to improve data integration with SSIS by:

  • Building faster data integration.
  • Making data integration execution more manageable.
  • Building data integration faster.

Agenda

  1. SSIS Design Patterns for Performance – how to build SSIS packages that execute and load data faster by tuning SSIS data flows and implementing performance patterns.
  2. SSIS Deployment, Configuration, Execution, and Monitoring – the “Ops” part of DevOps with SSIS using out-of-the-box tools and the latest utilities.
  3. Automation – how to use Business Intelligence Markup Language (Biml) to improve SSIS quality and reduce development time.

I hope to see you there!

PS – Want to Learn More About Azure Data Factory?

Follow Andy Leonard’s SSIS Training page for more information.

The 2:00 AM Test

I’m working with a collection of clients these days providing data integration, cloud, and DevOps advice. I love architecture work! My clients love the results. Everyone wins.

Kent Bradshaw and I are tag-teaming with a couple clients who are converting data integration from other platforms to SSIS. In a recent meeting Kent mentioned “the 2:00 AM test.”

“What’s the 2:00 AM Test, Andy?”

I’m glad you asked! The 2:00 AM test is a question developers should ask themselves when designing solutions. The question?

“Will this make sense at 2:00 AM? When I have – or someone else has – been awakened by a job or application failure? Will I be able to look at this and intuit stuff about how this is supposed to work?”

Future You Will Thank You

The reason the 2:00 AM test is important is that later – at some point in the future, when you’ve long since stopped thinking about the stuff you were thinking about when you designed this solution – you (or someone) will be awakened at 2:00 AM by some failure. Even if it’s you, there’s a good chance you won’t remember all the nuances of the design. Even if you do, you may not remember why you designed things the way you did.

So…?

Be like Hansel and Gretel. Leave yourself some breadcrumbs. What kind of breadcrumbs?

Comments

Code comments are the best way to record what you are thinking at the time you are thinking it. Consider the lifetime of a software solution. It may take an hour, a day, a week, or a month to build; but it may be in production for years. Are you going to remember why you made each decision you made? Are you going to be able to recall – years later – why you zigged instead of zagged?

Call out anything not readily accessible, anything invisible on the initial view of the immediate coding surface. If you’re writing VB or C#, include a comment explaining – at a minimum – where to learn more about classes and objects not readily available. If you used NuGet to import stuff, will it kill you to copy and paste the command into a comment?

In SSIS, if you’re using variables or parameters or event handlers, add an annotation to the control flow or data flow.

Leave yourself a note.
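
Here is the flavor of breadcrumb I have in mind, sketched in T-SQL (the object names and the scenario are made up):

    /* ------------------------------------------------------------------
       Why this exists: the source system ships duplicate customer rows
       when a customer changes regions mid-day. We dedupe here, in the
       staging database, instead of in the SSIS data flow because the
       set-based approach tested faster against the staged extract.
       If this fails at 2:00 AM, check stage.Customer for a late extract.
       ------------------------------------------------------------------ */
    WITH ranked AS
    (
        SELECT
            c.*
          , ROW_NUMBER() OVER (PARTITION BY c.CustomerBusinessKey
                               ORDER BY c.ExtractDateTime DESC) AS rn
        FROM stage.Customer AS c
    )
    SELECT *
    FROM ranked
    WHERE rn = 1;   -- keep only the latest row per customer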

Practice Good Naming

My friend Joel has a t-shirt based on this tweet:

One of the two hard things is naming things. I suggest descriptive names for stuff. Which stuff? In VB or C#: variables, methods, and classes, for starters. For SSIS, I heartily recommend Jamie Thomson’s iconic post titled Suggested Best Practises and naming conventions.

Good naming conventions promote self-documenting code.

Small Chunks

Coders on all platforms develop a sense of a “good size.” An easy trap to fall into is attempting to “boil the ocean.” Break things up. Practice separation of concerns. Write decoupled code.

You will find decoupled, small, function-of-work code is simpler to test, easier to maintain, and promotes code reuse. 

Conclusion

Future you will thank you for incorporating these best practices in your coding, regardless of your software development platform.

Two (or More) Kinds of Developers

I’ve made statements about “two kinds of developers” for years. These statements are false inasmuch as all generalizations are false. The statements are not designed to be truisms. They are designed to make people think.

Last week – while presenting a full-day pre-conference session at the PASS Summit 2018 and again when delivering a session about Faster SSIS – I repeated the sentiment shown at the top of this post:

There are two kinds of developers:
1) Those who use source control; and
2) Those who will.

I follow up with: “Because if you do not use source control, you will lose code one day and it will break your heart.” Audience members laugh and the point is made.

More Than Two

There are myriad types of developers. And that’s a good thing. Why? Because there are myriad problems developers face in the wild and those problems need to be solved.

There are no one-size-fits-all solutions. If you attended the PASS Summit last week you likely saw some really cool demos. I know I did. And you may have thought – or even been told – that this, this right here is the answer for which you’ve searched your entire career.

There’s a possibility that the people selling you on whatever-this-is are absolutely correct.
There’s a greater possibility that they are less than absolutely correct.

I write this not to disparage anyone’s solution (or my own solutions). Promise.
I write this to dissuade the disparaging of anyone else’s solution (because that also happens).

The Bottom Line

Goldilocks. The bottom line is we all want the Goldilocks solution. We want the just-right solution that maximizes efficiency and minimizes complexity.

That’s just hard.

I can hear you thinking, “Why is maximizing efficiency and minimizing complexity hard, Andy?” I’m glad you asked. Solutions are a moving target, part art and part science, and the only way to learn where and when to draw the art-science line is experience.

A Moving Target

Maximizing efficiency and minimizing complexity is hard because it’s not at all the bottom line; it’s a line in the middle – a balancing of mutually-exclusive demands on your time, expertise, and energy.

Plus, it shifts.

Everything scales. Things scale either up and/or out, or they scale down and/or in. In general (generality warning!), down and in is “bad” and up and out is “good.”

Experience Matters

Experienced architects understand subtle nuances; the art part of the art / science of enterprise software. When you speak with an experienced architect, you may hear her say, “It depends,” often. Good architects will finish the sentence and share at least some of the things upon which “it depends.” Less-experienced architects will present naked scalars and polarized advice.

Naked Scalars

Naked scalars are numeric values in a vacuum. They are unsupported because most are unsupportable. In other words, they are lies. Now, “lies” is a pretty harsh word. I prefer an engineering definition for the word “truth” that sounds an awful lot like the oath witnesses are asked to swear in US courts:

“Do you promise to tell the truth, the whole truth, and nothing but the truth, so help you God?”

This oath covers falsehoods that are shared, yes; but it also covers omissions of the truth.

Examples of naked scalars:

  • “97% of engineers believe ____.”
  • “10% of the people I know have practiced ____ successfully.”

Polarized Advice

Polarized advice can be a special case of naked scalars, advice focused on 0% and 100%. Polarized advice may or may not include scalars (naked or otherwise).

Examples of polarized advice:

  • “I’ve never seen a good use case for ____.”
  • “You should always ____.”

Are naked scalars and polarized advice always bad and wrong? Nope. That would be a generality (and we covered generalities already, did we not?).

Managing the Risk of Inexperience

What exactly is a consultant communicating when they engage naked scalars or polarized advice?
They are signalling a lack of experience.
They are, in effect, stating, “I do not have experience with ____.”

How do you manage the risk of inexperience?
You hire people – architects, especially – who understand there are good reasons systems are designed as they are. They will say things like, “I’m not sure why this was designed this way,” and mean it. It’s not a criticism; it’s an admission of curiosity. Trust me on this: You want curious consultants. They are more likely to identify a solution that solves the problem you are trying to solve in a way that doesn’t create new problems. Returning to the good reasons systems are designed as they are…

  1. Are (or were) the good reasons, well, good? Sometimes.
  2. Do the good reasons scale? Sometimes.
  3. Do the good reasons stand the test of time? Sometimes.

Good architects discern the baby from the bath water. Their experience separates good architects from the crowd. Not-as-good architects are less flexible, less willing to learn, and loathe to admit mistakes.

Let’s face facts, though: All architects and developers know what they know and don’t know what they don’t know. Better architects recognize these uncomfortable truths and mitigate them.

One way to mitigate inexperience – the best way, in my opinion – is to work with others.

The Story Of Us

At Enterprise Data & Analytics, our consultants and architects work together as a team. Our diverse group is a strength, bringing perspective to bear on the problems you are trying to solve. Our experience levels vary, the software and tools with which we work vary, and our demographics vary. As owner, I am honored to lead a team from diverse cultural – as well as diverse technical – backgrounds.

I didn’t set out to build a cultural- / age- / gender-diverse team. I set out to find the best people – to do what Jim Collins describes as “getting the right people on the bus.”

I found, though, that focusing on getting the right people on the bus had the side-effect of building a cultural- / age- / gender-diverse team.

As an added bonus, people of different genders approach problem-solving differently. People of different ethnicities pick up on stuff – especially cultural stuff, including enterprise culture – that people of other cultures miss.

EDNA‘s diversity is a strength that emerged unintentionally, but emerged nonetheless. As Chief Data Engineer, it’s very cool to watch our less-experienced consultants growing into more-experienced consultants and architects, while at the same time watching our people interact and perform as a team – each member catching stuff and contributing ideas because of their unique perspectives.

Cost Value

I can hear some of you thinking, “We’re on a budget here. Don’t good architects cost more than less-than-good architects, Andy?” I feel you. The answer is, “No. Good architects cost less than less-than-good architects.”

I can prove it. Because math. (Read that post for more information…)

It’s often accurate that good architects cost more per hour than less-than-good architects. Do you know why good architects charge more per hour?

Because they are worth more per hour.

(Generality!)

But consider this: “Time will tell” is a tried and true statement. Like good wine, the likelihood a generality is accurate improves with age. If enterprises continue to hire an organization – like Enterprise Data & Analytics or any other firm – to help them solve the problems they are trying to solve, then the folks shouting them down may be doing so in an effort to compete. Competition is fine, but I never hire anyone who talks bad about other clients or the competition. Why? They’ve demonstrated the capacity to talk bad about me at some later date.

Conclusion

I love our team!
I love our expertise!
I love our diversity!
I love that we always deliver value!

Contact me to learn more.

:{>

Want a Free #DILM Book? See Me at the #PASSsummit 2018!

I’ll be at the PASS Summit 2018 next week. I’m delivering a full-day precon Monday, presenting Faster SSIS and participating in the BI & Data Visualization Panel Wednesday, and Enterprise Data & Analytics is exhibiting Wednesday through Friday.

“You Mentioned a Free Book…”

Oh. Yeah. That.

I will be giving away free copies of my latest book: Data Integration Life Cycle Management with SSIS: A Short Introduction by Example starting Wednesday! Would you like to score a free copy? You’ll have to catch me carrying them in the Washington State Convention Center.

They are free while supplies last.

Where Can You Find Andy at the PASS Summit 2018?

Precon!

Monday 5 Nov 2018, I’m delivering a full-day pre-conference session titled Intelligent Data Integration with SSIS. I’m going to cover everything listed at that link, but there is an update about my precon content:

There will be Azure Data Factory content and demos!

Why this addition? Two reasons:

  1. My presentation titled Faster SSIS was selected for Wednesday, 7 Nov 2018 at 1:30 PM PT in room 612. I usually include the three Faster SSIS demos in my precon. This time, you can just view the Faster SSIS session to see those demos. I’ve added some Azure Data Factory content to the precon in place of these three demos!
  2. I am participating in a panel discussion titled BI & Data Visualization Panel Wednesday, 7 Nov 2018 at 3:15 PM PT in room TCC Yakima 1.

Enterprise Data & Analytics is Exhibiting!

That’s right, you can find me in the Exhibition Hall! Enterprise Data & Analytics is exhibiting at the PASS Summit 2018!

If you are attending the PASS Summit 2018, please catch me for a free book while supplies last. If I’m out of physical copies, come see me anyway… we will work something out. Promise!

:{>

DILM Tiers for the SSIS Enterprise

“How many Data Integration Lifecycle Management tiers (DILM tiers) do I need to deploy SSIS, Andy?” If I had a nickel for every time I am asked this question, I would have a lot of nickels. My answer is, “Four.”

Why Four?

I’m glad you asked. Before I answer, I have a question for you: Have you ever been told by a developer, “It works on my machine.”? Do you know why it works on the developer’s machine?

Defaults.

Developers build software that contains configuration information. In the first iteration of building the software, most developers do not externalize – or parameterize – all configuration values. Instead, they hard-code these values into the application, usually as variable defaults.

I can hear you thinking…

“But This Is SSIS, Andy”

And? Oh, you ask because SSIS stands for “SQL Server Integration Services.” I get that. It’s important you understand this:

SSIS development is software development.

SQL Server Integration Services suffers somewhat from its name which contains the name of a popular relational database engine (SQL Server). But SSIS is not a database, and SSIS development is not database development (and most definitely not database administration).

SSIS development is software development.

Software Best Practices Apply

Because SSIS development is software development, software development best practices apply. For example, SSIS developers should source-control and test their SSIS code.

Perhaps testing is that thing you do in Production right after deployment. I have this pithy saying:

All software is tested. Some, intentionally.

While I freely admit this statement is pithy, it is also true. The remainder of this quote is even pithier: “Some software is tested by your soon-to-be-former largest customer.”

I don’t want you to lose customers – especially your largest customer. That’s why I advocate you test your software by developing it on one Data Integration Lifecycle Management – or DILM – tier and then promoting the code to at least two other DILM tiers before deploying to the Production tier.

Tiers. Or Tears. Your Call.

So which tiers do I recommend? I’m glad you asked! I recommend:

  1. Development
  2. Test
  3. QA
  4. Production

Development

You must build the software somewhere. Wherever you build the software is your development environment.

Now let’s take a step back and describe the development environment for an SSIS developer. An SSIS development DILM tier shares some characteristics. In Dev, SSIS developers can:

  • Deploy SSIS projects and packages
  • Overwrite existing SSIS projects and packages
  • Delete existing SSIS projects and packages
  • Execute SSIS packages
  • Manage (deploy, overwrite, delete) SSIS configurations
  • Manage (deploy, overwrite, delete) SSIS metadata
  • Create databases, tables, and files

The most important part of the SSIS Development DILM tier? SSIS developers can perform each action listed above without help from anyone else in the enterprise.
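
In SSIS Catalog terms, one way – a sketch, not a mandate – to grant that level of self-service on the Dev tier is to add the SSIS developers to the built-in ssis_admin role in the Dev instance’s SSISDB. The Windows group name below is hypothetical, and the login is assumed to already exist on the instance:

    /* Dev tier only: SSIS developers can deploy, overwrite, delete, and
       execute without waiting on anyone else. */
    USE SSISDB;

    CREATE USER [DOMAIN\SSISDevelopers] FOR LOGIN [DOMAIN\SSISDevelopers];
    ALTER ROLE ssis_admin ADD MEMBER [DOMAIN\SSISDevelopers];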

I can hear you thinking, “What if the SSIS developers blow something up, Andy? Huh? HUH?!?” Let me ask another question: What happens when things go awry or amiss in Production? Isn’t the answer to that question, “I fix it.”? If that’s true in Production…

Then you fix it.
In Dev.

In fact (and I know this is a crazy thought) you could actually use a more-open Development tier to test and – perhaps, even – improve break-fix procedures for Production. After all, developers are notoriously excellent testers (I write this as a developer: I can break things like nobody’s business!).

What if the Dev tier becomes a place for you to test your recovery procedures for Production? If your HA/DR procedures work in Development where developers have too much access, then they should work in Production where the environment has been locked down. Am I correct?

Over the years I have found myself making this statement to enterprises: “You have an environment named Development but it is not a Development environment.” How do you know it’s a Development environment? If the developers can destroy it, it’s Dev.

One last thing before I kick this soapbox back under my desk. Consider that developers will be inconvenienced by the destruction of their Dev environment. It may slow them down. Heck, it may cost them a deadline. If that happens, you can take the opportunity to educate them, sharing what happened and why you believe it happened. Maybe they won’t do the dumb thing that burned down Dev again… Maybe.

Would having developers that better understand databases make your job a little easier in the long run?

Test

Once the software works on the SSIS developer’s machine, they need another environment to which they can deploy the SSIS project. Why? Remember those defaults? SSIS packages will always execute successfully in Dev because default values are aimed at local resources for which the SSIS developer has authorization and access.

Which resources? Required access (CRUD – create, read, update, delete) to files and directories, databases, servers, etc. No SSIS developer would claim an SSIS package is ready unless and until she or he successfully executes the package in the SQL Server Data Tools (SSDT) debugger. But a successful SSDT debugger execution does not a completed SSIS package make.

To suss out missing configurations (externalization, parameterization), one needs to move the software to another lifecycle management tier and execute it there. Only then can the developer be sure they’ve externalized required defaults.
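
When a missing externalization does surface in Test, the fix is usually to parameterize the value and then override the default on the Test tier. Here is a hedged example using the SSIS Catalog’s catalog.set_object_parameter_value stored procedure; the folder, project, and parameter names are hypothetical:

    /* Override a project parameter with a literal ('V') value on the Test tier. */
    EXEC SSISDB.catalog.set_object_parameter_value
        @object_type     = 20        -- 20 = project parameter, 30 = package parameter
      , @folder_name     = N'Test'
      , @project_name    = N'LoadCustomer'
      , @parameter_name  = N'SourceConnectionString'
      , @parameter_value = N'Data Source=TestSql01;Initial Catalog=SourceDb;Integrated Security=SSPI;'
      , @value_type      = N'V';     -- 'V' = literal value, 'R' = referenced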

In Test, SSIS developers can:

  • Deploy SSIS projects and packages
  • Overwrite existing SSIS projects and packages
  • Delete existing SSIS projects and packages
  • Execute SSIS packages
  • Manage (deploy, overwrite, delete) SSIS configurations
  • Manage (deploy, overwrite, delete) SSIS metadata
  • Create databases, tables, and files

I can hear you thinking, “But Andy, this is the same list of permissions as Dev!”
Yes.

This part of the work must be completed. As professionals, we get to decide when and where the work gets done. Remember, it can always be done in Production. And it will be done in Production unless we are intentional about doing this work on another DILM tier.

Someone needs to do this part of the work. Do you want to do it? Or would you rather have the person most familiar with the code – the SSIS developer – do it? They are not going to catch everything that needs to be externalized, either; not on this pass. But the Pareto Principle informs us that they will catch 80% of the missing externalization.

Deployment to the DILM Test tier serves as the first deployment test. I want the SSIS developers bundling up SSIS deployments. Why? It’s good practice and practice makes perfect.
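
One way to bundle a deployment so it can be repeated identically on every tier is a short T-SQL script wrapped around the SSIS Catalog’s catalog.deploy_project stored procedure. The folder name and file path below are hypothetical, and the script assumes the account running it can read the .ispac via OPENROWSET:

    /* Deploy an .ispac to the SSIS Catalog from T-SQL. */
    DECLARE @ProjectBinary VARBINARY(MAX)
          , @OperationId   BIGINT;

    SELECT @ProjectBinary = CAST(BulkColumn AS VARBINARY(MAX))
    FROM OPENROWSET(BULK N'C:\Deploy\LoadCustomer.ispac', SINGLE_BLOB) AS ispac;

    IF NOT EXISTS (SELECT 1 FROM SSISDB.catalog.folders WHERE name = N'Test')
        EXEC SSISDB.catalog.create_folder @folder_name = N'Test';

    EXEC SSISDB.catalog.deploy_project
        @folder_name    = N'Test'
      , @project_name   = N'LoadCustomer'
      , @project_stream = @ProjectBinary
      , @operation_id   = @OperationId OUTPUT;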

On Estimation…

One last point regarding software testing: When estimating a software project I begin by estimating how much time it will take to build the software. After this step the calculation for how much time is required to test the software is really easy: It’s the same amount of time required to develop the software.

Andy, are you telling me it takes just as long to test software as to develop it?

Yep.

“Cutting Testing to Save Time”

You can change this relationship by “cutting testing to save time.” But you can never shorten testing, only lengthen it. Decades of software development experience inform me that the costs of testing software increase by an order of magnitude for each tier of testing. The most expensive environment in which to test is Production.

Cutting testing never saves time.

Or cutting testing does save time; the time it takes you to deliver broken software. I can hear some of you thinking, “But can’t SSIS developers just develop better SSIS packages with fewer bugs?” The answer is the same as the answer to the physics question: “Can’t light just be either a wave or a particle, and not both?” Physics says no, light is both a wave and a particle. Andy says the same physics apply to software development along with the Theory of Constraints which tells us losses accumulate, gains do not.

As my friend Aaron Lowe used to say (before he passed away earlier in 2018 – man, I miss him, and others… we lost too many SQL Family members this year…), “Math is hard.”

You must include the cost of lost customers in the calculation.

Remember, all software is tested…

QA

The QA (or UAT or whatever you call the environment sitting just this side of Production) environment should be locked down tighter than a tick. The security in place in this pre-Production tier should be identical to the security in the Production tier. Optimally, the hardware capability and storage capacity in QA will match Production.

If you understand the dynamics of SSIS Data Flow Task internals, it’s possible to test performance with less data on sub-optimal hardware. If you ask me to do it, I’m going to let you know there’s a possibility we will miss something. I’ll be able to tell you for sure after we run some performance test executions in Production (and we will need to do that to be sure, and that will cost both time and money, so how much money are you saving when you save money by buying sub-optimal hardware and storage for QA?).

Deployment to QA should be performed by the same team – optimally by the same individual – that will be performing the deployment to Production. Why? We do not want deployment to Production to be their first experience with deploying this SSIS project. The deployment to QA is another deployment test, this time for the Production release management person. Perhaps this is a DBA. Perhaps this is a Release Management team member. Regardless, this step is their opportunity to practice the deployment to Production.

I can hear you thinking, “What’s the big deal, Andy?” I’m glad you asked. The big deal is: No matter how you execute SSIS in your enterprise, there’s more to the solution than merely the SSIS packages. If you’re running from the file system (and most enterprises execute SSIS from the file system), there are configurations stored in dtsConfig files or database tables that also need to be promoted. Some of the values – such as connection string components – need to be edited in these dtsConfig files at each DILM tier.

The same holds for deployments to the SSIS Catalog. And… if you are deploying (or planning to deploy) SSIS to the Azure Data Factory (ADF) SSIS Integration Runtime, your only option is deployment to an SSIS Catalog. Externalization is accomplished via Catalog Environments, References, and Literals in the SSIS Catalog. Same stuff, different location. But both dtsConfig files and Catalog Environments must be managed.
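
For the SSIS Catalog flavor, the configuration artifacts look something like the script below – standard SSIS Catalog stored procedures, with hypothetical folder, project, environment, variable, and parameter names. The same script, with edited values, becomes the QA and Production configuration artifact:

    /* Create an environment and variable, reference the environment from the
       project, and map a project parameter to the variable ('R' = referenced). */
    EXEC SSISDB.catalog.create_environment
        @folder_name      = N'Test'
      , @environment_name = N'LoadCustomerEnv';

    EXEC SSISDB.catalog.create_environment_variable
        @folder_name      = N'Test'
      , @environment_name = N'LoadCustomerEnv'
      , @variable_name    = N'SourceConnectionString'
      , @data_type        = N'String'
      , @sensitive        = 0
      , @value            = N'Data Source=TestSql01;Initial Catalog=SourceDb;Integrated Security=SSPI;'
      , @description      = N'Source connection string for the Test tier';

    DECLARE @ReferenceId BIGINT;
    EXEC SSISDB.catalog.create_environment_reference
        @folder_name             = N'Test'
      , @project_name            = N'LoadCustomer'
      , @environment_name        = N'LoadCustomerEnv'
      , @reference_type          = N'R'      -- 'R' = relative (same folder), 'A' = absolute
      , @environment_folder_name = NULL
      , @reference_id            = @ReferenceId OUTPUT;

    EXEC SSISDB.catalog.set_object_parameter_value
        @object_type     = 20                -- project parameter
      , @folder_name     = N'Test'
      , @project_name    = N'LoadCustomer'
      , @parameter_name  = N'SourceConnectionString'
      , @parameter_value = N'SourceConnectionString'   -- name of the environment variable
      , @value_type      = N'R';             -- map via reference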

Since editing will need to occur when it’s time to deploy to Production, I prefer the deploy-er practice the editing along with the deployment. I ask the developer to send the deploy-er the SSIS deployment plus the scripts or files for managing external configuration metadata, and I request these configuration artifacts be set up for the Test DILM tier.

Production

If you’ve taken my advice and marched SSIS development through this process or a similar process, the Pareto Principle says you’ve identified at least 96% of the bugs prior to deployment to Production. My experience bears this out; about 4% of my Production deployments fail because some parameter slipped through the cracks walking through Dev, Test, and QA.

I can hear you thinking, “How does that happen, Andy?” Excellent question (and I am glad you asked!). Parameters slip through the cracks because Dev and Test may be sharing data sources and/or destinations. It may be possible – desirable, even – to access Development data from the Test tier and vice versa.

Isolation

Remember: I want QA to be identical to Prod, which means I want it equally isolated. Is Production behind a firewall? Is it impossible (without poking a hole in the firewall) to write from Test to Prod? If so, then it should also be impossible to write from Test to QA and from QA to Prod. If Prod has a firewall, QA should have a firewall.

If your enterprise sports this architecture you can beat Pareto’s 4% prediction by 3.2 percentage points (an additional 80% of the remaining bugs will be caught) and experience a deployment-to-Production success rate of 99.2% (a failure rate of 0.8%).

To quote Foghorn Leghorn, “figures don’t lie.”

Conclusion

Is this the only way to properly manage SSIS in an enterprise? Goodness, no.

You can add more tiers to your Data Integration Lifecycle as needed. You can technically get by with three tiers (Dev, Test, Prod) but I don’t recommend it unless you’re the person wearing all the hats. Even then, the more times you test deployment, the more bugs you will catch prior to Production, and the fewer bugs you will deploy to Production.

It’s always easier to deploy more bugs, but it is never less expensive and it never saves time. Plus, it may cost you your soon-to-be-former largest customer.

Viewing SSIS Configurations Metadata in SSIS Catalog Browser

SSIS Catalog Browser is a pretty neat product. “How neat is it, Andy?” I’m glad you asked.

It’s free. That makes it difficult to beat the cost.

SSIS Catalog Browser is designed to surface all SSIS Catalog artifacts and properties in a single view. “What exactly does that mean, Andy?” You’re sharp. Let’s talk about why that single-view surfacing is important.

Before I go on, you may read what I’m about to write here and in the companion post and think, “Andy doesn’t like the Integration Services Catalogs node in SSMS.” That is not accurate. I do like the Integration Services Catalogs node in SSMS. It surfaces enough information for the primary target user of SSMS – the Database Administrator – to see what they need to see to do their job, without “cluttering up” their interface with stuff that they rarely need to see and even more-rarely change.

In the companion post I shared this image of the windows (and pages) you need to open in SSMS to view the configured execution-time value of a parameter that is mapped via reference:


That’s a lot of open windows.

So how does one view the same configuration metadata in SSIS Catalog Browser?

Under the Project node (LiftAndShift), we find a virtual folder that holds Project Parameters.

In Project Parameters, we find a reference mapping – indicated by underlined text decoration – between the parameter (ProjectParameter) and the SSIS Catalog Environment Variable (StringParameter).

Expanding the reference mapping node surfaces References. There are two references named env1 and env2. Since references can reference SSIS Catalog Environments in other Catalog folders, the fully-qualified path to each SSIS Catalog environment is shown in folder/environment format.

Expanding each reference node surfaces the value of the SSIS Catalog Environment Variable in each SSIS Catalog Environment.
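
If you prefer T-SQL, roughly the same “values everywhere” information can be pulled from the public SSIS Catalog views. This is a sketch – it returns one row per referenced parameter per environment, and the absolute-versus-relative reference handling is simplified:

    /* Reference-mapped parameters and the environment variable values behind them. */
    SELECT
        f.name                         AS folder_name
      , pr.name                        AS project_name
      , op.object_type                 -- 20 = project parameter, 30 = package parameter
      , op.parameter_name
      , op.referenced_environment_variable
      , er.environment_folder_name     -- NULL for relative references
      , er.environment_name
      , ev.value                       AS environment_variable_value
    FROM SSISDB.catalog.object_parameters AS op
    JOIN SSISDB.catalog.projects AS pr
      ON pr.project_id = op.project_id
    JOIN SSISDB.catalog.folders AS f
      ON f.folder_id = pr.folder_id
    JOIN SSISDB.catalog.environment_references AS er
      ON er.project_id = op.project_id
    JOIN SSISDB.catalog.environments AS e
      ON e.name = er.environment_name
     AND e.folder_id = COALESCE((SELECT f2.folder_id
                                 FROM SSISDB.catalog.folders AS f2
                                 WHERE f2.name = er.environment_folder_name), pr.folder_id)
    JOIN SSISDB.catalog.environment_variables AS ev
      ON ev.environment_id = e.environment_id
     AND ev.name = op.referenced_environment_variable
    WHERE op.value_type = 'R'
    ORDER BY folder_name, project_name, op.parameter_name, er.environment_name;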

I call this feature Values Everywhere, and I like it. A lot.

Values Everywhere From the Project Reference Perspective

Values Everywhere is perspective-aware. Whereas from the perspective of an SSIS Project Parameter, Values Everywhere surfaces the reference mapping in the format parameter–>environment variable, in the Project References virtual folder, Values Everywhere surfaces the same relationship as environment variable–>parameter:

Values Everywhere perspectives follow intuition when surfacing reference mapping relationships. (Did I mention I like this feature? A lot?)

Conclusion

SSIS Catalog Browser provides a clean interface for enterprise Release Management and Configuration teams. And it’s free.

I can hear you thinking, “Why is Catalog Browser free, Andy?” I give away Catalog Browser to demonstrate the surfacing capabilities of SSIS Catalog Compare.

SSIS Catalog Compare

SSIS Catalog Compare not only surfaces two SSIS Catalogs side by side; it also lets you compare the contents of the Catalogs:

You can also script an entire SSIS Catalog, which produces T-SQL scripts and ISPAC files for every artifact in the SSIS Catalog (organized by instance and folder):

You can also deploy all artifacts contained in an SSIS Catalog Folder from one SSIS Catalog to another:

This functionality is an efficient method for Data Integration Lifecycle Management – or DevOps – with SSIS.

SSIS Catalog Compare even works with Azure Data Factory SSIS Integration Runtime, so you can use SSIS Catalog Compare to lift and shift SSIS from on-premises Catalogs to the cloud.