SSIS Catalog Compare: Help for DBAs Who Support SSIS

In my SQL Server Central article, Help for DBAs Who Support SSIS, I walk through several DILM (Data Integration Lifecycle Management) use cases including:

  • Surfacing SSIS Catalog metadata
  • Comparing SSIS Project configurations in two SSIS Catalogs
  • Scripting SSIS Project ISPAC files
  • Deploying an SSIS Project from one SSIS Catalog directly to another SSIS Catalog
  • Generating scripts for an SSIS Catalog Folder and all Folder contents, including:
    • SSIS Projects
    • SSIS Catalog Environments
    • SSIS Catalog Environment Variables
    • References
    • Reference Mappings
    • Literal Overrides

One feature I did not cover in the article – and one of my favorites – is Generate All Catalog Scripts. This feature is especially useful if your enterprise is practicing (or desires to practice) DevOps with SSIS because it facilitates standing up a Test or QA SSIS Catalog quickly.

After connecting to an instance of an SSIS Catalog, simply right-click the SSISDB node and click Generate All Catalog Scripts:

You will be prompted to select (or create) a folder for the scripts:

When you click the OK button, scripts for the contents of the SSIS Catalog are created! The directory structure inside the folder you selected matches the SSIS Catalog folders, beginning with a directory named after the SQL Server instance that hosts the SSIS Catalog you scripted. Inside the instance directory you find a directory for each SSIS Catalog folder – each containing scripts for the objects contained in that folder plus their dependencies (click to enlarge):

To learn more and see the helpful short video demonstrations of SSIS Catalog Compare, please visit the SSIS Catalog Compare page at DILM Suite. Don’t take my word for it, please read the testimonials at the site! One of my favorites is:

“I tried to reconcile two SSIS Catalogs for a week. I used SSIS Catalog Compare and was finished in less than one hour.”
– an Enterprise Architect


Data Integration Pattern: The Pessimistic Load

“What is a Pessimistic Load?”

I’m glad you asked! A pessimistic load is any load pattern that incorporates a strict reconciliation process. Strict reconciliation throws an error when the reconciliation effort fails.

How Does It Work?

First you load some records. Then you check to see if the records you intended to load have been loaded. That’s it.

The first part is the load process; the second part is the reconciliation. The reconciliation process can be broken into chunks – and the chunks can be executed during the load process. For example, one can stage data and then reconcile stage-to-source.

Reconciliation Method 1: Row Counts

There are a number of ways to accomplish reconciliation. The most common is record counts: count the number of rows you expected to load from a source – perhaps a flat file or database table – and then count the number of rows you actually loaded. This is easiest when you are deleting all the records from the destination before the load and then loading all the records from the source. You simply count the rows in the source, count the rows in the destination, and check to see whether the counts are identical.

Things get tricksy when you are only loading some of the rows from a source, or when you are loading rows into a destination that already contains records – or both.
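Here is a minimal T-SQL sketch of the simple case – a truncate-and-load where the source and destination counts should match exactly. The table names are hypothetical, and a real implementation might capture the source count during extract (from the flat file or source query) rather than re-counting it here:

```sql
-- Hypothetical objects: stg.CustomerSource holds the extracted rows,
-- dw.Customer is the destination loaded by the truncate-and-load process.
DECLARE @SourceRows bigint = (SELECT COUNT_BIG(*) FROM stg.CustomerSource);
DECLARE @LoadedRows bigint = (SELECT COUNT_BIG(*) FROM dw.Customer);

IF @SourceRows <> @LoadedRows
BEGIN
  -- Strict reconciliation: fail loudly so the load process (or SSIS package) stops.
  DECLARE @Msg nvarchar(400) = CONCAT(N'Row count reconciliation failed. Source: ',
                                      @SourceRows, N'; Destination: ', @LoadedRows, N'.');
  THROW 50001, @Msg, 1;
END
ELSE
  PRINT CONCAT(N'Row count reconciliation succeeded: ', @SourceRows, N' rows.');
```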

Reconciliation Method 2: Numeric Values

Another way to accomplish reconciliation is to use numeric values. Star schema data warehouses contain dimensions and facts. Dimensions are usually descriptive, and facts are usually collections of numeric measures (plus artificial keys to related dimensions). Summing the numeric measures in a fact is one way to reconcile values between a fact source and destination, but sometimes the numeric values become very large – they can grow to exceed the maximum value for the data type.

One way to address exceeding a numeric data type is to hash the numeric values. There are a couple of techniques for hashing numeric data, but the goal is the same: produce a unique-ish value that can be compared between the source and the destination. One of my favorite tricks is to use a running modulus with a prime number – the larger the prime number, the better.
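Here is a hedged T-SQL sketch of the running-modulus idea. It leans on the fact that, for non-negative integers, (Σ x) mod p = (Σ (x mod p)) mod p, so the comparison value never grows past the prime. The fact tables, the measure, and the choice of prime are all hypothetical, and the cast to integer cents assumes a non-negative, two-decimal measure:

```sql
-- Hypothetical fact source and destination sharing a SalesAmount measure (2 decimal places).
DECLARE @Prime bigint = 1000000007;  -- a large prime; the larger, the fewer collisions

-- Work in integer "cents" so the modulus arithmetic is exact.
-- Each per-row remainder is < @Prime, so the intermediate SUM stays within bigint
-- for billions of rows.
DECLARE @SourceHash bigint =
  (SELECT SUM(CAST(SalesAmount * 100 AS bigint) % @Prime) % @Prime
   FROM stg.FactSalesSource);

DECLARE @DestinationHash bigint =
  (SELECT SUM(CAST(SalesAmount * 100 AS bigint) % @Prime) % @Prime
   FROM dw.FactSales);

IF @SourceHash <> @DestinationHash
  THROW 50002, N'Numeric-value reconciliation failed: source and destination hashes differ.', 1;
```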

The Development Lifecycle of a Pessimistic Load

Pessimistic loads flow through a maturity model or lifecycle. Like every load process, pessimistic loads begin by failing a lot. Initial failures occur because the source data is not cleansed (or not properly cleansed) and raises errors – like NULLs or duplicates – when one tries to load a destination from a source.

As the process matures NULLs are replaced or managed, duplicates are detected and removed, and then quality / consistency checks – soft rules – are added to make sure the data is correct. Soft rules include checks for things like “is the start date of a record earlier than the end date?”

Therefore the first phase of implementing a pessimistic load is filled with data quality errors.

…once the reconciliation rules are synchronized with the data cleansing rules, the enterprise achieves a load process that is nigh bulletproof.

The second phase is interesting because data quality errors begin dropping in the load process (and in the load packages if one is loading using SSIS) but the reconciliation process continues to throw errors – stopping the load process if so designed – because the reconciliation checks do not know the data being loaded is good.

Process Fault Inversion

It is common for the enterprise to witness a kind of “process fault inversion” as the load process cleanses more and more data, while the reconciliation process lags behind the loaders in recognizing and separating (and counting or calculating hash values for) data that conforms to updated cleansing rules. Eventually the load process reaches a point where the only failures are due to the reconciliation process rejecting or missing or miscalculating essentially good data.

Although this is a frustrating phase of the project, it is a necessary and positive development in the maturity of a pessimistic load because once the reconciliation rules are synchronized with the data cleansing rules, the enterprise achieves a load process that is nigh bulletproof.

Conclusion

Confidence in the integrity of the load increases as errors decrease – optimally to 0 – and, on the rare error, the mature pessimistic load process is trusted to detect truly new and different source data conditions. Hence the name: Pessimistic load.

Honored to Present Faster SSIS at SQL Saturday #719 – Chicago 17 Mar 2018

I am honored to present Faster SSIS at SQL Saturday #719 in Chicago 17 Mar 2018!

Abstract:

Ever wonder why SSIS runs so slow? Watch SSIS author Andy Leonard as he runs test loads using sample and real-world data and shows you how to tune SQL Server 2016 Integration Services (SSIS 2016) packages.

We’ll start by experimenting with SSIS design patterns to improve performance loading AdventureWorks data. We will implement different change detection patterns and compare execution performance for each. Then, we’ll explain a Data Flow Task’s bottleneck when loading binary large objects – or BLOBs.

Finally, we’ll demonstrate a design pattern that uses a Script Component in a Data Flow to boost load performance to MySQL, whether on-premises or in the cloud.

I hope to see you there. Register today!

:{>

I’m a Speaker Mentor!

I’m honored to be part of an effort started by Alex Yates of DLM Consultants called Speaking Mentors.

From the Home page:

We are a bunch of experienced speakers who freely and without caveats offer our time to support first time speakers to deliver their first few sessions at tech meetups and/or conferences.

If you would like help to start your speaking career you can contact any of us using the details on our personal pages. You can find our pages in the menu above or by clicking our names in the slider below.

Please read this information about becoming a mentor if you are interested in offering your own time to mentor others.

:{>

Presenting DevOps for SSIS 08 Feb 2018

I am honored to present DevOps for SSIS (or Data Integration Lifecycle Management) for the PASS DevOps Virtual Chapter 08 Feb 2018.

Abstract:
How does SSIS participate in enterprise DevOps? Sharing topics from his latest book (Data Integration Life Cycle Management with SSIS), author and presenter Andy Leonard shares hard-won advice from the trenches. Data Integration Lifecycle Management (DILM) is the key, and SSIS design and development must be approached with DILM in mind. Andy shares one way to accomplish enterprise DILM along with patterns, best practices, and a few utilities he’s built to help achieve DevOps with SSIS.

Register today!

:{>

Tom Roush

This is a picture of my friend Tom Roush and me from a few years ago. Tom passed away 29 Jan 2018 after a protracted battle with cancer. He was my friend and brother and I will miss his easy smile, sage wisdom, and German wit.

I used to refer to Tom as “the best unpublished writer I know.” Well, that’s changed, thanks to some hard work this past year by Tom and his sister, Petra Martin (blog). Tom’s blog remains an awesome source of his many stories. Petra and Tom edited a bunch of his funnier stories into a book titled Stupid Things Papa Did When He Was Younger: And Other Tales of Hard-Won Wisdom. I’ve read the stories and it’s a great book!

Free

Right now, for a limited time, the book is free at Amazon:

It’s not only free, but it’s trending #1 in a couple categories (at the time of this writing)!

Congratulations Tom and Petra!

A Request

I have a request: if you download the free book – and please do; it’s awesome and you will love it – please consider donating to help Tom’s family. To do so, please head over to a site Yanni Robel (a close friend of the family) set up to help Cindy Roush and their family. The site is a YouCaring donation site titled Tom Roush Memorial Funds:

I miss my friend already. Because Tom and I are believers, I know I will see him again.

I look forward to that day. It will be a day that never ends.

:{>

Not* Presenting A Day of Intelligent Data Integration in Nashville 16 Feb 2018

Update: I am not presenting A Day of Intelligent Data Integration in Nashville 16 Feb 2018.

I’m honored to present A Day of Intelligent Data Integration in Nashville 16 Feb 2018!

Description
What is Intelligent Data Integration? SSIS packages developed using tried and true design patterns, built to participate in a DevOps enterprise practicing DILM, produced using Biml and executed using an SSIS Framework.

Attend a day of training focused on intelligent data integration delivered by an experienced SSIS consultant who has also led an enterprise team of several ETL developers during multiple projects that spanned 2.5 years. And delivered.

Attendees will learn:
– a holistic approach to data integration design.
– a methodology for enterprise data integration that spans development through operational support.
– how automation changes everything. Including data integration with SSIS.

Topics include:
1. SSIS Design Patterns
2. Executing SSIS in the Enterprise
3. Custom SSIS Execution Frameworks
4. DevOps and SSIS
5. Biml, Biml Frameworks, and Tools

Register today!

:{>

Andy’s Excellent SSIS-in-the-Cloud Adventure and PowerShell

If you’re new to PowerShell or just don’t know PowerShell, tinkering with Azure Data Factory Version 2 Integration Runtime (ADFv2 IR) can be intimidating.

In Andy’s Excellent SSIS-in-the-Cloud Adventure, Part 1 – Build an ADFv2 IR Instance, I walk you through the PowerShell part, sharing my (meager) knowledge of this awesome utility, assuming little (assuming nothing, I hope!) and using the PowerShell ISE with no configuration – raw PowerShell, if you will!

Enjoy!

:{>

Andy’s Excellent SSIS-in-the-Cloud Adventure, Part 3 – Execution, Monitoring, and Deployment Alternatives

In Andy’s Excellent SSIS-in-the-Cloud Adventure, Part 1 – Build an ADFv2 IR Instance I wrote about the three steps to configure Azure-SSIS Integration Runtime in Azure Data Factory Version 2 (ADFv2 IR, or SSIS in the cloud). In Andy’s Excellent SSIS-in-the-Cloud Adventure, Part 2 – Deploy, Execute, and Monitor an SSIS Project I wrote about deploying, executing, and monitoring execution. In this post, I’m going to continue by showing you some alternative methods for executing, monitoring, and deploying SSIS packages running in ADFv2 IR.

If you haven’t read the first two posts, I recommend you do so before attempting any code in this post.

Execution Alternative – Scripting

As with SSIS packages executing in a local SSIS Catalog, you can also script the execution of ADFv2 IR SSIS packages. Recall from Part 2 how to execute a package using SSMS? The alternative process is identical – to a point.

As before, right-click the package in the SSMS Object Explorer Integration Services Catalogs node and then click Execute:

When the Execute Package window displays, click the Script button to script execution of the SSIS package:

The script generated when I click the Script button is:

The next step is easy – execute the script! After some time – I’ve seen this range from a few seconds to more than a minute – the SSIS package will begin executing and the query will return the @execution_id parameter value:

If you’re familiar with executing SSIS packages from scripts in on-premises, pre-Scale Out SSIS Catalogs, you’ll note a few differences in this script. The @runinscaleout, @retry_count, and @useanyworker parameters are used in SSIS Scale Out scenarios. For more information on these parameters please see the SSISDB.catalog.create_execution and SSISDB.catalog.start_execution pages.

If you comment out or delete these parameters – @runinscaleout, @retry_count, and @useanyworker – the SSIS package will execute.
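For reference, here is a sketch of the kind of execution script SSMS generates, with hypothetical folder, project, and package names and the Scale Out parameters noted in comments – the exact parameter list SSMS emits varies by SSMS and SQL Server version:

```sql
DECLARE @execution_id bigint;

-- The SSMS-generated script for an Azure-SSIS / Scale Out catalog may also include
-- @runinscaleout and @useanyworker here; per the note above, the package still
-- executes if those parameters are removed.
EXEC SSISDB.catalog.create_execution
     @folder_name      = N'MyFolder',        -- hypothetical folder
     @project_name     = N'MyProject',       -- hypothetical project
     @package_name     = N'MyPackage.dtsx',  -- hypothetical package
     @use32bitruntime  = 0,
     @reference_id     = NULL,
     @execution_id     = @execution_id OUTPUT;

SELECT @execution_id;  -- returned to the caller, as described above

-- Basic logging level (1 = Basic).
DECLARE @loggingLevel smallint = 1;
EXEC SSISDB.catalog.set_execution_parameter_value
     @execution_id,
     @object_type     = 50,
     @parameter_name  = N'LOGGING_LEVEL',
     @parameter_value = @loggingLevel;

-- The Scale Out version of this call may also pass @retry_count.
EXEC SSISDB.catalog.start_execution @execution_id;
```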

Monitoring Alternative – Catalog Reports

The reporting solution built into SQL Server Management Studio (SSMS) is cool. I like it. I just don’t like that it’s stuck in SSMS. Why? In order to give someone access to near-real-time information regarding the execution status of an SSIS package, I have to install SSMS somewhere they can reach it – and grant them rights on the server to access the information – probably in Production. If I put on my principles-of-least-privilege hat and think about this some, I get “no, no, and heck no!”

What are your options if there’s a less-SQL-Server-savvy stakeholder in your enterprise who has good cause to view this information? One option is to build your own SQL Server Reporting Services (SSRS) reporting solution. Or you can use the SSRS solution I built. It’s called Catalog Reports and it’s not only free, it’s open source!

The reports shown below mirror those in the SSMS solution:

  • Execution
  • Overview
  • Messages
  • Performance

If you don’t like something, you can change it – that’s the beauty of open source. The T-SQL is contained within the reports in the Catalog Reports solution.
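To give a sense of where that information lives, here is a minimal sketch of the kind of query such reports are built on – the SSISDB.catalog.executions view – and not the actual report T-SQL from the Catalog Reports solution:

```sql
-- Recent catalog executions with a readable status.
-- catalog.executions status codes: 1=created, 2=running, 3=canceled, 4=failed,
-- 5=pending, 6=ended unexpectedly, 7=succeeded, 8=stopping, 9=completed.
SELECT TOP (50)
       e.execution_id,
       e.folder_name,
       e.project_name,
       e.package_name,
       CASE e.status
         WHEN 1 THEN N'Created'   WHEN 2 THEN N'Running'
         WHEN 3 THEN N'Canceled'  WHEN 4 THEN N'Failed'
         WHEN 5 THEN N'Pending'   WHEN 6 THEN N'Ended unexpectedly'
         WHEN 7 THEN N'Succeeded' WHEN 8 THEN N'Stopping'
         WHEN 9 THEN N'Completed'
       END AS execution_status,
       e.start_time,
       e.end_time
FROM SSISDB.catalog.executions AS e
ORDER BY e.start_time DESC;
```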

Deployment (and Configuration) Alternative – SSIS Catalog Compare

Note: Most of the functionality described in this post is part of the current version of SSIS Catalog Compare. Connectivity to Azure Data Factory Version 2 Integration Runtime is not currently available, but will be available in an upcoming release.

Consider the following “lift and shift” scenario: you migrate your data warehouse to Azure SQL DW and want to migrate some of your data integration with SSIS to the cloud as well. Your SSIS Catalog projects are configured and tested on your on-premises servers. Wouldn’t it be nice if there were some way to migrate SSIS projects, packages, environments, references, and other configurations to the cloud?

There is!

SSIS Catalog Compare is built for this. I have the project we used in Part 2 working big as life in vmSQL17\Dev’s SSIS Catalog. I want to move it to my Cloud Catalog named svssis.database.windows.net. I have a couple options using SSIS Catalog Compare:

Option One: Script Everything

I can certainly script my folder, then export the ISPAC file for the project, then generate scripts for the environment, reference, and reference mappings, beginning with the Folder script:

Or I can script the folder and all its dependent objects with a single click:

The scripts are numbered in order of dependency and are ready to be deployed:

Option Two: Deploy It!

Or I can simply deploy directly from one SSIS Catalog – the one on-premises in this case – to another SSIS Catalog – the one in ADFv2 IR:

I choose Deploy Folder to deploy the folder alone:

I could just as easily select the option to “Deploy Folder and Contents” and deploy the folder, the project(s), environment(s), reference(s), reference mapping(s), literal override(s) – everything.

I decide to deploy the ISPAC file (exported using SSIS Catalog Compare) next. The folder has been deployed in script 1 so it’s already there for me to specify during deployment:

Done and done:

A compare at this point reveals a lot of differences in my Catalog projects:

Let’s start with the Environment. I’ll use the script I created earlier and edit it so the ConnectionString parameter values are correct (click to enlarge):

Note that the ConnectionStringPassword_1 parameter contains no value. SSIS Catalog Compare is designed to never write your sensitive values to a script.
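If you prefer to supply the sensitive value in T-SQL after the generated script runs, something like the following works. This is a sketch using the catalog’s built-in stored procedure with hypothetical folder and environment names – SSIS Catalog Compare’s own scripts and prompts may handle this differently (and, as shown below, the tool can also update the value directly):

```sql
-- Supply the sensitive value the generated script intentionally left blank.
-- Folder and environment names below are hypothetical; never save real secrets in scripts.
DECLARE @password sql_variant = N'<the real password>';

EXEC SSISDB.catalog.set_environment_variable_value
     @folder_name      = N'MyFolder',
     @environment_name = N'MyEnvironment',
     @variable_name    = N'ConnectionStringPassword_1',
     @value            = @password;
```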

I really like the messages these scripts output. They are built for copying and pasting into the Notes field on a ticket:

Back for another compare:

The environment is there but the ConnectionString variable values are different. That’s good, actually – I want them to be different! But I don’t want differences in my compare to trip me up. So I click View > Options and select the Ignore Environment Variable Values option:

A re-compare now ignores different values for the Environment Variables. That’s good and bad. My ConnectionStringPassword value needs to be updated in the cloud…

So I use SSIS Catalog Compare to update this sensitive value:

Carefully enter the correct password:

Next I want to migrate the reference and reference mappings. I could easily do this with the script, but it’s kinda cool to use SSIS Catalog Compare:

A quick re-compare and boom – no more differences:

If I deselect the Ignore Environment Variable Values option, I see the source of “color-ized” nodes – they’re all related to the different value for the ConnectionString Environment Variable:

There you have it, another way to lift and shift SSIS projects to Azure Data Factory Version 2 Integration Runtime.

SSIS Catalog Compare is a for-pay product and part of the DILM (Data Integration Lifecycle Management) Suite.

Catalog Browser Is Free

Catalog Browser is a free product from the DILM Suite that surfaces a unified view of an SSIS Catalog – either on-premises or in ADFv2 IR. You can think of Catalog Browser as a limited, single-tree version of SSIS Catalog Compare:

I hope this series is demonstrating the usefulness of Azure Data Factory Version 2 Integration Runtime. It’s an exciting time to be doing this kind of work! And there’s more cool stuff to come…

In Part 4 we take a look at using an SSIS Execution Framework alongside Azure Data Factory Version 2 Integration Runtime!

:{>