Parsing SSIS Catalog Messages for Lookup Performance, v2

A couple of years ago I wrote Administering SSIS: Parsing SSIS Catalog Messages for Lookups. I’ve updated the script to return all Lookup Transformation performance messages captured by the SSIS Catalog with Basic logging enabled (the SSIS Catalog’s default Logging Level). Some smart data integration people recommend setting the SSIS Catalog Logging Level to Verbose and querying the catalog schema views to retrieve row counts and performance metrics. I’m not a fan of running the Verbose logging level by default because of the overhead. Granted, the overhead is relatively light, but it’s not nothing. Sometimes I need all the cycles!

I prefer to passively gather performance metrics, even if it’s a little convoluted and more work (for me). I don’t often need these metrics immediately available, so I execute queries similar to the query below and store the performance metrics in a table. I can (and do) use these passively collected performance metrics to build dashboards that track SSIS performance (perhaps I should blog about them…).

Without further ado, the updated query:

Use SSISDB
go

declare @LookupIdString varchar(100) = '% rows in the cache. The processing time was %'
declare @LookupNameStartSearchString varchar(100) = '%:Information: The %'
declare @LookupNameStartLen int = Len(@LookupNameStartSearchString) - 2
declare @LookupNameEndSearchString varchar(100) = '% processed %'
declare @LookupNameEndLen int = Len(@LookupNameEndSearchString) - 2
declare @ProcessingTimeString varchar(100) = 'The processing time was '
declare @ProcessingTimeSearchString varchar(100) = '%' + @ProcessingTimeString + '%'
declare @CacheUsedString varchar(100) = 'The cache used '
declare @CacheUsedSearchString varchar(100) = '%' + @CacheUsedString + '%'

Select
SubString(om.[message]
        , (PatIndex(@LookupNameStartSearchString, om.[message]) + @LookupNameStartLen)
        , (PatIndex(@LookupNameEndSearchString, om.[message]) - (PatIndex(@LookupNameStartSearchString, om.[message]) + @LookupNameStartLen))
          ) As LookUpName
, Convert(bigint, Substring(om.[message]
                          , (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)
                          , ((CharIndex(' ', om.[message], PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen))
                              -
                             (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)
                            )
                           )
         ) As LookupRowsCount
, Convert(decimal(16, 3), Substring(om.[message]
                                  , (PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1)
                                  , ((CharIndex(' ', om.[message], PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1))
                                     -
                                     (PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1)
                                    )
                                   )
         ) As LookupProcessingSeconds
, Convert(bigint, Substring(om.[message]
                         , (Patindex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1)
                         , ((Charindex(' ', om.[message], Patindex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1))
                             -
                            (Patindex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1))
                           )
         ) As LookupBytesUsed
, Convert(decimal(16, 3), (Convert(bigint, Substring(om.[message]
                                                   , (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)
                                                   , ((CharIndex(' ', om.[message], PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen))
                                                      -
                                                      (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen))
                                                    )
                                  )
                                 /
                                  (Convert(decimal(16, 3), Substring(om.[message]
                                                                   , (PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1)
                                                                   , ((CharIndex(' ', om.[message], PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1))
                                                                      -
                                                                      (PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1))
                                                                    )
                                           )
                                  )
                           )
        ) As LookupRowsThroughput
, ex.start_time As ExecutionStartTime
, ex.folder_name As FolderName
, ex.project_name As ProjectName
, ex.package_name As PackageName
--Into rpt.LookupMetrics -- drop table rpt.LookupMetrics
From [catalog].operation_messages om
Join [catalog].executions ex
  On ex.execution_id = om.operation_id
Where om.message_source_type = 60 -- Data Flow Task
  And om.[message] Like @LookupIdString
  -- data type-checking
  And IsNumeric(Substring(om.[message], (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen), ((CharIndex(' ', om.[message], PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)) - (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)))) = 1 -- Check LookupRowsCount
  And IsNumeric(Substring(om.[message], (PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1), ((CharIndex(' ', om.[message], PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1)) - (PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1)))) = 1 -- Check LookupProcessingSeconds
  And IsNumeric(Substring(om.[message], (Patindex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1), ((Charindex(' ', om.[message], Patindex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1)) - (Patindex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1)))) = 1 -- Check LookupBytesUsed
  -- catch 0-valued denominator rows
  And Convert(decimal(16, 3), Substring(om.[message], (PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1), ((CharIndex(' ', om.[message], PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1)) - (PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1)))) <> 0 -- Check LookupProcessingSeconds, the LookupRowsThroughput denominator
Order By operation_id DESC
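
To persist these metrics, un-comment the Into clause above on the first execution (Select…Into creates and seeds rpt.LookupMetrics), then append on a schedule. Here is a minimal sketch of the append step, assuming rpt.LookupMetrics already exists; the derived table is a placeholder for the full Select above:

Insert Into rpt.LookupMetrics
Select q.*
From ( /* the Select above, minus the Into and Order By clauses */ ) q
Where q.ExecutionStartTime > (Select Max(lm.ExecutionStartTime)
                              From rpt.LookupMetrics lm)

A SQL Agent job running this on a schedule is enough to feed the dashboards mentioned above.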

:{>

Related Training:
IESSIS1: Immersion Event on Learning SQL Server Integration Services – April 2017, Chicago 
From Zero to Biml – 19-22 Jun 2017, London
IESSIS2: Immersion Event on Advanced Integration Services – Oct 2017, Chicago

Learn More:
SSIS Design Patterns and Biml: A Day of Intelligent Data Integration – Boston SQL Saturday precon, 24 Feb 2017
Save Time and Improve SSIS Quality with Biml
An Example of Data Integration Lifecycle Management with SSIS, Part 4
The Recordings for SSIS Academy: Using the SSIS Catalog are Available
SSIS Catalog Compare v2.0 Launch Event 7 Mar 2017!

Immersion Event on Learning SSIS – May 2018!

I am honored to partner with the awesome team at SQLSkills to deliver IESSIS1: Immersion Event on Learning SQL Server Integration Services in person in Chicago 7-11 May 2018!

Click here to register!

Here’s what I’ll be covering:

Module 1: Introduction

In the introduction we share information about the instructor, course flow, and content, and gather attendee expectations for the course. Topics covered include:

  • Training overview
  • Expectations
  • Introductions

Module 2: Creating Your First SSIS package

We believe many learn SSIS best by doing, so we get attendees’ hands on SSIS as soon as possible. In this module, we demonstrate how to build an SSIS package and invite attendees to walk through building a basic SSIS package with us. Topics covered include:

  • Introduction to SQL Server Data Tools – Business Intelligence (SSDT-BI), a version of Microsoft Visual Studio that serves as the SSIS integration development environment (IDE)
  • Visual Studio tips and tricks
  • Exercise: Create an SSIS package

Module 3: Introduction to the Data Flow Task

SSIS was designed to move data. The SSIS Data Flow Task is a complex data integration engine built to support generic data load use cases. In this module, we introduce SSIS Data Flow concepts. Topics covered include:

  • Introduction to the Data Flow Task
  • Connection Manager and Data Flow Task basics
  • Source and destination adapters
  • Exercise: Create a Real Data Flow, Part 1

Module 4: Data Flow Task 201

Once the basics of Data Flow development are understood, we introduce a basic version of the SSIS design pattern called the incremental load (a minimal T-SQL sketch of the idea follows the topic list). Building idempotent (safely re-executable) SSIS packages is one key to successful data integration in the enterprise. Topics covered include:

  • Intermediate Data Flow concepts
  • Building re-executable data loaders
  • Exercise: Create a Real Data Flow, Part 2
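
For readers new to the pattern, here is a minimal T-SQL sketch of an idempotent incremental load; the table and column names are hypothetical, and the SSIS Data Flow implements the same logic with a Lookup Transformation and conditional routing:

-- Insert rows that exist in the source but not (yet) in the destination
Insert Into dbo.DestTable (ID, Name, Amount)
Select s.ID, s.Name, s.Amount
From stage.SourceTable s
Where Not Exists (Select 1 From dbo.DestTable d Where d.ID = s.ID)

-- Update destination rows whose source values have changed
Update d
Set d.Name = s.Name
  , d.Amount = s.Amount
From dbo.DestTable d
Join stage.SourceTable s
  On s.ID = d.ID
Where (d.Name <> s.Name
   Or d.Amount <> s.Amount)

Because re-running these statements produces no duplicate inserts and no spurious updates, the load is safely re-executable.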

Module 5: Data Flow Task 202

Change detection is one of the most important components of Extract, Transform, and Load (ETL) projects. In this module, we expand the basic change detection introduced in the previous module and begin to examine SSIS design patterns that improve SSIS Data Flow performance. Topics covered include:

  • Advanced Data Flow concepts
  • Building and tuning robust incremental loaders
  • Exercise: Create a Real Data Flow, Parts 3 and 4

Module 6: The Control Flow

The SSIS Control Flow is the work flow manager for an SSIS package. In this module, we demonstrate work flow containment and management in the SSIS Control Flow, including advanced Precedence Constraint configuration and package execution control. Topics covered include:

  • Containers, precedence, and work flow
  • Transactions, restartability, and locking

Module 7: Loop Containers

Loop containers provide execution work flow isolation and iterative functionality. In this module, we examine traditional and non-traditional configurations for the For Loop and ForEach Loop containers. Topics covered include:

  • Using For Loop and Foreach Loop Containers

Module 8: Data Flow Task 301

After turning our attention to the SSIS Control Flow, we return to the SSIS Data Flow for an extended period of hands-on labs. In this module, attendees are tasked with the first of three labs that build a robust data integration loader using SSIS, with an instructor available to assist with questions. This lab focuses on transforming the data by applying strong-typing rules and managing data that violates those rules. Topics covered include:

  • Data typing basics
  • Building an incremental loader for real-world data
  • Exercise: Build a Robust Incremental Staging Loader, Part 1

Module 9: Data Flow Task 302

In this module, attendees are tasked with the second of three labs that build a robust data integration loader using SSIS, with an instructor available to assist with questions. This lab focuses on using built-in SSIS Adapter functionality to manage and execute DDL operations during data integration staging development. Topics covered include:

  • Intermediate data staging
  • Managing real-world changes to data sources
  • Exercise: Build a Robust Incremental Staging Loader, Part 2

Module 10: Data Flow Task 303

In this module, attendees are tasked with the third of three labs that build a robust data integration loader using SSIS, with an instructor available to assist with questions. This lab reinforces writing idempotent SSIS Data Flow packages and introduces data quality concepts via data cleansing in SSIS. Topics covered include:

  • Basic data cleansing
  • Deriving values
  • Exercise: Build a Robust Incremental Cleansing Loader

Module 11: Event Handlers, Logging, and Configurations

An important component of any solution is messaging and configuration. In this module, we demonstrate SSIS’s messaging model in the context of event handlers, logging, and managing externalized parameter and variable values. Topics covered include:

  • A survey of SSIS event handlers focusing on OnError and OnInformation events
  • Using SSIS’s built-in logging facility to capture package execution details and statistics
  • Using built-in package configurations to externalize variable values

Module 12: Security, Deployment, and Execution

SSIS facilitates secure connection and parameter value management. In this module, we build on our understanding of messaging, logging, and externalization by examining project, package, and connection security. We then turn our attention to deployment and execution patterns, and discuss the role of security in each. Topics covered include:

  • SSIS Package deployment options
  • Security implications
  • SSIS Package execution

Module 13: ETL Design Patterns

SSIS is a powerful enterprise data integration engine. In this module, we discuss experiences using SSIS to solve complex and “edgy” use cases, and some patterns we’ve realized from those engagements. Topics covered include:

  • Leveraging less-documented features of SSIS
  • Using Parent-Child design pattern to achieve “hands-free” custom logging and creative custom configuration
  • ETL Instrumentation

Module 14: Enterprise Execution Patterns

Data integration in a large enterprise is different from data integration in a smaller shop. In this module, we discuss patterns of SSIS development and management that lower the total cost of owning a data integration solution built using SSIS. Topics covered include:

  • Leveraging the Parent-Child design pattern and much of what we’ve learned over the past four days to build a metadata-driven SSIS execution framework engine

I hope to see you there!

:{>

One Way to Install SQL Server 2017 Developer Edition

My intention in sharing this post is to help you install an instance of SQL Server 2017 Developer Edition. I accept most of the defaults in this installation, changing a few things I only change in non-Production environments (like enabling Mixed Mode Authentication). Please follow the advice of Microsoft and SQL Server professionals when installing Production instances of Microsoft SQL Server 2017.

Download SQL Server 2017 Developer Edition.

Open setup.exe:

Click the Installation page and click the New SQL Server stand-alone installation or add features to an existing installation link:

Pay close attention to the Product Key page. Be sure to select Developer Edition (rather than accepting the default Evaluation Edition):

Accept the license terms:

Select (or don’t select) the option to use Microsoft Update:

Check and address (if required) Installation Rules:

Select options and set directories:

Select the default instance or provide a name for a named instance:

Review Server configuration:

Configure Security:

Install:

This is one way to get a SQL Server 2017 Developer Edition instance up and running quickly so you can begin tinkering with the new functionality and testing old functionality.
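
Once the instance is up, a quick query confirms you installed Developer Edition rather than the default Evaluation Edition:

Select ServerProperty('Edition') As Edition
     , ServerProperty('ProductVersion') As ProductVersion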

Hope this helps!

:{>

SSIS Change Detection Patterns: None

A Post About Nothing

Actually, it’s not. But as a fan of Seinfeld, I think it’s cool to work in that quote.

Seriously, though, No Change Detection is a valid change detection use case. In SSIS it looks like this:

Whack ‘n’ Load

I (lovingly) refer to this SSIS design pattern as the Whack ‘n’ Load pattern. Is it an Incremental Load design pattern? Maybe?? It depends on how you want to classify incremental loads. I will argue for “yes, it’s an incremental load pattern because it picks up new and changed rows since the last execution.” But even as I typed that, I could feel the definition is a stretch.

The SSIS package shown above loads data from a flat file into a SQL Server database table. The Execute SQL Task named “SQL Truncate Target” executes a statement to truncate the destination table similar to:

Truncate Table dbo.SomeTable

The Data Flow Task named “DFT Load Target” contains the components shown in the “breakout,” a Flat File Source Adapter named “FFSrc Flat File” and an OLE DB Destination Adapter named “OLEDBDest Target.” The OLE DB Destination Adapter is aimed at the same table truncated in the Execute SQL task named “SQL Truncate Target.”

When executed, the SSIS package appears as shown here:

The flat file contains only five rows at this time. Tomorrow it may contain 5, 6, or 6,000 rows. Sometimes, data scales.

The Whack ‘n’ Load pattern always clears the destination table and then loads the contents of the source into the destination table.
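
Expressed as plain T-SQL (a minimal sketch; the source and destination names here are hypothetical stand-ins for the flat file and target table), the entire pattern reduces to two statements:

-- Whack: clear the destination
Truncate Table dbo.SomeTable

-- Load: reload everything from the source
Insert Into dbo.SomeTable (ID, Name)
Select s.ID, s.Name
From stage.SomeSource s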

Advantages:

  1. There are few moving parts so there’s less opportunity for the load process to fail. When a Whack ‘n’ Load package fails, it’s usually due to unanticipated data conditions such as a NULL where no NULLs are allowed, string truncation (trying to put ten pounds of string into a five-pound column, for example), or some catastrophic failure in either the SSIS execution engine, SQL Server, the Operating System (OS), or hardware.
  2. It’s possible for change detection to “miss” a changed record in the source, for various reasons and in sundry ways. One reason changes go undetected is hash collisions. A hash collision occurs when different values (or combinations of values) produce the same hash value. All hash algorithms have a collision probability, and Jeff Preshing does a great job explaining both the theory and how to calculate the probability in his post titled Hash Collision Probabilities (a quick computation follows this list). The Whack ‘n’ Load design pattern does not miss changes.
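
To put a number on the collision risk, the standard birthday-problem approximation Preshing describes is p ≈ 1 - e^(-k(k-1)/(2N)), where k is the number of distinct values hashed and N is the number of possible hash values. Here is a quick T-SQL computation for a hypothetical 32-bit hash (N = 2^32) over 100,000 rows:

Select 1e0 - Exp(-(100000e0 * 99999e0) / (2e0 * 4294967296e0)) As CollisionProbability
-- Returns approximately 0.688: roughly a 69% chance of at least one collision

Wider hashes (such as the 160-bit SHA-1 available via HashBytes) drive this probability toward zero for realistic row counts.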

Disadvantages:

  1. Whack ‘n’ Load is slow – especially for larger data sets.

When to Use the Whack ‘n’ Load Design Pattern

I often use the Whack ‘n’ Load design pattern when loading small data sets like reference lookup data.

I often build Whack ‘n’ Load SSIS packages to prototype staging databases or to quickly stand up a reporting instance. These days I use Business Intelligence Markup Language (Biml) to accomplish the initial portion of a data integration project because I can generate a metric tonne of SSIS packages in a few minutes. Later I identify poorly-performing packages and modify them to use another design pattern.

When Not to Use the Whack ‘n’ Load Design Pattern

I don’t use the Whack ‘n’ Load design pattern when loading larger data sets because of performance concerns.

I don’t use the Whack ‘n’ Load design pattern when loading data that requires more complex storage schemes, such as the change history requirement for loading a Type II Star Schema Dimension.

Conclusion

Use the Whack ‘n’ Load design pattern to build a quick and dirty staging solution – especially if you’re using automation like Biml.

:{>

Learn more:

SQLSkills Immersion Event IESSIS1: SSIS Training – 2-6 Oct 2017 in Chicago
Expert SSIS Training (Live, 2 Days) – 4-5 Dec 2017
Contact Enterprise Data & Analytics for private SSIS and Biml training anytime, anywhere!

Presenting at the Seattle SQL Pro Workshop 2017

I am honored to be among the presenters at the Seattle SQL Pro Workshop 2017! Also presenting are Jason Brimhall, Louis Davidson, Jimmy May, Gail Shaw, and Wayne Sheffield. I’m looking forward to learning a bunch from these smart folks!

My session is titled Designing an SSIS Framework:

In this “demo-tastic” presentation, SSIS trainer, author, and consultant Andy Leonard explains the what, why, and how of an SSIS framework that delivers metadata-driven package execution, connection management, and centralized logging. Key takeaways: 1) A metadata-driven approach to SSIS package execution using SSIS Framework Community Edition, a free and open-source SSIS Catalog-integrated execution framework. 2) A demonstration of Catalog Reports, a free and open-source SSRS solution for reporting SSIS Catalog data.

Register here!

I hope to see you there.

:{>

Available for Pre-Order – The Biml Book!

I am excited to announce The Biml Book is now available for pre-order from Amazon and Apress!

I was honored to write with the inventor of Biml, Scott Currie, and an awesome team of BimlHeroes: Jacob Alley, Martin Andersson, Peter Avenant, Bill Fellows, Simon Peck, Reeves Smith, Raymond Sondak, Benjamin Weissman, and Cathrine Wilhelmsen.

Chapters

Part I: Learning Biml

  • Biml Tools
  • Introduction to the Biml Language
  • Basic Staging Operations
  • Importing Metadata
  • Reusing Code, Helper Classes and Methods

Part II: Biml Frameworks

  • A Custom Biml Framework
  • Using Biml as an SSIS Design Patterns Engine
  • Integration with a Custom SSIS Execution Framework
  • Metadata Automation
  • Advanced Biml Frameworks and BimlFlex

Part III: Biml Topics

  • Biml and Analysis Services
  • Biml for T-SQL
  • Documenting Your Biml Solution
  • Troubleshooting Metadata
  • Troubleshooting Biml

Part IV: Appendices

  • Source Control
  • Parallel Load Patterns in Biml
  • Metadata Persistence

Amazon projects the book’s release in early December but we anticipate the book will be available sooner.

Reserve your copy today!

:{>