Skipping SQL Server 2005

According to numerous sources – most recently a SQL Server Magazine article entitled Too Soon for SQL Server 2008? Readers Say “Yes!” – a lot of SQL Server customers are holding off upgrading to SQL Server 2005, preferring to wait for the upcoming release of SQL Server 2008. While I understand the economics, I worry about the DBAs and database developers in the trenches.

Personally, I prefer to learn new technology while it’s still, well, new. That makes me an early adopter. In my opinion, it also flattens the learning curve somewhat.

One the other hand, as new technology is put into play by early adopters bugs and enhancements are found and suggested – maturing the technology and making it easier to learn. Encountering fewer upgrade and translation issues flattens the learning curve too. So perhaps skipping a version isn’t such a big deal after all.

One thing to consider: Microsoft is unlikely to return to 5-year release cycles for server products. So how do we make upgrading less painful / expensive / painful? (…not a typo…)

One suggestion is to add regression tests to your Disaster Recovery plan. You have regression testing in place already if you have a DR plan. It may consist of throw the thrid switch and monitor the error email account. This is an EMP (Expensive Management Practice) and there are much better ways to manage regression tests as we draw near the close of 2007.

I predict Test Automation will be a hot topic next year as more folks dig into the features of Team Foundation Server 2008. The powerful idea behind automated regression tests is very simple: Create a suite of tests and you can run them forever (for free, even!). Regression tests are money in the bank – you write them once and run them every time you release a (hopefully) backwards-compatible version. The test suite grows over time as more bugs are discovered, but the SOBER (Same Old Bugs Every Release) spectre passes into memory of days gone by. And you get the added benefits of improved quality and reduced time to market – what’s not to love about regression tests?

If you don’t have regression tests in place, start saving your troubleshooting queries and utility applications in a central location – build a repository starting today. When the next new version of SQL Server is released (circa 2011), your applications will be much more agile and able to make the leap. Then you won’t be painted into the corner of “not having enough time to test” the impact of the new functionality and changes to existing functionality – you will be able to know where you stand.

How much is the option of upgrading worth to your company?

 :{> Andy

Technorati Tags:

 

SSIS Design Pattern – ETL Instrumentation, Part 3

Introduction 

This post is part of a series of posts on ETL Instrumentation.

In Part 1 we built a database to hold collected SSIS run time metrics and an SSIS package to deomnstrate how and why we would load metrics into the database.

In Part 2 we expanded on our database and the SSIS package to annotate version metadata, manage error metrics capture, and task status reporting.

In Part 3, we start using the ETL Instrumentation infrastructure we’ve built to measure some actual ETL. We now have the concepts – it’s time to put them to work!

A Brief History Of Our ETL Instrumentation Project

To review, our existing database is named SSISRunTimeMetrics. It contains a schema named ssis. In this schema are eight objects:
 – a table named ssis.RunTimeMetrics.
 – a table named ssis.RunTimeErrors.
 – a table named ssis.TaskMetrics.
 – a stored procedure named ssis.usp_RecordPackageStart.
 – a stored procedure named ssis.usp_RecordPackageEnd.
 – a stored procedure named ssis.usp_RecordPackageError.
 – a stored procedure named ssis.usp_RecordTaskStart.
 – a stored procedure named ssis.usp_RecordTaskEnd.

We expanded the types of run-time data we are collecting. Part 1 introduced Status collection, in Part 2 we added Exception collection. We also introduced scope into both types of collection, recording Exception information on error and finalizing Status (reporting that an error occurred).

At the beginning of SSIS package execution, we call ssis.usp_RecordPackageStart from an Execute SQL Task. We pass the package start date and time, the package name, and the package version. We also pass in a status of “Started”. From this stored procedure we get the ID of the newly created row, which we then push into a Package Load ID variable (iPackageLoadID).

At the beginning of a task or collection of tasks that define a process, we call ssis.usp_RecordTaskStart from an Execute SQL Task. We pass the task or process start date and time, the task (source) name, iPackageLoadID, and a status of “Started”. From this stored procedure we get the ID of the newly created row, which we then push into a Task Load ID variable (iTaskLoadID).

When the task completes, we call ssis.usp_RecordTaskEnd from an Execute SQL Task. We pass in the Task Load ID from the iTaskLoadID variable, the current date and time, and the status “Succeeded”.

On error, we capture Exception data and record an Error Status – both are crucial to knowing what happens when an exception is thrown.

When the package completes execution, we call ssis.usp_RecordPackageEnd from an Execute SQL Task. We pass in the Package Load ID from the variable, the current date and time, and the status “Succeeded”.

Version Metadata

Let’s create a target database for our ETL process. Open SQL Server Management Studio (SSMS) and execute the following script to create SSISRunTimeMetrics_Target:

— create a target database…
use master
go
if not exists(select name
              from master.sys.databases
              where name = ‘SSISRunTimeMetrics_Target’)
Create Database SSISRunTimeMetrics_Target
go

Open the existing SSISRunTimeMetrics SSIS solution. Open the SSISRunTimeMetrics package and edit the version metadata:

This is a major revision – we’re moving to version 2.0 and resetting the VersionBuild property. I also add text to the VersionComments property.

Don’t forget the package annotation:

Now we’re ready to make some code changes.

Extract, Transform, Load? Two Out Of Three Ain’t Bad…

Rename the “Step 1 – Do some stuff” Sequence Container to “Step 1 – Load Contacts”. Double-click the “Log Start of Task” Execute SQL Task to open the editor. Set the SQLStatement property to:

declare @Now datetime
set @Now = GetDate()
exec ssis.usp_RecordTaskStart ?,NULL,@Now,?,’Started’

Change the name property to “Load Contact”:

Click the Parameter Mapping page and click the Add button. Set the Variable Name to System::TaskName, change the Data Type to VarChar, and the Parameter Name to 1. 

Other applicable properties were modified in Part 2 of this series. Because of the design of the related Task Status collection in Part 2, we do not need to make major changes to the “Log Successful End of Task” or “Log Failed End of Task” Execute SQL Tasks – they are ready for our purposes. Next, delete the Test Script Task. Drag a Data Flow Task into the space vacated by the script task:

Double-click the Data Flow Task to open the editor. Drag an OLE DB Source Adapter onto the Data Flow Task canvas and double-click it to open the editor. Click the New button beside the OLE DB connection manager to open the Configure OLE DB Connection Manager dialog. Click the New button to open the Connection Manager dialog. Enter “(local)” in the Server name dropdown box. In the “Select or enter a database name” dropdown, select AdventureWorks.

 

You can download SQL Server 2005 Samples and Sample Databases if you don’t already have them installed.

Click the OK button to close the Connection Manager dialog. Click the OK button again to close the Configure OLE DB Connection Manager dialog. Select “Person.Contact” from the “Name of the table or the view” dropdown and click the OK button to close the OLE DB Source Editor.

Drag an OLE DB Destination Adapter onto the Data Flow canvas and connect it to the OLE DB Source Adapter.

Double-click the OLE DB Destination Adapter to open the editor. Using the procedure outlined above, create a connection to the SSISRunTimeMetrics_Target database you created earlier:

 

 Click the New button beside the “Name of the table or the view” dropdown. Change the table name in the Create Table statement from “[OLE DB Destination]” to “[dbo].[Contact]”. Your script should now read:

CREATE TABLE [dbo].[Contact] (
[ContactID] INTEGER,
[NameStyle] BIT,
[Title] NVARCHAR(8),
[FirstName] NVARCHAR(50),
[MiddleName] NVARCHAR(50),
[LastName] NVARCHAR(50),
[Suffix] NVARCHAR(10),
[EmailAddress] NVARCHAR(50),
[EmailPromotion] INTEGER,
[Phone] NVARCHAR(25),
[PasswordHash] VARCHAR(40),
[PasswordSalt] VARCHAR(10),
[AdditionalContactInfo] NTEXT,
[rowguid] UNIQUEIDENTIFIER,
[ModifiedDate] DATETIME
)

Click the OK button. Click the Mappings page and then the Ok button to close the editor.

Now we have a process that extracts data from the AdventureWorks database and loads it into our SSISRunTimeMetrics_Target database. Execute the package and observe the Data Flow:

 

Let’s modify the report query from Part 2:

select
m.packageName
,m.packageStartDateTime
,DateDiff(ss, m.packageStartDateTime, m.packageEndDateTime) as ‘packageRunTime’
,m.packageStatus
,t.SourceName
,t.TaskStartDateTime
,DateDiff(ss, t.TaskStartDateTime, t.TaskEndDateTime) as ‘taskRunTime’
,t.TaskStatus
,e.ErrorNumber
,e.ErrorDescription
from ssis.RunTimeMetrics m
inner join ssis.TaskMetrics t on t.RunTimeMetricsID = m.id
left outer join ssis.RunTimeErrors e on e.RunTimeMetricsID = m.id
where m.id = (select Max(id)
              from ssis.RunTimeMetrics)

The left outer join to the RunTimeErrors table allows us to view successful executions. Chnaging the Where clause to bring back the latest id value from the RunTimeMetrics table will prove useful when we add tasks.

Count Me In

On the Data Flow Task, delete the data flow path between the OLE DB Source Adapter and OLE DB Destination Adapter. Drag a Row Count Tranformation from the Toolbox onto the Data Flow canvas and connect it between the OLE DB Source and OLE DB Destination Adapters.

Click SSIS, Variables and create a new package-scoped Int32 variable named iContactCount.

Double-click the Row Count transformation to open the editor. Click the VariableName dropdown and select the User::iContactCount variable to hold the record counts.

Now we have a variable to count the number of rows loaded by the data flow – we now need a place to store this data. Let’s create a table to hold the row counts. Use the following script to create the ssis.RowCounts table:

use SSISRunTimeMetrics
go
— vars…
declare @sql varchar(255)
— create ssis schema…
if not exists(select name
              from sys.schemas
              where name = ‘ssis’)
 begin
  set @sql = ‘Create Schema ssis’
  exec(@sql)
 end

— create RunTimeErrors table…
if not exists(select s.name + ‘.’ + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = ‘RowCounts’
               and s.name = ‘ssis’)
Create Table ssis.RowCounts
(RowCountsID int identity(1,1)
,TaskMetricsID int null
,RunTimeMetricsId int not null
,ParentTaskMetricsID int null
,[RowCount] int null)

As always, let’s create a stored procedure to handle the insert:

use SSISRunTimeMetrics
go
if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordRowCount’
           and s.name = ‘ssis’)
 begin
  Drop Procedure ssis.usp_RecordRowCount
 end
go

Create Procedure ssis.usp_RecordRowCount
  @RunTimeMetricsID int
 ,@TaskMetricsID int
 ,@RowCount int
 ,@ParentTaskMetricsID int = null
As
begin
— insert the run time errors data…
insert into ssis.RowCounts
 (TaskMetricsID
 ,RunTimeMetricsId
 ,ParentTaskMetricsID
 ,[RowCount])
values
 (@TaskMetricsID
 ,@RunTimeMetricsID
 ,@ParentTaskMetricsID
 ,@RowCount)
end
go

There are different ways (methodologies) to instrument an ETL SSIS package. Some designers choose to place the code to insert row counts – as well as the code to store start and end metrics we capture in the ssis.TaskMetrics table – on the Control Flow. Other choose to use Event Handlers – OnPreExecute and OnPostExecute – to house the logic performing these operations. 

There are sound reasons for choosing each method. An argument for using Event Handlers is it provides a separation of the ETL logic and metrics / status logic. An argument for not using Event Handlers is all the logic is readily accessible on the Control Flow and Data Flow. Both arguments are legitimate and the customer’s maintenance needs and SSIS experience should be considered in this design decision.

For the purposes of this series, I choose to have most of the logic visible on the Control Flow and Data Flow.

Return to the Control Flow to add a mechanism to insert the RowCount data into our new table. Drag an Execute SQL Task into the “Step 1 – Load Contact” Sequence Container. Connect a Success Precendence Constraint from the “Log Successful End of Task” Execute SQL Task to the new Execute SQL Task.

Double-click the new Execute SQL Task to open the editor. Configure the General and Parameter Mappings pages as shown:

Click the OK button to close the Execute SQL Task editor.

Before we test the new row count functionality, let’s clear the destination table. Execute the following SQL statement in SSMS:

Truncate Table SSISRunTimeMetrics_Target.dbo.Contact

Now execute the package.

A slight modification of our earlier report query adds the record counts:

select
m.packageName
,m.packageStartDateTime
,DateDiff(ss, m.packageStartDateTime, m.packageEndDateTime) as ‘packageRunTime’
,m.packageStatus
,t.SourceName
,t.TaskStartDateTime
,DateDiff(ss, t.TaskStartDateTime, t.TaskEndDateTime) as ‘taskRunTime’
,t.TaskStatus
,c.[RowCount]
,e.ErrorNumber
,e.ErrorDescription
from ssis.RunTimeMetrics m
inner join ssis.TaskMetrics t on t.RunTimeMetricsID = m.id
left outer join ssis.RunTimeErrors e on e.RunTimeMetricsID = m.id
left outer join ssis.RowCounts c on c.RunTimeMetricsID = m.id
where m.id = (select Max(id)
              from ssis.RunTimeMetrics)

The results appear as shown:

We get all the rows this execution – cool.

This demostration is nearly true-to-life. Let’s take it closer by adding incremental loading. (I cover another SSIS Design Pattern – Incremental Loads – in another post.)

First return to the Control Flow. Click anywhere in the Control Flow canvas where there is white space. This sets focus to the Control Flow, which sets focus to the Package level. Click SSIS, Variables to display the Variables dialog. Add three package-scoped Int32 variables named iNewRowsCount, iChangedRowsCount, and iUnchangedRowsCount.

Click the Data Flow tab and drag an OLE DB Command, Lookup, and Conditional Split transformations onto the Data Flow Task. Also drag three Row Count transformations onto the Data Flow.

Rename the OLE DB Destination Adapter “Insert New Rows” and delete the data flow path (the green line / arrow) that connects it to the Row Count transformation. Connect the Lookup to the output of the Row Count to the input of the Lookup and double-click the Lookup to open the editor. Set the OLE DB connection manager property to “(local).SSISRunTimeMetrics_Target” and the “Use a table or view” dropdown to “[dbo].[Contact]”:

Click the Columns tab. Right-click in the mappings area and click “Select All Mappings”. Right-click in the mappings area again and click “Delete Selected Mappings”. Drag ContactID from the Available Input Columns and drop it on the Available Lookup Columns’ ContactID column.

Select all the columns in the Available Lookup Columns list except the AdditionalContactInfo and rowguid columns. In the Output Alias column, add a “Dest_” prefix to each field. This will allow easier identification in the pipeline:

Click the Configure Error Output button and change the Lookup Output Error dropdown from “Fail component” to “Ignore failure”. This changes the default behavior of the Loopup operation from an Inner Join to a Left Outer Join.

Click the OK button to close the Lookup editor. Connect the output of the Lookup transformation to the input of the Conditional Split transformation and double-click the Conditional Split to open the editor.

Name the first Output “New Rows” and set the Condition to “IsNull(Dest_ContactID)”.

Name the second Output “Changed Rows” and set the Condition to “(IsNull(Dest_NameStyle) ? True : Dest_NameStyle) != (IsNull(NameStyle) ? True : NameStyle) || (IsNull(Dest_Title) ? “null” : Dest_Title) != (IsNull(Title) ? “null” : Title) || (IsNull(Dest_FirstName) ? “null” : Dest_FirstName) != (IsNull(FirstName) ? “null” : FirstName) || (IsNull(Dest_MiddleName) ? “null” : Dest_MiddleName) != (IsNull(MiddleName) ? “null” : MiddleName) || (IsNull(Dest_LastName) ? “null” : Dest_LastName) != (IsNull(LastName) ? “null” : LastName) || (IsNull(Dest_Suffix) ? “null” : Dest_Suffix) != (IsNull(Suffix) ? “null” : Suffix) || (IsNull(Dest_EmailAddress) ? “null” : Dest_EmailAddress) != (IsNull(EmailAddress) ? “null” : EmailAddress) || (IsNull(Dest_EmailPromotion) ? -1 : Dest_EmailPromotion) != (IsNull(EmailPromotion) ? -1 : EmailPromotion) || (IsNull(Dest_Phone) ? “null” : Dest_Phone) != (IsNull(Phone) ? “null” : Phone) || (IsNull(Dest_PasswordHash) ? “null” : Dest_PasswordHash) != (IsNull(PasswordHash) ? “null” : PasswordHash) || (IsNull(Dest_PasswordSalt) ? “null” : Dest_PasswordSalt) != (IsNull(PasswordSalt) ? “null” : PasswordSalt) || (IsNull(Dest_ModifiedDate) ? (DT_DBDATE)”1/1/1901″ : Dest_ModifiedDate) != (IsNull(ModifiedDate) ? (DT_DBDATE)”1/1/1901″ : ModifiedDate)”.

Change the Default Output Name to “Unchanged Rows”.

Drag an output from the Conditional Split to one of the Row Count transformations. When prompted, select the New Rows output from the Conditional Split transformation.

Rename this Row Count “New Rows Count” and double-click it to open the editor. Set the VariableName property to User::iNewRowsCount and click the OK button to close the editor.

Connect the output of the New Rows Count transformation to the OLE DB Destination Adapter “Insert New Rows”.

Drag a second output from the Conditional Split to another Row Count transformation. When prompted, select the Changed Rows output.

Rename the Row Count transformation “Changed Rows Count” and double-click the Row Count transformation to open the editor. Set the VariableName property to iChangedRowsCount and click the OK button to close the editor.

Double-click the OLE DB Command transformation to open the advanced editor. Set the Connection Manager property to “(local).SSISRunTimeMetrics_Target”.

Click the Component Properties tab. Click the SQLCommand ellipsis and enter the following statement in the String Value Editor:

update dbo.Contacts
set NameStyle = ?
,Title = ?
,FirstName = ?
,MiddleName = ?
,LastName = ?
,Suffix = ?
,EmailAddress = ?
,EmailPromotion = ?
,Phone = ?
,PasswordHash = ?
,PasswordSalt = ?
,ModifiedDate = ?
where ContactID = ?

Connect the remaining Conditional Split output (Unchanged Rows) to the remaining Row Count transformation. Rename the Row Count “Unchanged Rows Count”, double-click it to open the editor, and set the VariableName to User::iUnchangedRowCount.

Some Optimization… 

Before we leave the Data Flow, let’s optimize it for performance. There’s a couple good reasons for this, one big reason is the current design may very well result in blocking, maybe even deadlocks!

Delete the “Update Changed Rows” OLE DB Command. Drag an OLE DB Destination Adapter onto the Data Flow Task and connect it to the Changed Rows Count transformation. Change the name of the Adapter to “stgContactChangedRows” and double-click it to open the editor. Set the OLE DB connection manager to “(local).SSISRUnTimeMetrics_Target”. Change the Data Access Mode to “Table or view”. Click the New button beside the “Name of table or view” dropdown:

Click the OK button to create the new table, then click the Mappings page to complete configuration. Click the OK buton to close the editor.

On the Control Flow, add an Execute SQL Task just after the Data Flow Task. Name it “Apply Staged Updates” and set the Connection property to “(local).SSISRunTimeMetrics_Target”. Set the SQLStatement property to the follow SQL:

UPDATE c
SET c.ContactID = u.ContactID
,c.NameStyle = u.NameStyle
,c.Title = u.Title
,c.FirstName = u.FirstName
,c.MiddleName = u.MiddleName
,c.LastName = u.LastName
,c.Suffix = u.Suffix
,c.EmailAddress = u.EmailAddress
,c.EmailPromotion = u.EmailPromotion
,c.Phone = u.Phone
,c.PasswordHash = u.PasswordHash
,c.PasswordSalt = u.PasswordSalt
,c.AdditionalContactInfo = u.AdditionalContactInfo
,c.rowguid = u.rowguid
,c.ModifiedDate = u.ModifiedDate
FROM dbo.Contact c
INNER JOIN dbo.stgContactChangedRows u on u.ContactID = c.ContactID

This converts the RBAR (Row-By-Agonizing-Row) update statement in the OLE DB Command to a set-based update. Much better. 

We need to capture these row counts, but we need a way to mark them as well. Let’s return to our ssis.RowCounts table design and modify it. Before modifying the table, let’s create and populate a lookup table for RowCountTypes:

use SSISRunTimeMetrics
go

— vars…
declare @sql varchar(255)

— create ssis schema…
if not exists(select name
              from sys.schemas
              where name = ‘ssis’)
 begin
  set @sql = ‘Create Schema ssis’
  exec(@sql)
 end

— create RunTimeErrors table…
if not exists(select s.name + ‘.’ + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = ‘RowCountTypes’
               and s.name = ‘ssis’)
Create Table ssis.RowCountTypes
 (RowCountTypeID char(1) not null
 ,RowCountTypeName varchar(25) null
 ,RowCountTypeDescription varchar(255) null)
go

Execute the following statement to add the lookup column to the ssis.RowCounts table:

if not exists(select c.name
from sys.columns c
inner join sys.tables t on t.object_id = c.object_id
inner join sys.schemas s on s.schema_id = t.schema_id
where s.name = ‘ssis’
and t.name = ‘RowCounts’
and c.name = ‘RowCountTypeID’)
Alter Table ssis.RowCounts Add RowCountTypeID char(1) null
go

Execute the following statements to populate the RowCountTypes table:

if not exists(select RowCountTypeID
from ssis.RowCountTypes
where RowCountTypeID = ‘I’)
insert into ssis.RowCountTypes
(RowCountTypeID
,RowCountTypeName
,RowCountTypeDescription)
values
(‘I’
,’Selected Input Rows’
,’Input rows selected from a source’)

if not exists(select RowCountTypeID
from ssis.RowCountTypes
where RowCountTypeID = ‘N’)
insert into ssis.RowCountTypes
(RowCountTypeID
,RowCountTypeName
,RowCountTypeDescription)
values
(‘N’
,’New Rows’
,’New rows’)
if not exists(select RowCountTypeID
from ssis.RowCountTypes
where RowCountTypeID = ‘C’)

insert into ssis.RowCountTypes
(RowCountTypeID
,RowCountTypeName
,RowCountTypeDescription)
values
(‘C’
,’Changed Rows’
,’Changed rows’)
if not exists(select RowCountTypeID
from ssis.RowCountTypes
where RowCountTypeID = ‘U’)

insert into ssis.RowCountTypes
(RowCountTypeID
,RowCountTypeName
,RowCountTypeDescription)
values
(‘U’
,’Unchanged Rows’
,’No changes detected in rows’)

We now need to updated our stored procedure that inserts records into ssis.RowCounts:

use SSISRunTimeMetrics
go

if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordRowCount’
           and s.name = ‘ssis’)
 begin
  Drop Procedure ssis.usp_RecordRowCount
 end
go

Create Procedure ssis.usp_RecordRowCount
 @RunTimeMetricsID int
,@TaskMetricsID int
,@RowCount int
,@RowCountType char(1)
,@ParentTaskMetricsID int = null
As
begin
— insert the run time errors data…
insert into ssis.RowCounts
(TaskMetricsID
,RunTimeMetricsId
,ParentTaskMetricsID
,[RowCount]
,RowCountTypeID)
values
(@TaskMetricsID
,@RunTimeMetricsID
,@ParentTaskMetricsID
,@RowCount
,@RowCountType)
end
go

Finally, let’s update the “Record Row Count” SQLStatement.While we’re in there, let’s add the inserts for the additional row count variables. Return to the Control Flow and double-click the “Record Row Count” Execute SQL Task to open the editor.

Change the SQLStatement to:

exec ssis.usp_RecordRowCount ?,?,?,’I’
exec ssis.usp_RecordRowCount ?,?,?,’N’
exec ssis.usp_RecordRowCount ?,?,?,’C’
exec ssis.usp_RecordRowCount ?,?,?,’U’

Click the Parameter Mappings page and edit as shown:

Note every third parameter is a count measurement, the order indicated by the SQLStatement property. Click the OK button to close the editor and execute the package.

Let’s modify our report query yet again to have a look at the collected metrics:

select
m.packageName
,m.packageStartDateTime
,DateDiff(ss, m.packageStartDateTime, m.packageEndDateTime) as ‘packageRunTime’
,m.packageStatus
,t.SourceName
,t.TaskStartDateTime
,DateDiff(ss, t.TaskStartDateTime, t.TaskEndDateTime) as ‘taskRunTime’
,t.TaskStatus
,c.[RowCount]
,ct.RowCountTypeName
,e.ErrorNumber
,e.ErrorDescription
from ssis.RunTimeMetrics m
inner join ssis.TaskMetrics t on t.RunTimeMetricsID = m.id
left outer join ssis.RunTimeErrors e on e.RunTimeMetricsID = m.id
left outer join ssis.RowCounts c on c.RunTimeMetricsID = m.id
left outer join ssis.RowCountTypes ct on ct.RowCountTypeID = c.RowCountTypeID
where m.id = (select Max(id)
              from ssis.RunTimeMetrics)

The results?

The Data It Is A-Changin’ 

Let’s simulate live data changes by making some changes in the source data. Execute the following statements to delete and modify data in SSISRunTimeMetrics_Target:

use SSISRunTimeMetrics_Target

delete dbo.Contact
where ContactID % 17 = 0

update dbo.Contact
set MiddleName = IsNull(MiddleName,”) + ‘A’
where ContactID % 25 = 0

Execute the package again and observe the Data Flow. Now this is starting to look like SSIS ETL!

Let’s have a look at our metrics to see if the numbers match:

 

I’d say that settles it.

We are now capturing enough information to gain good insight into our ETL process – how cool is that? More next time…

Get the code!

:{> Andy

PS – The PASS Summit 2015 Call for Speakers closes at 9:00 PM PDT Sunday, 15 Mar 2015 (04:00 GMT 16 Mar 2015). There’s still time, but hurry!

Learn more: 
Linchpin People Blog: SSIS 
Stairway to Biml 
Stairway to Integration Services 

SSIS2014DesignPatterns200 

SSIS Design Pattern – ETL Instrumentation, Part 2

Introduction

This post is part of a series of posts on ETL Instrumentation.

In Part 1 we built a database to hold collected SSIS run time metrics and an SSIS package to deomnstrate how and why we would load metrics into the database.

In Part 2 we will expand on our database and the SSIS package to annotate version metadata, manage error metrics capture, and task status reporting.

A Brief History Of Our ETL Instrumentation Project

To review, our existing database is named SSISRunTimeMetrics. It contains a schema named ssis. In this schema are three objects:
 – a table named ssis.RunTimeMetrics.
 – a stored procedure named ssis.usp_RecordPackageStart.
 – a stored procedure named ssis.usp_RecordPackageEnd.

At the beginning of SSIS package execution, we call ssis.usp_RecordPackageStart from an Execute SQL Task. We pass the package start date and time, the package name, and the package version. We also pass in a status of “Started”. From this stored procedure we get the ID of the newly created row, which we then push into a Package Load ID variable (iPackageLoadID).

When the package completes execution, we call ssis.usp_RecordPackageEnd from an Execute SQL Task. We pass in the Package Load ID from the variable, the current date and time, and the status “Succeeded”.

First Things First

Let’s start by opening the SSIS package we built in Part 1. The first thing we want to change is the package metadata – the data in our SSIS package about our SSIS package. Why? We always want to know what changes were made, who made the changes, and when they were made.

Let’s begin by updating the version. Click on the Control Flow tab and press the F4 key to open properties. Scroll to the Version Properties and change the VersionMinor from 0 to 1:

Also edit the Control Flow annotation that contains version information:

Update package description annotation. I like to include a Change Log summary inside each package:

In the Change Log, I include 

  • A version number in the format Major.Minor.Date.
  • The date of the change.
  • The name of the developer who made the change.
  • A brief description of the change.

Now I’m ready to actually make changes! To facilitate package execution on your machine, I want to edit the name and target SQL Server instance of the OLEDB Connection Manager. Double-click the connection manager to edit it and change the Server name property to “(local)”.

Click the OK button to close the connection manager editor. For consistency, right-click the connection manager and click Rename. Change the name to “(local).SSISRunTimeMetrics”:

There. Housekeeping accomplished!

A test run verifies I haven’t broken anything:

Unit test successful – ready to continue developing.

Errors Aside

Our current solution works well as long as our SSIS package succeeds. And this is the way all good engineering is done; Use Case 1 should always be “Everything Succeeds”. But what happens if the SSIS package fails? Shouldn’t we record the failure and as much metadata about the failure as possible? I’m glad you asked: Yes, we should.

In SSIS, errors “bubble”. The concept of bubbling is familiar to ASP developers – it simply means events follow a prescribed path through the scope hierarchy of the application. Ok, you can scratch the word “simply” out of the preceeding sentence.

Let’s use the script task to generate an error. Right the Test Script Task and click Edit to open the editor. Click the Script page and the Design Script button to open the Visual Studio for Applications (VSA) script editor. I use Script Tasks for exception testing. The following code  – replacing the “Dts.TaskResult = Dts.Results.Success” line – will prompt you to have the task succeed or fail:

Dim iResponse As Integer
iResponse = MsgBox(“Have task complete with Success?”, MsgBoxStyle.YesNo)
If iResponse = vbYes Then
  Dts.TaskResult = Dts.Results.Success
Else
  Dts.TaskResult = Dts.Results.Failure
End If

The additional code causes a second messagebox:

 

If you click Yes in response, the task will succeed. Clicking No will cause the task to fail:

SSIS has some great logging and error-handling functionality built in. The first place to look when an error occurs is the Progress / Execution Results tab in the SSIS Designer GUI. The tab is named Progress while the package is executing and becomes Execution Results once execution completes. Click the Progress / Execution Results tab to view the error raised when you clicked the No button:

This isn’t a very descriptive error message. Some SSIS error messages aren’t very descriptive. Let’s add some error logging capabilities to our ETL Instrumentation application.

To Err is Human… 

There are lots of ways to develop solutions with SSIS. Some of them are best practices, more are simply matters of coding style. For instance, here you could build the database table to hold error information first; or you could build the error-handling functionality in the SSIS package first. This is a style decision. I choose to build the table first.

My initial errors table will be named RunTimeErrors and will contain the following fields:

  • RunTimeMetricsID – to map the error to the execution instance. I want to know the error occurred when I executed the package at a certain time.
  • ErrorDateTime – when the error actually occurred.
  • SourceName – the task that raised the error.
  • ErrorNumber – the error number.
  • ErrorDescription – the error description.

I generate the table with the following T-SQL script:

use SSISRunTimeMetrics
go

— vars…
declare @sql varchar(255)

— create ssis schema…
if not exists(select name
              from sys.schemas
              where name = ‘ssis’)
begin
 set @sql = ‘Create Schema ssis’
 exec(@sql)
end

— create RunTimeErrors table…
if not exists(select s.name + ‘.’ + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = ‘RunTimeErrors’
               and s.name = ‘ssis’)
 Create Table ssis.RunTimeErrors
 (RunTimeMetricsId int
 ,ErrorDateTime datetime null
 ,SourceName varchar(255) null
 ,ErrorNumber int null
 ,ErrorDescription varchar(2500) null)

Note this script is re-runnable. I encourage you to take the time to write re-runnable scripts when developing database solutions.

We could stop here and directly insert rows into our error table, but let’s complete the development process by creating a stored procedure to do this work for us. Execute the following code in SSMS:

use SSISRunTimeMetrics
go

if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordPackageError’
           and s.name = ‘ssis’)
 begin
  Drop Procedure ssis.usp_RecordPackageError
 end
go

Create Procedure ssis.usp_RecordPackageError
  @RunTimeMetricsID int
 ,@SourceName varchar(255) = null
 ,@ErrorNumber int = null
 ,@ErrorDescription varchar(2500) = null
As

 begin

  — set @Now…
  declare @Now datetime
  set @Now = GetDate()

  — insert the run time errors data…
  insert into ssis.RunTimeErrors
  (RunTimeMetricsID
  ,ErrorDateTime
  ,SourceName
  ,ErrorNumber
  ,ErrorDescription)
  values
  (@RunTimeMetricsID
  ,@Now
  ,@SourceName
  ,@ErrorNumber
  ,@ErrorDescription)

 end

go

Again, re-runnable code. Now lets’ return to Visual Studio to populate our table with error information.

Click on the Test Script Task on the Control Flow. Then click the Event Handlers tab. Because you clicked the Test Script Task first, the Event Handler Executable defaults to the Test Script Task. The Event Handler defaults to the OnError event.

 

To create the OnError Event Handler, click the link labeled “Click here to create an ‘OnError’ event handler for executable ‘Test Script Task'”. Let’s do that now.

After you click the link you will note the canvas looks remarkably familiar: it’s nearly identical to the Control Flow.

Our goal is to load data into our RunTimeErrors table. Let’s use an Execute SQL Task to call the stored procedure. Drag an Execute SQL Task onto the OnError Event Handler canvas and double-click it to open the editor. Name the Execute SQL Task “Record Error”. Set the connection to “(local).SSISRunTimeMetrics”. Set the SQLStatement property to “exec ssis.usp_RecordPackageError ?,?,?,?”:

Click the Parameter Mapping page. Parameterized queries use question marks (?) as parameter placeholders. There are a few ways to access the parameters (a subject for another blog post). We will use ordinal position to access them. Click the Add button to add a new parameter. Set the Variable Name property to “User::iPackageLoadID” and the Parameter Name property to “0”. The default data type (“Long”) works for SSIS integer data type variables. This maps the iPackageLoadID variable to the first question mark in our SQLStatement property, which is the @RunTimeMetricsID parameter for the stroed procedure. Got it? Good.

Add three more parameters:

Variable Name: System::SourceName
Data Type: VarChar
Parameter Name: 1

Variable Name: System::ErrorCode
Data Type: Long
Parameter Name: 2

Variable Name: System::ErrorDescription
Data Type: VarChar
Parameter Name: 3

Click the OK button to close the editor and execute the package. 

Click the OK button on the first messagebox, then click the No button on the messagebox dialog to fail the Test Script Task. When the task fails, the OnError event fires and calls our stored procedure:

 

Let’s look at the table to see what was recorded. Return to SSMS and execute the following query:

select *
from ssis.RunTimeErrors
order by ErrorDateTime desc 

The results:

We now have basic error information from our failed task. This is great for our Test Script Task, but what about the other tasks on our SSIS package? Do we have to create OnError event handlers for them as well? That could be a lot of work.

Remember that bubbling thing we mentioned at the beginning of this post? We can employ the fact that errors bubble up through the object model. When an error occurs in the Test Script Task, an error bubbles up to the Sequence Container named “Step 1 – Do some stuff”. You can see it in this earlier screenshot:

To demonstrate, let’s add error handling to the Sequence Container. First, stop debugging the SSIS package if you have not already done so. Then right-click the “Record Error” Execute SQL Task and click Copy to copy it to the clipboard. Click the Executable dropdown and select the “Step 1 – Do some stuff” Sequence Container:

Select the OnError event handler and click the link to create the OnError event handler. Then paste the “Record Error” Execute SQL Task onto the new OnError event handler canvas. Re-execute the package observing the new OnError event handler. Click the No button to cause the Test Script Task to fail.

Note the new OnError event handler fires, in addition to the Test Script Task OnError event handler:

Re-execute the query in SSMS to verify:

Note the subsequent execution (RunTimeMetricsID 10, in my case) records the error twice. Hmmm… this is not the desired behavior. I only want the error to show up once per instance. However, also note the information recorded is nearly identical – only the date is different (and then only by 183 milliseconds). So the 20:17:58.803 occurrence of the error is from the Test Script Task and the 20:17:58.990 occurrence is the same error bubbling into the “Step 1 – Do some stuff” Sequence Container’s context and firing an identicaly-configured OnError event handler.

Again, this is not the desired behavior but it gives me an idea: Since I know errors bubble up the object model, all errors should eventually make it to the Package level. If I delete my existing OnError event handlers and create a single OnError event handler at the package level, I should then be able to record any and all errors – no matter where they occur in the SSIS package.

Let’s do that. First, copy the “Record Error” Execute SQL Task to the clipboard. Click the Delete button to delete the Sequence Container’s OnError event handler, then navigate to the Test Script Task’s OnError event handler and click the Delete button there. Click the Yes button on the confirmation dialog each time:

Navigate to the SSISRunTimeMetrics (package) event handlers:

Create the event handler by clicking the link and paste the Execute SQL Task onto the canvas. Voila, you now have error handling for then entire package. Here we must also add package status logging. The quickest way to accomplish this is to return to the Control Flow and copy the “Log End of Package Execution” Execute SQL Task to the clipboard. Navigate to the package OnError event handler and paste the Execute SQL Task onto the canvas. Connect it to the “Record Error” script task as shown:

Double-click the task to open the editor. Rename the task “Log Failed Package Execution” and change the SQLStatement property to:

declare @Now datetime
set @Now = GetDate()
Exec ssis.usp_RecordPackageEnd ?,@Now,’Failed’

Click the OK button to close the editor.

Re-execute the package to test, answering No to the dialog and failing the Script Task. Re-execute the query in SSMS to verify a single instance of the error is recorded:

Note the SourceName column contains the name of the task where the error originates, even though the error has bubbled through two layers of object model. This also holds for the ErrorNumber and ErrorDescription fields.

Task Status, Anyone?

It’s helpful to be able to track performance at a granular level. When instrumenting an ETL process, I like to record additional (and different) metadata at the Task level.

Since containers contain tasks and groups can represent linked processes or related tasks, task metrics must include parent-task relationships. Execute the following script to build our first TaskMetrics table:

use SSISRunTimeMetrics
go

— vars…
declare @sql varchar(255)

— create ssis schema…
if not exists(select name
              from sys.schemas
              where name = ‘ssis’)
 begin
  set @sql = ‘Create Schema ssis’
  exec(@sql)
 end

— create RunTimeErrors table…
if not exists(select s.name + ‘.’ + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = ‘TaskMetrics’
               and s.name = ‘ssis’)
 Create Table ssis.TaskMetrics
 (TaskMetricsID int identity(1,1)
 ,RunTimeMetricsId int not null
 ,ParentTaskMetricsID int null
 ,TaskStartDateTime datetime null
 ,TaskEndDateTime datetime null
 ,SourceName varchar(255) null
 ,TaskStatus varchar(25) null)

As before, let’s create stored procedures to manage inserting start-of-task information into the TaskMetrics table and updating end-of-task information:

use SSISRunTimeMetrics
go

if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordTaskMetrics’
           and s.name = ‘ssis’)
 begin
  Drop Procedure ssis.usp_RecordTaskMetrics
 end
go

Create Procedure ssis.usp_RecordTaskStart
  @RunTimeMetricsID int
 ,@ParentTaskMetricsID int = null
 ,@TaskStartDateTime datetime = null
 ,@SourceName varchar(255) = null
 ,@TaskStatus varchar(25) = null
As

begin

 — set @TaskStartDateTime default…
 declare @Now datetime
 set @Now = GetDate()
 if @TaskStartDateTime is null
  set @TaskStartDateTime = @Now

 — insert the run time errors data…
insert into ssis.TaskMetrics
 (RunTimeMetricsID
 ,ParentTaskMetricsID
 ,TaskStartDateTime
 ,SourceName
 ,TaskStatus)
output inserted.TaskMetricsID as ‘Inserted_Task_ID’
values
 (@RunTimeMetricsID
 ,@ParentTaskMetricsID
 ,@TaskStartDateTime
 ,@SourceName
 ,@TaskStatus)

end

go

Now for the end-of-task update:

use SSISRunTimeMetrics
go

if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordTaskEnd’
           and s.name = ‘ssis’)
 begin
  Drop Procedure ssis.usp_RecordTaskEnd
 end
go

Create Procedure ssis.usp_RecordTaskEnd
 @TaskMetricsID int
,@TaskEndDateTime datetime = null
,@TaskStatus varchar(25) = null
As

begin

— set @TaskEndDateTime default…
declare @Now datetime
set @Now = GetDate()

if (@TaskEndDateTime Is Null)
 set @TaskEndDateTime = @Now

— update the task metrics data…
update ssis.TaskMetrics
Set TaskEndDateTime = @TaskEndDateTime
   ,TaskStatus = @TaskStatus
where TaskMetricsid = @TaskMetricsid

end

go

Instrumenting the Task 

We will use the new procedures to instrument the task.

We need some variables for these parameters. At a bare minimum, we need a variable to hold the Task Metrics ID returned from the ssis.usp_RecordTaskStart stored procedure. Open variables and create a new package-scoped, Int32 type variable named iTaskLoadID:

 

Drag an Execute SQL Task into the Sequence Container an configure the General page as shown:

Set the SQLStatement property to the follow script:

declare @Now datetime
set @Now = GetDate()
exec ssis.usp_RecordTaskStart ?,NULL,@Now,’Do some stuff’,’Started’

Click the Parameter Mapping page and add a parameter. Set the Variable Name to User::iPackageLoadID and Parameter Name to 0 – accept the defaults for the remaining parameter properties. Click the Result Set page and add a new result. Set the ResultName to 0 and the Variable Name to User::iTaskLoadID. This will hold the TaskMetricsID value returned from the ssis.usp_RecordTaskStart stored procedure:

Click the OK button to close the editor. Next, let’s create a way to update TaskMetrics table after a successful execution.

Drag an Execute SQL Task into the “Step 1 – Do some stuff” Sequence Container and connect it as shown:

 

Double-click the Execute SQL Task to edit it. Name it “Log Successful End of Task”. Set the SQLStatement to:

declare @Now datetime
set @Now = GetDate()
exec ssis.usp_RecordTaskEnd ?,@Now,’Completed’

Add one parameter mapped to User::iTaskLoadID and set to Parameter Name 0. Test the new tasks by executing the SSIS package. When prompted, have the script task succeed. Execute the following query to observe the results:

select *
from ssis.TaskMetrics
order by TaskMetricsID desc

Note TaskMetricsID 2 completed while TaskMetricsID 1 started and then stopped.

We now need to add a way to track and record task error status. Add another Execute SQL Task to the Control Flow canvas and connect a Failed Precedence Constraint from the Sequence Container to the new Execute SQL Task as shown:

 

 Name the new Execute SQL Task “Log Failed End of Task”. Set the SQLStatement to:

declare @Now datetime
set @Now = GetDate()
exec ssis.usp_RecordTaskEnd ?,@Now,’Failed’

Add one parameter mapped to User::iTaskLoadID and set to Parameter Name 0. Test the new tasks by executing the SSIS package. When prompted, have the script task fail. Execute the query to observe the results:

You now have enough information to correlate the three tables. For example, we can view the results of the last execution with the following script:

select
  m.packageName
 ,m.packageStartDateTime
 ,DateDiff(ss, m.packageStartDateTime, m.packageEndDateTime) as ‘packageRunTime’
 ,m.packageStatus
 ,t.SourceName
 ,t.TaskStartDateTime
 ,DateDiff(ss, t.TaskStartDateTime, t.TaskEndDateTime) as ‘taskRunTime’
 ,t.TaskStatus
 ,e.ErrorNumber
 ,e.ErrorDescription
from ssis.RunTimeMetrics m
inner join ssis.RunTimeErrors e on e.RunTimeMetricsID = m.id
inner join ssis.TaskMetrics t on t.RunTimeMetricsID = m.id
where m.packageStartDateTime = (select Max(packageStartDateTime)
                                from ssis.RunTimeMetrics)

Those results appear as shown here:

This makes a fine source query for a SQL Server Reporting Services solution!

Conclusion

In this second part of ETL Instrumentation, we modified version metadata to indicate we have made changes; added package error logging and error status logging; and added task status logging.

Get the code!

:{> Andy

 PS – The PASS Summit 2015 Call for Speakers closes at 9:00 PM PDT Sunday, 15 Mar 2015 (04:00 GMT 16 Mar 2015). There’s still time, but hurry!

Learn more: 
Linchpin People Blog: SSIS 
Stairway to Biml 
Stairway to Integration Services 

SSIS2014DesignPatterns200

SSIS Design Pattern – ETL Instrumentation, Part 2

Introduction

This post is part of a series of posts on ETL Instrumentation.

In Part 1 we built a database to hold collected SSIS run time metrics and an SSIS package to deomnstrate how and why we would load metrics into the database.

In Part 2 we will expand on our database and the SSIS package to annotate version metadata, manage error metrics capture, and task status reporting.

A Brief History Of Our ETL Instrumentation Project

To review, our existing database is named SSISRunTimeMetrics. It contains a schema named ssis. In this schema are three objects:
 – a table named ssis.RunTimeMetrics.
 – a stored procedure named ssis.usp_RecordPackageStart.
 – a stored procedure named ssis.usp_RecordPackageEnd.

At the beginning of SSIS package execution, we call ssis.usp_RecordPackageStart from an Execute SQL Task. We pass the package start date and time, the package name, and the package version. We also pass in a status of “Started”. From this stored procedure we get the ID of the newly created row, which we then push into a Package Load ID variable (iPackageLoadID).

When the package completes execution, we call ssis.usp_RecordPackageEnd from an Execute SQL Task. We pass in the Package Load ID from the variable, the current date and time, and the status “Succeeded”.

First Things First

Let’s start by opening the SSIS package we built in Part 1. The first thing we want to change is the package metadata – the data in our SSIS package about our SSIS package. Why? We always want to know what changes were made, who made the changes, and when they were made.

Let’s begin by updating the version. Click on the Control Flow tab and press the F4 key to open properties. Scroll to the Version Properties and change the VersionMinor from 0 to 1:

Also edit the Control Flow annotation that contains version information:

Update package description annotation. I like to include a Change Log summary inside each package:

In the Change Log, I include 

  • A version number in the format Major.Minor.Date.
  • The date of the change.
  • The name of the developer who made the change.
  • A brief description of the change.

Now I’m ready to actually make changes! To facilitate package execution on your machine, I want to edit the name and target SQL Server instance of the OLEDB Connection Manager. Double-click the connection manager to edit it and change the Server name property to “(local)”.

Click the OK button to close the connection manager editor. For consistency, right-click the connection manager and click Rename. Change the name to “(local).SSISRunTimeMetrics”:

There. Housekeeping accomplished!

A test run verifies I haven’t broken anything:

Unit test successful – ready to continue developing.

Errors Aside

Our current solution works well as long as our SSIS package succeeds. And this is the way all good engineering is done; Use Case 1 should always be “Everything Succeeds”. But what happens if the SSIS package fails? Shouldn’t we record the failure and as much metadata about the failure as possible? I’m glad you asked: Yes, we should.

In SSIS, errors “bubble”. The concept of bubbling is familiar to ASP developers – it simply means events follow a prescribed path through the scope hierarchy of the application. Ok, you can scratch the word “simply” out of the preceeding sentence.

Let’s use the script task to generate an error. Right the Test Script Task and click Edit to open the editor. Click the Script page and the Design Script button to open the Visual Studio for Applications (VSA) script editor. I use Script Tasks for exception testing. The following code  – replacing the “Dts.TaskResult = Dts.Results.Success” line – will prompt you to have the task succeed or fail:

Dim iResponse As Integer
iResponse = MsgBox(“Have task complete with Success?”, MsgBoxStyle.YesNo)
If iResponse = vbYes Then
  Dts.TaskResult = Dts.Results.Success
Else
  Dts.TaskResult = Dts.Results.Failure
End If

The additional code causes a second messagebox:

 

If you click Yes in response, the task will succeed. Clicking No will cause the task to fail:

SSIS has some great logging and error-handling functionality built in. The first place to look when an error occurs is the Progress / Execution Results tab in the SSIS Designer GUI. The tab is named Progress while the package is executing and becomes Execution Results once execution completes. Click the Progress / Execution Results tab to view the error raised when you clicked the No button:

This isn’t a very descriptive error message. Some SSIS error messages aren’t very descriptive. Let’s add some error logging capabilities to our ETL Instrumentation application.

To Err is Human… 

There are lots of ways to develop solutions with SSIS. Some of them are best practices, more are simply matters of coding style. For instance, here you could build the database table to hold error information first; or you could build the error-handling functionality in the SSIS package first. This is a style decision. I choose to build the table first.

My initial errors table will be named RunTimeErrors and will contain the following fields:

  • RunTimeMetricsID – to map the error to the execution instance. I want to know the error occurred when I executed the package at a certain time.
  • ErrorDateTime – when the error actually occurred.
  • SourceName – the task that raised the error.
  • ErrorNumber – the error number.
  • ErrorDescription – the error description.

I generate the table with the following T-SQL script:

use SSISRunTimeMetrics
go

— vars…
declare @sql varchar(255)

— create ssis schema…
if not exists(select name
              from sys.schemas
              where name = ‘ssis’)
begin
 set @sql = ‘Create Schema ssis’
 exec(@sql)
end

— create RunTimeErrors table…
if not exists(select s.name + ‘.’ + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = ‘RunTimeErrors’
               and s.name = ‘ssis’)
 Create Table ssis.RunTimeErrors
 (RunTimeMetricsId int
 ,ErrorDateTime datetime null
 ,SourceName varchar(255) null
 ,ErrorNumber int null
 ,ErrorDescription varchar(2500) null)

Note this script is re-runnable. I encourage you to take the time to write re-runnable scripts when developing database solutions.

We could stop here and directly insert rows into our error table, but let’s complete the development process by creating a stored procedure to do this work for us. Execute the following code in SSMS:

use SSISRunTimeMetrics
go

if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordPackageError’
           and s.name = ‘ssis’)
 begin
  Drop Procedure ssis.usp_RecordPackageError
 end
go

Create Procedure ssis.usp_RecordPackageError
  @RunTimeMetricsID int
 ,@SourceName varchar(255) = null
 ,@ErrorNumber int = null
 ,@ErrorDescription varchar(2500) = null
As

 begin

  — set @Now…
  declare @Now datetime
  set @Now = GetDate()

  — insert the run time errors data…
  insert into ssis.RunTimeErrors
  (RunTimeMetricsID
  ,ErrorDateTime
  ,SourceName
  ,ErrorNumber
  ,ErrorDescription)
  values
  (@RunTimeMetricsID
  ,@Now
  ,@SourceName
  ,@ErrorNumber
  ,@ErrorDescription)

 end

go

Again, re-runnable code. Now lets’ return to Visual Studio to populate our table with error information.

Click on the Test Script Task on the Control Flow. Then click the Event Handlers tab. Because you clicked the Test Script Task first, the Event Handler Executable defaults to the Test Script Task. The Event Handler defaults to the OnError event.

 

To create the OnError Event Handler, click the link labeled “Click here to create an ‘OnError’ event handler for executable ‘Test Script Task'”. Let’s do that now.

After you click the link you will note the canvas looks remarkably familiar: it’s nearly identical to the Control Flow.

Our goal is to load data into our RunTimeErrors table. Let’s use an Execute SQL Task to call the stored procedure. Drag an Execute SQL Task onto the OnError Event Handler canvas and double-click it to open the editor. Name the Execute SQL Task “Record Error”. Set the connection to “(local).SSISRunTimeMetrics”. Set the SQLStatement property to “exec ssis.usp_RecordPackageError ?,?,?,?”:

Click the Parameter Mapping page. Parameterized queries use question marks (?) as parameter placeholders. There are a few ways to access the parameters (a subject for another blog post). We will use ordinal position to access them. Click the Add button to add a new parameter. Set the Variable Name property to “User::iPackageLoadID” and the Parameter Name property to “0”. The default data type (“Long”) works for SSIS integer data type variables. This maps the iPackageLoadID variable to the first question mark in our SQLStatement property, which is the @RunTimeMetricsID parameter for the stroed procedure. Got it? Good.

Add three more parameters:

Variable Name: System::SourceName
Data Type: VarChar
Parameter Name: 1

Variable Name: System::ErrorCode
Data Type: Long
Parameter Name: 2

Variable Name: System::ErrorDescription
Data Type: VarChar
Parameter Name: 3

Click the OK button to close the editor and execute the package. 

Click the OK button on the first messagebox, then click the No button on the messagebox dialog to fail the Test Script Task. When the task fails, the OnError event fires and calls our stored procedure:

 

Let’s look at the table to see what was recorded. Return to SSMS and execute the following query:

select *
from ssis.RunTimeErrors
order by ErrorDateTime desc 

The results:

We now have basic error information from our failed task. This is great for our Test Script Task, but what about the other tasks on our SSIS package? Do we have to create OnError event handlers for them as well? That could be a lot of work.

Remember that bubbling thing we mentioned at the beginning of this post? We can employ the fact that errors bubble up through the object model. When an error occurs in the Test Script Task, an error bubbles up to the Sequence Container named “Step 1 – Do some stuff”. You can see it in this earlier screenshot:

To demonstrate, let’s add error handling to the Sequence Container. First, stop debugging the SSIS package if you have not already done so. Then right-click the “Record Error” Execute SQL Task and click Copy to copy it to the clipboard. Click the Executable dropdown and select the “Step 1 – Do some stuff” Sequence Container:

Select the OnError event handler and click the link to create the OnError event handler. Then paste the “Record Error” Execute SQL Task onto the new OnError event handler canvas. Re-execute the package observing the new OnError event handler. Click the No button to cause the Test Script Task to fail.

Note the new OnError event handler fires, in addition to the Test Script Task OnError event handler:

Re-execute the query in SSMS to verify:

Note the subsequent execution (RunTimeMetricsID 10, in my case) records the error twice. Hmmm… this is not the desired behavior. I only want the error to show up once per instance. However, also note the information recorded is nearly identical – only the date is different (and then only by 183 milliseconds). So the 20:17:58.803 occurrence of the error is from the Test Script Task and the 20:17:58.990 occurrence is the same error bubbling into the “Step 1 – Do some stuff” Sequence Container’s context and firing an identicaly-configured OnError event handler.

Again, this is not the desired behavior but it gives me an idea: Since I know errors bubble up the object model, all errors should eventually make it to the Package level. If I delete my existing OnError event handlers and create a single OnError event handler at the package level, I should then be able to record any and all errors – no matter where they occur in the SSIS package.

Let’s do that. First, copy the “Record Error” Execute SQL Task to the clipboard. Click the Delete button to delete the Sequence Container’s OnError event handler, then navigate to the Test Script Task’s OnError event handler and click the Delete button there. Click the Yes button on the confirmation dialog each time:

Navigate to the SSISRunTimeMetrics (package) event handlers:

Create the event handler by clicking the link and paste the Execute SQL Task onto the canvas. Voila, you now have error handling for then entire package. Here we must also add package status logging. The quickest way to accomplish this is to return to the Control Flow and copy the “Log End of Package Execution” Execute SQL Task to the clipboard. Navigate to the package OnError event handler and paste the Execute SQL Task onto the canvas. Connect it to the “Record Error” script task as shown:

Double-click the task to open the editor. Rename the task “Log Failed Package Execution” and change the SQLStatement property to:

declare @Now datetime
set @Now = GetDate()
Exec ssis.usp_RecordPackageEnd ?,@Now,’Failed’

Click the OK button to close the editor.

Re-execute the package to test, answering No to the dialog and failing the Script Task. Re-execute the query in SSMS to verify a single instance of the error is recorded:

Note the SourceName column contains the name of the task where the error originates, even though the error has bubbled through two layers of object model. This also holds for the ErrorNumber and ErrorDescription fields.

Task Status, Anyone?

It’s helpful to be able to track performance at a granular level. When instrumenting an ETL process, I like to record additional (and different) metadata at the Task level.

Since containers contain tasks and groups can represent linked processes or related tasks, task metrics must include parent-task relationships. Execute the following script to build our first TaskMetrics table:

use SSISRunTimeMetrics
go

— vars…
declare @sql varchar(255)

— create ssis schema…
if not exists(select name
              from sys.schemas
              where name = ‘ssis’)
 begin
  set @sql = ‘Create Schema ssis’
  exec(@sql)
 end

— create RunTimeErrors table…
if not exists(select s.name + ‘.’ + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = ‘TaskMetrics’
               and s.name = ‘ssis’)
 Create Table ssis.TaskMetrics
 (TaskMetricsID int identity(1,1)
 ,RunTimeMetricsId int not null
 ,ParentTaskMetricsID int null
 ,TaskStartDateTime datetime null
 ,TaskEndDateTime datetime null
 ,SourceName varchar(255) null
 ,TaskStatus varchar(25) null)

As before, let’s create stored procedures to manage inserting start-of-task information into the TaskMetrics table and updating end-of-task information:

use SSISRunTimeMetrics
go

if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordTaskMetrics’
           and s.name = ‘ssis’)
 begin
  Drop Procedure ssis.usp_RecordTaskMetrics
 end
go

Create Procedure ssis.usp_RecordTaskStart
  @RunTimeMetricsID int
 ,@ParentTaskMetricsID int = null
 ,@TaskStartDateTime datetime = null
 ,@SourceName varchar(255) = null
 ,@TaskStatus varchar(25) = null
As

begin

 — set @TaskStartDateTime default…
 declare @Now datetime
 set @Now = GetDate()
 if @TaskStartDateTime is null
  set @TaskStartDateTime = @Now

 — insert the run time errors data…
insert into ssis.TaskMetrics
 (RunTimeMetricsID
 ,ParentTaskMetricsID
 ,TaskStartDateTime
 ,SourceName
 ,TaskStatus)
output inserted.TaskMetricsID as ‘Inserted_Task_ID’
values
 (@RunTimeMetricsID
 ,@ParentTaskMetricsID
 ,@TaskStartDateTime
 ,@SourceName
 ,@TaskStatus)

end

go

Now for the end-of-task update:

use SSISRunTimeMetrics
go

if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordTaskEnd’
           and s.name = ‘ssis’)
 begin
  Drop Procedure ssis.usp_RecordTaskEnd
 end
go

Create Procedure ssis.usp_RecordTaskEnd
 @TaskMetricsID int
,@TaskEndDateTime datetime = null
,@TaskStatus varchar(25) = null
As

begin

— set @TaskEndDateTime default…
declare @Now datetime
set @Now = GetDate()

if (@TaskEndDateTime Is Null)
 set @TaskEndDateTime = @Now

— update the task metrics data…
update ssis.TaskMetrics
Set TaskEndDateTime = @TaskEndDateTime
   ,TaskStatus = @TaskStatus
where TaskMetricsid = @TaskMetricsid

end

go

Instrumenting the Task 

We will use the new procedures to instrument the task.

We need some variables for these parameters. At a bare minimum, we need a variable to hold the Task Metrics ID returned from the ssis.usp_RecordTaskStart stored procedure. Open variables and create a new package-scoped, Int32 type variable named iTaskLoadID:

 

Drag an Execute SQL Task into the Sequence Container an configure the General page as shown:

Set the SQLStatement property to the follow script:

declare @Now datetime
set @Now = GetDate()
exec ssis.usp_RecordTaskStart ?,NULL,@Now,’Do some stuff’,’Started’

Click the Parameter Mapping page and add a parameter. Set the Variable Name to User::iPackageLoadID and Parameter Name to 0 – accept the defaults for the remaining parameter properties. Click the Result Set page and add a new result. Set the ResultName to 0 and the Variable Name to User::iTaskLoadID. This will hold the TaskMetricsID value returned from the ssis.usp_RecordTaskStart stored procedure:

Click the OK button to close the editor. Next, let’s create a way to update TaskMetrics table after a successful execution.

Drag an Execute SQL Task into the “Step 1 – Do some stuff” Sequence Container and connect it as shown:

 

Double-click the Execute SQL Task to edit it. Name it “Log Successful End of Task”. Set the SQLStatement to:

declare @Now datetime
set @Now = GetDate()
exec ssis.usp_RecordTaskEnd ?,@Now,’Completed’

Add one parameter mapped to User::iTaskLoadID and set to Parameter Name 0. Test the new tasks by executing the SSIS package. When prompted, have the script task succeed. Execute the following query to observe the results:

select *
from ssis.TaskMetrics
order by TaskMetricsID desc

Note TaskMetricsID 2 completed while TaskMetricsID 1 started and then stopped.

We now need to add a way to track and record task error status. Add another Execute SQL Task to the Control Flow canvas and connect a Failed Precedence Constraint from the Sequence Container to the new Execute SQL Task as shown:

 

 Name the new Execute SQL Task “Log Failed End of Task”. Set the SQLStatement to:

declare @Now datetime
set @Now = GetDate()
exec ssis.usp_RecordTaskEnd ?,@Now,’Failed’

Add one parameter mapped to User::iTaskLoadID and set to Parameter Name 0. Test the new tasks by executing the SSIS package. When prompted, have the script task fail. Execute the query to observe the results:

You now have enough information to correlate the three tables. For example, we can view the results of the last execution with the following script:

select
  m.packageName
 ,m.packageStartDateTime
 ,DateDiff(ss, m.packageStartDateTime, m.packageEndDateTime) as ‘packageRunTime’
 ,m.packageStatus
 ,t.SourceName
 ,t.TaskStartDateTime
 ,DateDiff(ss, t.TaskStartDateTime, t.TaskEndDateTime) as ‘taskRunTime’
 ,t.TaskStatus
 ,e.ErrorNumber
 ,e.ErrorDescription
from ssis.RunTimeMetrics m
inner join ssis.RunTimeErrors e on e.RunTimeMetricsID = m.id
inner join ssis.TaskMetrics t on t.RunTimeMetricsID = m.id
where m.packageStartDateTime = (select Max(packageStartDateTime)
                                from ssis.RunTimeMetrics)

Those results appear as shown here:

This makes a fine source query for a SQL Server Reporting Services solution!

Conclusion

In this second part of ETL Instrumentation, we modified version metadata to indicate we have made changes; added package error logging and error status logging; and added task status logging.

Get the code!

:{> Andy

 PS – The PASS Summit 2015 Call for Speakers closes at 9:00 PM PDT Sunday, 15 Mar 2015 (04:00 GMT 16 Mar 2015). There’s still time, but hurry!

Learn more: 
Linchpin People Blog: SSIS 
Stairway to Biml 
Stairway to Integration Services 

SSIS2014DesignPatterns200