SSIS Design Pattern – ETL Instrumentation, Part 2

Introduction

This post is part of a series of posts on ETL Instrumentation.

In Part 1 we built a database to hold collected SSIS run time metrics and an SSIS package to demonstrate how and why we would load metrics into the database.

In Part 2 we will expand the database and the SSIS package to update version metadata, capture error metrics, and report task status.

A Brief History Of Our ETL Instrumentation Project

To review, our existing database is named SSISRunTimeMetrics. It contains a schema named ssis. In this schema are three objects:
 – a table named ssis.RunTimeMetrics.
 – a stored procedure named ssis.usp_RecordPackageStart.
 – a stored procedure named ssis.usp_RecordPackageEnd.

At the beginning of SSIS package execution, we call ssis.usp_RecordPackageStart from an Execute SQL Task. We pass the package start date and time, the package name, and the package version. We also pass in a status of “Started”. From this stored procedure we get the ID of the newly created row, which we then push into a Package Load ID variable (iPackageLoadID).

When the package completes execution, we call ssis.usp_RecordPackageEnd from an Execute SQL Task. We pass in the Package Load ID from the variable, the current date and time, and the status “Succeeded”.
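
Expressed in T-SQL, the pattern looks roughly like this. This is a sketch only: the parameter list for ssis.usp_RecordPackageStart is an assumption based on the description above, while the end call matches the signature used later in this post:

declare @PackageLoadID int, @Now datetime
set @Now = GetDate()

-- Record the start of execution; the procedure returns the ID of the new row,
-- which the package captures into the iPackageLoadID variable
-- (parameter order here is an assumption from Part 1's description):
exec ssis.usp_RecordPackageStart @Now, 'SSISRunTimeMetrics', '1.0', 'Started'

-- ...package work happens here...
-- (in the package, @PackageLoadID is populated from the procedure's result set)

-- Record the end of execution, passing the captured Package Load ID:
set @Now = GetDate()
exec ssis.usp_RecordPackageEnd @PackageLoadID, @Now, 'Succeeded'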

First Things First

Let’s start by opening the SSIS package we built in Part 1. The first thing we want to change is the package metadata – the data in our SSIS package about our SSIS package. Why? We always want to know what changes were made, who made the changes, and when they were made.

Let’s begin by updating the version. Click on the Control Flow tab and press the F4 key to open properties. Scroll to the Version Properties and change the VersionMinor from 0 to 1:

Also edit the Control Flow annotation that contains version information:

Update the package description annotation. I like to include a Change Log summary inside each package:

In the Change Log, I include the following (a sample entry appears after this list):

  • A version number in the format Major.Minor.Date.
  • The date of the change.
  • The name of the developer who made the change.
  • A brief description of the change.
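
A sample Change Log entry might look like this (the values are illustrative):

1.1.20070927 – 27 Sep 2007 – Andy – Added error logging and task status instrumentation.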

Now I’m ready to actually make changes! To facilitate package execution on your machine, I want to edit the name and target SQL Server instance of the OLEDB Connection Manager. Double-click the connection manager to edit it and change the Server name property to “(local)”.

Click the OK button to close the connection manager editor. For consistency, right-click the connection manager and click Rename. Change the name to “(local).SSISRunTimeMetrics”:

There. Housekeeping accomplished!

A test run verifies I haven’t broken anything:

Unit test successful – ready to continue developing.

Errors Aside

Our current solution works well as long as our SSIS package succeeds. And this is the way all good engineering starts: Use Case 1 should always be “Everything Succeeds”. But what happens if the SSIS package fails? Shouldn’t we record the failure and as much metadata about the failure as possible? I’m glad you asked: Yes, we should.

In SSIS, errors “bubble”. The concept of bubbling is familiar to ASP developers – it simply means events follow a prescribed path through the scope hierarchy of the application. OK, you can scratch the word “simply” out of the preceding sentence.

Let’s use the script task to generate an error. Right-click the Test Script Task and click Edit to open the editor. Click the Script page and the Design Script button to open the Visual Studio for Applications (VSA) script editor. I use Script Tasks for exception testing. The following code – replacing the “Dts.TaskResult = Dts.Results.Success” line – will prompt you to have the task succeed or fail:

' Prompt for the desired task outcome:
Dim iResponse As Integer
iResponse = MsgBox("Have task complete with Success?", MsgBoxStyle.YesNo)
If iResponse = vbYes Then
  Dts.TaskResult = Dts.Results.Success
Else
  Dts.TaskResult = Dts.Results.Failure
End If

The additional code causes a second message box:

 

If you click Yes in response, the task will succeed. Clicking No will cause the task to fail:

SSIS has some great logging and error-handling functionality built in. The first place to look when an error occurs is the Progress / Execution Results tab in the SSIS Designer GUI. The tab is named Progress while the package is executing and becomes Execution Results once execution completes. Click the Progress / Execution Results tab to view the error raised when you clicked the No button:

This isn’t a very descriptive error message – some SSIS error messages aren’t. Let’s add some error logging capabilities to our ETL Instrumentation application.

To Err is Human… 

There are lots of ways to develop solutions with SSIS. Some are best practices; more are simply matters of coding style. For instance, here you could build the database table to hold error information first, or you could build the error-handling functionality in the SSIS package first. This is a style decision. I choose to build the table first.

My initial errors table will be named RunTimeErrors and will contain the following fields:

  • RunTimeMetricsID – to map the error to the execution instance. I want to tie each error to the specific package execution during which it occurred.
  • ErrorDateTime – when the error actually occurred.
  • SourceName – the task that raised the error.
  • ErrorNumber – the error number.
  • ErrorDescription – the error description.

I generate the table with the following T-SQL script:

use SSISRunTimeMetrics
go

-- vars...
declare @sql varchar(255)

-- create ssis schema...
if not exists(select name
              from sys.schemas
              where name = 'ssis')
 begin
  set @sql = 'Create Schema ssis'
  exec(@sql)
 end

-- create RunTimeErrors table...
if not exists(select s.name + '.' + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = 'RunTimeErrors'
               and s.name = 'ssis')
 Create Table ssis.RunTimeErrors
 (RunTimeMetricsId int
 ,ErrorDateTime datetime null
 ,SourceName varchar(255) null
 ,ErrorNumber int null
 ,ErrorDescription varchar(2500) null)

Note this script is re-runnable. I encourage you to take the time to write re-runnable scripts when developing database solutions.
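
If we stopped here, we could insert rows directly; a minimal sketch (the values are illustrative):

insert into ssis.RunTimeErrors
 (RunTimeMetricsId
 ,ErrorDateTime
 ,SourceName
 ,ErrorNumber
 ,ErrorDescription)
values
 (1                           -- illustrative RunTimeMetrics ID
 ,GetDate()
 ,'Test Script Task'
 ,-1                          -- illustrative error number
 ,'Sample error description')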

We could stop here and insert rows directly (as sketched above), but let’s complete the development process by creating a stored procedure to do this work for us. Execute the following code in SSMS:

use SSISRunTimeMetrics
go

if exists(select s.name + '.' + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = 'usp_RecordPackageError'
           and s.name = 'ssis')
 begin
  Drop Procedure ssis.usp_RecordPackageError
 end
go

Create Procedure ssis.usp_RecordPackageError
  @RunTimeMetricsID int
 ,@SourceName varchar(255) = null
 ,@ErrorNumber int = null
 ,@ErrorDescription varchar(2500) = null
As

 begin

  -- set @Now...
  declare @Now datetime
  set @Now = GetDate()

  -- insert the run time errors data...
  insert into ssis.RunTimeErrors
  (RunTimeMetricsID
  ,ErrorDateTime
  ,SourceName
  ,ErrorNumber
  ,ErrorDescription)
  values
  (@RunTimeMetricsID
  ,@Now
  ,@SourceName
  ,@ErrorNumber
  ,@ErrorDescription)

 end

go
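
A quick smoke test in SSMS verifies the procedure records a row (the values are illustrative):

-- Record a sample error, then inspect the table:
exec ssis.usp_RecordPackageError 1, 'Test Script Task', -1, 'Sample error description'

select *
from ssis.RunTimeErrors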

Again, re-runnable code. Now let’s return to Visual Studio to populate our table with error information.

Click on the Test Script Task on the Control Flow. Then click the Event Handlers tab. Because you clicked the Test Script Task first, the Event Handler Executable defaults to the Test Script Task. The Event Handler defaults to the OnError event.

 

To create the OnError Event Handler, click the link labeled “Click here to create an ‘OnError’ event handler for executable ‘Test Script Task’”. Let’s do that now.

After you click the link you will note the canvas looks remarkably familiar: it’s nearly identical to the Control Flow.

Our goal is to load data into our RunTimeErrors table. Let’s use an Execute SQL Task to call the stored procedure. Drag an Execute SQL Task onto the OnError Event Handler canvas and double-click it to open the editor. Name the Execute SQL Task “Record Error”. Set the connection to “(local).SSISRunTimeMetrics”. Set the SQLStatement property to “exec ssis.usp_RecordPackageError ?,?,?,?”:

Click the Parameter Mapping page. Parameterized queries use question marks (?) as parameter placeholders. There are a few ways to access the parameters (a subject for another blog post). We will use ordinal position to access them. Click the Add button to add a new parameter. Set the Variable Name property to “User::iPackageLoadID” and the Parameter Name property to “0”. The default data type (“Long”) works for SSIS integer data type variables. This maps the iPackageLoadID variable to the first question mark in our SQLStatement property, which is the @RunTimeMetricsID parameter of the stored procedure. Got it? Good.

Add three more parameters:

Variable Name: System::SourceName
Data Type: VarChar
Parameter Name: 1

Variable Name: System::ErrorCode
Data Type: Long
Parameter Name: 2

Variable Name: System::ErrorDescription
Data Type: VarChar
Parameter Name: 3
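
Mapped by ordinal, the placeholders line up with the procedure’s parameters as follows (a comment sketch of the mapping):

-- exec ssis.usp_RecordPackageError ?,?,?,?
--   ? 0 -> User::iPackageLoadID       -> @RunTimeMetricsID
--   ? 1 -> System::SourceName         -> @SourceName
--   ? 2 -> System::ErrorCode          -> @ErrorNumber
--   ? 3 -> System::ErrorDescription   -> @ErrorDescription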

Click the OK button to close the editor and execute the package. 

Click the OK button on the first message box, then click the No button on the second to fail the Test Script Task. When the task fails, the OnError event fires and calls our stored procedure:

 

Let’s look at the table to see what was recorded. Return to SSMS and execute the following query:

select *
from ssis.RunTimeErrors
order by ErrorDateTime desc 

The results:

We now have basic error information from our failed task. This is great for our Test Script Task, but what about the other tasks on our SSIS package? Do we have to create OnError event handlers for them as well? That could be a lot of work.

Remember that bubbling thing we mentioned at the beginning of this post? We can employ the fact that errors bubble up through the object model. When an error occurs in the Test Script Task, an error bubbles up to the Sequence Container named “Step 1 – Do some stuff”. You can see it in this earlier screenshot:

To demonstrate, let’s add error handling to the Sequence Container. First, stop debugging the SSIS package if you have not already done so. Then right-click the “Record Error” Execute SQL Task and click Copy to copy it to the clipboard. Click the Executable dropdown and select the “Step 1 – Do some stuff” Sequence Container:

Select the OnError event handler and click the link to create it. Then paste the “Record Error” Execute SQL Task onto the new OnError event handler canvas. Re-execute the package, observing the new OnError event handler. Click the No button to cause the Test Script Task to fail.

Note the new OnError event handler fires, in addition to the Test Script Task OnError event handler:

Re-execute the query in SSMS to verify:

Note the subsequent execution (RunTimeMetricsID 10, in my case) records the error twice. Hmmm… this is not the desired behavior. I only want the error to show up once per instance. However, also note the information recorded is nearly identical – only the timestamp differs (and then only by 187 milliseconds). So the 20:17:58.803 occurrence of the error is from the Test Script Task and the 20:17:58.990 occurrence is the same error bubbling into the “Step 1 – Do some stuff” Sequence Container’s context and firing an identically configured OnError event handler.

Again, this is not the desired behavior but it gives me an idea: Since I know errors bubble up the object model, all errors should eventually make it to the Package level. If I delete my existing OnError event handlers and create a single OnError event handler at the package level, I should then be able to record any and all errors – no matter where they occur in the SSIS package.

Let’s do that. First, copy the “Record Error” Execute SQL Task to the clipboard. Click the Delete button to delete the Sequence Container’s OnError event handler, then navigate to the Test Script Task’s OnError event handler and click the Delete button there. Click the Yes button on the confirmation dialog each time:

Navigate to the SSISRunTimeMetrics (package) event handlers:

Create the event handler by clicking the link and paste the Execute SQL Task onto the canvas. Voila – you now have error handling for the entire package. Here we must also add package status logging. The quickest way to accomplish this is to return to the Control Flow and copy the “Log End of Package Execution” Execute SQL Task to the clipboard. Navigate to the package OnError event handler and paste the Execute SQL Task onto the canvas. Connect it to the “Record Error” Execute SQL Task as shown:

Double-click the task to open the editor. Rename the task “Log Failed Package Execution” and change the SQLStatement property to:

declare @Now datetime
set @Now = GetDate()
Exec ssis.usp_RecordPackageEnd ?,@Now,'Failed'

Click the OK button to close the editor.

Re-execute the package to test, answering No to the dialog and failing the Script Task. Re-execute the query in SSMS to verify a single instance of the error is recorded:

Note the SourceName column contains the name of the task where the error originated, even though the error bubbled through two layers of the object model. The same holds for the ErrorNumber and ErrorDescription fields.

Task Status, Anyone?

It’s helpful to be able to track performance at a granular level. When instrumenting an ETL process, I like to record additional (and different) metadata at the Task level.

Since containers hold tasks, and groups can represent linked processes or related tasks, our task metrics must capture parent-task relationships. Execute the following script to build our first TaskMetrics table:

use SSISRunTimeMetrics
go

-- vars...
declare @sql varchar(255)

-- create ssis schema...
if not exists(select name
              from sys.schemas
              where name = 'ssis')
 begin
  set @sql = 'Create Schema ssis'
  exec(@sql)
 end

-- create TaskMetrics table...
if not exists(select s.name + '.' + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = 'TaskMetrics'
               and s.name = 'ssis')
 Create Table ssis.TaskMetrics
 (TaskMetricsID int identity(1,1)
 ,RunTimeMetricsId int not null
 ,ParentTaskMetricsID int null
 ,TaskStartDateTime datetime null
 ,TaskEndDateTime datetime null
 ,SourceName varchar(255) null
 ,TaskStatus varchar(25) null)
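
The ParentTaskMetricsID column lets us reconstruct the task hierarchy later; a sketch of the self-join:

-- List each task alongside its parent (parent is null for top-level tasks):
select child.SourceName as TaskName
      ,parent.SourceName as ParentTaskName
      ,child.TaskStatus
from ssis.TaskMetrics child
left join ssis.TaskMetrics parent on parent.TaskMetricsID = child.ParentTaskMetricsID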

As before, let’s create stored procedures to manage inserting start-of-task information into the TaskMetrics table and updating end-of-task information:

use SSISRunTimeMetrics
go

if exists(select s.name + '.' + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = 'usp_RecordTaskStart'
           and s.name = 'ssis')
 begin
  Drop Procedure ssis.usp_RecordTaskStart
 end
go

Create Procedure ssis.usp_RecordTaskStart
  @RunTimeMetricsID int
 ,@ParentTaskMetricsID int = null
 ,@TaskStartDateTime datetime = null
 ,@SourceName varchar(255) = null
 ,@TaskStatus varchar(25) = null
As

begin

 -- set @TaskStartDateTime default...
 declare @Now datetime
 set @Now = GetDate()
 if @TaskStartDateTime is null
  set @TaskStartDateTime = @Now

 -- insert the task metrics data, returning the new TaskMetricsID...
insert into ssis.TaskMetrics
 (RunTimeMetricsID
 ,ParentTaskMetricsID
 ,TaskStartDateTime
 ,SourceName
 ,TaskStatus)
output inserted.TaskMetricsID as 'Inserted_Task_ID'
values
 (@RunTimeMetricsID
 ,@ParentTaskMetricsID
 ,@TaskStartDateTime
 ,@SourceName
 ,@TaskStatus)

end

go
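
A quick test shows the OUTPUT clause returning the new TaskMetricsID as a one-row result set – the same result set the package will capture shortly (the values are illustrative):

-- Record a task start; NULLs take the parameter defaults:
exec ssis.usp_RecordTaskStart 1, NULL, NULL, 'Do some stuff', 'Started'
-- Returns one row with a single column: Inserted_Task_ID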

Now for the end-of-task update:

use SSISRunTimeMetrics
go

if exists(select s.name + '.' + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = 'usp_RecordTaskEnd'
           and s.name = 'ssis')
 begin
  Drop Procedure ssis.usp_RecordTaskEnd
 end
go

Create Procedure ssis.usp_RecordTaskEnd
 @TaskMetricsID int
,@TaskEndDateTime datetime = null
,@TaskStatus varchar(25) = null
As

begin

-- set @TaskEndDateTime default...
declare @Now datetime
set @Now = GetDate()

if (@TaskEndDateTime Is Null)
 set @TaskEndDateTime = @Now

-- update the task metrics data...
update ssis.TaskMetrics
Set TaskEndDateTime = @TaskEndDateTime
   ,TaskStatus = @TaskStatus
where TaskMetricsID = @TaskMetricsID

end

go
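
And a matching test closes out the task started above (the TaskMetricsID value is illustrative):

-- End the task; the NULL end time defaults to GetDate() inside the procedure:
exec ssis.usp_RecordTaskEnd 1, NULL, 'Completed'

select *
from ssis.TaskMetrics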

Instrumenting the Task 

We will use the new procedures to instrument the task.

We need some variables for these parameters. At a bare minimum, we need a variable to hold the Task Metrics ID returned from the ssis.usp_RecordTaskStart stored procedure. Open the Variables window and create a new package-scoped Int32 variable named iTaskLoadID:

 

Drag an Execute SQL Task into the Sequence Container and configure the General page as shown:

Set the SQLStatement property to the following script:

declare @Now datetime
set @Now = GetDate()
exec ssis.usp_RecordTaskStart ?,NULL,@Now,'Do some stuff','Started'

Click the Parameter Mapping page and add a parameter. Set the Variable Name to User::iPackageLoadID and the Parameter Name to 0 – accept the defaults for the remaining parameter properties. Click the Result Set page and add a new result. Set the Result Name to 0 and the Variable Name to User::iTaskLoadID. This will hold the TaskMetricsID value returned from the ssis.usp_RecordTaskStart stored procedure:

Click the OK button to close the editor. Next, let’s create a way to update the TaskMetrics table after a successful execution.

Drag an Execute SQL Task into the “Step 1 – Do some stuff” Sequence Container and connect it as shown:

 

Double-click the Execute SQL Task to edit it. Name it “Log Successful End of Task”. Set the SQLStatement to:

declare @Now datetime
set @Now = GetDate()
exec ssis.usp_RecordTaskEnd ?,@Now,'Completed'

Add one parameter mapped to User::iTaskLoadID with Parameter Name 0. Test the new tasks by executing the SSIS package. When prompted, have the script task succeed. Execute the following query to observe the results:

select *
from ssis.TaskMetrics
order by TaskMetricsID desc

Note TaskMetricsID 2 shows a completed task, while TaskMetricsID 1 recorded a start but was never updated with an end.

We now need a way to track and record task error status. Add another Execute SQL Task to the Control Flow canvas and connect a Failure precedence constraint from the Sequence Container to the new Execute SQL Task as shown:

 

Name the new Execute SQL Task “Log Failed End of Task”. Set the SQLStatement to:

declare @Now datetime
set @Now = GetDate()
exec ssis.usp_RecordTaskEnd ?,@Now,'Failed'

Add one parameter mapped to User::iTaskLoadID with Parameter Name 0. Test the new tasks by executing the SSIS package. When prompted, have the script task fail. Execute the query to observe the results:

You now have enough information to correlate the three tables. For example, we can view the results of the last execution with the following script:

select
  m.packageName
 ,m.packageStartDateTime
 ,DateDiff(ss, m.packageStartDateTime, m.packageEndDateTime) as 'packageRunTime'
 ,m.packageStatus
 ,t.SourceName
 ,t.TaskStartDateTime
 ,DateDiff(ss, t.TaskStartDateTime, t.TaskEndDateTime) as 'taskRunTime'
 ,t.TaskStatus
 ,e.ErrorNumber
 ,e.ErrorDescription
from ssis.RunTimeMetrics m
inner join ssis.RunTimeErrors e on e.RunTimeMetricsID = m.id
inner join ssis.TaskMetrics t on t.RunTimeMetricsID = m.id
where m.packageStartDateTime = (select Max(packageStartDateTime)
                                from ssis.RunTimeMetrics)

Those results appear as shown here:

This makes a fine source query for a SQL Server Reporting Services solution!

Conclusion

In this second part of ETL Instrumentation, we modified version metadata to indicate we have made changes; added package error logging and error status logging; and added task status logging.

Get the code!

:{> Andy

 PS – The PASS Summit 2015 Call for Speakers closes at 9:00 PM PDT Sunday, 15 Mar 2015 (04:00 GMT 16 Mar 2015). There’s still time, but hurry!

Learn more: 
Linchpin People Blog: SSIS 
Stairway to Biml 
Stairway to Integration Services 

SSIS2014DesignPatterns200

SSIS Design Pattern – ETL Instrumentation, Part 2

Introduction

This post is part of a series of posts on ETL Instrumentation.

In Part 1 we built a database to hold collected SSIS run time metrics and an SSIS package to deomnstrate how and why we would load metrics into the database.

In Part 2 we will expand on our database and the SSIS package to annotate version metadata, manage error metrics capture, and task status reporting.

A Brief History Of Our ETL Instrumentation Project

To review, our existing database is named SSISRunTimeMetrics. It contains a schema named ssis. In this schema are three objects:
 – a table named ssis.RunTimeMetrics.
 – a stored procedure named ssis.usp_RecordPackageStart.
 – a stored procedure named ssis.usp_RecordPackageEnd.

At the beginning of SSIS package execution, we call ssis.usp_RecordPackageStart from an Execute SQL Task. We pass the package start date and time, the package name, and the package version. We also pass in a status of “Started”. From this stored procedure we get the ID of the newly created row, which we then push into a Package Load ID variable (iPackageLoadID).

When the package completes execution, we call ssis.usp_RecordPackageEnd from an Execute SQL Task. We pass in the Package Load ID from the variable, the current date and time, and the status “Succeeded”.

First Things First

Let’s start by opening the SSIS package we built in Part 1. The first thing we want to change is the package metadata – the data in our SSIS package about our SSIS package. Why? We always want to know what changes were made, who made the changes, and when they were made.

Let’s begin by updating the version. Click on the Control Flow tab and press the F4 key to open properties. Scroll to the Version Properties and change the VersionMinor from 0 to 1:

Also edit the Control Flow annotation that contains version information:

Update package description annotation. I like to include a Change Log summary inside each package:

In the Change Log, I include 

  • A version number in the format Major.Minor.Date.
  • The date of the change.
  • The name of the developer who made the change.
  • A brief description of the change.

Now I’m ready to actually make changes! To facilitate package execution on your machine, I want to edit the name and target SQL Server instance of the OLEDB Connection Manager. Double-click the connection manager to edit it and change the Server name property to “(local)”.

Click the OK button to close the connection manager editor. For consistency, right-click the connection manager and click Rename. Change the name to “(local).SSISRunTimeMetrics”:

There. Housekeeping accomplished!

A test run verifies I haven’t broken anything:

Unit test successful – ready to continue developing.

Errors Aside

Our current solution works well as long as our SSIS package succeeds. And this is the way all good engineering is done; Use Case 1 should always be “Everything Succeeds”. But what happens if the SSIS package fails? Shouldn’t we record the failure and as much metadata about the failure as possible? I’m glad you asked: Yes, we should.

In SSIS, errors “bubble”. The concept of bubbling is familiar to ASP developers – it simply means events follow a prescribed path through the scope hierarchy of the application. Ok, you can scratch the word “simply” out of the preceeding sentence.

Let’s use the script task to generate an error. Right the Test Script Task and click Edit to open the editor. Click the Script page and the Design Script button to open the Visual Studio for Applications (VSA) script editor. I use Script Tasks for exception testing. The following code  – replacing the “Dts.TaskResult = Dts.Results.Success” line – will prompt you to have the task succeed or fail:

Dim iResponse As Integer
iResponse = MsgBox(“Have task complete with Success?”, MsgBoxStyle.YesNo)
If iResponse = vbYes Then
  Dts.TaskResult = Dts.Results.Success
Else
  Dts.TaskResult = Dts.Results.Failure
End If

The additional code causes a second messagebox:

 

If you click Yes in response, the task will succeed. Clicking No will cause the task to fail:

SSIS has some great logging and error-handling functionality built in. The first place to look when an error occurs is the Progress / Execution Results tab in the SSIS Designer GUI. The tab is named Progress while the package is executing and becomes Execution Results once execution completes. Click the Progress / Execution Results tab to view the error raised when you clicked the No button:

This isn’t a very descriptive error message. Some SSIS error messages aren’t very descriptive. Let’s add some error logging capabilities to our ETL Instrumentation application.

To Err is Human… 

There are lots of ways to develop solutions with SSIS. Some of them are best practices, more are simply matters of coding style. For instance, here you could build the database table to hold error information first; or you could build the error-handling functionality in the SSIS package first. This is a style decision. I choose to build the table first.

My initial errors table will be named RunTimeErrors and will contain the following fields:

  • RunTimeMetricsID – to map the error to the execution instance. I want to know the error occurred when I executed the package at a certain time.
  • ErrorDateTime – when the error actually occurred.
  • SourceName – the task that raised the error.
  • ErrorNumber – the error number.
  • ErrorDescription – the error description.

I generate the table with the following T-SQL script:

use SSISRunTimeMetrics
go

— vars…
declare @sql varchar(255)

— create ssis schema…
if not exists(select name
              from sys.schemas
              where name = ‘ssis’)
begin
 set @sql = ‘Create Schema ssis’
 exec(@sql)
end

— create RunTimeErrors table…
if not exists(select s.name + ‘.’ + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = ‘RunTimeErrors’
               and s.name = ‘ssis’)
 Create Table ssis.RunTimeErrors
 (RunTimeMetricsId int
 ,ErrorDateTime datetime null
 ,SourceName varchar(255) null
 ,ErrorNumber int null
 ,ErrorDescription varchar(2500) null)

Note this script is re-runnable. I encourage you to take the time to write re-runnable scripts when developing database solutions.

We could stop here and directly insert rows into our error table, but let’s complete the development process by creating a stored procedure to do this work for us. Execute the following code in SSMS:

use SSISRunTimeMetrics
go

if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordPackageError’
           and s.name = ‘ssis’)
 begin
  Drop Procedure ssis.usp_RecordPackageError
 end
go

Create Procedure ssis.usp_RecordPackageError
  @RunTimeMetricsID int
 ,@SourceName varchar(255) = null
 ,@ErrorNumber int = null
 ,@ErrorDescription varchar(2500) = null
As

 begin

  — set @Now…
  declare @Now datetime
  set @Now = GetDate()

  — insert the run time errors data…
  insert into ssis.RunTimeErrors
  (RunTimeMetricsID
  ,ErrorDateTime
  ,SourceName
  ,ErrorNumber
  ,ErrorDescription)
  values
  (@RunTimeMetricsID
  ,@Now
  ,@SourceName
  ,@ErrorNumber
  ,@ErrorDescription)

 end

go

Again, re-runnable code. Now lets’ return to Visual Studio to populate our table with error information.

Click on the Test Script Task on the Control Flow. Then click the Event Handlers tab. Because you clicked the Test Script Task first, the Event Handler Executable defaults to the Test Script Task. The Event Handler defaults to the OnError event.

 

To create the OnError Event Handler, click the link labeled “Click here to create an ‘OnError’ event handler for executable ‘Test Script Task'”. Let’s do that now.

After you click the link you will note the canvas looks remarkably familiar: it’s nearly identical to the Control Flow.

Our goal is to load data into our RunTimeErrors table. Let’s use an Execute SQL Task to call the stored procedure. Drag an Execute SQL Task onto the OnError Event Handler canvas and double-click it to open the editor. Name the Execute SQL Task “Record Error”. Set the connection to “(local).SSISRunTimeMetrics”. Set the SQLStatement property to “exec ssis.usp_RecordPackageError ?,?,?,?”:

Click the Parameter Mapping page. Parameterized queries use question marks (?) as parameter placeholders. There are a few ways to access the parameters (a subject for another blog post). We will use ordinal position to access them. Click the Add button to add a new parameter. Set the Variable Name property to “User::iPackageLoadID” and the Parameter Name property to “0”. The default data type (“Long”) works for SSIS integer data type variables. This maps the iPackageLoadID variable to the first question mark in our SQLStatement property, which is the @RunTimeMetricsID parameter for the stroed procedure. Got it? Good.

Add three more parameters:

Variable Name: System::SourceName
Data Type: VarChar
Parameter Name: 1

Variable Name: System::ErrorCode
Data Type: Long
Parameter Name: 2

Variable Name: System::ErrorDescription
Data Type: VarChar
Parameter Name: 3

Click the OK button to close the editor and execute the package. 

Click the OK button on the first messagebox, then click the No button on the messagebox dialog to fail the Test Script Task. When the task fails, the OnError event fires and calls our stored procedure:

 

Let’s look at the table to see what was recorded. Return to SSMS and execute the following query:

select *
from ssis.RunTimeErrors
order by ErrorDateTime desc 

The results:

We now have basic error information from our failed task. This is great for our Test Script Task, but what about the other tasks on our SSIS package? Do we have to create OnError event handlers for them as well? That could be a lot of work.

Remember that bubbling thing we mentioned at the beginning of this post? We can employ the fact that errors bubble up through the object model. When an error occurs in the Test Script Task, an error bubbles up to the Sequence Container named “Step 1 – Do some stuff”. You can see it in this earlier screenshot:

To demonstrate, let’s add error handling to the Sequence Container. First, stop debugging the SSIS package if you have not already done so. Then right-click the “Record Error” Execute SQL Task and click Copy to copy it to the clipboard. Click the Executable dropdown and select the “Step 1 – Do some stuff” Sequence Container:

Select the OnError event handler and click the link to create the OnError event handler. Then paste the “Record Error” Execute SQL Task onto the new OnError event handler canvas. Re-execute the package observing the new OnError event handler. Click the No button to cause the Test Script Task to fail.

Note the new OnError event handler fires, in addition to the Test Script Task OnError event handler:

Re-execute the query in SSMS to verify:

Note the subsequent execution (RunTimeMetricsID 10, in my case) records the error twice. Hmmm… this is not the desired behavior. I only want the error to show up once per instance. However, also note the information recorded is nearly identical – only the date is different (and then only by 183 milliseconds). So the 20:17:58.803 occurrence of the error is from the Test Script Task and the 20:17:58.990 occurrence is the same error bubbling into the “Step 1 – Do some stuff” Sequence Container’s context and firing an identicaly-configured OnError event handler.

Again, this is not the desired behavior but it gives me an idea: Since I know errors bubble up the object model, all errors should eventually make it to the Package level. If I delete my existing OnError event handlers and create a single OnError event handler at the package level, I should then be able to record any and all errors – no matter where they occur in the SSIS package.

Let’s do that. First, copy the “Record Error” Execute SQL Task to the clipboard. Click the Delete button to delete the Sequence Container’s OnError event handler, then navigate to the Test Script Task’s OnError event handler and click the Delete button there. Click the Yes button on the confirmation dialog each time:

Navigate to the SSISRunTimeMetrics (package) event handlers:

Create the event handler by clicking the link and paste the Execute SQL Task onto the canvas. Voila, you now have error handling for then entire package. Here we must also add package status logging. The quickest way to accomplish this is to return to the Control Flow and copy the “Log End of Package Execution” Execute SQL Task to the clipboard. Navigate to the package OnError event handler and paste the Execute SQL Task onto the canvas. Connect it to the “Record Error” script task as shown:

Double-click the task to open the editor. Rename the task “Log Failed Package Execution” and change the SQLStatement property to:

declare @Now datetime
set @Now = GetDate()
Exec ssis.usp_RecordPackageEnd ?,@Now,’Failed’

Click the OK button to close the editor.

Re-execute the package to test, answering No to the dialog and failing the Script Task. Re-execute the query in SSMS to verify a single instance of the error is recorded:

Note the SourceName column contains the name of the task where the error originates, even though the error has bubbled through two layers of object model. This also holds for the ErrorNumber and ErrorDescription fields.

Task Status, Anyone?

It’s helpful to be able to track performance at a granular level. When instrumenting an ETL process, I like to record additional (and different) metadata at the Task level.

Since containers contain tasks and groups can represent linked processes or related tasks, task metrics must include parent-task relationships. Execute the following script to build our first TaskMetrics table:

use SSISRunTimeMetrics
go

— vars…
declare @sql varchar(255)

— create ssis schema…
if not exists(select name
              from sys.schemas
              where name = ‘ssis’)
 begin
  set @sql = ‘Create Schema ssis’
  exec(@sql)
 end

— create RunTimeErrors table…
if not exists(select s.name + ‘.’ + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = ‘TaskMetrics’
               and s.name = ‘ssis’)
 Create Table ssis.TaskMetrics
 (TaskMetricsID int identity(1,1)
 ,RunTimeMetricsId int not null
 ,ParentTaskMetricsID int null
 ,TaskStartDateTime datetime null
 ,TaskEndDateTime datetime null
 ,SourceName varchar(255) null
 ,TaskStatus varchar(25) null)

As before, let’s create stored procedures to manage inserting start-of-task information into the TaskMetrics table and updating end-of-task information:

use SSISRunTimeMetrics
go

if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordTaskMetrics’
           and s.name = ‘ssis’)
 begin
  Drop Procedure ssis.usp_RecordTaskMetrics
 end
go

Create Procedure ssis.usp_RecordTaskStart
  @RunTimeMetricsID int
 ,@ParentTaskMetricsID int = null
 ,@TaskStartDateTime datetime = null
 ,@SourceName varchar(255) = null
 ,@TaskStatus varchar(25) = null
As

begin

 — set @TaskStartDateTime default…
 declare @Now datetime
 set @Now = GetDate()
 if @TaskStartDateTime is null
  set @TaskStartDateTime = @Now

 — insert the run time errors data…
insert into ssis.TaskMetrics
 (RunTimeMetricsID
 ,ParentTaskMetricsID
 ,TaskStartDateTime
 ,SourceName
 ,TaskStatus)
output inserted.TaskMetricsID as ‘Inserted_Task_ID’
values
 (@RunTimeMetricsID
 ,@ParentTaskMetricsID
 ,@TaskStartDateTime
 ,@SourceName
 ,@TaskStatus)

end

go

Now for the end-of-task update:

use SSISRunTimeMetrics
go

if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordTaskEnd’
           and s.name = ‘ssis’)
 begin
  Drop Procedure ssis.usp_RecordTaskEnd
 end
go

Create Procedure ssis.usp_RecordTaskEnd
 @TaskMetricsID int
,@TaskEndDateTime datetime = null
,@TaskStatus varchar(25) = null
As

begin

— set @TaskEndDateTime default…
declare @Now datetime
set @Now = GetDate()

if (@TaskEndDateTime Is Null)
 set @TaskEndDateTime = @Now

— update the task metrics data…
update ssis.TaskMetrics
Set TaskEndDateTime = @TaskEndDateTime
   ,TaskStatus = @TaskStatus
where TaskMetricsid = @TaskMetricsid

end

go

Instrumenting the Task 

We will use the new procedures to instrument the task.

We need some variables for these parameters. At a bare minimum, we need a variable to hold the Task Metrics ID returned from the ssis.usp_RecordTaskStart stored procedure. Open variables and create a new package-scoped, Int32 type variable named iTaskLoadID:

 

Drag an Execute SQL Task into the Sequence Container an configure the General page as shown:

Set the SQLStatement property to the follow script:

declare @Now datetime
set @Now = GetDate()
exec ssis.usp_RecordTaskStart ?,NULL,@Now,’Do some stuff’,’Started’

Click the Parameter Mapping page and add a parameter. Set the Variable Name to User::iPackageLoadID and Parameter Name to 0 – accept the defaults for the remaining parameter properties. Click the Result Set page and add a new result. Set the ResultName to 0 and the Variable Name to User::iTaskLoadID. This will hold the TaskMetricsID value returned from the ssis.usp_RecordTaskStart stored procedure:

Click the OK button to close the editor. Next, let’s create a way to update TaskMetrics table after a successful execution.

Drag an Execute SQL Task into the “Step 1 – Do some stuff” Sequence Container and connect it as shown:

 

Double-click the Execute SQL Task to edit it. Name it “Log Successful End of Task”. Set the SQLStatement to:

declare @Now datetime
set @Now = GetDate()
exec ssis.usp_RecordTaskEnd ?,@Now,’Completed’

Add one parameter mapped to User::iTaskLoadID and set to Parameter Name 0. Test the new tasks by executing the SSIS package. When prompted, have the script task succeed. Execute the following query to observe the results:

select *
from ssis.TaskMetrics
order by TaskMetricsID desc

Note TaskMetricsID 2 completed while TaskMetricsID 1 started and then stopped.

We now need to add a way to track and record task error status. Add another Execute SQL Task to the Control Flow canvas and connect a Failed Precedence Constraint from the Sequence Container to the new Execute SQL Task as shown:

 

 Name the new Execute SQL Task “Log Failed End of Task”. Set the SQLStatement to:

declare @Now datetime
set @Now = GetDate()
exec ssis.usp_RecordTaskEnd ?,@Now,’Failed’

Add one parameter mapped to User::iTaskLoadID and set to Parameter Name 0. Test the new tasks by executing the SSIS package. When prompted, have the script task fail. Execute the query to observe the results:

You now have enough information to correlate the three tables. For example, we can view the results of the last execution with the following script:

select
  m.packageName
 ,m.packageStartDateTime
 ,DateDiff(ss, m.packageStartDateTime, m.packageEndDateTime) as ‘packageRunTime’
 ,m.packageStatus
 ,t.SourceName
 ,t.TaskStartDateTime
 ,DateDiff(ss, t.TaskStartDateTime, t.TaskEndDateTime) as ‘taskRunTime’
 ,t.TaskStatus
 ,e.ErrorNumber
 ,e.ErrorDescription
from ssis.RunTimeMetrics m
inner join ssis.RunTimeErrors e on e.RunTimeMetricsID = m.id
inner join ssis.TaskMetrics t on t.RunTimeMetricsID = m.id
where m.packageStartDateTime = (select Max(packageStartDateTime)
                                from ssis.RunTimeMetrics)

Those results appear as shown here:

This makes a fine source query for a SQL Server Reporting Services solution!

Conclusion

In this second part of ETL Instrumentation, we modified version metadata to indicate we have made changes; added package error logging and error status logging; and added task status logging.

Get the code!

:{> Andy

 PS – The PASS Summit 2015 Call for Speakers closes at 9:00 PM PDT Sunday, 15 Mar 2015 (04:00 GMT 16 Mar 2015). There’s still time, but hurry!

Learn more: 
Linchpin People Blog: SSIS 
Stairway to Biml 
Stairway to Integration Services 

SSIS2014DesignPatterns200

SSIS Design Pattern – ETL Instrumentation, Part 2

Introduction

This post is part of a series of posts on ETL Instrumentation.

In Part 1 we built a database to hold collected SSIS run time metrics and an SSIS package to deomnstrate how and why we would load metrics into the database.

In Part 2 we will expand on our database and the SSIS package to annotate version metadata, manage error metrics capture, and task status reporting.

A Brief History Of Our ETL Instrumentation Project

To review, our existing database is named SSISRunTimeMetrics. It contains a schema named ssis. In this schema are three objects:
 – a table named ssis.RunTimeMetrics.
 – a stored procedure named ssis.usp_RecordPackageStart.
 – a stored procedure named ssis.usp_RecordPackageEnd.

At the beginning of SSIS package execution, we call ssis.usp_RecordPackageStart from an Execute SQL Task. We pass the package start date and time, the package name, and the package version. We also pass in a status of “Started”. From this stored procedure we get the ID of the newly created row, which we then push into a Package Load ID variable (iPackageLoadID).

When the package completes execution, we call ssis.usp_RecordPackageEnd from an Execute SQL Task. We pass in the Package Load ID from the variable, the current date and time, and the status “Succeeded”.

First Things First

Let’s start by opening the SSIS package we built in Part 1. The first thing we want to change is the package metadata – the data in our SSIS package about our SSIS package. Why? We always want to know what changes were made, who made the changes, and when they were made.

Let’s begin by updating the version. Click on the Control Flow tab and press the F4 key to open properties. Scroll to the Version Properties and change the VersionMinor from 0 to 1:

Also edit the Control Flow annotation that contains version information:

Update package description annotation. I like to include a Change Log summary inside each package:

In the Change Log, I include 

  • A version number in the format Major.Minor.Date.
  • The date of the change.
  • The name of the developer who made the change.
  • A brief description of the change.

Now I’m ready to actually make changes! To facilitate package execution on your machine, I want to edit the name and target SQL Server instance of the OLEDB Connection Manager. Double-click the connection manager to edit it and change the Server name property to “(local)”.

Click the OK button to close the connection manager editor. For consistency, right-click the connection manager and click Rename. Change the name to “(local).SSISRunTimeMetrics”:

There. Housekeeping accomplished!

A test run verifies I haven’t broken anything:

Unit test successful – ready to continue developing.

Errors Aside

Our current solution works well as long as our SSIS package succeeds. And this is the way all good engineering is done; Use Case 1 should always be “Everything Succeeds”. But what happens if the SSIS package fails? Shouldn’t we record the failure and as much metadata about the failure as possible? I’m glad you asked: Yes, we should.

In SSIS, errors “bubble”. The concept of bubbling is familiar to ASP developers – it simply means events follow a prescribed path through the scope hierarchy of the application. Ok, you can scratch the word “simply” out of the preceeding sentence.

Let’s use the script task to generate an error. Right the Test Script Task and click Edit to open the editor. Click the Script page and the Design Script button to open the Visual Studio for Applications (VSA) script editor. I use Script Tasks for exception testing. The following code  – replacing the “Dts.TaskResult = Dts.Results.Success” line – will prompt you to have the task succeed or fail:

Dim iResponse As Integer
iResponse = MsgBox(“Have task complete with Success?”, MsgBoxStyle.YesNo)
If iResponse = vbYes Then
  Dts.TaskResult = Dts.Results.Success
Else
  Dts.TaskResult = Dts.Results.Failure
End If

The additional code causes a second messagebox:

 

If you click Yes in response, the task will succeed. Clicking No will cause the task to fail:

SSIS has some great logging and error-handling functionality built in. The first place to look when an error occurs is the Progress / Execution Results tab in the SSIS Designer GUI. The tab is named Progress while the package is executing and becomes Execution Results once execution completes. Click the Progress / Execution Results tab to view the error raised when you clicked the No button:

This isn’t a very descriptive error message. Some SSIS error messages aren’t very descriptive. Let’s add some error logging capabilities to our ETL Instrumentation application.

To Err is Human… 

There are lots of ways to develop solutions with SSIS. Some of them are best practices, more are simply matters of coding style. For instance, here you could build the database table to hold error information first; or you could build the error-handling functionality in the SSIS package first. This is a style decision. I choose to build the table first.

My initial errors table will be named RunTimeErrors and will contain the following fields:

  • RunTimeMetricsID – to map the error to the execution instance. I want to know the error occurred when I executed the package at a certain time.
  • ErrorDateTime – when the error actually occurred.
  • SourceName – the task that raised the error.
  • ErrorNumber – the error number.
  • ErrorDescription – the error description.

I generate the table with the following T-SQL script:

use SSISRunTimeMetrics
go

— vars…
declare @sql varchar(255)

— create ssis schema…
if not exists(select name
              from sys.schemas
              where name = ‘ssis’)
begin
 set @sql = ‘Create Schema ssis’
 exec(@sql)
end

— create RunTimeErrors table…
if not exists(select s.name + ‘.’ + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = ‘RunTimeErrors’
               and s.name = ‘ssis’)
 Create Table ssis.RunTimeErrors
 (RunTimeMetricsId int
 ,ErrorDateTime datetime null
 ,SourceName varchar(255) null
 ,ErrorNumber int null
 ,ErrorDescription varchar(2500) null)

Note this script is re-runnable. I encourage you to take the time to write re-runnable scripts when developing database solutions.

We could stop here and directly insert rows into our error table, but let’s complete the development process by creating a stored procedure to do this work for us. Execute the following code in SSMS:

use SSISRunTimeMetrics
go

if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordPackageError’
           and s.name = ‘ssis’)
 begin
  Drop Procedure ssis.usp_RecordPackageError
 end
go

Create Procedure ssis.usp_RecordPackageError
  @RunTimeMetricsID int
 ,@SourceName varchar(255) = null
 ,@ErrorNumber int = null
 ,@ErrorDescription varchar(2500) = null
As

 begin

  — set @Now…
  declare @Now datetime
  set @Now = GetDate()

  — insert the run time errors data…
  insert into ssis.RunTimeErrors
  (RunTimeMetricsID
  ,ErrorDateTime
  ,SourceName
  ,ErrorNumber
  ,ErrorDescription)
  values
  (@RunTimeMetricsID
  ,@Now
  ,@SourceName
  ,@ErrorNumber
  ,@ErrorDescription)

 end

go

Again, re-runnable code. Now lets’ return to Visual Studio to populate our table with error information.

Click on the Test Script Task on the Control Flow. Then click the Event Handlers tab. Because you clicked the Test Script Task first, the Event Handler Executable defaults to the Test Script Task. The Event Handler defaults to the OnError event.

 

To create the OnError Event Handler, click the link labeled “Click here to create an ‘OnError’ event handler for executable ‘Test Script Task'”. Let’s do that now.

After you click the link you will note the canvas looks remarkably familiar: it’s nearly identical to the Control Flow.

Our goal is to load data into our RunTimeErrors table. Let’s use an Execute SQL Task to call the stored procedure. Drag an Execute SQL Task onto the OnError Event Handler canvas and double-click it to open the editor. Name the Execute SQL Task “Record Error”. Set the connection to “(local).SSISRunTimeMetrics”. Set the SQLStatement property to “exec ssis.usp_RecordPackageError ?,?,?,?”:

Click the Parameter Mapping page. Parameterized queries use question marks (?) as parameter placeholders. There are a few ways to access the parameters (a subject for another blog post). We will use ordinal position to access them. Click the Add button to add a new parameter. Set the Variable Name property to “User::iPackageLoadID” and the Parameter Name property to “0”. The default data type (“Long”) works for SSIS integer data type variables. This maps the iPackageLoadID variable to the first question mark in our SQLStatement property, which is the @RunTimeMetricsID parameter for the stroed procedure. Got it? Good.

Add three more parameters:

Variable Name: System::SourceName
Data Type: VarChar
Parameter Name: 1

Variable Name: System::ErrorCode
Data Type: Long
Parameter Name: 2

Variable Name: System::ErrorDescription
Data Type: VarChar
Parameter Name: 3

Click the OK button to close the editor and execute the package. 

Click the OK button on the first messagebox, then click the No button on the messagebox dialog to fail the Test Script Task. When the task fails, the OnError event fires and calls our stored procedure:

 

Let’s look at the table to see what was recorded. Return to SSMS and execute the following query:

select *
from ssis.RunTimeErrors
order by ErrorDateTime desc 

The results:

We now have basic error information from our failed task. This is great for our Test Script Task, but what about the other tasks on our SSIS package? Do we have to create OnError event handlers for them as well? That could be a lot of work.

Remember that bubbling thing we mentioned at the beginning of this post? We can employ the fact that errors bubble up through the object model. When an error occurs in the Test Script Task, an error bubbles up to the Sequence Container named “Step 1 – Do some stuff”. You can see it in this earlier screenshot:

To demonstrate, let’s add error handling to the Sequence Container. First, stop debugging the SSIS package if you have not already done so. Then right-click the “Record Error” Execute SQL Task and click Copy to copy it to the clipboard. Click the Executable dropdown and select the “Step 1 – Do some stuff” Sequence Container:

Select the OnError event handler and click the link to create the OnError event handler. Then paste the “Record Error” Execute SQL Task onto the new OnError event handler canvas. Re-execute the package observing the new OnError event handler. Click the No button to cause the Test Script Task to fail.

Note the new OnError event handler fires, in addition to the Test Script Task OnError event handler:

Re-execute the query in SSMS to verify:

Note the subsequent execution (RunTimeMetricsID 10, in my case) records the error twice. Hmmm… this is not the desired behavior. I only want the error to show up once per instance. However, also note the information recorded is nearly identical – only the date is different (and then only by 183 milliseconds). So the 20:17:58.803 occurrence of the error is from the Test Script Task and the 20:17:58.990 occurrence is the same error bubbling into the “Step 1 – Do some stuff” Sequence Container’s context and firing an identicaly-configured OnError event handler.

Again, this is not the desired behavior but it gives me an idea: Since I know errors bubble up the object model, all errors should eventually make it to the Package level. If I delete my existing OnError event handlers and create a single OnError event handler at the package level, I should then be able to record any and all errors – no matter where they occur in the SSIS package.

Let’s do that. First, copy the “Record Error” Execute SQL Task to the clipboard. Click the Delete button to delete the Sequence Container’s OnError event handler, then navigate to the Test Script Task’s OnError event handler and click the Delete button there. Click the Yes button on the confirmation dialog each time:

Navigate to the SSISRunTimeMetrics (package) event handlers:

Create the event handler by clicking the link and paste the Execute SQL Task onto the canvas. Voila, you now have error handling for then entire package. Here we must also add package status logging. The quickest way to accomplish this is to return to the Control Flow and copy the “Log End of Package Execution” Execute SQL Task to the clipboard. Navigate to the package OnError event handler and paste the Execute SQL Task onto the canvas. Connect it to the “Record Error” script task as shown:

Double-click the task to open the editor. Rename the task “Log Failed Package Execution” and change the SQLStatement property to:

declare @Now datetime
set @Now = GetDate()
Exec ssis.usp_RecordPackageEnd ?,@Now,’Failed’

Click the OK button to close the editor.

Re-execute the package to test, answering No to the dialog and failing the Script Task. Re-execute the query in SSMS to verify a single instance of the error is recorded:

Note the SourceName column contains the name of the task where the error originates, even though the error has bubbled through two layers of object model. This also holds for the ErrorNumber and ErrorDescription fields.

Task Status, Anyone?

It’s helpful to be able to track performance at a granular level. When instrumenting an ETL process, I like to record additional (and different) metadata at the Task level.

Since containers contain tasks and groups can represent linked processes or related tasks, task metrics must include parent-task relationships. Execute the following script to build our first TaskMetrics table:

use SSISRunTimeMetrics
go

— vars…
declare @sql varchar(255)

— create ssis schema…
if not exists(select name
              from sys.schemas
              where name = ‘ssis’)
 begin
  set @sql = ‘Create Schema ssis’
  exec(@sql)
 end

— create RunTimeErrors table…
if not exists(select s.name + ‘.’ + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = ‘TaskMetrics’
               and s.name = ‘ssis’)
 Create Table ssis.TaskMetrics
 (TaskMetricsID int identity(1,1)
 ,RunTimeMetricsId int not null
 ,ParentTaskMetricsID int null
 ,TaskStartDateTime datetime null
 ,TaskEndDateTime datetime null
 ,SourceName varchar(255) null
 ,TaskStatus varchar(25) null)

As before, let’s create stored procedures to manage inserting start-of-task information into the TaskMetrics table and updating end-of-task information:

use SSISRunTimeMetrics
go

if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordTaskMetrics’
           and s.name = ‘ssis’)
 begin
  Drop Procedure ssis.usp_RecordTaskMetrics
 end
go

Create Procedure ssis.usp_RecordTaskStart
  @RunTimeMetricsID int
 ,@ParentTaskMetricsID int = null
 ,@TaskStartDateTime datetime = null
 ,@SourceName varchar(255) = null
 ,@TaskStatus varchar(25) = null
As

begin

 — set @TaskStartDateTime default…
 declare @Now datetime
 set @Now = GetDate()
 if @TaskStartDateTime is null
  set @TaskStartDateTime = @Now

 — insert the run time errors data…
insert into ssis.TaskMetrics
 (RunTimeMetricsID
 ,ParentTaskMetricsID
 ,TaskStartDateTime
 ,SourceName
 ,TaskStatus)
output inserted.TaskMetricsID as ‘Inserted_Task_ID’
values
 (@RunTimeMetricsID
 ,@ParentTaskMetricsID
 ,@TaskStartDateTime
 ,@SourceName
 ,@TaskStatus)

end

go

Now for the end-of-task update:

use SSISRunTimeMetrics
go

if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordTaskEnd’
           and s.name = ‘ssis’)
 begin
  Drop Procedure ssis.usp_RecordTaskEnd
 end
go

Create Procedure ssis.usp_RecordTaskEnd
 @TaskMetricsID int
,@TaskEndDateTime datetime = null
,@TaskStatus varchar(25) = null
As

begin

-- set @TaskEndDateTime default...
declare @Now datetime
set @Now = GetDate()

if (@TaskEndDateTime Is Null)
 set @TaskEndDateTime = @Now

-- update the task metrics data...
update ssis.TaskMetrics
Set TaskEndDateTime = @TaskEndDateTime
   ,TaskStatus = @TaskStatus
where TaskMetricsID = @TaskMetricsID

end

go
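Before wiring these procedures into the package, it’s worth a quick smoke test from SSMS. This is a sketch using made-up values – a RunTimeMetricsID of 1 and a fabricated task name:

-- start a task row; the Inserted_Task_ID result set returns the new TaskMetricsID
exec ssis.usp_RecordTaskStart 1, NULL, NULL, 'Smoke test task', 'Started'

-- close the task row, passing the TaskMetricsID returned above (here assumed to be 1);
-- NULL lets the procedure default the end time to GetDate()
exec ssis.usp_RecordTaskEnd 1, NULL, 'Completed'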

Instrumenting the Task 

We will use the new procedures to instrument the task.

We need some variables for these parameters. At a bare minimum, we need a variable to hold the Task Metrics ID returned from the ssis.usp_RecordTaskStart stored procedure. Open the Variables window and create a new package-scoped, Int32-type variable named iTaskLoadID:

 

Drag an Execute SQL Task into the Sequence Container and configure the General page as shown:

Set the SQLStatement property to the following script:

declare @Now datetime
set @Now = GetDate()
exec ssis.usp_RecordTaskStart ?,NULL,@Now,'Do some stuff','Started'

Click the Parameter Mapping page and add a parameter. Set the Variable Name to User::iPackageLoadID and Parameter Name to 0 – accept the defaults for the remaining parameter properties. Click the Result Set page and add a new result. Set the ResultName to 0 and the Variable Name to User::iTaskLoadID. This will hold the TaskMetricsID value returned from the ssis.usp_RecordTaskStart stored procedure:

Click the OK button to close the editor. Next, let’s create a way to update the TaskMetrics table after a successful execution.

Drag an Execute SQL Task into the “Step 1 – Do some stuff” Sequence Container and connect it as shown:

 

Double-click the Execute SQL Task to edit it. Name it “Log Successful End of Task”. Set the SQLStatement to:

declare @Now datetime
set @Now = GetDate()
exec ssis.usp_RecordTaskEnd ?,@Now,'Completed'

Add one parameter mapped to User::iTaskLoadID and set to Parameter Name 0. Test the new tasks by executing the SSIS package. When prompted, have the script task succeed. Execute the following query to observe the results:

select *
from ssis.TaskMetrics
order by TaskMetricsID desc

Note that TaskMetricsID 2 completed, while TaskMetricsID 1 started and then stopped.

We now need to add a way to track and record task error status. Add another Execute SQL Task to the Control Flow canvas and connect a Failed Precedence Constraint from the Sequence Container to the new Execute SQL Task as shown:

 

 Name the new Execute SQL Task “Log Failed End of Task”. Set the SQLStatement to:

declare @Now datetime
set @Now = GetDate()
exec ssis.usp_RecordTaskEnd ?,@Now,'Failed'

Add one parameter mapped to User::iTaskLoadID and set to Parameter Name 0. Test the new tasks by executing the SSIS package. When prompted, have the script task fail. Execute the query to observe the results:

You now have enough information to correlate the three tables. For example, we can view the results of the last execution with the following script:

select
  m.packageName
 ,m.packageStartDateTime
 ,DateDiff(ss, m.packageStartDateTime, m.packageEndDateTime) as 'packageRunTime'
 ,m.packageStatus
 ,t.SourceName
 ,t.TaskStartDateTime
 ,DateDiff(ss, t.TaskStartDateTime, t.TaskEndDateTime) as 'taskRunTime'
 ,t.TaskStatus
 ,e.ErrorNumber
 ,e.ErrorDescription
from ssis.RunTimeMetrics m
inner join ssis.RunTimeErrors e on e.RunTimeMetricsID = m.id
inner join ssis.TaskMetrics t on t.RunTimeMetricsID = m.id
where m.packageStartDateTime = (select Max(packageStartDateTime)
                                from ssis.RunTimeMetrics)

Those results appear as shown here:

This makes a fine source query for a SQL Server Reporting Services solution!
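One caveat worth noting: because the query above inner-joins ssis.RunTimeErrors, an execution with no errors returns no rows at all. A left-join variant – sketched here as one way you might choose to report – keeps the package and task rows either way:

-- report the last execution even when no errors were recorded
select
  m.packageName
 ,m.packageStatus
 ,t.SourceName
 ,t.TaskStatus
 ,e.ErrorNumber
 ,e.ErrorDescription
from ssis.RunTimeMetrics m
left join ssis.TaskMetrics t on t.RunTimeMetricsID = m.id
left join ssis.RunTimeErrors e on e.RunTimeMetricsID = m.id
where m.packageStartDateTime = (select Max(packageStartDateTime)
                                from ssis.RunTimeMetrics)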

Conclusion

In this second part of ETL Instrumentation, we modified version metadata to indicate we have made changes; added package error logging and error status logging; and added task status logging.

Get the code!

:{> Andy

 PS – The PASS Summit 2015 Call for Speakers closes at 9:00 PM PDT Sunday, 15 Mar 2015 (04:00 GMT 16 Mar 2015). There’s still time, but hurry!

Learn more: 
Linchpin People Blog: SSIS 
Stairway to Biml 
Stairway to Integration Services 


Context and Grain

Introduction: The World Series (1979) and Photography

Willie “Pops” Stargell started my interest in photography (…a funny way to start a post on a SQL Server blog site, but bear with me). Pops Stargell led the 1979 Pittsburgh Pirates to win the World Series. He was the MVP that year as well.

My Mom, in the only time I ever remember her gambling when I was a kid, bought a $10 spot in a World Series baseball pool that year. The pool was based on the 1’s digit of the total runs scored by the Pirates and Orioles across the entire series: the person who had the 1’s digits in the correct order won $600, the person who had them in reverse order won $300, and the organizer kept $100. The series lasted seven games that year. Mom was set to win the $600 if the final score of Game 7 was 4-1 in the Pirates’ favor. Willie Stargell hit a home run that helped achieve that score, along with some good fielding (especially for a 39-year-old player).

I had a budding interest in photography because some friends at high school were photographers. There was just no way we could swing $200 for a mid-range 35mm SLR back then. But Mom promised me that if she won the big money she’d buy me the camera. She won, and she bought me the camera.

I got to attend an interesting Yearbook Seminar the next summer held at Longwood University in Farmville, VA – located about 7 miles from where I currently live. I learned some fascinating things about photography and did a decent job getting shots for my senior year yearbook.

Photography Context and Grain 

One of the things I learned was this: If you take an out-of-focus shot, there’s nothing you can do in chemical and light processing to bring that shot into focus. There may be (and probably are) digital things we can do nowadays, but this was the 70’s. The very best you could do with an out-of-focus shot was produce a print that was no more in focus than the negative.

In the context of the photograph, the focus attribute was set when the picture was taken. The focus could not be improved after the picture was taken. It was the best it was ever going to be from that moment until forever.

Database Design Context and Grain 

Now here’s a lesson we can take into database development.

Several for-instances leap to mind: granular resolution in a business intelligence data acquisition system, for one.

Imagine you are tasked with reporting real-time data for a business intelligence scorecard application. The client expects up-to-the-second updates from the data acquisition systems on the manufacturing floor and this is a project requirement.

Sure, you can do that.

Unless data is collected every five seconds on the floor. Then you have an issue. There’s nothing you can do from your side of the project – since you merely read, store, transform, and display the acquired data – to “fix” this. You can give updates every second, but these counters (and metrics derived from them) will change only every five seconds.

In this imaginary scenario you cannot “re-focus” the picture. The best you can do is present the information at its current grain – and no finer.

Software Development Context and Grain 

The same can be said for software development. I will state it in another way: Every decision to develop Feature A in a certain way is also a decision to not develop Feature A in any number of other ways.

There are consequences to choosing an approach, methodology, or context. These consequences are usually discovered well down the path towards a deliverable or release. Occasionally, talented developers can find a “silver bullet” – a fix that solves the current consequence to a past decision without breaking anything else. But this is unfortunately rare. Usually there are consequences to the consequence-initiated fix, and so on, ad infinitum.

There is a point early on in a development project where such consequences, if detected, can be addressed with relatively little effort. I use the analogy of deflecting an approaching asteroid: If you detect it early enough you can deflect it with a BB. Wait, and all the nukes on the planet won’t stop it.

One rule of processes – I first saw this in The Goal – is “Losses accumulate, gains do not” (just call me Mr. Encouraging). Most good process measurement methodologies account for this. One way to think about it is to consider the example of a three-stage linear process where each stage is running at 90% efficiency.

<TrickQuestion> What is the overall efficiency of the system?
  <TrickAnswer> 90%</TrickAnswer>
</TrickQuestion>

90% is incorrect. It’s a linear process. The output of stage 1 is 90%. The output of stage 2 is 90% of 90%, or 81%. The output of stage 3 is 90% of 81%, or 73%.

Consider a process-improvement initiative that results in 100% efficiency at stage 1. The output improves, but only to 81%. Consider the alternative – a reduction at stage 1 to 80%. Stage 2’s output becomes 72%; stage 3’s: 65%. The losses accumulate, the gains don’t. To quote Foghorn Leghorn, “Figures don’t lie.”
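A quick sketch in T-SQL – illustrative only, with the stage names invented for the example – makes the cascade concrete:

-- cumulative efficiency of a three-stage linear process
declare @stage1 numeric(5,4)
declare @stage2 numeric(5,4)
declare @stage3 numeric(5,4)
set @stage1 = 0.90
set @stage2 = 0.90
set @stage3 = 0.90

select @stage1 * @stage2 * @stage3 as OverallEfficiency   -- 0.729
select 1.00 * @stage2 * @stage3 as Stage1Perfected        -- 0.81: the gain barely propagates
select 0.80 * @stage2 * @stage3 as Stage1Degraded         -- 0.648: the loss compounds fully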

Why Are You Sharing This, Mr. Encouraging?

There are a few reasons: First, this is a trap into which many a young developer falls. It looks simple but it is not. Life is filled with things that appear deceptively simple. Software development is but one of them.

Second, underestimating the effort required to accomplish any software-related task is a trait shared by every good developer I know – me included. We all think things will take less time than they do. I don’t know why, but that doesn’t make the fact any less true. Joel Spolsky has an excellent article on Evidence Based Scheduling that presents an interesting solution to this.

Third, it doesn’t matter how good you are, this can happen to you. The odds of getting caught in a process-resolution trap increase exponentially with complexity.

Conclusion 

Be aware during design and architecture phases that the decisions made today will limit future options. There’s a side order of art included with the science here. Balance is the key.

:{> Andy

New TFS 2008 VPC Images are available

Previously, Microsoft posted an update to the Visual Studio 2008 Downloads page:

IMPORTANT:

  • The VPC downloads listed above will expire on November 1, 2007, which is earlier than anticipated. It is important that you backup or move any data to a new location before that date. To move Team Foundation Server data, see Moving Team Foundation Server. New VPC images will be posted as soon as they are available. 

 The original expiration date was 15 Mar 2008.

Today:

Jeff Beehler announces the availability of updated Team Foundation Server 2008 and Team Suite Beta 2 VPC images.

The updated images are available:

Good work Jeff and team!

 :{> Andy


Master Data Management 101: Standardize Units of Measure

Introduction

 

There’s quite a bit of hoopla about MDM lately, mostly due to increased awareness. While “Lacks needed data” was cited as the number 1 problem (with 21 votes) in data warehouses in the recent IBM Data Warehousing Satisfaction Survey (2007), “Insufficient or inadequate master data” made a decent showing at second with 17 votes (see Table 3).

 

   The new Microsoft MDM site is a good place to start learning about Master Data Management and Microsoft’s efforts in this area.

   Master Data is a pretty common thing for engineers. I learned about it way back in my manufacturing engineering days.

 

The Issue

 

   Consider this scenario: Conglomerate C (CC) makes widgets and starts acquiring businesses that also make widgets. CC sells widgets by the pound, but Acquisition A (AA) measures them by counting individual widgets, while Acquisition B (AB) sells them by the case (gross, or 144 ea).

 

   CC now wants all this data in a data warehouse so they can compare apples to apples and know, among other things, how many widgets they’re actually making and selling on a given day.

 

Note: Instrumentation and measurement are scientific disciplines in their own right. There’s a lot more to this, which I hope to cover here and in my new Applied MDM blog.

 

   The Unit of Measure in the existing database, dbCC, is pounds. The Widgets tables from the three companies look like this:

 

dbCC.dbo.Widgets

ID   Date       Weight
1    1/1/2007   2076
2    1/2/2007   2100
3    1/3/2007   1977

 

dbAA.Product.Widgets

ProductID                              Date         Count
F0932E13-218D-458A-BE09-3286AFDE0280   1 Jan 2007   10,265
F68BF7AC-553E-4A32-B1CB-442DD310194C   2 Jan 2007   13,009
8C0C7511-1386-4C13-84B8-2351248280E6   3 Jan 2007   17,121

 

dbAB.dbo.Widgets

ID   Date       Cases
1    20070101   84
2    20070102   82
3    20070103   99

 

 

Standards, Standards, Standards

 

One facet of MDM is standardizing this data. The keys to standardizing it are recognizing traits in the data, and weighing the quality of any proposed conversion.

   For example, the Cases to Count ratio is most likely stable and predictable. Conversion is easily accomplished using multiplication (or division, depending on which way you go in the standardization). Quality will be high.

   But the weight to count (individual or case count) conversion is going to be impacted by other factors. Most notably, do all widgets weigh the same? If not, what’s the acceptable tolerance? Quality will be adversely affected by tolerance.

 

   Dimensional analysis (the multiplication or division you do to convert known quantities) is also a question about measurement granularity. Trust me: You will want to store as fine a grain as possible.

Looking at the sample data, you will want to store WidgetCount somewhere. dbAA is already in this format. Yay. dbAB is easy enough: dbAB.dbo.Widgets.Cases * 144 gives you WidgetCount. Again, the math on widget Weight in the dbCC data becomes fuzzy. But our customer assures us each pound contains five widgets. Five widgets per pound will probably yield a high-quality conversion – much higher quality than, say, five hundred or five thousand widgets per pound would!
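Here is a sketch of the standardization queries implied above. The source tables come from the example; the 5-widgets-per-pound factor is the customer-supplied assumption, not a measured value:

-- dbAA is already at the WidgetCount grain
select ProductID, [Date], [Count] as WidgetCount
from dbAA.Product.Widgets

-- dbAB: cases to widgets (1 case = 1 gross = 144 widgets)
select ID, [Date], Cases * 144 as WidgetCount
from dbAB.dbo.Widgets

-- dbCC: pounds to widgets (customer-asserted 5 widgets per pound)
select ID, [Date], [Weight] * 5 as WidgetCount
from dbCC.dbo.Widgets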

 

   “Fuzziness” will impact the integrity of your data. A couple important measures of data warehouse integrity are data accuracy and signal to noise (usually defined by the percentage of “unknowns” in the data).

   When I have encountered this scenario in the field, I have informed the customer of the dangers and begged them to collect better metrics at the WidgetWeight station for company CC.

 

Enter Stage Left…

 

   There are other issues in these examples: date and ID standardization. Dates are fairly straightforward. The IDs can be a little tricky. To standardize the IDs in this example I may consider a LocationID and ProductDateID (ID) composite key on the first pass.

 

   I might create a couple tables in the data warehouse staging database that look like this:

 

Staging.Products.Widget

LocationID   ID   Date       Count
1            1    1/1/2007   10,380
1            2    1/2/2007   10,500
1            3    1/3/2007   9,885
2            1    1/1/2007   10,265
2            2    1/2/2007   13,009
2            3    1/3/2007   17,121
3            1    1/1/2007   12,096
3            2    1/2/2007   11,808
3            3    1/3/2007   14,256

Staging.Products.Location

LocationID   LocationDescription
1            dbCC
2            dbAA
3            dbAB

 

I’ve assumed (based on customer feedback) I get 5 widgets per pound from dbCC, and I know the math for the rest. Here I’ve prepared Location data for a star schema via a LocationID and Location table. But I haven’t touched dates and IDs; they will need some work before reaching the destination data warehouse.

 

Just the Fact, Ma’am

 

When I load the destination data warehouse, LocationID maps nicely to the Location dimension surrogate key, while the column now labeled ID can become a Date dimension surrogate key (removing the Date column from the fact table altogether, because it’s now redundant).

   This three-column table then makes a decent Fact:

 

DW.Product.FactWidget

LocationID   DateID   Count
1            1        10,380
1            2        10,500
1            3        9,885
2            1        10,265
2            2        13,009
2            3        17,121
3            1        12,096
3            2        11,808
3            3        14,256
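A minimal sketch of that load might look like the following; the DW.Common.DimDate table and its column names are hypothetical stand-ins for whatever Date dimension the warehouse actually uses:

-- load the fact at the standardized grain, swapping Date for its surrogate key
insert into DW.Product.FactWidget (LocationID, DateID, [Count])
select  w.LocationID
       ,d.DateID          -- hypothetical Date dimension surrogate key
       ,w.[Count]
from Staging.Products.Widget w
inner join DW.Common.DimDate d on d.[Date] = w.[Date]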

 

   There’s more to Master Data Management, but this is one type of business problem folks are trying to solve when they talk about MDM and how it ultimately relates to data warehouse quality and customer satisfaction.

 

:{> Andy

 
