SSIS Design Pattern – ETL Instrumentation, Part 1

Introduction
 
SSIS is a fantastic ETL engine. When I build an ETL solution in SSIS, I like to collect runtime metrics. Why? I use the data initially to determine a baseline for performance and to determine, in some cases, if I’m loading within a defined window.
 
I refer to this process of collecting performance and runtime metrics as ETL Instrumentation and I collect this data for every enterprise ETL solution I design.
 
The types of things I want to know are:
  • Did the process start when I wanted it to start?
  • Did it succeed?
    • If not, is it still running?
    • Did it fail?
  • How long did it execute?
  • How does recent performance compare to past performance?

In this mini-series entitled SSIS Design Pattern – ETL Instrumentation, I will demonstrate how I capture, report, and measure these data using SSIS, SSRS, SQL Agent Jobs, and Transact-SQL.

In Part 1, we build an elementary system. In subsequent parts, we will improve upon the concepts demonstrated here.

To Instrument Or Not To Instrument; That Is The Question
 
My thoughts on this are best summed up thus:
 
There are two types of enterprise data warehouse developers:
1) Those who collect ETL instrumentation data; and
2) Those who will.
 
Why do I make this statement? When you build ETL to load your enterprise data warehouse, you are building the source for the reporting system. Should anything unfortunate happen to the data between the time you read it from the System Of Record and the time it lands in your Reporting Server Database, you are basically guilty until proven innocent. ETL Instrumentation will provide evidence for (and sometimes against) you as you defend your ETL design and implementation.
 
Building The Database
 
Before we instrument an ETL process, we need a database solution for the data. We’ll start by building the database itself, then add objects.
 
Let’s build the database. We’ll call it SSISRunTimeMetrics. Open SQL Server Management Studio (SSMS) and execute the following Transact-SQL:
 
 
use master
go

if not exists(select name
              from master.sys.databases
              where name = ‘SSISRunTimeMetrics’)
 Create Database SSISRunTimeMetrics
go

 
   This script creates the SSISRunTimeMetrics database. Next, let’s create a schema and table to hold our metrics:
 
 
use SSISRunTimeMetrics
go

— vars…
declare @sql varchar(255)

— create ssis schema…
if not exists(select name
              from sys.schemas
              where name = ‘ssis’)
 begin
  set @sql = ‘Create Schema ssis’
  exec(@sql)
 end

— create RunTimeMetrics table…
if not exists(select s.name + ‘.’ + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = ‘RunTimeMetrics’
               and s.name = ‘ssis’)
 Create Table ssis.RunTimeMetrics
  (id int identity(1,1)
  ,packageStartDateTime datetime null
  ,packageEndDateTime datetime null
  ,packageName varchar(255) null
  ,packageVersion varchar(25) null
  ,packageStatus varchar(25) null)

Next, let’s add a stored procedure to record a package start event:
 

use SSISRunTimeMetrics
go
 
if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordPackageStart’
           and s.name = ‘ssis’)
 begin
  Drop Procedure ssis.usp_RecordPackageStart
 end
go
 
Create Procedure ssis.usp_RecordPackageStart
  @packageStartDateTime datetime = null
 ,@packageName varchar(255) = null
 ,@packageVersion varchar(25) = null
 ,@packageStatus varchar(25) = null
As
 begin
  — set @packageStartDateTime default…
  declare @Now datetime
  set @Now = GetDate()

  if (@packageStartDateTime Is Null)
   set @packageStartDateTime = @Now

  — insert the run time metrics data…
  insert into ssis.RunTimeMetrics
   (packageStartDateTime
   ,packageName
   ,packageVersion
   ,packageStatus)
  values
  (@packageStartDateTime
  ,@packageName
  ,@packageVersion
  ,@packageStatus)

 end

go

 
That’s all for now. We’ll add more stuff later.
 
Building The SSIS Project
 
Open SQL Server Business Intelligence Development Studio (BIDS) and create a new project. When the New Project dialog displays, select Business Intelligence Projects from the Project types list and Integration Services Project from the Templates. Name the project SSISRunTimeMetrics and click the OK button: 
 
 
Rename the Package SSISRunTimeMetrics.dtsx:
 
 
Always click the Yes button when you see this dialog (trust me):
 
 
Right-click in the white-space on the Control Flow canvas and click Add Annotation to document your code:
 
 
I can’t say enough about good documentation. You never know when someone else will have to open your source code to troubleshoot.
 
Right-click in the white-space of the Control Flow canvas and click Variables. Add a variable named sVersion. Make sure the Scope is the SSISRunTimeMetrics package and make the Data Type String:
 

 

Add a Sequence Container and change the name to “Step 1 – Do some stuff”:
 
 
Sequence Containers are nice for several reasons:
  • They’re aesthetic. You can break up chucks of functionality logically… how cool!
  • When troubleshooting, you can execute the contents of a container from the right-click context menu.
  • You can isolate chunks of functionality using transactions by container.
Next, add an Execute SQL Task and name it “Log Start of Package Execution”:
 
 
Double-click the Execute SQL Task to edit it. Click Connection and select “<New connection…>”:
 
 
When the Configure OLE DB Connection Manager dialog displays, click the “New…” button:
 
 
Enter or select your server in the Server Name dropdown. In the “Connect to a database” panel, select the “Select or enter a database name” option and enter or select the SSISRunTimeMetrics database you created earlier. Click the OK button to proceed:
 
 
Then click the OK button again to set the Connection property of the Execute SQL Task:
 
 
Select the SQL Statement property and click the ellipsis. When the “Enter SQL Query” dialog displays, enter the following parameterized Transact-SQL statement:
 
Exec ssis.usp_RecordPackageStart ?,?,?,’Started’
 
Click the OK button to proceed:
 
 
Click on the Parameter Mapping page and configure parameters as shown:
 
 
Click the OK button to proceed.
 
Next, let’s do some work on the sVersion variable. Click on the sVersion variable name and press the F4 key to display variable properties. Change the EvaluateAsExpression property to True, then click on the Expression property’s ellipsis:
 
 
When the Expression Builder displays, enter the following SSIS Expression Language text to build a version string in the format Major.Minor.Build:
 
(DT_WSTR, 3) @VersionMajor + “.” +  (DT_WSTR, 3)@VersionMinor + “.” + (DT_WSTR, 3)@VersionBuild
 
 
Click the “Evaluate Expression” button to check the expression. Then click the OK button to close the Expreesion Builder:
 
 
Add a Script Task and configure it to popup a message box displaying the value of sVersion:
 
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
 
Public Class ScriptMain
  Public Sub Main()
 
    Dim sVersion As String = Dts.Variables(“sVersion”).Value.ToString
    Dim sMsg As String
 
    sMsg = “Version: ” & sVersion
    MsgBox(sMsg)
    Dts.TaskResult = Dts.Results.Success
 
  End Sub
End Class
 
 
Execute the package. Then execute the following Transact-SQL query to check the results:

select * from ssis.RunTimeMetrics

 
It’s time to add a procedure to log the end of package execution, but we have a problem: How do we identify the row we just added to the ssis.RunTimeMetrics table? Think about it. We just inserted a row, and since that’s the only row in the table it’s pretty simple to identify. Later, we will have lots of rows in the table in varying states. We need a method to positively identify the row we’re working with.
 
We could use the last row inserted – apply Max(id) criteria – but what happens as our metrics grows to include multiple packages? One package could start while another was executing and we’d update the wrong row.
 
We require a means to positively identify the row when added to the table. Such a means exists. If we modify the ssis.usp_RecordPackageStart stored procedure we can have it return the value of the inserted ID. The OUTPUT clause gives us access to the inserted and deleted virtual tables. Modify the stored procedure by adding the following OUTPUT clause as shown:
 
output inserted.id as ‘Inserted_ID’
 
Execute the Transact-SQL to drop and recreate the ssis.usp_RecordPackageStart stored procedure:
 
 
Add another Int32 Type variable named iPackageLoadID:
 
 
Back in Execute SQL Task, change the ResultSet property to “Single row”:
 
 
Click the Result Set page and add a new resultset. Name it 0 (ordinals work well) and assign the results of this query to the iPackageLoadID variable you just created. This will push the results of the OUTPUT clause in the stored procedure (which will include the id of the newly inserted row) into the iPackageLoadID variable. You can then access it later in the package to update the precise row:
 
 
Alter the Script Task by adding iPackageLoadID to the list of ReadOnlyVariables:
 
 
Alter the VB.Net script to read:
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
 
Public Class ScriptMain
  Public Sub Main()
 
    Dim sVersion As String = Dts.Variables(“sVersion”).Value.ToString
    Dim sPackageLoadID As String = Dts.Variables(“iPackageLoadID”).Value.ToString
    Dim sMsg As String
 
    sMsg = “Version: ” & sversion & vbCrLf
    sMsg = sMsg & “PackageLoadID: ” & sPackageLoadID
    MsgBox(sMsg)
    Dts.TaskResult = Dts.Results.Success
 
  End Sub
End Class
 
 
Execute the package to test the new script:
 
 
Next, add a new stored procedure named ssis.usp_RecordPackageEnd to the SSISRunTimeMetrics database using the following Transact-SQL script:

use SSISRunTimeMetrics
go
 

if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordPackageEnd’
          and s.name = ‘ssis’)
 begin
  Drop Procedure ssis.usp_RecordPackageEnd
 end
go
Create Procedure ssis.usp_RecordPackageEnd
  @id int
 ,@packageEndDateTime datetime = null
 ,@packageStatus varchar(25) = null
As
 begin
  — set @packageEndDateTime default…
  declare @Now datetime
  set @Now = GetDate()
  if (@packageEndDateTime Is Null)
   set @packageEndDateTime = @Now
 
  — update the run time metrics data…
  update ssis.RunTimeMetrics
   Set packageEndDateTime = @packageEndDateTime
      ,packageStatus = @packageStatus
  where id = @id
 end
go
 
Now add a new Execute SQL Task named “Log End of Package Execution”:
 
 
Set the Connection property to your SSISRunTimeMetrics database and the SQLStatement property to the following Transact-SQL statement:
declare @Now datetime
set @Now = GetDate()
 
Exec ssis.usp_RecordPackageEnd ?,@Now,’Succeeded’
 
Click the Parameter Mapping page and configure the iPackageLoadID variable as an input variable of the Long data type, assigned to ordinal 0:
 
 
Click the OK button: 
 
 
Execute the package. Note that each time a change is saved, the VersionBuild property auto-increments, thus updating your sVersion variable value:
 
 
The package succeeds:
 
 
Examine the data stored in the ssis.RunTimeMetrics table. Note the latest row now contains a Start and End DateTime, and the packageStatus field now records “Succeeded”:
 
 
We will work with these concepts more – fleshing them out by adding more robust code and additional functionality. For now, this is a good start. Congratulations!
 
Get the code!
 
:{> Andy

PS โ€“ The PASS Summit 2015 Call for Speakers closes at 9:00 PM PDT Sunday, 15 Mar 2015 (04:00 GMT 16 Mar 2015). Thereโ€™s still time, but hurry!

Learn more: 
Linchpin People Blog: SSIS 
Stairway to Biml 
Stairway to Integration Services 

SSIS2014DesignPatterns200 

34 Replies to “SSIS Design Pattern – ETL Instrumentation, Part 1”

  1. Hi Andy
    you did a great work, but I have to say that for me instrumentation should me something "above" packages and not IN the package. In this way you can turn instrumentation on or off whenever you need.
    I’ve developed a tool that allows you to do this:
    http://www.codeplex.com/DTLoggedExec
    you just need to create a plugin that write to database instead writing in a log file as it actually do.
    Otherwise you can just wait a couple of days so that I can release the new versions that uses Log4Net, so that you can just configure it to write on SQL Server. ๐Ÿ™‚

  2. Hi Manowar,
      Thanks!
      I clicked the link and read through your page on DTLoggedExec. That looks really cool. Very impressive work!
      There are several reasons I like to keep the instrumentation inside the SSIS package process. My reasons will become more clear (I hope) as the series progresses. One reason is I like to use out-of-the-box transformations. This is clearly a preference on my part, but it has implications for application management lifecycle.
      SSIS is so flexible that good people can disagree on which is better; something in-line or something external. Personally, I believe everyone should use the approach with which they’re most comfortable.
    :{> Andy

  3. Sorry to jump is so late.
    I could not resist posting since I know both Andy and Davide (aka Manowar).
    Would you also be able to intercept failures/errors, update the packageEndDateTime while also enter a value like "Failed" for the packageStatus?
    What are some good ways of accomplishing this? ExecuteSQL in the OnError Event Handler perhaps?

  4. sVersion = Dts.Variables(sVersion).Value.ToString
      at Microsoft.SqlServer.Dts.Runtime.Variables.get_Item(Object index)
      at ScriptTask_1e047fb858904a059a7fbeedd6f98d42.ScriptMain.Main()
    Andy  I am getting these errors. I think I tracked it down to this line. What am I doing wrong?
    Mark

  5. Hi Mark,
     In the sVersion assignment, the line should read:
    sVersion = Dts.Variables("sVersion").Value.ToString.
      Your line is missing the double-quotes surrounding the Variable name.
      Does that fix it?
    Hope this helps,
    Andy

  6. Hi Andy, great article, however, when I run your example I get the following error:
    [Execute SQL Task] Error: Executing the query "Exec ssis.usp_RecordPackageStart ?,?,?,’Started’" failed with the following error: "No value given for one or more required parameters.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.
    Any ideas? Thanks
    Mike

  7. Hi Mike,
      It sounds like either you’re missing one of the parameters in the "Record Start of Package Execution" Execute SQL Task. Check that page in the editor and make sure you have three parameters listed (System::StartTime, System::PackageName, and User::sVersion) and that they’re all configured as INPUT parameters.
    Hope this helps,
    Andy

  8. Hi Andy,
    Good Post….
    I tried the whole thing but i am getting error message in ‘Log End of Package Execution’. Here is the message text
    [Execute SQL Task] Error: Executing the query "declare @Now datetime set @Now = GetDate()   Exec ssis.usp_RecordPackageEnd ?,@Now,’Succeeded’  " failed with the following error: "Syntax error, permission violation, or other nonspecific error". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.
    Please respond

  9. Hi Rhea,
      That’s a tough call without being able to see how you configured the Execute SQL Task. When I get that error it’s usually something I did with the Parameter configuration. I would double-check to make sure you named the Parameter "0", set the direction to "Input", set the data type to "Long", and the variable to "User::iPackageLoadID".
    Hope this helps,
    Andy

  10. Andy,
    I am having the same issue as Mike B with the "Record Start of Package Execution" Execute SQL Task.
    I tried everything posted in these comments, to no avail…What did Mike B do to solve the issue ?
    thanks in advance, and thanks for the column….
    Frank

  11. Hi Frank,
      I am not sure what Mike did to resolve the issue he encountered. If you’re stumped and can hold out until I get some time to look at it, zip up the package and email it to me at andy dot leonard at gmail dot com.
    :{> Andy

  12. Hi Andy,
    I’m working through your great article on SSIS Design Pattern – ETL Instrumentation, Part 1.
    I built the package etc as i went along, all worked great until i added the last execute sql task to log the end of package execution. I’m getting the same error message as Rhea who posted on Jan 24th. The error is:
    [Execute SQL Task] Error: Executing the query "declare @Now datetime set @Now = GetDate()   Exec ssis.usp_RecordPackageEnd ?,@Now,’Succeeded’  " failed with the following error: "Syntax error, permission violation, or other nonspecific error". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.
    Before you ask i read your reply, i’ve checked the parameter properties etc..
    I then downloaded your package from the site, i keep getting the same error message ๐Ÿ™
    I think it’s this bit that’s causing the problem:
    declare @Now datetime
    set @Now = GetDate()
    I copied the usp_RecordPackageEnd stored proc and took out the date stuff, so that the exec looked like this:
    Exec ssis.usp_RecordPackageEnd_Test ?,’Succeeded’
    This worked! However the original stored proc with date in it doesn’t, is it to do with the oledb data provider not liking @Now?
    I’d be very grateful for your thoughts.
    Many thanks,
    Andy R
    (York, England)

  13. Hi Andrew,
    Make sure you have the SP2 installed, there may be a chance that it may not work if you dont have SP2.
    Hope this helps…

  14. Why not use the logging feature of SSIS? I’m trying to learn more about the concepts of the instrumentation.
    What about be the difference with creating the metrics table etc….. ?
    thanks,
    Jonahtan

  15. Hi Jonathan,
      You can use SSIS’s built-in logging to do this.
      One reason I use custom logging: I like to own the schema of the log. When some future version of SQL Server is released I would rather avoid surprises. There’s a chance the log table schema may change. That’s fine but I don’t want that change interfering with the SSIS Metrics reporting structure I’ve built.
    :{> Andy

  16. I have a similar problem as andrewyorkbr had.
    Do you know what causes the different behaviour:
    On my development machine (and many others) the
      declare @Now datetime
      set @Now = GetDate()
    works, but on my productive one (which happens to be Windows 2000 Server) it does not, yet any SELECT-Statement with getdate() works. Just when I have some "declare @…" in it, it crashes with the same error "Syntax error, permission violation, or other nonspecific error".
    Do you have any idea?
    Thank you
    Martin

  17. Hi Yasmina,
      Goodness, I don’t know.
      I opened Access 2007 just now (for the first time, I think) and it appears to use VBA. I suppose you could shell an application from inside VBA and call DTExec.exe.
    Hope this helps,
    Andy

  18. Was the issue reported by Rhea fixed. I also face the same issue. The problem is it works sometimes and it fails sometimes. When it fails it always gives error:
    Exec ssis.usp_RecordPackageEnd ?,@Now,’Succeeded’  " failed with the following error: "Syntax error, permission violation, or other nonspecific error". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.

  19. Great work Andy! This works very well with SSIS 2008 as well. Thanks a lot. I reckon this as a very good introductory article for those who are stepping into this domain.
    Dinesh

  20. Hmm can’t get the script to functioning, where I’m supposed to add the script. I tried to add it as a new script and ereased the old script and switched it with this. As you can tell I’m not a wizard in c or vb. So, please if you could take the time to explain that piece would be very much appreciated.
    Thanks

  21. Wow! Great article and Thank you very much.
    When I added the scrip task inside the sequence container and configured it, it is showing me a blue line at Dts.Results.Success  running the package the script task failed.
    Dim sVersion As String = Dts.Variables("sVersion").Value.ToString
       Dim sMsg As String
       sMsg = "Version: " & sVersion
       MsgBox(sMsg)
       Dts.TaskResult = Dts.Results.Success
    Any help pls
    Thanks

  22. Just wanted to say Thanks a bunch Andy. Although I was looking for something entirely different, reading Part 1 of your series resolved a vexing problem I’ve been fighting for days! Keep up the excellent work!

  23. Reply to KINGDOM SAID’s question:
    Depneding on which version of VS you are using, the script task will not work .If you are using MS VB 2008 Script Language, your script within the script task editor should look something like this
    Dim sVersion As String = Dts.Variables("sVersion").Value.ToString
           Dim sMsg As String
           sMsg = "Version: " & sVersion
           MsgBox(sMsg)
           Dts.TaskResult = ScriptResults.Success

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.