ETL Instrumentation in 2016

A long time ago in a galaxy far, far away… I wrote about ETL Instrumentation.

Ok, it wasn’t that long ago, and it wasn’t far away. But things have changed some since I wrote that series, and I would like to spend some time updating the material. I’m typing this post at the end of 2015, so I’m using SQL Server 2016 CTP 3.2 running on Windows Server 2016 Technical Preview 4. The SSDT-BI IDE I’m using is the December preview running in Visual Studio 2015. I downloaded the AdventureWorks 2016 CTP 3 Sample database scripts from Codeplex. Finally (or first…), I build VMs for testing purposes and I believe you should too. You can spin up a VM in Azure pretty quickly these days. To do so, start at the Azure Portal. For local instances, I use Oracle VirtualBox to build VMs. You can also use VMWare or Hyper-V for local virtualization. Pick a hypervisor, local or in the cloud, and build a VM.

As in the original article, lets begin by creating a database to host our instrumented data:

use master
go

print ‘SSISRunTimeMetrics database’
if not exists(select name
              from master.sys.databases
              where name = ‘SSISRunTimeMetrics’)
begin
  print ‘ – Creating SSISRunTimeMetrics database’
  Create Database SSISRunTimeMetrics
  print ‘ – SSISRunTimeMetrics database created’
end
Else
print ‘ – SSISRunTimeMetrics database already exists.’
go

Next, let’s build a schema and a table to hold our metrics:

— create ssis schema…
print ”
print ‘ssis Schema’
if not exists(select name
              from sys.schemas
              where name = ‘ssis’)
begin
  print ‘ – Creating ssis Schema’
  set @sql = ‘Create Schema ssis’
  exec(@sql)
  print ‘ – ssis Schema created’
end
Else
  print ‘ – ssis Schema already exists.’

— create RunTimeMetrics table…
print ”
print ‘ssis.RunTimeMetrics Table’
if not exists(select s.name + ‘.’ + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = ‘RunTimeMetrics’
               and s.name = ‘ssis’)
begin
  print ‘ – Create ssis.RunTimeMetrics Table’
  Create Table ssis.RunTimeMetrics
  (id int identity(1,1)
  ,packageStartDateTime datetime null
  ,packageEndDateTime datetime null
  ,packageName varchar(255) null
  ,packageVersion varchar(25) null
  ,packageStatus varchar(25) null)
  print ‘ – ssis.RunTimeMetrics Table created’
end
Else
print ‘ – ssis.RunTimeMetrics Table already exists.’
go

We’ll depart from the original article here – but just a little – and create a stored procedure named usp_RecordManualPackageStart in the ssis schema:

use SSISRunTimeMetrics
go

print ”
print ‘ssis.usp_RecordManualPackageStart Stored Procedure’
if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordManualPackageStart’
           and s.name = ‘ssis’)
begin
  print ‘ – Dropping ssis.usp_RecordManualPackageStart Stored Procedure’
  Drop Procedure ssis.usp_RecordManualPackageStart
  print ‘ – ssis.usp_RecordManualPackageStart Stored Procedure dropped’
end
print ‘ – Creating ssis.usp_RecordManualPackageStart Stored Procedure’
go
 

Create Procedure ssis.usp_RecordManualPackageStart
  @packageStartDateTime datetime = null
,@packageName varchar(255) = null
,@packageVersion varchar(25) = null
As
begin
  — set @packageStartDateTime default…
  declare @Now datetime
  set @Now = GetDate()

  if (@packageStartDateTime Is Null)
   set @packageStartDateTime = @Now

  — insert the run time metrics data…
  insert into ssis.RunTimeMetrics
   (packageStartDateTime
   ,packageName
   ,packageVersion
   ,packageStatus)
  values
  (@packageStartDateTime
  ,@packageName
  ,@packageVersion
  ,’Started’)

end
go

print ‘ – ssis.usp_RecordManualPackageStart Stored Procedure created’
go

We make this change in order to provide support for SSIS packages executed from the file system or MSDB database which remain supported in SSIS 2016 (and for which I nod with approval to the Microsoft SSIS Developer Team. Options. Always give me options, please!).

Building the SSIS Test Project

Open SQL Server Data Tools – Business Intelligence (SSDT-BI) and create a new project. When the New Project dialog displays,select Business Intelligence Projects from the Project types list and Integration Services Project from the Templates. Name the project SSISRunTimeMetrics and click the OK button:

ETLInst2016_1_0

One Glitch…

When I click the OK button, I get an error that looks like this:

ETLInst2016_1_1

To work around this error (which I suspect is a beta issue), I click OK and then right-click the solution in Solution Explorer. I then hover over “A” and click “Existing Project…”:

ETLInst2016_1_2

I navigate to the solution folder and locate and select the project file:

ETLInst2016_1_3

When I click the Open button, the project appears as part of the solution:

ETLInst2016_1_4

Adding a new package is easy:

ETLInst2016_1_5

After it’s added, I rename the new package to SSISRunTimeMetrics.dtsx:

ETLInst2016_1_6

Annotation…

I still like to add annotations to my SSIS packages (although I would like to be able to change the color of the font…):

ETLInst2016_1_7

I can’t say enough about good documentation. You never know when someone else will have to open your source code to troubleshoot.

PackageVersion: One Way

Right-click in the white-space of the Control Flow canvas and click Variables. Add a variable named PackageVersion. Make sure the Scope is the SSISRunTimeMetrics package and make the Data Type String:

ETLInst2016_1_8

Click the ellipsis for the Expression and add the following SSIS Expression Language to automate the variable value:

(DT_WSTR,4) @[System::VersionMajor] + "." +
(DT_WSTR,4) @[System::VersionMinor] + "." +
(DT_WSTR,4) @[System::VersionBuild] + "." +
(DT_WSTR,4) Year(@[System::CreationDate]) +
((Month(@[System::CreationDate]) < 10) ? "0" : "") +
(DT_WSTR,2) Month(@[System::CreationDate]) +
((Day(@[System::CreationDate]) < 10) ? "0" : "") +
(DT_WSTR,2) Day(@[System::CreationDate])

ETLInst2016_1_9

Once entered, the variable window will display as shown here:

ETLInst2016_1_10

I manually update the System::VersionMajor and System::VersionMinor variable values. System::VersionBuild is automatically incremented each time you save the package. I rarely parse the System::CreationDate as shown in this expression. I wish SSIS packages had a System::ModifiedDate property. If they did, I would parse that date as shown here. But they don’t, so I usually store the modified date in the System::VersionComments property manually, which alters my expression thus:

(DT_WSTR,4) @[System::VersionMajor] + "." +
(DT_WSTR,4) @[System::VersionMinor] + "." +
(DT_WSTR,4) @[System::VersionBuild] + "." +
@[System::VersionComments]

I just have to remember to manually update the value of System::VersionComments in the package properties:

ETLInst2016_1_11

I labeled this section “PackageVersion: One Way” because there are other ways to track the version of an SSIS package in SSIS 2016. The VersionGUID package property shown above is another way, and this property is read-only from the SSDT-BI IDE (although editable from the dtsx file). How you manage SSIS package versions is not nearly as important as tracking them somehow.

Add a Container

Add a Sequence Container and change the name to "SEQ Step 1 – Do Some Stuff":

ETLInst2016_1_12

Sequence Containers are nice for several reasons:

  • They’re aesthetic. You can break up chucks of functionality logically… how cool!
  • When troubleshooting, you can execute the contents of a container from the right-click context menu.
  • You can isolate chunks of functionality using transactions by container.

Add an Execute SQL Task

Next, add an Execute SQL Task and name it "SQL Log Manual Start of Package Execution":

ETLInst2016_1_13

Double-click the Execute SQL Task to edit it. Change the ConnectionType property to ADO.Net. Click Connection and select "<New connection…>":

ETLInst2016_1_14

When the Configure OLE DB Connection Manager dialog displays, click the "New…" button:

ETLInst2016_1_15

I really like the “New Connection” functionality in SSIS. It picks the correct Provider for the option selected in the Execute SQL Task. Configure the connection to match where you built the SSISRunTimeMetrics database and then click the OK button:

ETLInst2016_1_16

When you return to the Configure ADO.NET Connection Manager window, it should now display your connection in the Data Connections lists:

ETLInst2016_1_17

Click OK to return to the Execute SQL Task Editor. Click the ellipsis in the SQLStatement property to open the Enter SQL Query dialog. Enter the name of the ssis.usp_RecordManualPackageStart stored procedure:

ETLInst2016_1_18

Click the OK button to return to the Execute SQL Task Editor. Set the IsQueryStoredProcedure property (available for ADO.Net connection types) from False to True:

ETLInst2016_1_19

What’s nice about using ADO.Net and the IsQueryStoredProcedure property is parameter configuration, which comes next. Click on the Parameter Mapping page:

ETLInst2016_1_20

Click the Add button and add the following three parameter mappings:

Variable Name: System::StartTime
Direction: Input
Data Type: DateTime
Parameter Name: packageStartDateTime
Parameter Size: –1

Variable Name: System::PackageName
Direction: Input
Data Type: String
Parameter Name: packageName
Parameter Size: –1

Variable Name: User::PackageVersion
Direction: Input
Data Type: String
Parameter Name: packageVersion
Parameter Size: –1

ETLInst2016_1_21

Click the OK button to proceed.

Add a Script Task

Drag a Script Task into the sequence container and rename it “SCR Output Package Version”:

ETLInst2016_1_22

Open the Script Task Editor and click the ellipsis in the ReadOnlyVariables property. Add the User::PackageVersion variable:

ETLInst2016_1_23

Click the Edit Script button to open the VSTA Editor, and enter the following code in the public void Main() method:

public void Main()
        {
            string packageVersion = Dts.Variables["User::PackageVersion"].Value.ToString();
            string msg = "Version: " + packageVersion;
            bool fireAgain = true;

            Dts.Events.FireInformation(1001, "SSISRunTimeMetrics", msg, "", 0, ref fireAgain);

            Dts.TaskResult = (int)ScriptResults.Success;
        }

Close the VSTA Editor and click the OK button to close the Script Task Editor. Your SSIS package should appear as shown here:

ETLInst2016_1_24

Let’s Test It!

Execute the package in the SSDT-BI debugger. Your results should match those shown here (Success!):

ETLInst2016_1_25

Click on the Progress tab to locate the Information message we generated in the Script Task:

ETLInst2016_1_26

Viola! It works.

Checking for Data

Now let’s check the database to see if our message was logged. Open SSMS and connect to your SSISRunTimeMetrics database. Execute a query to check the contents of the SSISRunTimeMetrics table:

select * from ssis.RunTimeMetrics

ETLInst2016_1_27

This is a good sign. Our data is being logged and shows the package started execution.

Logging Execution Completion

It’s time to add a procedure to log the end of package execution, but we have a problem: How do we identify the row we just added to the ssis.RunTimeMetrics table? Think about it. We just inserted a row, and since that’s the only row in the table it’s pretty simple to identify. Later, we will have lots of rows in the table in varying states. We need a method to positively identify the row we’re working with.

We could use the last row inserted – apply Max(id) criteria – but what happens as our metrics grows to include multiple packages? One package could start while another was executing and we’d update the wrong row.

We require a means to positively identify the row when added to the table. Such a means exists. If we modify the ssis.usp_RecordManualPackageStart stored procedure we can have it return the value of the inserted ID. The OUTPUT clause gives us access to the inserted and deleted virtual tables. Modify the stored procedure by adding the following OUTPUT clause as shown:

output inserted.id as ‘Inserted_ID’

Execute the Transact-SQL to drop and recreate the ssis.usp_RecordManualPackageStart stored procedure:

print ‘ssis.usp_RecordManualPackageStart Stored Procedure’
if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordManualPackageStart’
           and s.name = ‘ssis’)
begin
  print ‘ – Dropping ssis.usp_RecordManualPackageStart Stored Procedure’
  Drop Procedure ssis.usp_RecordManualPackageStart
  print ‘ – ssis.usp_RecordManualPackageStart Stored Procedure dropped’
end
print ‘ – Creating ssis.usp_RecordManualPackageStart Stored Procedure’
go 

Create Procedure ssis.usp_RecordManualPackageEnd
  @id int
,@packageStatus varchar(25) = ‘Succeeded’
As
begin

  — set @packageEndDateTime default…
  declare @packageEndDateTime datetime = GetDate()

  — update the run time metrics data…
  update ssis.RunTimeMetrics
   Set packageEndDateTime = @packageEndDateTime
      ,packageStatus = @packageStatus
  where id = @id

end

go

print ‘ – ssis.usp_RecordManualPackageStart Stored Procedure created’
go

Add another Int32 Data Type variable named PackageInstanceId:

ETLInst2016_1_28

Open the editor for the “SQL Log Manual Start of Package Execution” Execute SQL Task. Change the ResultSet property to “Single row”:

ETLInst2016_1_29

Click the Result Set page and add a new resultset. Name it “0” (ordinals work well) and assign the results of this query to the PackageInstanceId variable you just created. This will push the results of the OUTPUT clause in the stored procedure (which will include the id of the newly inserted row) into the PackageInstanceId variable. You can then access it later in the package to update the precise row:

ETLInst2016_1_30

Next, edit the Script Task by adding PackageInstanceId to the list of ReadOnlyVariables:

ETLInst2016_1_31

Then edit the script in public void Main() to read:

public void Main()
        {
            string packageVersion = Dts.Variables["User::PackageVersion"].Value.ToString();
            int packageInstanceId = Convert.ToInt32(Dts.Variables["User::PackageInstanceId"].Value);
            string msg = "Version: " + packageVersion + " PackageInstanceId: " + packageInstanceId.ToString() ;
            bool fireAgain = true;

            Dts.Events.FireInformation(1001, "SSISRunTimeMetrics", msg, "", 0, ref fireAgain);

            Dts.TaskResult = (int)ScriptResults.Success;
        }

Next, add a new stored procedure named ssis.usp_RecordManualPackageEnd to the SSISRunTimeMetrics database using the following Transact-SQL script:
use SSISRunTimeMetrics
go

print ‘ – ssis.usp_RecordManualPackageStart Stored Procedure created’
go

print ”
print ‘ssis.usp_RecordManualPackageEnd Stored Procedure’
if exists(select s.name + ‘.’ + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = ‘usp_RecordManualPackageEnd’
          and s.name = ‘ssis’)
begin
  print ‘ – Dropping ssis.usp_RecordManualPackageEnd Stored Procedure’
  Drop Procedure ssis.usp_RecordManualPackageEnd
  print ‘ – ssis.usp_RecordManualPackageEnd Stored Procedure dropped’
end
print ‘ – ssis.usp_RecordManualPackageEnd Stored Procedure’
go

Create Procedure ssis.usp_RecordManualPackageEnd
  @id int
,@packageStatus varchar(25) = ‘Succeeded’
As
begin

  — set @packageEndDateTime default…
  declare @packageEndDateTime datetime = GetDate()

  — update the run time metrics data…
  update ssis.RunTimeMetrics
   Set packageEndDateTime = @packageEndDateTime
      ,packageStatus = @packageStatus
  where id = @id

end

go

print ‘ – ssis.usp_RecordManualPackageEnd Stored Procedure created’
go

Now add a new Execute SQL Task named "SQL Log Manual End of Package Execution":

ETLInst2016_1_32

Open the Execute SQL Task Editor and set the ConnectionType to ADO.Net, the Connection to your SSISRunTimeMetrics connection manager. Edit the SQLStatement property, changing it to ssis.usp_RecordManualPackageEnd. Set the IsQueryStoredProcedure property to True. Navigate to the Parameter Mapping page and add a mapping for PackageInstanceId as shown here:

ETLInst2016_1_33

Testing!

That’s all we need to conduct our next test, so let’s roll (as my friend Chris Yates [blog] says)!

ETLInst2016_1_34

The package execution succeeds – that’s a good start. Let’s look at the Progress tab:

ETLInst2016_1_35

Our information message tells us we created PackageInstanceId 2 – another good sign. Next, let’s look at the metadata in our database:

ETLInst2016_1_36

w00t! Very cool, our metadata shows us when the package started and ended!

Conclusion

“In an age of the SSIS Catalog, why would one ever employ this kind of metadata collection, Andy?” That’s a fair question. The SSIS Catalog is an awesome data integration execution, logging, and externalization engine. There are a handful of use cases, though, where enterprises may opt to continue to execute SSIS packages from the file system or the MSDB database. Perhaps the biggest reason to do so is that’s the way the enterprise is currently executing SSIS. When SSDT-BI converts pre-Catalog-era (2005, 2008, 2008 R2) SSIS packages to current, it imports these packages in a “Package Deployment Model” SSIS Project. This allows developers to upgrade the version of their SSIS project to SSIS 2016 (and enjoy many benefits for so doing) while continuing to execute SSIS packages in the file system. Kudos to the Microsoft SSIS Development Team for this backwards compatibility!

If you do not have an SSIS Catalog collecting this kind of metadata for you, creating a database similar to SSISRunTimeMetrics is a valid solution. Yes, you have to manually add ETL instrumentation to your SSIS packages, but I believe the cost and overhead outweighs not having a solution for collecting this data integration execution metadata.

As always, I welcome your thoughts on the matter!

:{>

Andy Leonard

andyleonard.blog

Christian, husband, dad, grandpa, Data Philosopher, Data Engineer, Azure Data Factory, SSIS guy, and farmer. I was cloud before cloud was cool. :{>

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.