A long time ago in a galaxy far, far away… I wrote about ETL Instrumentation.
Ok, it wasn’t that long ago, and it wasn’t far away. But things have changed some since I wrote that series, and I would like to spend some time updating the material. I’m typing this post at the end of 2015, so I’m using SQL Server 2016 CTP 3.2 running on Windows Server 2016 Technical Preview 4. The SSDT-BI IDE I’m using is the December preview running in Visual Studio 2015. I downloaded the AdventureWorks 2016 CTP 3 Sample database scripts from Codeplex. Finally (or first…), I build VMs for testing purposes and I believe you should too. You can spin up a VM in Azure pretty quickly these days. To do so, start at the Azure Portal. For local instances, I use Oracle VirtualBox to build VMs. You can also use VMWare or Hyper-V for local virtualization. Pick a hypervisor, local or in the cloud, and build a VM.
As in the original article, lets begin by creating a database to host our instrumented data:
use master
go
print ‘SSISRunTimeMetrics database’
if not exists(select name
from master.sys.databases
where name = ‘SSISRunTimeMetrics’)
begin
print ‘ – Creating SSISRunTimeMetrics database’
Create Database SSISRunTimeMetrics
print ‘ – SSISRunTimeMetrics database created’
end
Else
print ‘ – SSISRunTimeMetrics database already exists.’
go
Next, let’s build a schema and a table to hold our metrics:
— create ssis schema…
print ”
print ‘ssis Schema’
if not exists(select name
from sys.schemas
where name = ‘ssis’)
begin
print ‘ – Creating ssis Schema’
set @sql = ‘Create Schema ssis’
exec(@sql)
print ‘ – ssis Schema created’
end
Else
print ‘ – ssis Schema already exists.’
— create RunTimeMetrics table…
print ”
print ‘ssis.RunTimeMetrics Table’
if not exists(select s.name + ‘.’ + t.name
from sys.tables t
inner join sys.schemas s on s.schema_id = t.schema_id
where t.name = ‘RunTimeMetrics’
and s.name = ‘ssis’)
begin
print ‘ – Create ssis.RunTimeMetrics Table’
Create Table ssis.RunTimeMetrics
(id int identity(1,1)
,packageStartDateTime datetime null
,packageEndDateTime datetime null
,packageName varchar(255) null
,packageVersion varchar(25) null
,packageStatus varchar(25) null)
print ‘ – ssis.RunTimeMetrics Table created’
end
Else
print ‘ – ssis.RunTimeMetrics Table already exists.’
go
We’ll depart from the original article here – but just a little – and create a stored procedure named usp_RecordManualPackageStart in the ssis schema:
use SSISRunTimeMetrics
go
print ”
print ‘ssis.usp_RecordManualPackageStart Stored Procedure’
if exists(select s.name + ‘.’ + p.name
from sys.procedures p
inner join sys.schemas s on s.schema_id = p.schema_id
where p.name = ‘usp_RecordManualPackageStart’
and s.name = ‘ssis’)
begin
print ‘ – Dropping ssis.usp_RecordManualPackageStart Stored Procedure’
Drop Procedure ssis.usp_RecordManualPackageStart
print ‘ – ssis.usp_RecordManualPackageStart Stored Procedure dropped’
end
print ‘ – Creating ssis.usp_RecordManualPackageStart Stored Procedure’
go
Create Procedure ssis.usp_RecordManualPackageStart
@packageStartDateTime datetime = null
,@packageName varchar(255) = null
,@packageVersion varchar(25) = null
As
begin
— set @packageStartDateTime default…
declare @Now datetime
set @Now = GetDate()
if (@packageStartDateTime Is Null)
set @packageStartDateTime = @Now
— insert the run time metrics data…
insert into ssis.RunTimeMetrics
(packageStartDateTime
,packageName
,packageVersion
,packageStatus)
values
(@packageStartDateTime
,@packageName
,@packageVersion
,’Started’)
end
go
print ‘ – ssis.usp_RecordManualPackageStart Stored Procedure created’
go
We make this change in order to provide support for SSIS packages executed from the file system or MSDB database which remain supported in SSIS 2016 (and for which I nod with approval to the Microsoft SSIS Developer Team. Options. Always give me options, please!).
Building the SSIS Test Project
Open SQL Server Data Tools – Business Intelligence (SSDT-BI) and create a new project. When the New Project dialog displays,select Business Intelligence Projects from the Project types list and Integration Services Project from the Templates. Name the project SSISRunTimeMetrics and click the OK button:
One Glitch…
When I click the OK button, I get an error that looks like this:
To work around this error (which I suspect is a beta issue), I click OK and then right-click the solution in Solution Explorer. I then hover over “A” and click “Existing Project…”:
I navigate to the solution folder and locate and select the project file:
When I click the Open button, the project appears as part of the solution:
Adding a new package is easy:
After it’s added, I rename the new package to SSISRunTimeMetrics.dtsx:
Annotation…
I still like to add annotations to my SSIS packages (although I would like to be able to change the color of the font…):
I can’t say enough about good documentation. You never know when someone else will have to open your source code to troubleshoot.
PackageVersion: One Way
Right-click in the white-space of the Control Flow canvas and click Variables. Add a variable named PackageVersion. Make sure the Scope is the SSISRunTimeMetrics package and make the Data Type String:
Click the ellipsis for the Expression and add the following SSIS Expression Language to automate the variable value:
(DT_WSTR,4) @[System::VersionMajor] + "." +
(DT_WSTR,4) @[System::VersionMinor] + "." +
(DT_WSTR,4) @[System::VersionBuild] + "." +
(DT_WSTR,4) Year(@[System::CreationDate]) +
((Month(@[System::CreationDate]) < 10) ? "0" : "") +
(DT_WSTR,2) Month(@[System::CreationDate]) +
((Day(@[System::CreationDate]) < 10) ? "0" : "") +
(DT_WSTR,2) Day(@[System::CreationDate])
Once entered, the variable window will display as shown here:
I manually update the System::VersionMajor and System::VersionMinor variable values. System::VersionBuild is automatically incremented each time you save the package. I rarely parse the System::CreationDate as shown in this expression. I wish SSIS packages had a System::ModifiedDate property. If they did, I would parse that date as shown here. But they don’t, so I usually store the modified date in the System::VersionComments property manually, which alters my expression thus:
(DT_WSTR,4) @[System::VersionMajor] + "." +
(DT_WSTR,4) @[System::VersionMinor] + "." +
(DT_WSTR,4) @[System::VersionBuild] + "." +
@[System::VersionComments]
I just have to remember to manually update the value of System::VersionComments in the package properties:
I labeled this section “PackageVersion: One Way” because there are other ways to track the version of an SSIS package in SSIS 2016. The VersionGUID package property shown above is another way, and this property is read-only from the SSDT-BI IDE (although editable from the dtsx file). How you manage SSIS package versions is not nearly as important as tracking them somehow.
Add a Container
Add a Sequence Container and change the name to "SEQ Step 1 – Do Some Stuff":
Sequence Containers are nice for several reasons:
-
They’re aesthetic. You can break up chucks of functionality logically… how cool!
-
When troubleshooting, you can execute the contents of a container from the right-click context menu.
-
You can isolate chunks of functionality using transactions by container.
Add an Execute SQL Task
Next, add an Execute SQL Task and name it "SQL Log Manual Start of Package Execution":
Double-click the Execute SQL Task to edit it. Change the ConnectionType property to ADO.Net. Click Connection and select "<New connection…>":
When the Configure OLE DB Connection Manager dialog displays, click the "New…" button:
I really like the “New Connection” functionality in SSIS. It picks the correct Provider for the option selected in the Execute SQL Task. Configure the connection to match where you built the SSISRunTimeMetrics database and then click the OK button:
When you return to the Configure ADO.NET Connection Manager window, it should now display your connection in the Data Connections lists:
Click OK to return to the Execute SQL Task Editor. Click the ellipsis in the SQLStatement property to open the Enter SQL Query dialog. Enter the name of the ssis.usp_RecordManualPackageStart stored procedure:
Click the OK button to return to the Execute SQL Task Editor. Set the IsQueryStoredProcedure property (available for ADO.Net connection types) from False to True:
What’s nice about using ADO.Net and the IsQueryStoredProcedure property is parameter configuration, which comes next. Click on the Parameter Mapping page:
Click the Add button and add the following three parameter mappings:
Variable Name: System::StartTime
Direction: Input
Data Type: DateTime
Parameter Name: packageStartDateTime
Parameter Size: –1
Variable Name: System::PackageName
Direction: Input
Data Type: String
Parameter Name: packageName
Parameter Size: –1
Variable Name: User::PackageVersion
Direction: Input
Data Type: String
Parameter Name: packageVersion
Parameter Size: –1
Click the OK button to proceed.
Add a Script Task
Drag a Script Task into the sequence container and rename it “SCR Output Package Version”:
Open the Script Task Editor and click the ellipsis in the ReadOnlyVariables property. Add the User::PackageVersion variable:
Click the Edit Script button to open the VSTA Editor, and enter the following code in the public void Main() method:
public void Main()
{
string packageVersion = Dts.Variables["User::PackageVersion"].Value.ToString();
string msg = "Version: " + packageVersion;
bool fireAgain = true;
Dts.Events.FireInformation(1001, "SSISRunTimeMetrics", msg, "", 0, ref fireAgain);
Dts.TaskResult = (int)ScriptResults.Success;
}
Close the VSTA Editor and click the OK button to close the Script Task Editor. Your SSIS package should appear as shown here:
Let’s Test It!
Execute the package in the SSDT-BI debugger. Your results should match those shown here (Success!):
Click on the Progress tab to locate the Information message we generated in the Script Task:
Viola! It works.
Checking for Data
Now let’s check the database to see if our message was logged. Open SSMS and connect to your SSISRunTimeMetrics database. Execute a query to check the contents of the SSISRunTimeMetrics table:
select * from ssis.RunTimeMetrics
This is a good sign. Our data is being logged and shows the package started execution.
Logging Execution Completion
It’s time to add a procedure to log the end of package execution, but we have a problem: How do we identify the row we just added to the ssis.RunTimeMetrics table? Think about it. We just inserted a row, and since that’s the only row in the table it’s pretty simple to identify. Later, we will have lots of rows in the table in varying states. We need a method to positively identify the row we’re working with.
We could use the last row inserted – apply Max(id) criteria – but what happens as our metrics grows to include multiple packages? One package could start while another was executing and we’d update the wrong row.
We require a means to positively identify the row when added to the table. Such a means exists. If we modify the ssis.usp_RecordManualPackageStart stored procedure we can have it return the value of the inserted ID. The OUTPUT clause gives us access to the inserted and deleted virtual tables. Modify the stored procedure by adding the following OUTPUT clause as shown:
output inserted.id as ‘Inserted_ID’
Execute the Transact-SQL to drop and recreate the ssis.usp_RecordManualPackageStart stored procedure:
print ‘ssis.usp_RecordManualPackageStart Stored Procedure’
if exists(select s.name + ‘.’ + p.name
from sys.procedures p
inner join sys.schemas s on s.schema_id = p.schema_id
where p.name = ‘usp_RecordManualPackageStart’
and s.name = ‘ssis’)
begin
print ‘ – Dropping ssis.usp_RecordManualPackageStart Stored Procedure’
Drop Procedure ssis.usp_RecordManualPackageStart
print ‘ – ssis.usp_RecordManualPackageStart Stored Procedure dropped’
end
print ‘ – Creating ssis.usp_RecordManualPackageStart Stored Procedure’
go
Create Procedure ssis.usp_RecordManualPackageEnd
@id int
,@packageStatus varchar(25) = ‘Succeeded’
As
begin
— set @packageEndDateTime default…
declare @packageEndDateTime datetime = GetDate()
— update the run time metrics data…
update ssis.RunTimeMetrics
Set packageEndDateTime = @packageEndDateTime
,packageStatus = @packageStatus
where id = @id
end
go
print ‘ – ssis.usp_RecordManualPackageStart Stored Procedure created’
go
Add another Int32 Data Type variable named PackageInstanceId:
Open the editor for the “SQL Log Manual Start of Package Execution” Execute SQL Task. Change the ResultSet property to “Single row”:
Click the Result Set page and add a new resultset. Name it “0” (ordinals work well) and assign the results of this query to the PackageInstanceId variable you just created. This will push the results of the OUTPUT clause in the stored procedure (which will include the id of the newly inserted row) into the PackageInstanceId variable. You can then access it later in the package to update the precise row:
Next, edit the Script Task by adding PackageInstanceId to the list of ReadOnlyVariables:
Then edit the script in public void Main() to read:
public void Main()
{
string packageVersion = Dts.Variables["User::PackageVersion"].Value.ToString();
int packageInstanceId = Convert.ToInt32(Dts.Variables["User::PackageInstanceId"].Value);
string msg = "Version: " + packageVersion + " PackageInstanceId: " + packageInstanceId.ToString() ;
bool fireAgain = true;
Dts.Events.FireInformation(1001, "SSISRunTimeMetrics", msg, "", 0, ref fireAgain);
Dts.TaskResult = (int)ScriptResults.Success;
}
Next, add a new stored procedure named ssis.usp_RecordManualPackageEnd to the SSISRunTimeMetrics database using the following Transact-SQL script:
use SSISRunTimeMetrics
go
print ‘ – ssis.usp_RecordManualPackageStart Stored Procedure created’
go
print ”
print ‘ssis.usp_RecordManualPackageEnd Stored Procedure’
if exists(select s.name + ‘.’ + p.name
from sys.procedures p
inner join sys.schemas s on s.schema_id = p.schema_id
where p.name = ‘usp_RecordManualPackageEnd’
and s.name = ‘ssis’)
begin
print ‘ – Dropping ssis.usp_RecordManualPackageEnd Stored Procedure’
Drop Procedure ssis.usp_RecordManualPackageEnd
print ‘ – ssis.usp_RecordManualPackageEnd Stored Procedure dropped’
end
print ‘ – ssis.usp_RecordManualPackageEnd Stored Procedure’
go
Create Procedure ssis.usp_RecordManualPackageEnd
@id int
,@packageStatus varchar(25) = ‘Succeeded’
As
begin
— set @packageEndDateTime default…
declare @packageEndDateTime datetime = GetDate()
— update the run time metrics data…
update ssis.RunTimeMetrics
Set packageEndDateTime = @packageEndDateTime
,packageStatus = @packageStatus
where id = @id
end
go
print ‘ – ssis.usp_RecordManualPackageEnd Stored Procedure created’
go
Now add a new Execute SQL Task named "SQL Log Manual End of Package Execution":
Open the Execute SQL Task Editor and set the ConnectionType to ADO.Net, the Connection to your SSISRunTimeMetrics connection manager. Edit the SQLStatement property, changing it to ssis.usp_RecordManualPackageEnd. Set the IsQueryStoredProcedure property to True. Navigate to the Parameter Mapping page and add a mapping for PackageInstanceId as shown here:
Testing!
That’s all we need to conduct our next test, so let’s roll (as my friend Chris Yates [blog] says)!
The package execution succeeds – that’s a good start. Let’s look at the Progress tab:
Our information message tells us we created PackageInstanceId 2 – another good sign. Next, let’s look at the metadata in our database:
w00t! Very cool, our metadata shows us when the package started and ended!
Conclusion
“In an age of the SSIS Catalog, why would one ever employ this kind of metadata collection, Andy?” That’s a fair question. The SSIS Catalog is an awesome data integration execution, logging, and externalization engine. There are a handful of use cases, though, where enterprises may opt to continue to execute SSIS packages from the file system or the MSDB database. Perhaps the biggest reason to do so is that’s the way the enterprise is currently executing SSIS. When SSDT-BI converts pre-Catalog-era (2005, 2008, 2008 R2) SSIS packages to current, it imports these packages in a “Package Deployment Model” SSIS Project. This allows developers to upgrade the version of their SSIS project to SSIS 2016 (and enjoy many benefits for so doing) while continuing to execute SSIS packages in the file system. Kudos to the Microsoft SSIS Development Team for this backwards compatibility!
If you do not have an SSIS Catalog collecting this kind of metadata for you, creating a database similar to SSISRunTimeMetrics is a valid solution. Yes, you have to manually add ETL instrumentation to your SSIS packages, but I believe the cost and overhead outweighs not having a solution for collecting this data integration execution metadata.
As always, I welcome your thoughts on the matter!
:{>
Comments