Introduction
This post is part of a series of posts on ETL Instrumentation.
In Part 1 we built a database to hold collected SSIS run time metrics and an SSIS package to demonstrate how and why we would load metrics into the database.
In Part 2 we expanded on our database and the SSIS package to annotate version metadata, manage error metrics capture, and task status reporting.
In Part 3, we start using the ETL Instrumentation infrastructure we’ve built to measure some actual ETL. We now have the concepts – it’s time to put them to work!
A Brief History Of Our ETL Instrumentation Project
To review, our existing database is named SSISRunTimeMetrics. It contains a schema named ssis. In this schema are eight objects:
– a table named ssis.RunTimeMetrics.
– a table named ssis.RunTimeErrors.
– a table named ssis.TaskMetrics.
– a stored procedure named ssis.usp_RecordPackageStart.
– a stored procedure named ssis.usp_RecordPackageEnd.
– a stored procedure named ssis.usp_RecordPackageError.
– a stored procedure named ssis.usp_RecordTaskStart.
– a stored procedure named ssis.usp_RecordTaskEnd.
We expanded the types of run-time data we are collecting. Part 1 introduced Status collection; in Part 2 we added Exception collection. We also introduced scope into both types of collection, recording Exception information on error and finalizing Status (reporting that an error occurred).
At the beginning of SSIS package execution, we call ssis.usp_RecordPackageStart from an Execute SQL Task. We pass the package start date and time, the package name, and the package version. We also pass in a status of “Started”. From this stored procedure we get the ID of the newly created row, which we then push into a Package Load ID variable (iPackageLoadID).
At the beginning of a task or collection of tasks that define a process, we call ssis.usp_RecordTaskStart from an Execute SQL Task. We pass the task or process start date and time, the task (source) name, iPackageLoadID, and a status of “Started”. From this stored procedure we get the ID of the newly created row, which we then push into a Task Load ID variable (iTaskLoadID).
When the task completes, we call ssis.usp_RecordTaskEnd from an Execute SQL Task. We pass in the Task Load ID from the iTaskLoadID variable, the current date and time, and the status “Succeeded”.
On error, we capture Exception data and record an Error Status – both are crucial to knowing what happens when an exception is thrown.
When the package completes execution, we call ssis.usp_RecordPackageEnd from an Execute SQL Task. We pass in the Package Load ID from the variable, the current date and time, and the status “Succeeded”.
Version Metadata
Let’s create a target database for our ETL process. Open SQL Server Management Studio (SSMS) and execute the following script to create SSISRunTimeMetrics_Target:
-- create a target database...
use master
go

if not exists(select name
              from master.sys.databases
              where name = 'SSISRunTimeMetrics_Target')
 Create Database SSISRunTimeMetrics_Target
go
Open the existing SSISRunTimeMetrics SSIS solution. Open the SSISRunTimeMetrics package and edit the version metadata:
This is a major revision – we’re moving to version 2.0 and resetting the VersionBuild property. I also add text to the VersionComments property.
Don’t forget the package annotation:
Now we’re ready to make some code changes.
Extract, Transform, Load? Two Out Of Three Ain’t Bad…
Rename the “Step 1 – Do some stuff” Sequence Container to “Step 1 – Load Contacts”. Double-click the “Log Start of Task” Execute SQL Task to open the editor. Set the SQLStatement property to:
declare @Now datetime
set @Now = GetDate()

exec ssis.usp_RecordTaskStart ?,NULL,@Now,?,'Started'
Change the name property to “Load Contact”:
Click the Parameter Mapping page and click the Add button. Set the Variable Name to System::TaskName, change the Data Type to VarChar, and the Parameter Name to 1.
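Putting the pieces together, the Execute SQL Task now issues the statement below. The comments mapping each ? placeholder to a variable reflect the Parameter Mapping page (parameter 0 was mapped to the Package Load ID variable in Part 2; parameter 1 is the mapping we just added):

```sql
declare @Now datetime
set @Now = GetDate()

-- parameter 0: User::iPackageLoadID (mapped in Part 2)
-- parameter 1: System::TaskName (mapped above)
exec ssis.usp_RecordTaskStart ?,NULL,@Now,?,'Started'
```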
Other applicable properties were modified in Part 2 of this series. Because of the design of the related Task Status collection in Part 2, we do not need to make major changes to the “Log Successful End of Task” or “Log Failed End of Task” Execute SQL Tasks – they are ready for our purposes. Next, delete the Test Script Task. Drag a Data Flow Task into the space vacated by the script task:
Double-click the Data Flow Task to open the editor. Drag an OLE DB Source Adapter onto the Data Flow Task canvas and double-click it to open the editor. Click the New button beside the OLE DB connection manager to open the Configure OLE DB Connection Manager dialog. Click the New button to open the Connection Manager dialog. Enter “(local)” in the Server name dropdown box. In the “Select or enter a database name” dropdown, select AdventureWorks.
You can download SQL Server 2005 Samples and Sample Databases if you don’t already have them installed.
Click the OK button to close the Connection Manager dialog. Click the OK button again to close the Configure OLE DB Connection Manager dialog. Select “Person.Contact” from the “Name of the table or the view” dropdown and click the OK button to close the OLE DB Source Editor.
Drag an OLE DB Destination Adapter onto the Data Flow canvas and connect it to the OLE DB Source Adapter.
Double-click the OLE DB Destination Adapter to open the editor. Using the procedure outlined above, create a connection to the SSISRunTimeMetrics_Target database you created earlier:
Click the New button beside the “Name of the table or the view” dropdown. Change the table name in the Create Table statement from “[OLE DB Destination]” to “[dbo].[Contact]”. Your script should now read:
CREATE TABLE [dbo].[Contact] (
[ContactID] INTEGER,
[NameStyle] BIT,
[Title] NVARCHAR(8),
[FirstName] NVARCHAR(50),
[MiddleName] NVARCHAR(50),
[LastName] NVARCHAR(50),
[Suffix] NVARCHAR(10),
[EmailAddress] NVARCHAR(50),
[EmailPromotion] INTEGER,
[Phone] NVARCHAR(25),
[PasswordHash] VARCHAR(40),
[PasswordSalt] VARCHAR(10),
[AdditionalContactInfo] NTEXT,
[rowguid] UNIQUEIDENTIFIER,
[ModifiedDate] DATETIME
)
Click the OK button. Click the Mappings page and then the OK button to close the editor.
Now we have a process that extracts data from the AdventureWorks database and loads it into our SSISRunTimeMetrics_Target database. Execute the package and observe the Data Flow:
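A quick sanity check in SSMS confirms the load; the count returned should match the number of rows in AdventureWorks' Person.Contact table:

```sql
use SSISRunTimeMetrics_Target
go

select count(*) as ContactRowCount
from dbo.Contact
```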
Let’s modify the report query from Part 2:
select
 m.packageName
,m.packageStartDateTime
,DateDiff(ss, m.packageStartDateTime, m.packageEndDateTime) as 'packageRunTime'
,m.packageStatus
,t.SourceName
,t.TaskStartDateTime
,DateDiff(ss, t.TaskStartDateTime, t.TaskEndDateTime) as 'taskRunTime'
,t.TaskStatus
,e.ErrorNumber
,e.ErrorDescription
from ssis.RunTimeMetrics m
inner join ssis.TaskMetrics t on t.RunTimeMetricsID = m.id
left outer join ssis.RunTimeErrors e on e.RunTimeMetricsID = m.id
where m.id = (select Max(id)
              from ssis.RunTimeMetrics)
The left outer join to the RunTimeErrors table allows us to view successful executions. Changing the Where clause to bring back the latest id value from the RunTimeMetrics table will prove useful when we add tasks.
Count Me In
On the Data Flow Task, delete the data flow path between the OLE DB Source Adapter and OLE DB Destination Adapter. Drag a Row Count Transformation from the Toolbox onto the Data Flow canvas and connect it between the OLE DB Source and OLE DB Destination Adapters.
Click SSIS, Variables and create a new package-scoped Int32 variable named iContactCount.
Double-click the Row Count transformation to open the editor. Click the VariableName dropdown and select the User::iContactCount variable to hold the record counts.
Now we have a variable to count the number of rows loaded by the data flow – we now need a place to store this data. Let’s create a table to hold the row counts. Use the following script to create the ssis.RowCounts table:
use SSISRunTimeMetrics
go

-- vars...
declare @sql varchar(255)

-- create ssis schema...
if not exists(select name
              from sys.schemas
              where name = 'ssis')
 begin
  set @sql = 'Create Schema ssis'
  exec(@sql)
 end

-- create RowCounts table...
if not exists(select s.name + '.' + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = 'RowCounts'
              and s.name = 'ssis')
 Create Table ssis.RowCounts
 (RowCountsID int identity(1,1)
 ,TaskMetricsID int null
 ,RunTimeMetricsId int not null
 ,ParentTaskMetricsID int null
 ,[RowCount] int null)
As always, let’s create a stored procedure to handle the insert:
use SSISRunTimeMetrics
go

if exists(select s.name + '.' + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = 'usp_RecordRowCount'
          and s.name = 'ssis')
 begin
  Drop Procedure ssis.usp_RecordRowCount
 end
go

Create Procedure ssis.usp_RecordRowCount
 @RunTimeMetricsID int
,@TaskMetricsID int
,@RowCount int
,@ParentTaskMetricsID int = null
As
begin

 -- insert the row count data...
 insert into ssis.RowCounts
 (TaskMetricsID
 ,RunTimeMetricsId
 ,ParentTaskMetricsID
 ,[RowCount])
 values
 (@TaskMetricsID
 ,@RunTimeMetricsID
 ,@ParentTaskMetricsID
 ,@RowCount)

end
go
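You can smoke-test the stored procedure from SSMS before wiring it into the package. The metrics ID values below are hypothetical; substitute ID values that actually exist in your ssis.RunTimeMetrics and ssis.TaskMetrics tables:

```sql
use SSISRunTimeMetrics
go

-- hypothetical IDs and count, for illustration only
exec ssis.usp_RecordRowCount
  @RunTimeMetricsID = 1
 ,@TaskMetricsID = 1
 ,@RowCount = 19972

select * from ssis.RowCounts
```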
There are different ways (methodologies) to instrument an ETL SSIS package. Some designers place the code that inserts row counts, along with the code that stores the start and end metrics captured in the ssis.TaskMetrics table, on the Control Flow. Others use Event Handlers (OnPreExecute and OnPostExecute) to house the logic performing these operations.
There are sound reasons for choosing either method. An argument for using Event Handlers is that they separate the ETL logic from the metrics and status logic. An argument against them is that keeping everything on the Control Flow and Data Flow makes all the logic readily accessible. Both arguments are legitimate, and the customer's maintenance needs and SSIS experience should be considered in this design decision.
For the purposes of this series, I choose to have most of the logic visible on the Control Flow and Data Flow.
Return to the Control Flow to add a mechanism to insert the RowCount data into our new table. Drag an Execute SQL Task into the "Step 1 – Load Contact" Sequence Container. Connect a Success Precedence Constraint from the "Log Successful End of Task" Execute SQL Task to the new Execute SQL Task.
Double-click the new Execute SQL Task to open the editor. Configure the General and Parameter Mappings pages as shown:
Click the OK button to close the Execute SQL Task editor.
Before we test the new row count functionality, let’s clear the destination table. Execute the following SQL statement in SSMS:
Truncate Table SSISRunTimeMetrics_Target.dbo.Contact
Now execute the package.
A slight modification of our earlier report query adds the record counts:
select
 m.packageName
,m.packageStartDateTime
,DateDiff(ss, m.packageStartDateTime, m.packageEndDateTime) as 'packageRunTime'
,m.packageStatus
,t.SourceName
,t.TaskStartDateTime
,DateDiff(ss, t.TaskStartDateTime, t.TaskEndDateTime) as 'taskRunTime'
,t.TaskStatus
,c.[RowCount]
,e.ErrorNumber
,e.ErrorDescription
from ssis.RunTimeMetrics m
inner join ssis.TaskMetrics t on t.RunTimeMetricsID = m.id
left outer join ssis.RunTimeErrors e on e.RunTimeMetricsID = m.id
left outer join ssis.RowCounts c on c.RunTimeMetricsID = m.id
where m.id = (select Max(id)
              from ssis.RunTimeMetrics)
The results appear as shown:
We get all the rows this execution – cool.
This demonstration is nearly true-to-life. Let's take it closer by adding incremental loading. (I cover another SSIS Design Pattern – Incremental Loads – in another post.)
First return to the Control Flow. Click anywhere in the Control Flow canvas where there is white space. This sets focus to the Control Flow, which sets focus to the Package level. Click SSIS, Variables to display the Variables dialog. Add three package-scoped Int32 variables named iNewRowsCount, iChangedRowsCount, and iUnchangedRowsCount.
Click the Data Flow tab and drag an OLE DB Command, Lookup, and Conditional Split transformations onto the Data Flow Task. Also drag three Row Count transformations onto the Data Flow.
Rename the OLE DB Destination Adapter "Insert New Rows" and delete the data flow path (the green line / arrow) that connects it to the Row Count transformation. Connect the output of the Row Count transformation to the input of the Lookup and double-click the Lookup to open the editor. Set the OLE DB connection manager property to "(local).SSISRunTimeMetrics_Target" and the "Use a table or view" dropdown to "[dbo].[Contact]":
Click the Columns tab. Right-click in the mappings area and click “Select All Mappings”. Right-click in the mappings area again and click “Delete Selected Mappings”. Drag ContactID from the Available Input Columns and drop it on the Available Lookup Columns’ ContactID column.
Select all the columns in the Available Lookup Columns list except the AdditionalContactInfo and rowguid columns. In the Output Alias column, add a “Dest_” prefix to each field. This will allow easier identification in the pipeline:
Click the Configure Error Output button and change the Lookup Output Error dropdown from "Fail component" to "Ignore failure". This changes the default behavior of the Lookup operation from an inner join to a left outer join.
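In T-SQL terms, the Lookup configured with "Ignore failure" behaves roughly like the left outer join below. This is a sketch for illustration only; the real work happens in the SSIS pipeline, not in the database:

```sql
select s.ContactID
      ,s.FirstName
      ,d.ContactID as Dest_ContactID -- null when the row is new to the destination
      ,d.FirstName as Dest_FirstName
from AdventureWorks.Person.Contact s
left outer join SSISRunTimeMetrics_Target.dbo.Contact d
  on d.ContactID = s.ContactID
```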
Click the OK button to close the Lookup editor. Connect the output of the Lookup transformation to the input of the Conditional Split transformation and double-click the Conditional Split to open the editor.
Name the first Output “New Rows” and set the Condition to “IsNull(Dest_ContactID)”.
Name the second Output "Changed Rows" and set the Condition to:
(IsNull(Dest_NameStyle) ? True : Dest_NameStyle) != (IsNull(NameStyle) ? True : NameStyle)
|| (IsNull(Dest_Title) ? "null" : Dest_Title) != (IsNull(Title) ? "null" : Title)
|| (IsNull(Dest_FirstName) ? "null" : Dest_FirstName) != (IsNull(FirstName) ? "null" : FirstName)
|| (IsNull(Dest_MiddleName) ? "null" : Dest_MiddleName) != (IsNull(MiddleName) ? "null" : MiddleName)
|| (IsNull(Dest_LastName) ? "null" : Dest_LastName) != (IsNull(LastName) ? "null" : LastName)
|| (IsNull(Dest_Suffix) ? "null" : Dest_Suffix) != (IsNull(Suffix) ? "null" : Suffix)
|| (IsNull(Dest_EmailAddress) ? "null" : Dest_EmailAddress) != (IsNull(EmailAddress) ? "null" : EmailAddress)
|| (IsNull(Dest_EmailPromotion) ? -1 : Dest_EmailPromotion) != (IsNull(EmailPromotion) ? -1 : EmailPromotion)
|| (IsNull(Dest_Phone) ? "null" : Dest_Phone) != (IsNull(Phone) ? "null" : Phone)
|| (IsNull(Dest_PasswordHash) ? "null" : Dest_PasswordHash) != (IsNull(PasswordHash) ? "null" : PasswordHash)
|| (IsNull(Dest_PasswordSalt) ? "null" : Dest_PasswordSalt) != (IsNull(PasswordSalt) ? "null" : PasswordSalt)
|| (IsNull(Dest_ModifiedDate) ? (DT_DBDATE)"1/1/1901" : Dest_ModifiedDate) != (IsNull(ModifiedDate) ? (DT_DBDATE)"1/1/1901" : ModifiedDate)
Change the Default Output Name to “Unchanged Rows”.
Drag an output from the Conditional Split to one of the Row Count transformations. When prompted, select the New Rows output from the Conditional Split transformation.
Rename this Row Count “New Rows Count” and double-click it to open the editor. Set the VariableName property to User::iNewRowsCount and click the OK button to close the editor.
Connect the output of the New Rows Count transformation to the OLE DB Destination Adapter “Insert New Rows”.
Drag a second output from the Conditional Split to another Row Count transformation. When prompted, select the Changed Rows output.
Rename the Row Count transformation "Changed Rows Count" and double-click it to open the editor. Set the VariableName property to User::iChangedRowsCount and click the OK button to close the editor.
Double-click the OLE DB Command transformation to open the advanced editor. Set the Connection Manager property to “(local).SSISRunTimeMetrics_Target”.
Click the Component Properties tab. Click the SQLCommand ellipsis and enter the following statement in the String Value Editor:
update dbo.Contact
set NameStyle = ?
,Title = ?
,FirstName = ?
,MiddleName = ?
,LastName = ?
,Suffix = ?
,EmailAddress = ?
,EmailPromotion = ?
,Phone = ?
,PasswordHash = ?
,PasswordSalt = ?
,ModifiedDate = ?
where ContactID = ?
Connect the remaining Conditional Split output (Unchanged Rows) to the remaining Row Count transformation. Rename the Row Count "Unchanged Rows Count", double-click it to open the editor, and set the VariableName to User::iUnchangedRowsCount.
Before we leave the Data Flow, let's optimize it for performance. There are a couple of good reasons for this; one big reason is that the current design may well result in blocking, maybe even deadlocks: the OLE DB Command updates dbo.Contact row by row while the OLE DB Destination Adapter is inserting into the same table.
Delete the "Update Changed Rows" OLE DB Command. Drag an OLE DB Destination Adapter onto the Data Flow Task and connect it to the Changed Rows Count transformation. Change the name of the Adapter to "stgContactChangedRows" and double-click it to open the editor. Set the OLE DB connection manager to "(local).SSISRunTimeMetrics_Target". Change the Data Access Mode to "Table or view". Click the New button beside the "Name of table or view" dropdown:
Click the OK button to create the new table, then click the Mappings page to complete configuration. Click the OK button to close the editor.
On the Control Flow, add an Execute SQL Task just after the Data Flow Task. Name it "Apply Staged Updates" and set the Connection property to "(local).SSISRunTimeMetrics_Target". Set the SQLStatement property to the following SQL:
UPDATE c
SET c.ContactID = u.ContactID
,c.NameStyle = u.NameStyle
,c.Title = u.Title
,c.FirstName = u.FirstName
,c.MiddleName = u.MiddleName
,c.LastName = u.LastName
,c.Suffix = u.Suffix
,c.EmailAddress = u.EmailAddress
,c.EmailPromotion = u.EmailPromotion
,c.Phone = u.Phone
,c.PasswordHash = u.PasswordHash
,c.PasswordSalt = u.PasswordSalt
,c.AdditionalContactInfo = u.AdditionalContactInfo
,c.rowguid = u.rowguid
,c.ModifiedDate = u.ModifiedDate
FROM dbo.Contact c
INNER JOIN dbo.stgContactChangedRows u on u.ContactID = c.ContactID
This converts the RBAR (Row-By-Agonizing-Row) update statement in the OLE DB Command to a set-based update. Much better.
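One assumption baked into this pattern that isn't shown above: the staging table must be emptied before each load, or updates staged by a previous run will be re-applied. A simple way to handle this (a sketch, not part of the original package) is an Execute SQL Task ahead of the Data Flow Task running:

```sql
truncate table SSISRunTimeMetrics_Target.dbo.stgContactChangedRows
```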
We need to capture these row counts, but we need a way to mark them as well. Let’s return to our ssis.RowCounts table design and modify it. Before modifying the table, let’s create and populate a lookup table for RowCountTypes:
use SSISRunTimeMetrics
go

-- vars...
declare @sql varchar(255)

-- create ssis schema...
if not exists(select name
              from sys.schemas
              where name = 'ssis')
 begin
  set @sql = 'Create Schema ssis'
  exec(@sql)
 end

-- create RowCountTypes table...
if not exists(select s.name + '.' + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = 'RowCountTypes'
              and s.name = 'ssis')
 Create Table ssis.RowCountTypes
 (RowCountTypeID char(1) not null
 ,RowCountTypeName varchar(25) null
 ,RowCountTypeDescription varchar(255) null)
go
Execute the following statement to add the lookup column to the ssis.RowCounts table:
if not exists(select c.name
              from sys.columns c
              inner join sys.tables t on t.object_id = c.object_id
              inner join sys.schemas s on s.schema_id = t.schema_id
              where s.name = 'ssis'
              and t.name = 'RowCounts'
              and c.name = 'RowCountTypeID')
 Alter Table ssis.RowCounts Add RowCountTypeID char(1) null
go
Execute the following statements to populate the RowCountTypes table:
if not exists(select RowCountTypeID
              from ssis.RowCountTypes
              where RowCountTypeID = 'I')
 insert into ssis.RowCountTypes
 (RowCountTypeID
 ,RowCountTypeName
 ,RowCountTypeDescription)
 values
 ('I'
 ,'Selected Input Rows'
 ,'Input rows selected from a source')

if not exists(select RowCountTypeID
              from ssis.RowCountTypes
              where RowCountTypeID = 'N')
 insert into ssis.RowCountTypes
 (RowCountTypeID
 ,RowCountTypeName
 ,RowCountTypeDescription)
 values
 ('N'
 ,'New Rows'
 ,'New rows')

if not exists(select RowCountTypeID
              from ssis.RowCountTypes
              where RowCountTypeID = 'C')
 insert into ssis.RowCountTypes
 (RowCountTypeID
 ,RowCountTypeName
 ,RowCountTypeDescription)
 values
 ('C'
 ,'Changed Rows'
 ,'Changed rows')

if not exists(select RowCountTypeID
              from ssis.RowCountTypes
              where RowCountTypeID = 'U')
 insert into ssis.RowCountTypes
 (RowCountTypeID
 ,RowCountTypeName
 ,RowCountTypeDescription)
 values
 ('U'
 ,'Unchanged Rows'
 ,'No changes detected in rows')
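A quick select verifies the lookup data is in place:

```sql
select RowCountTypeID
      ,RowCountTypeName
      ,RowCountTypeDescription
from ssis.RowCountTypes
order by RowCountTypeID
-- expect four rows: C, I, N, U
```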
We now need to update our stored procedure that inserts records into ssis.RowCounts:
use SSISRunTimeMetrics
go

if exists(select s.name + '.' + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = 'usp_RecordRowCount'
          and s.name = 'ssis')
 begin
  Drop Procedure ssis.usp_RecordRowCount
 end
go

Create Procedure ssis.usp_RecordRowCount
 @RunTimeMetricsID int
,@TaskMetricsID int
,@RowCount int
,@RowCountType char(1)
,@ParentTaskMetricsID int = null
As
begin

 -- insert the row count data...
 insert into ssis.RowCounts
 (TaskMetricsID
 ,RunTimeMetricsId
 ,ParentTaskMetricsID
 ,[RowCount]
 ,RowCountTypeID)
 values
 (@TaskMetricsID
 ,@RunTimeMetricsID
 ,@ParentTaskMetricsID
 ,@RowCount
 ,@RowCountType)

end
go
Finally, let's update the "Record Row Count" SQLStatement. While we're in there, let's add the inserts for the additional row count variables. Return to the Control Flow and double-click the "Record Row Count" Execute SQL Task to open the editor.
Change the SQLStatement to:
exec ssis.usp_RecordRowCount ?,?,?,'I'
exec ssis.usp_RecordRowCount ?,?,?,'N'
exec ssis.usp_RecordRowCount ?,?,?,'C'
exec ssis.usp_RecordRowCount ?,?,?,'U'
Click the Parameter Mappings page and edit as shown:
Note that every third parameter is a count measurement, with the order indicated by the SQLStatement property. Click the OK button to close the editor and execute the package.
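For reference, the twelve parameters (ordinals 0 through 11) map in groups of three. The annotated version below spells out the ordinals; the variable names are the ones created earlier in this series:

```sql
-- parameters 0,1,2: iPackageLoadID, iTaskLoadID, iContactCount
exec ssis.usp_RecordRowCount ?,?,?,'I'
-- parameters 3,4,5: iPackageLoadID, iTaskLoadID, iNewRowsCount
exec ssis.usp_RecordRowCount ?,?,?,'N'
-- parameters 6,7,8: iPackageLoadID, iTaskLoadID, iChangedRowsCount
exec ssis.usp_RecordRowCount ?,?,?,'C'
-- parameters 9,10,11: iPackageLoadID, iTaskLoadID, iUnchangedRowsCount
exec ssis.usp_RecordRowCount ?,?,?,'U'
```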
Let’s modify our report query yet again to have a look at the collected metrics:
select
 m.packageName
,m.packageStartDateTime
,DateDiff(ss, m.packageStartDateTime, m.packageEndDateTime) as 'packageRunTime'
,m.packageStatus
,t.SourceName
,t.TaskStartDateTime
,DateDiff(ss, t.TaskStartDateTime, t.TaskEndDateTime) as 'taskRunTime'
,t.TaskStatus
,c.[RowCount]
,ct.RowCountTypeName
,e.ErrorNumber
,e.ErrorDescription
from ssis.RunTimeMetrics m
inner join ssis.TaskMetrics t on t.RunTimeMetricsID = m.id
left outer join ssis.RunTimeErrors e on e.RunTimeMetricsID = m.id
left outer join ssis.RowCounts c on c.RunTimeMetricsID = m.id
left outer join ssis.RowCountTypes ct on ct.RowCountTypeID = c.RowCountTypeID
where m.id = (select Max(id)
              from ssis.RunTimeMetrics)
The results?
The Data It Is A-Changin’
Let's simulate live changes in the source data by modifying the destination instead: deleting rows from the destination makes the matching source rows look new, and altering MiddleName in the destination makes those rows look changed. Execute the following statements against SSISRunTimeMetrics_Target:
use SSISRunTimeMetrics_Target
delete dbo.Contact
where ContactID % 17 = 0
update dbo.Contact
set MiddleName = IsNull(MiddleName,'') + 'A'
where ContactID % 25 = 0
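If you want to predict the counts before running the package, the queries below (run against the modified destination) approximate what the Conditional Split should report. Treat them as a rough check, not an exact prediction:

```sql
use SSISRunTimeMetrics_Target
go

-- rows deleted from the destination will show up as New Rows
select count(*) as ExpectedNewRows
from AdventureWorks.Person.Contact s
where s.ContactID % 17 = 0

-- rows with an altered MiddleName will show up as Changed Rows
select count(*) as ExpectedChangedRows
from dbo.Contact
where ContactID % 25 = 0
```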
Execute the package again and observe the Data Flow. Now this is starting to look like SSIS ETL!
Let’s have a look at our metrics to see if the numbers match:
I’d say that settles it.
We are now capturing enough information to gain good insight into our ETL process – how cool is that? More next time…
:{> Andy
PS – The PASS Summit 2015 Call for Speakers closes at 9:00 PM PDT Sunday, 15 Mar 2015 (04:00 GMT 16 Mar 2015). There's still time, but hurry!
Learn more:
Linchpin People Blog: SSIS
Stairway to Biml
Stairway to Integration Services
You saved me a whole lot of time. Very, very valuable information. Thanks for sharing this.
Andy, you are a very good instructor. Your SSIS Design Pattern – ETL Instrumentation series is well crafted. I have started on lesson III. Before I get distracted I want to make sure I thank you for sharing your method of building SSIS packages.
Keep it up!
Steve
Andy, you did a wonderful job here. It helped me a lot. Thanks
Hi Andy,
I reproduced every part of the series and it helped me a lot. Thanks.
A little suggestion: most of the SQL code is text, so I could easily copy & paste it. Some code is inserted as a screenshot ('Some Optimization…'). It would be nice if every piece of (SQL) code were inserted as text.
So, I'll check for other posts from you.
Hi Andy,
Great Tutorials! Thanks for your work and the time spent in it.
I have one question regarding the number of columns to be updated. With the proposed approach, all the columns have to be compared one by one to determine whether an update should be performed. For a table with 10 or more columns (imagine a table with 30 or 40 columns) this approach is too expensive. Can you recommend another approach for these cases?
Any comment will be appreciated.
Kind Regards
Hi Paul,
Yes. You can use the HashBytes SQL Server function to create a single binary value that represents a hash of the values in the row. Comparing the hash on the source with the hash on the destination (I recommend using a computed column in the destination table), you can quickly determine if any value has changed. I need to blog about this…
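Until that post exists, here is a minimal sketch of the HashBytes idea. The column list, separator, and hash algorithm are illustrative only; also note that HashBytes input is limited to 8,000 bytes prior to SQL Server 2016, so very wide rows need extra care:

```sql
-- hypothetical: persisted computed column hashing the comparable columns
alter table dbo.Contact add RowHash as
 HashBytes('MD5'
  ,isnull(Title, N'') + N'|' + isnull(FirstName, N'') + N'|'
  + isnull(MiddleName, N'') + N'|' + isnull(LastName, N'') + N'|'
  + isnull(EmailAddress, N'') + N'|' + isnull(Phone, N''))
 persisted
```

Compute the same hash over the source columns and compare one RowHash value per row instead of comparing a dozen columns one by one.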
:{>
Beautiful! Thank you Andy. Truly, you’re putting in the faces of millions all around the world. Keep on the good work sir.
I was wondering if you ever blogged about the HashBytes SQL Server function you talked about? If yes, could you please share the link on this post so we can follow along. Thank you
Andy, why we need fields with prefix ‘Dest_’ in stage table stgContactChangedRows ?
Hi Vlad,
If your field names are the same in the source and destination, SSIS will pick one of the fields and append " (1)" to the name. I hate that. I would rather alias the column names using a prefix like "Dest_".
Hope this helps,
Andy
Andy, I mean we do not use the 'Dest_%' fields in the later update statement, so we could skip mapping them to the destination (stgContactChangedRows) entirely. Is the only reason they exist that the 'New Table' wizard creates them automatically?
If a table has 100 columns, and we only update one or two columns and occasionally insert a new record, how can we load the destination without comparing all the columns?
Trying to get the code but got an error; good stuff, though.