Introduction
Jes Borland (Blog | @grrl_geek) is hosting this month’s T-SQL Tuesday – started by SQLBlog’s own Adam Machanic (Blog | @AdamMachanic) – and it is about aggregation.
I thought I’d show a couple ways to do aggregation using SSIS.
The Aggregate Transformation in SSIS
The Aggregate transform in SSIS is fast. I built an SSIS package (AggregateScripts.dtsx) with two Data Flow Tasks (Using the Aggregate Transform and Using a Script Component). Using the Aggregate Transform looks like this:
The OLE DB Source is pulling data from the AdventureWorks (2005) Sales.SalesOrderDetail table. The Aggregate transform is configured to Sum the LineTotal amounts:
When the package executes, the Sum is the only output from the Aggregate Transformation:
And the package executes in 0.421 seconds in BIDS. This rocks if all I’m after is the Sum but let’s face it – I can get the Sum of the LineTotal column pretty quickly in T-SQL: SELECT Sum(LineTotal) FROM Sales.SalesOrderDetail.
Also, what if I want a running total? One way to achieve this is…
Using a Script Component
For the Using a Script Component data flow task, I select from the same source (AdventureWorks.Sales.SalesOrderDetail). I use a data flow path to pipe the rows into a Script Component. Accepting the default (synchronous) relationship between the input and output buffers, I select the LineTotal on the Inputs page:
On the Inputs and Outputs page, I create a new output column named RunningTotal (Decimal, 38, 6):
On to the script! I see folks struggle with scripting in SSIS. Variable scope can be tricky. Let’s see what happens when we put all the code in the ProcessInputRow method:
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim dTotal As Decimal
dTotal = dTotal + Row.LineTotal
Row.RunningTotal = dTotal
End Sub
Let’s see what we get when we execute in BIDS:
The LineTotal value is merely copied into the RunningTotal column each time. That’s not a running total. What’s the problem? Variable scope. The script variable dTotal is being created and populated for each row flowing through the Script Component.
We need for it to be created once and aggregated each time a row flows through the Script Component. To accomplish this, we need to move the Dim statement from inside the buffer’s ProcessInputRows method – we need to change the scope from ProcessInputRows to the ScriptMain class:
Public Class ScriptMain
Inherits UserComponent
Dim dTotal As Decimal
…
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
dTotal = dTotal + Row.LineTotal
Row.RunningTotal = dTotal
End Sub
Now when we execute it, the script variable dTotal is created once at the class-scope and poopulated (repeatedly) for each row that flows through the buffer’s ProcessInputRow method. The results:
A running total!
Conclusion
As always, there are lots of cool ways to get at the data you seek. This isn’t the only way and I’m sure there are better ways to achieve the same results.
Happy T-SQL Tuesday!
:{>
"…is created once at the class-scope and poopulated (repeatedly) for each row that flows through …"
That sounds…messy
Nice!!!!
Can you guarantee the order of records from the OLD DB Source, or do we have to put a SORT in between?