T-SQL Tuesday: Aggregations in SSIS

 

Introduction

Jes Borland (Blog | @grrl_geek) is hosting this month’s T-SQL Tuesday – started by SQLBlog’s own Adam Machanic (Blog | @AdamMachanic) – and it is about aggregation.

I thought I’d show a couple ways to do aggregation using SSIS.

The Aggregate Transformation in SSIS

The Aggregate transform in SSIS is fast. I built an SSIS package (AggregateScripts.dtsx) with two Data Flow Tasks (Using the Aggregate Transform and Using a Script Component).  Using the Aggregate Transform looks like this:

 

The OLE DB Source is pulling data from the AdventureWorks (2005) Sales.SalesOrderDetail table. The Aggregate transform is configured to Sum the LineTotal amounts:

When the package executes, the Sum is the only output from the Aggregate Transformation:

And the package executes in 0.421 seconds in BIDS. This rocks if all I’m after is the Sum but let’s face it – I can get the Sum of the LineTotal column pretty quickly in T-SQL: SELECT Sum(LineTotal) FROM Sales.SalesOrderDetail.

Also, what if I want a running total? One way to achieve this is…

Using a Script Component

For the Using a Script Component data flow task, I select from the same source (AdventureWorks.Sales.SalesOrderDetail). I use a data flow path to pipe the rows into a Script Component. Accepting the default (synchronous) relationship between the input and output buffers, I select the LineTotal on the Inputs page:

On the Inputs and Outputs page, I create a new output column named RunningTotal (Decimal, 38, 6):

On to the script! I see folks struggle with scripting in SSIS. Variable scope can be tricky. Let’s see what happens when we put all the code in the ProcessInputRow method:

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        Dim dTotal As Decimal

        dTotal = dTotal + Row.LineTotal

        Row.RunningTotal = dTotal

    End Sub

Let’s see what we get when we execute in BIDS:

The LineTotal value is merely copied into the RunningTotal column each time. That’s not a running total. What’s the problem? Variable scope. The script variable dTotal is being created and populated for each row flowing through the Script Component.

We need for it to be created once and aggregated each time a row flows through the Script Component. To accomplish this, we need to move the Dim statement from inside the buffer’s ProcessInputRows method – we need to change the scope from ProcessInputRows to the ScriptMain class:

Public Class ScriptMain
    Inherits UserComponent

    Dim dTotal As Decimal

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        dTotal = dTotal + Row.LineTotal

        Row.RunningTotal = dTotal

    End Sub

Now when we execute it, the script variable dTotal is created once at the class-scope and poopulated (repeatedly) for each row that flows through the buffer’s ProcessInputRow method. The results:

A running total!

Conclusion

As always, there are lots of cool ways to get at the data you seek. This isn’t the only way and I’m sure there are better ways to achieve the same results.

Happy T-SQL Tuesday!

:{>

Andy Leonard

andyleonard.blog

Christian, husband, dad, grandpa, Data Philosopher, Data Engineer, Azure Data Factory, SSIS guy, and farmer. I was cloud before cloud was cool. :{>

3 thoughts on “T-SQL Tuesday: Aggregations in SSIS

  1. "…is created once at the class-scope and poopulated (repeatedly) for each row that flows through …"
    That sounds…messy

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.