Iteration = Maturity

Introduction 

I was recently reminded that iteration matures software.

The History of Andy, Part 1 

Like many DBAs, I was a software developer in another life. I built web applications – working my way up from HTML through DHTML and finally to ASP – and could regale (and bore) you young whipper-snappers with war-stories of how things were “back in my day”. </DanaCarvey>

But I won’t.

The Times They Are a-Changin’

I’ll share instead something I’ve witnessed many times since starting with software in 1975 – and something you probably already know: stuff changes.

And thank goodness stuff changes!

I recently ordered 1G of RAM from an online retailer. It should arrive before my next son (but that’s not a given as Riley refuses to provide a tracking number – the doctors will induce Christy into labor Friday if he hasn’t been born by then – but I digress…). I remember my neighbor John, who introduced me to computers, purchased a 256-byte RAM chip in the mid-1970s for about what I paid for the 1G. That’s 256 bytes of RAM – not a typo. As I recall it was either a 14- or 16-pin IC.

Things have changed since then. Improvements in technology, brought about by building and improving upon existing knowledge, have brought us to a day when I can purchase 1,073,741,824 bytes for roughly the previous price of 256. I don’t know how you feel about that. I think it’s a good thing.

The idea of “building and improving upon existing knowledge” defines iterative development. Although the idea is relatively new to the software development field, it serves as the basis for engineering disciplines. Engineers iterate – build and improve upon existing knowledge – and we get more powerful hardware for the same amount of money. What’s not to like?

Iteration – it’s not just a good idea… 

Iterative software development builds and improves upon existing knowledge within a specific domain. Most domains are defined by an application (wholly or in part), enterprise knowledge (again, wholly or in part), or – most likely – some combination of the two. For example, let’s say you work for a large corporation as a software developer. Your domain could be the corporate website. In which case you possess knowledge about the business of the corporation and web development. You mix these together to do your job. In this case, you will probably pick up marketing savvy and current trends along with the latest AJAX techniques.

As you make successive passes (iterations) through the website design interacting with marketing, your domain knowledge is built and improves. As your domain knowledge increases, the website will become more valuable to the corporation – as will you.

Iteration adds value.

Got Iteration?

The same can be said for database development.

Perhaps you’ve experienced this in your own database development efforts: you receive a request for a database design to meet some desired functionality. Or you’re handed a design and asked to optimize it. Or maybe even you had an idea to capture data – performance metrics or something similar – and you’re designing a database solution to accomplish this.

You get into the development a few hours or a few days and realize a little tweak here or there would improve performance, or readibility, or better adapt the design to your intentions. So you make the tweak and continue.

This improvement leads you to re-examine other portions of the design and you make more tweaks. Maybe your last change broke things. Maybe you see an opportunity to add a parameter to a stored procedure and combine the business logic of three stored procedures into one.

A “Growing” Solution 

Pretty soon, you have iterated enough to feel comfortable promoting, integrating, or even releasing the results – letting the effort move to the next step.

Depending on the nature of your efforts, it may not end there. If your database development is the back end of a larger application – say, the corporate website, for example – there will likely be requests for changes over time as the site grows (scales) in complexity and size.

When the requests come in you are not likely to start over. You will most likely build and improve upon your existing knowledge. You will most likely iterate.

Scaling forces iteration.

Voilà

This is how solutions mature – be they applications, databases, or both – regardless of who writes them or how many are involved in the development effort. It doesn’t matter if the development team is one lady in a cubicle in the European Union or a development team of thousands at Microsoft.

Iteration matures software.

:{> Andy

SSIS Design Pattern – Dynamic SQL

Introduction

I sometimes miss the ActiveX task in DTS. It was cool because “if you could see it, you could change it.” I used the ActiveX Task to generate dynamic SQL.

When To Use Dynamic SQL In SSIS

There are lots of uses for dynamic SQL in SSIS. If you find yourself using the same SQL statements over and over again with minor modifications – a different table name here, a different criteria value there, etc. – the statement is a likely candidate for dynamic SQL.

The flexibility of dynamic SQL coupled with the flexibility of SSIS allow for some very creative solutions. You can generate SQL based on environmental conditions – whether or not a directory exists; or using a Loop Container, set numeric criteria based on the value of the iterator; or in a ForEach Loop Container you can acces and include filenames in dynamically-generated SQL. The possibilties are limited only by our experience and knowledge of SSIS.

Variable On Variable Technology!

One of the best methods for generating dynamic SQL in SSIS is to use variables and utilize the EvaluateAsExpression property to update the value of the variable containing the dynamic SQL. This SSIS Design Pattern demonstrates how to generate dynamic SQL using the EvaluateAsExpression property of variables. 

To start, open Business Intelligence Developer Studio (or Visual Studio – they’re the same). Create a new Integration Services project. Right-click any whitespace on the Control Flow canvas and click variables:

When the Variables dialog displays, click the Add Variable button on the Variables menu bar:

Name the variable sSQL and set the Data Type to String. Set the value to “Select ContactID, NameStyle, Title, FirstName, MiddleName, LastName, Suffix, EmailAddress FROM Person.Contact WHERE ContactID = 0”. sSQL is our template variable – we will use this initial value (that returns no rows) as the basis of our dynamic SQL.

Note: In this example I only configure and display the dynamic SQL – I do not assign it to an ExecuteSQL (or some other) Task. But if I did, the SQL would have to parse or the task would throw a validation error. There are two ways around this error:

  1. Use a valid statement in the template SQL.
  2. Set the DelayValidation property of the ExecuteSQL Task to True.

Create another variable of Int32 Data Type named iContactID. iContactID will contain the dynamic portion of outr SQL statement. Set iContactID’s Value to 11:

Click on sSQL and press the F4 key to view the properties for sSQL. Set EvaluateAsExpression to True.

Click Expression, then click the ellipsis. In the Expression Builder, enter “Select ContactID, NameStyle, Title, FirstName, MiddleName, LastName, Suffix, EmailAddress FROM Person.Contact WHERE ContactID = ” + (DT_STR, 4, 1252)@[User::iContactID].

We are building a string variable, but iContactID is an Int32 data type. To concatenate it to a string, we need to cast the integer to a string.

In SSIS Expression Language, casting is accomplished using Type Casts. Drag (or type) the cast function just in front of the variable, i.e. (DT_STR, 4, 1252) @[User::iContactID]. All together it will look like the following:

Click the Evaluate Expression button to test. Your SQL statement should display in the two-line label above the button. Then click Ok to exit the Expression Builder.

A note about building the expression: I am sometimes tempted to use the REPLACE string function in Expression Builder to build dynamic SQL for variables. This usually turns out to be a bad idea because the value of the variable itself is updated when it’s evaluated as an expression. This means if I have a variable named sSQL with an initial value like “SELECT <ReplaceMe> FROM Person.Contact” and I use an expression like REPLACE(@sSQL, “<ReplaceMe>”, “ContactID”) it will work just fine on the first pass, but will fail to update the variable value on subsequent passes because the Search Phrase is no longer contained in the value of @sSQL – it was updated when the variable was evaluated as an expression.

Add a Script Task to the Control Flow and name it “Popup sSQL”. On the Script screen add sSQL to the ReadOnlyVariables list:

Click the Design Script button and add a MsgBox to popup the value of sSQL:

Close the Script Task. Right-click the Script Task and click Execute Task:

The value of sSQL should popup in a message box – with the value for iContactID displayed in the WHERE clause:

Taking It To The Street 

Although I do not demonstrate it in this post, you would use something similar to this example in a Loop Container. If you use iContactID as the iterator of the loop, you could start at some value and advance to some other value, executing dynamic SQL inside the loop on each pass. 

Conclusion 

I hope this helps you design dynamic SQL in SSIS. If you still need the advanced functionality of a script you can use the Script Task.

Download the code (Registration required)!

:{> Andy

 PS – The PASS Summit 2015 Call for Speakers closes at 9:00 PM PDT Sunday, 15 Mar 2015 (04:00 GMT 16 Mar 2015). There’s still time, but hurry!

Learn more: 
Linchpin People Blog: SSIS 
Stairway to Biml 
Stairway to Integration Services 

SSIS2014DesignPatterns200

PASS Webcast Thursday

Although it doesn’t mention my name on the PASS website, I’m presenting the PASS webcast tomorrow on Team Edition for Database Professionals.

If you’re interested you can sign up to attend at the PASS website. Hope to see you there!

:{> Andy

A Follow-up to Database Professionals: An Enterprise Requirement

Eric Wise drew some heat from the developer community at CodeBetter.com with this post about the need for a DBA during development (see my post on the subject here).

I think Eric makes a couple good points, one explicit, one implied:

1. (Explicit) A DBA – or Database Developer, more accurately (and there is a difference) – adds value to development.

2. (Implicit) There are Software Developers out there who can step into the Database Developer role long enough to solve most database tuning issues. Eric demonstrates this with himself in profiling and addressing a missing or ill-defined index.

I find most of the comments – presumably by software developers – typical. One developer stated:

My current project didn’t have a DBA for 2 years, until recently since we’re now at the stage of optimizing for performance. It seems to me that as long as the database is intelligently structured in the first place, a DBA’s role would be rather small in most cases.

<sarcasm>

I agree with the sentiment expressed here – as much as I agree that code-generation tools can replace application developers. It’s true that you can utilize SQL Server or any database engine as a dumb file store. And it’s equally true that you can build an enterprise application in C# that consists of thousands upon thousands of lines of nested If… Then… Else statements.

</sarcasm>

The question is: Why would you?

This goes beyond arguments over syntax, coding standards, methodology, and design philosophy. This is about putting competent professionals – at the height of their game – into the mix on a project.

You don’t have to take my word for it – ask software developers who have worked (or are working) with competent database developers.

:{> Andy

Database Professionals: An Enterprise Requirement

A friend (who shall remain nameless) recently told me his company interviewed a competent database developer and DBA. All seemed in agreement an offer would be forthcoming until the very end of the recruiting process. At that time, someone made the comment “we don’t need a DBA.”

It would be notable if this sentiment wasn’t so widespread – but I see it often. How often? Well, I would have to tell you how I see it to qualify that statement:

You see, people rarely say to me “We don’t need a DBA.” Mostly I see it in their applications – many of them prominent companies in which you may even own stock. I can tell when I examine their schema. I can see it when I execute Profiler against their SQL Server database.

Now, there are lots of reasons to design a denormalized schema. And there are lots of reasons to encapsulate the business rules in code. This is not what I’m talking about – though some of these systems would clearly perform better (or at all, in extreme cases) if they took advantage of better design patterns.

I’m talking about designs where this much is obvious:

1. At least two people designed the data layer; and
2. They did not communicate during the process.

Often, enterprise-level database design is shoveled onto developers as a secondary task. No, I’m not making this up – it’s too tragic to joke about. There are developers who can handle this task. But there are more who believe they are database developers than actually are. (Before I became a SQL Server DBA I was a developer who thought I was a SQL Server DBA…)

There will doubtless be readers who can provide examples of how their enterprise application was built by junior developers who did the database and code work and whose systems are performing just fine. I’m happy for you and sincerely hope the system scales. 

Designing a scalable solution  – database, application, or enterprise architecture – is one of those things that consumes time, thinking, resources, and money during the early phases of an enterprise development cycle. But it is – hands down – one of the best investments (if not the best) in the solution.

In today’s market, scalability is as optional as security. And like security, a scalable design is not something you “add later.” It’s not part of the foundation – it is the foundation.

My experiences with designing scalable solutions has proven there is no free lunch nor any shortcuts that work. If anyone – me included – skips the work of designing for scalability, there comes a day when they (or I) must pay the fiddler. From what I hear and have experienced, designing in this fashion is most often sacrificed on the altar of the deadline. Trust me, if it falls apart in six weeks or six months, you haven’t saved any time – and you may have lost a job or a customer.

Someone told me this and I remember because it has proven true several times over: “Deliver quality late, no one remembers. Deliver junk on time, no one forgets.

If you’re building (or upgrading to) cutting edge technology, you need a DBA.

:{> Andy

Technorati Tags: Software Business scalable database design quality

Which "flavor" DBA are you?

This re-post from my Applied Team System blog (which was a repost from my old SQL Server Central blog) was inspired by James Luetkehoelter‘s Agile Development and the “DBA” post a few days back. I’ll have more to say about database developers, agile methodologies, and the need for a DBA in posts to come. Enjoy!

   I received a cool compliment today from a peer who’s a developer. He said, “You know, I really like having a DBA on my team!” I have to tell you, it made my whole day!

   It led to a discussion about past experiences and expectations, and I shared something I thought was pretty much common knowledge: there are three types of DBAs. My peer was shocked, so maybe the knowledge isn’t so common after all.

   The three “flavors” of DBAs I define are:

  1. System, Operations, or Production Support DBAs – these DBAs write maintenance plans in notepad and have no qualms whatsoever about executing in command-line. They were DBAs in the old days, when we carved our own ICs out of wood. They will get your server and database back online fast – and with less data corruption than anyone else on the planet. They live for torn pages and I/O faults.
  2. Application Support DBAs – these DBAs are familiar with one or more (usually complex) applications. I’m talking PeopleSoft, Seibel, and SAP here. If you want to customize a screen or write a one-off web application, you desperately need these folks.
  3. Database Developers – these DBAs are ruthless bit-heads. They use bigint and byte fields for masking binary states. They can optimize a stored procedure in their sleep and wrap an API around a database so developers never have to consider writing SQL that directly hits tables. They are performance freaks that will work 18-hour days on weekends to test in Production – when it’s “safe.”

   Do you think DBAs fall into these categories? Do you know any that do? Do you see yourself in there anywhere? Do you have more or less or different “flavors” for classifying DBAs? 

:{> Andy

Technorati Profile

SSIS Design Pattern – Incremental Loads

 

Update: For more up-to-date material on this topic, please see the Stairway to Incremental Loads at SQL Server Central for more information! (Free registration required). 
Introduction
Loading data from a data source to SQL Server is a common task. It’s used in Data Warehousing, but increasingly data is being staged in SQL Server for non-Business-Intelligence purposes.
Maintaining data integrity is key when loading data into any database. A common way of accomplishing this is to truncate the destination and reload from the source. While this method ensures data integrity, it also loads a lot of data that was just deleted.
Incremental loads are a faster and use less server resources. Only new or updated data is touched in an incremental load.
When To Use Incremental Loads
Use incremental loads whenever you need to load data from a data source to SQL Server.
Incremental loads are the same regardless of which database platform or ETL tool you use. You need to detect new and updated rows – and separate these from the unchanged rows.
Incremental Loads in Transact-SQL
I will start by demonstrating this with T-SQL:
0. (Optional, but recommended) Create two databases: a source and destination database for this demonstration:

CREATE DATABASE [SSISIncrementalLoad_Source]

CREATE DATABASE [SSISIncrementalLoad_Dest]

1. Create a source named tblSource with the columns ColID, ColA, ColB, and ColC; make ColID is a primary unique key:
USE SSISIncrementalLoad_Source
GO
CREATE TABLE dbo.tblSource
(ColID int NOT NULL
,ColA varchar(10) NULL
,ColB datetime NULL constraint df_ColB default (getDate())
,ColC int NULL
,constraint PK_tblSource primary key clustered (ColID))
2. Create a Destination table named tblDest with the columns ColID, ColA, ColB, ColC:
USE SSISIncrementalLoad_Dest
GO
CREATE TABLE dbo.tblDest
(ColID int NOT NULL
,ColA varchar(10) NULL
,ColB datetime NULL
,ColC int NULL)
3. Let’s load some test data into both tables for demonstration purposes:
USE SSISIncrementalLoad_Source
GO

— insert an “unchanged” row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(0, ‘A’, ‘1/1/2007 12:01 AM’, -1)

— insert a “changed” row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(1, ‘B’, ‘1/1/2007 12:02 AM’, -2)

— insert a “new” row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(2, ‘N’, ‘1/1/2007 12:03 AM’, -3)

USE SSISIncrementalLoad_Dest
GO

— insert an “unchanged” row
INSERT INTO dbo.tblDest
(ColID,ColA,ColB,ColC)
VALUES(0, ‘A’, ‘1/1/2007 12:01 AM’, -1)

— insert a “changed” row
INSERT INTO dbo.tblDest
(ColID,ColA,ColB,ColC)
VALUES(1, ‘C’, ‘1/1/2007 12:02 AM’, -2)

4. You can view new rows with the following query:

SELECT s.ColID, s.ColA, s.ColB, s.ColC
FROM SSISIncrementalLoad_Source.dbo.tblSource s
LEFT JOIN SSISIncrementalLoad_Dest.dbo.tblDest d ON d.ColID = s.ColID
WHERE d.ColID IS NULL

This should return the “new” row – the one loaded earlier with ColID = 2 and ColA = ‘N’. Why? The LEFT JOIN and WHERE clauses are the key. Left Joins return all rows on the left side of the join clause (SSISIncrementalLoad_Source.dbo.tblSource in this case) whether there’s a match on the right side of the join clause (SSISIncrementalLoad_Dest.dbo.tblDest in this case) or not. If there is no match on the right side, NULLs are returned. This is why the WHERE clause works: it goes after rows where the destination ColID is NULL. These rows have no match in the LEFT JOIN, therefore they must be new.

This is only an example. You occasionally find database schemas that are this easy to load. Occasionally. Most of the time you have to include several columns in the JOIN ON clause to isolate truly new rows. Sometimes you have to add conditions in the WHERE clause to refine the definition of truly new rows.

Incrementally load the row (“rows” in practice) with the following T-SQL statement:

INSERT INTO SSISIncrementalLoad_Dest.dbo.tblDest
(ColID, ColA, ColB, ColC)
SELECT s.ColID, s.ColA, s.ColB, s.ColC
FROM SSISIncrementalLoad_Source.dbo.tblSource s
LEFT JOIN SSISIncrementalLoad_Dest.dbo.tblDest d ON d.ColID = s.ColID
WHERE d.ColID IS NULL

5. There are many ways by which people try to isolate changed rows. The only sure-fire way to accomplish it is to compare each field. View changed rows with the following T-SQL statement:

SELECT d.ColID, d.ColA, d.ColB, d.ColC
FROM SSISIncrementalLoad_Dest.dbo.tblDest d
INNER JOIN SSISIncrementalLoad_Source.dbo.tblSource s ON s.ColID = d.ColID
WHERE (
(d.ColA != s.ColA)
OR (d.ColB != s.ColB)
OR (d.ColC != s.ColC)
)

This should return the “changed” row we loaded earlier with ColID = 1 and ColA = ‘C’. Why? The INNER JOIN and WHERE clauses are to blame – again. The INNER JOIN goes after rows with matching ColID’s because of the JOIN ON clause. The WHERE clause refines the resultset, returning only rows where the ColA’s, ColB’s, or ColC’s don’t match and the ColID’s match. This is important. If there’s a difference in any or some or all the rows (except ColID), we want to update it.

Extract-Transform-Load (ETL) theory has a lot to say about when and how to update changed data. You will want to pick up a good book on the topic to learn more about the variations.

To update the data in our destination, use the following T-SQL:

UPDATE d
SET
d.ColA = s.ColA
,d.ColB = s.ColB
,d.ColC = s.ColC
FROM SSISIncrementalLoad_Dest.dbo.tblDest d
INNER JOIN SSISIncrementalLoad_Source.dbo.tblSource s ON s.ColID = d.ColID
WHERE (
(d.ColA != s.ColA)
OR (d.ColB != s.ColB)
OR (d.ColC != s.ColC)
)

Incremental Loads in SSIS
Let’s take a look at how you can accomplish this in SSIS using the Lookup Transformation (for the join functionality) combined with the Conditional Split (for the WHERE clause conditions) transformations.
Before we begin, let’s reset our database tables to their original state using the following query:

USE SSISIncrementalLoad_Source
GO

TRUNCATE TABLE dbo.tblSource

— insert an “unchanged” row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(0, ‘A’, ‘1/1/2007 12:01 AM’, -1)

— insert a “changed” row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(1, ‘B’, ‘1/1/2007 12:02 AM’, -2)

— insert a “new” row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(2, ‘N’, ‘1/1/2007 12:03 AM’, -3)

USE SSISIncrementalLoad_Dest
GO

TRUNCATE TABLE dbo.tblDest

— insert an “unchanged” row
INSERT INTO dbo.tblDest
(ColID,ColA,ColB,ColC)
VALUES(0, ‘A’, ‘1/1/2007 12:01 AM’, -1)

— insert a “changed” row
INSERT INTO dbo.tblDest
(ColID,ColA,ColB,ColC)
VALUES(1, ‘C’, ‘1/1/2007 12:02 AM’, -2)

Next, create a new project using Business Intelligence Development Studio (BIDS). Name the project SSISIncrementalLoad:

Once the project loads, open Solution Explorer and rename Package1.dtsx to SSISIncrementalLoad.dtsx:

When prompted to rename the package object, click the Yes button. From the toolbox, drag a Data Flow onto the Control Flow canvas:

Double-click the Data Flow task to edit it. From the toolbox, drag and drop an OLE DB Source onto the Data Flow canvas:

Double-click the OLE DB Source connection adapter to edit it:

Click the New button beside the OLE DB Connection Manager dropdown:

Click the New button here to create a new Data Connection:

Enter or select your server name. Connect to the SSISIncrementalLoad_Source database you created earlier. Click the OK button to return to the Connection Manager configuration dialog. Click the OK button to accept your newly created Data Connection as the Connection Manager you wish to define. Select “dbo.tblSource” from the Table dropdown:

Click the OK button to complete defining the OLE DB Source Adapter.

Drag and drop a Lookup Transformation from the toolbox onto the Data Flow canvas. Connect the OLE DB connection adapter to the Lookup transformation by clicking on the OLE DB Source and dragging the green arrow over the Lookup and dropping it. Right-click the Lookup transformation and click Edit (or double-click the Lookup transformation) to edit:

When the editor opens, click the New button beside the OLE DB Connection Manager dropdown (as you did earlier for the OLE DB Source Adapter). Define a new Data Connection – this time to the SSISIncrementalLoad_Dest database. After setting up the new Data Connection and Connection Manager, configure the Lookup transformation to connect to “dbo.tblDest”:

Click the Columns tab. On the left side are the columns currently in the SSIS data flow pipeline (from SSISIncrementalLoad_Source.dbo.tblSource). On the right side are columns available from the Lookup destination you just configured (from SSISIncrementalLoad_Dest.dbo.tblDest). Follow the following steps:

1. We’ll need all the rows returned from the destination table, so check all the checkboxes beside the rows in the destination. We need these rows for our WHERE clauses and for our JOIN ON clauses.

2. We do not want to map all the rows between the source and destination – we only want to map the columns named ColID between the database tables. The Mappings drawn between the Available Input Columns and Available Lookup Columns define the JOIN ON clause. Multi-select the Mappings between ColA, ColB, and ColC by clicking on them while holding the Ctrl key. Right-click any of them and click “Delete Selected Mappings” to delete these columns from our JOIN ON clause.

3. Add the text “Dest_” to each column’s Output Alias. These rows are being appended to the data flow pipeline. This is so we can distinguish between Source and Destination rows farther down the pipeline:

Next we need to modify our Lookup transformation behavior. By default, the Lookup operates as an INNER JOIN – but we need a LEFT (OUTER) JOIN. Click the “Configure Error Output” button to open the “Configure Error Output” screen. On the “Lookup Output” row, change the Error column from “Fail component” to “Ignore failure”. This tells the Lookup transformation “If you don’t find an INNER JOIN match in the destination table for the Source table’s ColID value, don’t fail.” – which also effectively tells the Lookup “Don’t act like an INNER JOIN, behave like a LEFT JOIN”:

Click OK to complete the Lookup transformation configuration.

From the toolbox, drag and drop a Conditional Split Transformation onto the Data Flow canvas. Connect the Lookup to the Conditional Split as shown. Right-click the Conditional Split and click Edit to open the Conditional Split Editor:

Expand the NULL Functions folder in the upper right of the Conditional Split Transformation Editor. Expand the Columns folder in the upper left side of the Conditional Split Transformation Editor. Click in the “Output Name” column and enter “New Rows” as the name of the first output. From the NULL Functions folder, drag and drop the “ISNULL( <<expression>> )” function to the Condition column of the New Rows condition:

Next, drag Dest_ColID from the columns folder and drop it onto the “<<expression>>” text in the Condition column. “New Rows” should now be defined by the condition “ISNULL( [Dest_ColID] )”. This defines the WHERE clause for new rows – setting it to “WHERE Dest_ColID Is NULL”.

Type “Changed Rows” into a second Output Name column. Add the expression “(ColA != Dest_ColA) || (ColB != Dest_ColB) || (ColC != Dest_ColC)” to the Condition column for the Changed Rows output. This defines our WHERE clause for detecting changed rows – setting it to “WHERE ((Dest_ColA != ColA) OR (Dest_ColB != ColB) OR (Dest_ColC != ColC))”. Note “||” is used to convey “OR” in SSIS Expressions:

Change the “Default output name” from “Conditional Split Default Output” to “Unchanged Rows”:

Click the OK button to complete configuration of the Conditional Split transformation.

Drag and drop an OLE DB Destination connection adapter and an OLE DB Command transformation onto the Data Flow canvas. Click on the Conditional Split and connect it to the OLE DB Destination. A dialog will display prompting you to select a Conditional Split Output (those outputs you defined in the last step). Select the New Rows output:

Next connect the OLE DB Command transformation to the Conditional Split’s “Changed Rows” output:

Your Data Flow canvas should appear similar to the following:

Configure the OLE DB Destination by aiming at the SSISIncrementalLoad_Dest.dbo.tblDest table:

Click the Mappings item in the list to the left. Make sure the ColID, ColA, ColB, and ColC source columns are mapped to their matching destination columns (aren’t you glad we prepended “Dest_” to the destination columns?):

Click the OK button to complete configuring the OLE DB Destination connection adapter.

Double-click the OLE DB Command to open the “Advanced Editor for OLE DB Command” dialog. Set the Connection Manager column to your SSISIncrementalLoad_Dest connection manager:

Click on the “Component Properties” tab. Click the elipsis (button with “…”) beside the SQLCommand property:

The String Value Editor displays. Enter the following parameterized T-SQL statement into the String Value textbox:

UPDATE dbo.tblDest
SET
ColA = ?
,ColB = ?
,ColC = ?
WHERE ColID = ?

The question marks in the previous parameterized T-SQL statement map by ordinal to columns named “Param_0” through “Param_3”. Map them as shown below – effectively altering the UPDATE statement for each row to read:

UPDATE SSISIncrementalLoad_Dest.dbo.tblDest
SET
ColA = SSISIncrementalLoad_Source.dbo.ColA
,ColB = SSISIncrementalLoad_Source.dbo.ColB
,ColC = SSISIncrementalLoad_Source.dbo.ColC
WHERE ColID = SSISIncrementalLoad_Source.dbo.ColID

Note the query is executed on a row-by-row basis. For performance with large amounts of data, you will want to employ set-based updates instead.

Click the OK button when mapping is completed.

Your Data Flow canvas should look like that pictured below:

If you execute the package with debugging (press F5), the package should succeed and appear as shown here:

Note one row takes the “New Rows” output from the Conditional Split, and one row takes the “Changed Rows” output from the Conditional Split transformation. Although not visible, our third source row doesn’t change, and would be sent to the “Unchanged Rows” output – which is simply the default Conditional Split output renamed. Any row that doesn’t meet any of the predefined conditions in the Conditional Split is sent to the default output.

That’s all! Congratulations – you’ve built an incremental database load! [:)]

Check out the Stairway to Incremental Loads at SQL Server Central for more information! (Free registration required)

:{> Andy

Learn more:
Expert SSIS Training (Live, Online)
IESSIS1: Immersion Event on Learning SQL Server Integration Services (In-person)
Biml Academy
Stairway to Biml
Stairway to Integration Services

SSIS2014DesignPatterns200