Bad IT Project Management

My sister-in-law recently passed the PMP certification (congratulations Heather!). I’m waiting for a call from her asking if we need to add resources.


The resources comment above is a joke, but it isn’t really that funny. It’s indicative of my decades of experience with bad IT project managers.

I believe and hope I have worked with some of the worst project managers on the planet. Why do I hope? I’d hate to think anyone has had to deal with folks worse than the poor project managers I’ve dealt with.

As I type, we’re experiencing a heat wave in Farmville, Virginia. It was 107 degrees Fahrenheit here yesterday. It’s the “dog days of summer,” as my Granny used to call them.

Somewhere, you will find 30 or more push mowers lined up wheel-to-wheel along one axis of a lawn. On command, the 30+ operators will start their mowers. On cue, they will push them across the lawn, maintaining the wheel-to-wheel alignment, cutting the entire area in one pass.

This, my friend, is the home of an IT project manager.


My experiences have led me to a couple thoughts, chief among them: a certification does not, by itself, make a good project manager.

(The same can be said for MBAs, but that’s for another post…)

Coined in Fred Brooks’ book The Mythical Man-Month, Brooks’ Law states “Adding manpower to a late software project makes it later.” It is perhaps best summed up by the following statement by Brooks himself: “The bearing of a child takes nine months, no matter how many women are assigned.”

As an IT professional, you can learn to detect when you’re about to be “managed”. I share the following indicators and advice from my years of experience:

“Do we need to add more resources?” This question in and of itself is harmless. It’s actually the way project managers greet each other and has no more meaning to ordinary folk than “How are you doing today?” or “How about this weather?”

The best answer to this question is a non-answer. After years of trying to correctly answer this (as though it were a meaningful question), I stumbled across an answer that works for me: “I don’t know.” Why does this work so well? The last thing a bad IT project manager wants to do is make a decision – at least one traceable to them.

“I am (or used to be) a software developer.” If you hear this, you’re in trouble. Big, big trouble. My advice to you is to vacate the project – and the premises – as quickly as possible. This isn’t a fire evacuation; it’s a bomb evacuation. You may wish to consider jumping out a window if you’re on or below the third floor.

Why? You are dealing with a person who believes they were promoted because they were such a good developer. Mind you, this is true in less than 25% of my experience. And even then, odds are their resume includes COBOL or they aren’t nearly the project manager they believe themselves to be. At best you have 1/3rd of a 25% chance that you’re working for someone who knows a definition for delegate – a definition that isn’t “someone who attends a convention.”

The truth of the matter is this person was likely promoted before they could delay or otherwise further damage the software project to which they were assigned.

“What do I tell my boss (or the stakeholders)?” This question is the prelude to a demand. Your answer isn’t important; the demand in the mind of the IT project manager is important. And that demand is for you to do something no sane developer would ever do.

There are a couple options. If you’re feeling froggy, you can document the fact you were asked to take this ridiculous course of action by your IT project manager, and then do it. Be sure to address the issue in writing and as soon as possible. CC someone else – anyone else. If you can CC the project manager’s boss without looking like you’re trying to make them look stupid, that’s best. If not, CC someone else at your level on the development team (and allow the bad IT project manager to continue their work of making themselves look stupid unassisted).

Note: Never BCC. BCC’ing the boss is the equivalent of placing a bold, red, flashing banner across the top of your message which states “I’M INSECURE”. The boss will get this message, loud and clear. Go ahead and CC them if you believe it’s warranted – those dogs need to wake up eventually.

Make sure it’s in writing and someone else sees it – that’s the point.

The other option is to simply ignore it and do what you know to be right and good. There’s risk here too. Some bad IT project managers will call in bigger dogs to shout you down. It’s good to have your mugshot and name on a book somewhere if you’re going to exercise this option.

“Umm yeah. I’m going to need you to come in Saturday. Sunday’s not looking good either…” People are people. Bad IT project managers don’t get that. They call people “resources”. People aren’t resources. We use resources, but we’re separate and distinct from resources. People are people.


Bad IT project managers are the reason we have IT Project Leads. After all, someone who knows what they’re talking about needs to have some authority if any software project is to stand a chance of succeeding.

:{> Andy

PS – This post inspired a new category at Applied Team System: Expensive Management Practices – gotta love the acronym. :{>

SSIS Design Pattern – ETL Instrumentation, Part 1

Introduction
 
SSIS is a fantastic ETL engine. When I build an ETL solution in SSIS, I like to collect runtime metrics. Why? I use the data initially to determine a baseline for performance and to determine, in some cases, if I’m loading within a defined window.
 
I refer to this process of collecting performance and runtime metrics as ETL Instrumentation and I collect this data for every enterprise ETL solution I design.
 
The types of things I want to know are:
  • Did the process start when I wanted it to start?
  • Did it succeed?
    • If not, is it still running?
    • Did it fail?
  • How long did it execute?
  • How does recent performance compare to past performance?

In this mini-series entitled SSIS Design Pattern – ETL Instrumentation, I will demonstrate how I capture, report, and measure these data using SSIS, SSRS, SQL Agent Jobs, and Transact-SQL.

In Part 1, we build an elementary system. In subsequent parts, we will improve upon the concepts demonstrated here.

To Instrument Or Not To Instrument; That Is The Question
 
My thoughts on this are best summed up thus:
 
There are two types of enterprise data warehouse developers:
1) Those who collect ETL instrumentation data; and
2) Those who will.
 
Why do I make this statement? When you build ETL to load your enterprise data warehouse, you are building the source for the reporting system. Should anything unfortunate happen to the data between the time you read it from the System Of Record and the time it lands in your Reporting Server Database, you are basically guilty until proven innocent. ETL Instrumentation will provide evidence for (and sometimes against) you as you defend your ETL design and implementation.
 
Building The Database
 
Before we instrument an ETL process, we need a database solution for the data. We’ll start by building the database itself, then add objects.
 
Let’s build the database. We’ll call it SSISRunTimeMetrics. Open SQL Server Management Studio (SSMS) and execute the following Transact-SQL:
 
 
use master
go

if not exists(select name
              from master.sys.databases
              where name = 'SSISRunTimeMetrics')
 Create Database SSISRunTimeMetrics
go

 
   This script creates the SSISRunTimeMetrics database. Next, let’s create a schema and table to hold our metrics:
 
 
use SSISRunTimeMetrics
go

-- vars...
declare @sql varchar(255)

-- create ssis schema...
if not exists(select name
              from sys.schemas
              where name = 'ssis')
 begin
  set @sql = 'Create Schema ssis'
  exec(@sql)
 end

-- create RunTimeMetrics table...
if not exists(select s.name + '.' + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = 'RunTimeMetrics'
               and s.name = 'ssis')
 Create Table ssis.RunTimeMetrics
  (id int identity(1,1)
  ,packageStartDateTime datetime null
  ,packageEndDateTime datetime null
  ,packageName varchar(255) null
  ,packageVersion varchar(25) null
  ,packageStatus varchar(25) null)

Next, let’s add a stored procedure to record a package start event:
 

use SSISRunTimeMetrics
go
 
if exists(select s.name + '.' + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = 'usp_RecordPackageStart'
           and s.name = 'ssis')
 begin
  Drop Procedure ssis.usp_RecordPackageStart
 end
go
 
Create Procedure ssis.usp_RecordPackageStart
  @packageStartDateTime datetime = null
 ,@packageName varchar(255) = null
 ,@packageVersion varchar(25) = null
 ,@packageStatus varchar(25) = null
As
 begin
  -- set @packageStartDateTime default...
  declare @Now datetime
  set @Now = GetDate()

  if (@packageStartDateTime Is Null)
   set @packageStartDateTime = @Now

  -- insert the run time metrics data...
  insert into ssis.RunTimeMetrics
   (packageStartDateTime
   ,packageName
   ,packageVersion
   ,packageStatus)
  values
  (@packageStartDateTime
  ,@packageName
  ,@packageVersion
  ,@packageStatus)

 end

go

 
That’s all for now. We’ll add more stuff later.
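Before wiring the procedure into SSIS, you can smoke-test it from SSMS with a script like this (the package name and version values here are made up for the test):

use SSISRunTimeMetrics
go

-- record a test "start" event using hypothetical values...
exec ssis.usp_RecordPackageStart
  @packageName = 'TestPackage'
 ,@packageVersion = '1.0.0'
 ,@packageStatus = 'Started'

-- inspect the row we just inserted...
select * from ssis.RunTimeMetrics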
 
Building The SSIS Project
 
Open SQL Server Business Intelligence Development Studio (BIDS) and create a new project. When the New Project dialog displays, select Business Intelligence Projects from the Project types list and Integration Services Project from the Templates. Name the project SSISRunTimeMetrics and click the OK button: 
 
 
Rename the Package SSISRunTimeMetrics.dtsx:
 
 
Always click the Yes button when you see this dialog (trust me):
 
 
Right-click in the white-space on the Control Flow canvas and click Add Annotation to document your code:
 
 
I can’t say enough about good documentation. You never know when someone else will have to open your source code to troubleshoot.
 
Right-click in the white-space of the Control Flow canvas and click Variables. Add a variable named sVersion. Make sure the Scope is the SSISRunTimeMetrics package and make the Data Type String:
 

 

Add a Sequence Container and change the name to “Step 1 – Do some stuff”:
 
 
Sequence Containers are nice for several reasons:
  • They’re aesthetic. You can break up chunks of functionality logically… how cool!
  • When troubleshooting, you can execute the contents of a container from the right-click context menu.
  • You can isolate chunks of functionality using transactions by container.
Next, add an Execute SQL Task and name it “Log Start of Package Execution”:
 
 
Double-click the Execute SQL Task to edit it. Click Connection and select “<New connection…>”:
 
 
When the Configure OLE DB Connection Manager dialog displays, click the “New…” button:
 
 
Enter or select your server in the Server Name dropdown. In the “Connect to a database” panel, select the “Select or enter a database name” option and enter or select the SSISRunTimeMetrics database you created earlier. Click the OK button to proceed:
 
 
Then click the OK button again to set the Connection property of the Execute SQL Task:
 
 
Select the SQL Statement property and click the ellipsis. When the “Enter SQL Query” dialog displays, enter the following parameterized Transact-SQL statement:
 
Exec ssis.usp_RecordPackageStart ?,?,?,'Started'
 
Click the OK button to proceed:
 
 
Click on the Parameter Mapping page and configure parameters as shown:
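For reference, a mapping that works: System::StartTime to parameter ordinal 0 (Data Type DATE), System::PackageName to ordinal 1 (VARCHAR), and User::sVersion to ordinal 2 (VARCHAR). With that mapping, each execution effectively runs a statement like the following – the literal values are examples only:

Exec ssis.usp_RecordPackageStart '3/10/2015 9:00 AM', 'SSISRunTimeMetrics', '1.0.4', 'Started'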
 
 
Click the OK button to proceed.
 
Next, let’s do some work on the sVersion variable. Click on the sVersion variable name and press the F4 key to display variable properties. Change the EvaluateAsExpression property to True, then click on the Expression property’s ellipsis:
 
 
When the Expression Builder displays, enter the following SSIS Expression Language text to build a version string in the format Major.Minor.Build:
 
(DT_WSTR, 3) @VersionMajor + "." + (DT_WSTR, 3) @VersionMinor + "." + (DT_WSTR, 3) @VersionBuild
 
 
Click the “Evaluate Expression” button to check the expression. Then click the OK button to close the Expression Builder:
 
 
Add a Script Task and configure it to popup a message box displaying the value of sVersion:
 
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
 
Public Class ScriptMain
  Public Sub Main()
 
    Dim sVersion As String = Dts.Variables("sVersion").Value.ToString
    Dim sMsg As String
 
    sMsg = "Version: " & sVersion
    MsgBox(sMsg)
    Dts.TaskResult = Dts.Results.Success
 
  End Sub
End Class
 
 
Execute the package. Then execute the following Transact-SQL query to check the results:

select * from ssis.RunTimeMetrics

 
It’s time to add a procedure to log the end of package execution, but we have a problem: How do we identify the row we just added to the ssis.RunTimeMetrics table? Think about it. We just inserted a row, and since that’s the only row in the table it’s pretty simple to identify. Later, we will have lots of rows in the table in varying states. We need a method to positively identify the row we’re working with.
 
We could use the last row inserted – apply Max(id) criteria – but what happens as our metrics grow to include multiple packages? One package could start while another was executing and we’d update the wrong row.
 
We require a means to positively identify the row when added to the table. Such a means exists. If we modify the ssis.usp_RecordPackageStart stored procedure we can have it return the value of the inserted ID. The OUTPUT clause gives us access to the inserted and deleted virtual tables. Modify the stored procedure by adding the following OUTPUT clause as shown:
 
output inserted.id as 'Inserted_ID'
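In an INSERT statement, the OUTPUT clause belongs between the column list and the VALUES clause, so the insert inside the procedure becomes:

  -- insert the run time metrics data...
  insert into ssis.RunTimeMetrics
   (packageStartDateTime
   ,packageName
   ,packageVersion
   ,packageStatus)
  output inserted.id as 'Inserted_ID'
  values
  (@packageStartDateTime
  ,@packageName
  ,@packageVersion
  ,@packageStatus)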
 
Execute the Transact-SQL to drop and recreate the ssis.usp_RecordPackageStart stored procedure:
 
 
Add another variable – an Int32 named iPackageLoadID:
 
 
Back in the Execute SQL Task, change the ResultSet property to “Single row”:
 
 
Click the Result Set page and add a new resultset. Name it 0 (ordinals work well) and assign the results of this query to the iPackageLoadID variable you just created. This will push the results of the OUTPUT clause in the stored procedure (which will include the id of the newly inserted row) into the iPackageLoadID variable. You can then access it later in the package to update the precise row:
 
 
Alter the Script Task by adding iPackageLoadID to the list of ReadOnlyVariables:
 
 
Alter the VB.Net script to read:
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
 
Public Class ScriptMain
  Public Sub Main()
 
    Dim sVersion As String = Dts.Variables("sVersion").Value.ToString
    Dim sPackageLoadID As String = Dts.Variables("iPackageLoadID").Value.ToString
    Dim sMsg As String
 
    sMsg = "Version: " & sVersion & vbCrLf
    sMsg = sMsg & "PackageLoadID: " & sPackageLoadID
    MsgBox(sMsg)
    Dts.TaskResult = Dts.Results.Success
 
  End Sub
End Class
 
 
Execute the package to test the new script:
 
 
Next, add a new stored procedure named ssis.usp_RecordPackageEnd to the SSISRunTimeMetrics database using the following Transact-SQL script:

use SSISRunTimeMetrics
go
 

if exists(select s.name + '.' + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = 'usp_RecordPackageEnd'
          and s.name = 'ssis')
 begin
  Drop Procedure ssis.usp_RecordPackageEnd
 end
go
Create Procedure ssis.usp_RecordPackageEnd
  @id int
 ,@packageEndDateTime datetime = null
 ,@packageStatus varchar(25) = null
As
 begin
  -- set @packageEndDateTime default...
  declare @Now datetime
  set @Now = GetDate()
  if (@packageEndDateTime Is Null)
   set @packageEndDateTime = @Now
 
  -- update the run time metrics data...
  update ssis.RunTimeMetrics
   Set packageEndDateTime = @packageEndDateTime
      ,packageStatus = @packageStatus
  where id = @id
 end
go
 
Now add a new Execute SQL Task named “Log End of Package Execution”:
 
 
Set the Connection property to your SSISRunTimeMetrics database and the SQLStatement property to the following Transact-SQL statement:
declare @Now datetime
set @Now = GetDate()
 
Exec ssis.usp_RecordPackageEnd ?,@Now,'Succeeded'
 
Click the Parameter Mapping page and configure the iPackageLoadID variable as an input variable of the Long data type, assigned to ordinal 0:
 
 
Click the OK button: 
 
 
Execute the package. Note that each time a change is saved, the VersionBuild property auto-increments, thus updating your sVersion variable value:
 
 
The package succeeds:
 
 
Examine the data stored in the ssis.RunTimeMetrics table. Note the latest row now contains a Start and End DateTime, and the packageStatus field now records “Succeeded”:
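With both start and end times recorded, we can begin answering the “How long did it execute?” question from the introduction. A simple sketch:

select id
 ,packageName
 ,packageStatus
 ,datediff(ss, packageStartDateTime, packageEndDateTime) as ExecutionSeconds
from ssis.RunTimeMetrics
order by packageStartDateTime desc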
 
 
We will work with these concepts more – fleshing them out by adding more robust code and additional functionality. For now, this is a good start. Congratulations!
 
Get the code!
 
:{> Andy

PS – The PASS Summit 2015 Call for Speakers closes at 9:00 PM PDT Sunday, 15 Mar 2015 (04:00 GMT 16 Mar 2015). There’s still time, but hurry!

Learn more: 
Linchpin People Blog: SSIS 
Stairway to Biml 
Stairway to Integration Services 


Iteration = Maturity

Introduction 

I was recently reminded that iteration matures software.

The History of Andy, Part 1 

Like many DBAs, I was a software developer in another life. I built web applications – working my way up from HTML through DHTML and finally to ASP – and could regale (and bore) you young whipper-snappers with war-stories of how things were “back in my day”. </DanaCarvey>

But I won’t.

The Times They Are a-Changin’

I’ll share instead something I’ve witnessed many times since starting with software in 1975 – and something you probably already know: stuff changes.

And thank goodness stuff changes!

I recently ordered 1G of RAM from an online retailer. It should arrive before my next son (but that’s not a given as Riley refuses to provide a tracking number – the doctors will induce Christy into labor Friday if he hasn’t been born by then – but I digress…). I remember my neighbor John, who introduced me to computers, purchased a 256-byte RAM chip in the mid-1970s for about what I paid for the 1G. That’s 256 bytes of RAM – not a typo. As I recall it was either a 14- or 16-pin IC.

Things have changed since then. Improvements in technology, brought about by building and improving upon existing knowledge, have brought us to a day when I can purchase 1,073,741,824 bytes for roughly what 256 bytes cost in the mid-1970s. I don’t know how you feel about that. I think it’s a good thing.

The idea of “building and improving upon existing knowledge” defines iterative development. Although the idea is relatively new to the software development field, it serves as the basis for engineering disciplines. Engineers iterate – build and improve upon existing knowledge – and we get more powerful hardware for the same amount of money. What’s not to like?

Iteration – it’s not just a good idea… 

Iterative software development builds and improves upon existing knowledge within a specific domain. Most domains are defined by an application (wholly or in part), enterprise knowledge (again, wholly or in part), or – most likely – some combination of the two. For example, let’s say you work for a large corporation as a software developer. Your domain could be the corporate website, in which case you possess knowledge about both the business of the corporation and web development. You mix these together to do your job. In this case, you will probably pick up marketing savvy and current trends along with the latest AJAX techniques.

As you make successive passes (iterations) through the website design interacting with marketing, your domain knowledge is built and improves. As your domain knowledge increases, the website will become more valuable to the corporation – as will you.

Iteration adds value.

Got Iteration?

The same can be said for database development.

Perhaps you’ve experienced this in your own database development efforts: you receive a request for a database design to meet some desired functionality. Or you’re handed a design and asked to optimize it. Or maybe even you had an idea to capture data – performance metrics or something similar – and you’re designing a database solution to accomplish this.

You get into the development a few hours or a few days and realize a little tweak here or there would improve performance, or readability, or better adapt the design to your intentions. So you make the tweak and continue.

This improvement leads you to re-examine other portions of the design and you make more tweaks. Maybe your last change broke things. Maybe you see an opportunity to add a parameter to a stored procedure and combine the business logic of three stored procedures into one.

A “Growing” Solution 

Pretty soon, you have iterated enough to feel comfortable promoting, integrating, or even releasing the results – letting the effort move to the next step.

Depending on the nature of your efforts, it may not end there. If your database development is the back end of a larger application – say, the corporate website, for example – there will likely be requests for changes over time as the site grows (scales) in complexity and size.

When the requests come in you are not likely to start over. You will most likely build and improve upon your existing knowledge. You will most likely iterate.

Scaling forces iteration.

Voilà

This is how solutions mature – be they applications, databases, or both – regardless of who writes them or how many are involved in the development effort. It doesn’t matter if the development team is one lady in a cubicle in the European Union or a development team of thousands at Microsoft.

Iteration matures software.

:{> Andy

SSIS Design Pattern – Dynamic SQL

Introduction

I sometimes miss the ActiveX task in DTS. It was cool because “if you could see it, you could change it.” I used the ActiveX Task to generate dynamic SQL.

When To Use Dynamic SQL In SSIS

There are lots of uses for dynamic SQL in SSIS. If you find yourself using the same SQL statements over and over again with minor modifications – a different table name here, a different criteria value there, etc. – the statement is a likely candidate for dynamic SQL.

The flexibility of dynamic SQL coupled with the flexibility of SSIS allows for some very creative solutions. You can generate SQL based on environmental conditions – whether or not a directory exists, for instance; or, in a For Loop Container, set numeric criteria based on the value of the iterator; or, in a Foreach Loop Container, access and include filenames in dynamically-generated SQL. The possibilities are limited only by our experience and knowledge of SSIS.

Variable On Variable Technology!

One of the best methods for generating dynamic SQL in SSIS is to use variables and utilize the EvaluateAsExpression property to update the value of the variable containing the dynamic SQL. This SSIS Design Pattern demonstrates how to generate dynamic SQL using the EvaluateAsExpression property of variables. 

To start, open Business Intelligence Development Studio (or Visual Studio – they’re the same). Create a new Integration Services project. Right-click any whitespace on the Control Flow canvas and click Variables:

When the Variables dialog displays, click the Add Variable button on the Variables menu bar:

Name the variable sSQL and set the Data Type to String. Set the value to "Select ContactID, NameStyle, Title, FirstName, MiddleName, LastName, Suffix, EmailAddress FROM Person.Contact WHERE ContactID = 0". sSQL is our template variable – we will use this initial value (that returns no rows) as the basis of our dynamic SQL.

Note: In this example I only configure and display the dynamic SQL – I do not assign it to an Execute SQL (or some other) Task. But if I did, the SQL would have to parse or the task would throw a validation error. There are two ways around this error:

  1. Use a valid statement in the template SQL.
  2. Set the DelayValidation property of the Execute SQL Task to True.

Create another variable of Int32 Data Type named iContactID. iContactID will contain the dynamic portion of our SQL statement. Set iContactID’s Value to 11:

Click on sSQL and press the F4 key to view the properties for sSQL. Set EvaluateAsExpression to True.

Click Expression, then click the ellipsis. In the Expression Builder, enter "Select ContactID, NameStyle, Title, FirstName, MiddleName, LastName, Suffix, EmailAddress FROM Person.Contact WHERE ContactID = " + (DT_STR, 4, 1252) @[User::iContactID].

We are building a string variable, but iContactID is an Int32 data type. To concatenate it to a string, we need to cast the integer to a string.

In SSIS Expression Language, casting is accomplished using Type Casts. Drag (or type) the cast function just in front of the variable, i.e. (DT_STR, 4, 1252) @[User::iContactID]. All together it will look like the following:

Click the Evaluate Expression button to test. Your SQL statement should display in the two-line label above the button. Then click OK to exit the Expression Builder.

A note about building the expression: I am sometimes tempted to use the REPLACE string function in Expression Builder to build dynamic SQL for variables. This usually turns out to be a bad idea because the value of the variable itself is updated when it’s evaluated as an expression. This means if I have a variable named sSQL with an initial value like "SELECT <ReplaceMe> FROM Person.Contact" and I use an expression like REPLACE(@sSQL, "<ReplaceMe>", "ContactID"), it will work just fine on the first pass, but will fail to update the variable value on subsequent passes because the search phrase is no longer contained in the value of @sSQL – it was updated when the variable was evaluated as an expression.
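If you really want REPLACE, one workaround is to keep the template in a second String variable that is never evaluated as an expression – call it sSQLTemplate (a hypothetical variable for this sketch) – and point sSQL’s expression at it:

REPLACE(@[User::sSQLTemplate], "<ReplaceMe>", (DT_STR, 4, 1252) @[User::iContactID])

Because sSQLTemplate itself is never updated, the search phrase survives from pass to pass.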

Add a Script Task to the Control Flow and name it “Popup sSQL”. On the Script screen add sSQL to the ReadOnlyVariables list:

Click the Design Script button and add a MsgBox to popup the value of sSQL:

Close the Script Task. Right-click the Script Task and click Execute Task:

The value of sSQL should popup in a message box – with the value for iContactID displayed in the WHERE clause:

Taking It To The Street 

Although I do not demonstrate it in this post, you would use something similar to this example in a For Loop Container. If you use iContactID as the iterator of the loop, you could start at some value and advance to some other value, executing dynamic SQL inside the loop on each pass. 
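A sketch of the For Loop Container configuration for that approach – the property values here are examples:

InitExpression:   @iContactID = 1
EvalExpression:   @iContactID <= 100
AssignExpression: @iContactID = @iContactID + 1

Because sSQL’s EvaluateAsExpression property is True, each pass through the loop re-evaluates the expression and refreshes the dynamic SQL with the current value of iContactID.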

Conclusion 

I hope this helps you design dynamic SQL in SSIS. If you still need the advanced functionality of a script you can use the Script Task.

Download the code (Registration required)!

:{> Andy

 PS – The PASS Summit 2015 Call for Speakers closes at 9:00 PM PDT Sunday, 15 Mar 2015 (04:00 GMT 16 Mar 2015). There’s still time, but hurry!

Learn more: 
Linchpin People Blog: SSIS 
Stairway to Biml 
Stairway to Integration Services 


PASS Webcast Thursday

Although it doesn’t mention my name on the PASS website, I’m presenting the PASS webcast tomorrow on Team Edition for Database Professionals.

If you’re interested you can sign up to attend at the PASS website. Hope to see you there!

:{> Andy

A Follow-up to Database Professionals: An Enterprise Requirement

Eric Wise drew some heat from the developer community at CodeBetter.com with this post about the need for a DBA during development (see my post on the subject here).

I think Eric makes a couple good points, one explicit, one implied:

1. (Explicit) A DBA – or Database Developer, more accurately (and there is a difference) – adds value to development.

2. (Implicit) There are Software Developers out there who can step into the Database Developer role long enough to solve most database tuning issues. Eric demonstrates this himself by profiling and addressing a missing or ill-defined index.

I find most of the comments – presumably by software developers – typical. One developer stated:

My current project didn’t have a DBA for 2 years, until recently since we’re now at the stage of optimizing for performance. It seems to me that as long as the database is intelligently structured in the first place, a DBA’s role would be rather small in most cases.

<sarcasm>

I agree with the sentiment expressed here – as much as I agree that code-generation tools can replace application developers. It’s true that you can utilize SQL Server or any database engine as a dumb file store. And it’s equally true that you can build an enterprise application in C# that consists of thousands upon thousands of lines of nested If… Then… Else statements.

</sarcasm>

The question is: Why would you?

This goes beyond arguments over syntax, coding standards, methodology, and design philosophy. This is about putting competent professionals – at the height of their game – into the mix on a project.

You don’t have to take my word for it – ask software developers who have worked (or are working) with competent database developers.

:{> Andy

Database Professionals: An Enterprise Requirement

A friend (who shall remain nameless) recently told me his company interviewed a competent database developer and DBA. All seemed in agreement an offer would be forthcoming until the very end of the recruiting process. At that time, someone made the comment “we don’t need a DBA.”

It would be notable if this sentiment wasn’t so widespread – but I see it often. How often? Well, I would have to tell you how I see it to qualify that statement:

You see, people rarely say to me “We don’t need a DBA.” Mostly I see it in their applications – many of them prominent companies in which you may even own stock. I can tell when I examine their schema. I can see it when I execute Profiler against their SQL Server database.

Now, there are lots of reasons to design a denormalized schema. And there are lots of reasons to encapsulate the business rules in code. This is not what I’m talking about – though some of these systems would clearly perform better (or at all, in extreme cases) if they took advantage of better design patterns.

I’m talking about designs where this much is obvious:

1. At least two people designed the data layer; and
2. They did not communicate during the process.

Often, enterprise-level database design is shoveled onto developers as a secondary task. No, I’m not making this up – it’s too tragic to joke about. There are developers who can handle this task. But there are more who believe they are database developers than actually are. (Before I became a SQL Server DBA I was a developer who thought I was a SQL Server DBA…)

There will doubtless be readers who can provide examples of how their enterprise application was built by junior developers who did the database and code work and whose systems are performing just fine. I’m happy for you and sincerely hope the system scales. 

Designing a scalable solution  – database, application, or enterprise architecture – is one of those things that consumes time, thinking, resources, and money during the early phases of an enterprise development cycle. But it is – hands down – one of the best investments (if not the best) in the solution.

In today’s market, scalability is as optional as security. And like security, a scalable design is not something you “add later.” It’s not part of the foundation – it is the foundation.

My experiences with designing scalable solutions have proven there is no free lunch and no shortcuts that work. If anyone – me included – skips the work of designing for scalability, there comes a day when they (or I) must pay the fiddler. From what I hear and have experienced, designing in this fashion is most often sacrificed on the altar of the deadline. Trust me, if it falls apart in six weeks or six months, you haven’t saved any time – and you may have lost a job or a customer.

Someone told me this and I remember because it has proven true several times over: “Deliver quality late, no one remembers. Deliver junk on time, no one forgets.”

If you’re building (or upgrading to) cutting edge technology, you need a DBA.

:{> Andy


Which "flavor" DBA are you?

This re-post from my Applied Team System blog (which was a repost from my old SQL Server Central blog) was inspired by James Luetkehoelter‘s Agile Development and the “DBA” post a few days back. I’ll have more to say about database developers, agile methodologies, and the need for a DBA in posts to come. Enjoy!

   I received a cool compliment today from a peer who’s a developer. He said, “You know, I really like having a DBA on my team!” I have to tell you, it made my whole day!

   It led to a discussion about past experiences and expectations, and I shared something I thought was pretty much common knowledge: there are three types of DBAs. My peer was shocked, so maybe the knowledge isn’t so common after all.

   The three “flavors” of DBAs I define are:

  1. System, Operations, or Production Support DBAs – these DBAs write maintenance plans in notepad and have no qualms whatsoever about executing in command-line. They were DBAs in the old days, when we carved our own ICs out of wood. They will get your server and database back online fast – and with less data corruption than anyone else on the planet. They live for torn pages and I/O faults.
  2. Application Support DBAs – these DBAs are familiar with one or more (usually complex) applications. I’m talking PeopleSoft, Siebel, and SAP here. If you want to customize a screen or write a one-off web application, you desperately need these folks.
  3. Database Developers – these DBAs are ruthless bit-heads. They use bigint and byte fields for masking binary states. They can optimize a stored procedure in their sleep and wrap an API around a database so developers never have to consider writing SQL that directly hits tables. They are performance freaks that will work 18-hour days on weekends to test in Production – when it’s “safe.”

   Do you think DBAs fall into these categories? Do you know any that do? Do you see yourself in there anywhere? Do you have more or less or different “flavors” for classifying DBAs? 

:{> Andy


SSIS Design Pattern – Incremental Loads

 

Update: For more up-to-date material on this topic, please see the Stairway to Incremental Loads at SQL Server Central for more information! (Free registration required). 
Introduction
Loading data from a data source to SQL Server is a common task. It’s used in Data Warehousing, but increasingly data is being staged in SQL Server for non-Business-Intelligence purposes.
Maintaining data integrity is key when loading data into any database. A common way of accomplishing this is to truncate the destination and reload from the source. While this method ensures data integrity, it also loads a lot of data that was just deleted.
Incremental loads are faster and use fewer server resources. Only new or updated data is touched in an incremental load.
When To Use Incremental Loads
Use incremental loads whenever you need to load data from a data source to SQL Server.
Incremental loads are the same regardless of which database platform or ETL tool you use. You need to detect new and updated rows – and separate these from the unchanged rows.
Incremental Loads in Transact-SQL
I will start by demonstrating this with T-SQL:
0. (Optional, but recommended) Create two databases: a source and destination database for this demonstration:

CREATE DATABASE [SSISIncrementalLoad_Source]

CREATE DATABASE [SSISIncrementalLoad_Dest]

1. Create a source table named tblSource with the columns ColID, ColA, ColB, and ColC; make ColID the clustered primary key:
USE SSISIncrementalLoad_Source
GO
CREATE TABLE dbo.tblSource
(ColID int NOT NULL
,ColA varchar(10) NULL
,ColB datetime NULL constraint df_ColB default (getDate())
,ColC int NULL
,constraint PK_tblSource primary key clustered (ColID))
2. Create a Destination table named tblDest with the columns ColID, ColA, ColB, ColC:
USE SSISIncrementalLoad_Dest
GO
CREATE TABLE dbo.tblDest
(ColID int NOT NULL
,ColA varchar(10) NULL
,ColB datetime NULL
,ColC int NULL)
3. Let’s load some test data into both tables for demonstration purposes:
USE SSISIncrementalLoad_Source
GO

-- insert an "unchanged" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(0, 'A', '1/1/2007 12:01 AM', -1)

-- insert a "changed" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(1, 'B', '1/1/2007 12:02 AM', -2)

-- insert a "new" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(2, 'N', '1/1/2007 12:03 AM', -3)

USE SSISIncrementalLoad_Dest
GO

-- insert an "unchanged" row
INSERT INTO dbo.tblDest
(ColID,ColA,ColB,ColC)
VALUES(0, 'A', '1/1/2007 12:01 AM', -1)

-- insert a "changed" row
INSERT INTO dbo.tblDest
(ColID,ColA,ColB,ColC)
VALUES(1, 'C', '1/1/2007 12:02 AM', -2)

4. You can view new rows with the following query:

SELECT s.ColID, s.ColA, s.ColB, s.ColC
FROM SSISIncrementalLoad_Source.dbo.tblSource s
LEFT JOIN SSISIncrementalLoad_Dest.dbo.tblDest d ON d.ColID = s.ColID
WHERE d.ColID IS NULL

This should return the “new” row – the one loaded earlier with ColID = 2 and ColA = ‘N’. Why? The LEFT JOIN and WHERE clauses are the key. Left Joins return all rows on the left side of the join clause (SSISIncrementalLoad_Source.dbo.tblSource in this case) whether there’s a match on the right side of the join clause (SSISIncrementalLoad_Dest.dbo.tblDest in this case) or not. If there is no match on the right side, NULLs are returned. This is why the WHERE clause works: it goes after rows where the destination ColID is NULL. These rows have no match in the LEFT JOIN, therefore they must be new.

This is only an example. You occasionally find database schemas that are this easy to load. Occasionally. Most of the time you have to include several columns in the JOIN ON clause to isolate truly new rows. Sometimes you have to add conditions in the WHERE clause to refine the definition of truly new rows.
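For example, if no reliable single-column key exists, the join might need to match on a combination of columns. A sketch, treating ColA as part of a hypothetical composite business key:

SELECT s.ColID, s.ColA, s.ColB, s.ColC
FROM SSISIncrementalLoad_Source.dbo.tblSource s
LEFT JOIN SSISIncrementalLoad_Dest.dbo.tblDest d ON d.ColID = s.ColID
 AND d.ColA = s.ColA
WHERE d.ColID IS NULL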

Incrementally load the row (“rows” in practice) with the following T-SQL statement:

INSERT INTO SSISIncrementalLoad_Dest.dbo.tblDest
(ColID, ColA, ColB, ColC)
SELECT s.ColID, s.ColA, s.ColB, s.ColC
FROM SSISIncrementalLoad_Source.dbo.tblSource s
LEFT JOIN SSISIncrementalLoad_Dest.dbo.tblDest d ON d.ColID = s.ColID
WHERE d.ColID IS NULL

5. There are many ways by which people try to isolate changed rows. The only sure-fire way to accomplish it is to compare each field. View changed rows with the following T-SQL statement:

SELECT d.ColID, d.ColA, d.ColB, d.ColC
FROM SSISIncrementalLoad_Dest.dbo.tblDest d
INNER JOIN SSISIncrementalLoad_Source.dbo.tblSource s ON s.ColID = d.ColID
WHERE (
(d.ColA != s.ColA)
OR (d.ColB != s.ColB)
OR (d.ColC != s.ColC)
)

This should return the “changed” row we loaded earlier with ColID = 1 and ColA = ‘C’. Why? The INNER JOIN and WHERE clauses are to blame – again. The INNER JOIN goes after rows with matching ColID’s because of the JOIN ON clause. The WHERE clause refines the resultset, returning only rows where the ColA’s, ColB’s, or ColC’s don’t match and the ColID’s match. This is important. If there’s a difference in any or some or all the columns (except ColID), we want to update the row.
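One caveat: != never matches when either side is NULL, and ColA, ColB, and ColC are all NULLable in these tables. If your data contains NULLs, each comparison needs explicit NULL handling. A sketch for ColA (repeat the pattern for ColB and ColC):

WHERE (
 (d.ColA != s.ColA)
 OR (d.ColA IS NULL AND s.ColA IS NOT NULL)
 OR (d.ColA IS NOT NULL AND s.ColA IS NULL)
)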

Extract-Transform-Load (ETL) theory has a lot to say about when and how to update changed data. You will want to pick up a good book on the topic to learn more about the variations.

To update the data in our destination, use the following T-SQL:

UPDATE d
SET
d.ColA = s.ColA
,d.ColB = s.ColB
,d.ColC = s.ColC
FROM SSISIncrementalLoad_Dest.dbo.tblDest d
INNER JOIN SSISIncrementalLoad_Source.dbo.tblSource s ON s.ColID = d.ColID
WHERE (
(d.ColA != s.ColA)
OR (d.ColB != s.ColB)
OR (d.ColC != s.ColC)
)

Incremental Loads in SSIS
Let’s take a look at how you can accomplish this in SSIS using the Lookup Transformation (for the join functionality) combined with the Conditional Split (for the WHERE clause conditions) transformations.
Before we begin, let’s reset our database tables to their original state using the following query:

USE SSISIncrementalLoad_Source
GO

TRUNCATE TABLE dbo.tblSource

-- insert an "unchanged" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(0, 'A', '1/1/2007 12:01 AM', -1)

-- insert a "changed" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(1, 'B', '1/1/2007 12:02 AM', -2)

-- insert a "new" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(2, 'N', '1/1/2007 12:03 AM', -3)

USE SSISIncrementalLoad_Dest
GO

TRUNCATE TABLE dbo.tblDest

-- insert an "unchanged" row
INSERT INTO dbo.tblDest
(ColID,ColA,ColB,ColC)
VALUES(0, 'A', '1/1/2007 12:01 AM', -1)

-- insert a "changed" row
INSERT INTO dbo.tblDest
(ColID,ColA,ColB,ColC)
VALUES(1, 'C', '1/1/2007 12:02 AM', -2)

Next, create a new project using Business Intelligence Development Studio (BIDS). Name the project SSISIncrementalLoad:

Once the project loads, open Solution Explorer and rename Package1.dtsx to SSISIncrementalLoad.dtsx:

When prompted to rename the package object, click the Yes button. From the toolbox, drag a Data Flow onto the Control Flow canvas:

Double-click the Data Flow task to edit it. From the toolbox, drag and drop an OLE DB Source onto the Data Flow canvas:

Double-click the OLE DB Source connection adapter to edit it:

Click the New button beside the OLE DB Connection Manager dropdown:

Click the New button here to create a new Data Connection:

Enter or select your server name. Connect to the SSISIncrementalLoad_Source database you created earlier. Click the OK button to return to the Connection Manager configuration dialog. Click the OK button to accept your newly created Data Connection as the Connection Manager you wish to define. Select “dbo.tblSource” from the Table dropdown:

Click the OK button to complete defining the OLE DB Source Adapter.

Drag and drop a Lookup Transformation from the toolbox onto the Data Flow canvas. Connect the OLE DB connection adapter to the Lookup transformation by clicking on the OLE DB Source and dragging the green arrow over the Lookup and dropping it. Right-click the Lookup transformation and click Edit (or double-click the Lookup transformation) to edit:

When the editor opens, click the New button beside the OLE DB Connection Manager dropdown (as you did earlier for the OLE DB Source Adapter). Define a new Data Connection – this time to the SSISIncrementalLoad_Dest database. After setting up the new Data Connection and Connection Manager, configure the Lookup transformation to connect to “dbo.tblDest”:

Click the Columns tab. On the left side are the columns currently in the SSIS data flow pipeline (from SSISIncrementalLoad_Source.dbo.tblSource). On the right side are columns available from the Lookup destination you just configured (from SSISIncrementalLoad_Dest.dbo.tblDest). Follow these steps:

1. We’ll need all the columns returned from the destination table, so check the checkbox beside each column listed on the destination side. We need these columns for our WHERE clauses and for our JOIN ON clauses.

2. We do not want to map all the columns between the source and destination – we only want to map the column named ColID between the database tables. The mappings drawn between the Available Input Columns and Available Lookup Columns define the JOIN ON clause. Multi-select the mappings between ColA, ColB, and ColC by clicking on them while holding the Ctrl key. Right-click any of them and click “Delete Selected Mappings” to delete these columns from our JOIN ON clause.

3. Add the text “Dest_” to each column’s Output Alias. These columns are being appended to the data flow pipeline. This is so we can distinguish between Source and Destination columns farther down the pipeline:

Next we need to modify our Lookup transformation behavior. By default, the Lookup operates as an INNER JOIN – but we need a LEFT (OUTER) JOIN. Click the “Configure Error Output” button to open the “Configure Error Output” screen. On the “Lookup Output” row, change the Error column from “Fail component” to “Ignore failure”. This tells the Lookup transformation “If you don’t find an INNER JOIN match in the destination table for the Source table’s ColID value, don’t fail.” – which also effectively tells the Lookup “Don’t act like an INNER JOIN, behave like a LEFT JOIN”:

Click OK to complete the Lookup transformation configuration.

From the toolbox, drag and drop a Conditional Split Transformation onto the Data Flow canvas. Connect the Lookup to the Conditional Split as shown. Right-click the Conditional Split and click Edit to open the Conditional Split Editor:

Expand the NULL Functions folder in the upper right of the Conditional Split Transformation Editor. Expand the Columns folder in the upper left side of the Conditional Split Transformation Editor. Click in the “Output Name” column and enter “New Rows” as the name of the first output. From the NULL Functions folder, drag and drop the “ISNULL( <<expression>> )” function to the Condition column of the New Rows condition:

Next, drag Dest_ColID from the columns folder and drop it onto the “<<expression>>” text in the Condition column. “New Rows” should now be defined by the condition “ISNULL( [Dest_ColID] )”. This defines the WHERE clause for new rows – setting it to “WHERE Dest_ColID Is NULL”.

Type “Changed Rows” into a second Output Name column. Add the expression “(ColA != Dest_ColA) || (ColB != Dest_ColB) || (ColC != Dest_ColC)” to the Condition column for the Changed Rows output. This defines our WHERE clause for detecting changed rows – setting it to “WHERE ((Dest_ColA != ColA) OR (Dest_ColB != ColB) OR (Dest_ColC != ColC))”. Note “||” is used to convey “OR” in SSIS Expressions:

Change the “Default output name” from “Conditional Split Default Output” to “Unchanged Rows”:

Click the OK button to complete configuration of the Conditional Split transformation.

Drag and drop an OLE DB Destination connection adapter and an OLE DB Command transformation onto the Data Flow canvas. Click on the Conditional Split and connect it to the OLE DB Destination. A dialog will display prompting you to select a Conditional Split Output (those outputs you defined in the last step). Select the New Rows output:

Next connect the OLE DB Command transformation to the Conditional Split’s “Changed Rows” output:

Your Data Flow canvas should appear similar to the following:

Configure the OLE DB Destination by aiming at the SSISIncrementalLoad_Dest.dbo.tblDest table:

Click the Mappings item in the list to the left. Make sure the ColID, ColA, ColB, and ColC source columns are mapped to their matching destination columns (aren’t you glad we prepended “Dest_” to the destination columns?):

Click the OK button to complete configuring the OLE DB Destination connection adapter.

Double-click the OLE DB Command to open the “Advanced Editor for OLE DB Command” dialog. Set the Connection Manager column to your SSISIncrementalLoad_Dest connection manager:

Click on the “Component Properties” tab. Click the ellipsis (button with “…”) beside the SQLCommand property:

The String Value Editor displays. Enter the following parameterized T-SQL statement into the String Value textbox:

UPDATE dbo.tblDest
SET
ColA = ?
,ColB = ?
,ColC = ?
WHERE ColID = ?

The question marks in the previous parameterized T-SQL statement map by ordinal to columns named “Param_0” through “Param_3”. Map them as shown below – effectively altering the UPDATE statement for each row to read:

UPDATE SSISIncrementalLoad_Dest.dbo.tblDest
SET
ColA = SSISIncrementalLoad_Source.dbo.tblSource.ColA
,ColB = SSISIncrementalLoad_Source.dbo.tblSource.ColB
,ColC = SSISIncrementalLoad_Source.dbo.tblSource.ColC
WHERE ColID = SSISIncrementalLoad_Source.dbo.tblSource.ColID

Note the query is executed on a row-by-row basis. For performance with large amounts of data, you will want to employ set-based updates instead.
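One common set-based pattern: send the Changed Rows output to a staging table via an OLE DB Destination instead of the OLE DB Command, then run a single joined UPDATE in an Execute SQL Task after the Data Flow completes. Assuming a hypothetical staging table named stgTblDest_Changed with the same columns as tblDest, the update looks like:

UPDATE d
SET
 d.ColA = stage.ColA
,d.ColB = stage.ColB
,d.ColC = stage.ColC
FROM SSISIncrementalLoad_Dest.dbo.tblDest d
INNER JOIN SSISIncrementalLoad_Dest.dbo.stgTblDest_Changed stage ON stage.ColID = d.ColID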

Click the OK button when mapping is completed.

Your Data Flow canvas should look like that pictured below:

If you execute the package with debugging (press F5), the package should succeed and appear as shown here:

Note one row takes the “New Rows” output from the Conditional Split, and one row takes the “Changed Rows” output from the Conditional Split transformation. Although not visible, our third source row doesn’t change, and would be sent to the “Unchanged Rows” output – which is simply the default Conditional Split output renamed. Any row that doesn’t meet any of the predefined conditions in the Conditional Split is sent to the default output.
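You can verify the results with a quick query against the destination, given the test rows we loaded:

SELECT ColID, ColA, ColB, ColC
FROM SSISIncrementalLoad_Dest.dbo.tblDest

-- expected: ColID 0 unchanged ('A'), ColID 1 updated from 'C' to 'B', and ColID 2 ('N') newly inserted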

That’s all! Congratulations – you’ve built an incremental database load! :)

Check out the Stairway to Incremental Loads at SQL Server Central for more information! (Free registration required)

:{> Andy

Learn more:
Expert SSIS Training (Live, Online)
IESSIS1: Immersion Event on Learning SQL Server Integration Services (In-person)
Biml Academy
Stairway to Biml
Stairway to Integration Services
