I am honored and so excited to be presenting a couple sessions at the PASS Summit in Denver next month!
One presentation is entitled Integrate Your SSIS Development. The focus is on SSIS development and testing. I touch on continuous integration and test-first development – using Team Foundation Server for the CI and Team Edition for Database Professionals to test ETL. I use the data collected by ETL Instrumentation in the testing, so this presentation highlights a lot of the pieces of my integrated SSIS development methodology (and previews some upcoming books!).
The tune to a Bush song is running through my head as I type this… the band, not the president – although imagining the President singing the song is an interesting brain-stretch.
It’s a fact of IT life that everything scales. Some successfully, even. Problems start when things do not scale successfully (or well). It happens in business. It happens with software systems.
When it happens with businesses, you hear things like “They grew too fast.” When it happens with software systems, you browse to a website and receive an HTTP 500 or 404 error.
Can this be avoided (in business or software)? I think that’s an excellent question – one well worth examining.
The answer, I believe, lies with how predictable the scalability is.
Consider a database application: If you know which tables are going to grow, how, and how much, you can plan for said growth. How would you plan? You could partition the tables using one or a combination of partitioning techniques. You could employ filegroups, snapshots, and a host of other functionality. If only you knew where to apply these techniques.
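For instance, here is a minimal sketch of date-based table partitioning – the table, column, filegroup, and boundary values are hypothetical, chosen only to illustrate the shape of the technique:

```sql
-- Hypothetical example: partition a fact table by year.
-- SalesFact, OrderDate, and the boundary dates are illustrative only.
Create Partition Function pfYearly (datetime)
As Range Right For Values ('2006-01-01', '2007-01-01');

Create Partition Scheme psYearly
As Partition pfYearly All To ([PRIMARY]);

Create Table dbo.SalesFact
 (SalesFactID int identity(1,1)
 ,OrderDate datetime not null
 ,Amount money null)
On psYearly(OrderDate);
```

With this in place, rows land in the appropriate partition based on OrderDate, and older partitions can be switched out or moved to slower storage as the table grows.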
That’s the key.
Achieving scalability starts with capturing metrics. If you know how your database is growing from the beginning – if you can chart the growth of individual tables, access patterns, and internal performance data – you can predict growth and manage scalability.
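One hedged sketch of capturing table-growth metrics – dbo.TableGrowthHistory is a hypothetical metrics table of my own invention; the DMV it reads from is standard:

```sql
-- Capture a snapshot of row counts and page usage per table.
-- Schedule this (e.g., daily) to chart growth over time.
Insert Into dbo.TableGrowthHistory
 (CaptureDate, SchemaName, TableName, RowCnt, UsedPages)
Select GetDate()
      ,s.name
      ,t.name
      ,Sum(p.row_count)
      ,Sum(p.used_page_count)
From sys.dm_db_partition_stats p
 Inner Join sys.tables t On t.object_id = p.object_id
 Inner Join sys.schemas s On s.schema_id = t.schema_id
Group By s.name, t.name;
```

A few weeks of snapshots like these is enough to see which tables are growing, and how fast.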
So the key is measurement.
Measurement is an engineering discipline in its own right. The field of applied measurement is called Instrumentation. Applying measurement to a process is referred to as “instrumenting the process.”
How do you instrument a database process? Iteration 1 would include creating an internal table to house and maintain process metadata:
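A minimal sketch of such a table – the columns are chosen to match the INSERT statements used below, and the id and CreatedDateTime columns are my own additions:

```sql
-- A minimal process-metadata table (sketch).
Create Table dbo.ProcessData
 (id int identity(1,1)
 ,ProcessDataIndicatorName varchar(50) null
 ,ProcessDataIndicatorValue varchar(50) null
 ,CreatedDateTime datetime null default(GetDate()));
```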
If your instrumented process is stored-procedure-based, you could add INSERT statements to your existing stored procedures. Consider instrumenting a parent stored procedure that calls child stored procedures. The instrumented proc could look like the following (instrumentation emphasized):
CREATE PROCEDURE dbo.SomeProcess AS
INSERT INTO dbo.ProcessData (ProcessDataIndicatorName, ProcessDataIndicatorValue) VALUES('ChildProc1','Starting');
INSERT INTO dbo.ProcessData (ProcessDataIndicatorName, ProcessDataIndicatorValue) VALUES('ChildProc1','Ending');
INSERT INTO dbo.ProcessData (ProcessDataIndicatorName, ProcessDataIndicatorValue) VALUES('ChildProc2','Starting');
INSERT INTO dbo.ProcessData (ProcessDataIndicatorName, ProcessDataIndicatorValue) VALUES('ChildProc2','Ending');
INSERT INTO dbo.ProcessData (ProcessDataIndicatorName, ProcessDataIndicatorValue) VALUES('ChildProc3','Starting');
INSERT INTO dbo.ProcessData (ProcessDataIndicatorName, ProcessDataIndicatorValue) VALUES('ChildProc3','Ending');
Before moving forward, removing code duplication would be a worthwhile effort. In application development, this is one of many processes generally referred to as Refactoring.
The INSERT statements are a prime candidate for refactoring and we can address this with a stored procedure:
CREATE PROCEDURE dbo.AddProcessData @ProcessDataIndicatorName varchar(50), @ProcessDataIndicatorValue varchar(50) AS
INSERT INTO dbo.ProcessData (ProcessDataIndicatorName, ProcessDataIndicatorValue) VALUES(@ProcessDataIndicatorName, @ProcessDataIndicatorValue);
Now the parent stored procedure instrumentation above can be modified to look like this:
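A sketch of that refactored procedure – the commented child-procedure calls stand in for the real work, which the original listing elided:

```sql
CREATE PROCEDURE dbo.SomeProcess AS
 Exec dbo.AddProcessData 'ChildProc1','Starting';
 -- Exec dbo.ChildProc1; (child procedure call)
 Exec dbo.AddProcessData 'ChildProc1','Ending';
 Exec dbo.AddProcessData 'ChildProc2','Starting';
 -- Exec dbo.ChildProc2; (child procedure call)
 Exec dbo.AddProcessData 'ChildProc2','Ending';
 Exec dbo.AddProcessData 'ChildProc3','Starting';
 -- Exec dbo.ChildProc3; (child procedure call)
 Exec dbo.AddProcessData 'ChildProc3','Ending';
```

If we later decide to change how metrics are recorded – say, adding a timestamp or a process identifier – we change one stored procedure instead of every INSERT statement.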
Measuring the current process provides a baseline – the first step in a continuous improvement process that provides dynamic design changes, performance monitoring, and – eventually – a dynamically-scalable system. It also supplies the current performance status against which we can benchmark future improvements and modifications.
My sister-in-law recently passed the PMP certification (congratulations Heather!). I’m waiting for a call from her asking if we need to add resources.
The resources comment above is a joke, but it isn’t really that funny. It’s indicative of my decades of experience with bad IT project managers.
I believe and hope I have worked with some of the worst project managers on the planet. Why do I hope? I’d hate to think anyone has had to deal with folks worse than the poor project managers I’ve dealt with.
As I type, we’re experiencing a heat wave in Farmville, Virginia. It was 107 degrees Fahrenheit here yesterday. It’s the “dog days of summer”, as my Granny used to call them.
Somewhere, you will find 30 or more push mowers lined up wheel-to-wheel along one axis of a lawn. On command, the 30+ operators will start their mowers. On cue, they will push them across the lawn, maintaining the wheel-to-wheel alignment, cutting the entire area in one pass.
This, my friend, is the home of an IT project manager.
Proof of calluses should be required for the application process.
(The same can be said for MBAs, but that’s for another post…)
Inspired by the book, Brooks’ Law states “Adding manpower to a late software project makes it later.” It is perhaps best summed up by the following statement by Brooks himself: “The bearing of a child takes nine months, no matter how many women are assigned.”
As an IT professional, you can learn to detect when you’re about to be “managed”. I share the following indicators and advice from my years of experience:
“Do we need to add more resources?” This question in and of itself is harmless. It’s actually the way project managers greet each other and has no more meaning to ordinary folk than “How are you doing today?” or “How about this weather?”
The best answer to this question is a non-answer. After years of trying to correctly answer this (as though it were a meaningful question), I stumbled across an answer that works for me: “I don’t know.” Why does this work so well? The last thing a bad IT project manager wants to do is make a decision – at least one traceable to them.
“I am (or used-to-be) a software developer.” If you hear this, you’re in trouble. Big, big trouble. My advice to you is to vacate the project – and the premises – as quickly as possible. This isn’t a fire evacuation, it’s a bomb evacuation. You may wish to consider jumping out a window if you’re on or below the third floor.
Why? You are dealing with a person who believes they were promoted because they were such a good developer. Mind you, this is true in less than 25% of my experience. And even then, odds are their resume includes COBOL or they aren’t nearly the project manager they believe themselves to be. At best you have 1/3rd of a 25% chance that you’re working for someone who knows a definition for delegate – a definition that isn’t “someone who attends a convention.”
The truth of the matter is this person was likely promoted before they could delay or otherwise further damage the software project to which they were assigned.
“What do I tell my boss (or the stakeholders)?” This question is the prelude to a demand. Your answer isn’t important, the demand in the mind of the IT project manager is important. And that demand is for you to do something no sane developer would ever do.
There are a couple options. If you’re feeling froggy, you can document the fact you were asked to take this ridiculous course of action by your IT project manager, and then do it. Be sure to address the issue in writing and as soon as possible. CC someone else – anyone else. If you can CC the project manager’s boss without looking like you’re trying to make them look stupid, that’s best. If not, CC someone else at your level on the development team (and allow the bad IT project manager to continue their work of making themselves look stupid unassisted).
Note: Never BCC. BCC’ing the boss is the equivalent of placing a bold, red, flashing banner across the top of your message which states “I’M INSECURE”. The boss will get this message, loud and clear. Go ahead and CC them if you believe it’s warranted – those dogs need to wake up eventually.
Make sure it’s in writing and someone else sees it – that’s the point.
The other option is to simply ignore it and do what you know to be right and good. There’s risk here too. Some bad IT project managers will call in bigger dogs to shout you down. It’s good to have your mugshot and name on a book somewhere if you’re going to exercise this option.
“Umm yeah. I’m going to need you to come in Saturday. Sunday’s not looking good either…” People are people. Bad IT project managers don’t get that. They call people “resources”. People aren’t resources, we use resources, but we’re separate and distinct from resources. People are people.
Bad IT project managers are the reason we have IT Project Leads. After all, someone who knows what they’re talking about needs to have some authority if any software project is to stand a chance of succeeding.
SSIS is a fantastic ETL engine. When I build an ETL solution in SSIS, I like to collect runtime metrics. Why? I use the data initially to determine a baseline for performance and to determine, in some cases, if I’m loading within a defined window.
I refer to this process of collecting performance and runtime metrics as ETL Instrumentation and I collect this data for every enterprise ETL solution I design.
The types of things I want to know are:
Did the process start when I wanted it to start?
Did it succeed?
If not, is it still running?
Did it fail?
How long did it execute?
How does recent performance compare to past performance?
In this mini-series entitled SSIS Design Pattern – ETL Instrumentation, I will demonstrate how I capture, report, and measure these data using SSIS, SSRS, SQL Agent Jobs, and Transact-SQL.
In Part 1, we build an elementary system. In subsequent parts, we will improve upon the concepts demonstrated here.
To Instrument Or Not To Instrument; That Is The Question
My thoughts on this are best summed up thus:
There are two types of enterprise data warehouse developers:
1) Those who collect ETL instrumentation data; and
2) Those who will.
Why do I make this statement? When you build ETL to load your enterprise data warehouse, you are building the source for the reporting system. Should anything unfortunate happen to the data between the time you read it from the System Of Record and the time it lands in your Reporting Server Database, you are basically guilty until proven innocent. ETL Instrumentation will provide evidence for (and sometimes against) you as you defend your ETL design and implementation.
Building The Database
Before we instrument an ETL process, we need a database solution for the data. We’ll start by building the database itself, then add objects.
Let’s build the database. We’ll call it SSISRunTimeMetrics. Open SQL Server Management Studio (SSMS) and execute the following Transact-SQL:
use master
go

if not exists(select name from master.sys.databases where name = 'SSISRunTimeMetrics')
 Create Database SSISRunTimeMetrics
go
This script creates the SSISRunTimeMetrics database. Next, let’s create a schema and table to hold our metrics:
use SSISRunTimeMetrics
go

-- vars...
declare @sql varchar(255)

-- create ssis schema...
if not exists(select name from sys.schemas where name = 'ssis')
 begin
  set @sql = 'Create Schema ssis'
  exec(@sql)
 end

-- create RunTimeMetrics table...
if not exists(select s.name + '.' + t.name
              from sys.tables t
               inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = 'RunTimeMetrics'
                and s.name = 'ssis')
 Create Table ssis.RunTimeMetrics
  (id int identity(1,1)
  ,packageStartDateTime datetime null
  ,packageEndDateTime datetime null
  ,packageName varchar(255) null
  ,packageVersion varchar(25) null
  ,packageStatus varchar(25) null)
Next, let’s add a stored procedure to record a package start event:
use SSISRunTimeMetrics
go

if exists(select s.name + '.' + p.name
          from sys.procedures p
           inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = 'usp_RecordPackageStart'
            and s.name = 'ssis')
 begin
  Drop Procedure ssis.usp_RecordPackageStart
 end
go

Create Procedure ssis.usp_RecordPackageStart
 @packageStartDateTime datetime = null
,@packageName varchar(255) = null
,@packageVersion varchar(25) = null
,@packageStatus varchar(25) = null
As
begin

 -- set @packageStartDateTime default...
 declare @Now datetime
 set @Now = GetDate()

 if (@packageStartDateTime Is Null)
  set @packageStartDateTime = @Now

 -- insert the run time metrics data...
 insert into ssis.RunTimeMetrics
  (packageStartDateTime
  ,packageName
  ,packageVersion
  ,packageStatus)
 values
  (@packageStartDateTime
  ,@packageName
  ,@packageVersion
  ,@packageStatus)
end
That’s all for now. We’ll add more stuff later.
Building The SSIS Project
Open SQL Server Business Intelligence Development Studio (BIDS) and create a new project. When the New Project dialog displays, select Business Intelligence Projects from the Project types list and Integration Services Project from the Templates. Name the project SSISRunTimeMetrics and click the OK button:
Rename the Package SSISRunTimeMetrics.dtsx:
Always click the Yes button when you see this dialog (trust me):
Right-click in the white-space on the Control Flow canvas and click Add Annotation to document your code:
I can’t say enough about good documentation. You never know when someone else will have to open your source code to troubleshoot.
Right-click in the white-space of the Control Flow canvas and click Variables. Add a variable named sVersion. Make sure the Scope is the SSISRunTimeMetrics package and make the Data Type String:
Add a Sequence Container and change the name to “Step 1 – Do some stuff”:
Sequence Containers are nice for several reasons:
They’re aesthetic. You can break up chunks of functionality logically… how cool!
When troubleshooting, you can execute the contents of a container from the right-click context menu.
You can isolate chunks of functionality using transactions by container.
Next, add an Execute SQL Task and name it “Log Start of Package Execution”:
Double-click the Execute SQL Task to edit it. Click Connection and select “<New connection…>”:
When the Configure OLE DB Connection Manager dialog displays, click the “New…” button:
Enter or select your server in the Server Name dropdown. In the “Connect to a database” panel, select the “Select or enter a database name” option and enter or select the SSISRunTimeMetrics database you created earlier. Click the OK button to proceed:
Then click the OK button again to set the Connection property of the Execute SQL Task:
Select the SQL Statement property and click the ellipsis. When the “Enter SQL Query” dialog displays, enter the following parameterized Transact-SQL statement:
Exec ssis.usp_RecordPackageStart ?,?,?,'Started'
Click the OK button to proceed:
Click on the Parameter Mapping page and configure parameters as shown:
Click the OK button to proceed.
Next, let’s do some work on the sVersion variable. Click on the sVersion variable name and press the F4 key to display variable properties. Change the EvaluateAsExpression property to True, then click on the Expression property’s ellipsis:
When the Expression Builder displays, enter the following SSIS Expression Language text to build a version string in the format Major.Minor.Build:
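Something along these lines should work – a sketch built from the package’s built-in version system variables, cast to strings and concatenated:

```
(DT_WSTR, 4) @[System::VersionMajor] + "." + (DT_WSTR, 4) @[System::VersionMinor] + "." + (DT_WSTR, 4) @[System::VersionBuild]
```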
Click the “Evaluate Expression” button to check the expression. Then click the OK button to close the Expression Builder:
Add a Script Task and configure it to popup a message box displaying the value of sVersion:
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

 Public Sub Main()

  Dim sVersion As String = Dts.Variables("sVersion").Value.ToString
  Dim sMsg As String

  sMsg = "Version: " & sVersion
  MsgBox(sMsg)

  Dts.TaskResult = Dts.Results.Success

 End Sub

End Class
Execute the package. Then execute the following Transact-SQL query to check the results:
select * from ssis.RunTimeMetrics
It’s time to add a procedure to log the end of package execution, but we have a problem: How do we identify the row we just added to the ssis.RunTimeMetrics table? Think about it. We just inserted a row, and since that’s the only row in the table it’s pretty simple to identify. Later, we will have lots of rows in the table in varying states. We need a method to positively identify the row we’re working with.
We could use the last row inserted – apply Max(id) criteria – but what happens as our metrics collection grows to include multiple packages? One package could start while another was executing, and we’d update the wrong row.
We require a means to positively identify the row when added to the table. Such a means exists. If we modify the ssis.usp_RecordPackageStart stored procedure we can have it return the value of the inserted ID. The OUTPUT clause gives us access to the inserted and deleted virtual tables. Modify the stored procedure by adding the following OUTPUT clause as shown:
output inserted.id as 'Inserted_ID'
Execute the Transact-SQL to drop and recreate the ssis.usp_RecordPackageStart stored procedure:
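The recreated procedure would look something like this – a sketch combining the body shown earlier with the new OUTPUT clause (note the clause sits between the column list and VALUES):

```sql
Create Procedure ssis.usp_RecordPackageStart
 @packageStartDateTime datetime = null
,@packageName varchar(255) = null
,@packageVersion varchar(25) = null
,@packageStatus varchar(25) = null
As
begin

 -- set @packageStartDateTime default...
 declare @Now datetime
 set @Now = GetDate()

 if (@packageStartDateTime Is Null)
  set @packageStartDateTime = @Now

 -- insert the run time metrics data, returning the new id...
 insert into ssis.RunTimeMetrics
  (packageStartDateTime
  ,packageName
  ,packageVersion
  ,packageStatus)
 output inserted.id as 'Inserted_ID'
 values
  (@packageStartDateTime
  ,@packageName
  ,@packageVersion
  ,@packageStatus)
end
```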
Add another Int32 Type variable named iPackageLoadID:
Back in the Execute SQL Task, change the ResultSet property to “Single row”:
Click the Result Set page and add a new resultset. Name it 0 (ordinals work well) and assign the results of this query to the iPackageLoadID variable you just created. This will push the results of the OUTPUT clause in the stored procedure (which will include the id of the newly inserted row) into the iPackageLoadID variable. You can then access it later in the package to update the precise row:
Alter the Script Task by adding iPackageLoadID to the list of ReadOnlyVariables:
Alter the VB.Net script to read:
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

 Public Sub Main()

  Dim sVersion As String = Dts.Variables("sVersion").Value.ToString
  Dim sPackageLoadID As String = Dts.Variables("iPackageLoadID").Value.ToString
  Dim sMsg As String

  sMsg = "Version: " & sVersion & vbCrLf & "PackageLoadID: " & sPackageLoadID
  MsgBox(sMsg)

  Dts.TaskResult = Dts.Results.Success

 End Sub

End Class
Next, add a new stored procedure named ssis.usp_RecordPackageEnd to the SSISRunTimeMetrics database using the following Transact-SQL script:
use SSISRunTimeMetrics
go

if exists(select s.name + '.' + p.name
          from sys.procedures p
           inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = 'usp_RecordPackageEnd'
            and s.name = 'ssis')
 begin
  Drop Procedure ssis.usp_RecordPackageEnd
 end
go

Create Procedure ssis.usp_RecordPackageEnd
 @id int
,@packageEndDateTime datetime = null
,@packageStatus varchar(25) = null
As
begin

 -- set @packageEndDateTime default...
 declare @Now datetime
 set @Now = GetDate()

 if (@packageEndDateTime Is Null)
  set @packageEndDateTime = @Now

 -- update the run time metrics data...
 update ssis.RunTimeMetrics
  Set packageEndDateTime = @packageEndDateTime
     ,packageStatus = @packageStatus
 where id = @id
end
Now add a new Execute SQL Task named “Log End of Package Execution”:
Set the Connection property to your SSISRunTimeMetrics database and the SQLStatement property to the following Transact-SQL statement:
declare @Now datetime
set @Now = GetDate()

Exec ssis.usp_RecordPackageEnd ?,@Now,'Succeeded'
Click the Parameter Mapping page and configure the iPackageLoadID variable as an input variable of the Long data type, assigned to ordinal 0:
Click the OK button:
Execute the package. Note that each time a change is saved, the VersionBuild property auto-increments, thus updating your sVersion variable value:
The package succeeds:
Examine the data stored in the ssis.RunTimeMetrics table. Note the latest row now contains a Start and End DateTime, and the packageStatus field now records “Succeeded”:
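A quick query along these lines (a sketch) turns the raw metrics into execution durations – the basis for the baseline and trend comparisons discussed earlier:

```sql
select packageName
      ,packageVersion
      ,packageStatus
      ,packageStartDateTime
      ,packageEndDateTime
      ,datediff(ss, packageStartDateTime, packageEndDateTime) as ExecutionSeconds
from ssis.RunTimeMetrics
order by packageStartDateTime desc
```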
We will work with these concepts more – fleshing them out by adding more robust code and additional functionality. For now, this is a good start. Congratulations!
I was recently reminded that iteration matures software.
The History of Andy, Part 1
Like many DBAs, I was a software developer in another life. I built web applications – working my way up from HTML through DHTML and finally to ASP – and could regale (and bore) you young whipper-snappers with war-stories of how things were “back in my day”. </DanaCarvey>
I’ll share instead something I’ve witnessed many times since starting with software in 1975 – and something you probably already know: stuff changes.
And thank goodness stuff changes!
I recently ordered 1G of RAM from an online retailer. It should arrive before my next son (but that’s not a given as Riley refuses to provide a tracking number – the doctors will induce Christy into labor Friday if he hasn’t been born by then – but I digress…). I remember my neighbor John, who introduced me to computers, purchased a 256-byte RAM chip in the mid-1970s for about what I paid for the 1G. That’s 256 bytes of RAM – not a typo. As I recall it was either a 14- or 16-pin IC.
Things have changed since then. Improvements in technology, brought about by building and improving upon existing knowledge, have brought us to a day when I can purchase 1,073,741,824 bytes for roughly the previous price of 256. I don’t know how you feel about that. I think it’s a good thing.
The idea of “building and improving upon existing knowledge” defines iterative development. Although the idea is relatively new to the software development field, it serves as the basis for engineering disciplines. Engineers iterate – build and improve upon existing knowledge – and we get more powerful hardware for the same amount of money. What’s not to like?
Iteration – it’s not just a good idea…
Iterative software development builds and improves upon existing knowledge within a specific domain. Most domains are defined by an application (wholly or in part), enterprise knowledge (again, wholly or in part), or – most likely – some combination of the two. For example, let’s say you work for a large corporation as a software developer. Your domain could be the corporate website. In which case you possess knowledge about the business of the corporation and web development. You mix these together to do your job. In this case, you will probably pick up marketing savvy and current trends along with the latest AJAX techniques.
As you make successive passes (iterations) through the website design interacting with marketing, your domain knowledge is built and improves. As your domain knowledge increases, the website will become more valuable to the corporation – as will you.
Iteration adds value.
The same can be said for database development.
Perhaps you’ve experienced this in your own database development efforts: you receive a request for a database design to meet some desired functionality. Or you’re handed a design and asked to optimize it. Or maybe even you had an idea to capture data – performance metrics or something similar – and you’re designing a database solution to accomplish this.
You get into the development a few hours or a few days and realize a little tweak here or there would improve performance, or readability, or better adapt the design to your intentions. So you make the tweak and continue.
This improvement leads you to re-examine other portions of the design and you make more tweaks. Maybe your last change broke things. Maybe you see an opportunity to add a parameter to a stored procedure and combine the business logic of three stored procedures into one.
A “Growing” Solution
Pretty soon, you have iterated enough to feel comfortable promoting, integrating, or even releasing the results – letting the effort move to the next step.
Depending on the nature of your efforts, it may not end there. If your database development is the back end of a larger application – say, the corporate website, for example – there will likely be requests for changes over time as the site grows (scales) in complexity and size.
When the requests come in you are not likely to start over. You will most likely build and improve upon your existing knowledge. You will most likely iterate.
Scaling forces iteration.
This is how solutions mature – be they applications, databases, or both – regardless of who writes them or how many are involved in the development effort. It doesn’t matter if the development team is one lady in a cubicle in the European Union or a development team of thousands at Microsoft.
I sometimes miss the ActiveX task in DTS. It was cool because “if you could see it, you could change it.” I used the ActiveX Task to generate dynamic SQL.
When To Use Dynamic SQL In SSIS
There are lots of uses for dynamic SQL in SSIS. If you find yourself using the same SQL statements over and over again with minor modifications – a different table name here, a different criteria value there, etc. – the statement is a likely candidate for dynamic SQL.
The flexibility of dynamic SQL coupled with the flexibility of SSIS allows for some very creative solutions. You can generate SQL based on environmental conditions – whether or not a directory exists; or, using a Loop Container, set numeric criteria based on the value of the iterator; or, in a ForEach Loop Container, you can access and include filenames in dynamically-generated SQL. The possibilities are limited only by our experience and knowledge of SSIS.
Variable On Variable Technology!
One of the best methods for generating dynamic SQL in SSIS is to use variables and utilize the EvaluateAsExpression property to update the value of the variable containing the dynamic SQL. This SSIS Design Pattern demonstrates how to generate dynamic SQL using the EvaluateAsExpression property of variables.
To start, open Business Intelligence Development Studio (or Visual Studio – they’re the same). Create a new Integration Services project. Right-click any whitespace on the Control Flow canvas and click Variables:
When the Variables dialog displays, click the Add Variable button on the Variables menu bar:
Name the variable sSQL and set the Data Type to String. Set the value to "Select ContactID, NameStyle, Title, FirstName, MiddleName, LastName, Suffix, EmailAddress FROM Person.Contact WHERE ContactID = 0". sSQL is our template variable – we will use this initial value (that returns no rows) as the basis of our dynamic SQL.
Note: In this example I only configure and display the dynamic SQL – I do not assign it to an ExecuteSQL (or some other) Task. But if I did, the SQL would have to parse or the task would throw a validation error. There are two ways around this error:
Use a valid statement in the template SQL.
Set the DelayValidation property of the ExecuteSQL Task to True.
Create another variable of Int32 Data Type named iContactID. iContactID will contain the dynamic portion of our SQL statement. Set iContactID’s Value to 11:
Click on sSQL and press the F4 key to view the properties for sSQL. Set EvaluateAsExpression to True.
Click Expression, then click the ellipsis. In the Expression Builder, enter: "Select ContactID, NameStyle, Title, FirstName, MiddleName, LastName, Suffix, EmailAddress FROM Person.Contact WHERE ContactID = " + (DT_STR, 4, 1252)@[User::iContactID].
We are building a string variable, but iContactID is an Int32 data type. To concatenate it to a string, we need to cast the integer to a string.
In SSIS Expression Language, casting is accomplished using Type Casts. Drag (or type) the cast function just in front of the variable, i.e. (DT_STR, 4, 1252) @[User::iContactID]. All together it will look like the following:
Click the Evaluate Expression button to test. Your SQL statement should display in the two-line label above the button. Then click OK to exit the Expression Builder.
A note about building the expression: I am sometimes tempted to use the REPLACE string function in Expression Builder to build dynamic SQL for variables. This usually turns out to be a bad idea because the value of the variable itself is updated when it’s evaluated as an expression. This means if I have a variable named sSQL with an initial value like "SELECT <ReplaceMe> FROM Person.Contact" and I use an expression like REPLACE(@sSQL, "<ReplaceMe>", "ContactID"), it will work just fine on the first pass, but will fail to update the variable value on subsequent passes because the Search Phrase is no longer contained in the value of @sSQL – it was updated when the variable was evaluated as an expression.
Add a Script Task to the Control Flow and name it “Popup sSQL”. On the Script screen add sSQL to the ReadOnlyVariables list:
Click the Design Script button and add a MsgBox to popup the value of sSQL:
Close the Script Task. Right-click the Script Task and click Execute Task:
The value of sSQL should popup in a message box – with the value for iContactID displayed in the WHERE clause:
Taking It To The Street
Although I do not demonstrate it in this post, you would use something similar to this example in a Loop Container. If you use iContactID as the iterator of the loop, you could start at some value and advance to some other value, executing dynamic SQL inside the loop on each pass.
I hope this helps you design dynamic SQL in SSIS. If you still need the advanced functionality of a script you can use the Script Task.
I think Eric makes a couple good points, one explicit, one implied:
1. (Explicit) A DBA – or Database Developer, more accurately (and there is a difference) – adds value to development.
2. (Implicit) There are Software Developers out there who can step into the Database Developer role long enough to solve most database tuning issues. Eric demonstrates this with himself in profiling and addressing a missing or ill-defined index.
I find most of the comments – presumably by software developers – typical. One developer stated:
My current project didn’t have a DBA for 2 years, until recently since we’re now at the stage of optimizing for performance. It seems to me that as long as the database is intelligently structured in the first place, a DBA’s role would be rather small in most cases.
I agree with the sentiment expressed here – as much as I agree that code-generation tools can replace application developers. It’s true that you can utilize SQL Server or any database engine as a dumb file store. And it’s equally true that you can build an enterprise application in C# that consists of thousands upon thousands of lines of nested If… Then… Else statements.
The question is: Why would you?
This goes beyond arguments over syntax, coding standards, methodology, and design philosophy. This is about putting competent professionals – at the height of their game – into the mix on a project.
You don’t have to take my word for it – ask software developers who have worked (or are working) with competent database developers.
A friend (who shall remain nameless) recently told me his company interviewed a competent database developer and DBA. All seemed in agreement an offer would be forthcoming until the very end of the recruiting process. At that time, someone made the comment “we don’t need a DBA.”
It would be notable if this sentiment wasn’t so widespread – but I see it often. How often? Well, I would have to tell you how I see it to qualify that statement:
You see, people rarely say to me “We don’t need a DBA.” Mostly I see it in their applications – many of them prominent companies in which you may even own stock. I can tell when I examine their schema. I can see it when I execute Profiler against their SQL Server database.
Now, there are lots of reasons to design a denormalized schema. And there are lots of reasons to encapsulate the business rules in code. This is not what I’m talking about – though some of these systems would clearly perform better (or at all, in extreme cases) if they took advantage of better design patterns.
I’m talking about designs where this much is obvious:
1. At least two people designed the data layer; and 2. They did not communicate during the process.
Often, enterprise-level database design is shoveled onto developers as a secondary task. No, I’m not making this up – it’s too tragic to joke about. There are developers who can handle this task. But there are more who believe they are database developers than actually are. (Before I became a SQL Server DBA I was a developer who thought I was a SQL Server DBA…)
There will doubtless be readers who can provide examples of how their enterprise application was built by junior developers who did the database and code work and whose systems are performing just fine. I’m happy for you and sincerely hope the system scales.
Designing a scalable solution – database, application, or enterprise architecture – is one of those things that consumes time, thinking, resources, and money during the early phases of an enterprise development cycle. But it is – hands down – one of the best investments (if not the best) in the solution.
In today’s market, scalability is as optional as security. And like security, a scalable design is not something you “add later.” It’s not part of the foundation – it is the foundation.
My experiences with designing scalable solutions have proven there is no free lunch nor any shortcuts that work. If anyone – me included – skips the work of designing for scalability, there comes a day when they (or I) must pay the fiddler. From what I hear and have experienced, designing in this fashion is most often sacrificed on the altar of the deadline. Trust me, if it falls apart in six weeks or six months, you haven’t saved any time – and you may have lost a job or a customer.
Someone told me this and I remember because it has proven true several times over: “Deliver quality late, no one remembers. Deliver junk on time, no one forgets.”
If you’re building (or upgrading to) cutting edge technology, you need a DBA.
I received a cool compliment today from a peer who’s a developer. He said, “You know, I really like having a DBA on my team!” I have to tell you, it made my whole day!
It led to a discussion about past experiences and expectations, and I shared something I thought was pretty much common knowledge: there are three types of DBAs. My peer was shocked, so maybe the knowledge isn’t so common after all.
The three “flavors” of DBAs I define are:
System, Operations, or Production Support DBAs – these DBAs write maintenance plans in notepad and have no qualms whatsoever about executing in command-line. They were DBAs in the old days, when we carved our own ICs out of wood. They will get your server and database back online fast – and with less data corruption than anyone else on the planet. They live for torn pages and I/O faults.
Application Support DBAs – these DBAs are familiar with one or more (usually complex) applications. I’m talking PeopleSoft, Siebel, and SAP here. If you want to customize a screen or write a one-off web application, you desperately need these folks.
Database Developers – these DBAs are ruthless bit-heads. They use bigint and byte fields for masking binary states. They can optimize a stored procedure in their sleep and wrap an API around a database so developers never have to consider writing SQL that directly hits tables. They are performance freaks that will work 18-hour days on weekends to test in Production – when it’s “safe.”
Do you think DBAs fall into these categories? Do you know any that do? Do you see yourself in there anywhere? Do you have more or less or different “flavors” for classifying DBAs?