Everything Scales

The tune to a Bush song is running through my head as I type this… the band, not the president – although imagining the President singing the song is an interesting brain-stretch.

It’s a fact of IT life that everything scales. Some successfully, even. Problems start when things do not scale successfully (or well). It happens in business. It happens with software systems.

When it happens with businesses, you hear things like “They grew too fast.” When it happens with software systems, you browse to a website and receive an HTTP 500 or 404 error.

Can this be avoided (in business or software)? I think that’s an excellent question – one well worth examining.


The answer, I believe, lies with how predictable the scalability is.

Consider a database application: If you know which tables are going to grow, how, and how much, you can plan for said growth. How would you plan? You could partition the tables using one or a combination of partitioning techniques. You could appropriate filegroups, snapshots, and a host of other functionality. If you only you knew where to apply these techniques.

That’s the key.


Achieving scalability starts with capturing metrics. If you know how your database is growing from the beginning – if you can chart the growth of individual tables, access patterns, and internal performance data – you can predict growth and manage scalability.

So the key is measurement.

Measurement is an engineering discipline in its own right. The field of applied measurement is called Instrumentation. Applying measurement to a process is referred to as “instrumenting the process.”

How do you instrument a database process? Iteration 1 would include creating an internal table to house and maintain process metadata:

CREATE TABLE dbo.ProcessData
(ProcessDataID int IDENTITY(1,1) NOT NULL,
ProcessDataDateTime datetime NULL CONSTRAINT DF_ProcessDataDateTime DEFAULT (getdate()),
ProcessDataIndicatorName varchar(50) NULL,
ProcessDataIndicatorValue varchar(50) NULL,
ProcessDataIndicatorStatus char(1) NULL CONSTRAINT DF_ProcessDataIndicatorStatus DEFAULT (‘C’),
CONSTRAINT PK_ProcessData PRIMARY KEY CLUSTERED (ProcessDataID)

If your instrumented process is stored-procedure-based, you could add INSERT statements to your existing stored procedures. Consider instrumenting a parent stored procedure that calls child stored procedures. The instrumented proc could look like the following (instrumentation emphasized):

CREATE PROCEDURE dbo.SomeProcess
AS

begin

INSERT INTO dbo.ProcessData
(ProcessDataIndicatorName,
ProcessDataIndicatorValue)
VALUES(‘ChildProc1′,’Starting’);


EXEC dbo.ChildProc1

INSERT INTO dbo.ProcessData
(ProcessDataIndicatorName,
ProcessDataIndicatorValue)
VALUES(‘ChildProc1′,’Ending’);

INSERT INTO dbo.ProcessData
(ProcessDataIndicatorName,
ProcessDataIndicatorValue)
(‘ChildProc2′,’Starting’);


EXEC dbo.ChildProc2

INSERT INTO dbo.ProcessData
(ProcessDataIndicatorName,
ProcessDataIndicatorValue)
VALUES(‘ChildProc2′,’Ending’);

INSERT INTO dbo.ProcessData
(ProcessDataIndicatorName,
ProcessDataIndicatorValue)
(‘ChildProc3′,’Starting’);


EXEC dbo.ChildProc3

INSERT INTO dbo.ProcessData
(ProcessDataIndicatorName,
ProcessDataIndicatorValue)
VALUES(‘ChildProc3′,’Ending’);


end

Before moving forward, removing code duplication would be a worthwhile effort. In application development, this is one of many processes generally referred to as Refactoring.

The INSERT statements are a prime candidate for refactoring and we can address this with a stored procedure:

CREATE PROCEDURE dbo.AddProcessData
@ProcessDataIndicatorName varchar(50),
@ProcessDataIndicatorValue varchar(50)
AS

begin

INSERT INTO dbo.ProcessData
(ProcessDataIndicatorName, ProcessDataIndicatorValue)
VALUES(@ProcessDataIndicatorName, @ProcessDataIndicatorValue);

end

Now the parent stored procedure instrumentation above can be modified to look like this:

CREATE PROCEDURE dbo.SomeProcess
AS

begin

EXEC dbo.AddProcessData @ProcessDataIndicatorName=’ChildProc1′, @ProcessDataIndicatorValue=’Starting’;

EXEC dbo.ChildProc1

EXEC dbo.AddProcessData @ProcessDataIndicatorName=’ChildProc1′, @ProcessDataIndicatorValue=’Ending’;

EXEC dbo.AddProcessData @ProcessDataIndicatorName=’ChildProc2′, @ProcessDataIndicatorValue=’Starting’;


EXEC dbo.ChildProc2

EXEC dbo.AddProcessData @ProcessDataIndicatorName=’ChildProc2′, @ProcessDataIndicatorValue=’Ending’;

EXEC dbo.AddProcessData @ProcessDataIndicatorName=’ChildProc3′, @ProcessDataIndicatorValue=’Starting’;


EXEC dbo.ChildProc3

EXEC dbo.AddProcessData @ProcessDataIndicatorName=’ChildProc3′, @ProcessDataIndicatorValue=’Ending’;

end

Much better.

Measuring the current process provides a baseline – the first step in a continuous improvement process that will provides dynamic design changes, performance monitoring, and – eventually – a dynamically-scalable system. It also supplies the current performance status against which we can benchmark future improvements and modification.

:{> Andy

Andy Leonard

andyleonard.blog

Christian, husband, dad, grandpa, Data Philosopher, Data Engineer, Azure Data Factory, SSIS guy, and farmer. I was cloud before cloud was cool. :{>

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.