Lessons Learned About Community Leadership

It’s been an honor and a privilege to serve in technical community leadership at the local (Richmond Virginia) and regional level. It’s not the same as managing geeks, but there are similarities.

I’ve considered serving at the national level but haven’t taken that plunge. Over the years I’ve learned some lessons that I thought I’d share. I believe some of this stuff applies to everyone at all times, not just leaders when they are leading. In no particular order I offer the following:

  1. Serve the people you lead. As often as possible, do the non-glamorous parts of the job. Show up early. Stay late. Clean up. Take out the trash. Strive to help. Always.
  2. Love the people you serve.
  3. Engage in Active Humility. Respond to compliments by reminding folks you’re part of a team. Mean that when you say it. Be an example of service and love. Never believe your publicity or trust in popularity.
  4. Listen to the Community. Prove you’re listening by implementing some of their suggestions. Give credit to those who made the suggestions.
  5. Own your mistakes. Apologize when you’re wrong. Apologize when you’re right but you offended someone. There’s no statute of limitations on apologies because there is no statute of limitations on hurt feelings. I was reminded of this when I apologized for something I did years ago. The first response was, “Well that was a long time ago,” which was accurate. But then this person shared details… it turns out it wasn’t too long ago after all. I share this not as criticism but as proof. It stinks to get hurt. It stinks more to not receive an apology.
  6. Empower and enable others. Leaders multiply leaders. Community leadership takes a village. Assume a team with members leaving and joining is the norm because, well, it is.
  7. Cherish the opportunity to lead. Like all things, this too shall pass. You will miss leading when you’re done.

:{>

SSIS Performance Pattern – Loading a Bajillion Records

Do you need to use SSIS to load some data from a large source? This post shares some thoughts on a handful of patterns you can apply to help. I can hear you thinking, “Which pattern should I use, Andy?” That’s an excellent question! The answer is, “It depends.” You should test each pattern and see which performs best for you.

Pattern 0: Just Do It

The first pattern is easy: select the rows you need in an SSIS Data Flow Task OLE DB Source adapter, using a query you can build and test in SSMS first.
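A query along these lines illustrates the idea; the table, columns, and date filter are hypothetical stand-ins for your own source:

Select
  o.OrderID
, o.CustomerID
, o.OrderAmount
, o.OrderDate
From dbo.SourceOrders o                    -- hypothetical source table
Where DatePart(yyyy, o.OrderDate) = 2017;  -- hypothetical filter on a date column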

Is there anything wrong with this query? Well, maybe. Functions – like DatePart() – applied to columns in the WHERE clause can keep the optimizer from seeking an index on that column (the predicate isn’t SARGable), which can slow things down a bit. It’s a valid pattern, though, so you should use it in your SSIS Data Flow Task’s OLE DB Source adapter and test its performance.

Pattern 1: Stage Select Rows Using a Data Flow

Another option is to stage the subset of rows you’re after – or just a list of the uniquely-identifying columns from those rows, also known as a business key or natural key. You could create a Data Flow Task whose OLE DB Source adapter selects the rows you wish to load.

Those rows could be loaded to a staging table and later read from that table.

Reading all the columns – as shown above – is really just adding a step that you don’t need if your goal is to load data straight from a source to a destination. If, however, you need to join to other tables or lookup reference data, loading more than just the business keys into a staging table may be the way to go.
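If you go the keys-only route, a minimal sketch (with a hypothetical staging table and the same hypothetical source) might look like this:

-- hypothetical staging table holding only the business key
Create Table stage.OrderKeys
 (OrderID int Not Null);

-- OLE DB Source query for the staging Data Flow Task
Select o.OrderID
From dbo.SourceOrders o
Where DatePart(yyyy, o.OrderDate) = 2017;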

Pattern 2: Load Staged Rows Using T-SQL

A third option is to use T-SQL to load the target table after you’ve loaded a stage table via a Data Flow Task. Configure an Execute SQL Task with a T-SQL query that joins the staged rows (or keys) back to the source and loads the target.
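A sketch of that load query, assuming you staged only the business keys as in the hypothetical example above (the target table name is hypothetical, too):

-- Execute SQL Task: load the target using the staged business keys
Insert Into dbo.TargetOrders (OrderID, CustomerID, OrderAmount, OrderDate)
Select o.OrderID, o.CustomerID, o.OrderAmount, o.OrderDate
From dbo.SourceOrders o
Join stage.OrderKeys k
  On k.OrderID = o.OrderID;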

Your Control Flow now contains the staging Data Flow Task followed by an Execute SQL Task that performs the load.

Pattern 3: Stage and Load Rows Using T-SQL and Execute SQL Tasks

Another option is to stage the subset of rows you’re after using an Execute SQL Task and then use a second Execute SQL Task to load those rows into the target table. Your stage query simply selects the rows (or keys) you’re after and inserts them into the staging table.
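A sketch of that staging query, using the same hypothetical names (and assuming the source is reachable from the staging database via T-SQL, for example on the same instance):

-- Execute SQL Task: stage the business keys using T-SQL instead of a Data Flow Task
Truncate Table stage.OrderKeys;

Insert Into stage.OrderKeys (OrderID)
Select o.OrderID
From dbo.SourceOrders o
Where DatePart(yyyy, o.OrderDate) = 2017;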

The Control Flow would now contain an Execute SQL Task for staging and another for loading.

Pattern 4: Skipping the Stage Step Altogether

If you’re just doing a simple single-table load (like in this demo), and if the source and destination are on the same instance, you can often achieve great performance using T-SQL to load from the source directly to the destination.
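A sketch of that direct load, with the same hypothetical names:

-- Execute SQL Task: load the destination directly from the source, no staging
Insert Into dbo.TargetOrders (OrderID, CustomerID, OrderAmount, OrderDate)
Select o.OrderID, o.CustomerID, o.OrderAmount, o.OrderDate
From dbo.SourceOrders o
Where DatePart(yyyy, o.OrderDate) = 2017;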

Your Control Flow becomes remarkably simple: a single Execute SQL Task.

Conclusion

Staging the subset of data you need, or even just the unique keys for that subset, is one way to improve load performance. As always, the best-performing pattern depends on your data and your environment, so test before you commit.

:{>

About Me

Hi there! Welcome to my professional blog. I really appreciate you taking the time to visit. I’d like to share a couple of things about myself, in case you’re interested. Why? I believe context matters. If you don’t agree, feel free to skip on past this post and enjoy the ten years of content stored here. We can still be friends, I promise! :{>

This post is divided into three sections:

Read more

 

Announcing Biml Academy v2.0

Update: See this post at Biml Academy for an opportunity to sign up for 50% off!

I am pleased to announce the relaunch of Biml Academy!

The site currently hosts free training from the very first Biml Academy week, way back in the Spring of 2016. I delivered that training while on the road, consulting at a large software company in the Pacific Northwest… It was a fun gig! When they asked me to spend Biml Academy week onsite, I requested one caveat: that they allow me to deliver the training content from their offices.

My project manager was (and still is) awesome. He agreed and the rest is history… recorded history, even!

Check it out at Biml Academy!

:{>

One Solution to Presentation Levels

I’ve made no secret of my personal disdain for Presentation Levels. My non-appreciation for presentation levels is largely driven by presentation evaluation complaints received over the years. These complaints mostly read the same (paraphrased): “The presentation was advertised as Level 200, but I felt it was more of a Level 100 (or 300, or whatever) presentation.”

I have three kneejerk reactions to this complaint:

  1. Levels are subjective and imprecise.
  2. Your Level 100 (or 300, or whatever) is probably different than someone else’s Level 100 (or 300, or whatever). Or mine.
  3. I’ve neither attended nor delivered a presentation at a single level.

I’d rather see something – anything – more than a naked scalar.

One Solution

I propose a more realistic reflection of presentation content. We don’t have to use data bars like the image shown below, but some indication of how much content is at which level would more effectively and accurately reflect the content of a presentation. If we insist on naked scalar levels, we could pick the largest percentage and label the presentation thus.

But I’d rather see something – anything – more than a naked scalar. I’d rather see something like this, which I call Tiered Levels:

Applied

Tiered Levels address point 1 (somewhat) above. While still subjective, there’s more precision.

Tiered Levels do nothing for differences in skill levels of attendees or differences between attendee skill levels and the skill level of the presenter.

Tiered Levels fully address point 3: no presentation is delivered at a single level, and Tiered Levels make the true level distribution of the content visible in a way a naked scalar cannot.

:{>

My New Title is Chief Data Engineer

My new job title at Enterprise Data & Analytics is Chief Data Engineer.

“What’s a Data Engineer, Andy?”

I’m glad you asked! Data Science is a broad term applied to analytics derived from the statistical analysis of data. “So, what do data scientists actually do?” That’s a fantastic question: data scientists experiment. They use tools like R and D3, languages like Python, and platforms like Azure Machine Learning to tinker with the weights applied to data measures. Then they observe the results. It’s, well, science.

Before data science can happen, though, the data used by Data Scientists must be collected, cleansed (because garbage in still equals garbage out), coalesced into consistent units of measure, and formatted for consumption by the analytical engine. This part of the job is called data wrangling, munging, or data integration, and it’s 50-90% of the work!

This is data engineering.*
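To make that concrete, here is a tiny sketch of the kind of wrangling involved; the tables, columns, and unit conversion are hypothetical:

-- Cleanse raw readings and coalesce mixed units of measure (hypothetical tables and columns)
Insert Into dbo.SensorReading (SensorID, ReadingTimeUTC, TemperatureC)
Select
  r.SensorID
, Try_Convert(datetime2(0), r.ReadingTime) As ReadingTimeUTC
, Case r.TemperatureUnit                                            -- coalesce Fahrenheit and Celsius into Celsius
    When 'F' Then (Try_Convert(decimal(9, 2), r.Temperature) - 32) * 5.0 / 9.0
    Else Try_Convert(decimal(9, 2), r.Temperature)
  End As TemperatureC
From stage.RawSensorReading r
Where Try_Convert(decimal(9, 2), r.Temperature) Is Not Null;        -- garbage in, filtered out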

It’s All About Me

I’ve been doing data integration since before I knew what it was called. I cannot tell you what drew me to this type of work, nor can I deny that I was drawn to it. I seemed to have a knack (as we say in Farmville Virginia…) for it. I enjoyed it. It was useful work. And, for some unknown reason, I was good at it.

So I changed my title at Enterprise Data & Analytics to better reflect this reality.

More to Come

This isn’t the only change we’re making at Enterprise Data & Analytics. As we continue to grow, we are making other changes that will be announced soon. I cannot wait to share with you the cool stuff we’re working on right now! But, I must refrain. For now. More later!

:{>

* I found a couple good articles on the difference between Data Science and Data Engineering:
Data Science vs. Data Engineering
What is Data Engineering

My Plans for PASS Summit 2017

I’m looking forward to attending the PASS Summit 2017 next week – especially since I missed last year. If you search, you can find lots of great advice to attendees of the Summit. I agree with almost all of the advice out there and, if you’re a first-time attendee, it would serve you well to search out this advice.

Presenting!

I love the PASS Summit. I wrote about why in Why I Love the PASS Summit. It’s always an honor to present. This year I’m presenting Use Biml to Automate SSIS Design Patterns at the Summit – with demos I’ve been evolving for the past couple years. I like the latest version a lot, and I owe Scott Currie kudos for helping my C# surpass n00b status! I’m also presenting at the Seattle SQL Pro Workshop 2017 on Designing an SSIS Framework – a topic that I plan to write about next year.

My Plans

Other than presenting at the Freecon and the Summit, I hope to meet with Christian brothers and sisters the few mornings I am in town. We will share prayer requests and pray for each other, for PASS leadership, and for the safety and positive experience of everyone attending the conference. This is something Mike Walsh (@mike_walsh) started years ago. He calls it PASS Prayers. Although Mike’s efforts predate many newer social-context efforts at the PASS Summit, you won’t find a table for our demographic at the conference, nor any official recognition from the PASS organization that the group even exists (other than a sign to a multi-faith prayer room).

That’s not a complaint – I kind of like it that way.

That’s about it for plans.

I’ll likely go to bed late (for me), around 10:00 PM each night. Since it takes me time to unwind each evening, I’ll likely be heading for my room around 9:00 PM. I’m an early riser, so you may find me lurking in the Sheraton lobby or the Daily Grill (I love the smoked salmon for breakfast!) around 6:00 AM each morning.

Books!

The rest of the time I’ll be wandering around the Convention Center catching up with old friends and engaging with new friends. I plan to have copies of Building Custom Tasks for SSIS on me as I meander, and I plan to give them away for free. So if you want a copy, hunt me down! I’ll likely be sharing my location on Twitter (@AndyLeonard). I shouldn’t be hard to find – I’m a fat guy with a braided beard.

I’ll try to stop by as many after-hours events as possible. Most require registration and are sold out at this point… perhaps I’ll just walk by and wave.

My Advice

Travel with a friend. If you find yourself stuck somewhere, reach out on Twitter. Our community is filled with helpful people. If you find yourself in danger, call 911. I’m serious.

Take care of yourself. Be aware of your surroundings. Be aware of fellow community members and try to be helpful if you believe they need help. Seattle’s a big city (especially if you’re from a place the size of Farmville Virginia). There are many wonderful and cool people there. There are also a handful of non-wonderful and not-cool people there that you should watch out for.

Have a good time. Learn stuff. Engage people – especially people you admire. If you’re shy (I am not shy) or introverted (like me) and need practice engaging someone, I’m an easy engagement-starter-person.

Be kind. Be courteous. Be helpful. Be thoughtful.

:{>

Moving Day

With a little help from a friend – Adam Machanic (@AdamMachanic) – I have moved my old SQLBlog.com posts here to andyleonard.blog.

My old blog looks like the image above. The post content there has been removed and replaced with placeholders that link here, so older posts and links still work but the content now lives in only one location (I’m told hosting the same content in two places is bad for SEO). Adam’s code also copied images and comments from the old blog!

:{>

Parsing SSIS Catalog Messages for Lookup Performance, v2

A couple years ago I wrote Administering SSIS: Parsing SSIS Catalog Messages for Lookups. I’ve updated the script to return all Lookup Transformation performance messages captured by the SSIS Catalog with Basic logging enabled (the SSIS Catalog’s default Logging Level). Some smart data integration people recommend setting the SSIS Catalog Logging Level to Verbose and querying catalog schema views to retrieve row counts and performance metrics. I’m not a fan of running a Verbose logging level by default due to the overhead. Granted, the overhead is relatively light but it’s not nothing. Sometimes I need all the cycles!

I prefer to passively gather performance metrics – even if it’s a little convoluted and more work (for me). I don’t often need these metrics immediately, so I execute queries similar to the query below and store the performance metrics in a table. I can (and do) build dashboards to track SSIS performance (perhaps I should blog about them…) using these passively collected performance metrics.

Without further ado, the updated query:

Use SSISDB
go

declare @LookupIdString varchar(100) = '% rows in the cache. The processing time was %'
declare @LookupNameStartSearchString varchar(100) = '%:Information: The %'
declare @LookupNameStartLen int = Len(@LookupNameStartSearchString) - 2
declare @LookupNameEndSearchString varchar(100) = '% processed %'
declare @LookupNameEndLen int = Len(@LookupNameEndSearchString) - 2
declare @ProcessingTimeString varchar(100) = 'The processing time was '
declare @ProcessingTimeSearchString varchar(100) = '%' + @ProcessingTimeString + '%'
declare @CacheUsedString varchar(100) = 'The cache used '
declare @CacheUsedSearchString varchar(100) = '%' + @CacheUsedString + '%'

Select
  SubString(om.[message]
    , (PatIndex(@LookupNameStartSearchString, om.[message]) + @LookupNameStartLen)
    , (PatIndex(@LookupNameEndSearchString, om.[message]) - (PatIndex(@LookupNameStartSearchString, om.[message]) + @LookupNameStartLen))
    ) As LookUpName
, Convert(bigint, Substring(om.[message]
    , (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)
    , ((CharIndex(' ', om.[message], PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen))
       - (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen))
    )
  ) As LookupRowsCount
, Convert(decimal(16, 3), Substring(om.[message]
    , (PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1)
    , ((CharIndex(' ', om.[message], PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1))
       - (PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1))
    )
  ) As LookupProcessingSeconds
, Convert(bigint, Substring(om.[message]
    , (PatIndex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1)
    , ((CharIndex(' ', om.[message], PatIndex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1))
       - (PatIndex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1))
    )
  ) As LookupBytesUsed
, Convert(decimal(16, 3),
    (Convert(bigint, Substring(om.[message]
      , (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)
      , ((CharIndex(' ', om.[message], PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen))
         - (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen))
      ))
     /
     (Convert(decimal(16, 3), Substring(om.[message]
      , (PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1)
      , ((CharIndex(' ', om.[message], PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1))
         - (PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1))
      ))
     )
    )
  ) As LookupRowsThroughput
, ex.start_time As ExecutionStartTime
, ex.folder_name As FolderName
, ex.project_name As ProjectName
, ex.package_name As PackageName
--Into rpt.LookupMetrics -- drop table rpt.LookupMetrics
From [catalog].operation_messages om
Join [catalog].executions ex
  On ex.execution_id = om.operation_id
Where om.message_source_type = 60 -- Data Flow Task
  And om.[message] Like @LookupIdString
  -- data type-checking
  And IsNumeric(Substring(om.[message], (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen), ((CharIndex(' ', om.[message], PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)) - (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)))) = 1 -- Check LookupRowsCount
  And IsNumeric(Substring(om.[message], (PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1), ((CharIndex(' ', om.[message], PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1)) - (PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1)))) = 1 -- Check LookupProcessingSeconds
  And IsNumeric(Substring(om.[message], (PatIndex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1), ((CharIndex(' ', om.[message], PatIndex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1)) - (PatIndex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1)))) = 1 -- Check LookupBytesUsed
  -- catch 0-valued denominator rows
  And Convert(bigint, Substring(om.[message], (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen), ((CharIndex(' ', om.[message], PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)) - (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)))) <> 0 -- Check LookupRowsCount
Order By operation_id DESC

:{>

Related Training:
IESSIS1: Immersion Event on Learning SQL Server Integration Services – April 2017, Chicago
From Zero to Biml – 19-22 Jun 2017, London
IESSIS2: Immersion Event on Advanced Integration Services – Oct 2017, Chicago

Learn More:
SSIS Design Patterns and Biml: A Day of Intelligent Data Integration – Boston SQL Saturday precon, 24 Feb 2017
Save Time and Improve SSIS Quality with Biml
An Example of Data Integration Lifecycle Management with SSIS, Part 4
The Recordings for SSIS Academy: Using the SSIS Catalog are Available
SSIS Catalog Compare v2.0 Launch Event 7 Mar 2017!

Immersion Event on Learning SSIS – May 2018!

I am honored to partner with the awesome team at SQLSkills to deliver IESSIS1: Immersion Event on Learning SQL Server Integration Services in person in Chicago 7-11 May 2018!

Click here to register!

Here’s what I’ll be covering:

Module 1: Introduction

In the introduction we share information about the instructor, course flow, and content, and gather information about attendee expectations for the course. Topics covered include:

  • Training overview
  • Expectations
  • Introductions

Module 2: Creating Your First SSIS package

We believe many learn SSIS best by doing, so we get attendees’ hands on SSIS as soon as possible. In this module, we demonstrate how to build an SSIS package and invite attendees to walk through building a basic SSIS package with us. Topics covered include:

  • Introduction to SQL Server Data Tools – Business Intelligence (SSDT-BI), the version of Microsoft Visual Studio that serves as the SSIS integrated development environment (IDE)
  • Visual Studio tips and tricks
  • Exercise: Create an SSIS package

Module 3: Introduction to the Data Flow Task

SSIS was designed to move data. The SSIS Data Flow Task is a complex data integration engine built to support generic data load use cases. In this module, we introduce SSIS Data Flow concepts. Topics covered include:

  • Introduction to the Data Flow Task
  • Connection Manager and Data Flow Task basics
  • Source and destination adapters
  • Exercise: Create a Real Data Flow, Part 1

Module 4: Data Flow Task 201

Once the basics of Data Flow development are understood, we introduce a basic version of the SSIS design pattern called the incremental load. Building idempotent (safely re-executable) SSIS packages is one key to successful data integration in the enterprise. Topics covered include:

  • Intermediate Data Flow concepts
  • Building re-executable data loaders
  • Exercise: Create a Real Data Flow, Part 2
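The incremental load pattern at the heart of this module can be sketched in set-based T-SQL terms; the table and column names below are hypothetical, and in SSIS the same logic is typically implemented with a Lookup Transformation and a change-detection comparison in the Data Flow:

-- Insert rows that are new to the destination (hypothetical tables and columns)
Insert Into dbo.Destination (BusinessKey, Amount, ModifiedDate)
Select s.BusinessKey, s.Amount, s.ModifiedDate
From dbo.Source s
Left Join dbo.Destination d
  On d.BusinessKey = s.BusinessKey
Where d.BusinessKey Is Null;

-- Update rows that exist in the destination but have changed in the source
Update d
Set d.Amount = s.Amount
  , d.ModifiedDate = s.ModifiedDate
From dbo.Destination d
Join dbo.Source s
  On s.BusinessKey = d.BusinessKey
Where d.Amount <> s.Amount
   Or d.ModifiedDate <> s.ModifiedDate;

Because new rows are detected by their absence and changed rows by comparing values, re-executing the load yields the same result, which is what makes the loader idempotent (safely re-executable).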

Module 5: Data Flow Task 202

Change detection is one of the most important components of Extract, Transform, and Load (ETL) projects. In this module, we expand the basic change detection introduced in the previous module and begin to examine SSIS design patterns that improve SSIS Data Flow performance. Topics covered include:

  • Advanced Data Flow concepts
  • Building and tuning a robust incremental loader
  • Exercise: Create a Real Data Flow, Parts 3 and 4

Module 6: The Control Flow

The SSIS Control Flow is the work flow manager for an SSIS package. In this module, we demonstrate work flow containment and management in the SSIS Control Flow, including advanced Precedence Constraint configuration and package execution control. Topics covered include:

  • Containers, precedence, and work flow
  • Transactions, restartability, and locking

Module 7: Loop Containers

Loop containers provide execution work flow isolation and iterative functionality. In this module, we examine traditional and non-traditional configurations for the For Loop and ForEach Loop containers. Topics covered include:

  • Using For Loop and Foreach Loop Containers

Module 8: Data Flow Task 301

After turning our attention to the SSIS Control Flow, we return to the SSIS Data Flow for an extended period of hands-on labs. In this module, attendees are tasked with the first of three labs that build a robust data integration loader using SSIS, with an instructor available to assist with questions. This lab focuses on transforming the data by applying strong-typing rules and managing data that violates those rules. Topics covered include:

  • Data typing basics
  • Building an incremental loader for real-world data
  • Exercise: Build a Robust Incremental Staging Loader, Part 1

Module 9: Data Flow Task 302

In this module, attendees are tasked with the second of three labs that build a robust data integration loader using SSIS, with an instructor available to assist with questions. This lab focuses on using built-in SSIS Adapter functionality to manage and execute DDL operations during data integration staging development. Topics covered include:

  • Intermediate data staging
  • Managing real-world changes to data sources
  • Exercise: Build a Robust Incremental Staging Loader, Part 2

Module 10: Data Flow Task 303

In this module, attendees are tasked with the third of three labs that build a robust data integration loader using SSIS, with an instructor available to assist with questions. This lab reinforces writing idempotent SSIS Data Flow packages and introduces data quality concepts via data cleansing in SSIS. Topics covered include:

  • Basic data cleansing
  • Deriving values
  • Exercise: Build a Robust Incremental Cleansing Loader

Module 11: Event Handlers, Logging, and Configurations

An important component of any solution is messaging and configuration. In this module, we demonstrate SSIS’s messaging model in the context of event handlers, logging, and managing externalized parameter and variable values. Topics covered include:

  • A survey of SSIS event handlers focusing on OnError and OnInformation events
  • Using SSIS’s built-in logging facility to capture package execution details and statistics
  • Using built-in package configurations to externalize variable values

Module 12: Security, Deployment, and Execution

SSIS facilitates secure connection and parameter value management. In this module, we build on our understanding of messaging, logging, and externalization by examining project, package, and connection security. We then turn our attention to deployment and execution patterns, and discuss the role of security in each. Topics covered include:

  • SSIS Package deployment options
  • Security implications
  • SSIS Package execution

Module 13: ETL Design Patterns

SSIS is a powerful enterprise data integration engine. In this module, we discuss experiences using SSIS to solve complex and “edgy” use cases, and some patterns we’ve realized from those engagements. Topics covered include:

  • Leveraging less-documented features of SSIS
  • Using the Parent-Child design pattern to achieve “hands-free” custom logging and creative custom configuration
  • ETL Instrumentation

Module 14: Enterprise Execution Patterns

Data integration in a large enterprise is different from data integration in a smaller shop. In this module, we discuss patterns of SSIS development and management that lower the total cost of owning a data integration solution built using SSIS. Topics covered include:

  • Leveraging the Parent-Child design pattern and much of what we’ve learned over the past four days to build a metadata-driven SSIS execution framework engine
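As a taste of what “metadata-driven” means here, the sketch below shows the kind of tables such a framework might use; the schema, table, and column names are hypothetical, not the framework we build in class:

-- Hypothetical framework metadata: which packages run, and in what order
Create Table cfg.Applications
 (ApplicationID int Identity(1, 1) Constraint PK_Applications Primary Key
, ApplicationName nvarchar(128) Not Null);

Create Table cfg.ApplicationPackages
 (ApplicationPackageID int Identity(1, 1) Constraint PK_ApplicationPackages Primary Key
, ApplicationID int Not Null Constraint FK_ApplicationPackages_Applications
    References cfg.Applications(ApplicationID)
, FolderName nvarchar(128) Not Null
, ProjectName nvarchar(128) Not Null
, PackageName nvarchar(260) Not Null
, ExecutionOrder int Not Null);

-- The parent (controller) package iterates this result set and starts each child package
Select ap.FolderName, ap.ProjectName, ap.PackageName
From cfg.ApplicationPackages ap
Join cfg.Applications a
  On a.ApplicationID = ap.ApplicationID
Where a.ApplicationName = N'Nightly Load'
Order By ap.ExecutionOrder;

In a framework built on metadata like this, the parent package walks that list and executes each child, so adding a package to the run becomes a metadata change rather than a code change.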

I hope to see you there!

:{>