Biml, Book Reviews, and Metadata-Driven Frameworks

I occasionally (rarely) read reviews at Amazon of books I’ve written. If I learn of a complaint regarding a book, I try to help. If a reader experiences difficulty with demos, I offer to help, and sometimes I meet with readers to work through some difficulty related to the book (as I offered here). About half the time, there’s a problem with the way the book explains an example or the sample code; the other half of the time, the reader does not understand what is written.

I own both cases. As a writer it’s my job to provide good examples that are as easy to understand as possible. Sometimes I fail.

In very rare instances, I feel a review misrepresents the contents of a book – enough to justify clarification. This review afforded one such opportunity. None of what I share below is new to regular readers of my blog. I chose to respond directly because I know and respect the author of the review, and believe he will receive my feedback in the spirit in which it is intended – the same helpful spirit in which I believe his review was intended.

Enjoy.

My Reply to This Review of The Biml Book:

Thank you for sharing your thoughts in this review.

I maintain successful technology solutions are a combination of three factors:

  1. The problem we are trying to solve;
  2. The technology used to solve the problem; and
  3. The developer attempting to use a technology to solve the problem.

I believe any technologist, given enough time and what my mother refers to as “gumption” (a bias for action combined with a will – and the stubbornness, er… tenacity – to succeed), can solve any problem with any technology.

I find your statement, “It doesn’t really teach BIML but rather teaches a specific optional framework for those who want to do everything with BIML rather than increase productivity for repetitive patterns” inaccurate. Did you read the entire book? I ask that question because I most often encounter readers who state the precise opposite of the opinion expressed in this review (some of your fellow reviewers express the opposite opinion here, even). I disagree with your statement, but I understand your opinion in light of other opinions expressed to me by readers during my past decade+ of writing several books. I share one example in this blog post. The short version: we often get more – or different – things from reading than we realize.

There is a section in The Biml Book – made up of 3 of the 20 chapters and appendices – that proposes a basic metadata-driven Biml framework which is the basis of a metadata-driven Biml framework in production in several large enterprises. I wrote those 3 chapters and the basic metadata-driven Biml framework in question. Varigence has also produced a metadata-driven Biml framework called BimlFlex – based upon similar principles – which has been deployed at myriad enterprises. The enterprise architects at these organizations have been able to improve code quality and increase productivity, all while decreasing the amount of time required to bring data-related solutions to market. They do not share your opinion, although they share at least some of the problems (you mentioned ETL) you are trying to solve.

Am I saying Biml – or the framework about which I wrote or even BimlFlex – is the end-all-be-all for every SSIS development effort? Goodness no! In fact, you will find disclaimers included in my writings on Biml and SSIS frameworks. I know because I wrote the disclaimers.

Misalignment on any of the three factors for successful technology solutions – the problem, the technology, and/or the developer – can lead to impedance mismatches in application and implementation. For example, Biml is not a good solution for some classes of ETL or data integration (points 1 and 2). And learning Biml takes about 40 hours of tenacious commitment (which goes to point 3). This is all coupled with a simple fact: data integration is hard. SSIS does a good job as a generic, provider-driven solution – but most find SSIS challenging to learn and non-intuitive (I did when I first started using it). Does that mean SSIS is not worth the effort to learn? Goodness, no! It does mean complaints about simplifying the learning process are to be expected and somewhat rhetorical.

Architects disagree. We have varying amounts of experience. We have different kinds of experience. Some architects have worked only as lone-wolf consultants or as members of single-person teams. Others have worked only as leaders of small teams of ETL developers. Rarer still are enterprise architects who are cross-disciplined and experienced as both lone-wolf consultants and managers of large teams in independent consulting firms and large enterprises. Less-experienced architects sometimes believe they have solved “all the things” when they have merely solved “one of the things.” Does this make them bad people? Goodness, no. It makes them less-experienced, though, and helps identify them as such. Did the “one thing” need solving? Goodness, yes. But enterprise architecture is part science and part art. Understanding the value of a solution one does not prefer – or the value of a solution that does not apply to the problem one is trying to solve – lands squarely in the art portion of the gig.

Regarding the science portion of the gig: engineers are qualified to pronounce any solution “over-engineered.” Non-engineers are not qualified to make any such determination, in my opinion.

By way of example: I recently returned from the PASS Summit. Although I did not attend the session, I know an SSIS architect delivered a session in which he made dismissive statements regarding any and all metadata-driven frameworks related to ETL with SSIS. If I didn’t attend his session, how do I know about the content of the presentation? A number of people in attendance approached me afterward to share their opinion that the architect, though he shared some useful information, does not appreciate the problems faced by most enterprise architects – especially enterprise architects who lead teams of ETL developers.

My advice to all enterprise architects and would-be enterprise architects is: Don’t be that guy.

Instead, be the enterprise architect who continues to examine a solution until underlying constraints and drivers and reasons the solution was designed the way it was designed are understood. Realize and recognize the difference between “That’s wrong,” “I disagree,” and “I prefer a different solution.” One size does not fit all.

Finally, some of the demos and Biml platforms were not working at the time you wrote this review. Many of those issues have been addressed since. In the future, updates to SSIS, Biml, and ETL best practices will invalidate what this team of authors wrote during the first half of 2017. As such, the statement, “And much of the online information is outdated and no longer works. It’s a shame someone hasn’t found a way to simplify the learning process,” amounts to a tautology that defines progress. Learning is part of the job of a technologist. I encourage us all – myself most of all – to continue learning.

Peace.

Catalog Browser v0.7.8.0

I’ve been making smaller, more incremental changes to SSIS Catalog Browser – a free utility from the Data Integration Lifecycle Management suite (DILM Suite).

You can use SSIS Catalog Browser to view SSIS Catalog contents on a unified surface. Catalog Browser works with SSIS Catalogs on-premises and with the Azure Data Factory SSIS Integration Runtime (Azure SSIS). It’s pretty cool and the price ($0 USD) is right!
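Under the hood, “catalog contents” is metadata stored in the SSISDB database. Here is a minimal sketch of the kind of query Catalog Browser saves you from writing by hand – an illustration only, not Catalog Browser’s actual implementation:

-- Folder / project / package metadata from the SSISDB catalog views
USE SSISDB;
GO
SELECT f.name   AS FolderName
     , p.name   AS ProjectName
     , pkg.name AS PackageName
FROM catalog.folders AS f
JOIN catalog.projects AS p
  ON p.folder_id = f.folder_id
JOIN catalog.packages AS pkg
  ON pkg.project_id = p.project_id
ORDER BY f.name, p.name, pkg.name;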

The latest change is a version check that offers to send you to the download page when an update is available. You will find this change starting with version 0.7.7.0. Version 0.7.8.0 includes a slightly better-formatted version-check message. As I said: smaller, more incremental changes.

Enjoy!

Two (or More) Kinds of Developers

I’ve made statements about “two kinds of developers” for years. These statements are false inasmuch as all generalizations are false. The statements are not designed to be truisms. They are designed to make people think.

Last week – while presenting a full-day pre-conference session at the PASS Summit 2018 and again when delivering a session about Faster SSIS – I repeated the sentiment shown at the top of this post:

There are two kinds of developers:
1) Those who use source control; and
2) Those who will.

I follow up with: “Because if you do not use source control, you will lose code one day and it will break your heart.” Audience members laugh and the point is made.

More Than Two

There are myriad types of developers. And that’s a good thing. Why? Because there are myriad problems developers face in the wild and those problems need to be solved.

There are no one-size-fits-all solutions. If you attended the PASS Summit last week you likely saw some really cool demos. I know I did. And you may have thought – or even been told – that this, this right here is the answer for which you’ve searched your entire career.

There’s a possibility that the people selling you on whatever-this-is are absolutely correct.
There’s a greater possibility that they are less than absolutely correct.

I write this not to disparage anyone’s solution (or my own solutions). Promise.
I write this to dissuade the disparaging of anyone else’s solution (because that also happens).

The Bottom Line

Goldilocks. The bottom line is we all want the Goldilocks solution. We want the just-right solution that maximizes efficiency and minimizes complexity.

That’s just hard.

I can hear you thinking, “Why is maximizing efficiency and minimizing complexity hard, Andy?” I’m glad you asked. Solutions are a moving target, part art and part science, and the only way to learn where and when to draw the art-science line is experience.

A Moving Target

Maximizing efficiency and minimizing complexity is hard because it’s not at all the bottom line; it’s a line in the middle – a balancing of mutually-exclusive demands on your time, expertise, and energy.

Plus, it shifts.

Everything scales. Things scale up and/or out, or they scale down and/or in. In general (generality warning!), down and in is “bad” and up and out is “good.”

Experience Matters

Experienced architects understand subtle nuances; the art part of the art / science of enterprise software. When you speak with an experienced architect, you may hear her say, “It depends,” often. Good architects will finish the sentence and share at least some of the things upon which “it depends.” Less-experienced architects will present naked scalars and polarized advice.

Naked Scalars

Naked scalars are numeric values in a vacuum. They are unsupported because most are unsupportable. In other words, they are lies. Now, “lies” is a pretty harsh word. I prefer an engineering definition for the word “truth” that sounds an awful lot like the oath witnesses are asked to swear in US courts:

“Do you promise to tell the truth, the whole truth, and nothing but the truth, so help you God?”

This oath covers falsehoods that are shared, yes; but it also covers omissions of the truth.

Examples of naked scalars:

  • “97% of engineers believe ____.”
  • “10% of the people I know have practiced ____ successfully.”

Polarized Advice

Polarized advice can be a special case of naked scalars, advice focused on 0% and 100%. Polarized advice may or may not include scalars (naked or otherwise).

Examples of polarized advice:

  • “I’ve never seen a good use case for ____.”
  • “You should always ____.”

Are naked scalars and polarized advice always bad and wrong? Nope. That would be a generality (and we covered generalities already, did we not?).

Managing the Risk of Inexperience

What exactly is a consultant communicating when they employ naked scalars or polarized advice?
They are signaling a lack of experience.
They are, in effect, stating, “I do not have experience with ____.”

How do you manage the risk of inexperience?
You hire people – architects, especially – who understand there are good reasons systems are designed the way they are. They will say things like, “I’m not sure why this was designed this way,” and mean it. It’s not a criticism; it’s an admission of curiosity.

Trust me on this: you want curious consultants. They are more likely to identify a solution that solves the problem you are trying to solve in a way that doesn’t create new problems. Returning to the good reasons systems are designed the way they are…

  1. Are (or were) the good reasons, well, good? Sometimes.
  2. Do the good reasons scale? Sometimes.
  3. Do the good reasons stand the test of time? Sometimes.

Good architects discern the baby from the bath water. Their experience separates good architects from the crowd. Not-as-good architects are less flexible, less willing to learn, and loathe to admit mistakes.

Let’s face facts, though: All architects and developers know what they know and don’t know what they don’t know. Better architects recognize these uncomfortable truths and mitigate them.

One way to mitigate inexperience – the best way, in my opinion – is to work with others.

The Story Of Us

At Enterprise Data & Analytics, our consultants and architects work together as a team. Our diverse group is a strength, bringing perspective to bear on the problems you are trying to solve. Our experience levels vary, the software and tools with which we work vary, and our demographics vary. As owner, I am honored to lead a team from diverse cultural – as well as diverse technical – backgrounds.

I didn’t set out to build a cultural- / age- / gender-diverse team. I set out to find the best people – to do what Jim Collins describes as “getting the right people on the bus.”

I found, though, that focusing on getting the right people on the bus had the side-effect of building a cultural- / age- / gender-diverse team.

As an added bonus, people of different genders approach problem-solving differently. People of different ethnicities pick up on stuff – especially cultural stuff, including enterprise culture – that people of other cultures miss.

EDNA‘s diversity is a strength that emerged unintentionally, but emerged nonetheless. As Chief Data Engineer, it’s very cool to watch our less-experienced consultants growing into more-experienced consultants and architects, while at the same time watching our people interact and perform as a team – each member catching stuff and contributing ideas because of their unique perspectives.

Cost Value

I can hear some of you thinking, “We’re on a budget here. Don’t good architects cost more than less-than-good architects, Andy?” I feel you. The answer is, “No. Good architects cost less than less-than-good architects.”

I can prove it. Because math. (Read that post for more information…)

It’s often accurate that good architects cost more per hour than less-than-good architects. Do you know why good architects charge more per hour?

Because they are worth more per hour.

(Generality!)

But consider this: “Time will tell” is a tried and true statement. Like good wine, the likelihood that a generality is accurate improves with age. If enterprises continue to hire an organization – like Enterprise Data & Analytics or any other firm – to help them solve the problems they are trying to solve, then the folks shouting that organization down may be doing so in an effort to compete. Competition is fine, but I never hire anyone who talks bad about other clients or the competition. Why? They’ve demonstrated the capacity to talk bad about me at some later date.

Conclusion

I love our team!
I love our expertise!
I love our diversity!
I love that we always deliver value!

Contact me to learn more.

:{>

Want a Free #DILM Book? See Me at the #PASSsummit 2018!

I’ll be at the PASS Summit 2018 next week. I’m delivering a full-day precon Monday, presenting Faster SSIS and participating in the BI & Data Visualization Panel Wednesday, and Enterprise Data & Analytics is exhibiting Wednesday through Friday.

“You Mentioned a Free Book…”

Oh. Yeah. That.

I will be giving away free copies of my latest book: Data Integration Life Cycle Management with SSIS: A Short Introduction by Example starting Wednesday! Would you like to score a free copy? You’ll have to catch me carrying them in the Washington State Convention Center.

They are free while supplies last.

Where Can You Find Andy at the PASS Summit 2018?

Precon!

Monday 5 Nov 2018, I’m delivering a full-day pre-conference session titled Intelligent Data Integration with SSIS. I’m going to cover everything listed at that link, but there is an update about my precon content:

There will be Azure Data Factory content and demos!

Why this addition? Two reasons:

  1. My presentation titled Faster SSIS was selected for Wednesday, 7 Nov 2018 at 1:30 PM PT in room 612. I usually include the three Faster SSIS demos in my precon. This time, you can simply attend the Faster SSIS session to see those demos. I’ve added some Azure Data Factory content to the precon in place of these three demos!
  2. I am participating in a panel discussion titled BI & Data Visualization Panel Wednesday, 7 Nov 2018 at 3:15 PM PT in room TCC Yakima 1.

Enterprise Data & Analytics is Exhibiting!

That’s right, you can find me in the Exhibition Hall! Enterprise Data & Analytics is exhibiting at the PASS Summit 2018!

If you are attending the PASS Summit 2018, please catch me for a free book while supplies last. If I’m out of physical copies, come see me anyway… we will work something out. Promise!

:{>

DILM Tiers for the SSIS Enterprise

“How many Data Integration Lifecycle Management tiers (DILM tiers) do I need to deploy SSIS, Andy?” If I had a nickel for every time I am asked this question, I would have a lot of nickels. My answer is, “Four.”

Why Four?

I’m glad you asked. Before I answer, I have a question for you: Have you ever been told by a developer, “It works on my machine”? Do you know why it works on the developer’s machine?

Defaults.

Developers build software that contains configuration information. In the first iteration of building the software, most developers do not externalize – or parameterize – all configuration values. Instead, they hard-code these values into the application, usually as variable defaults.
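The same dynamic is easy to demonstrate in plain T-SQL. Here is a tiny, hypothetical sketch (SQLCMD mode; the database and table names are made up):

-- Hard-coded default: “works on my machine” because the value
-- points at the developer’s own environment.
-- USE [MyEtlDb_Dev];

-- Externalized: the value travels with the DILM tier, not the code.
-- Pass it at deploy time, e.g.: sqlcmd -v TargetDb="MyEtlDb_Test" -i deploy.sql
USE [$(TargetDb)];
SELECT COUNT(*) AS CustomerRows FROM dbo.Customers;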

I can hear you thinking…

“But This Is SSIS, Andy”

And? Oh, you ask because SSIS stands for “SQL Server Integration Services.” I get that. It’s important you understand this:

SSIS development is software development.

SQL Server Integration Services suffers somewhat from its name which contains the name of a popular relational database engine (SQL Server). But SSIS is not a database, and SSIS development is not database development (and most definitely not database administration).

SSIS development is software development.

Software Best Practices Apply

Because SSIS development is software development, software development best practices apply. For example, SSIS developers should source-control and test their SSIS code.

Perhaps testing is that thing you do in Production right after deployment. I have this pithy saying:

All software is tested.

While I freely admit this statement is pithy, it is also true. The remainder of the quote is even pithier: “Some software is tested by your soon-to-be-former largest customer.”

I don’t want you to lose customers – especially your largest customer. That’s why I advocate you test your software by developing it on one Data Integration Lifecycle Management – or DILM – tier and then promoting the code to at least two other DILM tiers before deploying to the Production tier.

Tiers. Or Tears. Your Call.

So which tiers do I recommend? I’m glad you asked! I recommend:

  1. Development
  2. Test
  3. QA
  4. Production

Development

You must build the software somewhere. Wherever you build the software is your development environment.

Now let’s take a step back and describe the development environment for an SSIS developer. An SSIS development DILM tier shares some characteristics. In Dev, SSIS developers can:

  • Deploy SSIS projects and packages
  • Overwrite existing SSIS projects and packages
  • Delete existing SSIS projects and packages
  • Execute SSIS packages
  • Manage (deploy, overwrite, delete) SSIS configurations
  • Manage (deploy, overwrite, delete) SSIS metadata
  • Create databases, tables, and files

The most important part of the SSIS Development DILM tier? SSIS developers can perform each action listed above without help from anyone else in the enterprise.

I can hear you thinking, “What if the SSIS developers blow something up, Andy? Huh? HUH?!?” Let me ask another question: What happens when things go awry or amiss in Production? Isn’t the answer to that question, “I fix it.”? If that’s true in Production…

Then you fix it.
In Dev.

In fact (and I know this is a crazy thought) you could actually use a more-open Development tier to test and – perhaps, even – improve break-fix procedures for Production. After all, developers are notoriously excellent testers (I write this as a developer: I can break things like nobody’s business!).

What if the Dev tier becomes a place for you to test your recovery procedures for Production? If your HA/DR procedures work in Development where developers have too much access, then they should work in Production where the environment has been locked down. Am I correct?

Over the years I have found myself making this statement to enterprises: “You have an environment named Development but it is not a Development environment.” How do you know it’s a Development environment? If the developers can destroy it, it’s Dev.

One last thing before I kick this soapbox back under my desk. Consider that developers will be inconvenienced by the destruction of their Dev environment. It may slow them down. Heck, it may cost them a deadline. If that happens, you can take the opportunity to educate them, sharing what happened and why you believe it happened. Maybe they won’t do the dumb thing that burned down Dev again… Maybe.

Would having developers that better understand databases make your job a little easier in the long run?

Test

Once the software works on the SSIS developer’s machine, they need another environment to which they can deploy the SSIS project. Why? Remember those defaults? SSIS packages will always execute successfully in Dev because the default values are aimed at local resources for which the SSIS developer has authorization and access.

Which resources? Required access (CRUD – create, read, update, delete) to files and directories, databases, servers, etc. No SSIS developer would claim an SSIS package is ready unless and until she or he successfully executes the package in the SQL Server Data Tools (SSDT) debugger. But a successful SSDT debugger execution does not a completed SSIS package make.

To suss out missing configurations (externalization, parameterization), one needs to move the software to another lifecycle management tier and execute it there. Only then can the developer be sure they’ve externalized required defaults.

In Test, SSIS developers can:

  • Deploy SSIS projects and packages
  • Overwrite existing SSIS projects and packages
  • Delete existing SSIS projects and packages
  • Execute SSIS packages
  • Manage (deploy, overwrite, delete) SSIS configurations
  • Manage (deploy, overwrite, delete) SSIS metadata
  • Create databases, tables, and files

I can hear you thinking, “But Andy, this is the same list of permissions as Dev!”
Yes.

This part of the work must be completed. As professionals, we get to decide when and where the work gets done. Remember, it can always be done in Production. And it will be done in Production unless we are intentional about doing this work on another DILM tier.

Someone needs to do this part of the work. Do you want to do it? Or would you rather have the person most familiar with the code – the SSIS developer – do it? They are not going to catch everything that needs to be externalized, either; not on this pass. But the Pareto Principle informs us that they will catch 80% of the missing externalization.

Deployment to the DILM Test tier serves as the first deployment test. I want the SSIS developers bundling up SSIS deployments. Why? It’s good practice and practice makes perfect.
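For SSIS Catalog deployments, the bundle can be a script the developer hands off with the project. A minimal sketch – the folder, project, and file names below are hypothetical:

-- Deploy an .ispac (the deployment bundle) to the Test-tier SSIS Catalog
USE SSISDB;
GO
DECLARE @ProjectBinary varbinary(max) =
(
  SELECT BulkColumn
  FROM OPENROWSET(BULK N'C:\Deploy\MyEtlProject.ispac', SINGLE_BLOB) AS ispac
);

IF NOT EXISTS (SELECT 1 FROM catalog.folders WHERE name = N'Test')
  EXEC catalog.create_folder @folder_name = N'Test';

EXEC catalog.deploy_project
    @folder_name    = N'Test'
  , @project_name   = N'MyEtlProject'
  , @project_stream = @ProjectBinary;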

On Estimation…

One last point regarding software testing: When estimating a software project I begin by estimating how much time it will take to build the software. After this step the calculation for how much time is required to test the software is really easy: It’s the same amount of time required to develop the software.

Andy, are you telling me it takes just as long to test software as to develop it?

Yep.

“Cutting Testing to Save Time”

You can change this relationship by “cutting testing to save time.” But you can never shorten testing, only lengthen it. Decades of software development experience inform me that the costs of testing software increase by an order of magnitude for each tier of testing. The most expensive environment in which to test is Production.

Cutting testing never saves time.

Or rather, cutting testing does save one kind of time: the time it takes you to deliver broken software. I can hear some of you thinking, “But can’t SSIS developers just develop better SSIS packages with fewer bugs?” The answer is the same as the answer to the physics question, “Can’t light just be either a wave or a particle, and not both?” Physics says no: light is both a wave and a particle. Andy says the same physics apply to software development – along with the Theory of Constraints, which tells us losses accumulate and gains do not.

As my friend Aaron Lowe used to say (before he passed away earlier in 2018 – man, I miss him, and others… we lost too many SQL Family members this year…), “Math is hard.”

You must include the cost of lost customers in the calculation.

Remember, all software is tested…

QA

The QA (or UAT or whatever you call the environment sitting just this side of Production) environment should be locked down tighter than a tick. The security in place in this pre-Production tier should be identical to the security in the Production tier. Optimally, the hardware capability and storage capacity in QA will match Production.

If you understand the dynamics of SSIS Data Flow Task internals, it’s possible to test performance with less data on sub-optimal hardware. But if you ask me to do it, I’m going to let you know there’s a possibility we will miss something. I’ll only be able to tell you for sure after we run some performance test executions in Production – and we will need to do that to be sure, and that will cost both time and money. So how much money are you really saving by buying sub-optimal hardware and storage for QA?

Deployment to QA should be performed by the same team – optimally by the same individual – that will be performing the deployment to Production. Why? We do not want deployment to Production to be their first experience with deploying this SSIS project. The deployment to QA is another deployment test, this time for the Production release management person. Perhaps this is a DBA. Perhaps this is a Release Management team member. Regardless, this step is their opportunity to practice the deployment to Production.

I can hear you thinking, “What’s the big deal, Andy?” I’m glad you asked. The big deal is: No matter how you execute SSIS in your enterprise, there’s more to the solution than merely the SSIS packages. If you’re running from the file system (and most enterprises execute SSIS from the file system), there are configurations stored in dtsConfig files or database tables that also need to be promoted. Some of the values – such as connection string components – need to be edited in these dtsConfig files at each DILM tier.

The same holds for deployments to the SSIS Catalog. And… if you are deploying (or planning to deploy) SSIS to the Azure Data Factory (ADF) SSIS Integration Runtime, your only option is deployment to an SSIS Catalog. Externalization is accomplished via Catalog Environments, References, and Literals in the SSIS Catalog. Same stuff, different location. But both dtsConfig files and Catalog Environments must be managed.
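In the Catalog, those moving parts can be scripted with the SSISDB stored procedures. A minimal sketch – the folder, project, parameter, and connection string values below are hypothetical:

-- Create an environment, add a variable, reference the environment
-- from a project, and map the variable to a project parameter
USE SSISDB;
GO
EXEC catalog.create_environment
    @folder_name      = N'Test'
  , @environment_name = N'TestEnv';

EXEC catalog.create_environment_variable
    @folder_name      = N'Test'
  , @environment_name = N'TestEnv'
  , @variable_name    = N'SourceConnectionString'
  , @data_type        = N'String'
  , @sensitive        = 0
  , @value            = N'Data Source=TestSrv;Initial Catalog=Src;Integrated Security=SSPI;';

DECLARE @reference_id bigint;
EXEC catalog.create_environment_reference
    @folder_name      = N'Test'
  , @project_name     = N'MyEtlProject'
  , @environment_name = N'TestEnv'
  , @reference_type   = N'R'  -- relative: the environment lives in the project's folder
  , @reference_id     = @reference_id OUTPUT;

EXEC catalog.set_object_parameter_value
    @object_type      = 20    -- 20 = project-level parameter
  , @folder_name      = N'Test'
  , @project_name     = N'MyEtlProject'
  , @parameter_name   = N'SourceConnectionString'
  , @parameter_value  = N'SourceConnectionString'  -- the environment variable name
  , @value_type       = N'R'; -- R = referenced; V = literal

Promote the same script from tier to tier, changing only the values.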

Since editing will need to occur when it’s time to deploy to Production, I prefer the deploy-er practice the editing along with the deployment. I ask the developer to send the deploy-er the SSIS project plus the scripts or files for managing external configuration metadata, and I request these configuration artifacts be set up for the Test DILM tier.

Production

If you’ve taken my advice and marched SSIS development through this process or a similar process, the Pareto Principle says you’ve identified at least 96% of the bugs prior to deployment to Production. My experience bears this out; about 4% of my Production deployments fail because some parameter slipped through the cracks walking through Dev, Test, and QA.

I can hear you thinking, “How does that happen, Andy?” Excellent question (and I am glad you asked!). Parameters slip through the cracks because Dev and Test may be sharing data sources and/or destinations. It may be possible – desirable, even – to access Development data from the Test tier and vice versa.

Isolation

Remember: I want QA to be identical to Prod, which means I want it equally isolated. Is Production behind a firewall? Is it impossible (without poking a hole in the firewall) to write from Test to Prod? If so, then it should also be impossible to write from Test to QA and from QA to Prod. If Prod has a firewall, QA should have a firewall.

If your enterprise sports this architecture, you can beat Pareto’s 4% prediction: catching 80% of the remaining bugs drops the failure rate from 4% to 0.8%, for a deployment-to-Production success rate of 99.2% – 3.2 points better.

To quote Foghorn Leghorn, “figures don’t lie.”

Conclusion

Is this the only way to properly manage SSIS in an enterprise? Goodness, no.

You can add more tiers to your Data Integration Lifecycle as needed. You can technically get by with three tiers (Dev, Test, Prod) but I don’t recommend it unless you’re the person wearing all the hats. Even then, the more times you test deployment, the more bugs you will catch prior to Production, and the fewer bugs you will deploy to Production.

It’s always easier to deploy more bugs but it is never less-expensive and it never saves time. Plus it may cost you your soon-to-be former largest customer.

SSIS Project Incompatible

Has this ever happened to you? You are opening an SSIS project and… it won’t open. Instead of a Control Flow filled with awesome tasks and containers, you see a message in Solution Explorer telling you “The application is not installed.”

You know this isn’t right because you built the SSIS project on this same machine just a day or two ago! What gives?

Well, you may have seen a message informing you the Integration Services Projects 2.0 extension took too long to load. If that happens, you are presented with three options:

  1. Disable the extension
  2. Ignore the message
  3. Never show this message again

If you click option #1, you create this very issue.

How to Fix It

Open Visual Studio and click Tools > Extensions and Updates.

When the Extensions and Updates window displays, select the Microsoft Integration Services Projects entry and click the Enable button.

The Enable button text will change to Disable after the Microsoft Integration Services Projects extension is enabled.

Note: the change will not take effect until you restart Visual Studio.

Restart Visual Studio.

Depending on how you open the SSIS project, you may still see the same messages in Solution Explorer.

Reload the project to check. Reloading the project almost always works for me.

You’re done! Happy integrating!

:{>

Honored to Present Faster SSIS at Triad SQL PASS BI 30 Oct!

I am honored to join my friends at the Triad SQL PASS BI Group in Greensboro, North Carolina on 30 Oct 2018, where I will present Faster SSIS!

Abstract

Ever wonder why SSIS runs so slowly? Watch SSIS author Andy Leonard as he runs test loads using sample and real-world data, and shows you how to tune SQL Server 2016 Integration Services (SSIS 2016) packages. We’ll start by experimenting with SSIS design patterns to improve performance loading AdventureWorks data. We will implement different change detection patterns and compare execution performance for each. Then, we’ll explain a Data Flow Task’s bottleneck when loading binary large objects – or Blobs. Finally, we’ll demonstrate a design pattern that uses a Script Component in a Data Flow to boost load performance to MySQL, whether on-premises or in the cloud. Prerequisites: Familiarity with SQL Server Integration Services (SSIS).
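To give a flavor of the change-detection portion: one common pattern detects new and changed rows by correlating source rows to the destination on the business key – the set-based cousin of the SSIS Lookup pattern. A hedged sketch with hypothetical tables (production versions also handle NULLable columns):

-- Detect new and changed rows via a LEFT JOIN on the business key
SELECT src.CustomerId, src.Name, src.Email
FROM stage.Customers AS src
LEFT JOIN dbo.Customers AS dst
  ON dst.CustomerId = src.CustomerId
WHERE dst.CustomerId IS NULL                          -- new rows
   OR src.Name <> dst.Name OR src.Email <> dst.Email; -- changed rows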

Register today! I hope to see you there.

:{>

Stairway to Biml 9 is Live!

I am pleased to share that Stairway to Biml 9 – Refactoring a Simple Custom Biml Framework – is now live at SQL Server Central!

Typing that, I realize I neglected to publicize that two other Stairway to Biml articles – Levels 7 and 8 – were recently published at SQL Server Central.

These three articles are taken from two chapters in The Biml Book:

  • Chapter 6: A Custom Biml Framework
  • Chapter 7: Using Biml as an SSIS Design Patterns Engine

Level 7 is focused on collecting and storing database metadata that will be used by Biml to automatically generate SSIS packages for a data integration solution.

Level 8 centers on one method for using the collected metadata to build a simple Biml-driven framework that automatically generates the SSIS packages for such a solution.

Level 9 describes and demonstrates one way to refactor this simple Biml framework, decoupling design patterns that were hard-coded in previous examples.
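The metadata Level 7 collects is the kind you can read straight from a source’s catalog views. A rough sketch of the idea (not the book’s exact schema):

-- Source schema/table/column metadata for Biml to consume when
-- generating one SSIS load package per table
SELECT s.name        AS SchemaName
     , t.name        AS TableName
     , c.name        AS ColumnName
     , ty.name       AS DataType
     , c.max_length  AS MaxLength
     , c.is_nullable AS IsNullable
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.columns AS c ON c.object_id = t.object_id
JOIN sys.types AS ty ON ty.user_type_id = c.user_type_id
ORDER BY s.name, t.name, c.column_id;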

Enjoy!

:{>


“Do I Need a Biml Framework, Andy?”

No. No, you do not need a Biml Framework.

In the same way that I do not need a vehicle powered by an internal combustion engine, you do not need a Biml framework.

You see, I live about 5 miles from Farmville, Virginia. I could walk to town each time my family needs groceries. I could pack as many groceries as I can carry into a backpack and walk the 5 miles back home. There are several benefits to this, not the least of which is the exercise I would get walking to the store and then walking back – carrying a load of groceries, even!

“Is walking to the grocery store efficient, Andy?” Well, if you’re going to get all persnickety about it… I suppose not. But think of all the money I would save on vehicle maintenance and fuel – plus the benefits of exercise!

Does this reasoning sound silly to you? It does to me (and I wrote it). Let’s add a dose of reality, shall we?

The Art of Data Integration Architecture, Part 1

Like many sciences, data integration is part art and part science. The art part is just good judgment. My lovely bride, Christy, has an awesome saying about good judgment: “Good judgment comes from experience, and experience comes from bad judgment.” With that in mind, I share the following in the hope that it saves you some… experience:

Don’t Use Biml…

If you are building a data integration solution that loads data from one source to one destination, do not use a Biml Framework. Heck, don’t even use Biml for this. Build this solution manually in SSIS or T-SQL or some combination thereof. It’s quicker, easier, and less-frustrating to construct a loader manually.

Use Biml…

If you are building a data integration solution that loads data from a couple dozen sources to a collection of destination tables, use Biml but do not use a Biml Framework. It takes about 40 hours to become proficient in Biml. To learn Biml (or anything, really), you first need a real-world problem to solve. Next, you need the gumption to try and solve that problem using a tool you have not yet used to solve a real-world problem.

Why wait until you need to build a couple dozen SSIS packages? It takes me about 2 hours to build and test an SSIS package that incrementally loads data. The math works:

24 packages × 2 hours = 48 hours

If I take the time to learn Biml – about 40 hours – I can complete two dozen loaders in less time than it would have taken me to build those SSIS packages manually: 40 hours beats 48. Plus I’ve learned something. More on this in a bit…

Use a Biml Framework…

After you’ve invested the initial 40 hours in learning Biml… If you are building a data integration solution that loads data from one or more sources to a collection of destination tables, use a Biml Framework. Once your framework is built, you can decide when it makes sense to use your automation and when it makes sense to build loaders manually.

I wrote about a Biml Framework here.
You can download an early example of a Biml Framework here.

Future-Proofing

Perhaps you’re looking at all the innovation and automation happening around you and thinking, “This is just some passing fad. The djinn will be put back in the bottle soon and we’ll return to the days of carving our own integrated circuit chips out of wood.” Ok, maybe you’re not thinking precisely that. But I hope and pray you get my drift.

If the announcements of the latest Microsoft conference have taught you anything, they should have taught you that automation is following the arrow of time. Automation is on the increase in both volume and complexity. One benefit of having lived over half a century (and doing technology for most of that time) is perspective:

I promise you, the djinn no longer fits in the bottle.

If I’m scaring you, good.

I’m not arguing the merits of automation.
I am arguing the facts of automation.

I am not attempting to assess the potential good – or potential harm – of automation. I am stating that it’s here to stay – and grow.

If you want to live and work in a field where you can master a craft and simply do the same thing until you retire, technology is not the field for you. Technology is the field for you if you enjoy learning and growing.

“How much time should I spend learning new stuff, Andy?” Experience informs me the answer is “about 25% of your time should be spent learning and growing.”

“Nummers”

My brother and friend Kevin Hazzard (DevJourney | LinkedIn | (awesome and fascinating) DataDriven interview) refers to “numbers” as “nummers.” There are a nummer of reasons I like this mispronunciation… perhaps I will explain another time. I bring up nummers here because – as my friend and brother Aaron Lowe used to tell me (before he passed away earlier in 2018 – I miss him…):

Math is hard.

I once delivered a solution that contained several hundred SSIS packages using Biml. The unit-of-measure for my estimate, based on the nummers I shared earlier (2 hours per package based on an incremental load SSIS design pattern) was “months.”

I delivered the solution in days using Biml.

It’s possible your employer isn’t good at math.
It’s possible your employer is more interested in providing you a job than in making a profit by remaining competitive (I love it when politicians speak of providing jobs… don’t get me started…).

The nummers argue that one day things will change for your employer. When that day arrives, things will also change for you.

(Did I mention that I hope I am scaring you?)

So… Learn!

At the time of this writing, it is not possible for others to take your knowledge away from you. I pray that day never comes, but more than a half-century of existence has trained me not to be surprised by anything.

So… learn!

Learn as much as you can. Consider knowledge a hedge against future changes. Add knowledge of the new and shiny to your experience. I don’t care how old or how young you are, start today.

I share this in the same spirit of you wanting me to get more exercise by walking to town to buy groceries, except that I am right about this and you were… less efficient. 😉

Start at SQL Server Central. It’s free.

Well, you have to give them your email address.

I understand that bothers some folks, but I encourage you to carefully weigh the benefits of sharing your email address with an entity that provides this much free education against the cost of falling behind in technology trends, application, and education. It’s totally your call, but you may find yourself in possession of the cleanest Inbox Zero of anyone who is uninformed and increasingly less employable…

The Stairways are awesome.

Head over to edX. Many courses there are free. If you want, you can pay for certificates that you can post on LinkedIn. But – please – go get the knowledge.


If you want to dive deeper, sign up for Biml Academy or SSIS Academy or Fundamentals of Azure Data Factory! I’m still building the material so the costs for subscribing to the academies are low.

Peace.

T-SQL Tuesday: When Good Projects Go Bad

Thanks to Jeff Mlakar for hosting this month’s T-SQL Tuesday, a monthly collection of blog posts by data bloggers who work and live in the Microsoft SQL Server space.

This month’s topic is about projects that have gone wrong.

“Why Do Good Projects Go Bad, Andy?”

Deltas.

That’s all. Deltas. Allow me to explain.

In life, engineering, and projects, we experience deltas in a variety of forms.

IRL…

As a manager I learned there’s a difference between how we are perceived and how we think we are perceived. I label this the “self-awareness delta.”

In Engineering…

In engineering terms, delta is the letter of the Greek alphabet used to represent a difference or change or gap.

In Projects…

Projects that will eventually fail go astray early on, in ways that are either invisible or ignored. I’ve seen (and believe) statistics claiming 85% of business intelligence projects disappoint or fail. Disappointment or failure begins as some delta – some difference – between what is real and what is perceived as real.

Detecting Deltas

Think about how we discover there’s a delta in our self-awareness… it requires, well, self-awareness! It turns out that detecting deltas requires meta thinking. The risk of being too meta? We can succumb to circular logic. Thinking meta involves some circular – or, at least, circumspect – reasoning.

Our perspective is also limited by our perspective. A wise man once said it’s difficult to remove the speck from our brother’s eye while we have a plank in our own eye. (Note: Andy’s Plank Removal Service Offering currently has but a single customer…)

So how can we achieve delta detection?

I regularly reference Kathryn Schulz’s TED talk on Being Wrong. Kathryn begins her (excellent) TED talk by stating that we do not know what it feels like to be wrong. But she concludes her introduction by contradicting herself (which is totally appropriate… think about it…) and states that we do know what it feels like to be wrong: it feels like we’re right.

Ownership

I’ve seen projects fail. Heck, I’ve seen companies fail. I’ve made mistakes and when I’ve caught them, things have sometimes improved. Sometimes I catch them too late to effect a complete or partial remedy. It’s rare, but it happens.

The first step on the path to (project) health is to admit we have a problem:

“We have a problem.”

Another step is communication. I prefer over-communication.

Over-Communication

I’ve never once had a client say to me, “Andy, you are communicating too much about this project.”

Not. Once.

I’ve had projects go sideways because I failed to communicate effectively or enough. Here’s one thing I’ve learned about communication in projects:

If I communicate enough, I will communicate effectively.

Failure, By Degrees

Projects that fail first go sideways. Were we to graph the trajectory of a project failing, we would not see a 90° turn at the point where the project goes sideways. Instead we would see a gradual departure – a curving away – from the path to success. The departure would reach 90° and then – if not corrected – continue to 180°. At this point the project is irrecoverable. A project may become irrecoverable at 90°, even. It depends on the project.

Fix It!

I can hear you thinking, “How does one fix it, Andy?” I’m glad you asked.

If you catch the departures at 1° – or even 9° – it takes much less effort and time to address and correct the departure.

This sounds simple and it is, but it’s also difficult. Remember that meta thinking is required, and meta thinking is hard. Simple, yes; difficult to implement and accomplish.

DfG

I address “Deviation from Good” (DfG) by over-communication. It’s the drum I beat throughout any project I lead. How do I implement DfG?

TPS Status Reports

I loathe bureaucracy – especially bureaucracy for bureaucracy’s sake. I learned the reason I loathe status reports is that I am not naturally good at tracking status. At the end of a week, I struggle to remember the details of what I did all week. Being an engineer, I engineered a solution:

A few years ago, my friend (and hero) Donald Farmer (Treehive Strategy | @DonaldTreeHive | LinkedIn) shared this Notepad hack with me: save a text file whose first line is “.LOG”. Each time you open the file in Notepad, a new line is appended containing the current date and time. Try it. It’s cool.

I use the following format to log project time:

Date time (automated by Notepad)
Customer
Project Name
Start

Date time (automated by Notepad)
Customer
Stop

I use this to track my time. I’ve even built an SSIS package that reads this file and stores the data in a database. I’m able to generate status reports from this data very efficiently.
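If you’d like to build something similar, here is a minimal sketch of a destination table for the parsed log entries – hypothetical, not my actual schema:

-- Destination table for parsed Notepad time-log entries
CREATE TABLE dbo.ProjectTimeLog
(
    LogEntryId   int IDENTITY(1,1) NOT NULL
      CONSTRAINT PK_ProjectTimeLog PRIMARY KEY
  , EntryTime    datetime2(0)  NOT NULL  -- the timestamp Notepad appends
  , CustomerName nvarchar(128) NOT NULL
  , ProjectName  nvarchar(128) NULL      -- present on Start entries
  , EntryType    nvarchar(10)  NOT NULL  -- 'Start' or 'Stop'
);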

My former boss and mentor, Bennett McEwan (LinkedIn), challenged me when I reported to him at Unisys 10+ years ago. When I complained that my status report was taking an hour or more each week Ben said, “Your status report should only take you five minutes.” My kneejerk reaction was not positive. But then I engaged to solve the problem, remembered Donald’s Notepad hack, and the rest is history.

Come To Jesus Meetings

I’ve been wanting to use “Come to Jesus Meetings” in a heading for a long time. It’s not a perfect fit here but I must confess I’ve become tired of waiting. So I shoe-horned it in right here.

I visit (if possible) or call the sponsor for each project on which Enterprise Data & Analytics is actively engaged and ask questions similar to:

  • What questions do you have for me?
  • How can I better serve you?

These conversations identify expectation deltas at 1° – or 9° – deviation… a place where almost any issue can be efficiently addressed within the confines of the current work effort.

These conversations keep a project on track.

Are these conversations pleasant?
Not always.
But they are extremely valuable.

Failure is normal.
Failing fast is almost always a good thing.

I’ve learned over-communicating is vital to mitigating project disappointment or failures.