Biml, Book Reviews, and Metadata-Driven Frameworks

I occasionally (rarely) read reviews at Amazon of books I’ve written. If I learn of a complaint regarding the book I often try to help. If a reader experiences difficulty with demos, I often offer to help, and sometimes I meet with readers to work through some difficulty related to the book (as I offered here). About half the time, there’s a problem with the way the book explains an example or the sample code; the other half the time the reader does not understand what is written.

I own both cases. As a writer it’s my job to provide good examples that are as easy to understand as possible. Sometimes I fail.

In very rare instances, I feel the review misrepresents the contents of a book –  enough to justify clarification. This review afforded one such opportunity. None of what I share below is new to regular readers of my blog. I chose to respond in a direct manner because I know and respect the author of the review, and believe he will receive my feedback in the manner in which it was intended – a manner (helpful feedback) very similar to the manner in which I believe his review was intended (also helpful feedback).

Enjoy.

My Reply to This Review of The Biml Book:

Thank you for sharing your thoughts in this review.

I maintain successful technology solutions are a combination of three factors:

  1. The problem we are trying to solve;
  2. The technology used to solve the problem; and
  3. The developer attempting to use a technology to solve the problem.

I believe any technologist, given enough time and what my mother refers to as “gumption” (a bias for action combined with a will – and the stubbornness, er… tenacity – to succeed), can solve any problem with any technology.

I find your statement, “It doesn’t really teach BIML but rather teaches a specific optional framework for those who want to do everything with BIML rather than increase productivity for repetitive patterns” inaccurate. Did you read the entire book? I ask that question because I most often encounter readers who state the precise opposite of the opinion expressed in this review (some of your fellow reviewers express the opposite opinion here, even). I disagree with your statement, but I understand your opinion in light of other opinions expressed to me by readers during my past decade+ of writing several books. I share one example in this blog post. The short version: we often get more – or different – things from reading than we realize.

There is a section in The Biml Book – made up of 3 of the 20 chapters and appendices – that proposes a basic metadata-driven Biml framework which is the basis of a metadata-driven Biml framework in production in several large enterprises. I wrote those 3 chapters and the basic metadata-driven Biml framework in question. Varigence has also produced a metadata-driven Biml framework called BimlFlex – based upon similar principles – which has been deployed at myriad enterprises. The enterprise architects at these organizations have been able to improve code quality and increase productivity, all while decreasing the amount of time required to bring data-related solutions to market. They do not share your opinion, although they share at least some of the problems (you mentioned ETL) you are trying to solve.

Am I saying Biml – or the framework about which I wrote or even BimlFlex – is the end-all-be-all for every SSIS development effort? Goodness no! In fact, you will find disclaimers included in my writings on Biml and SSIS frameworks. I know because I wrote the disclaimers.

Misalignment on any of the three factors for successful technology solutions – the problem, the technology, and/or the developer – can lead to impedance mismatches in application and implementation. For example, Biml is not a good solution for some classes of ETL or data integration (points 1 and 2). And learning Biml takes about 40 hours of tenacious commitment (which goes to point 3). This is all coupled with a simple fact: data integration is hard. SSIS does a good job as a generic, provider-driven solution – but most find SSIS challenging to learn and non-intuitive (I did when I first started using it). Does that mean SSIS is not worth the effort to learn? Goodness, no! It does mean complaints about simplifying the learning process are to be expected and somewhat rhetorical.

Architects disagree. We have varying amounts of experience. We have different kinds of experience. Some architects have worked only as lone-wolf consultants or as members of single-person teams. Others have worked only as leaders of small teams of ETL developers. Rarer still are enterprise architects who are cross-disciplined and experienced as both lone-wolf consultants and managers of large teams in independent consulting firms and large enterprises. Less-experienced architects sometimes believe they have solved “all the things” when they have merely solved “one of the things.” Does this make them bad people? Goodness, no. It makes them less-experienced, though, and helps identify them as such. Did the “one thing” need solving? Goodness, yes. But enterprise architecture is part science and part art. Understanding the value of a solution one does not prefer – or the value of a solution that does not apply to the problem one is trying to solve – lands squarely in the art portion of the gig.

Regarding the science portion of the gig: engineers are qualified to qualitatively pronounce any solution is “over-engineered.” Non-engineers are not qualified to make any such determination, in my opinion.

By way of example: I recently returned from the PASS Summit. Although I did not attend the session, I know an SSIS architect delivered a session in which he made dismissing statements regarding any and all metadata-driven frameworks related to ETL with SSIS. If I didn’t attend his session, how do I know about the content of the presentation? A number of people in attendance approached me after the session to share their opinion that the architect, though he shared some useful information, does not appreciate the problems faced by most enterprise architects – especially enterprise architects who lead teams of ETL developers.

My advice to all enterprise architects and would-be enterprise architects is: Don’t be that guy.

Instead, be the enterprise architect who continues to examine a solution until underlying constraints and drivers and reasons the solution was designed the way it was designed are understood. Realize and recognize the difference between “That’s wrong,” “I disagree,” and “I prefer a different solution.” One size does not fit all.

Finally, some of the demos and Biml platforms were not working at the time you wrote this review. Many have been addressed since that time. In the future, updates to SSIS, Biml, and ETL best practices will invalidate what this team of authors wrote during the first half of 2017. As such, the statements, “And much of the online information is outdated and no longer works. It’s a shame someone hasn’t found a way to simplify the learning process.” is a tautology that defines progress. Learning is part of the job of a technologist. I encourage us all – myself most of all – to continue learning.

Peace.

Presenting Moving Data with Azure Data Factory at SQL Saturday Charlotte!

I am honored to present Moving Data with Azure Data Factory at SQL Saturday 806 in Charlotte, NC 20 Oct 2018.

This is the first time I am delivering this session. It still has that new presentation smell!

Abstract

Azure Data Factory – ADF – is a cloud data engineering solution. ADF version 2 sports a snappy web GUI (graphical user interface) and supports the SSIS Integration Runtime (IR) – or “SSIS in the Cloud.”

Attend this session to learn:
– How to build a “native ADF” pipeline;
– How to lift and shift SSIS to the Azure Data Factory integration Runtime; and
– ADF Design Patterns to execute and monitor pipelines and packages.

I hope to see you there!

:{>

On Data Engineering in 2018

A few years ago I had a conversation with Scott Currie, the CEO of Varigence and inventor of Biml. Scott is one of the smartest people I know and I know a lot of very smart people. Whenever I have the opportunity to communicate with the very smart people I know, I often ask them – based upon their knowledge of me and what I do and any plans I have shared – for advice on stuff I should focus upon moving forward. Scott’s monosyllabic reply to this question?

Spark

Over the course of the past couple years I’ve shared – on this blog, even – some of my experiences learning more about Apache Spark. While I expected a little resistance, I was a little surprised by the… intensity… of some of the private messages and emails I received. As I replied to the intense people (and there were only a few), I could stop learning stuff right now, today, and continue working with SSIS for the next decade or so. I believe the same goes for anyone working with SSIS today.

I’m not going to stop learning stuff.

Why SSIS?

I love SSIS. But I don’t use SSIS because I love it. I love SSIS because it solves a particularly difficult piece of the story of enterprise data: data integration and data engineering. I feel a similar way about writing (and I’ve blogged recently about writing). I write because I like to write, not for clicks or branding or any of the other benefits I glean from writing. I consider those benefits cool, but side-effects of my desire to write.

The same goes for SSIS: I love SSIS because it solves a problem that I want to solve. I hope that makes sense.

Why Spark?

Two reasons:

  1. Spark is taking a more prominent role in SQL Server 2019.
  2. Spark is the engine beneath Azure Data Factory Data Flows.

How can you learn more about Spark? There are bunches of videos out there on YouTube. YouTube is crowd-sourced and free, which makes it awesome. The quality of YouTube training is crowd-sourced and free, which is a challenge.

I learned a lot from edX. I’m a fan of how edX approaches MOOC (online education). They offer lots of courses for free, which means you can grow your knowledge by investing only time. It’s tough to beat $free. If you want, you can pay for a certificate which you can then add to your online resume or LinkedIn profile like I added this one.

Conclusion

My advice? Spend some time learning. Pick your favorite learning platform and jump in. If you have SQL Server and / or SSIS experience, you already know a lot about the problem enterprises are trying to solve with data engineering tools. That’s a good place to be, but not required.

Keep learning!

:{>

Introducing Azure Data Factory Design Patterns

I was honored to write an article titled Introducing Azure Data Factory Design Patterns featured in this month’s PASS Insights newsletter!

Introducing Azure Data Factory Design Patterns

The article covers a couple execution patterns:

  1. Execute Child Pipeline
  2. Execute Child SSIS Package

I demonstrate a cool SSIS Catalog Browser feature that helps ADF developers configure the Execute SSIS Package activity.

To see it in action, download SSIS Catalog Browser – it’s one of the free utilities available at DILM Suite. Connect to the instance of Azure SQL DB that hosts an Azure Data Factory SSIS Integration Runtime Catalog, select the SSIS Package you desire to execute using the Execute SSIS Package activity, and then copy the Catalog Path from the  Catalog Browser status message:

Paste that value into the Package Path property of the Execute SSIS Package activity:

You can rinse and repeat – Catalog Browser surfaces Environment paths as well:

Enjoy the article!

If you have any questions about Azure Data Factory – or need help getting started – please reach out!

Learn more:
Attend my full-day pre-conference session titled Intelligent Data Integration at the PASS Summit 2018  on 5 Nov 2018.
Check out this 1-day course on
Fundamentals of Azure Data Factory delivered in cooperation with Brent Ozar Unlimited 10 Dec 2018!

Announcing the Fundamentals of Azure Data Factory Course!

I am excited to announce a brand new course (it still has that new course smell) from Brent Ozar Unlimited and honored to deliver it! This one-day, live, online course is titled Fundamentals of Azure Data Factory and it’s designed to introduce you to Azure Data Factory (ADF).

There will be demos.
Live demos.
Lots of live demos!

Abstract

Azure Data Factory, or ADF, is an Azure PaaS (Platform-as-a-Service) that provides hybrid data integration at global scale. Use ADF to build fully managed ETL in the cloud – including SSIS. Join Andy Leonard – authorblogger, and Chief Data Engineer at Enterprise Data & Analytics – as he demonstrates practical Azure Data Factory use cases.

In this course, you’ll learn:

  • The essentials of Azure Data Factory (ADF)
  • Developing, testing, scheduling, monitoring, and managing ADF pipelines
  • Lifting and shifting SSIS to ADF SSIS Integration Runtime (Azure-SSIS)
  • ADF design patterns
  • Data Integration Lifecycle Management (DILM) for the cloud and hybrid data integration scenarios

To know if you’re ready for this class, look for “yes” answers to these questions:

  • Do you want to learn more about cloud data integration in Azure Data Factory?
  • Is your enterprise planning to migrate its data, databases, data warehouse(s), or some of them, to the cloud?
  • Do you currently use SSIS?

The next delivery is scheduled for 10 Dec 2018. Register today!

I hope to see you there.

:{>

Honored to Present Faster SSIS at SQL Saturday Boston 22 Sep 2018!

I am honored to present Faster SSIS at SQL Saturday Boston 22 Sep 2018!

Check out this schedule – there are a bunch of smart people presenting – plus me!

Abstract

Ever wonder why SSIS runs so slow? Watch SSIS author Andy Leonard as he runs test loads using sample and real-world data and shows you how to tune SQL Server 2017 Integration Services (SSIS 2017) packages.

We’ll start by experimenting with SSIS design patterns to improve performance loading AdventureWorks data. We will implement different change detection patterns and compare execution performance for each. Then, we’ll explain a Data Flow Task’s bottleneck when loading binary large objects – or Blobs.

Finally, we’ll demonstrate a design pattern that uses a Script Component in a Data Flow to boost load performance to MySql, whether on-premises or in the cloud.
Prerequisites: None. Some SSIS and SQL knowledge will be helpful but is not required.

I hope to see you there! I’d love to meet you if you read this blog and attend – just walk up and introduce yourself, I’m the southern guy with the braided beard.

:{>

PASS Summit 2018 Starts in 10 Weeks!

Can you believe the PASS Summit 2018 begins only 10 weeks from today (27 Aug 2018)? I confess, this is sneaking up on me fast!

I will be there. Will you?

Where Can You Find Andy at the PASS Summit 2018?

Precon!

Monday 5 Nov 2018, I’m delivering a full-day pre-conference session titled Intelligent Data Integration with SSIS. I’m going to cover  everything listed at that link but there is an update about my precon content:

There will be Azure Data Factory content and demos!

Why this addition? Two reasons:

  1. My presentation titled Faster SSIS was selected. I usually include the three Faster SSIS demos in my precon. This time, you can just view the Faster SSIS session to see those demos.
  2. may have something cool and new to share about Azure Data Factory that is currently under NDA! Stay tuned…

Enterprise Data & Analytics is Exhibiting!

That’s right, you can find me in the Exhibition Hall! Enterprise Data & Analytics is exhibiting at the PASS Summit 2018!

Have an SSIS or Biml or ADF question? Stop by our booth!
Want to grab a selfie with me or Nick? Stop by our booth!
Want me to autograph your book? Stop by our booth!
Need some consulting or training help? Stop by our booth!

I’m so excited about this – I can hardly wait. We’ll have more information about specific dates and times when I will be manning the booth in coming weeks.

Presenting Faster SSIS

At the time of this writing, the session schedule has not yet been published. PASS has published a general schedule. Keep checking for details!

Conclusion

I am looking forward to the PASS Summit 2018. I hope to see you there.

:{>

What’s the Best Book for Learning SSIS?

Best Book for Learning SSISI am often asked the question, “Andy, what’s the Best Book for Learning SSIS?” There are some really good books out there that will help you learn SSIS (and I am co-author of several).

The book from which I learned the most about SSIS is Extending SSIS 2005 with Script by Donald Farmer.

Why do I consider this book the best?

Donald Farmer (TreeHive Strategy | Interview on the Data Driven Podcast) is a data engineering industry expert. He led the team that developed Microsoft SQL Server Integration Services. He’s technical, a great communicator (speaking and writing), and fantastic at solving business problems.

In Extending SSIS 2005 with Script, Donald discusses and shares demo code that informs he reader of the problems SSIS is designed to solve, and how SSIS is designed to solve those problems. Best of all, you learn this from one of the minds that designed the product.

I consider the Data Flow task to be the heart of SSIS – and the SSIS Script Component of the Data Flow Task is arguably the most difficult and most flexible component in the SSIS Data Flow. If you get your mind around the Script Component, you have a pretty good handle on how the SSIS Data Flow works, in my humble opinion.

I am aware of SQL Server Integration Services (SSIS) courses that are based on the material contained in this book. It’s that good.

“This book is over 10 years old, Andy…”

Yep. The information remains cogent. If I have to choose one book, Extending SSIS 2005 with Script remains the best book for learning SSIS in my opinion.

The Best Book for Learning SSIS

How serious am I in this recommendation? I just ordered a copy for one of the independent consultants at Enterprise Data & Analytics who is learning more about SSIS. I want her to become a senior SSIS developer in 2019 and I consider this book vital to accomplishing that goal.

Full disclosure: The links to the book above pass through the Amazon Affiliate Program. If that bugs you, please click here for a clean link to the book.

Enjoy.

:{>

Free Webinar – Designing a Custom ADF SSIS Execution Framework

Folks, I’m not going to even pretend to be cool and collected about this webinar about designing an ADF SSIS Execution Framework. I’ve been waiting for this one for a loooooooong time! The next (free!) webinar in the series I’m calling the Summer-O’-ADF is Designing a Custom ADF SSIS Execution Framework.

It’s at noon EDT Thursday 12 Jul 2018.

Why Am I So Excited?

I’ve been designing data engineering frameworks since the days of DTS (Data Transformation Services). Enterprise Data & Analytics (EDNA) implements SSIS Frameworks for enterprises. We even give away a free version of a framework at DILM Suite (DILM == Data Integration Lifecycle Management). It’s called SSIS Framework Community Edition and it’s not only free, it’s open source.

I wrote about SSIS Framework Community Edition and Data Integration Lifecycle Management in my latest book: Data Integration Lifecycle Management with SSIS.

I’m excited about this webinar because it combines a long-held passion – DevOps for SSIS – with a new passion – Azure Data Factory (ADF)! I cannot wait to show you how!

If you haven’t seen my webinars titled The Azure Data Factory Controller Design Pattern and ADF Controller Design Pattern with the SSIS Integration Runtime (also part of the Summer-O’-ADF webinar series), you may want to check them out first (registration required).

Register today!

:{>