Join Microsoft Data Platform MVP, author, blogger, and Enterprise Data & Analytics Chief Data Engineer Andy Leonard for a free webinar titled Troubleshooting SSIS 13 Mar 2019 at 12:00 PM ET.
Here’s a free preview!
While managing a team of 40 ETL developers, I wanted to track lots of metrics. Some of the things I wanted to track were technical, like SSIS package execution times. Some metrics were people-centric.
Andy’s First Rule of Statistics states:
You can use statistics for anything about people, except people.Andy – circa 2005
It was important to me to track how long it took the on-call person to identify the problem. I didn’t use the information to beat on-call people over the head. I used the information to measure the results of several experiments for displaying metadata about the failure.
You may be as shocked by this as I was; reports helped a lot more than I anticipated. Before I deployed the reports the Mean Time to Identify Failure was tracking just under 30 minutes. After deploying the reports, the mean time to identify failure fell to 5 minutes.
As I said, I was shocked. There were mitigating circumstances. The on-call team members were getting more familiar with the information SSIS produces when it logs an error. They were gaining experience, seeing similar errors more than once.
I accounted for growing familiarity by narrowing the time window I examined. The least-impressive metrics put the reduction at 18 minutes to 5 minutes.
Before I built and deployed the dashboard for SSIS Application Instances (like the one pictured at the top of this post), on-call people would query custom-logging tables we built to monitor enterprise data integration. The queries to return Application Instance log data were stored where everyone could reach them. In fact, I used the same queries as sources for this report.
A funny thing happened when I deployed the reports. Each week, one or more on-call people would ping me and tell me how much they liked the reports. Even though the data was the same, the presentation was different. A picture with a little color goes a long way.
The image at the beginning of this section – the SSIS Framework Task Instance Report – is displayed when a user clicks the Failed link shown in the initial screenshot. This design received he most comment by the on-call team members. The most common comment was, “I click the Failed link and it takes me to details about the failure.” The reports were passing The 2:00 AM Test.
If you’ve read this far and wondered, “Andy, what’s an SSIS Application?” An SSIS Application is a construct I came up with to describe a collection of SSIS Packages configured to execute in a specific order. An application is a way to group SSIS packages for execution. You can get a sense of how our frameworks work – especially the application execution functionality – by checking out the SSIS Framework Community Edition at DILM Suite (DILM == Data Integration Lifecycle Management).
An Application Instance is an instance of execution of an SSIS Application. An Application Instance is made up of Package Instances. the relationship between applications and packages appears straightforward: an application is a collection of packages; parent-child; one-to-many. But it’s not quite that simple. Our SSIS Frameworks facilitate patterns that execute the same package multiple times, sometimes in parallel! We can also create packages that perform utility functions – such as ArchiveFile.dtsx – and call it from multiple applications. When you do the Boolean algebra, the relationship between applications and packages is many-to-many.
Our SSIS Frameworks are SSIS Catalog-integrated. They even work with the SSIS Integration Runtime that’s part of Azure Data Factory, Azure-SSIS.
While the Reporting Services dashboard was neat when it was released back in the day, the cool kids now play with Power BI. At DILM Suite you will also find a free – albeit basic – Power BI dashboard that surfaces many of the same metrics using even better reporting technology. The Basic SSIS Catalog Dashboard in Power BI is free at DILM Suite.
I’ve not yet collected Mean Time to Identify Failure metrics using the Basic SSIS Catalog Dashboard in Power BI dashboard. Perhaps you can be the first.
I occasionally (rarely) read reviews at Amazon of books I’ve written. If I learn of a complaint regarding the book I often try to help. If a reader experiences difficulty with demos, I often offer to help, and sometimes I meet with readers to work through some difficulty related to the book (as I offered here). About half the time, there’s a problem with the way the book explains an example or the sample code; the other half the time the reader does not understand what is written.
I own both cases. As a writer it’s my job to provide good examples that are as easy to understand as possible. Sometimes I fail.
In very rare instances, I feel the review misrepresents the contents of a book – enough to justify clarification. This review afforded one such opportunity. None of what I share below is new to regular readers of my blog. I chose to respond in a direct manner because I know and respect the author of the review, and believe he will receive my feedback in the manner in which it was intended – a manner (helpful feedback) very similar to the manner in which I believe his review was intended (also helpful feedback).
Thank you for sharing your thoughts in this review.
I maintain successful technology solutions are a combination of three factors:
I believe any technologist, given enough time and what my mother refers to as “gumption” (a bias for action combined with a will – and the stubbornness, er… tenacity – to succeed), can solve any problem with any technology.
I find your statement, “It doesn’t really teach BIML but rather teaches a specific optional framework for those who want to do everything with BIML rather than increase productivity for repetitive patterns” inaccurate. Did you read the entire book? I ask that question because I most often encounter readers who state the precise opposite of the opinion expressed in this review (some of your fellow reviewers express the opposite opinion here, even). I disagree with your statement, but I understand your opinion in light of other opinions expressed to me by readers during my past decade+ of writing several books. I share one example in this blog post. The short version: we often get more – or different – things from reading than we realize.
There is a section in The Biml Book – made up of 3 of the 20 chapters and appendices – that proposes a basic metadata-driven Biml framework which is the basis of a metadata-driven Biml framework in production in several large enterprises. I wrote those 3 chapters and the basic metadata-driven Biml framework in question. Varigence has also produced a metadata-driven Biml framework called BimlFlex – based upon similar principles – which has been deployed at myriad enterprises. The enterprise architects at these organizations have been able to improve code quality and increase productivity, all while decreasing the amount of time required to bring data-related solutions to market. They do not share your opinion, although they share at least some of the problems (you mentioned ETL) you are trying to solve.
Am I saying Biml – or the framework about which I wrote or even BimlFlex – is the end-all-be-all for every SSIS development effort? Goodness no! In fact, you will find disclaimers included in my writings on Biml and SSIS frameworks. I know because I wrote the disclaimers.
Misalignment on any of the three factors for successful technology solutions – the problem, the technology, and/or the developer – can lead to impedance mismatches in application and implementation. For example, Biml is not a good solution for some classes of ETL or data integration (points 1 and 2). And learning Biml takes about 40 hours of tenacious commitment (which goes to point 3). This is all coupled with a simple fact: data integration is hard. SSIS does a good job as a generic, provider-driven solution – but most find SSIS challenging to learn and non-intuitive (I did when I first started using it). Does that mean SSIS is not worth the effort to learn? Goodness, no! It does mean complaints about simplifying the learning process are to be expected and somewhat rhetorical.
Architects disagree. We have varying amounts of experience. We have different kinds of experience. Some architects have worked only as lone-wolf consultants or as members of single-person teams. Others have worked only as leaders of small teams of ETL developers. Rarer still are enterprise architects who are cross-disciplined and experienced as both lone-wolf consultants and managers of large teams in independent consulting firms and large enterprises. Less-experienced architects sometimes believe they have solved “all the things” when they have merely solved “one of the things.” Does this make them bad people? Goodness, no. It makes them less-experienced, though, and helps identify them as such. Did the “one thing” need solving? Goodness, yes. But enterprise architecture is part science and part art. Understanding the value of a solution one does not prefer – or the value of a solution that does not apply to the problem one is trying to solve – lands squarely in the art portion of the gig.
Regarding the science portion of the gig: engineers are qualified to qualitatively pronounce any solution is “over-engineered.” Non-engineers are not qualified to make any such determination, in my opinion.
By way of example: I recently returned from the PASS Summit. Although I did not attend the session, I know an SSIS architect delivered a session in which he made dismissing statements regarding any and all metadata-driven frameworks related to ETL with SSIS. If I didn’t attend his session, how do I know about the content of the presentation? A number of people in attendance approached me after the session to share their opinion that the architect, though he shared some useful information, does not appreciate the problems faced by most enterprise architects – especially enterprise architects who lead teams of ETL developers.
My advice to all enterprise architects and would-be enterprise architects is: Don’t be that guy.
Instead, be the enterprise architect who continues to examine a solution until underlying constraints and drivers and reasons the solution was designed the way it was designed are understood. Realize and recognize the difference between “That’s wrong,” “I disagree,” and “I prefer a different solution.” One size does not fit all.
Finally, some of the demos and Biml platforms were not working at the time you wrote this review. Many have been addressed since that time. In the future, updates to SSIS, Biml, and ETL best practices will invalidate what this team of authors wrote during the first half of 2017. As such, the statements, “And much of the online information is outdated and no longer works. It’s a shame someone hasn’t found a way to simplify the learning process.” is a tautology that defines progress. Learning is part of the job of a technologist. I encourage us all – myself most of all – to continue learning.
Azure Data Factory – ADF – is a cloud data engineering solution. ADF version 2 sports a snappy web GUI (graphical user interface) and supports the SSIS Integration Runtime (IR) – or “SSIS in the Cloud.”
Attend this session to learn:
– How to build a “native ADF” pipeline;
– How to lift and shift SSIS to the Azure Data Factory integration Runtime; and
– ADF Design Patterns to execute and monitor pipelines and packages.
I hope to see you there!
A few years ago I had a conversation with Scott Currie, the CEO of Varigence and inventor of Biml. Scott is one of the smartest people I know and I know a lot of very smart people. Whenever I have the opportunity to communicate with the very smart people I know, I often ask them – based upon their knowledge of me and what I do and any plans I have shared – for advice on stuff I should focus upon moving forward. Scott’s monosyllabic reply to this question?
Over the course of the past couple years I’ve shared – on this blog, even – some of my experiences learning more about Apache Spark. While I expected a little resistance, I was a little surprised by the… intensity… of some of the private messages and emails I received. As I replied to the intense people (and there were only a few), I could stop learning stuff right now, today, and continue working with SSIS for the next decade or so. I believe the same goes for anyone working with SSIS today.
I’m not going to stop learning stuff.
I love SSIS. But I don’t use SSIS because I love it. I love SSIS because it solves a particularly difficult piece of the story of enterprise data: data integration and data engineering. I feel a similar way about writing (and I’ve blogged recently about writing). I write because I like to write, not for clicks or branding or any of the other benefits I glean from writing. I consider those benefits cool, but side-effects of my desire to write.
The same goes for SSIS: I love SSIS because it solves a problem that I want to solve. I hope that makes sense.
How can you learn more about Spark? There are bunches of videos out there on YouTube. YouTube is crowd-sourced and free, which makes it awesome. The quality of YouTube training is crowd-sourced and free, which is a challenge.
I learned a lot from edX. I’m a fan of how edX approaches MOOC (online education). They offer lots of courses for free, which means you can grow your knowledge by investing only time. It’s tough to beat $free. If you want, you can pay for a certificate which you can then add to your online resume or LinkedIn profile like I added this one.
My advice? Spend some time learning. Pick your favorite learning platform and jump in. If you have SQL Server and / or SSIS experience, you already know a lot about the problem enterprises are trying to solve with data engineering tools. That’s a good place to be, but not required.
I am excited to announce a brand new course (it still has that new course smell) from Brent Ozar Unlimited and honored to deliver it! This one-day, live, online course is titled Fundamentals of Azure Data Factory and it’s designed to introduce you to Azure Data Factory (ADF).
There will be demos.
Lots of live demos!
Azure Data Factory, or ADF, is an Azure PaaS (Platform-as-a-Service) that provides hybrid data integration at global scale. Use ADF to build fully managed ETL in the cloud – including SSIS. Join Andy Leonard – author, blogger, and Chief Data Engineer at Enterprise Data & Analytics – as he demonstrates practical Azure Data Factory use cases.
In this course, you’ll learn:
To know if you’re ready for this class, look for “yes” answers to these questions:
I hope to see you there.
The script begins with a header documentation similar to that shown here:
When executed, the Project Reference script header documentation portion displays in the Messages output window of SQL Server management Studio (SSMS):
The statements returned in the Messages tab of SQL Server Management Studio (SSMS) are designed to be copied and stored. The authors recommend enterprises use a ticketing system to manage and track the deployment of enterprise scripts. Before closing a ticket to create a Catalog Reference, the deploying agent is advised to copy the contents of the Messages tab and paste them into the Notes section of the ticket for auditing purposes.
The next portion of the Catalog Reference script tests for the existence of prerequisite artifacts:
Prerequisites for a Project Reference include:
• Catalog Folder
• SSIS Project
• Catalog Environment
When executed, this portion of the script returns status messages for prerequisites similar to those shown below:
The next portion of the Reference Script creates the Reference which is a relationship between an SSIS Catalog Environment and an SSIS Project (or Package). An example of this portion of the script is shown here:
Once this portion of the script is executed, a message similar to that shown in the figure above is returned if the reference is created:
If the script detects the reference already exists, a message similar to that shown below is returned:
The next portion of the Reference script clears the parameter value:
The messages generated by this portion of the References script appear similar to those show below:
The final portion of the script builds the Reference Mapping – the relationship between a Catalog Environment Variable and a Parameter that the Environment Variable value will override at execution-time:
When executed, this portion of the script generates a message similar to that shown below:
After executing the Project Reference script in the target instance, click the Refresh button in SSIS Catalog Compare to observe the updated SSIS Catalog state of the target SSIS Catalog instance as shown here:
As you may glean from this analysis of one script generated for SSIS Catalog References management, the Transact-SQL for scripting SSIS Catalog artifacts in SSIS Catalog Compare is rigorous and includes several existence and error-condition checks prior to performing any updates. The script is designed to be idempotent, as well, meaning the script will succeed and the results will be repeatable and predictable each time the script is executed – and that the script itself is re-executable.
Learn more about SSIS Catalog Compare!
Check out this schedule – there are a bunch of smart people presenting – plus me!
Ever wonder why SSIS runs so slow? Watch SSIS author Andy Leonard as he runs test loads using sample and real-world data and shows you how to tune SQL Server 2017 Integration Services (SSIS 2017) packages.
We’ll start by experimenting with SSIS design patterns to improve performance loading AdventureWorks data. We will implement different change detection patterns and compare execution performance for each. Then, we’ll explain a Data Flow Task’s bottleneck when loading binary large objects – or Blobs.
Finally, we’ll demonstrate a design pattern that uses a Script Component in a Data Flow to boost load performance to MySql, whether on-premises or in the cloud.
Prerequisites: None. Some SSIS and SQL knowledge will be helpful but is not required.
I hope to see you there! I’d love to meet you if you read this blog and attend – just walk up and introduce yourself, I’m the southern guy with the braided beard.
… to a different instance on-premises.
… to Azure Data Factory SSIS Integration Runtime.
This screenshot is from the SSIS Catalog Compare documentation. You can script an entire SSIS Catalog using SSIS Catalog Compare!
This awesome functionality can save you hours (days, weeks, even!) when starting a new project or updating an existing project.
This awesome functionality supports DevOps with SSIS.
View the recording of Use SSIS Catalog Compare to Lift and Shift SSIS to Azure Data Factory (registration required).
I was about to click the Publish button for this post about my new role as a product manager when my friend and brother, Scott, called. I mentioned it and Scott said, “You’re not just a product manager. You’re a problem-solver.”
Scott is correct. Hence the parentheses in the title of this post.
A friend used to ask me this question all the time. It’s a fantastic question. The question made me pause and think. Pausing and thinking is mostly good most of the time. When is pausing and thinking not good?
When it delays action you desperately need to take; action you are uniquely positioned to take.
For a couple years my response to this question was to answer, “No,” and then I would feel silly for considering the thing I had wanted to do. But then a funny thing began to happen: I increasingly felt cornered by “the best use of my time.”
Let’s pause right here and consider a different perspective. I was being selfish. I wanted to do what I wanted to do, the consequences be damned.
This is an accurate description right up until the point where I didn’t do what I wanted to do – what I felt desperately needed to be done – to help people attempting DevOps with SSIS. Instead I stuffed the ideas back down inside, put my head down, and went back to doing what was a good use of my time.
Was I in any position to make this determination?
Was I qualified to make this call?
Yes. Yes I was.
Over time, my response evolved. I stopped feeling silly about wanting to solve DevOps for SSIS and started feeling silly for placing myself into a position which offered more obstacles than opportunities.
The short version of a long story is: I extricated myself from that corner.
Before I did anything else – I am not making this up, I can produce witnesses – I started writing SSIS Catalog Compare. I started coding within minutes of announcing my decision.
I did not know what I was doing.
I am still learning.
I feel like I only recently worked my way up to being a n00b C# developer.
I didn’t know anything about designing a software product. I know more now but (still) not enough.
I didn’t know anything about marketing a software product.
I didn’t know anything about managing a software product.
I continue to learn. Here’s the latest thing I’ve learned:
I am not afraid.
I am not afraid of not knowing. Frank Herbert got it right (in Dune): Fear is the mind-killer. The only way to learn is by using my mind – the same mind battling fear of not knowing.
Was the question about the bet use of my time well-intentioned? Absolutely. My friend was watching out for me, he had my back. I learned from the experience and walked away more mature and with broader perspective. I learned. I grew. I would not be where I am now had I not.
Scott reminded me I am a problem-solver. I always have been a problem-solver. Lord willing, I will continue to be a problem-solver.
Becoming a software product manager was required in order to solve a problem.
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.