Biml, Book Reviews, and Metadata-Driven Frameworks

I occasionally (rarely) read reviews at Amazon of books I’ve written. If I learn of a complaint regarding the book, I often try to help. If a reader experiences difficulty with demos, I offer assistance, and sometimes I meet with readers to work through some difficulty related to the book (as I offered here). About half the time there’s a problem with the way the book explains an example or the sample code; the other half of the time the reader does not understand what is written.

I own both cases. As a writer it’s my job to provide good examples that are as easy to understand as possible. Sometimes I fail.

In very rare instances, I feel the review misrepresents the contents of a book – enough to justify clarification. This review afforded one such opportunity. None of what I share below is new to regular readers of my blog. I chose to respond in a direct manner because I know and respect the author of the review, and believe he will receive my feedback in the manner in which it was intended – a manner (helpful feedback) very similar to the manner in which I believe his review was intended (also helpful feedback).

Enjoy.

My Reply to This Review of The Biml Book:

Thank you for sharing your thoughts in this review.

I maintain successful technology solutions are a combination of three factors:

  1. The problem we are trying to solve;
  2. The technology used to solve the problem; and
  3. The developer attempting to use a technology to solve the problem.

I believe any technologist, given enough time and what my mother refers to as “gumption” (a bias for action combined with a will – and the stubbornness, er… tenacity – to succeed), can solve any problem with any technology.

I find your statement, “It doesn’t really teach BIML but rather teaches a specific optional framework for those who want to do everything with BIML rather than increase productivity for repetitive patterns” inaccurate. Did you read the entire book? I ask that question because I most often encounter readers who state the precise opposite of the opinion expressed in this review (some of your fellow reviewers express the opposite opinion here, even). I disagree with your statement, but I understand your opinion in light of other opinions expressed to me by readers during my past decade+ of writing several books. I share one example in this blog post. The short version: we often get more – or different – things from reading than we realize.

There is a section in The Biml Book – made up of 3 of the 20 chapters and appendices – that proposes a basic metadata-driven Biml framework which is the basis of a metadata-driven Biml framework in production in several large enterprises. I wrote those 3 chapters and the basic metadata-driven Biml framework in question. Varigence has also produced a metadata-driven Biml framework called BimlFlex – based upon similar principles – which has been deployed at myriad enterprises. The enterprise architects at these organizations have been able to improve code quality and increase productivity, all while decreasing the amount of time required to bring data-related solutions to market. They do not share your opinion, although they share at least some of the problems (you mentioned ETL) you are trying to solve.

Am I saying Biml – or the framework about which I wrote or even BimlFlex – is the end-all-be-all for every SSIS development effort? Goodness no! In fact, you will find disclaimers included in my writings on Biml and SSIS frameworks. I know because I wrote the disclaimers.

Misalignment on any of the three factors for successful technology solutions – the problem, the technology, and/or the developer – can lead to impedance mismatches in application and implementation. For example, Biml is not a good solution for some classes of ETL or data integration (points 1 and 2). And learning Biml takes about 40 hours of tenacious commitment (which goes to point 3). This is all coupled with a simple fact: data integration is hard. SSIS does a good job as a generic, provider-driven solution – but most find SSIS challenging to learn and non-intuitive (I did when I first started using it). Does that mean SSIS is not worth the effort to learn? Goodness, no! It does mean complaints about simplifying the learning process are to be expected and somewhat rhetorical.

Architects disagree. We have varying amounts of experience. We have different kinds of experience. Some architects have worked only as lone-wolf consultants or as members of single-person teams. Others have worked only as leaders of small teams of ETL developers. Rarer still are enterprise architects who are cross-disciplined and experienced as both lone-wolf consultants and managers of large teams in independent consulting firms and large enterprises. Less-experienced architects sometimes believe they have solved “all the things” when they have merely solved “one of the things.” Does this make them bad people? Goodness, no. It makes them less-experienced, though, and helps identify them as such. Did the “one thing” need solving? Goodness, yes. But enterprise architecture is part science and part art. Understanding the value of a solution one does not prefer – or the value of a solution that does not apply to the problem one is trying to solve – lands squarely in the art portion of the gig.

Regarding the science portion of the gig: engineers are qualified to qualitatively pronounce a solution “over-engineered.” Non-engineers are not qualified to make any such determination, in my opinion.

By way of example: I recently returned from the PASS Summit. Although I did not attend the session, I know an SSIS architect delivered a session in which he made dismissive statements regarding any and all metadata-driven frameworks related to ETL with SSIS. If I didn’t attend his session, how do I know about the content of the presentation? A number of people in attendance approached me after the session to share their opinion that the architect, though he shared some useful information, does not appreciate the problems faced by most enterprise architects – especially enterprise architects who lead teams of ETL developers.

My advice to all enterprise architects and would-be enterprise architects is: Don’t be that guy.

Instead, be the enterprise architect who continues to examine a solution until underlying constraints and drivers and reasons the solution was designed the way it was designed are understood. Realize and recognize the difference between “That’s wrong,” “I disagree,” and “I prefer a different solution.” One size does not fit all.

Finally, some of the demos and Biml platforms were not working at the time you wrote this review. Many of those issues have been addressed since then. In the future, updates to SSIS, Biml, and ETL best practices will invalidate what this team of authors wrote during the first half of 2017. As such, the statement, “And much of the online information is outdated and no longer works. It’s a shame someone hasn’t found a way to simplify the learning process,” is a tautology that defines progress. Learning is part of the job of a technologist. I encourage us all – myself most of all – to continue learning.

Peace.

100 Dumb Little Things

Software development is hard. It takes time, yes. But more than that, software development takes patience and thought and blood and sweat and love and tears.

My friends at Varigence recently released an update to their Business Intelligence Markup Language (Biml) products. If you’re into business intelligence, data science, data integration, or data engineering, you should check out Biml.

The release took longer than some would have liked.
Varigence didn’t provide regular updates on progress.
Some became… antsy.

I understand. Really, I do. As a BimlHero I get just a little more access behind the curtain compared to the average bear. Would I like to know more? Yep. Does it bother me when I don’t hear more? Nope. Why?

Software Development is Hard

I know how difficult it is to develop software because I decided back in the early 20-teens that I wanted to develop some software. (And I did it! Check out DILM Suite!) At the time, I encountered… resistance… to the idea. Make no mistake, the resistance was well-founded and may ultimately prove to have been correct. But resistance didn’t do anything to curb my beliefs that:

  1. Software should always participate in a lifecycle that is managed, preferably by a process akin to DevOps;
  2. All software is tested. Some intentionally; and
  3. SSIS development is software development.

SSIS Rocks

The SSIS team at Microsoft has given us some incredible out-of-the-box functionality. I love the SSIS Catalog! It’s a great enterprise framework for managing data engineering execution, logging, and externalization (configuration). I believe that strongly enough to have included similar statements in my last book, Data Integration Life Cycle Management with SSIS:

I can hear you thinking, “If you’re convinced the SSIS Catalog is so awesome, Andy, why did you build DILM Suite?” That’s a fair question. I actually answer this question in the book in chapter 6 titled Catalog Browser. You don’t have to buy the book to learn my answer; I published Chapter 6 here on this blog in a post with the obscure title, Why I Built DILM Suite, by Andy Leonard.

Was I Right?

I don’t know.
Time will tell.

There have been thousands of downloads since I built DILM Suite. I view the number of downloads as indicative of interest. Does everyone who downloads a product – especially a free product – use that product? Goodness no. Does everyone who uses SSIS or the SSIS Catalog need to download DILM Suite components? Goodness no.

If you’re trying to practice lifecycle management (or DevOps) with SSIS, though, DILM Suite can help.

100 Dumb Little Things

Software development is a lot like being a parent in that it consists of getting 100 dumb little things right. Are the dumb little things important? Some are, some are not, and some are vital. Does anyone get all 100 dumb little things right in parenting? in software development? No and no.

At the end of the day, every day in fact, I am extremely proud of what I’ve built.

SSIS Catalog Compare is the first product I’ve ever attempted to develop. Perhaps that shows. My competition certainly thinks so and has made much hay out of this fact. Do I shy away from telling folks this because my competition uses it against me?

Nope. At the end of the day, every day in fact, I am extremely proud of what I’ve built. I get regular feedback from customers sharing how much the product helps them manage SSIS in their enterprise. The feedback greatly overshadows the… statements… of the competition. (Sidebar: I sometimes wonder how my competition sleeps at night…)

Getting software right is all about getting everything right, including the best wording for feedback and error messages (like that shown at the top of this post).
Getting everything right is almost impossible, and certainly cost-prohibitive, but it should absolutely be the goal of any software development endeavor.

Getting 100 dumb little things right is my goal.

Peace.

SSIS Design Pattern: Controller Pattern

SSIS Framework Community Edition defaults to serial execution. The Controller Pattern can help. How? Read on…

“Great Andy, But What If I Need To Load In Parallel Using A Framework?”

Enter the SSIS Design Pattern named the Controller Pattern. A Controller is an SSIS package that executes other SSIS packages. Controllers can be serial or parallel, or combinations of both. I’ve seen (great and talented) SSIS architects design frameworks around specific Controllers – building a Controller for each collection of loaders related to a subject area.

There’s nothing wrong with those solutions.

SSIS Framework Community Edition ships with a generic metadata-driven serial controller named Parent.dtsx, which is found in the Framework SSIS solution.

Specific Controller Pattern Design

A specific Controller can appear as shown here (click to enlarge):

This controller achieves parallel execution. One answer to the question, “How do I execute packages in parallel?” is to build a specific controller like this one.

Advantages

  • Just Works
  • Simple and straightforward, uses out-of-the-box Execute Package Task

Disadvantages

  • “All Executions” Catalog Report is… misleading…

“How is the All Executions Report misleading, Andy?”

I’m glad you asked. If you build and deploy a project such as the SerialControllers SSIS project shown here – and then execute the SerialController.dtsx package – the All Executions report lists only a single package execution: SerialController.dtsx (click to enlarge):

We see one and only one execution listed in the All Executions report. If we click on the link to view the Overview report, we see each package listed individually:

The All Executions report accurately reflects an important aspect of the execution of the SerialController.dtsx SSIS package. The execution of this package – and the packages called by SerialController.dtsx – share the same Execution ID value. This is not necessarily a bad thing, but it is something of which to be aware.
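
If you prefer to verify this outside the report, the same information is visible in the SSIS Catalog views. A minimal sketch using the standard SSISDB catalog.executions view:

    -- Each row in catalog.executions is one Catalog execution.
    -- With the Execute Package Task controller above, only the
    -- controller package appears here; its child packages share the
    -- controller's execution_id and show up in catalog.executables.
    SELECT e.execution_id,
           e.folder_name,
           e.project_name,
           e.package_name,
           e.status,
           e.start_time
    FROM   SSISDB.catalog.executions AS e
    ORDER BY e.execution_id DESC;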

Specific Controller Design in SSIS Framework Community Edition

A specific Controller built using SSIS Framework Community Edition can appear as shown here:

This controller uses Execute SQL Tasks instead of Execute Package Tasks. The T-SQL in the Execute SQL Tasks calls a stored procedure named custom.execute_catalog_package that is part of SSIS Framework Community Edition.
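
The sketch below shows the shape of such a call – the procedure name comes from the Framework, but the parameter names are illustrative placeholders, not the Framework’s actual signature:

    -- Illustrative only: the parameter names below are assumptions,
    -- not the actual signature of the Framework stored procedure.
    EXECUTE custom.execute_catalog_package
        @package_name = N'LoadDimCustomer.dtsx',
        @folder_name  = N'MyFolder',
        @project_name = N'MyProject';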

One answer to the question, “How do I execute packages in parallel using SSIS Framework Community Edition?” is to build a Controller.

Advantages

  • Just Works
  • The SSIS Catalog All Executions report is accurate

Disadvantages

  • Adds complexity

The All Executions report is no longer misleading. If you build and deploy a project such as the SerialControllersInFrameworkCE SSIS project shown here – and then execute the SerialControllerInFrameworkCE.dtsx package – the All Executions report lists each package execution (click to enlarge):

We now see one execution listed in the All Executions report for each package. As before, All Executions accurately reflects an important aspect of the execution of the SerialControllerInFrameworkCE.dtsx SSIS package: The execution of the Controller and each Child package now have distinct Execution ID values.

When using specific Controllers with an SSIS Framework it’s common to create a single-package SSIS Application that simply starts the Controller, and then let the Controller package call the child packages. Parent.dtsx in SSIS Framework Community Edition is a generic metadata-driven Controller, but it doesn’t mind executing specific Controllers one bit!

Once Upon A Time…

Not too long ago, Kent Bradshaw and I endeavored to add automated parallel package execution to our Commercial and Enterprise SSIS Frameworks. We achieved our goal, but the solution added so much complexity to the Framework and its associated metadata that we opted to not market the implemented solution.

Why? Here are some reasons:

Starting SSIS packages in parallel is very easy to accomplish in the SSIS Catalog. The SYNCHRONIZED execution parameter is False by default. That means we could build a controller package similar to the SerialControllerInFrameworkCE.dtsx SSIS package – with precedence constraints between each Execute SQL Task, even – and the SSIS Catalog would start the packages in rapid succession. In some scenarios – such as the scenario discussed in this post (from which the current post was derived) – this then becomes a race condition engine.
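
For reference, here is how the SYNCHRONIZED parameter is set when starting a Catalog execution directly in T-SQL. These are standard SSISDB stored procedures; the folder, project, and package names are placeholders:

    DECLARE @execution_id BIGINT;

    EXEC SSISDB.catalog.create_execution
        @folder_name  = N'MyFolder',
        @project_name = N'MyProject',
        @package_name = N'LoadDimCustomer.dtsx',
        @execution_id = @execution_id OUTPUT;

    -- SYNCHRONIZED defaults to False (0): start_execution returns
    -- immediately and the package runs asynchronously. Setting it to
    -- True (1) makes start_execution block until the package completes.
    EXEC SSISDB.catalog.set_execution_parameter_value
        @execution_id,
        @object_type     = 50,   -- execution (system) parameter
        @parameter_name  = N'SYNCHRONIZED',
        @parameter_value = 1;

    EXEC SSISDB.catalog.start_execution @execution_id;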

A Race Condition Engine?

Yes. Because controlling only when packages start is not enough to effectively manage race conditions. To mitigate the race condition described in this post I need to make sure the dimension loaders complete before starting the fact loaders. A (much simplified) Controller for such a process could appear as shown here (click to enlarge):

I grabbed this screenshot after the dimension loader and some (vague) pre-operations process have completed in parallel but while the fact loader is still executing. Please note the combination of the Sequence Container and precedence constraint, which ensures the fact loader does not start executing until the dimension loader execution is complete. The sequence container creates a “virtual step” whereby all tasks within the container must complete before the sequence container evaluates the precedence constraint. Since each task inside this container starts an SSIS package (and since the SYNCHRONIZED execution parameter is set to True by default in SSIS Framework Community Edition), nothing downstream of this container can begin executing until everything inside the container has completed executing. This is how we avoid the race condition scenario described earlier.

How does one automate this process in a framework?

It’s not simple.

The method Kent and I devised was to create and operate upon metadata used to define and configure a “virtual step.” In SSIS Framework Community Edition the Application Packages table is where we store the Execution Order attribute. We reasoned that if two Application Package entities shared the same value for Execution Order, then the associated packages (I’m leaving out some complexity in the design here, but imagine executing the same package in parallel with itself…) composed a virtual step.

In a virtual step packages would start together, execute, and not proceed to the next virtual step – which could be another serial package execution or another collection of packages executing in parallel in yet another virtual step – until all packages in the current virtual step had completed execution. Here, again, I gloss over even more complexity regarding fault tolerance. Kent and I added metadata to configure whether a virtual step should fail if an individual package execution failed.
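
To make the idea concrete, here is a rough sketch of grouping package metadata into virtual steps by a shared Execution Order value. The table and column names are illustrative, not the actual Framework schema:

    -- Illustrative metadata sketch; names are assumptions.
    -- Packages sharing an ExecutionOrder value form one virtual step:
    -- they start together, and the next step does not begin until
    -- every package in the current step has completed.
    SELECT ap.ExecutionOrder AS VirtualStep,
           ap.PackageName
    FROM   custom.ApplicationPackages AS ap
           JOIN custom.Applications AS a
             ON a.ApplicationId = ap.ApplicationId
    WHERE  a.ApplicationName = N'LoadMyWarehouse'
    ORDER BY ap.ExecutionOrder,
             ap.PackageName;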

This was but one of our designs (we tried three). We learned managing execution dependency in a framework is not trivial. We opted instead to share the Controller pattern.

We Learned Something Else

While visiting a client who had deployed the Controller Pattern, we noticed something. The client used a plotter to print large copies of Controller control flows and post them on the walls outside his cubicle.

When we saw this we got it.

The tasks in the Controller’s control flow were well-named. They were, one could say, self-documenting. By posting updated versions of the Controller control flows whenever the design changed, the data engineer was informing his colleagues of changes to the process.

He didn’t need to explain what had changed. It was obvious to anyone stopping by his cubicle for a few minutes. Brilliant!

Conclusion

In this post I wrote about some data integration theory. I also answered a question I regularly receive about performing parallel loads using SSIS Framework Community Edition. I finally covered some of the challenges of automating a solution to manage parallel execution of SSIS packages in a Framework.

Note: much of this material was shared earlier in another post. I broke the Controller Pattern out into this post because the other post was really too long.

:{>

You Might Need an SSIS Framework

You might need an SSIS framework. “How can I tell if I need an SSIS framework, Andy?” I’m glad you asked.

Does your enterprise:

  • Practice DevOps?
  • Execute lots of SQL Server Integration Services (SSIS) packages?
  • Execute SSIS packages several times per day?
  • Execute “SSIS in the Cloud” using the Azure Data Factory version 2 Integration Runtime (ADFv2 IR)?
  • Require configuration options not available in off-the-shelf solutions?

How an SSIS Framework Can Help

One SSIS best practice is to develop small, unit-of-work packages. You can think of them as data engineering functions. Design SSIS packages with the fewest Data Flow Tasks possible – optimally, one.

There’s a saying in engineering (and life): “There’s no free lunch.” Applied to data engineering with SSIS, if you apply the best practice of creating small, unit-of-work packages, you end up with a bunch of SSIS packages. How do you manage executing all these packages? An SSIS framework.

An SSIS framework manages package execution, configuration, and logging.

SSIS Framework Community Edition

The SSIS Framework Community Edition is part of the DILM (Data Integration Lifecycle Management) Suite. SSIS Framework Community Edition groups the execution of several SSIS packages into SSIS Applications, which are collections of SSIS packages configured to execute in a specific order.
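
Conceptually, an SSIS Application is just metadata: a named, ordered list of packages. A rough sketch of that idea – the table and column names below are illustrative, not the Framework’s actual schema:

    -- Illustrative sketch of "an application is an ordered list of
    -- packages"; these names are assumptions, not the Framework schema.
    CREATE TABLE custom.Applications
    (
        ApplicationId   INT IDENTITY(1, 1) PRIMARY KEY,
        ApplicationName NVARCHAR(128) NOT NULL
    );

    CREATE TABLE custom.ApplicationPackages
    (
        ApplicationId  INT NOT NULL
            REFERENCES custom.Applications (ApplicationId),
        PackageName    NVARCHAR(260) NOT NULL,
        ExecutionOrder INT NOT NULL   -- packages execute in this order
    );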

“Can’t I just use Execute Package Tasks for that, Andy?”

Yes. And no. When deploying to the SSIS Catalog, the Execute Package Task can be used to execute any package as long as that package exists in the same project. What if you have a utility package – say a package that archives flat files after you’ve loaded them – that you want to store in a single SSIS Catalog folder and project but call from different processes (or SSIS applications)? SSIS Framework Community Edition can execute that package as part of an SSIS application.

SSIS Framework Community Edition is Catalog-Integrated

SSIS Framework Community Edition is integrated into the SSIS Catalog. When packages execute as part of an SSIS application, operational metadata and execution information are sent to the SSIS Catalog’s tables. You can view operational metrics and metadata using the catalog reports solution built into SQL Server Management Studio (SSMS)…

…or you could view SSIS execution logs and operational metadata using Catalog Reports – a free and open-source SQL Server Reporting Services (SSRS) solution from DILM Suite.

SSIS Framework Community Edition is Free. And Open-Source.

SSIS Framework Community Edition is free and open-source. In fact, the documentation walks you through building your own SSIS framework – it teaches you how you would design your own SSIS framework.

SSIS Framework Community Edition is Customizable

Customization is one of the coolest features of open-source software. If you need some unique functionality, you have the source code and can code it up yourself!

If you don’t have time to code your own unique functionality, Enterprise Data & Analytics can help. It’s possible SSIS Framework Commercial or Enterprise Edition already has the functionality you seek. Compare editions to learn more.

SSIS Framework Community Edition is Cloud-Ready

I can hear you thinking, “Wait. It’s free. It’s open-source. And it runs in the cloud?” Yep, yep, and yep!

We Can Help

At Enterprise Data & Analytics, we’ve been building data integration frameworks for over 15 years. I wrote a book about Data Integration Lifecycle Management (DILM):

We built the DILM Suite – a collection of utilities and solutions, many of which are free (and some even open-source!):

We grok frameworks.

Learn more at Enterprise Data & Analytics.

Contact us today!

My Books

I’ve authored and co-authored a bunch of books!

Truth be told, I enjoy writing. I think my love of writing stems from a love of learning. I’m just trying to share the joy!

It’s a huge honor to write. I’m humbled and thrilled (all at the same time – a feeling I describe as “all giggly inside”) whenever I’m attending an event like the PASS Summit or a SQL Saturday and someone tells me they enjoy what I’ve written or learned something from something I’ve written.

Mostly I’ve co-authored books but a few I’ve written by myself. I owe all my knowledge to those who shared their knowledge with me. I am merely your scribe.

For a list and links to my books, please visit my Amazon author page.

If you’d like for me to train you or your team in the fine arts and sciences of SSIS, Biml, or both – in person or remotely – please contact me.

:{>

On Output…

I’m going to be a little bold in this post and suggest if you are developing for SQL Server, the screenshot to the left and above shows something that is, well, wrong. I can hear you thinking,

“What’s Wrong With That Output, Andy?”

I’m glad you asked. I will answer with a question: What just happened? What code or command or process or… whatever… just “completed successfully”? Silly question, isn’t it? There’s no way to tell what just happened simply by looking at that output.

And that’s the point of this post:

You don’t know what just happened.

Sit Back and Let Grandpa Andy Tell You a Story

I was managing a large team of ETL developers and serving as the ETL Architect for a large enterprise developing enterprise data engineering solutions for two similar clients. Things were winding down and we were in an interesting state with one client – somewhere between User Acceptance Testing (UAT) and Production. I guess you could call that state PrUAT, but I digress…

The optics got… tricksy… with the client in PrUAT. Vendors were not being paid due to the state of our solution. The vendors (rightfully) complained. One of them called the news media, and they showed up to report on the situation. Politicians became involved. To call the situation “messy” was accurate but did not convey the internal pressure on our teams to find and fix the issue – in addition to fixing all the other issues.

There were fires everywhere. In this case, one of the fires had caught fire.

Things. Were. Ugly.

My boss called and said, “Andy, can you fix this issue?” I replied, “Yes.” Why? Because it was my job to fix issues. Fixing issues and solving problems is still my job (it’s probably your job too…). I found and corrected the root cause in Dev. As ETL Architect, I exercised my authority to make a judgment call, promoted the code to Test, tested it, documented the test results, created a ticket, and packaged things up for deployment to PrUAT by the PrUAT DBAs.

Because this particular fire was on fire, I also followed up by calling Geoff, the PrUAT DBA I suspected would be assigned this ticket. Geoff was busy (this is important, don’t forget this part…) working on another fire-on-fire, and told me he couldn’t get to this right now.

But this had to be done.
Right now.

I thanked Geoff and hung up the phone. I then made another judgment call and exercised yet more of my ETL Architect authority. I assigned the PrUAT ticket to myself, logged into PrUAT, executed the patch, copied the output of the execution to the Notes field of the ticket (as we’d trained all DBAs and Release Management people to do), and then manually verified the patch was, in fact, deployed to PrUAT.

I closed the ticket and called my boss. “Done. And verified,” I said. My boss replied, “Good,” and hung up. He passed the good news up the chain.

A funny thing happened the next morning. And by “funny,” I mean no-fun-at-all. My boss called and asked, “Andy? I thought you said the patch was deployed to PrUAT.” I was a little stunned, grappling with the implications of the accusation. He continued, “The process failed again last night and vendor checks were – again – not cut.” I finally stammered, “Let me check on it and get back to you.”

I could ramble here. But let me cut to the chase. Remember Geoff was busy? He was working a corrupt PrUAT database issue. How do you think he solved it? Did you guess restore from backup? You are correct, if so. When did Geoff restore from backup? Sometime after I applied the patch. What happened to my patch code? It was overwritten by the restore.

I re-opened the ticket and assigned it to Geoff. Being less-busy now, Geoff executed the code, copied the output into the Notes field of the ticket (as we’d trained all DBAs and Release Management people to do), and then closed the ticket. The next night, the process executed successfully and the vendor checks were cut.

“How’d You Save Your Job, Andy?”

That is an excellent question because I should have been fired. I’m almost certain the possibility crossed the mind of my boss and his bosses. I know I would have fired me. The answer?

Output.
Documented output, to be more precise.

You see, the output we’d trained all DBAs and Release Management people to copy and paste into the Notes field of the ticket before closing the ticket included enough information to verify that both Geoff and I had deployed code with similar output. It also contained date and time metadata about the deployment, which is why I was not canned.

Output Matters

Compare the screenshot at the top of this post to the one below (click to enlarge).

This T-SQL produces lots of output. That’s great. Sort of.
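
The screenshot isn’t reproduced here, but the idea behind it is easy to sketch: a script that announces what it is doing, where, and when. A minimal example of the kind of instrumentation I mean (the object and patch names are placeholders):

    -- Announce what is about to run, against which server and database,
    -- and when, so the output pasted into a ticket proves something.
    PRINT 'Deploying patch 001 (vendor payment fix)';
    PRINT 'Server:   ' + @@SERVERNAME;
    PRINT 'Database: ' + DB_NAME();
    PRINT 'Started:  ' + CONVERT(VARCHAR(30), SYSDATETIME(), 121);

    IF OBJECT_ID(N'dbo.LoadVendorPayments', N'P') IS NULL
        PRINT 'Creating stored procedure dbo.LoadVendorPayments...';
    ELSE
        PRINT 'Altering stored procedure dbo.LoadVendorPayments...';

    -- (the actual CREATE OR ALTER statement goes here)

    PRINT 'Completed: ' + CONVERT(VARCHAR(30), SYSDATETIME(), 121);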

“There’s no free lunch” is a saying that conveys that every good thing (like lunch) costs something (“no free”). And that’s true – especially in software development. Software design is largely an exercise in balancing mutually exclusive and competing requirements and demands.

If it was easy anyone could do it.

It’s not easy. It takes experienced developers years to develop (double entendre intended) the skills required to design software – and even more years of varied experience to build the skills required to be a good software architect.

The good news: the output is awesome.
The bad news: the output is a lot of typing.

“So Why, Andy? Why Do All The Typing?”

You’ve probably heard of technical debt. This is the opposite of technical debt; this is a technical investment.

Technical investments are time and energy spent early (or earlier) in the software development lifecycle that produce technical dividends later in the software development lifecycle. (Time and energy invested earlier in the project lifecycle always costs less than investing later in the project lifecycle. I need to write more about this…) What are some examples of technical dividends? Well, not-firing-the-ETL-Architect-for-doing-his-job leaps to mind.

This isn’t the only technical dividend, though. Knowing that the code was deployed is important to the DevOps process. Instrumented code is verifiable code – whether the instrumentation supports deployment or execution. Consider the alternative: believing the code has been executed.

That’s not DevOps. That’s wishful thinking.

Measuring Technical Dividends

Measuring technical dividends directly is difficult but possible. It’s akin to asking the question, “How much downtime did we avoid by having good processes in place?” The answer to that question is hard to capture. You can get some sense of it by tracking the mean time to identify a fault, though – as measured by the difference between the time someone begins working the issue and the time when they identify the root cause.
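
If your ticketing data captures those two timestamps, the measurement itself is straightforward. A sketch, assuming a hypothetical ticket table and column names:

    -- Hypothetical table and column names.
    -- Mean minutes between starting work on an issue and identifying
    -- its root cause (mean time to identify a fault).
    SELECT AVG(DATEDIFF(MINUTE, t.WorkStartedAt, t.RootCauseIdentifiedAt) * 1.0)
           AS MeanMinutesToIdentifyFault
    FROM   dbo.IssueTickets AS t
    WHERE  t.RootCauseIdentifiedAt IS NOT NULL;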

Good instrumentation reduces mean time to identify a fault.
Knowing is better than guessing or believing.
The extra typing required to produce good output is worth it.

Good Output

In this age of automation, good output may not require extra typing. Good output may simply require another investment – one of money traded for time. There are several good tools available from vendors that surface awesome reports regarding the state of enterprise software, databases, and data. DevOps tools are maturing and supporting enterprises willing to invest the time and energy required to implement them.

One such tool is SSIS Catalog Compare which generated the second screenshot. (Full disclosure: I built SSIS Catalog Compare.)

SSIS Catalog Compare generates scripts and ISPAC files from one SSIS Catalog that are ready to be deployed to another SSIS Catalog. Treeview controls display Catalog artifacts, surfacing everything related to an SSIS Catalog project without the need to right-click and open additional windows. (You can get this functionality free by downloading SSIS Catalog Browser. Did I mention it’s free?)

In addition, SSIS Catalog Compare compares the contents of two SSIS Catalogs – QA and PrUAT, for example. Can one compare catalogs using other methods? Yes. None are as easy, fast, or complete as SSIS Catalog Compare.

Discount!

For a limited time you can get SSIS Catalog Compare for 60% off. Click here, buy SSIS Catalog Compare – the Bundle, the GUI, or CatCompare (the command-line interface) – and enter “andysblog” without the double-quotes as the coupon code at checkout.

Conclusion

Whether you use a tool to generate scripts or not, it’s a good idea to make the technical investment of instrumenting your code – T-SQL or other. Good instrumentation saves time and money and allows enterprises to scale by freeing up people to do more important work.

:{>

How I Learn

This is a picture of how I learn.

These are executions of an Azure Data Factory version 2 (ADFv2) pipeline. The pipeline is designed to grab data collected from a local weather station here in Hampden-Sydney, Virginia – just outside Farmville – and piped to Azure Blob Storage via AzCopy.

How do I learn?

  • Fail
  • Fail
  • Fail
  • Fail
  • Fail
  • Fail
  • Succeed

My advice? Keep learning!

Completed: Microsoft Big Data Professional Program Capstone Project

I’m excited to announce that I’ve completed – with help from my friend and brother and co-host of the Data Driven podcast, Frank La Vigne (Blog | @Tableteer) – the Microsoft Big Data Professional Program Capstone project!

The capstone is the last course requirement (of 10 courses) to complete the Microsoft Professional Program in Big Data. The official Professional Program certificate won’t be available until next month, but I’m excited to have completed both the capstone and the professional program.

Although there was some data analysis included in the courses and capstone, I found a lot of data engineering was covered in the curriculum. For people wanting to learn more about Azure offerings for data engineering – including HDInsight, Spark, Storm, and Azure Data Factory – I highly recommend the program.

You can audit the courses, gain the same knowledge, and pass the same tests Frank and I passed – and even participate in a similar capstone project – all for free. You only have to pay to receive a certificate.

:{>