Learning for Free

I’ve been blogging lately about training and certifications, and it’s generated a fair number of questions. One question is:

“How much does certification cost?”

Well, some of it is free. A surprising amount is free, in fact. Biml Academy offers several hours of free training, for example.

YouTube is chock-full of learning opportunities. The quality of training varies, but it’s difficult (not impossible, unfortunately) to complain about the quality of free training.

Like Biml Academy, WintellectNOW offers a mix of free plus paid training. My friend and co-host of the Data Driven podcast, Frank La Vigne (blog | @Tableteer), authored a free course at WintellectNOW titled Data Science for Developers.

edX is the site I’ve been using lately. In partnership with Microsoft Virtual Academy, edX offers Microsoft-recognized certifications. If you want the certificate, you have to pay – usually $49-$99 USD at the time of this writing. If you string together enough certifications, you complete a program. This feels a lot like leveling-up, career-wise.

BUT…

You can receive the education – the training, the knowledge – for free at edX. You can audit the courses! Here’s the best part: You can audit the courses today, pass them, and then purchase the certificate later. How much later? I do not know. I like this, though – a lot. Why? I get the knowledge and experience (edX labs – in my experience – are good). As a lifelong learner (which sounds so much better than “learn-aholic”), that’s really what I’m after: the knowledge. And then I can pay for the certificate later.

Why Seek Training?

Once you acquire knowledge you can apply it. Start applying what you’ve learned and it won’t be long before the (hiring) world beats a path to your door.

Think I’m kidding about this? I added certifications to my LinkedIn profile late last year. The number of people who have viewed my profile in the last 90 days is now 2.5X what it was prior to that time.

Please keep in mind that I worked on three books that were published last year. Three. Books. Nothing increased my visibility to the professional community as much as training.

Focused Training

Training can lead to a level-up in your career. You may find a new job at a different – or at your current – employer. Free training can take you far, provided you can locate excellent free training (it’s out there, but sometimes difficult to find). You may want more focused training – training that’s more intense and takes less time. Your employer may want this too.

This is when you want to invest in professional training. Where can you get such training? I’ve learned a lot at SQL Saturday and PASS Summit precons. Beyond that, you may want to attend online or in-person training, classes delivered by experienced and recognized experts. You can find this training in lots of locations. Two of my favorites are Brent Ozar Unlimited and SQLSkills. Full disclosure: I deliver Expert SSIS training for Brent Ozar Unlimited and IESSIS1: Immersion Event on Learning SQL Server Integration Services for SQLSkills.

Courses – such as the training I deliver – efficiently improve your level of knowledge. How do I know? I’ve been delivering training like this for over a decade in the SQL Server space. I hear it over and over again from students: they learn a lot and appreciate the experience and knowledge.

You don’t have to get training from me, but when it’s time to kick things up to the next level, please consider a focused training experience from someone.

Achievement Unlocked: Certified in Implementing Real-Time Analytics with Azure HDInsight

I learned a lot completing Implementing Real-Time Analytics with Azure HDInsight. It’s the most fun Microsoft Virtual Academy / edX course I’ve taken to date!

In completing this course, I finished the requirements for the Microsoft Azure HDInsight Big Data Analyst program. I completed Implementing Predictive Analytics with Spark in Azure HDInsight and Processing Big Data with Hadoop in Azure HDInsight late last year.

What’s Next?

I started the Microsoft Professional Program in Data Science early last year, but I’m only a few courses into it. Anything worth starting is worth finishing.

I really got a lot out of the Big Data Analyst XSeries program. I’m eyeing the Microsoft Professional Program in Big Data. I’ve already completed some of its courses as part of the Data Science program and the Big Data XSeries program. Some that I haven’t completed look very interesting – like Processing Big Data with Azure Data Lake Analytics, Orchestrating Big Data with Azure Data Factory, and Analyzing Big Data with Microsoft R Server.

It’s time to update LinkedIn.

:{>

Writing about Apache Storm Stirs Up a Storm

A storm is coming…

I’ve been spending some time learning other data integration and analytics open-source-y stuff. Perhaps you’ve noticed some of my recent blog posts on those topics.

If you read MSDN Magazine, you may have noticed a familiar name at the end of Frank La Vigne’s (blog | @Tableteer) new column, Artificially Intelligent – especially in the October, November, and December 2017 and January 2018 articles. I’ve been helping Frank with some technical editing for his column.

Maybe you noticed this and wondered, “Is Andy bailing on SSIS?” The answer to that question is:

No.

Goodness no! If you only knew what I have on the drawing board and in the pipeline, you’d understand there’s no reason to worry about me abandoning SSIS. (I don’t blame you for not knowing – I’ve not shared more about these endeavors… yet!)

I’ve repeatedly shared that I enjoy learning. I’ve advised readers of this blog and those who follow me on social media to continually educate themselves. I’ve equally warned folks: If you don’t like lifelong learning, technology is not for you and you should go into another field.

I like learning!

I’ve invested some time and money in more formal training via Microsoft Virtual Academy and edX (although you can take all the courses I’ve taken for free… you have to pay only if you want a certificate).

Learning a New Language

Frank and I chat often. In a recent chat I shared my excitement at discovering patterns and frameworks in Apache Storm and Spark – frameworks and patterns that are remarkably similar to frameworks and patterns I’ve developed for SSIS. Why was I excited? It’s validating to me when smart people – people smart enough to build open source technology platforms like Spark and Storm – include the same functionality I’ve built for SSIS in their platforms and solutions. I’ve debated smart SSIS developers on occasion who believe frameworks in general, or specific frameworks, are not necessary. I’m not good at these debates because I can rarely remember the reason why something is a good or bad idea; I only remember that the idea is good or bad. I think the reason my brain works this way is because I’m an engineer and more interested in solving problems for customers than winning an argument… but I digress.

Frank shared an anecdote from his days of learning German. Paraphrasing:

I had a hard time remembering when to use ‘who’ and when to use ‘whom’ in English grammar until I learned German. Now I get it.

I have developed patterns and utilities (such as DILM Suite) to address architectural concepts I identified as unclear or less-clear or even missing from SSIS. While it’s validating (to me) to find this same functionality built into open source platforms, the biggest thrill remains learning new stuff. As Frank and I discuss in an upcoming Data Driven show, I bring a bunch of context when learning about open source data integration solutions – and Frank brings a bunch of context when learning about the software architecture that underlies Data Science.

Context Matters

Why is context so important? Context is what we mean most of the time when we use the word “experience.”

Like Frank’s German lessons helped him with English grammar, my open source data integration training is helping me articulate arguments for frameworks and patterns – in SSIS, even. I’ve gained additional depth in my career as a result of cross-training in other data integration platforms. I’m better at SSIS because I’ve learned Spark and Storm. Bonus: I’m getting lots of experience in the Azure Data Engineering platforms.

But – please trust me – I’m nowhere near leaving SSIS! In fact, I have a more lucid understanding of where SSIS excels when compared to other data integration platforms.

You’re still stuck with me writing about SSIS. I hope that’s ok.

;{>

 

Apache Storm Stream Processing in Azure HDInsight

I’m learning more about Apache Storm, which is used for stream processing for near-real-time analytics in Azure HDInsight. I grabbed the image above from an AWS presentation on SlideShare.

I got all giggly inside as I learned about Storm architecture. Why? Because I built much of this same functionality as part of the DILM Suite, and we implement many of these same SSIS Design Patterns (SSIS Framework Editions, Controllers, Application Restartability, etc.) when delivering data integration awesomeness to our Enterprise Data & Analytics consulting customers.

It’s kinda cool to see similar architecture, patterns, and functionality in a completely separate toolset and environment. I find it… validating.

It just goes to show you, architecture is architecture, regardless of the platform or tools.

:{>

Expert SSIS Training: SSIS Design Patterns for Performance

 

I focus on three topics in the Expert SSIS course I deliver live, online in cooperation with Brent Ozar Unlimited:

  1. SSIS Design Patterns for Performance
  2. SSIS Deployment, Configuration, Execution, and Monitoring
  3. Automation

What do I cover in SSIS Design Patterns for Performance? I cover Data Flow Internals, chat about throughput – the OBM (One Big Metric) for data flow operations, and walk through several data flow performance-tuning examples. Note: I send each student the demo code! Many students make a comment similar to, “Pattern / Tip x is worth the cost of the course!”

The next delivery of Expert SSIS is 5-7 Mar 2018. Brent and I expanded the class to 2.5 days so we can cover more material! We limit the number of students to be fair to the attendees (I like to answer attendee questions and have been known to build and add demos on demand…).

So don’t wait, register today!

:{>

Why I Built DILM Suite, by Andy Leonard

The following is Chapter 6: Catalog Browser from my latest book titled Data Integration Life Cycle Management with SSIS: A Short Introduction by Example:

I was honored to be a Microsoft SQL Server MVP for five years (2007-2012). One cool thing about being a Microsoft MVP was access to the internal developer teams. Everyone could file Microsoft Connect items reporting bugs and making suggestions for product improvements. Many MVPs did so, only to have their bug reports marked as “works as designed” or “won’t fix” and their suggestions answered similarly. It was discouraging. There are reasons many Connect items were addressed in this way. I am happy to report the root cause (Performance-Based Management or PBM) has been abandoned and the Microsoft developer teams are really and truly listening and responding to requests from the field.


That doesn’t mean every suggestion is acted upon (I promise this is not a complaint). It turns out that Microsoft is a software development enterprise. As big as Microsoft is, they cannot possibly respond to every request. When I realized this, I began thinking about how I might address gaps I perceived. I’d co-founded a consulting company and we (collectively) weren’t interested in becoming a software product company. But I was very interested in developing products to address gaps in Data Integration Lifecycle Management (DILM).

In 2015 I left the consulting company I co-founded and immediately began developing the software I’d dreamed of building. In my opinion, the most fair answers to the question, “Why?” are:

  1. I came to believe the Microsoft SSIS Developer Team would never address the things I perceived as “gaps” in the product story; and
  2. I came to believe the consulting company I co-founded and I held irreconcilable visions of how to address DILM issues.

Looking back with two years of perspective, I believe focusing on DILM was the best long-term move for me. I started another consulting company, Enterprise Data & Analytics (entdna.com), mostly to fund my coding habit.

Surfacing the SSIS Catalog

Let’s examine the SSIS Catalog as surfaced in the SSMS Object Explorer’s Integration Services Catalogs node, shown in Figure 6-1:


Figure 6-1. The SSIS Catalog as Shown in the SSMS Object Explorer Integration Services Catalogs Node

Beneath the Integration Services Catalogs node we find the SSIS Catalog named SSISDB. Two Catalog Folders are displayed, Framework and Test. The Test Folder contains Projects and Environments virtual folders. The Projects virtual folder contains our SSIS Catalog Project named DILMSample, which in turn contains our SSIS Package named SimplePackage.dtsx. The Environments virtual folder contains our Catalog Environment named envConnection1.
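Everything the Object Explorer node displays is metadata stored in SSISDB, so the same Folder / Project / Package hierarchy can be listed with a query against the catalog views. Here’s a minimal C# sketch – the connection string and a local default instance are assumptions you’ll need to adjust:

```csharp
using System;
using System.Data.SqlClient;

class CatalogTree
{
    static void Main()
    {
        // Assumes a local default instance hosting an SSIS Catalog (SSISDB).
        const string connectionString =
            @"Data Source=.;Initial Catalog=SSISDB;Integrated Security=SSPI;";
        const string query = @"
            SELECT f.name AS folder_name, p.name AS project_name, pkg.name AS package_name
            FROM catalog.folders f
            JOIN catalog.projects p ON p.folder_id = f.folder_id
            JOIN catalog.packages pkg ON pkg.project_id = p.project_id
            ORDER BY f.name, p.name, pkg.name;";

        using (var cn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(query, cn))
        {
            cn.Open();
            using (var rdr = cmd.ExecuteReader())
            {
                while (rdr.Read())
                {
                    // One row per deployed package, e.g. Test | DILMSample | SimplePackage.dtsx
                    Console.WriteLine("{0} | {1} | {2}",
                        rdr["folder_name"], rdr["project_name"], rdr["package_name"]);
                }
            }
        }
    }
}
```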

We know – because we’ve done the work – that there’s more there than meets the eye.

6.1   SSIS Catalog Environment Configuration

If we double-click envConnection1 we can see details of our Catalog Environment Variable on the Variables page as shown in Figure 6-2:


Figure 6-2. Viewing the Variables Page of an SSIS Catalog Environment

The Variables page contains details about SSIS Catalog Environment Variables including name, data type, description, value, and whether the variable is sensitive.

6.2   SSIS Catalog Project Configuration

The Parameters tab on the Parameters page of the SSIS Catalog Project Configuration dialog lists SSIS Project and Package parameters, their container name, and value by default as shown in Figure 6-3:


Figure 6-3. Viewing Project Parameters and Values for an SSIS Catalog Project

The Connection Managers tab of the Parameters page contains a list of SSIS Project and Package connection managers and their properties as shown in Figure 6-4:


Figure 6-4. Viewing Connection Manager Parameters and Values for an SSIS Catalog Project

The References page of the SSIS Catalog Project Configure dialog contains a list of SSIS Catalog Environments the SSIS Catalog Project may reference at runtime as shown in Figure 6-5:


Figure 6-5. Viewing Project References for an SSIS Catalog Project

That’s a lot of right- and double-clicking just to see what’s configured in an SSIS Catalog Project.

6.3   Catalog Browser

The SSIS Catalog is filled with really cool and useful configuration information, but one has to know where to look and – in some cases – where to look isn’t so obvious.

Enter Catalog Browser, a free utility that is part of the DILM Suite and available at dilmsuite.com/catalog-browser. Catalog Browser was built to surface the contents of the SSIS Catalog in a single view – a tree that exposes all relevant SSIS Catalog artifacts, properties, and configurations.

As shown in Figure 6-6, Catalog Browser surfaces the same metadata as the SSMS Object Explorer Integration Services Catalogs node:


Figure 6-6. Catalog Browser Surfacing Part of the SSIS Project and Configurations Metadata

Looking at Figure 6-6, though, you probably already see some differences between Catalog Browser and the SSMS Object Explorer Integration Services Catalogs node. Please note the Project Parameters and Project References virtual folders present beneath the SSIS Catalog Project, in addition to the Packages virtual folder.

Expanding these virtual folders reveals the SSIS Catalog Project Parameters and References as shown in Figure 6-7:


Figure 6-7. SSIS Catalog Project Parameters and References

Please recall that in Figure 6-3 the SSMS Object Explorer Integration Services Catalogs node surfaced all parameters – both SSIS Catalog Project Parameters and SSIS Package Parameters. Where are the Package Parameters? They’re here in Catalog Browser. To view Package Parameters, expand the SimplePackage.dtsx SSIS Package node as shown in Figure 6-8:


Figure 6-8. Viewing SSIS Package Parameters

Please recall Connection Manager Properties are treated as Parameters in the SSIS Catalog. They are prefixed with “CM.”. We see the SSIS Package Connection Manager vmDemo\Demo.TestDB1 Connection String property is mapped to an SSIS Catalog Environment Variable named ConnectionString.

To surface the Reference used for the Reference Mapping, expand the Package References virtual folder as shown in Figure 6-9:


Figure 6-9. Viewing the Package Reference

Expanding the Package Reference virtual folder surfaces the Test/envConnection1 Catalog Environment. Expanding the Test/envConnection1 Catalog Environment reveals the Catalog Environment Variable named ConnectionString is mapped to the vmDemo\Demo.TestDB1 Connection String property.

But what’s the value of the ConnectionString Catalog Environment Variable? Expand the envConnection1 Catalog Environment in the Environments virtual Folder to view the collection of Catalog Environment Variables, their data types, and their values as shown in Figure 6-10:


Figure 6-10. Catalog Environment Variables, Data Types, and Values

SSIS Package Properties includes a Package Version property constructed from the Version Major, Version Minor, and Version Build properties of the SSIS package. Every time a developer saves an SSIS package, the Version Build property increments. It’s possible to revise an SSIS package and “trick” the Version Build property by manually setting it. I have not yet found a valid use case for doing so to SSIS Catalog-deployed SSIS packages.

The Package Version property can be used to detect different versions of SSIS packages deployed to an SSIS Catalog. Because SSIS developers can manually set the Version Build property, though, Package Version is not a reliable indicator.
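The version properties are surfaced per package by the catalog.packages view (the version_major, version_minor, and version_build columns), so comparing deployed versions can be scripted. A minimal sketch, assuming a local default instance:

```csharp
using System;
using System.Data.SqlClient;

class PackageVersions
{
    static void Main()
    {
        // Assumes a local default instance hosting SSISDB.
        using (var cn = new SqlConnection(@"Data Source=.;Initial Catalog=SSISDB;Integrated Security=SSPI;"))
        using (var cmd = new SqlCommand(
            @"SELECT name, version_major, version_minor, version_build
              FROM catalog.packages
              ORDER BY name;", cn))
        {
            cn.Open();
            using (var rdr = cmd.ExecuteReader())
            {
                while (rdr.Read())
                {
                    // Package Version as Major.Minor.Build, e.g. SimplePackage.dtsx 1.0.42
                    Console.WriteLine("{0} {1}.{2}.{3}",
                        rdr["name"], rdr["version_major"], rdr["version_minor"], rdr["version_build"]);
                }
            }
        }
    }
}
```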

The Package Properties virtual folder surfaces SSIS Package metadata as shown in Figure 6-11:


Figure 6-11. SSIS Package Properties

Catalog Properties are handy for detecting differences in patch levels (via the Schema Build property). Catalog Version is a property exposed by Catalog Base – the custom Catalog object that lies beneath Catalog Browser.

Catalog Base works with SSIS 2012, 2014, and 2016 Catalogs. An update that also works with SSIS 2017 is under development.
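The Schema Build property is also easy to read straight off the catalog via the catalog.catalog_properties view. A minimal sketch, assuming a local default instance (SCHEMA_BUILD and SCHEMA_VERSION are the property names I’d check first):

```csharp
using System;
using System.Data.SqlClient;

class CatalogSchemaBuild
{
    static void Main()
    {
        // Assumes a local default instance hosting SSISDB.
        using (var cn = new SqlConnection(@"Data Source=.;Initial Catalog=SSISDB;Integrated Security=SSPI;"))
        using (var cmd = new SqlCommand(
            @"SELECT property_name, property_value
              FROM catalog.catalog_properties
              WHERE property_name IN (N'SCHEMA_BUILD', N'SCHEMA_VERSION');", cn))
        {
            cn.Open();
            using (var rdr = cmd.ExecuteReader())
            {
                while (rdr.Read())
                {
                    // SCHEMA_BUILD reflects the patch level of the catalog schema.
                    Console.WriteLine("{0} = {1}", rdr["property_name"], rdr["property_value"]);
                }
            }
        }
    }
}
```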

The Catalog Properties virtual folder surfaces SSIS Catalog metadata as shown in Figure 6-12:


Figure 6-12. SSIS Catalog Properties

Conclusion

Catalog Browser surfaces SSIS Catalog artifacts, configuration metadata, and artifact properties in a single view.

 

ETL Instrumentation: Logging SSIS Variable Values

During the December 2017 delivery of Expert SSIS, I was asked if there is an SSIS Catalog Logging Mode that will display the value of variables. I responded that I wasn’t aware of a logging level that accomplishes this, but then – as sometimes happens – I could not let it go. It is an excellent question. So I dug in…

Answer #1: Yes, But…

Here’s one way to do it: open the Variables window in SSIS and click the Grid Options button on the Variables window toolbar.

When the Variable Grid Options dialog displays, check the “Raise event when variable value changes” checkbox.

Click the OK button to return to the Variables window and note the new property, Raise Change Event. Set Raise Change Event to True.

Once deployed, the value of the variable will be logged in the SSIS Catalog at startup and each time the variable value changes.

Please note: the VariableValueChanged event is captured using the SSIS Catalog Basic Logging level.

Why isn’t this a resounding “Yes! Yes, you can log all variable values in the SSIS Catalog!”? Well, because this requires developers to open existing packages, open the Variables window, check the checkbox to even be able to see the property, and then change the property value – for every. single. variable.
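Once those change events land in the catalog, they can be pulled back out of the catalog.event_messages view. Here’s a hedged C# sketch – the connection string, and my recollection that the event surfaces with event_name ‘OnVariableValueChanged’, are assumptions you should verify against your own catalog:

```csharp
using System;
using System.Data.SqlClient;

class VariableChangeLog
{
    static void Main()
    {
        // Assumes a local default instance hosting SSISDB.
        using (var cn = new SqlConnection(@"Data Source=.;Initial Catalog=SSISDB;Integrated Security=SSPI;"))
        using (var cmd = new SqlCommand(
            @"SELECT TOP (50) message_time, package_name, message
              FROM catalog.event_messages
              WHERE event_name = N'OnVariableValueChanged'
              ORDER BY message_time DESC;", cn))
        {
            cn.Open();
            using (var rdr = cmd.ExecuteReader())
            {
                while (rdr.Read())
                {
                    // Most recent variable-value-changed messages captured by Basic logging.
                    Console.WriteLine("{0} {1}: {2}",
                        rdr["message_time"], rdr["package_name"], rdr["message"]);
                }
            }
        }
    }
}
```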

Answer #2: Use a Script Task

You can add a Script Task to your SSIS package for this purpose.

Open the Script Task Editor and click the ellipsis in the ReadOnlyVariables value textbox to open the Select Variables dialog. I like to select the System::PackageName and System::TaskName variables each time I use a Script Task for logging purposes (and I use Script Tasks for logging a lot). Then select the variable(s) for which you wish to log values.

When I design Production Data Integration solutions with SSIS, I log the values of parameters and variables at the start of each execution, and each variable used inside a Loop Container or Foreach Loop Container. Why? I don’t want to guess or dig or rely on defaults; I want the package to tell me what those values were when it executed. I consider this a best practice.

I add C# to the Script Task to perform the variable value logging.
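The exact code varies by package, but the pattern looks something like the following sketch. It runs inside the SSIS-generated ScriptMain class (so the Dts host object and the ScriptResults enum are available), and User::MyVariable is a placeholder – substitute whichever variables you selected as ReadOnlyVariables:

```csharp
public void Main()
{
    // System::PackageName and System::TaskName identify the message source in the log.
    string packageName = Dts.Variables["System::PackageName"].Value.ToString();
    string taskName = Dts.Variables["System::TaskName"].Value.ToString();
    string subComponent = packageName + "." + taskName;

    // User::MyVariable is a placeholder; log each variable selected as ReadOnlyVariables.
    string message = "MyVariable: " + Dts.Variables["User::MyVariable"].Value.ToString();

    // FireInformation publishes a message captured by the SSIS Catalog Basic logging level.
    bool fireAgain = true;
    Dts.Events.FireInformation(1001, subComponent, message, string.Empty, 0, ref fireAgain);

    Dts.TaskResult = (int)ScriptResults.Success;
}
```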

Once deployed and executed, the Information event raised in the Script Task publishes a message consumed by SSIS Catalog Basic logging (the default SSIS Catalog logging level).

I know many SSIS developers are uncomfortable using Script Tasks (and Script Components). While I understand the discomfort, I urge SSIS developers to learn enough .Net (C# or VB) to become comfortable using Script Tasks and Components in SSIS. In my opinion, .Net easily doubles your data integration capabilities with SSIS.

:{>

Learn more:
Expert SSIS Training (Live, Online)
IESSIS1: Immersion Event on Learning SQL Server Integration Services (In-person)

Coming Soon to SSIS Catalog Compare: Values Everywhere

I’ve been testing a new feature in SSIS Catalog Compare‘s catalog browser. I call it “values everywhere.” What do I mean by values everywhere? Consider the reference mapping of the ConnectionString property for the package connection manager in SimplePackage.dtsx: each Reference is listed as a child node of the reference mapping, and the value of each reference-mapped Environment Variable appears as a sub-node of the Reference.

My intent here is to surface related configuration metadata at the location where said metadata is consumed. I think that makes sense, reduces right-clicking, and is more intuitive.

I also surface reference-mapped environment variable values in the Package and Project References nodes.

As before, the environment variable values remain available (and update-able) in the Environments node.

I welcome your feedback. What do you think?

I Need Some Updates

As I type, 2017 is drawing to a close. For many, this time of year is a time of reflection on the past year and planning for the coming year. I recently saw this Windows message in the corner of my screen and realized, “You’re right, Windows. I do need some updates.”

I’m not big on New Year’s resolutions for a number of reasons:

  • I’d rather set goals.
  • I’m not a fan of the timing.

Are Goals Better?

Yes. Much is written about Specific, Measurable, Achievable, Realistic, Time-bound (SMART) goals, and for good reason.

The Timing?

My friend, brother, and co-host of the Data Driven podcast, Frank La Vigne (blog | @Tableteer) follows a family tradition and sets goals for 1 Nov.

I set goals almost weekly. I don’t wait until the beginning of a calendar year. But I don’t exclude this week from my normal goal-setting. I mark another trip around Sol like many but in goal-setting terms, it’s just another week.

Current Goals

I have a stack of goals that I categorize as near-term, mid-term, and far-term.

Near-term goals include business endeavors for DILM Suite and Enterprise Data & Analytics and Biml Academy.

Mid-term goals include blog series and book projects that I am writing and planning.

Far-term goals include things like relocating to Costa Rica.

If you peruse these goals, you’ll notice that they move away from work and toward the personal. That’s intentional. You may also notice there’s very little here about personal goals. Do I have personal goals? Yep, I have personal goals. They’re, well, personal.

You may wonder how I time-slice activities – especially when it comes to work on the far-term stuff. Many nearer-term goals actually feed my long-term goals. In this way, near- and mid-term stuff are milestones towards achieving far-term stuff.

How so?

Consider DILM Suite, which consists of mostly free and some open source products. The free stuff – like Catalog Browser – is helpful and cool, but it also leads to sales of SSIS Catalog Compare. The for-pay products – SSIS Frameworks Commercial and Enterprise Editions and SSIS Catalog Compare – are gaining momentum. We’re selling more licenses and implementations. Either – or both – could be acquired by a larger entity in coming years. An acquisition would play nicely into my plans to relocate to Costa Rica.

In Practice

The topic of your goals isn’t nearly as important as their existence. You should have goals. If you don’t, please consider starting with a single goal.

Here are some tips and tricks I find useful (this is not an exhaustive list!):

Write it down somewhere you can see it regularly – like a sticky note on the bathroom mirror or a legal pad in your work area. Reading it – even casually perusing it – helps remind you of your goal(s).

Re-prioritize regularly. Things change. Having goals will actually drive changes – that’s a normal and healthy part of the cycle. Life happens. You do not exist to serve your goals; your goals exist to serve you.

Check it off when you complete a goal. It’s a good feeling to get something done, to ship, to throw the switch in Production, to hold that book or read your name in the by-line of an article or blog post. Consider the words of the redneck sage, Larry the Cable Guy: “Get-r-done!”

In Conclusion

As 2017 draws to a close, don’t spend too much time looking back. Spend the vast majority of your time looking forward to 2018. That’s what I do. You can’t do a durn thing about 2017 once it’s in the past. Am I advising you forget about 2017? Goodness, no! Learn from 2017. Build on 2017. That’s what I’m saying.

Happy New Year, everyone!

o<:{>