SSIS, Docker, and Windows Containers, Part 1 – Why?

In my first post in this series, SSIS, Docker, and Windows Containers, Part 0 – Getting Started, I covered:

  • Installing Docker
  • Choosing a Container OS
  • Pulling a test image
  • Pulling a Windows Container with SQL Server installed
  • Interacting with your instantiated container

In this post I reveal my ultimate goal and then travel down two rabbit holes to explain:
1. How I got here; and
2. One way to get answers to questions no one wants to hear.

My Ultimate Goal

My ultimate goal is to use one or more Windows containers running SQL Server to test SSIS. I can hear some of you thinking…

“Why, Andy?”

My response? “Cuz.”
And, “I do what I want!”

Seriously, since I hopped the fence from developer to data I’ve dreamed of the day when I could practice lifecycle management with data-stuff like I used to practice lifecycle management with software development.

I recognize the obstacles. The greatest obstacle (in my humble opinion) is that software is mostly stateless these days (these days started with Object-Oriented Programming and include its descendants). Stateless development solves a lot of engineering problems in lifecycle management, and by “solves a lot of engineering problems” I mean some of those problems simply don’t exist, so lifecycle management for stateless stuff can simply ignore them.

A database, on the other hand, is all about that state. When it comes to managing lifecycle for a stateful platform – like a database – ACID gets dumped on many lifecycle management tools and solutions (see what I did there?).

Is it possible to manage a data-related lifecycle using stateless tools? Yes. But here there be obstacles. Let’s look at one use case:

Use Case 0

Your team releases an update that adds one column to one table. The column receives data. Someone realizes there’s a bug and determines a rollback is the solution. The web and middle-tier teams successfully roll back the release using functionality built into the release-management / DevOps enterprise tool.

What happens to the column in the table?

  • Is it dropped?
  • Is the data deleted?
  • Since this isn’t a post about data lifecycle management, let’s not even consider what happens if the data is personally identifiable information or some tuple otherwise subject to privacy regulations.

You see the problem? The answer is always going to be, “It depends.”

This is the problem I am trying to solve, specifically for data integration.

I want to be able to easily deploy and roll back data integration deployments. But to accomplish this, I need more metadata (data about the data) and a bunch more automation. Why? My stuff has state – by definition.

“Not Supported”

If you have not yet figured this out, Microsoft is a software development shop. They’re people trying to use software to solve business problems for customers – just like you and me. Just like you and me, they have schedules and deadlines and competition and a need to generate a profit. In other words, Microsoft is a business.

When some redneck from Farmville, Virginia starts asking all sorts of out-of-scope questions, he sometimes hears, “That’s not supported, Andy,” in response. Ask me how I know.

So it helps to be stubborn… er, persistent. Persistence sometimes takes the form of thinking of another approach. When I began probing for ways to test SSIS in containers, my ultimate goal was to have the tests simulate Production as closely as possible. In 2019, Production SSIS means executing SSIS in the SSIS Catalog.

Is it possible to execute Production SSIS from the file system or the MSDB database in 2019? It is. I’ve not found a good use case for executing SSIS outside the Catalog, though – not in 2019. I am not trying to hurt anyone’s feelings, I promise. I’m just saying that the SSIS Catalog has been out for north of six years at the time of this writing. Plus: The only way to execute Azure-SSIS packages is via an SSIS Catalog in the cloud (again, at the time of this writing).

At Enterprise Data & Analytics we help clients migrate from file- / MSDB-based SSIS execution to the SSIS Catalog (both on-premises and Azure-SSIS). Some clients realize the benefits of SSIS Catalog-based execution and call us for help with training and consulting. Others ask for help performance-tuning SSIS, and we almost always recommend using the SSIS Catalog because SSIS packages almost always execute faster when executed from the Catalog. When do we not recommend using the SSIS Catalog? Some enterprises have designed their data environments in such a way that using the SSIS Catalog would require a major shift. The technical portion of the shift would be relatively straightforward, but – often – the cultural shift is more disruptive. Not all enterprises are willing to make such a cultural shift; to endure the disruption.

Getting Around “Not Supported”

When I started asking questions like, “Is it possible to run the SSIS Catalog in a Windows container?” the response was, “That’s not supported.”

Fine. What is supported? Running the SSIS Catalog on SQL Server installed on Windows Server Core. Many of the problems I was trying to solve in a container were the same problems that would need to be solved to run an SSIS Catalog on Server Core – and it turned out many of those problems had already been solved.

More Information

If you can make it to SQL Saturday #813 – Boston BI Edition on 30 Mar 2019, Liz Baron and Sreeja Pullagura are delivering an information-packed session titled SSIS DevOps with Azure Containers on this very topic. I’ve had the privilege and honor to work with Liz, Sreeja, and the awesome data engineering team at Teach For America – a worthy non-profit.

I predict Sreeja’s and Liz’s presentation is going to be a session that moves the needle for Data Integration Lifecycle Management (DILM).

My next several blog posts on this topic – sharing some of my efforts and what I’ve learned about executing SSIS using containers – will begin appearing 31 Mar 2019.

Shameless plug: I am delivering Intelligent Data Integration with SSIS – a full-day pre-conference training session – Friday, 29 Mar 2019.
Register here!
I hope to see you there!

:{>

Delivering Intelligent Data Integration through SSIS in Dallas 31 May 2019!

I am honored to announce I will be delivering Intelligent Data Integration through SSIS – a day-long SQL Saturday 841 – Dallas pre-conference session 31 May 2019!

Description:

What is Intelligent Data Integration? SSIS (SQL Server Integration Services) packages developed using tried and true design patterns, built to participate in a DevOps enterprise practicing DILM (Data Integration Lifecycle Management), produced using Biml (Business Intelligence Markup Language) and executed using an SSIS Framework.

Attend a day of training focused on intelligent data integration delivered by an experienced SSIS consultant who has also led an enterprise team of several ETL developers during multiple projects that spanned 2.5 years (and delivered).

Attendees will learn:

  • a holistic approach to data integration design.
  • a methodology for enterprise data integration that spans development through operational support.
  • how automation changes everything, including data integration with SSIS.

Topics include:

  1. SSIS Design Patterns
  2. Executing SSIS in the Enterprise
  3. Custom SSIS Execution Frameworks
  4. DevOps and SSIS
  5. Biml, Biml Frameworks, and Tools

Prerequisites: Familiarity with SQL Server Integration Services (SSIS).

Continental Breakfast, Lunch and Afternoon Snacks are provided.

I hope to see you there!

:{>

Enterprise Data & Analytics Welcomes Shannon Lowder!

I am honored and thrilled to welcome Shannon Lowder (@shannonlowder | blog | LinkedIn) to the Enterprise Data & Analytics team!

Shannon is a data engineer, data scientist, BimlHero (though not listed on the page at the time of this writing), and shepherd of the Biml Interrogator open source project. If you use Biml to generate SSIS projects that load flat files, you need Biml Interrogator.

Shannon, Kent Bradshaw, and I are also co-authoring a book tentatively titled Frameworks. (Confession: Kent and Shannon are mostly done… I am slacking…)

Shannon brings a metric ton of experience to serve our awesome clients. He has years of experience in data analytics, serving recently in the role of enterprise Lead Data Scientist. Shannon’s experience spans supply chain management, manufacturing, finance, and insurance.

In addition to his impressive data skills, Shannon is an accomplished .Net developer with enterprise senior developer experience (check out Biml Interrogator for a sample of his coding prowess).

Shannon is a regular speaker at SQL Saturday events, presenting on topics that include Business Intelligence, Biml, and data integration automation. He is a gifted engineer with experience in framework design, data integration patterns, and Azure who possesses a knack for automation. Shannon is an avid “tinkerer” who enjoys learning. He has experience implementing Azure Machine Learning and applying AI to predictive analytics using sources classified as Big Data. Shannon is also a practitioner of data integration DevOps with SSIS. In other words, he fits right in with our team here at Enterprise Data & Analytics!

As Shannon writes on his LinkedIn profile:

I am a data guy with a passion for partnering with clients to solve their database and technology issues. Over my career, I’ve played all the roles: database developer, administrator, business intelligence developer, and architect, and now consultant. I’m the guy you call in when you have the impossible problem and everyone tells you it cannot be solved. I automate solutions in order to free your current staff to do the higher value tasks. I bring solutions outside of traditional relational database solutions, in order to find the shortest path between you and your goals.

As an accomplished Microsoft SQL data professional, recognized BimlHero, and practicing Data Scientist, I’m the resource you need to extract the most value from your data.

I’m humbled and thankful and excited to watch Enterprise Data & Analytics continue to (quietly) grow – adding cool people (another announcement is forthcoming) and service offerings like Data Concierge. It’s very cool to watch!

Welcome Shannon! I’m honored to work with you, my brother and friend.

For more information, please contact Enterprise Data & Analytics!

The 2:00 AM Test

I’m working with a collection of clients these days providing data integration, cloud, and DevOps advice. I love architecture work! My clients love the results. Everyone wins.

Kent Bradshaw and I are tag-teaming with a couple clients who are converting data integration from other platforms to SSIS. In a recent meeting Kent mentioned “the 2:00 AM test.”

“What’s the 2:00 AM Test, Andy?”

I’m glad you asked! The 2:00 AM test is a question developers should ask themselves when designing solutions. The question?

“Will this make sense at 2:00 AM? When I have – or someone else has – been awakened by a job or application failure? Will I be able to look at this and intuit stuff about how this is supposed to work?”

Future You Will Thank You

The 2:00 AM test is important because later – at some point in the future, when you’ve long since stopped thinking about the stuff you were thinking about when you designed this solution – you (or someone) will be awakened at 2:00 AM by some failure. Even if it’s you, there’s a good chance you won’t remember all the nuances of the design. Even if you do, you may not remember why you designed things the way you did.

So…?

Be like Hansel and Gretel. Leave yourself some breadcrumbs. What kind of breadcrumbs?

Comments

Code comments are the best way to record what you are thinking at the time you are thinking it. Consider the lifetime of a software solution: it may take an hour, a day, a week, or a month to build, but it may be in production for years. Are you going to remember why you made each decision? Are you going to be able to recall – years later – why you zigged instead of zagged?

Call out anything not readily accessible, anything invisible on the initial view of the immediate coding surface. If you’re writing VB or C#, include a comment explaining – at a minimum – where to learn more about classes and objects not readily available. If you used NuGet to import stuff, will it kill you to copy and paste the command into a comment?
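
Here is the kind of breadcrumb I mean – a PowerShell sketch (the module is just an example; substitute whatever package or module your project actually pulls in):

# Breadcrumb for Future You: this script depends on the SqlServer module.
# It came from the PowerShell Gallery; to restore it on a fresh machine, run:
#   Install-Module -Name SqlServer -Scope CurrentUser
# If this import fails at 2:00 AM, that one command is the fix.
Import-Module SqlServer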

In SSIS, if you’re using variables or parameters or event handlers, add an annotation to the control flow or data flow surface.

Leave yourself a note.

Practice Good Naming

My friend Joel has a t-shirt based on this tweet:

One of the two hard things is naming things. I suggest descriptive names for stuff. Which stuff? In VB or C#: variables, methods, and classes, for starters. For SSIS I heartily recommend Jamie Thomson’s iconic post titled Suggested Best Practises and naming conventions.

Good naming conventions promote self-documenting code.
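
A quick PowerShell illustration (the names are made up) of what a descriptive name buys you:

$d = (Get-Date).AddDays(-1)                        # cryptic: what is $d, and why subtract a day?
$extractWindowStartDate = (Get-Date).AddDays(-1)   # descriptive: the name answers both questions at 2:00 AM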

Small Chunks

Coders on all platforms develop a sense of a “good size.” An easy trap to fall into is attempting to “boil the ocean.” Break things up. Practice separation of concerns. Write decoupled code.

You will find decoupled, small, function-of-work code is simpler to test, easier to maintain, and promotes code reuse. 
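
As a sketch (the function and folder names are hypothetical), compare stuffing everything into one script with a couple of small, single-purpose functions you can test and reuse on their own:

# Each function does one unit of work and can be tested by itself.
function Get-StagedFiles {
    param([string] $StagePath)
    Get-ChildItem -Path $StagePath -Filter '*.csv'
}

function Import-StagedFile {
    param([System.IO.FileInfo] $File)
    Write-Host "Loading $($File.Name)..."   # stand-in for the real load logic
}

Get-StagedFiles -StagePath 'C:\Stage' | ForEach-Object { Import-StagedFile -File $_ }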

Conclusion

Future you will thank you for incorporating these best practices in your coding, regardless of your software development platform.

How May I Serve You?

2018 is drawing to a close.

Some readers recently shared requests for more blog posts on certain topics. I thought I’d put the question to my audience (both of you, hi Mom!) to gather your thoughts. Feel free to leave a comment or reach out via email.

Here are some topics I plan to blog about in 2019, Lord willing:

  • SSIS (of course)
  • Azure Data Factory
  • Azure DevOps
  • Data Integration Testing
  • Data Integration Lifecycle Management (DILM)
  • Containers

What do you think? Good list? 

If you were the boss of me, which topics would you like to see – or see more of? Any additions to this list?

Merry Christmas!
*<:{>

SSIS, Docker, and Windows Containers, Part 0 – Getting Started

To begin tinkering with SSIS in containers, you first need to install Docker. There are some prerequisites, and I will not list them all here. I strongly suggest you familiarize yourself with the requirements for Docker prior to attempting to install the software.

Since I use a PC (a Lenovo P51S) running Windows 10 Pro, I chose to use Docker for Windows.

I Needed a Hypervisor

I chose to run Docker for Windows with Hyper-V:

One reason I upgraded to Windows 10 was to work with containers. I read posts and articles that stated I could use Docker for Windows with VirtualBox, and I had been using VirtualBox for a long time. When I started using VirtualBox, it was the only hypervisor that:

  1. Was free; and
  2. Supported 64-bit guests.

I decided to switch to Hyper-V, though, and rebuilt my demo and development virtual machines in Hyper-V.

IWOMM; YMMV (It works on my machine; your mileage may vary…)  😉

Choose a Container OS Platform

Once Docker is installed, you need to decide whether to work with Linux or Windows containers, but you can switch anytime:

One nice thing about switching is it’s fairly painless, as one may glean from the message that displays when I click “Switch to Linux containers…”:

Docker for Windows with Windows Containers

The cool kids are using Linux containers, especially the cool SQL Server kids. I’ve tinkered a little with SQL Server on Linux. I’m interested in SSIS, though, so I’ve been focusing on running Windows containers in Docker for Windows.

Getting Started

Containers are like lightweight virtual machines. They typically consume less disk space. Containers spin up relatively quickly. A developer can configure a container, persist it in some state, and then turn it off. It’s faster and lighter than working with a VM. There are other benefits that we will get to later in this series.

After installing Docker, your next step is pulling an image.

Pulling an Image

I can hear you thinking, “Umm, Andy… what’s an image?” I’m glad you asked. From the Docker Glossary:

An Image is an ordered collection of root filesystem changes and the corresponding execution parameters for use within a container runtime. An image typically contains a union of layered filesystems stacked on top of each other. An image does not have state and it never changes.

I can now hear you thinking, “Umm… what?” I think of an image as a pre-configured container. What’s configured in the pre-configuration? Well, the operating system and other software I may want to use.

For example, the hello-world image is relatively small and will test your installation of docker. In the animated gif below, I show how I:

  • Search for images named “hello-world”
  • Locate the name of an image labeled “Official” and named “hello-world”
  • Pull the “hello-world” image (sped up – my internet is not that fast out here in Farmville…)
  • Re-pull the “hello-world” image to show how things look when the image is up to date.
  • Run the image

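If you would rather type along than watch, the commands in the gif boil down to something like this:

docker search hello-world   # find images named hello-world; look for the one marked "Official"
docker pull hello-world     # pull the image (run it again and Docker reports the image is up to date)
docker run hello-world      # instantiate the image; you should see a "Hello from Docker!" greeting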

As stated earlier, the hello-world image is a test.

Searching, Pulling, and Starting a Windows Container with SQL Server Installed

To search for, pull, and start a Windows container with SQL Server installed and configured, execute the following Docker commands:

  1. docker search mssql
  2. docker pull microsoft/mssql-server-windows-developer
  3. docker run -d -p 1433:1433 -e sa_password=$up3r$3cr3t -e ACCEPT_EULA=Y --name mySqlServerContainer microsoft/mssql-server-windows-developer

If all goes well, you will see the container id – a hexadecimal number such as:

2d06a9a756b7b75bb0e388173bdd6925ba712e8843848610b5f5276a69f4bf19
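
You can confirm the container is running (and see a shortened form of that id) with:

docker ps   # lists running containers; add -a to include stopped ones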

Command Line Switches

#3 above has a lot of switches on that command line. Here’s what they mean:

  • -d == detach, which tells docker to run the container in the background and print the container id
  • -p == publish list, which publishes a container’s port(s) to the host
    • I use -p to map the container’s port 1433 (SQL Server uses port 1433) to my laptop’s port 1433
  • -e == env list, which sets a container’s environment variable(s) to a value (or values).
    • I use -e to set the sa_password and ACCEPT_EULA environment variables
  • --name == assigns a name to the container
  • The final argument is the name of the image
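
Because the container's port 1433 is published to the host's port 1433, you can test connectivity from the host once SQL Server finishes starting. A sketch, assuming sqlcmd is installed on the host (the single quotes keep PowerShell from treating the $-prefixed pieces of the password as variables):

sqlcmd -S "localhost,1433" -U sa -P '$up3r$3cr3t' -Q "SELECT @@SERVERNAME AS ContainerSqlServer;"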

By the way, you can get help on any Docker command by typing docker <command> --help. To complete the list above, I typed:

docker run --help

Interacting with Your Newly-Started Container

If you saw that long hex number, your SQL Server Windows container started. It’s a brand new container – it still has that new container smell.

There are a few ways to interact with this container. Let’s look at one: PowerShell.

Connect to PowerShell in mySqlServerContainer using the following command:

docker exec -it mySqlServerContainer powershell

Once connected, you can use PowerShell to execute all sorts of commands, such as:

dir (or ls):

Cool? I think so too.
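
You are not limited to file system commands, either. From the container's PowerShell prompt you can query the instance directly – a sketch, assuming sqlcmd is on the path inside the image and the container user can connect with Windows authentication (add -U sa and the password if not):

sqlcmd -Q "SELECT @@SERVERNAME AS ServerName;"

Or skip the interactive session and run the same query straight from the host:

docker exec mySqlServerContainer sqlcmd -Q "SELECT @@SERVERNAME AS ServerName;"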

Conclusion

I’m going to stop here because this post is long enough and this is enough to get you started using SQL Server in a Windows container. That was my goal today.

In my next post – SSIS, Docker, and Windows Containers, Part 1 – Why? – I explain why this is important (to me, at least).

Enjoy!

Website Ch-ch-changes…

Inspired by Frank La Vigne’s (blog | @Tableteer) snappy update to the Data Driven home page, I decided to spruce up a couple / three of my websites including this one, andyleonard.blog.

I added the sidebar shown on the right of the site. It contains a more-readily-available search box and links to other websites I maintain – or, in the case of Data Driven, co-maintain with Frank. At Data Driven, Frank and I share interviews with people doing machine learning, artificial intelligence, and data science – and working in related fields. At the time of this writing, we’ve been live for just over six months and have almost 100 shows posted. We’re honored to have over 30k downloads!

Biml Academy is an online training site that contains free and paid Biml training, like the Basic Biml Training course. Read this post, in which I explain why Basic Biml Training is a better deal if you purchase it now instead of waiting (spoiler: the price will increase as new content is added, but if you purchase now you will get access to that new content at no additional charge).

DILM Suite hosts Data Integration Lifecycle Management (DILM) tools and utilities. Most of the tools and utilities are free and many are open source. We charge for a few, like SSIS Catalog Compare and SSIS Framework Commercial and Enterprise Editions. But there’s the free and open source SSIS Framework Community Edition, and the free Catalog Browser, and a free Basic SSIS Catalog Dashboard in Power BI, and more! I updated the logo (seen here) and made a few cosmetic changes.

Enterprise Data & Analytics, which we lovingly refer to as “EDNA” (not the character in The Incredibles… but no capes!), is a consulting firm specializing in data integration, data science, training, and independent verification and validation. We deliver data. I spiffed up the content at the EDNA site, adding verbiage and more images and cleaning up a few visualizations.

Believe it or not, that’s not all. I have a couple other business endeavors that I enjoy. As a serial entrepreneur, I’m always thinking about the next product or service – and sometimes – the next website…

One Way to Connect PSDrive to a SQL Server Instance

“Everyone Stand Back! I’m Learning PowerShell!”

I keep telling folks, “Learn new stuff” and “then blog about it!” I figured it’s time I take my own advice.

I’m reading Learn Windows PowerShell in a Month of Lunches by Don Jones (@concentrateddon). I’m actually reading the first edition. I bought the book a while back and I’m just now reading it. Why? I got shamed on Twitter for using Azure CLI. It’s for the best, I promise. I’ve been needing to learn PowerShell. And Hadoop. And Machine Learning (I’ve been helping my Data Driven co-host, Frank La Vigne [@Tableteer] some by tech-editing his Artificially Intelligent column in MSDN Magazine).

Connecting to a SQL Server Instance

In the first lesson / lunch, Don shares how to use PSDrives. PSDrives allow you to do some very cool things in PowerShell – like connect to the registry and navigate it as you would the file system.  In this section, Don mentions you can connect to SQL Server and navigate the contents of an instance in a similar fashion. I had to figure this out!
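
For instance, the Registry provider ships with PowerShell and pre-maps an HKLM: drive, so you can "cd" around the registry as if it were a folder tree:

Set-Location HKLM:\SOFTWARE\Microsoft      # "change directory" into a registry key
Get-ChildItem | Select-Object -First 5     # list a few subkeys as if they were folders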

But when I tried – being a (literal) Day One n00b – I got errors running commands I found online. For example, running:

$cred=Get-Credential
New-PSDrive -Credential $cred -Name dcBase -PSProvider SqlServer -Root 'SQLSERVER:\SQL\vmDC\BASE\'

resulted in an error:

New-PSDrive : Cannot find a provider with the name 'SqlServer'.

I searched for an answer and had a difficult time locating what I was doing wrong. One key to learning is making the problem give up before you do. After an inordinate amount of time, I discovered I’d not loaded the sqlps module.

Now, if you’ve been using PowerShell for a while, you already have your environment configured to automagically load sqlps. If you’re a n00b, though, you may not. And I am a n00b.

Also, for people I’ve encouraged to start blogging: Here’s a topic. Right here. I picked up a book on a topic I know nothing about, started reading, tried something on a tangent, made the problem give up before I did, and BAM! Instant topic.

My Fix

Even though I’m no Mike Fal (@Mike_Fal), Chrissy LeMaire (@cl), Rob Sewell (@sqldbawithbeard), or Aaron Nelson (@sqlvariant), I ran this one down and figured it out.

Import-Module sqlps
$cred=Get-Credential
New-PSDrive -Credential $cred -Name dcBase -PSProvider SqlServer -Root 'SQLSERVER:\SQL\vmDC\BASE\'

I start the script I ran before with "Import-Module sqlps". When I run this script, I’m prompted for credentials (as before – this is what the "$cred=Get-Credential" line does – it creates a variable to hold my credentials):

Once I enter my credentials and click OK, it works!

So I saved it. I’m using the PowerShell ISE, so I opened the Scripts window by pressing Ctrl+R, copied my script and saved it:

Now I won’t ever have to figure that out again.

One Other Thing

In experimenting, I also learned that when connecting to local instances – that is, instances of SQL Server on the same machine where I’m running PowerShell – supplying credentials is unnecessary, at least when I am an administrator on the machine and a sysadmin on the SQL Server instance.
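
In other words, for a local default instance something like this works without the Get-Credential step (a sketch – use the machine and instance names that match your setup):

Import-Module sqlps
New-PSDrive -Name localBase -PSProvider SqlServer -Root 'SQLSERVER:\SQL\localhost\DEFAULT\'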

Browsing my SQL Server Instance

I next attempted to connect to my newly-mapped PSDrive and experienced another n00b moment. My command?

cd dcBase

The error?

cd : Cannot find path 'C:\dcBase' because it does not exist.

What the heck? I’m not trying to change the file system directory! Oh wait, yes I am. I need to type:

cd dcBase:

Once I do that, I can list available artifacts:

Now I can navigate! I can change the “directory” to Databases and execute an “ls” command to list the contents of the Databases “folder”:
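
In plain text, that navigation boils down to this (dcBase is the drive mapped earlier):

cd dcBase:\Databases   # treat the Databases node as a folder
ls                     # list the databases on the mapped instance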

Conclusion

This really isn’t the conclusion – I’m just getting started!

I hope you take away one or two things from this post:

  1. You do not have to be an expert in technology to blog about it. I am definitely not a PowerShell expert, and yet here I am, blogging about it.
  2. You see one way to create a PSDrive for a SQL Server instance and connect to it, along with a couple ways to not do those things.

I hope this helps…