Free Webinar – Using Biml as an SSIS Design Patterns Engine

The next free webinar from Enterprise Data & Analytics is Using Biml as an SSIS Design Patterns Engine, Thursday 2 May 2019 at 12:00 PM EDT.

Abstract

Perhaps you’ve heard the buzz about Business Intelligence Markup Language (Biml) and wondered, “Why is everyone so excited about Biml?” Occasionally, a new technology emerges that changes everything. For SSIS developers, Biml is one such technology. Biml is a powerful solution for rapidly creating SSIS packages and solutions. SSIS Design Patterns support various load profiles. Combined, Biml and SSIS Design Patterns offer a compelling solution for automating enterprise data integration.

In this session, you will learn:

  • How to rapidly build multiple SSIS packages using Biml
  • How to apply different SSIS Design Patterns using Biml
  • How to automate SSIS Design Patterns using Biml

I hope to see you there.
Register today!

HowTo: Install SSIS Framework Community Edition

SSIS Framework Community Edition is available from DILM Suite (Data Integration Lifecycle Management Suite). SSIS Framework CE is not only free, it is open source.

To install SSIS Framework Community Edition, visit DILM Suite and click SSIS Framework Community Edition:

When the SSIS Framework Community Edition page opens, click the first link:

This link will take you to GitHub where you may download the code:

Unzip the file and open the PDF titled SSIS Framework Community Edition Documentation and User Guide:

The documentation includes instructions for creating an SSIS Catalog on-premises – or an instance of Azure-SSIS in Azure Data Factory:

The Documentation and User Guide next walks you through the process of upgrading the SSIS packages to the latest version of SSIS, followed by instructions for deploying the Framework project:
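If you prefer a scripted approach, deploying an .ispac with T-SQL looks something like the sketch below – the folder name, project name, and file path are placeholders, so use the values from the Documentation and User Guide:

```sql
/* A minimal sketch of deploying an .ispac to the SSIS Catalog with T-SQL.
   Folder name, project name, and file path are placeholders. */
DECLARE @folder_id bigint
      , @operation_id bigint
      , @project_binary varbinary(max);

-- Create the Catalog folder if it does not already exist
IF NOT EXISTS (SELECT 1 FROM SSISDB.[catalog].[folders] WHERE [name] = N'Framework')
  EXEC SSISDB.[catalog].[create_folder]
         @folder_name = N'Framework'
       , @folder_id = @folder_id OUTPUT;

-- Load the .ispac file into a varbinary and deploy it
SELECT @project_binary = BulkColumn
FROM   OPENROWSET(BULK N'C:\Temp\SSISFrameworkCE.ispac', SINGLE_BLOB) AS ispac;

EXEC SSISDB.[catalog].[deploy_project]
       @folder_name = N'Framework'
     , @project_name = N'SSIS Framework Community Edition'
     , @project_stream = @project_binary
     , @operation_id = @operation_id OUTPUT;
```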

The next step is to execute T-SQL scripts that configure the project you just deployed and also create the SSIS Framework objects:
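The scripts themselves ship with the download. To give a feel for the kind of T-SQL involved, configuring a parameter on the deployed project looks something like this – the parameter name and value here are purely hypothetical:

```sql
/* Illustration only - the actual configuration scripts ship with the download.
   This shows the kind of call they make: setting a project-level parameter
   on the deployed Framework project. Parameter name and value are hypothetical. */
EXEC SSISDB.[catalog].[set_object_parameter_value]
       @object_type = 20          -- 20 = project-level parameter
     , @folder_name = N'Framework'
     , @project_name = N'SSIS Framework Community Edition'
     , @parameter_name = N'FrameworkConnectionString'
     , @parameter_value = N'Data Source=.;Initial Catalog=SSISConfig;Integrated Security=SSPI;'
     , @value_type = 'V';         -- 'V' = literal value
```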

The remainder of the document walks you through testing and customizing the SSIS Framework for your own needs. A customization primer is included, in which I demonstrate how to extend Framework functionality.

Learn More

Click here to learn about enterprise SSIS Framework editions.
Click here to learn about SSIS Framework Browser, which is free.

Azure DevOps, Software Development, and Git Part 1 – Applied

I started this series with the title “Azure DevOps, SSIS, and Git” in a post titled Azure DevOps, SSIS, and Git Part 0 – Getting Started. I am renaming it because “I do what I want!” Seriously, I renamed the series – changing “SSIS” to “Software Development” – because I’m going to write about more than just developing SSIS. And because…

SSIS is software development.

– Andy, circa 2006

Regarding the DevOps Mindset

In my humble opinion, DevOps is first a mindset. DevOps is a way of thinking that leaks into doing. DevOps is more cultural than technical.

Technology shifts are easy.
Cultural shifts are hard.

Applying DevOps

How do I begin applying DevOps, then? It starts with starting. Feel free to sing along at home. Peppered throughout this post you will find transparent notes from me describing my current understanding of how Azure DevOps, Git, and Visual Studio interact. I confess that – at the time of this writing – I continue to learn. If you read my post titled “You Do Not Know What You Are Doing,” this should come as no surprise.

0. Create a Repo

I begin a new C# project. But then again, no. I start by browsing to dev.azure.com and creating a new project because when one uses Git in Azure DevOps, this is how one begins:

After signing in I create a new project by clicking the Create Project button. I am making this project public so you, Dear Reader, will be able to grab the code. I name this project Database Metadata Loader and provide an optional description. I select Public Visibility. I choose to use Git for Version Control and Scrum for the Work Item Process:

The Database Metadata Loader project is created:

Once the Azure DevOps repository exists, it’s time to…

1. Connect Visual Studio to the Project

Open Visual Studio and connect to the new Azure DevOps project. This is so new to me that I return to my earlier post titled Azure DevOps, SSIS, and Git Part 0 – Getting Started – specifically the section titled “Connect to the Azure DevOps Project” – to remind myself how to do it.

I connect to the new project:

Learning from the last time around, I follow Visual Studio’s lead and click the Clone button. After the clone operation succeeds, Team Explorer presents the next step in the process, creating a new project “in this repository”:

I find this step challenging after years of opening Visual Studio and immediately creating a new project. This feels… disjointed. I struggle through my CDO-driven (CDO is OCD with the letters in their proper sequence) discomfort because this is the way DevOps is done.

2. Create a New Project

Click the “Create a new project or solution” link in Team Explorer to open the Visual Studio New Project window:

When I created this project I accepted most of the defaults, changing only the Project Name (to DatabaseMetadataLoader).

As before (in the section titled “Connect to the Azure DevOps Project” in my post titled Azure DevOps, SSIS, and Git Part 0 – Getting Started), Solution Explorer indicates the project and artifacts have been added to Source Control via decoration with the green “+” sign:

3. Commit and Push

The next step is to commit and push from the Team Explorer Changes page. I enter a check-in comment, which Git requires:

Commit adds the changes to the local repository. Push sends the committed changes to the remote repository:

I confess I do not yet have my mind around the differences between Commit and Push and Sync. I perform them in this order: Commit, Push (or Commit and Push) followed by…

4. Sync

Syncing is performed in the Team Explorer Synchronization window. Below are before and after screenshots of the Synchronization window. In the screenshot on the left, I’ve executed a Push (after a Commit). In the screenshot on the right, I’ve executed a Sync.

If I examine the Repos “folder” in Azure DevOps, my project files appear beneath Files:

I am still learning. I do not understand which Git actions correspond with which results when using Azure DevOps connected to a Visual Studio project. I will continue to learn. I believe the best way to learn is to invest time and effort working – even playing – with the technology one wishes to learn. Some of the best tools were once toys.

Conclusion

The code for Database Metadata Loader is publicly available. Depending on when you read this post, it may be in a different state of development.

Staging Data Temporarily in an SSIS Data Flow

My friend Slava Murygin (@SlavaSQL) recently asked a question on Twitter:

Populated query results into an object variable, successfully used it in a data flow task, but can’t use it a second time. Is there an easy way around?

Tim Mitchell (Blog | @Tim_Mitchell | Tim’s post: Temp Tables in SSIS) and I engaged. You can read the thread here. Spoiler: Tim and I agree that staging data temporarily in a work table is a good solution.

“Good Solution?”

Like all SSIS solutions (and software design solutions, and life solutions), staging data temporarily in a work table is not the only solution. Why do Tim and I agree on work tables? My best answer is, it reduces the total cost of ownership.

What are the Other Solutions?

There are several alternative solutions. You could stage data temporarily in a Recordset Destination. There’s a way to make SSIS work with tempDB. You can stage to a Raw File. You can use an SSIS Cache (though I believe this remains an Enterprise-only feature). There are yet other solutions.

“Why do You Advocate for Work Tables, Andy?”

I’m glad you asked. Work tables are:

  • Simple
  • Fast
  • Understood by almost every SSIS developer, analyst, and DBA

Simple

A work table is a table defined in a nearby data location; either a schema in the source or target database or in a database on the same instance. I take a constraint-driven approach to work table location selection. Closer – a schema in the same database – is often better for performance.
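For illustration, a work table in a dedicated schema of the target database might look something like this (the schema, table, and column names are examples only):

```sql
/* A minimal sketch of a work table in a dedicated schema of the target
   database. Schema, table, and column names are illustrative. */
IF SCHEMA_ID(N'work') IS NULL
  EXEC (N'CREATE SCHEMA work AUTHORIZATION dbo;');
GO

IF OBJECT_ID(N'work.CustomerStage', N'U') IS NULL
  CREATE TABLE work.CustomerStage
  (
      CustomerID   int           NOT NULL
    , CustomerName nvarchar(100) NULL
    , LoadDateTime datetime      NOT NULL
        CONSTRAINT DF_CustomerStage_LoadDateTime DEFAULT (GETDATE())
  );
GO
```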

I write this knowing some folks will frown at the suggestion of polluting a data source or target database with additional schemas and tables. Best practices exist for a reason. It’s helpful to maintain a list of best practices and to include in this list the reasons each practice exists. This could be a case where violating one or more best practices is justified.

In some cases – like when interacting with databases for third-party solutions – adding schemas and tables is a bad idea (or violation of an EULA). In those cases, stand up a work database on the same instance and place the work table there, unless…

Some data integration design patterns require joining the work table to a source or target table, and some relational database engines do not support three-part naming in SQL queries. My suggestion in those cases is to be creative.
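To make the location trade-off concrete, here is the kind of join a load pattern might perform against a work table – first with the work table in a schema of the target database, then with it in a separate work database referenced by a three-part name (table and column names are illustrative):

```sql
/* A sketch of the join that drives the work table location decision.
   Table and column names are illustrative. */

-- Work table in a schema of the target database (two-part names):
UPDATE tgt
SET    tgt.CustomerName = stg.CustomerName
FROM   dbo.Customer AS tgt
JOIN   work.CustomerStage AS stg
  ON   stg.CustomerID = tgt.CustomerID;

-- Work table in a separate work database on the same instance
-- (three-part names - supported by SQL Server, but not by every engine):
UPDATE tgt
SET    tgt.CustomerName = stg.CustomerName
FROM   dbo.Customer AS tgt
JOIN   WorkDB.work.CustomerStage AS stg
  ON   stg.CustomerID = tgt.CustomerID;
```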

In my humble opinion, “we’ve always / never done it that way” is a warm and open invitation to explore why it’s always / never been done that way.

Fast

A work table should be used by the data integration process during data integration execution. It should only be queried occasionally, and only by development or support personnel. I refer to this state as owned, and say things like, “WorkTable1 is owned by the data integration process.” Note: ownership has security implications, even in Production.

Since the data integration process owns the work table, developers should be able to use an OLE DB Destination configured for fast load (if supported by the provider) to populate a work table. This will make staging temporary data very fast. The data integration process should be able to truncate and manipulate data in a work table based on the requirements of the load pattern.

Easily Understood

Data loaded to a work table may be persisted between package executions. If something unfortunate happens, development and operations personnel may query the table to see data that was persisted – and the state in which it was persisted – during the previous execution.

Data in a work table is accessible using SQL. Not everyone understands SSIS. Almost everyone working around data understands SQL syntax.
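Troubleshooting a failed load can therefore be as simple as a couple of ad hoc queries (again, illustrative names):

```sql
/* Plain SQL is all operations personnel need to inspect the state persisted
   by the previous execution. Names are illustrative. */
SELECT TOP (100)
       CustomerID
     , CustomerName
     , LoadDateTime
FROM   work.CustomerStage
ORDER BY LoadDateTime DESC;

-- Or simply confirm how many rows the last load staged:
SELECT COUNT(*) AS StagedRows
FROM   work.CustomerStage;
```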

New SSIS Catalog Browser Azure Properties

Thanks to some help from my friend, Sandy Winarko, SSIS Catalog Browser version 0.8.9.0 now includes Azure-SSIS Catalog Properties for Azure Worker Agents and Azure Status. When Azure-SSIS is stopped or starting, SSIS Catalog Browser reports an Azure Status of “Not Running”:

Once even one Azure-SSIS worker agent starts, SSIS Catalog Browser reports an Azure Status of “Running” and surfaces the number of Azure Worker Agents currently running:

Once all Azure-SSIS worker agents start, SSIS Catalog Browser surfaces the number of Azure Worker Agents currently running:

SSIS Catalog Browser is one of the free utilities available for download from DILM Suite (Data Integration Lifecycle Management Suite).

The Recording for Loading Medical Data with SSIS is Available

Kent Bradshaw and I had a blast delivering Loading Medical Data with SSIS earlier today! If you missed the webinar, the recording is available – and, perhaps more importantly, it includes coupon codes to save on upcoming Enterprise Data & Analytics training.

Enjoy the video!

We demonstrated a handful of (free!) DILM Suite (Data Integration Lifecycle Management) utilities:

Join us next week for another free webinar: Enterprise SSIS Execution!

Learn Power BI From Eugene Meidinger 27 Jun!

I’m excited to announce the availability of a brand new Enterprise Data & Analytics course titled Power BI: Implementing the Other 90% by Eugene Meidinger (@SQLGene | SqlGene.com)!

Abstract

One of the things Eugene Meidinger found frustrating when he first learned Power BI was all of the behind-the-scenes configuration required to bring Power BI to the enterprise. It was easy to find information about charts and graphs, but difficult to learn about how all the moving parts fit together.

This course focuses on two main areas: data wrangling and administration.

Session 1: Database Theory

When creating a model, it is important to know Power BI is optimized for star schema in particular and filtering/aggregating in general. Within Power BI lies a columnar database that has really good compression. This means Power BI’s model can handle a certain amount of flattening/denormalizing gracefully.

Business users are Power BI’s target demographic. We begin our exploration of Power BI with a review of database fundamentals for business users, covering topics such as:

  • Primary keys
  • Normalization
  • Star schema

Power BI offers two data manipulation languages:

  • Power Query (M)
  • DAX
  • (3 languages if you include R!)

Session 2: Power Query (M)

Power Query is designed for business users. It started as an Excel add-in to assist users familiar with the Excel formula bar. One result: Power Query surfaces an intuitive graphical user interface (GUI), but behind the scenes it generates M syntax.

In this session, Eugene discusses and demonstrates tips and tricks for using Power Query to clean and prepare data.

Session 3: DAX

To model and add context to data, Power BI users apply DAX. DAX appears deceptively simple – very similar to Excel formulas – but DAX requires thinking in terms of columns and filters, not in terms of rows.

In this session, Eugene scales some of the steeper slopes of the DAX language learning curve.

Session 4: Data Gateways

Many enterprises practice hybrid data management in which some data and services reside in the cloud while other data and services reside on-premises. Data gateways are a way to bridge the Power BI service (in the cloud) with on-premises data.

In this session, Eugene discusses and demonstrates Data Gateway installation and configuration. Topics include:

  • Scheduling data refreshes
  • Alternative query methods
  • DirectQuery

Eugene authored a Pluralsight course related to this session: Leveraging Timely On-premises Data with Power BI.

Session 5: Licensing and Deployment

In this session, Eugene compares and contrasts Power BI licensing scenarios, including:

  • Power BI Pro
  • Power BI Report Server
  • Power BI Premium

Because it can be difficult to keep up with all of the options, Eugene discusses and demonstrates several ways to deploy Power BI dashboards:

  1. Personal workspaces
  2. App Workspaces
  3. Organizational content packs
  4. Publish to web
  5. SharePoint
  6. Power BI Premium
  7. Power BI Embedded
  8. Power BI Report Server

Session 6: Security and Auditing

Securing Power BI and the data it surfaces is no longer optional.

In this session, Eugene discusses and demonstrates:

  • Data access management
  • Report access
  • Sharing configuration
  • Row-level security in Power BI and SSAS
  • Auditing via:
    • Unified Audit log for Office 365
    • PowerShell
    • Data Gateway configuration

Conclusion

In conclusion, Eugene says, “Overall I’m pretty proud of the contents. This is the kind of course I wish I had been able to attend 3 years ago.”

About Eugene

Starting out as an accidental DBA and developer, Eugene Meidinger now focuses primarily on BI consulting. He is a Pluralsight course author with seven years of SQL Server experience. Eugene holds SQL Server certifications and regularly presents at community events including SQL Saturdays and his local user group.
His current focus is Power BI and related areas.

Register today!

Free Webinar – Enterprise SSIS Execution

Join Kent Bradshaw and me as we present (another) free Enterprise Data & Analytics webinar Tuesday, 23 Apr 2019 at 12:00 PM EDT: Enterprise SSIS Execution.

Abstract

SQL Server Integration Services (SSIS) is a powerful enterprise data integration tool that ships free with Microsoft SQL Server. Join Andy Leonard – Microsoft Data platform MVP, author, blogger, and Chief Data Engineer at Enterprise Data & Analytics – and Kent Bradshaw – Database Administrator, Developer, and Data Scientist at Enterprise Data & Analytics – as they demonstrate several ways to execute enterprise SSIS.

Join this webinar and learn how to execute SSIS from:

  • SSDT (SQL Server Data Tools)
  • the Command Prompt
  • the SSIS Catalog
  • a metadata-driven SSIS Framework

Register today!

:{>

Regarding SSIS Frameworks, Part 1 – Execution

Why Use a Framework?

The first answer is, “Maybe you don’t need or want to use a framework.” If your enterprise data integration consists of a small number of SSIS packages, a framework could be an extra layer of metadata management hassle that, frankly, you can live without. We will unpack this in a minute…

“You Don’t Need a Framework”

I know some really smart SSIS developers and consultants for whom this is the answer.
I know some less-experienced SSIS developers and consultants for whom this is the answer.
Some smart and less-experienced SSIS developers and consultants may change their minds once they gain at-scale experience and encounter some of the problems a framework solves.
Some will not.

If that last paragraph rubbed you the wrong way, I ask you to read the next one before closing this post:

One thing to consider: If you work with other data integration platforms – such as DataStage or Informatica – you will note these platforms include framework functionality built-in. Did the developers of these platforms include a bunch of unnecessary overhead in their products? No. They built in framework functionality because framework functionality is a solution for common data integration issues encountered at enterprise scale.

If your data integration consultant tells you that you do not need a framework, one of two things is true:
1. They are correct, you do not need a framework; or
2. They have not yet encountered the problems a framework solves, issues that only arise when one architects a data integration solution at scale.

– Andy, circa 2018

Data Integration Framework: Defined

A data integration framework manages three things:

  1. Execution
  2. Configuration
  3. Logging

This post focuses on…

Execution

If you read the paragraph above and thought, “I don’t need a framework for SSIS. I have a small number of SSIS packages in my enterprise,” I promised we would unpack that thought. You may have a small number of packages because you built one or more monolith(s). A monolith is one large package containing all the logic required to perform a data integration operation – such as staging from sources.

The monolith shown above is from a (free!) webinar Kent Bradshaw and I deliver 17 Apr 2019. It’s called Loading Medical Data with SSIS. We refactor this monolith into four smaller packages – one for each Sequence Container – and add a (Batch) Controller package to execute them in order. I can hear some of you thinking…

“Why Refactor, Andy?”

I’m glad you asked! Despite the fact that its name contains the name of a popular relational database engine (SQL Server), SQL Server Integration Services is a software development platform. If you search for software development best practices, you will find something called Separation of Concerns near the top of everyone’s list.

One component of separation of concerns is decoupling chunks of code into smaller modules of encapsulated functionality. Applied to SSIS, this means Monoliths must die:

A slide from the Expert SSIS training…

If your SSIS package has a dozen Data Flow Tasks and one fails, you have to dig through the logs – a little, not a lot; but it’s at 2:00 AM – to figure out what failed and why. You can cut down the “what failed” part by building SSIS packages that contain a single Data Flow Task per package.

If you took that advice, you are now the proud owner of a bunch of SSIS packages. How do you manage execution?

Solutions

There are a number of solutions. You could:

  1. Daisy-chain package execution by using an Execute Package Task at the end of each SSIS package Control Flow that starts the next SSIS package.
  2. Create a Batch Controller SSIS package that uses Execute Package Tasks to execute each package in the desired order and degree of parallelism.
  3. Delegate execution management to a scheduling solution (SQL Agent, etc.).
  4. Use an SSIS Framework.
  5. Some combination of the above.
  6. None of the above (there are other options…).

Daisy-Chain

Daisy-chaining package execution has some benefits:

  • Easy to interject a new SSIS package into the workflow: simply add the new package and update the preceding package’s Execute Package Task.

Daisy-chaining package execution has some drawbacks:

  • Adding a new package to a daisy-chained solution almost always requires deployment of two SSIS packages – the package that precedes the new package (with its Execute Package Task reconfigured) along with the new SSIS package itself. The exception is a new first package. A new last package would also require the “old last package” be updated.

Batch Controller

Using a Batch Controller package has some benefits:

  • Relatively easy to interject a new SSIS package into the workflow. As with daisy-chain, add the new package and modify the Controller package by adding a new Execute Package Task to kick off the new package when desired.

Batch-controlling package execution has some drawbacks:

  • Adding a new package to a batch-controlled solution always requires deployment of two SSIS packages – the new SSIS package and the updated Controller SSIS package.

Scheduler Delegation

Depending on the scheduling utility in use, adding a package to the workflow can be really simple or horribly complex. I’ve seen both and I’ve also seen methods of automation that mitigate horribly-complex schedulers.

Use a Framework

I like metadata-driven SSIS frameworks because they’re metadata-driven. Why’s metadata-driven so important to me? To the production DBA or Operations people monitoring the systems in the middle of the night, SSIS package execution is just another batch process using server resources. Some DBAs and operations people comprehend SSIS really well, some do not. We can make life easier for both by surfacing as much metadata and operational logging – ETL instrumentation – as possible.

Well architected metadata-driven frameworks reduce enterprise innovation friction by:

  • Reducing maintenance overhead
  • Supporting batched execution with discrete execution IDs
  • Allowing packages to live anywhere in the SSIS Catalog

Less Overhead

Adding an SSIS package to a metadata-driven framework is a relatively simple two-step process:

  1. Deploy the SSIS package (or project).
  2. Just add metadata.

A nice bonus? Metadata stored in tables can be readily available to both production DBAs and Operations personnel… or anyone, really, with permission to view said data.
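“Just add metadata” might look something like the following sketch. The schema and table names are hypothetical – consult your framework’s documentation for the actual objects:

```sql
/* "Just add metadata" - a sketch only. The fw schema and table names below
   are hypothetical; check your framework's documentation for the real objects. */
INSERT INTO fw.Applications (ApplicationName)
VALUES (N'Load Medical Data');

INSERT INTO fw.ApplicationPackages
  (ApplicationID, FolderName, ProjectName, PackageName, ExecutionOrder)
SELECT a.ApplicationID, N'MedicalData', N'LoadMedicalData', N'StageClaims.dtsx', 10
FROM   fw.Applications AS a
WHERE  a.ApplicationName = N'Load Medical Data';
```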

Batched Execution with Discrete IDs

An SSIS Catalog-integrated framework can overcome one of my pet peeves with using Batch Controllers. If you call packages using the Parent-Child design pattern implemented with the Execute Package Task, each child execution shares the same Execution / Operation ID with the parent package. While it’s mostly not a big deal, I feel the “All Executions” report is… misleading.

Using a Catalog-integrated framework gives me an Execution / Operation ID for each package executed – the parent and each child.
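Under the hood, a Catalog-integrated framework accomplishes this by calling the SSIS Catalog’s execution stored procedures once per package instead of nesting Execute Package Tasks. A simplified sketch (folder, project, and package names are illustrative):

```sql
/* How a Catalog-integrated framework gets a discrete Execution ID per package:
   call catalog.create_execution / catalog.start_execution for each package in
   its metadata. Folder, project, and package names are illustrative. */
DECLARE @execution_id bigint;

EXEC SSISDB.[catalog].[create_execution]
       @folder_name = N'MedicalData'
     , @project_name = N'LoadMedicalData'
     , @package_name = N'StageClaims.dtsx'
     , @use32bitruntime = 0
     , @reference_id = NULL
     , @execution_id = @execution_id OUTPUT;

-- Wait for this package to complete before starting the next one
EXEC SSISDB.[catalog].[set_execution_parameter_value]
       @execution_id = @execution_id
     , @object_type = 50                 -- 50 = system parameter
     , @parameter_name = N'SYNCHRONIZED'
     , @parameter_value = 1;

EXEC SSISDB.[catalog].[start_execution] @execution_id = @execution_id;

-- Each package executed this way appears with its own Execution ID:
SELECT execution_id, folder_name, project_name, package_name, status
FROM   SSISDB.[catalog].[executions]
ORDER BY execution_id DESC;
```

Because each call to catalog.create_execution returns a new execution ID, every package – parent and child alike – shows up as its own row in the All Executions report.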

“Dude, Where’s My Package?”

Ever try to configure an Execute Package Task to execute a package in another SSIS project? Or another Catalog folder? You cannot.* By default, the Execute Package Task in a Project Deployment Model SSIS project (also the default) cannot “see” SSIS packages that reside in other SSIS projects or which are deployed to other SSIS Catalog Folders.

“Why do I care, Andy?”

Excellent question. Another benefit of separation of concerns is it promotes code reuse. Imagine I have a package named ArchiveFile.dtsx that, you know, archives flat files once I’m done loading them. Suppose I want to use that highly-parameterized SSIS package in several orchestrations? Sure, I can Add-Existing-Package my way right out of this corner. Until…

What happens when I want to modify the packages? Or find a bug? This is way messier than simply being able to modify / fix the package, test it, and deploy it to a single location in Production where a bajillion workflows access it. Isn’t it?

It is.

Messy stinks. Code reuse is awesome. A metadata-driven framework can access SSIS packages that are deployed to any SSIS project in any SSIS Catalog folder on an instance. Again, it’s just metadata.

*I know a couple ways to “jack up” an Execute Package Task and make it call SSIS Packages that reside in other SSIS Projects or in other SSIS Catalog Folders. I think this is such a bad idea for so many reasons, I’m not even going to share how to do it. If you are here…

… Just use a framework.

SSIS Framework Community Edition

At DILM Suite, Kent Bradshaw and I give away an SSIS Framework that manages execution – for free! I kid you not. SSIS Framework Community Edition is not only free, it’s also open source.