Available for Pre-Order – The Biml Book!

I am excited to announce The Biml Book is now available for pre-order from Amazon and Apress!

I was honored to write with the inventor of Biml, Scott Currie, and an awesome team of BimlHeroes: Jacob Alley, Martin Andersson, Peter Avenant, Bill Fellows, Simon Peck, Reeves Smith, Raymond Sondak, Benjamin Weissman, and Cathrine Wilhelmsen.

Chapters

Part I: Learning Biml

  • Biml Tools
  • Introduction to the Biml Language
  • Basic Staging Operations
  • Importing Metadata
  • Reusing Code, Helper Classes and Methods

Part II: Biml Frameworks

  • A Custom Biml Framework
  • Using Biml as an SSIS Design Patterns Engine
  • Integration with a Custom SSIS Execution Framework
  • Metadata Automation
  • Advanced Biml Frameworks and BimlFlex

Part III: Biml Topics

  • Biml and Analysis Services
  • Biml for T-SQL
  • Documenting Your Biml Solution
  • Troubleshooting Metadata
  • Troubleshooting Biml

Part IV: Appendices

  • Source Control
  • Parallel Load Patterns in Biml
  • Metadata Persistence

Amazon projects the book’s release in early December but we anticipate the book will be available sooner.

Reserve your copy today!

:{>

BLObs in SSIS – Part 2 – The BLOb-Load Turbo Button

As we saw in the first post in this series – BLObs in SSIS – Part 1 – How Does SSIS Load BLObs? – SSIS copies BLObs from their source to the location stored in the TEMP and TMP operating system environment variables. That location defaults to %USERPROFILE%\AppData\Local\Temp.

That’s a problem because that default location is on the same drive as the operating system and the operating system is busy running the server. For the same reason you wouldn’t configure data files on the OS drive, you don’t want to configure Data Flow Tasks to use the OS drive for BLOb temporary storage.

The BLOb-Load “Turbo Button”

Configure the Data Flow Task’s BLOBTempStoragePath property:

Configuring the Data Flow Task’s BLOBTempStoragePath property to anywhere other than the OS drive will improve SSIS Data Flow Task execution performance.

“That’s Great, Andy, But I Have a Bajillion SSIS Packages in Production Already. What About Them?”

You have options.

Option 1: Edit a Bajillion SSIS Packages

Not pretty, I know, but it is a valid option. I don’t recommend it; I mention it only because it works. If you do edit the packages, don’t hard-code the BLOBTempStoragePath location. Rather, create an SSIS Project or Package Parameter and override that parameter at runtime:

Option 2: Edit a Bajillion SQLAgent Jobs

If SSIS execution is scheduled via the SQLAgent SQL Server Integration Services Package job step type, you can persist Property Overrides on the Advanced tab:

To override, set the Property Path to “\Package\<Data Flow Task Name>.Properties[BLOBTempStoragePath]” and the Property Value to the file system path you wish to use for BLOb transfers.

Similarly, you can edit dtexec command lines to override the BLOBTempStoragePath property of a Data Flow Task:

dtexec /SERVER <SqlServerInstance> /ISSERVER \SSISDB\<CatalogFolder>\<CatalogProject>\<PackageName> /SET "\Package\<Data Flow Task Name>.Properties[BLOBTempStoragePath]";<Path>

Option 3: Use an SSIS Framework

An SSIS Framework that supports SSIS Package Property Overrides – such as the DILM Suite SSIS Framework Commercial and Enterprise Editions – can make changing this property a matter of executing a query to update package execution metadata.
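To make that concrete, here is a minimal sketch of the kind of T-SQL such a framework – or you, directly – might execute against the SSIS Catalog to override the property for a single execution. The folder, project, package, and Data Flow Task names below are placeholders:

DECLARE @execution_id BIGINT;

-- Create a Catalog execution for the package (placeholder names):
EXEC SSISDB.catalog.create_execution
      @folder_name = N'MyCatalogFolder'
    , @project_name = N'MyCatalogProject'
    , @package_name = N'LoadBlobs.dtsx'
    , @use32bitruntime = 0
    , @execution_id = @execution_id OUTPUT;

-- Point the Data Flow Task's BLOb temporary storage at a non-OS drive:
EXEC SSISDB.catalog.set_execution_property_override_value
      @execution_id = @execution_id
    , @property_path = N'\Package\DFT Load Blobs.Properties[BLOBTempStoragePath]'
    , @property_value = N'E:\SSISTemp'
    , @sensitive = 0;

EXEC SSISDB.catalog.start_execution @execution_id = @execution_id;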

You may be asking, “But Andy, can’t the SSIS Catalog help?” Yes, some. In the next post – BLObs in SSIS – Part 3 – Help from the SSIS Catalog – we will look at ways the SSIS Catalog can help implement these options.

Conclusion

Changing the value of the Data Flow Task BLOBTempStoragePath property can dramatically improve SSIS execution performance. Using an SSIS Framework can help you manage such updates without changing or redeploying your SSIS packages.

:{>

Keep Up: Join my mailing list.

Need Help? Enterprise Data & Analytics delivers SSIS performance tuning and training.

Learn More:
IESSIS1: Immersion Event on Learning SQL Server Integration Services – 2-6 Oct 2017, Chicago

Presenting Use Biml to Automate SSIS Design Patterns at PASS Summit 2017

I am honored to present Use Biml to Automate SSIS Design Patterns at the PASS Summit 2017!

Abstract:

Perhaps you’ve heard the buzz about Business Intelligence Markup Language (Biml) and wondered, “Why is everyone so excited about Biml?” Occasionally, a new technology emerges that changes everything. For SSIS developers, Biml is one such technology. Biml is a powerful solution for rapidly creating SSIS packages and solutions. SSIS Design Patterns support various load profiles. Combined, Biml and SSIS Design Patterns offer a compelling solution for automating enterprise data integration.

In this session, you will learn:

How to rapidly build multiple SSIS packages using Biml
How to apply different SSIS Design Patterns using Biml
How to automate SSIS Design Patterns using Biml

I hope to see you there!

:{>

SSIS Version 15.0 Script Error

There I was…

Delivering ExpertSSIS training with Brent Ozar Unlimited (we’re going to present another ExpertSSIS class – stay tuned for more information). I scheduled the execution of an SSIS package that contains a Script Task via SQL Agent, executed the job, and boom:

“Excellent teaching moment,” I thought as I opened SSIS Catalog Overview and Messages reports in SSMS for the execution. The error is shown here:

Since we were covering Data Integration Lifecycle Management (DILM) in class – and since Brent encourages presenters to share tools, websites, and utilities that they build (and even sell)* – we also took a look at the error using the free and open-source DILM Suite Catalog Reports solution:

I cannot select text from the SSIS Catalog reports built into SSMS. However, I can select the error text from Catalog Reports or from SQL Agent History:

The SSIS Catalog execution error is:

Script Task:Error: There was an exception while loading Script Task from XML: System.Exception: The Script Task “ST_f690ad8e3ea0454d92176ea4235a87e2” uses version 15.0 script that is not supported in this release of Integration Services. To run the package, use the Script Task to create a new VSTA script. In most cases, scripts are converted automatically to use a supported version, when you open a SQL Server Integration Services package in %SQL_PRODUCT_SHORT_NAME% Integration Services.
at Microsoft.SqlServer.Dts.Tasks.ScriptTask.ScriptTask.LoadFromXML(XmlElement elemProj, IDTSInfoEvents events)

The SQL Agent error is:

Executed as user: NT Service\SQLAgent$DEMO. Microsoft (R) SQL Server Execute Package Utility Version 13.0.1601.5 for 64-bit Copyright (C) 2016 Microsoft. All rights reserved. Started: 6:12:58 AM Package execution on IS Server failed. Execution ID: 20258, Execution Status:4. To view the details for the execution, right-click on the Integration Services Catalog, and open the [All Executions] report Started: 6:12:58 AM Finished: 6:13:00 AM Elapsed: 2.125 seconds. The package execution failed. The step failed.

When Did this Happen?

I started seeing this error around the time I installed SSMS v17.2 on my VMs running SQL Server 2016 and SSIS 2016. I suspect the two events – the installation and the errors – may be related but I could be mistaken.

The fix is fairly straightforward: We need to inform SQL Server Data Tools One Designer that we’re using SQL Server 2016. To do that, right-click the SSIS project in Solution Explorer and click Properties. When the SSIS project Property Pages opens, expand the Configuration Properties node on the left and click the General node. Click the TargetServerVersion property dropdown to change the current TargetServerVersion property setting as shown here:

Change this setting to SQL Server 2016 and click the OK button:

You will be prompted:

Click the Yes button. You will be prompted again:

Click the Yes to All button.

You have now reset the TargetServerVersion property of the SSIS project to SQL Server 2016. I did this during the ExpertSSIS class and then redeployed the SSIS project. When I executed the SQL Agent job again, it succeeded:

I Consider This A Bug

In my opinion, installing SQL Server Management Studio should not make breaking changes to the defaults of SQL Server Data Tools so I consider this a bug. It’s a pretty easy bug to identify and fix, though, so I don’t expect the SSIS Development Team to halt current development and rush to fix it. I think someone should take a look at how the defaults of the One Designer TargetServerVersion property work, though.

* Please see A Note About Delivering Content For Brent Ozar

Keep Up: Join my mailing list.

Need Help? Enterprise Data & Analytics delivers SSIS performance tuning and training.

Learn More:
IESSIS1: Immersion Event on Learning SQL Server Integration Services – 2-6 Oct 2017, Chicago

GroupBy Online Conference

A Note About Delivering Content For Brent Ozar

Brent Ozar Unlimited

I recently had the honor and privilege of delivering ExpertSSIS training in cooperation with Brent Ozar Unlimited.

Brent believes people want to learn technology from folks who:

  • Are willing to share their knowledge on their personal or professional blog;
  • Work in the field with the technology – perhaps even own a company that provides consulting services in the field; and
  • Know the technology well enough to build tools – and maybe charge money for some of those tools – to support the technology.

I emailed Brent and asked, “How do you feel about me mentioning my blog? company? products?” Brent’s reply:

“You damn well better mention your blog, company, and products. The students are hiring you for your expertise and they need to know about the work you’re doing. 😀 You can also mention competitors to us, and you’re not encouraged to mention our stuff or anything like that.”

Brent’s response makes sense to me. I concur because I find it practical and professional. I wish other organizations respected experience by recognizing achievement (and success) in the same manner.

:{>

Learn More:
GroupBy Online Conference

BLObs in SSIS – Part 1 – How Does SSIS Load BLObs?

How Does SSIS Load BLObs?

When SSIS loads Binary Large Objects (BLObs), the Data Flow Task first copies each BLOb from its source to a temporary storage location in the file system, storing the BLOb in a file. When it’s time to write the BLOb value, the Data Flow Task reads the BLOb back from that temporary file and passes it to the destination adapter.

Please note: Two of those operations occur in the file system.

“Where in the File System, Andy?”

That’s an excellent question. If you log the Data Flow Task Pipeline Initialization event,

… the log will inform you that, by default, the Data Flow Task will initialize these values to use the locations found in the TEMP and TMP operating system environment variables:
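If the package executes from an SSIS Catalog, one way to read that initialization message back afterward is to query the catalog’s event messages. A minimal sketch, assuming a logging level (such as Verbose) that captures the Data Flow’s custom log events; the operation ID is a placeholder:

SELECT  em.message_time
      , em.package_name
      , em.message_source_name
      , em.message
FROM    SSISDB.catalog.event_messages AS em
WHERE   em.operation_id = 12345            -- placeholder execution/operation ID
  AND   em.message LIKE N'%BLOB%'          -- loose filter for the BLOb storage message
ORDER BY em.message_time;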

What’s the default value for the TEMP and TMP operating system environment variables? %USERPROFILE%\AppData\Local\Temp. If you have not changed the TEMP and TMP operating system environment variables, you can open that location in Windows Explorer and watch tmp files being created and deleted when you execute a Data Flow Task that loads BLObs:

There’s a problem with this location: it’s on the same drive as the operating system. The operating system is… busy – it’s running the server. For the same reason you wouldn’t configure data files on the OS drive, you don’t want to configure Data Flow Tasks to use the OS drive for BLOb temporary storage.

How do you change the location? That’s the subject of the next post in this series: BLObs in SSIS – Part 2 – The BLOb-Load Turbo Button.

:{>


Keep Up: Join my mailing list.

Need Help? Enterprise Data & Analytics delivers SSIS performance tuning and training.

Learn More:
IESSIS1: Immersion Event on Learning SQL Server Integration Services – 2-6 Oct 2017, Chicago

Why I Do Not Mix High Availability with Reporting Solutions

One way to offload reporting requirements from the Production OLTP (OnLine Transaction Processing) database while providing a near-real-time copy of Production data is to execute reporting and analytics queries against a High Availability solution replica – such as SQL Server Always On.

Learn more about Always On at: Overview of Always On Availability Groups (SQL Server).

“Why Don’t You Mix High Availability and Reporting Solutions, Andy?”

Two reasons:

  1. Indexes
  2. Additional columns

“We Don’t Need Different Indexes for Reporting…”

“… our OLTP database is optimized for OLTP and analytics queries.”

Um, are you sure about that?

I do a little database tuning. It’s not my forte but I’m not ignorant of the principles. Plus I have friends who are really, really good at database performance tuning.

We talk.

Mostly they talk and I listen (because I have a lot more to learn about… well, everything). They say things like, “This database I tuned had way too many indexes. The write performance was being clobbered by a bunch of unnecessary indexes;” or, “The clustered index was clustered on the wrong column(s) and sub-queries were dying as a result.”

Is your enterprise Production OLTP database impacted by a hybrid indexing strategy? I can help.

Reporting Index Strategy

Indexing for reporting is almost always different from indexing for OLTP. One example is reference data. Before you get all froggy on me and jump to the conclusion that reference data is always a small subset, I present exhibit A: the National Provider Identifier (NPI) file. At over 5GB and just under 5 million records, this reference dataset is not small. For the sake of brevity, I will mention pre-computed, drill-across merged fact tables in advanced star schema designs only in passing. You can learn more about them – and star schemas in general – from Christopher Adamson’s (excellent) book, Star Schema:

Beyond reference data, star schemas are simply different beasts when compared to OLTP databases. Many OLTP databases are normalized to some normal form; the most popular is third normal form. Star schemas reshape and, sometimes, repeat data – a big no-no in third normal form. The star schema isn’t the only alternative for reporting database models, either – there’s also Data Vault, which, again, requires a different indexing strategy than an OLTP database. There are other models, but this post is long enough.

“Why All The Fuss About Indexes, Andy?”

Replica databases in high availability solutions match the primary database. If you think about it, this must be so in order for the replica to take over in the event of a primary failure. When the primary fails, the last thing you want is for a database that looks and performs differently to take over. They need to be the same or high availability may be compromised.

This “same-ness” requirement includes indexes.

The same goes for mirroring solutions.
The same goes for backup and restore.

Backup and Restore is Decoupled

An advantage of using backup and restore is that the restored database is decoupled from the source. You could back up the OLTP system, restore it to a reporting instance of SQL Server, and then execute scripts to change the indexing strategy so the database is more amenable to analytics queries…

If the database is small enough and/or your enterprise reporting can tolerate the latency.
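As a rough illustration of that approach – every database, file, table, and index name below is a placeholder – the restore-then-reindex step might look something like this:

-- Restore the latest OLTP backup over the reporting copy:
RESTORE DATABASE SalesReporting
FROM DISK = N'\\backupshare\Sales\Sales_Full.bak'
WITH  MOVE N'Sales' TO N'E:\Data\SalesReporting.mdf'
    , MOVE N'Sales_log' TO N'F:\Log\SalesReporting_log.ldf'
    , REPLACE
    , RECOVERY;

-- Re-shape the indexing strategy for analytics queries:
CREATE NONCLUSTERED INDEX IX_OrderDetail_OrderDate_Covering
ON dbo.OrderDetail (OrderDate)
INCLUDE (ProductID, Quantity, LineTotal);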

“We Don’t Need Additional Columns for Analytics Queries…”

“… we’ve added all those columns to our OLTP database and OLTP is unaffected by them.”

Um, are you sure about that?

I’ve seen lots of solutions over the decades: views, functions, computed columns, etc. I haven’t seen any solutions that have zero performance impact. Not. One.

Again, HA solutions require replicas that are identical to the source database (I cannot fathom the overhead required if they weren’t…). This requirement applies to columns as well as indexes.

“What About Hybrid Approaches?”

I’ve seen some interesting hybrid approaches.

These and similar solutions work well because, like backup and restore, the reporting database is decoupled from the source.

Our next considerations are latency tolerance and data integration.

Latency

Find an enterprise that permits analytics queries against the Production OLTP database and ask an analyst if they absolutely must have up-to-the-nanosecond results. As surely as a project manager will ask a developer, “Do we need to add resources?” that analyst will emphatically reply, “Yes!”

Um, are you sure about that?

I like to follow the Data Trail through the enterprise. I ask (pesky) questions like, “How much time elapses between the time this report query is executed and someone acts on the results?” When I hear the answer, I listen for something we data engineers call grain. And not just any old grain. I’m listening for time dimension grain.

I’ve been doing this for decades. No one has ever used the time dimension grain “seconds” when answering my earlier question.

There are near-real-time data integration use cases, such as data signal processing.

Data Integration

Grain plays a vital role in data warehouse and data integration design – especially time dimension grain. Time dimension grain drives aggregation requirements. It’s a fantastic question to ask on Day One. You want to carefully weigh the answer.
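As a simple, hypothetical illustration: if the agreed-upon time dimension grain is “day,” the transform aggregates source transactions to that grain at load time. The table and column names here are made up:

SELECT  CAST(s.TransactionTime AS DATE) AS TransactionDate
      , s.ProductID
      , SUM(s.Amount) AS DailyAmount
FROM    stage.SalesTransaction AS s
GROUP BY CAST(s.TransactionTime AS DATE)
      , s.ProductID;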

I always ask the preferred grain of the time dimension.
I almost always design the data warehouse to match the time dimension grain of the source data.

I can hear you thinking, “When would you not design the data warehouse to match the time dimension grain of the source data, Andy?” That’s an excellent question. My answer is, “It depends.”

“How Do You Do Analytics Data Integration, Andy?”

I bet you know the answer to that question already. Yep, most of the time SSIS is involved. It may shock you to learn that these days I design more ELT (Extract, Load, and Transform) with the “transform” step performed in the relational engine than ETL (Extract, Transform, and Load), especially with SQL Server 2012+. Why? The relational engine often outperforms SSIS Data Flow transformations.

When I execute the “T” in SQL Server, this is almost always accomplished via an SSIS Execute SQL Task. Why? If I’m using an SSIS Data Flow as a “data pump,” I want to log performance metadata (ETL Instrumentation). Since SSIS is my execution engine for the “EL” portion, it’s convenient to collect the same metrics from the “T” part of the process. It’s mighty convenient to collect the metrics in the same place and in the same format, too.
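Here is a minimal sketch of what such a “T” statement might look like when it runs from an Execute SQL Task, with simple ETL Instrumentation folded in. Every table and column name here is hypothetical:

DECLARE @start DATETIME2(3) = SYSUTCDATETIME()
      , @rows  INT;

-- The "T": move staged rows into the warehouse table (incremental insert):
INSERT INTO dw.FactOrder (OrderID, CustomerKey, OrderDate, Amount)
SELECT  s.OrderID, d.CustomerKey, s.OrderDate, s.Amount
FROM    stage.[Order] AS s
JOIN    dw.DimCustomer AS d ON d.CustomerID = s.CustomerID
WHERE   NOT EXISTS (SELECT 1 FROM dw.FactOrder AS f WHERE f.OrderID = s.OrderID);

SET @rows = @@ROWCOUNT;

-- Log the same metrics the "EL" packages record, in the same place and format:
INSERT INTO dbo.EtlInstrumentation (TaskName, RowsAffected, StartTimeUTC, EndTimeUTC)
VALUES (N'Load dw.FactOrder', @rows, @start, SYSUTCDATETIME());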

Advantages of Using SSIS

One nice thing about using a data integration platform like SSIS is I have the flexibility to choose my integration approach. I can load TableA using ELT and TableB using ETL.

I can also select different design patterns for different table loads: I can truncate and load smaller reference tables and use an incremental load design pattern to load larger tables.

Integration approach and design patterns apply to calculating additional column values: I may choose to use the relational engine for one aggregation and use .Net in SSIS for another calculation.

Executing SSIS in the SSIS Catalog means logging is automatic. Runtime metrics are collected, no fuss, no muss.
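For example, the metrics the Catalog collects automatically can be read back with queries along these lines (the execution ID below is a placeholder):

-- Recent package executions and their outcomes:
SELECT  e.folder_name, e.project_name, e.package_name
      , e.start_time, e.end_time, e.status
FROM    SSISDB.catalog.executions AS e
ORDER BY e.start_time DESC;

-- Per-task durations for a single execution:
SELECT  es.execution_path, es.start_time, es.end_time, es.execution_duration
FROM    SSISDB.catalog.executable_statistics AS es
WHERE   es.execution_id = 12345            -- placeholder execution ID
ORDER BY es.start_time;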

Automation

Loading a reporting instance relies on SSIS design patterns. If you’re designing a reporting solution for a source database with more than a few tables, consider using Biml to auto-generate SSIS packages.

Not every reporting and analytics database solution requires the level of automation demonstrated in the short (2:32) video shown here, but some will. Spoiler: the video demonstrates performing about 24 hours of work in just over 6 minutes.

Conclusion

High availability solution requirements – combined with analytics and reporting database index strategy, the need for additional reporting columns, and latency constraints – mean many enterprise reporting and analytics solutions require a separate database maintained in some decoupled-from-Production-OLTP manner.

:{>

Learn more:

Expert SSIS Training Online with Brent Ozar Unlimited! – 11-12 Sep 2017
IESSIS1: Immersion Event on Learning SQL Server Integration Services – Chicago, 2-6 Oct 2017!
Enterprise Data & Analytics

Presenting Faster SSIS at SQL Saturday 683 – Charlotte 14 Oct

I’m honored and excited to deliver a brand new presentation at SQL Saturday 683 – Charlotte on 14 Oct: Faster SSIS!

Abstract:

Ever wonder why SSIS runs so slow? Watch SSIS author Andy Leonard as he runs test loads using sample and real-world data and shows you how to tune SQL Server 2016 Integration Services (SSIS 2016) packages.

We’ll start by experimenting with SSIS design patterns to improve performance loading AdventureWorks data. We will implement different change detection patterns and compare execution performance for each. Then, we’ll explain a Data Flow Task’s bottleneck when loading binary large objects – or Blobs.

Finally, we’ll demonstrate a design pattern that uses a Script Component in a Data Flow to boost load performance to MySql, whether on-premises or in the cloud.

I hope to see you there!

Register today!

:{>

Presenting Designing an SSIS Framework at SQL Saturday 681 – Spartanburg 23 Sep

I am honored to present Designing an SSIS Framework at SQL Saturday 681 – Spartanburg 23 Sep!

Abstract:

Enterprise data integration changed with the introduction of the SSIS Catalog. Implementing enterprise data integration often requires functionality beyond the rich features provided by the SSIS Catalog. Building on his previous work in the field of Data Integration Life Cycle Management (DILM) and SSIS Frameworks development, a Data Integration Architect with 10 years of SSIS experience explains the what, why, and how of an SSIS Catalog-integrated Framework that delivers metadata-driven package execution, connections management, and parameter configuration.

In this session, you will learn:
– A method for migrating packages from Development, through the data integration life cycle, to Production without editing SSIS Connection Manager Properties.
– A metadata-driven approach to SSIS package execution.
– A method for customizing enterprise data integration reporting.

I hope to see you there!

Register today!

:{>

Coming Soon: SSIS Catalog Compare Updates

I spent a lot of time this year writing and managing a book project. I will blog about that later, I promise. While I enjoy writing, it really cuts into my play time. What do I do when I play? I write software.

This weekend I cracked open the code for SSIS Catalog Compare and began some much-needed refactoring. I noticed a couple things I didn’t like while I was tinkering around in the code. I fixed those things and was inspired to make more changes.

Faster

The next version of SSIS Catalog Compare is a lot faster than previous versions. I was executing code that did not need to run unless and until certain conditions were met, and some of that code was… busy.

In other places it wasn’t so much what I was doing. It was more how and when I was doing it. I cleaned and tightened the code. SSIS Catalog Compare is visibly faster as a result.

Connections

Since the pre-release days I’ve wanted to improve the object model and the GUI by building a bona fide Connections object. Well, I finally did it.

I added a Project Connections node to the Project node, shown here:

Internally, the SSIS Catalog treats Connection Manager properties like Parameters (mostly). The handful of differences made for some interesting testing use cases.

I also added a Package Connections node to the Package node, shown here:

Adding Connections to my Catalog object model doesn’t end with drawing pretty pictures; the new class facilitates additional scripting, as shown here:

Some of you requested this feature: the ability to separately script Connections Overrides (Catalog Literals). The feature is on its way!

This particular function generates a single script for overrides in all package connection managers. There’s also a facility for generating connection override scripts for each individual package connection manager, and the same functionality is available for project connection managers.
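To give a feel for what a connection override script does – this is a hand-written sketch, not the exact script SSIS Catalog Compare emits – the Catalog surfaces connection manager properties as parameters named CM.<ConnectionManagerName>.<PropertyName>, so a literal (override) can be applied much like any other parameter value. All names below are placeholders:

EXEC SSISDB.catalog.set_object_parameter_value
      @object_type = 20                    -- 20 = project scope, 30 = package scope
    , @folder_name = N'MyCatalogFolder'
    , @project_name = N'MyCatalogProject'
    , @parameter_name = N'CM.SourceConnection.ConnectionString'
    , @parameter_value = N'Data Source=MyServer;Initial Catalog=MyDb;Integrated Security=SSPI;'
    , @value_type = 'V';                   -- 'V' = literal value, 'R' = environment reference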

Scripting Improvements

While tinkering with parameter value scripting I encountered an interesting use case: Attempting to script a parameter configured to use the Design-time Default value to the same parameter in another SSIS Catalog configured to use an Override (Literal) or Reference just wasn’t working. Why? I wasn’t “clearing” the parameter.

The use case was compelling and I believe I solved it. Now, when scripting Literals, I “clear” all parameters that are not referenced. The same goes for scripting Reference Mappings: I now “clear” all parameters that aren’t overridden with literals.
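For reference, the Catalog exposes the “clear” operation as a stored procedure. A minimal sketch of clearing a package-scoped parameter back to its design-time default – folder, project, package, and parameter names are placeholders:

EXEC SSISDB.catalog.clear_object_parameter_value
      @object_type = 30                    -- 30 = package-scoped parameter
    , @folder_name = N'MyCatalogFolder'
    , @project_name = N'MyCatalogProject'
    , @object_name = N'MyPackage.dtsx'
    , @parameter_name = N'CM.SourceConnection.ConnectionString';   -- placeholder parameter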

My thinking is: If you’re generating Reference Mappings, you don’t want me messing with your Literals, and vice versa. I could be wrong about that. Pop over to DILM Suite’s Contact page and drop me your thoughts.

Coming Soon…

My goal is to release the next version of SSIS Catalog Compare in mid-October. Stay tuned!

:{>