I see – and have lived – this virtuous cycle in the SQL Server and PASS communities:
A person discovers the Community and is overwhelmed at our openness and genuine willingness to help others. They realize they are not alone.
They learn more and become better at their jobs which, in turn, positively impacts their quality of life.
Some desire to give back to the community, so they develop a presentation and submit it to a User Group or SQL Saturday.
Some are selected to deliver their presentation.
Some presentations are well-received and increase the visibility of the presenter in the community.
As presentations are honed over time, some are used as a springboard to develop and deliver other presentations, further increasing the visibility of the presenter.
Some presenters achieve enough visibility to become a brand.
Some presenters are selected to present at larger events, like the PASS Summit.
Some presenters use their newfound greater visibility and brand awareness to join a consultancy practice or to become independent consultants.
The continued care and feeding of the brand of some consultants drives business growth.
The businesses of some consultants grows to the point where they can become sponsors and exhibitors at events such as User Groups, SQL Saturdays, and – eventually – the PASS Summit.
This cycle can be broken (or quashed) at any point by any number of actions, inactions, missteps, mistakes, and/or competitive overreach. In fact, I promise you will make mistakes and take missteps along the way (ask me how I know), but those mistakes and failures can tear you down or build you into more than you were – and the outcome is 100% your choice.
I advocate for the next generation of presenters. I want to see you engage, learn, share, grow, build their brands, and give back – just like I did.
I recently answered a question on the SQL Community Slack #ssis channel about SSIS and security. Let me begin by stating that SSIS security is complex, made up of many moving parts, and not trivial. If you struggle with SSIS security configuration, you are struggling with one of the harder parts of SSIS.
Security for SSIS packages and projects is managed by a property called “ProtectionLevel.” Here are some facts about the ProtectionLevel property:
The project ProtectionLevel setting must match the ProtectionLevel setting for everypackage in the project.
If a password is supplied, the same password must be supplied for the project and each package in the project.
You may not like these features, but you will have to engineer data integration solutions for SSIS with these features in mind.
“What Does Protection Level… protect, Andy?”
I’m glad you asked. The ProtectionLevel property defines the method SSIS uses to protect values marked as Sensitive. Connection Manager Password properties are, by default, sensitive. SSIS projects developed using SSIS 2012+ may contain package or project parameters. Parameters have a developer-configurable attribute named Sensitive which, not surprisingly, allows the developer to mark a project or package parameter as Sensitive. Project Connection Managers were added in SSIS 2012 and, as we mentioned earlier, the Connection Manager Password property is Sensitive by default (this, along with the ability to mark project parameters as Sensitive, is why we need a project ProtectionLevel property).
There are six ProtectionLevel settings:
Now you may have read the list and viewed the image and thought, “Andy, I don’t see ServerStorage in the image.” You may have experienced a moment where you thought, “There are five Protection Levels!” similar to that moment experienced by Jean-Luc Picard at the end of part 2 of Chain of Command (Star Trek: The Next Generation) where he screamed at his Cardassian torturer, “There are four lights!”
You are not wrong. ServerStorage is not there. But ServerStorage is a valid ProtectionLevel. Promise.
“So, where is ServerStorage, Andy?”
You are nailing the good questions today! The ServerStorage ProtectionLevel setting is the default setting for projects and packages deployed to the SSIS Catalog. When you deploy an SSIS package or project, the Integration Services Deployment Wizard decrypts your package and/or project, and then re-encrypts them using ServerStorage. You can see it in this image – step 3 changes the ProtectionLevel property:
What does this encryption look like?
If you click that image to enlarge it, you will see a binary string. You may ask, “Andy, how do you know the binary string is encrypted?” A just question. </GrimaWormtongue> I know because I’ve looked behind the curtain (some). One SSIS Catalog stored procedure used in the Export ISPAC (more on this in a bit…) functionality is SSISDB.internal.get_project_internal. If you script the internal.get_project_internal stored procedure you will note parameters named @key_name and @certificate_name that are used to build dynamic T-SQL statements to open a symmetric key using decryption by certificate – on or about line 69 of the query (depending on how you scripted it).
Yes, it’s binary. But it is also encrypted.
Now, the important part (and what motivated me to write this post):
If you password-protect your SSIS project, especially using EncryptAllWithPassword to protect intellectual property, your encryption – including your password – disappears forever once you deploy the project to the SSIS Catalog. Anyone with permission to export the project from the SSIS Catalog can see all you desire to hide. How hard is it to export the project? First you need to be a member of the SSIS Admin role. Then, you right-click the project in the SSMS Object Explorer Integration Services Catalogs node, and click Export:
Select a file location and file name for the exported ISPAC file:
Use a zip utility to open the compressed archive (or change the ISPAC file extension to zip, and Windows will decompress it for you):
From here, the package can be copied to disk. Using 7-Zip (or Windows if I changed the extension), I can edit the dtsx file to open the package XML in Notepad:
Here is a side-by-side comparison of the exported XML from the SSIS Catalog and the encrypted file used to develop the same SSIS package:
As you can see, using a password and one of the “WithPassword” Protection Level options will not protect your IP from folks with administrative privileges to your SSIS Catalog.
I can hear you thinking…
“Ok, Andy, You’ve Shown Us the Problem. What’s the Solution?”
I’m glad you asked. To protect intellectual property (IP), I highly recommend you encapsulate said IP in logic that resides outside of your SSIS package. How can you do that? One way is to use a custom assembly coded in .Net. The assembly can be designed to be imported into the Global Assembly Cache (GAC), and from there SSIS can access it from a Script Task (Control Flow) or Script Component (Data Flow Task).
The book is not about the logic required to code your task. The book is about the things you need to know in order to author a custom Visual Studio toolbox item. There’s a non-Production-ready demo that you build throughout the book (which is pretty cool, I think, but I wrote it so I’m biased). I assume you are completely new to Visual Studio software development. The code is in Visual Basic (deal). And I used Visual Studio Community Edition, which is free.
I’m so excited and honored to be a Microsoft Certified Big Data Professional! I enjoyed the courses and learned a lot the past seven months. It was super cool to work with Frank La Vigne – co-host of the Data Driven podcast – on the capstone project. (Any time I get to work with Frank, it’s super cool.)
What’s next? I’ve already started taking more courses! I’m currently working on the Processing Real-Time Data Streams in Azure course – which is an elective in the Big Data Professional Program that I did not take. Frank took it, and while Frank and I were working on the capstone project he whipped up an Azure Event Hub demo and… I was hooked. I signed up before we completed the capstone, even!
One note – and this is just my opinion – I think the Big Data program is misnamed. It should be called the Data Engineering program. So much of what I do day in and day out to integrate (or engineer) data is included in the training. If you’re working with SSIS today and want to learn other data integration / engineering platforms, this is the program for you.
I started answering a question on SQL Community Slack’s #ssis channel and I realized this would be better served as a blog post. The question was about three SSIS Data Flow properties: DefaultBufferSize, Engine Thread and DefaultBufferMaxRows.
I rarely change the EngineThreads property.
DefaultBufferSize and DefaultBufferMaxRows are two ways of managing the size limits of a Data Flow buffer. The two Data Flow Task properties can – and should – be treated as a single property. DefaultBufferSize is the number of bytes per buffer. DefaultBufferMaxRows is the number of rows per buffer. The defaults are 10,485,760 (10M) and 10,000, respectively.
For source rows that are 1K in width it’s a wash because 10,000 rows * 1K == 10M.
Let’s look at some implications of the math(s):
If the source rows are wider than 1K, you will hit the 10M memory limit set by DefaultBufferSize before you encounter the DefaultBufferMaxRows limit.
If the rows are less than 1K wide, you will hit the row count limit set by DefaultBufferMaxRows before you encounter the DefaultBufferSize limit.
Together, these properties make up what I refer to as the Data Flow Task “Buffer Size” property.
When tuning a Data Flow Task I adjust both properties together by a similar factor because, in my little mind (at least), they are one property.
It’s possible to mitigate for data types and I cover some ways to tune for the integration of BLOb data types in my series on Loading BLObs.
Data Size and Shape
As a function of the relationship between hard drive disk speeds (compounded by the fact that current systems were designed to work with spinning platters and read/write heads), RAM, and processor cache, the graph for moving “chunks” of data in any data movement operation (whether data or files) results in a curve*. In general, breaking an integration operation into ten chunks will almost always* perform the total load / move operation faster than loading / moving the same amount in a single operation. There is a point where the number of chunks results in a cursor-y operation. Here the performance curve trends downward (or upward, if you draw the graph like I do in my classes). The top (or bottom) of the curve represents maximum throughput for that size and shape of data containing the source data types.
Something to consider (I will not cover more in this post): The max-throughput can, and often almost always moves over time.
To demonstrate, I’ve created a sample SSIS project in SSDT v15.1.61801.210 for SSIS v14.0.3002.92 (Visual Studio 2017 for SSIS 2017). I select from the AdventureWorks2014 Person.Person table using an OLE DB Source adapter (click to enlarge):
I map to an almost identical table – via an OLE DB Destination adapter – in a test database (named TestDB) on the same instance of SQL Server:
Executing with the defaults (10,000 rows for DefaultBufferMaxRows and 10M for DefaultBufferSize), the Data Flow Task executes in 10.594 seconds:
Dropping the Buffer Size properties an order of magnitude…
… results in an execution time of 6.328 seconds:
For those curious about the effects of queuing, returning the properties to 10,000 and 10,485,760 results in an execution time of 9.61 seconds on my demo VM.
Your mileage may vary.
In this post, I’ve briefly discussed and demonstrated one impact of managing SSIS Data Flow Task performance by adjusting the Buffer Size properties – DefaultBufferMaxRows and DefaultBufferSize.
Intentionally equivocal language in use… caveats abound.
The capstone is the last course requirement (of 10 courses) to complete the Microsoft Professional Program in Big Data. The official Professional Program certificate won’t be available until next month, but I’m excited to complete both the capstone and the professional program.
Although there was some data analysis included in the courses and capstone, I found a lot of data engineering was covered in the curriculum. For people wanting to learn more about Azure offerings for data engineering – including HDInsight, Spark, Storm, and Azure Data Factory – I highly recommend the program.
You can audit the courses, gain the same knowledge, and pass the same tests Frank and I passed – and even participate in a similar capstone project – all for free. You only have to pay to receive a certificate.
What is Intelligent Data Integration? SSIS (SQL Server Integration Services) packages developed using tried and true design patterns, built to participate in a DevOps enterprise practicing DILM (Data Integration Lifecycle Management), produced using Biml (Business Intelligence Markup Language) and executed using an SSIS Framework.
Attend a day of training focused on intelligent data integration delivered by an experienced SSIS consultant who has also led an enterprise team of several ETL developers during multiple projects that spanned 2.5 years. And delivered.
Attendees will learn: – a holistic approach to data integration design. – a methodology for enterprise data integration that spans development through operational support. – how automation changes everything. Including data integration with SSIS.
Topics include: 1. SSIS Design Patterns 2. Executing SSIS in the Enterprise 3. Custom SSIS Execution Frameworks 4. DevOps and SSIS 5. Biml, Biml Frameworks, and Tools
Abstract Ever wonder why SSIS runs so slow? Watch SSIS author Andy Leonard as he runs test loads using sample and real-world data and shows you how to tune SQL Server 2016 Integration Services (SSIS 2016) packages.
We’ll start by experimenting with SSIS design patterns to improve performance loading AdventureWorks data. We will implement different change detection patterns and compare execution performance for each. Then, we’ll explain a Data Flow Task’s bottleneck when loading binary large objects – or Blobs.
Finally, we’ll demonstrate a design pattern that uses a Script Component in a Data Flow to boost load performance to MySql, whether on-premises or in the cloud.
The schedule for the event looks awesome. There are lots of really smart people presenting – plus me!
This is a picture of Kent Bradshaw, Microsoft Certified Data Scientist. Kent and I work together at Enterprise Data & Analytics delivering data engineering, business intelligence, and data-based solutions that help businesses turn their data into actionable information. We do not do this alone, we are surrounded by an excellent team that holds diverse skills, perspectives, and work and life experience.
In this post I want to focus on experience because I believe experience is the most valuable component a consultant brings to a customer.
When Experience Hurts
Experience can hurt if we allow it. We can become locked into that’s-the-way-it’s-done syndrome. There are new frameworks and methodologies introduced every day (literally). It’s too much for any one person to keep up with the latest and greatest and most shiny new stuff.
This is why a team delivers value to customers.
A diverse team – like the team at Enterprise Data & Analytics – is incredibly valuable. We’re not just a bunch of old guys, we are also Millennials. Our younger team members are constantly learning about new technologies and methodologies and bringing them the attention of our older team members. The energy our younger team members bring to our projects is inspiring and amazing, and…
One of the most rewarding parts of my job as Chief Data Engineer is watching our younger folks thrive, grow, and succeed.
I love how our team works together! I love it when we all pile on WebEx to solve a problem for a customer or design part of a data warehouse solution. It’s thrilling! I’m proud beyond words of our team.
Although we haven’t advertised it much, the team at Enterprise Data & Analytics has grown – nearly doubling in 2018 (so far).
Diversity in backgrounds, experiences, and levels of experience helps us overcome the downsides of experience.
When Experience Helps
Kent and I (and others on the Enterprise Data & Analytics team) have delivered many data-related solutions. I like to joke and say, “when we started working with data we had to carve our own chips out of wood!” That’s not entirely accurate, but I learned Motorola 6800 machine code in 1975 (I was 11) and Kent started when punch cards were the norm.
Having lots of experience gives Kent and me perspective. We understand the maturity cycles of technological, corporate-cultural, and even projects. Why? We’ve experienced a number of said cycles. When the inevitable “bumps in the road” occur, we don’t panic or lose perspective. We’ve likely experienced something very similar in the past, made mistakes attempting to recover, and learned the best way forward from those mistakes.
Do we still make mistakes? Yep. But fewer, and we have a deeper, richer well from which to draw experience-based, time-tested solutions. Perhaps most important to our customers, we have the confidence that only comes from recovering from a mistake (or several).
Plus, our more experienced team members have the privilege and honor of sharing our experience with our younger team members. Together, we learn from each other. I love that!
Our more experienced team members bring leadership and management experience to bear on each and every Enterprise Data & Analytics project. My experience leading a team of 40 ETL developers at Unisys gives me a deeper appreciation for the total cost of ownership of any solution (which can dwarf the cost of development – especially short-sighted development).
Snapping It Together
With our diverse backgrounds, experiences, and levels of experience, Enterprise Data & Analytics is able to deliver the best solutions, period.
How may our team serve you? We offer:
Data Warehouse Rescue – is your data warehouse project stalled? We can help.
SQL Server and MySQL database performance tuning, development, and support – both on-premises and in the cloud (AWS and Azure).
SSIS (SQL Server Integration Services) coaching, development, performance tuning, and training.
Biml (Business Intelligence Markup Language) coaching, development, and training.
Coaching is my favorite! We navigate and you fly.
Data science to help you visualize your data, and data engineering to prepare your data for data science and business intelligence.
People often ask me, “Andy, where can I learn SSIS or learn more about it?” Well, you’re at one such location – my blog! I’ve been blogging about SSIS for about 10 years as I type this. There’s a lot posts here about SSIS – that’s the good news. The bad news is: There’s a lot of posts here about SSIS.
Some other places you can learn about SSIS from me:
I hope to see you at one of these events. If you read this blog, come up and introduce yourself. I’m the fat guy with a braided beard!
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.