How Many SSIS Packages Should I Create?

Introduction
 
I received the following questions (paraphrased) from a friend:
 
I’m building an ETL process with SSIS in which I pull from about 40 tables in DB2. I put the data into staging tables in a SQL Server database.
Is it better to have one package with 40 data flows, 40 packages with one data flow each or something in between? Or should there be one data flow with a lot of source->destination modules? What the advantages and disadvantages of each approach?
 
Excellent Questions!
 
   The answer is: “It depends.” That’s one of the reasons you’re having trouble finding a definitive answer online. It’s not a bad question, so let me walk through the factors that would drive my design decisions:
 
   First, are there dependencies? Are there foreign key relationships in the source database? Is referential integrity enforced (keep in mind there are non-database ways to enforce RI)? If there are dependencies or referential integrity exists (whether it’s enforced or not), I load the parents first and then the children.
 
   Second, I think about Operations: How much visibility do I want into this process? If it’s going to take three hours to load, I probably want to break that down into several packages so I can at least see steps in the process completing. This will also assist in troubleshooting: “Package1.dtsx failed” isn’t a pleasant message to troubleshoot if it contains 40 Data Flow Tasks. “Load Orders and OrderDetails.dtsx failed” is a much better starting place.
 
   You’ve inspired another blog post – thank very much! Please keep the questions coming. And let me know what you decide.
 
:{> Andy

Andy Leonard

andyleonard.blog

Christian, husband, dad, grandpa, Data Philosopher, Data Engineer, Azure Data Factory, SSIS guy, and farmer. I was cloud before cloud was cool. :{>

7 thoughts on “How Many SSIS Packages Should I Create?

  1. I would add that SSIS/ETL programming shares similiar design issues with conventional procedural language programming: Support reuse, architect for reliability, maintainability, testability.
    For example, in the past I have had to create SSIS processing that I was certain could be reused with simple variable or config file changes. In those cases, I put that code in a seperate package so it can be reused easily in many ETL solutions.

  2. Andy,
    In the example above where you would have several packages, do you have a "main package" that runs all the other packages sequentially or do you schedule each package to run independently?
    Thanks,
    Bruce

  3. Hi Bruce,
      That is an excellent question and the answer is "it depends." I definitely automate some way to execute the packages in a prescribed order. I sometimes use a Parent Package (main package) to execute the Child Packages, sometimes I use a scheduler like SQL Agent and call each Child Package in its own job step.
    :{> Andy

  4. Hi Andy .. I’ve almost similar question.
    I need to extract data from 15 tables.Few tables are big tables (having millions of rows). There is no FK constraint between the tables in source DB. Do i need to add all the flows in one data flow task or should i break them for improved performance.
    Thanks

  5. This is a very old page but still I want to understand what if my package with 8 data flow task(with some transformations) takes 10 mins to complete(as you have mentioned if it take 3 hrs). but still if I want to go ahead with parent child design then is it ok?. Personally I prefer modular and clean code approach so i would like to use single package per table approach.
    What would you suggest?  

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.