Data Wrangling Concepts: Latency and Staleness

Latency is the amount of time between the creation of data at a source and its load into a target. Staleness is the amount of time since the data in a target was last refreshed.
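To make the two definitions concrete, here is a minimal sketch in Python. The function names and timestamps are hypothetical and chosen only for illustration; the sketch assumes each row carries a creation timestamp and that the load process records when it last refreshed the target.

```python
from datetime import datetime, timedelta, timezone


def latency(created_at: datetime, loaded_at: datetime) -> timedelta:
    """Time between a row's creation at the source and its load into the target."""
    return loaded_at - created_at


def staleness(last_refreshed_at: datetime, now: datetime) -> timedelta:
    """Time elapsed since the target was last refreshed."""
    return now - last_refreshed_at


# Hypothetical timestamps for illustration only.
created = datetime(2017, 6, 1, 8, 0, tzinfo=timezone.utc)   # row created at the source
loaded = datetime(2017, 6, 2, 2, 0, tzinfo=timezone.utc)    # nightly load finishes
now = datetime(2017, 6, 2, 14, 30, tzinfo=timezone.utc)     # an analyst queries the target

row_latency = latency(created, loaded)
target_staleness = staleness(loaded, now)

print(f"latency:   {row_latency}")
print(f"staleness: {target_staleness}")
```

Latency is fixed once a row lands in the target, while staleness keeps growing with the clock until the next refresh.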

In many data integration use cases, a latency or staleness of one day, one week, or even one month is acceptable. Enterprise Data & Analytics helps enterprise data integration teams reduce latency and staleness. We often help customers pursue near-real-time solutions, where latency and staleness approach zero.

How do we approach a solution?

  1. Tuning
  2. Re-architect
  3. Redesign

Tuning

If the source schemata are stable, we begin tuning at the sources and destinations. Most targets and many sources are data stores. When the data integration is built with SQL Server Integration Services (SSIS), many find it odd that we begin at the sources and destinations before tuning the SSIS packages. We start with the databases because we often (not always) see more improvement from database tuning than from SSIS tuning.

If the source schemata are not stable, we examine how the enterprise is managing this volatility.
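One way to make that volatility visible is to compare the columns a load expects against the columns the source currently exposes. The following is a minimal sketch, not a description of any particular tool: it assumes column names and data types have already been read into dictionaries (for example, from a query against INFORMATION_SCHEMA.COLUMNS), and the function name, column names, and types are hypothetical.

```python
def schema_drift(expected: dict, actual: dict) -> dict:
    """Compare two {column_name: data_type} mappings and report the differences."""
    expected_cols = set(expected)
    actual_cols = set(actual)
    return {
        # Columns present in the source but unknown to the load.
        "added": sorted(actual_cols - expected_cols),
        # Columns the load expects but the source no longer provides.
        "removed": sorted(expected_cols - actual_cols),
        # Columns present on both sides whose data type has changed.
        "retyped": sorted(
            col for col in expected_cols & actual_cols
            if expected[col] != actual[col]
        ),
    }


# Hypothetical metadata for illustration only.
expected_columns = {"CustomerId": "int", "Name": "nvarchar", "Region": "nvarchar"}
actual_columns = {"CustomerId": "bigint", "Name": "nvarchar", "Email": "nvarchar"}

drift = schema_drift(expected_columns, actual_columns)
print(drift)
```

A report like this, run before each load, shows how often the source changes and which changes would break the load, which helps decide whether manual handling is sustainable.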

Re-architect

Some enterprises manage volatile source schemata manually. Employing a Biml-Driven Architecture (BDA) automates at least some of that manual effort, and can often eliminate it entirely.

BDA can also facilitate loading a common target with dozens (or hundreds) of sources.

Perhaps most intriguing, BDA can reduce the time-to-analyze for data, allowing data scientists to begin experimenting with the data sooner and, hopefully, expediting results.

Redesign

Support and maintenance are too often overlooked when considering latency and staleness. Consider: while your team is troubleshooting an issue with the enterprise data integration solution, the data in the target is becoming more stale, and the latency of data waiting to be loaded is increasing, at a rate of one minute per minute. Employing design patterns is one way to reduce support and maintenance overhead. If most (or all) of your data integration solution employs the same (or similar) design patterns, team members who understand one package understand many (or all) packages. Implementing a BDA once design patterns are known adds even more efficiency to enterprise data integration support and maintenance.

Software design best practices cannot be overemphasized. Separation of concerns, decoupling, coding to contracts, testing, source control, and Data Integration Lifecycle Management (DILM) save more time and money than business owners realize.

Conclusion

At Enterprise Data & Analytics, we help enterprises build faster data integration solutions, build data integration solutions faster, and make data integration execution more manageable. If you are interested in learning more, please contact us.

:{>

Learn More:
Biml in the Enterprise Data Integration Lifecycle (Password: BimlRocks)
From Zero to Biml – 19-22 Jun 2017, London 
IESSIS1: Immersion Event on Learning SQL Server Integration Services – 2-6 Oct 2017, Chicago

Tools:
SSIS Framework Community Edition
Biml Express Metadata Framework
SSIS Catalog Compare
DILM Suite

Andy Leonard

andyleonard.blog

Christian, husband, dad, grandpa, Data Philosopher, Data Engineer, Azure Data Factory, SSIS guy, and farmer. I was cloud before cloud was cool. :{>
