Data Science, Unicorns, and Some Things That Never Change

This post is a follow-up to Mind the Gap, where I wrote about Δ, the difference between our perception of ourselves and others’ perception of us. In Mind the Gap, I focused on the personal and community ramifications of minimizing your Δ. In this post, I write about applying these same concepts to data science, analytics, and IT.

Before we proceed, please take 46 seconds and view this old IBM commercial. It’s a great spot that highlights some of the gaps between business and IT solutions. I have to credit Kent Bradshaw: we were discussing this post and he remembered the commercial. I can hear you thinking, …

“How Does Δ Affect Data Science and Analytics, Andy?”

[Image: unicorn graphic with Jen Underwood’s ETL and OLAP quote]

In her post titled Analytics Market Commoditization and Consolidation, analytics guru Jen Underwood (blog | Impact Analytix) shares the unicorn image at left – a reference to Lars Nielsen’s (excellent) book Unicorns Among Us – along with the quote:

“The ‘unicorn’ the CEO can’t find is someone that knows old ETL and OLAP.”
– Jen Underwood, in Analytics Market Commoditization and Consolidation [bold emphasis mine]

Why Do ETL and OLAP Remain Important?

Extract, Transform, and Load (ETL) and Online Analytical Processing (OLAP) skills remain important because garbage in still leads to garbage out. We can now get aggregated and scaled garbage out – supported by stellar graphics engines executing on much faster GPUs. As I’ve shared in several presentations over the years, the data quality of a useless data warehouse often exceeds 99%. Why is the data in the data warehouse (or data lake, data store, data puddle, data closet) useless? Because losses accumulate; gains don’t. Aggregating the data for predictive analytics compounds the problem, amplifying minor variations in data quality into a major Δ between expected and actual results.
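To put a rough number on "losses accumulate," here is a minimal sketch in Python. It rests on a simplifying assumption of mine, not on measurements from any real warehouse: each row is correct with the same probability, independently of every other row, and a single bad row is enough to taint an aggregate built on top of it. The function name and the 99% figure are illustrative only.

```python
def untainted_fraction(row_quality: float, rows_per_aggregate: int) -> float:
    """Chance that an aggregate contains no bad rows.

    Assumes (hypothetically) that every row is correct with probability
    `row_quality`, independently of the others, so the chance that all
    `rows_per_aggregate` rows are correct is the product of the
    individual probabilities.
    """
    if not 0.0 <= row_quality <= 1.0:
        raise ValueError("row_quality must be between 0 and 1")
    if rows_per_aggregate < 0:
        raise ValueError("rows_per_aggregate must be non-negative")
    return row_quality ** rows_per_aggregate


if __name__ == "__main__":
    # 99% row-level quality, rolled up into aggregates of increasing size.
    for n in (1, 10, 100, 500):
        share = untainted_fraction(0.99, n)
        print(f"{n:>4} rows per aggregate: {share:.1%} built only from clean rows")
```

Under these assumptions, only a little over a third of the aggregates built from 100 rows each rest entirely on clean data, even though the rows themselves are 99% clean. Real errors are rarely independent, and one bad row does not always ruin a sum or an average, so treat this as an illustration of the compounding effect rather than a prediction.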

A Problem

As Jen points out earlier in her Analytics Market Commoditization and Consolidation post (you should read it all – it’s awesome – like all of Jen’s posts!), many analytics solution providers share the “Same look, same marketing story, same saves time and allows users [to] avoid evil IT.”

I can hear some of you thinking, “Are you telling us analytics doesn’t work, Andy?” Goodness no. I’m telling you hype and sales strategy work in the analytics market as well as anywhere. When asked why a solution may not perform to expectations, the #1 response is “your data is not clean.”

[Image: quote, “No one’s data is clean”]

The longest pole in data science is still data integration: data wrangling, munging, etc. The need for high-quality data is never going away.

A Solution

The value of data science and analytics solutions rides on the quality of the data in the data store. There are no workarounds for garbage-in-garbage-out. So how do we improve the quality of the data?

Automate

As I wrote in Why Automate?, automation can reduce redundant and repetitive work, thereby mitigating human error. At Enterprise Data & Analytics, we focus on data integration automation. How? Here’s our mission:

[Image: Accomplish better data integration]

We don’t help “users avoid evil IT,” we help IT better serve the enterprise by speeding up data integration development, managing execution, and improving performance.

Contact us. We are here to help.

:{>

Learn More:

Designing an SSIS Framework (recording)
Biml in the Enterprise Data Integration Lifecycle (recording)
From Zero to Biml – 19-22 Jun 2017, London 
IESSIS1: Immersion Event on Learning SQL Server Integration Services – Oct 2017, Chicago

Andy Leonard

andyleonard.blog

Christian, husband, dad, grandpa, Data Philosopher, Data Engineer, Azure Data Factory, SSIS guy, and farmer. I was cloud before cloud was cool. :{>
