Data Integration is the Foundation

Unless you live under a rock, you’ve seen the buzz about Data Lakes, Big Data, Data Mining, Cloud-tech, and Machine Learning. I watch and read reports from two perspectives: as a consultant and as an engineer.

As a Consultant

If you watch CNBC, you won’t hear discussions about ETL Incremental Load or Slowly Changing Dimensions Design Patterns. You will hear words like “cloud” and “big data,” though. That means viewers who respect the people on CNBC are going to hire consultants who are knowledgeable about cloud technology and Big Data.

As an Engineer

I started working with computers in 1975. Since that time, I believe I’ve witnessed about one major paradigm shift per decade. I believe I am now witnessing two at the same time: 1) a revolution in Machine Learning and all the things it touches (which includes Big Data and Data Lakes); and 2) the Cloud. These two are combining in some very interesting ways. Data Lakes and Big Data appliances and systems are the data sources for many solutions; Machine Learning and Data Mining are but a couple of their consumers. At the same time, much of this technology and storage is either migrating to the Cloud, or is being built there (and in some cases, only there). But all of this awesome technology depends on something…

Data

In order for Machine Learning or Data Mining to work, there has to be data in the Data Lake or in the Big Data appliance or system. Without data, the Data Lake is dry. Without data, there’s no “Big” in Big Data. How do these solutions acquire data?

It Depends

Some of these new systems have access to data locally. But many of them – most, if I may be so bold – require data to be rounded up from myriad sources. Hence my claim that data integration is the foundation for these new solutions.

What is Data Integration and Why is it Important?

Data integration is the collection of data from myriad, disparate sources into a single repository (or as few repositories as possible). It’s “shipping” the data from where it is to someplace “nearer.” Why is this important? Internet connection speeds are awesome these days. I have – literally – 20,000 times more bandwidth than when I first connected to the internet. But modern internet connections are still hundreds to millions of times slower than networks running inside data centers. Computing power – measured in cycles or floating-point operations per second – is certainly required to perform today’s magic with Machine Learning. But what if the servers must wait hours (or longer) for data instead of milliseconds? The magic happens in slow motion. In slow motion, magic doesn’t look awesome at all.

Trust me, speed matters.
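To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the 1 TB dataset, the 100 Mbps internet link, the 10 Gbps data center link, and the helper name transfer_seconds. It also ignores latency and protocol overhead, so real transfers would take longer.

```python
def transfer_seconds(size_bytes: float, bits_per_second: float) -> float:
    """Idealized time to move size_bytes over a link.

    Ignores latency, protocol overhead, and contention, so this is a
    lower bound rather than a prediction.
    """
    if bits_per_second <= 0:
        raise ValueError("bits_per_second must be positive")
    return size_bytes * 8 / bits_per_second


# Hypothetical figures, chosen only to illustrate the gap.
DATASET_BYTES = 1e12  # 1 TB
LINKS = {
    "internet link (100 Mbps)": 100e6,
    "data center link (10 Gbps)": 10e9,
}

for name, bps in LINKS.items():
    hours = transfer_seconds(DATASET_BYTES, bps) / 3600
    print(f"{name}: {hours:.2f} hours")
```

With these assumed figures, the same terabyte takes most of a day over the internet link and a little over 13 minutes inside the data center. Moving the data "nearer" ahead of time is what closes that gap.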

Data integration is the foundation on which most of these systems depend. Some important questions to consider:

  • Are you getting the most out of your enterprise data integration?
  • Could your enterprise benefit from faster access to data – perhaps even near real-time business intelligence?
  • How can you improve your enterprise data integration solutions?

:{>

Learn more:

Enterprise Data & Analytics
Stairway to Integration Services
IESSIS1: Immersion Event on Learning SQL Server Integration Services
EnterpriseDNA Training

Andy Leonard

andyleonard.blog

Christian, husband, dad, grandpa, Data Philosopher, Data Engineer, Azure Data Factory, SSIS guy, and farmer. I was cloud before cloud was cool. :{>

3 thoughts on “Data Integration is the Foundation”

  1. Good point, well made.
    Trick is how one gets that across to the phb without sounding reactionary or defeatist.  (Yeah, I’m reading Stalingrad, finally… so many parallels to data management)

  2. Antony Beevor’s rather large one. I’d read Harrison Salisbury’s The 900 Days, about the siege of Leningrad, some years ago, and they both really make one think.
    Puts a little perspective into your day.
