How to build a Data Swamp!

For those of you looking to build a data swamp, here’s what you need to do…

  1. Start with technology first; don’t worry about the Business. That can come later. Make sure your technology can store as much data as possible. You are going to need it, because we are going to put ALL of our data here. One day it may be useful!
  2. Get as much data as possible…Sales, Marketing, Finance, HR…get all of it. One day it may be useful!
  3. Don’t worry about understanding the data. Understanding the data will take time, so don’t worry about this step. Just load it!

Ok…so I’ll skip steps 4 and 5…as the Data Swamp is of course a BAD IDEA!

Yesterday I watched a great Webinar on Data Governance by Andrew White at Gartner. That’s what inspired me to write this blog post (you can find the link below to watch the recording).

As the term ‘Data Lake’ is used more frequently, I see the same problems occurring that I have seen over the past 10 years of working with Enterprise Data Warehousing solutions. The ambition to have ALL of your enterprise data in a single place is a noble one, and I get it! The reality of ever getting there, however, is very different.

With the cloud, it’s become easier and cheaper to store large volumes of data. But data on its own is NOT VALUABLE. The value comes only when you can leverage the data to drive results in your business.

So what are your corporate or departmental objectives? How will they drive better growth, increased profits or improved customer satisfaction?

Simon Sinek’s book title ‘Start with WHY’ sums up the problem. If we don’t know WHY we are extracting, loading and transforming data, or WHY we are spending money loading our data into the Data Lake or EDW (Enterprise Data Warehouse), then should we really be doing it?

Gartner’s value triangle places ‘Business Outcomes’ right at the top! This is the first place we should begin. If you don’t understand the ‘Business Outcomes’ that you are seeking to achieve, then you really don’t understand the WHY.

If your platform is flexible and your data models (for structured data) are agile and extendable, then start with a specific high-value business problem and deliver value quickly. Then extend your model and introduce new data to solve the next problem.

Most of my clients will tell you that I’ve always been a huge advocate of ‘Business Outcomes’ first, and getting to the WHY! Andrew’s Webinar summed this up superbly, and anyone involved in or starting a Data Lake, EDW or MDM (Master Data Management) programme should take a look…before it’s too late!

I’d love to hear your thoughts and experiences, so please comment below.

Reference:
White, A. (2018) Effective Data and Analytics Governance – Finally! Gartner.
https://www.gartner.com/webinar/3873115/player?commId=317939&channelId=5502&srcId=1-4411694165&webinarType=UpcomingEvent
