When I first heard of MADlib, the first things that came to my mind were the comical game and the hip-hop rapper. In the context of Big Data, MADlib is actually an open source project for Magnetic, Agile, Deep (MAD) data analysis, an orthogonal approach to traditional Enterprise Data Warehouses and Business Intelligence. The primary goal of MADlib is to accelerate innovation in the Data Science community via a shared library of scalable in-database analytics.
One of the strongest supporters of and contributors to MADlib is EMC Greenplum; MADlib is currently ported to the Greenplum Database as well as the PostgreSQL Database. Since I am employed by EMC, I had the luxury of chatting with MADlib Architect Caleb Welton and MADlib Product Manager Gaurav Kumar over coffee in the Greenplum break room.
Can you describe how MADlib is Magnetic, Agile, and Deep?
Yes, Data Scientists do speak. In fact, the Data Scientist I spoke with for this blog piece is articulate, business savvy, and well polished. The term ‘mad scientist’ still applies to Data Scientists, as they are addicted to the iterative process of using knowledge, assumptions, and intuition to generate results from unruly data. The difference is that these information churners are not locked away in a lab, experimenting with data in isolation. Instead, they put themselves in the shoes of the business and work with it to turn abstract business ideas into tangible business value such as new revenue streams or improved operational efficiency.
Noelle Sio is a leading member of the EMC Greenplum Data Science Dream Team. My interview with her confirmed that Data Scientists are not just nerds, but rather, cool intellects with a diverse set of technical and interpersonal skills.
What skills are needed to be a successful data scientist?
During my first few days at EMC, I was eagerly waiting to meet that one person who could tell me something about Big Data that I didn’t already know. I kept hearing verbose, repetitive concepts like “Big Data is about Volume, Velocity, and Variety” or “Big Data allows you to harness information for competitive advantage.” I wanted something unique, meaningful, and tangible.
Enter Bill Schmarzo.
I like to call him Bill ‘Smart’zo, and he is the CTO of the Enterprise Information Management and Analytic Service line at EMC. Bill is my personal Big Data Hero because he saved me from the unwanted noise and gave me the meaningful information I was seeking. Oh, the Big Data irony!
I interviewed Bill Schmarzo to capture and share with you the nuggets of information I gained from a sea of high-speed, diverse Big Data conversations.
You work with organizations in developing Big Data solutions that solve real business problems. What is the biggest misconception these organizations have when it comes to implementing a Big Data solution?