Big Data: Understanding How Data Powers Big Business is yet another Big Data book to hit the market. What makes this book unique? It offers practical advice and hands-on exercises, so that by the time you finish it you have a Big Data action plan tailored to your business. I spoke to the author, William Schmarzo, EMC's own preeminent Big Data expert, to learn about the goals of his book and why organizations grappling with Big Data should pick it up.
1. What qualifies you to provide practical advice on developing Big Data strategies?
Traditional BI makes it very difficult for the people in the business who know the story behind the data to gain direct access to that data. Instead, they submit data requirements to IT, and when IT finally delivers the data, it is typically only a subset, incomplete, and in the wrong format. When data gets lost in translation, business users become frustrated, abandon analytics altogether, and operate on hunches and guesses. Fortunately, Tableau solves this problem through its Self Service BI paradigm, whereby any user in the organization can quickly gain direct access to the data they need, with the flexibility to create any visualization imaginable (goodbye Excel!). But wait, there is more. Tableau has partnered with Pivotal to add a social element to these Self Service BI capabilities, so that people in the business, data scientists, and IT can come together as a team to collaborate around data sets, visualizations, predictive models, and more to uncover new and better insights. The result: Big Data No Longer Lost in Translation.
I spoke with Ted Wasserman, a Product Manager at Tableau, to learn more about the value of their technology and their partnership with Pivotal.
1. Let's first talk about Tableau. What part of the analytical process does Tableau fit into, and what problems does it solve?
To accelerate the value of Big Data, many products have been developed to make data managed in Hadoop much easier to access and analyze through SQL. First there was Hive, which provides a SQL abstraction layer by converting SQL queries into MapReduce jobs. More recently, Cloudera announced Impala, which bypasses MapReduce to enable interactive queries on data stored in Hadoop using the same SQL variant that Hive uses. And today, EMC Greenplum announced Pivotal HD, the only high-performing, true SQL query engine on top of Hadoop. Don't be confused by these different approaches; they share a common thread: leveraging Hadoop as a Big Data platform for running SQL queries. The major difference with Pivotal HD is that there is now a single, scalable, flexible, and cost-effective data platform for all of your analytic needs.
I spoke with Greenplum Chief Scientist Milind Bhandarkar, who explained this breakthrough SQL interface to Hadoop.
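To make the Hive approach a little more concrete, here is a toy, single-process Python sketch of how a simple aggregate query can be decomposed into map, shuffle, and reduce phases. It is a simplified illustration under stated assumptions, not Hive's actual query planner: the `sales` rows, their columns, and the function names are all hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical "sales" table: each row is (region, amount).
rows = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]


def map_phase(row):
    """Map: emit a (group_key, value) pair for each input row."""
    region, amount = row
    yield region, amount


def shuffle(pairs):
    """Shuffle: collect all values sharing a key (the framework does this in MapReduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: apply the aggregate function (SUM here) to one group."""
    return key, sum(values)


def run_query(table):
    # Roughly what a SQL-on-MapReduce layer does for:
    #   SELECT region, SUM(amount) FROM sales GROUP BY region
    mapped = chain.from_iterable(map_phase(r) for r in table)
    grouped = shuffle(mapped)
    # Sort keys so the output order is deterministic.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(run_query(rows))  # {'east': 17, 'north': 3, 'west': 6}
```

In a real cluster each phase runs in parallel across many nodes, and a complex query may compile into a chain of such jobs. Engines such as Impala skip this MapReduce translation, which is what enables interactive queries.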
1. How does Pivotal HD provide a true, high performing SQL interface to Hadoop?
The announcement of the OpenChorus Project a few months ago provided a glimpse into the upcoming EMC Greenplum Chorus 2.2 release and its superb integrations to accelerate time to value for Big Data. Chorus 2.2 provides a single platform where users gain direct access to filtered, clean Twitter feeds from Gnip, perform advanced analysis faster with on-demand assistance from expert Kaggle data scientists, and share insights seamlessly through Tableau's advanced visualizations.
Chorus 2.2 is now available for free download, and the same code base is also available through the OpenChorus Project. For those of you not familiar with Chorus, it is the only collaborative Data Science platform that streamlines the complex analytic process, enabling users to quickly create their own sandboxes and easily collaborate around data sets, analyses, and findings. Open-sourcing Chorus also brings greater freedom: anyone can download the source code, get started, and modify and extend it for any environment. It also fosters an ecosystem of applications and startups around Big Data, adding extensibility to the product at a much higher velocity than we could achieve on our own. For example, the Greenplum Chorus 2.2 release includes valuable contributions from the partners mentioned earlier: Gnip, Kaggle, and Tableau. Have I piqued your interest in downloading Chorus 2.2? Here is a Q&A I conducted with Logan Lee, Director of Product Management at EMC Greenplum, to set you up for success.
1. What are the system requirements or prerequisites for Chorus 2.2?