To accelerate the value of Big Data, many products have been developed to make data managed in Hadoop much easier to access and analyze through SQL. First there was Hive, which provides a SQL query abstraction layer by converting SQL queries into MapReduce jobs. More recently, Cloudera announced Impala, which bypasses MapReduce to enable interactive queries on data stored in Hadoop using the same variant of SQL that Hive uses. And today, EMC Greenplum announced Pivotal HD, the only high-performing, true SQL query engine on top of Hadoop. Don’t be confused by these approaches, as there is a common thread – to leverage Hadoop as a Big Data platform for running SQL queries. The major difference with Pivotal HD is that now there is a single, scalable, flexible, and cost-effective data platform for all of your analytic needs.
I asked Greenplum Chief Scientist Milind Bhandarkar to explain this breakthrough SQL interface to Hadoop.
1. How does Pivotal HD provide a true, high performing SQL interface to Hadoop?
The announcement of the OpenChorus Project a few months ago provided a glimpse into the upcoming EMC Greenplum Chorus 2.2 release and its superb integrations to accelerate Big Data time to value. Chorus 2.2 provides a single platform where users gain direct access to filtered and clean Twitter feeds from Gnip, perform advanced analysis faster with on-demand assistance from expert Kaggle data scientists, and share insights seamlessly through Tableau advanced visualizations.
Chorus 2.2 is now available for free download, with the same code base also available through the OpenChorus Project download. For those of you not familiar with Chorus, it is the only collaborative Data Science platform that streamlines the complex analytic process, enabling users to quickly create their own sandboxes and easily collaborate around data sets, analysis, and findings. Additionally, open sourcing Chorus brings greater freedom. Anyone can download the source code and get started, modifying and extending it to any environment. This also promotes an ecosystem of applications and startups around Big Data applications, bringing extensibility into the product at a much higher velocity than we would be able to achieve on our own. For example, the release of Greenplum Chorus 2.2 includes valuable contributions from the partners I mentioned earlier – Gnip, Kaggle, and Tableau. Have I piqued your interest in downloading Chorus 2.2? Here is a Q&A I conducted with Logan Lee, Director of Product Management at EMC Greenplum, to prepare you for success.
1. What are the system requirements or prerequisites for Chorus 2.2?