The path to competitive advantage lies in being able to make predictions from Big Data. The more you can build predictive analytics into your business processes, the more successful your organization will become. Open-source R has become a leading programming language for predictive analytics, and thanks to Revolution Analytics, R has the enterprise capabilities needed to drive adoption across the organization and to enable every employee to make data-driven decisions.
Revolution Analytics is to R what Red Hat is to the Linux operating system: a company devoted to enhancing and supporting open-source software for enterprise deployments. For example, Revolution Analytics recently released Revolution R Enterprise 7 to meet the performance demands of Big Data, allowing R to run natively within Hadoop and data warehouses. I spoke with David Smith, VP of Marketing at Revolution Analytics, about how the company has accelerated the adoption of R in the enterprise.
1. What benefits does Revolution Analytics provide to organizations over just using open-source R?
To accelerate the value of Big Data, many products have been developed to make data managed in Hadoop much easier to access and analyze through SQL. First there was Hive, which provides a SQL query abstraction layer by converting SQL queries into MapReduce jobs. More recently, Cloudera announced Impala, which bypasses MapReduce to enable interactive queries on data stored in Hadoop using the same variant of SQL that Hive uses. And today, EMC Greenplum announced Pivotal HD, which it describes as the only high-performing, true SQL query engine on top of Hadoop. Don't be confused by these different approaches; they share a common thread: using Hadoop as a Big Data platform for running SQL queries. The major difference with Pivotal HD is that there is now a single, scalable, flexible, and cost-effective data platform for all of your analytic needs.
I spoke with Greenplum Chief Scientist Milind Bhandarkar about this new SQL interface to Hadoop.
1. How does Pivotal HD provide a true, high-performing SQL interface to Hadoop?
The OpenChorus project is one of the first serious attempts to help companies succeed with Big Data. How? The main barrier to success has been a shortage of data science talent and of the tools needed to address Big Data analytic challenges. Open-sourcing Greenplum Chorus is an attempt to rapidly grow the data science community by giving it a rich analytic platform on which data scientists can easily gain insight, grow and share their skills, and ultimately deliver value from Big Data projects.
Partners, startups, and even individual developers can download the source code and build new Chorus-integrated Big Data applications and tools to meet the diverse requirements of different industries and business functions. For example, Greenplum Chorus 2.2, due at the end of this quarter, will include contributions from partners Gnip, Tableau, and Kaggle, enabling data scientists to incorporate Twitter data into their analyses, use advanced Tableau visualizations, and gain access to Kaggle's community of expert data scientists.
Check out the interview with Logan Lee, Director of Product Management at Greenplum, about the company's reasons for releasing the Chorus code and the types of contributions expected to create a much-needed data science movement.