The path to competitive advantage is being able to make predictions from Big Data. Therefore, the more you can build predictive analytics into your business processes, the more successful your organization will become. There is no doubt that open-source R is the programming language of choice for predictive analytics, and thanks to Revolution Analytics, R has the enterprise capabilities needed to drive adoption across the organization and for every employee to make data-driven decisions.
Revolution Analytics is to R what the vendor Red Hat is to the Linux operating system: a company devoted to enhancing and supporting open-source software for enterprise deployments. For example, Revolution Analytics recently released Revolution R Enterprise 7 to meet the performance demands of Big Data, so that R now runs natively within Hadoop and data warehouses. I spoke with David Smith, VP of Marketing at Revolution Analytics, who explained how Revolution Analytics has accelerated the adoption of R in the enterprise.
1. What benefits does Revolution Analytics provide to organizations over just using open-source R?
Revolution R Enterprise creates a culture that promotes analytics-driven decision-making by providing a consistent, validated platform, so developers can come together, collaborate, and reuse code to deploy applications across the enterprise.
Revolution R Enterprise is also architected for Big Data analytics, eliminating out-of-memory conditions, and enabling users to analyze data without moving it, build more accurate models using large data sets, and deploy models into production without recoding.
To protect existing investments in databases and applications, we give organizations the choice to deploy the same code on a wide variety of data management platforms, such as Hadoop, enterprise data warehouses, grids, clusters, servers, and workstations. We also provide a web services API so you can easily deliver or embed predictive results into existing enterprise applications and BI tools.
We provide training in the R language to expand the Big Data talent pool in the organization. For example, we can quickly train SAS users to use Revolution R Enterprise through courses available in classroom, virtual, self-paced, and blended learning setups. Our Professional Services team provides clients with expertise and education for services ranging from initial proof of value to application migration and technical staffing.
2. Statistical tools such as SAS, SPSS, and even MS Excel have been popular and around for a while. Why is the R programming language now the statistical tool of choice for analysts and data scientists over these other incumbent tools?
Because R is open source and free, it has been widely adopted, especially in universities offering data mining and statistics courses. So now you have a steady stream of students schooled in R, making it the language of choice by default.
But beyond being free and open source, R brings additional advantages to the table. It is a full-fledged programming language, not a GUI or drag-and-drop interface with a collection of procedures. While there is a bit of a learning curve for R programming, users are far more productive with R than with these other systems. I’ve heard from users that they can perform tasks within hours in R that take weeks with SAS.
Also, R programming is popular due to its extensibility, which drives innovation and adoption. The community has contributed approximately 5,000 freely available “packages” of documented R programs and data to CRAN (the Comprehensive R Archive Network). The advantage is that it is highly likely you will never need to develop a function from scratch to do what you need.
3. Can you provide an example of new statistical methods that can be created in R and are not available in these other tools?
Ensemble methods are a general class of statistical prediction techniques that originated in R and have only been available in R since their inception. The purpose of ensemble techniques is to aggregate multiple learned models with the goal of improving accuracy. Ensemble techniques really took off after they were used to win the Netflix Prize, a competition launched in 2006 to improve Netflix’s prediction accuracy by 10%.
R also offers powerful data visualization techniques, enabling users to tell unique stories about the complex data being analyzed. “ggplot2” is a commonly used visualization package only available in R, for example, and not available in any other tool such as MS Excel.
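To make the ensemble idea from this answer concrete, here is a minimal sketch of one such technique, bagging (bootstrap aggregating). It is not from the interview and is written in Python purely for illustration; the toy data, the function names, and the choice of a simple threshold rule as the weak model are all assumptions. Each model is trained on a random resample of the data, and the ensemble predicts by majority vote.

```python
import random
from statistics import mean


def fit_threshold(sample):
    """Learn one weak model: the midpoint between the two class means."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    return (mean(zeros) + mean(ones)) / 2


def bootstrap(data, rng):
    """Draw len(data) rows with replacement, retrying until both classes appear."""
    while True:
        sample = [rng.choice(data) for _ in data]
        if {y for _, y in sample} == {0, 1}:
            return sample


def fit_ensemble(data, n_models=25, seed=0):
    """Train n_models weak models, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_threshold(bootstrap(data, rng)) for _ in range(n_models)]


def predict(thresholds, x):
    """Aggregate by majority vote: a model votes 1 if x exceeds its threshold."""
    votes = sum(1 for t in thresholds if x > t)
    return 1 if votes * 2 > len(thresholds) else 0


# Hypothetical toy data: small x values are class 0, large ones are class 1.
data = [(float(x), 0) for x in range(10)] + [(float(x), 1) for x in range(10, 20)]
ensemble = fit_ensemble(data)

print(predict(ensemble, 2.0), predict(ensemble, 17.0))
```

Each individual threshold varies with its resample, but the vote across all 25 models is more stable than any single one, which is the accuracy benefit ensembles aim for.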
4. What are the prerequisites for learning R?
R is a fourth-generation object-oriented programming language, so it is not a big leap to learn R programming if you have already learned languages such as C, Java, and Python. Because R is open source, there is the added advantage of free online training resources developed by the community.
5. What value does Revolution Analytics provide for R programming?
Revolution Analytics provides a more productive, high-performance environment for organizations developing in the R language. On the Windows platform we provide a Visual Studio-based IDE for the R language (called Revolution R Enterprise) so users can develop, debug, and manage code better. We also make R run faster without changing any of the code (up to 20 times faster in some cases) by linking R with multi-threaded libraries. And to address the in-memory constraints of the R language, we write extensions to standard algorithms such as decision trees and linear regression to eliminate data size restrictions and make them run faster for Big Data on servers, in Hadoop, and in databases.
6. Like R, Python is gaining traction for data analysis, and there is industry buzz that Python may overtake R in popularity due to its general-purpose nature. What is your take on R versus Python?
If you look at the data, it’s pretty clear that both R and Python are growing rapidly, but the evidence favors R growing faster for data science applications. I’ve written more on this topic in a recent post at the Revolutions blog.