Revolution Analytics Boosts the Adoption of R in the Enterprise

The path to competitive advantage is being able to make predictions from Big Data. The more you can build predictive analytics into your business processes, the more successful your organization will become. There is no doubt that open-source R is the programming language of choice for predictive analytics, and thanks to Revolution Analytics, R has the enterprise capabilities needed to drive adoption across the organization and let every employee make data-driven decisions.

Revolution Analytics is to R what Red Hat is to the Linux operating system: a company devoted to enhancing and supporting open-source software for enterprise deployments. For example, Revolution Analytics recently released R Enterprise 7 to meet the performance demands of Big Data, so that R now runs natively within Hadoop and data warehouses. I spoke with David Smith, VP of Marketing at Revolution Analytics, about how the company has accelerated the adoption of R in the enterprise.

1.  What benefits does Revolution Analytics provide to organizations over just using open-source R?


Want to Explore Hadoop, But No Tour Guide? EMC Just Published a Step-By-Step Guide

Are you a VMware vSphere customer? Do you also own EMC Isilon? If you said yes to both, I have great news for you: you have all the ingredients for the EMC Hadoop Starter Kit (HSK). In just a few short hours you can spin up a virtualized Hadoop cluster by downloading the HSK step-by-step guide. Watch the demo below of HSK being used to deploy Hadoop:

Now you don’t have to imagine what Hadoop tastes like because this starter kit is designed to help you execute and discover the potential of Hadoop within your organization. Whether you are new to Hadoop or an experienced Hadoop user, you will want to take advantage of this turnkey solution for the following reasons:

- Rapid provisioning – From the creation of virtual Hadoop nodes to starting up the Hadoop services on the cluster, much of the Hadoop cluster deployment can be automated, requiring little expertise on the user’s part.

- High availability – HA protection can be provided through the virtualization platform to protect the single points of failure in the Hadoop system, such as the NameNode and JobTracker virtual machines.

- Elasticity – Hadoop capacity can be scaled up and down on demand in a virtual environment, allowing the same physical infrastructure to be shared among Hadoop and other applications.

- Multi-tenancy – Different tenants running Hadoop can be isolated in separate VMs, providing stronger VM-grade resource and security isolation.

- Portability – Use any Hadoop distribution (Apache Open Source, Pivotal HD, Cloudera, Hortonworks) throughout the Big Data application lifecycle with zero data migration.

I spoke with the creator of this starter kit, James F. Ruddy, Principal Architect for the EMC Office of the CTO, about why every organization that uses VMware vSphere and EMC Isilon should use this starter kit for Big Data projects.

1.  Why did you create the starter kit and what are the best use cases for this starter kit?


NSA Outrage. Are We Really That Unaware?

The minute we happily turn on our electronic devices, we voluntarily expose ourselves. Sure, we think our emails and transactions are encrypted and unreadable, our location services are turned off, and our FB settings do not allow unauthorized users to see our most private thoughts. But the truth of the matter is that we all know, deep down inside, that nothing is foolproof and that there are people and bots out there capable of accessing our personal data. The NSA unfortunately did not have the class and expertise to secretly hack into systems unnoticed like the other bad guys, but instead asked for the data directly, which makes it unexpected and worse in the eyes of democracy.

As an optimist, I personally think the U.S. government embracing Big Data is a good thing. Perhaps the whole NSA Prism debacle can be thought of as a New Age Search Warrant, commanding that all necessary data in the universe be collected and analyzed for the greater good. For those of you outraged by this breach of privacy, have you stood by your beliefs and terminated your Verizon cell phone service, deleted your FB account, and eliminated all email communications? My guess is no, because you probably don’t actually know the extent of the NSA violation and the digital world is far more interesting than personal privacy. So before you go out protesting or draw conclusions based on the Patriot Act, read this interview with Technology Evangelist and Big Data influencer Theo Priestley to understand exactly how much power Prism holds over your personal data.

1.  What sources of data has the NSA secured in Prism without our consent? And how is this different from corporations and data brokers misusing our personal data?



Dreaming of Building a 1000-Node Hadoop Cluster?

The dream is real for EMC Greenplum: a 1000-node Hadoop cluster, a.k.a. the Analytics Workbench, went live on May 22, 2012 during EMC World. When I first heard about this large-scale Analytics Workbench project, I immediately thought how harmful it must be for the environment. What is the point of creating the world’s largest environment for Hadoop testing and development? Well, the joke is on me, because this Big Data platform will facilitate ground-breaking insight to improve the quality of life AND create a greener environment.

I wanted to speak to the person who led the effort in creating this Big Data masterpiece: Apurva Desai, Sr. Director of Hadoop Engineering at EMC Greenplum. Apurva worked with our internal systems integration team, led by Gerg Robidoux, to architect the cluster, and managed a team to build and test it. It has been rumored that Apurva’s blood is on the cluster, since there were many cuts and scrapes suffered while putting it together. Watch the stop-motion video clip of the Analytics Workbench being built by Apurva’s team.

Creating a 1000-node Apache Hadoop cluster seems like a dubious task. Why did you take on the project?
