Destination Data Lake: Accelerating the Big Data Journey

Most people understand that big data and analytics can have a positive impact on their business. What trips them up is how to make that happen. EMC’s answer to that complex challenge is the EMC Business Data Lake, the industry’s first fully engineered, enterprise-grade data lake that’s redefining big data.  For details, check out the virtual launch event.


I spoke with Aidan O’Brien, Senior Director of EMC’s Strategic Big Data Initiative, and asked him why he’s excited about EMC Business Data Lake and why it sets a precedent in the world of big data analytics.

1.  What extraordinary outcomes can companies achieve with big data analytics?

There are many well-understood examples already. A great one is Rolls-Royce. Instead of selling their high-end jet engines, they practically give them away and sell an associated multi-year, data-driven service contract.

The service contract helps customers minimize downtime if there’s a problem with components in the engine. Each component has sensors that capture performance and health data along with the aircraft’s coordinates. That’s all fed in real time to an analytics system at the company’s global service center. If any component performance abnormality is detected, the system kicks off an automated logistics process that ensures that the replacement part is shipped to the right gate even before the plane lands at its destination.

This returns planes to the air faster so airlines can get back to making money. In the meantime, Rolls-Royce has found an innovative way to strengthen their customer relationships.

What’s extraordinary about this, and many other similar examples, is that business value has so clearly shifted from the physical products that companies make to the data about those products.
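The monitoring loop described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of the actual Rolls-Royce system: the `Reading` fields, the three-standard-deviation rule, and the `process` function are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Reading:
    """One sensor sample (hypothetical schema for this sketch)."""
    component: str     # which engine component reported the value
    value: float       # e.g. a temperature or vibration level
    destination: str   # airport the aircraft is flying to


def is_abnormal(history, value, threshold=3.0):
    """Flag a value that lies more than `threshold` standard
    deviations from the mean of the component's past readings."""
    if len(history) < 2:
        return False  # not enough data to judge
    sigma = stdev(history)
    if sigma == 0:
        return value != history[0]
    return abs(value - mean(history)) / sigma > threshold


def process(readings, baseline):
    """Scan incoming readings and return (component, destination)
    pairs for which a replacement part should be dispatched."""
    orders = []
    for r in readings:
        history = baseline.setdefault(r.component, [])
        if is_abnormal(history, r.value):
            orders.append((r.component, r.destination))
        else:
            history.append(r.value)  # normal values extend the baseline
    return orders
```

In a real deployment the returned orders would feed a logistics system rather than a list, and the detection rule would likely be far more sophisticated. The point is only that the destination travels with the sensor data, so the part can be routed before the plane lands.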

2.  You speak to customers often about the power of big data analytics, but what are their key challenges in embracing big data?

The challenges vary based on an organization’s maturity level with big data. While every customer’s big data journey is unique for various business and technology reasons, we generally group companies into one of three buckets.

First are companies in the exploratory phase. They’ve heard about big data and are trying to work out what it is, how it’s different from business intelligence, what skills they need, and so on. The big challenge for these folks is figuring out how to identify the right opportunity to get started.

Some companies are a little farther along and have big data projects springing up all over the place. Their prime challenge is how to show meaningful value to the business from their various initiatives.

Then there are companies achieving big results with big data. Their challenges relate to making the necessary changes across people, process, data, and technology so that transformation and improved business performance stick.

3.  Why do you think EMC’s approach is appropriate for these companies?

What excites me about our approach is that we have an engineered solution and a range of services offerings that can help companies address the challenges at each phase of their big data journey.

For example, when first starting out on a big data journey, companies usually want to understand exactly what is possible and then identify and focus on key use cases. That’s what EMC’s Big Data Vision Workshop is all about. It gets IT and business stakeholders on the same page so they can prioritize use cases that are feasible and expected to deliver meaningful outcomes for their business.

Companies trying to get their hands dirty and build skills in data science, machine learning and rapid application development can use our Proof of Value Service. This helps them deploy a small but viable analytics project to demonstrate ROI for a target use case.

And for more mature companies struggling to manage and scale their big data infrastructure, we offer EMC’s Technology Onboarding Service. This includes consulting and deployment services to move them quickly to the EMC Business Data Lake.

We also see a number of more mature companies that already know which business application they want. For these customers, we look to engage them via the Pivotal Labs group. That engagement also tends to lead to the implementation of the underpinning EMC Business Data Lake.

4.  How does the Federation Business Data Lake accelerate adoption of big data analytics to achieve these kinds of results?

EMC Business Data Lake enables more people to benefit from big data quickly and effectively. We see customers struggling for weeks and months to instantiate these complex environments.

The EMC Business Data Lake is a platform that delivers greater standardization to help people stand up these environments more quickly. Yet it also provides flexibility by letting people select the different technology products they need to deliver on a particular use case. As much as we’re seeking to make the job of the IT operator easier, the ultimate goal is to provide a self-service big data environment for the wide variety of people involved in big data, including data scientists, application developers, and line-of-business analysts.

5.  What makes the EMC Business Data Lake unique?

Clearly, being the first fully engineered, enterprise-grade business data lake in the industry is important, as is its ability to bring together data, analytics and applications. To me, what makes the EMC Business Data Lake stand out the most is the way it combines our top Federation technologies with the ecosystem of third-party products.

Because it’s built on a platform that embraces third-party technologies, new products can be easily embedded into the platform and made available to developers or data scientists almost immediately. Being able to evolve big data analytics environments over time as technology changes is critical. Traditional, physical infrastructures simply aren’t agile enough to keep up with that pace of technology change.

The prospect of EMC and the Federation being able to keep up to date with the rapid change in the big data market is why I’m so excited about the EMC Business Data Lake.

EMC Hadoop Starter Kit: Creating a Smarter Data Lake

Pivotal HD offers a wide variety of data processing technologies for Hadoop – real-time, interactive, and batch. Add EMC Isilon scale-out NAS as integrated data storage for Pivotal HD and you have a shared data repository with multi-protocol support, including HDFS, to service a wide variety of data processing requests. This smells like a Data Lake to me – a general-purpose data storage and processing resource center where Big Data applications can develop and evolve. Add EMC ViPR software-defined storage to the mix and you have the smartest Data Lake in town, one that supports additional protocols and hardware and automatically adapts to changing workload demands to optimize application performance.

EMC Hadoop Starter Kit, ViPR Edition, now makes it easier to deploy this ‘smart’ Data Lake with Pivotal HD and other Hadoop distributions such as Cloudera and Hortonworks. Simply download this step-by-step guide and you can quickly deploy a Hadoop or a Big Data analytics environment, configuring Hadoop to utilize ViPR for HDFS, with Isilon hosting the Object/HDFS data service.  Although in this guide Isilon is the storage array that ViPR deploys objects to, other storage platforms are also supported – EMC VNX, NetApp, OpenStack Swift and Amazon S3.

I spoke with the creator of this starter kit, James F. Ruddy, Principal Architect for the EMC Office of the CTO, and asked him to explain why every organization should use this starter kit to optimize its IT infrastructure for Hadoop deployments.

1.  The original EMC Hadoop Starter Kit released last year was a huge success.  Why did you create ViPR Edition?


Revolution Analytics Boosts the Adoption of R in the Enterprise

The path to competitive advantage is being able to make predictions from Big Data. Therefore, the more you can build predictive analytics into your business processes, the more successful your organization will become. There is no doubt that open-source R is the programming language of choice for predictive analytics, and thanks to Revolution Analytics, R has the enterprise capabilities needed to drive adoption across the organization and for every employee to make data-driven decisions.

Revolution Analytics is to R what Red Hat is to the Linux operating system: a company devoted to enhancing and supporting open-source software for enterprise deployments. For example, Revolution Analytics recently released R Enterprise 7 to meet the performance demands of Big Data, with R now running natively within Hadoop and data warehouses. I spoke with David Smith, VP of Marketing at Revolution Analytics, and asked him to explain how Revolution Analytics has accelerated the adoption of R in the enterprise.

1.  What benefits does Revolution Analytics provide to organizations over just using open-source R?


Want to Explore Hadoop, But No Tour Guide? EMC Just Published a Step-By-Step Guide

Are you a VMware vSphere customer? Do you also own EMC Isilon? If you said yes to both, I have great news for you – you have all the ingredients for the EMC Hadoop Starter Kit (HSK). In just a few short hours you can spin up a virtualized Hadoop cluster by downloading the HSK step-by-step guide. Watch the demo below of HSK being used to deploy Hadoop:

Now you don’t have to imagine what Hadoop tastes like because this starter kit is designed to help you execute and discover the potential of Hadoop within your organization. Whether you are new to Hadoop or an experienced Hadoop user, you will want to take advantage of this turnkey solution for the following reasons:

-Rapid provisioning – From the creation of virtual Hadoop nodes to starting up the Hadoop services on the cluster, much of the Hadoop cluster deployment can be automated, requiring little expertise on the user’s part.

-High availability – HA protection can be provided through the virtualization platform to protect the single points of failure in the Hadoop system, such as NameNode and JobTracker Virtual Machines.

-Elasticity – Hadoop capacity can be scaled up and down on demand in a virtual environment, thus allowing the same physical infrastructure to be shared among Hadoop and other applications.

-Multi-tenancy – Different tenants running Hadoop can be isolated in separate VMs, providing stronger VM-grade resource and security isolation.

-Portability – Use any Hadoop distribution throughout the Big Data application lifecycle with zero data migration – Apache Open Source, Pivotal HD, Cloudera, Hortonworks.

I spoke with the creator of this starter kit, James F. Ruddy, Principal Architect for the EMC Office of the CTO, and asked him to explain why every organization that uses VMware vSphere and EMC Isilon should use this starter kit for Big Data projects.

1.  Why did you create the starter kit and what are the best use cases for this starter kit?
