Posts Tagged ‘cluster’

Break the cycle of deploying unwieldy Hadoop infrastructure

Chris Harrold

CTO Big Data Solutions at EMC
Chris is responsible for the development of large-scale analytics solutions for EMC customers around emerging analytics platform technologies. Currently, he is focused on EMC Business Data Lake Solutions and delivering this solution to key EMC customer accounts.


We are in a new data-driven age. With the rise in adoption of big data analytics as a decision-making tool comes the need to accelerate time-to-insights and deliver faster innovation informed by these new data-driven insights.

You know what? That’s a lot of mumbo-jumbo. Let’s boil it down to the real issue for IT: the tools that analysts and data science professionals need were not designed to be enterprise-friendly, and they can be unwieldy to deploy and manage. Specifically, I’m talking about Hadoop. Anything that requires provisioning and configuring a multitude of identical physical servers is always going to be the enemy of speed and reliability — even more so when those servers operate as a stand-alone, single-instance solution with no link to the rest of the IT ecosystem (the whole point of shared nothing). A shared-nothing architecture may work for experimentation, but it is a terrible thing to build a business on and to support as an IT operations person.

How do I know this? Because I have been that guy for 25 years!

To bridge the gap between data science experimentation and IT operational stability, new approaches are needed that provide operational resiliency without compromising the ability to rapidly deploy new analytical tools and solutions. That speed of deployment is essential to support the needs of developers and data scientists, but the complexity and unwieldy nature of traditional Hadoop infrastructure remains a major barrier to success for big data analytics projects.

Consider these questions and see if they sound familiar:


Dreaming of Building a 1000-Node Hadoop Cluster?

Mona Patel

Senior Manager, Big Data Solutions Marketing at EMC
Mona Patel is a Senior Manager for Big Data Marketing at EMC Corporation. With over 15 years of working with data at The Department of Water and Power, Air Touch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at EMC, a leader in Big Data.

The dream is real for EMC Greenplum: a 1000-node Hadoop cluster, a.k.a. the Analytics Workbench, went live May 22, 2012 during EMC World.  When I first heard about this large-scale Analytics Workbench project, I immediately thought how harmful it must be for the environment.  What is the point of creating the world’s largest environment for Hadoop testing and development?  Well, the joke is on me, because this Big Data platform will facilitate ground-breaking insight to improve the quality of life AND create a greener environment.

I wanted to speak to the person who led the effort in creating this Big Data masterpiece: Apurva Desai, Sr. Director of Hadoop Engineering at EMC Greenplum.  Apurva worked with our internal systems integration team, led by Gerg Robidoux, to architect the cluster, and he managed the team that built and tested it. It has been rumored that Apurva’s blood is on the cluster, given the many cuts and scrapes suffered while putting it together.  Watch the stop-motion video clip of the Analytics Workbench being built by Apurva’s team.

Creating a 1000-node Apache Hadoop cluster seems like a daunting task. Why did you take on the project?

