We are in a new data-driven age. With the rise in adoption of big data analytics as a decision-making tool comes the need to accelerate time-to-insights and deliver faster innovation informed by these new data-driven insights.

You know what? That’s a lot of mumbo-jumbo. Let’s boil it down to the real issue for IT: the tools that analysts and data science professionals need were never really designed to be enterprise-friendly, and they can be unwieldy to deploy and manage. Specifically, I’m talking about Hadoop. Anything that requires provisioning and configuring a multitude of identical physical servers is always going to be the enemy of speed and reliability. Even more so when those servers operate as a stand-alone, single-instance silo with no link to the rest of the IT ecosystem; in a shared-nothing architecture that isolation is the whole point, since every node owns its own compute and storage. Shared nothing may work for experimentation, but it is a terrible thing to build a business on, and a terrible thing to support if you are the IT operations person.

How do I know this? Because I have been that guy for 25 years!

To bridge the gap between data science experimentation and IT operational stability, we need new approaches that provide operational resiliency without compromising the ability to rapidly deploy new analytical tools and solutions. That speed of deployment is essential to supporting developers and data scientists. But the complexity and unwieldy nature of traditional Hadoop infrastructure remain a major barrier to success for big data analytics projects.

Consider these questions and see if they sound familiar:

  • Do you struggle with under-utilized resources in your big data analytics clusters?
  • Do you continually try to balance the growth of compute and storage in your Hadoop cluster environment?
  • Do you want to be able to extend your analytics toolbox beyond Hadoop but don’t want to manage the infrastructure that goes with it?

There are better ways to operationalize Hadoop and deliver the functionality the business needs, without sacrificing operational consistency or the ability to create and reconfigure big data tools on demand. There are far more effective ways to deploy your big data infrastructure and manage your tools, and there is a way to escape the trap of rebalancing and rebuilding your Hadoop platforms over and over again.

EMC is fortunate to share this vision with BlueData, and we have the same goals in mind: creating operational, enterprise-ready Hadoop using the time-tested principles of shared storage and virtualized infrastructure. Our Big Data team invites you to a BrightTALK webinar on December 8th to discuss this vision, explore solutions to the challenges outlined above, and share real-world examples from our customers’ deployments.

Click this link to access the webinar.

Chris Harrold

CTO, Big Data Solutions at EMC
Chris is responsible for developing large-scale analytics solutions for EMC customers around emerging analytics platform technologies. Currently, he is focused on the EMC Business Data Lake solution and on delivering it to key EMC customer accounts.
