Chris is responsible for developing large-scale analytics solutions for EMC customers, built on emerging analytics platform technologies. He is currently focused on the EMC Business Data Lake solution and on delivering it to key EMC customer accounts.
We are in a new data-driven age. With the rise in adoption of big data analytics as a decision-making tool comes the need to accelerate time-to-insights and deliver faster innovation informed by these new data-driven insights.
You know what? That’s a lot of mumbo-jumbo. Let’s boil it down to the real issue for IT: the tools that analysts and data science professionals need were not really designed to be enterprise-friendly, and they can be unwieldy to deploy and manage. Specifically, I’m talking about Hadoop. Anything that requires provisioning and configuring a multitude of identical physical servers is always going to be the enemy of speed and reliability. Even more so when those servers operate as a stand-alone, single-instance solution, without any link to the rest of the IT ecosystem (the whole point of shared nothing). Shared nothing may work for experimentation, but it is a terrible thing to build a business on, and a terrible thing to support as an IT operations person.
How do I know this? Because I have been that guy for 25 years!
To bridge the gap between data science experimentation and IT operational stability, new approaches are needed that provide operational resiliency without compromising the ability to rapidly deploy new analytical tools and solutions. Speed of deployment is essential to supporting the needs of developers and data scientists, but the complexity and unwieldy nature of traditional Hadoop infrastructure remain a major barrier to success for big data analytics projects.
Consider these questions and see if they sound familiar:
Before the ink has even really dried on Hadoop Summit 2015 in San Jose, I am sitting down in a rare moment of peace to write out some reflections on my experience and what I have seen from the sessions, keynotes, partners, and users here at the show.
Hadoop Gets Real
The most lasting impression I took from the overall theme of the show and the people in attendance was that Hadoop is no longer an “emerging tool.” The momentum, the use cases, and indeed the buzz among attendees all pointed to massive adoption already built up in the marketplace. Behind this wave of early adoption is a lot of pent-up demand waiting for things to stabilize and become more enterprise-ready. Once the tooling around the Hadoop ecosystem is more robust, and the platforms it runs on are easier to operate, there is no limit to the demand this ecosystem can produce.
As a counterpoint, there is also a countercurrent: Hadoop is not “all things to all people,” and so there is a lot of discussion around the emergence of a logical successor to Hadoop as the analytics tool of record. The buzz around Spark certainly suggests it is the way of the future, and it ties into the second theme of the show, which I observed in numerous conversations and sessions.