We are in a new data-driven age. With the rise in adoption of big data analytics as a decision-making tool comes the need to accelerate time to insight and deliver faster innovation informed by what those insights reveal.
You know what? That’s a lot of mumbo-jumbo. Let’s boil it down to the real issue for IT: the tools that analysts and data science professionals need were not really designed to be enterprise-friendly, and they can be unwieldy to deploy and manage. Specifically, I’m talking about Hadoop. Anything that requires provisioning and configuring a multitude of identical physical servers is always going to be the enemy of speed and reliability. Even more so when those servers operate as a stand-alone, single-instance solution, with no link to the rest of the IT ecosystem (that isolation is the whole point of shared nothing). Shared nothing may work for experimentation, but it is a terrible thing to build a business on, and a terrible thing to support as an IT operations person.
How do I know this? Because I have been that guy for 25 years!
To bridge the gap between data science experimentation and IT operational stability, new approaches are needed: ones that provide operational resiliency without compromising the ability to rapidly deploy new analytical tools and solutions. That deployment speed is essential to supporting developers and data scientists. But traditional Hadoop infrastructure is complex and unwieldy, and that complexity is a major barrier to the success of big data analytics projects.
Consider these questions and see if they sound familiar: