Most people understand that big data and analytics can have a positive impact on their business. What trips them up is how to make that happen. EMC’s answer to that complex challenge is the EMC Business Data Lake, the industry’s first fully engineered, enterprise-grade data lake that’s redefining big data. For details, check out the virtual launch event.
I spoke with Aidan O’Brien, Senior Director of EMC’s Strategic Big Data Initiative, and asked him why he’s excited about EMC Business Data Lake and why it sets a precedent in the world of big data analytics.
1. What extraordinary outcomes can companies achieve with big data analytics?
Is your organization constrained by 2nd platform data warehouse technologies, with limited or no budget to move toward 3rd platform agile technologies such as a Data Lake? As an EMC customer, you can leverage existing EMC investments to develop a Federation Data Lake at minimal cost. Additionally, the Federation Data Lake will generate healthy returns, as it is packaged with the expertise needed to immediately execute on data lake use cases such as data warehouse ETL offloading and archiving.
Hadoop is Everywhere – 99% of companies will deploy or pilot Hadoop within 18-24 months, according to IDC. These environments will largely be based around standalone servers, resulting in added management tasks because data is spread across many disk spindles throughout the data center. With Hadoop clusters quickly expanding, organizations are starting to experience growing pains comparable to adolescence. This raises the question: should a DAS server configuration be the accepted status quo for Hadoop deployments?
Whether you are getting started with Hadoop or growing your Hadoop deployment, EMC provides a long-term solution for Hadoop through shared storage and VMs, delivering distinct value to the business through lower TCO and faster time-to-results. I asked EMC Technical Advisory Architect Chris Harrold to explain why organizations are now turning to EMC to help transition Hadoop environments into adulthood.
1. Almost every Hadoop deployment is based around the accepted configuration of standalone servers with DAS. What issues have you seen with this configuration among your customers?
Pivotal HD offers a wide variety of data processing technologies for Hadoop – real-time, interactive, and batch. Add EMC Isilon scale-out NAS as integrated data storage for Pivotal HD and you have a shared data repository with multi-protocol support, including HDFS, to service a wide variety of data processing requests. This smells like a Data Lake to me – a general-purpose data storage and processing resource center where Big Data applications can develop and evolve. Add EMC ViPR software-defined storage to the mix and you have the smartest Data Lake in town, one that supports additional protocols and hardware and automatically adapts to changing workload demands to optimize application performance.
EMC Hadoop Starter Kit, ViPR Edition, now makes it easier to deploy this ‘smart’ Data Lake with Pivotal HD and other Hadoop distributions such as Cloudera and Hortonworks. Simply download this step-by-step guide and you can quickly deploy a Hadoop or a Big Data analytics environment, configuring Hadoop to utilize ViPR for HDFS, with Isilon hosting the Object/HDFS data service. Although in this guide Isilon is the storage array that ViPR deploys objects to, other storage platforms are also supported – EMC VNX, NetApp, OpenStack Swift and Amazon S3.
I asked the creator of this starter kit, James F. Ruddy, Principal Architect for the EMC Office of the CTO, to explain why every organization should use this starter kit to optimize their IT infrastructure for Hadoop deployments.
1. The original EMC Hadoop Starter Kit released last year was a huge success. Why did you create ViPR Edition?