Hadoop Summit 2015 Reflections

Chris Harrold

CTO Big Data Solutions at EMC
Chris is responsible for the development of large-scale analytics solutions for EMC customers around emerging analytics platform technologies. Currently, he is focused on EMC Business Data Lake Solutions and delivering this solution to key EMC customer accounts.

Before the ink has even dried on Hadoop Summit 2015 (HS15) in San Jose, I am sitting down in a rare moment of peace to write out some reflections on my experience and on what I saw in the sessions and keynotes and heard from partners and users here at the show.

Hadoop Gets Real

The most lasting impression I took from the overall theme of the show and the people in attendance is that Hadoop is no longer an “emerging tool”. The use cases on display, and indeed the buzz among attendees, point to massive adoption and momentum in the marketplace. Behind this wave of early adoption is a lot of pent-up demand that is waiting for things to stabilize and become more enterprise ready. Once the tooling around the Hadoop ecosystem is more robust, and the platforms it runs on are more operationally mature, there is no limit to the demand that this ecosystem can produce.

As a counterpoint, there is a countercurrent: Hadoop is not “all things to all people”, and so there is a lot of discussion about the logical successor to Hadoop as the analytics tool of record. The buzz around Spark certainly suggests that it is the way of the future, and it ties into the second theme of the show, which I observed in numerous conversations and sessions.

Is It All About The Data Scientist?

Mona Patel

Senior Manager, Big Data Solutions Marketing at EMC
Mona Patel is a Senior Manager for Big Data Marketing at EMC Corporation. With over 15 years of working with data at The Department of Water and Power, Air Touch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at EMC, a leader in Big Data.

The answer is no. It is a holistic, team effort that involves expanding the mind and skill set of executives, business users, IT implementers, data scientists, and application developers to all work collectively to define a strategy and derive newer insight from big data.

And that is why EMC is so heavily focused on breaking down organizational silos and training professionals to become data scientists, or at least to think like data scientists, transforming these individuals into data-savvy professionals working towards the same goal: competitive advantage.

I spoke to Louis Frolio, Advisory Technical Ed Consultant for EMC Big Data Solutions, about how he, as part of a team in EMC Education Services, is creating a massive professional transformation through a MOOC (Massive Open Online Course). The Data Lakes for Big Data MOOC gives you an opportunity to become a data-savvy professional and take on a big data or data science role in your organization at absolutely no cost.

The course kicked off May 11, but you still have plenty of time to enroll and complete the course to earn a certificate before June 8. The top 500 students (based on cumulative grade for the MOOC) will receive an electronic copy of the Data Science book just released by EMC Education Services.

1.  What is a MOOC and what is the goal of this education format? Why was it used for this course?

Destination Data Lake: Accelerating the Big Data Journey

Mona Patel

Most people understand that big data and analytics can have a positive impact on their business. What trips them up is how to make that happen. EMC’s answer to that complex challenge is the EMC Business Data Lake, the industry’s first fully engineered, enterprise-grade data lake that’s redefining big data.  For details, check out the virtual launch event.

I spoke with Aidan O’Brien, Senior Director of EMC’s Strategic Big Data Initiative, and asked him why he’s excited about EMC Business Data Lake and why it sets a precedent in the world of big data analytics.

1.  What are extraordinary outcomes companies may achieve with big data analytics?

EMC Hadoop Starter Kit: Creating a Smarter Data Lake

Mona Patel

Pivotal HD offers a wide variety of data processing technologies for Hadoop: real-time, interactive, and batch. Add EMC Isilon scale-out NAS as integrated data storage for Pivotal HD and you have a shared data repository with multi-protocol support, including HDFS, to service a wide variety of data processing requests. This smells like a Data Lake to me: a general-purpose data storage and processing resource center where Big Data applications can develop and evolve. Add EMC ViPR software-defined storage to the mix and you have the smartest Data Lake in town, one that supports additional protocols and hardware and automatically adapts to changing workload demands to optimize application performance.

EMC Hadoop Starter Kit, ViPR Edition, now makes it easier to deploy this ‘smart’ Data Lake with Pivotal HD and other Hadoop distributions such as Cloudera and Hortonworks. Simply download this step-by-step guide and you can quickly deploy a Hadoop or a Big Data analytics environment, configuring Hadoop to utilize ViPR for HDFS, with Isilon hosting the Object/HDFS data service.  Although in this guide Isilon is the storage array that ViPR deploys objects to, other storage platforms are also supported – EMC VNX, NetApp, OpenStack Swift and Amazon S3.

I spoke with the creator of this starter kit, James F. Ruddy, Principal Architect for the EMC Office of the CTO, and asked him to explain why every organization should use this starter kit to optimize its IT infrastructure for Hadoop deployments.

1.  The original EMC Hadoop Starter Kit released last year was a huge success.  Why did you create ViPR Edition?
