Hadoop Summit 2015 Reflections

Chris Harrold

CTO Big Data Solutions at EMC
Chris is responsible for the development of large-scale analytics solutions for EMC customers around emerging analytics platform technologies. Currently, he is focused on EMC Business Data Lake Solutions and delivering this solution to key EMC customer accounts.


Before the ink has even really dried on HS15 in San Jose, I am sitting down in a rare moment of peace to write out some reflections on my experience and on what I have seen from the sessions, keynotes, partners, and users here at the show.

Hadoop Gets Real

The most lasting impression I got from the overall theme of the show and the people in attendance was that Hadoop is not an “emerging tool” anymore. The use cases, and indeed the buzz among attendees, pointed to massive adoption and momentum in the marketplace. Behind this wave of early adoption is a lot of pent-up demand that is waiting for things to stabilize and become more enterprise-ready. Once the tooling around the Hadoop ecosystem is more robust, and the platforms that it runs on are more operational, there is no limit to the demand that this ecosystem can produce.

As a counterpoint, there is a countercurrent theme that Hadoop is not “all things to all people”, and so there is a lot of discussion around the emergence of a logical successor to Hadoop as the analytics tool of record. Certainly the buzz around Spark suggests that this is the way of the future, and it ties into the second theme of the show that I observed in numerous conversations and sessions.

Hadoop Gets Real RealTime

The emphasis on real-time and near-real-time capability in the Hadoop ecosystem is, without question, a top thought on people’s minds. This follows the logical analytics maturity curve of: testing – getting serious – operationalize – out-grow.

Attendees were in all four phases, but a lot of presentations from the early-adopter community were definitely focused on this last phase of “we need more than what Hadoop provides, so what’s next?” Real-time is the focus of a lot of organizations as they seek to influence and engage through data-driven applications and change outcomes on the fly, rather than simply searching for insights based on historical data sets.

This change in tooling, and the increased demands on infrastructure, were also a key focal point of the show overall. I spoke about this specifically with theCUBE host and SiliconAngle founder John Furrier. If you missed the video, watch it on demand here. The theme that I felt resonated the most, and underpinned all the others, was the last big takeaway for me.

DevOps rising

While it has a lot of connotations and nuances, the overarching point of all the sessions was that Hadoop and the emerging ecosystem of advanced analytics tools require the full embrace of the DevOps mindset. In my interview with theCUBE I said that “The Analytics space is the ‘killer app’ that DevOps has been waiting for to mature.” This was in evidence all over the show, as companies showcased how they have used the DevOps mindset and tools for rapid integration and deployment to streamline and enable advanced analytics environments.

I really enjoyed the emphasis on operations and on the need for enterprise-ready platform solutions for analytics that was all over the show. I think the time is NOW for companies to begin researching and settling on a platform solution for analytics so that they are ready when the business need catches up to them. They also need to make the shift to a data-aware, DevOps-minded culture in order to really capitalize on the momentum that analytics has and use it to harness their own data and create real value and insight.

Is It All About The Data Scientist?

Mona Patel
Mona Patel is a Senior Manager for Big Data Marketing at EMC Corporation. With over 15 years of working with data at The Department of Water and Power, Air Touch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at EMC, a leader in Big Data.

The answer is no. It is a holistic team effort that involves expanding the mindset and skill set of executives, business users, IT implementers, data scientists, and application developers so that they work collectively to define a strategy and derive new insight from big data.

And that is why EMC is so heavily focused on breaking down organizational silos and training professionals to become data scientists or at least think like data scientists, transforming these individuals into data savvy professionals working towards the same goal – competitive advantage.

I spoke to Louis Frolio, Advisory Technical Ed Consultant for EMC Big Data Solutions, about how he, as part of a team in EMC Education Services, is creating a massive professional transformation through a MOOC (Massive Open Online Course). The Data Lakes for Big Data MOOC gives you an opportunity to become a data savvy professional and take on a big data or data science role in your organization at absolutely no cost.

The course kicked off May 11, but you still have plenty of time to enroll and complete the course to earn a certificate before June 8. The top 500 students (based on cumulative grade for the MOOC) will receive an electronic copy of the Data Science book just released by EMC Education Services.

1.  What is a MOOC and what is the goal of this education format? Why was it used for this course?

Over the past several years, a new trend in Education Delivery has slowly gained traction. “Classroom Flipping”, aka MOOC (Massive Open Online Course), involves delivering courses to very large numbers of students (upwards of 40,000), all at once via the internet. The term “flipping” is used because the students consume the lectures and materials at their leisure and use the MOOC platform to do homework and assignments and to interact with other students and instructors, the opposite of a traditional classroom experience.

In a MOOC, instructors develop the curriculum but do not teach it live. The educational material can be video, interactive eLearnings, literature, plus much more. The content is delivered asynchronously, meaning the students can consume the material at their convenience. What makes a MOOC so great is that it provides a place for students to engage and learn from each other.

My role as an instructor in the “Data Lakes for Big Data” MOOC is to engage students in the forums, answer questions, and foster collaboration between the students. The Emerging Technology Education group within EMC Education Services chose a MOOC for its “Data Lakes for Big Data” course because it fits into a blended learning approach to technical education and training.

2.  Who should attend this course and what value will this education bring to the organization after the four-week course?

This course was designed with a single demographic in mind: anyone who wants to know more about Big Data and Data Lakes. There are no prerequisites for this course outside of a genuine interest in these emerging technologies and a desire to learn.

At the conclusion of this course, students will have a solid foundation of knowledge to build upon, which can include pursuing the EMC Certified Data Science and Big Data Analytics track. More importantly, students will be fully equipped to recognize and speak to big data challenges within their organizations. We fully expect that students completing this course will be able to demonstrate the value of big data and data science to their management teams.

3.  What is your role at EMC and what other EMC Education is available for burgeoning data scientists and professionals looking to expand their skill set for big data?

I have a pretty cool job. I am part of EMC’s Big Data Solutions team, where my primary objective is to distill and impart the seemingly complex subjects of Big Data Analytics and Data Science to both technical and non-technical audiences. This is made easier for me because I am also embedded in the Data Science team in the Emerging Technologies Education group within EMC Education Services. There I help develop technical training material for EMC’s Certified Data Science curriculum. If I could sum up what I love and do best it is this: I craft and narrate the story of Big Data Analytics and Data Science for everyone.

EMC Education Services has a full complement of courses for Big Data Analytics and Data Science. This includes certification in data science at the Associate level and, most recently, at the Specialist level. There is also training on Hadoop administration, Hadoop programming, and much more.

4.  Week 3 of the MOOC dives into the EMC Federation Business Data Lake. Explain to the readers what value this solution brings to Data Scientists, IT Implementers, and Executives.

The EMC Federation Business Data Lake (FBDL) is leading the vanguard in Big Data Analytics enablement, bringing together data, analytics, and applications into a fully-engineered solution. IT can deploy faster, data scientists gain insights faster, and executives see value faster.

With the rise of Big Data, we are seeing more and more software applications being defined by data and data analytics. With FBDL, it is possible to quickly build the data driven applications needed to empower organizations to extract more insights and realize more revenue from the data that is already in their data centers.

5.  What feedback have you received so far from the MOOC? Is it too late to enroll?

The feedback from the “Data Lakes for Big Data” MOOC has been nothing short of extraordinary! The overwhelming sentiment from the students has been extremely positive. A common thread amongst the feedback is that the material is extremely interesting and that the students are learning things about big data of which they were unaware. They are thrilled that this course is being offered now and that it is free. It is the right course at the right time.

It is not too late to join the MOOC. Students can complete the required tasks at any time during the four-week period. The only caveat is that all the tasks have to be completed by the close of the course. Although we are now in Week 2 and new students may feel that they will not be able to catch up, I can assure you that this is not the case. Week 1 can be completed in as little as 2-3 hours. We purposely kept Week 1 light in material to give students time to familiarize themselves with the MOOC environment.

I strongly recommend that you sign up today at: educast.emc.com!

When Big Data Becomes More Valuable Than Your Products/Services

Mona Patel

A recent global study across 1000 executives conducted by EMC and Capgemini reports the following: “64% of respondents said that big data is changing traditional business boundaries and enabling non-traditional providers to move into their industry, and over half (53%) expect to face increased competition from start-ups enabled by data.”

My take: Eventually any company expecting to compete effectively must become a software company, where data is the primary asset driving business strategy and revenue. Going a step further, by monetizing big data, companies are creating new revenue streams that will actually eclipse the value of a company’s existing products or services over time. This is supported by the EMC and Capgemini study as well: “Among our respondents, 63% consider that the monetization of data could eventually become as valuable to their organizations as their existing products and services.”


The question is how to find gold in the flood of data flowing in and out of the organization to compete effectively, especially against new digital startups. To answer this question, I spoke to Capgemini’s Global Vice President for Big Data, Steve Jones, who strongly believes the answer lies within the power of a business data lake.

1.  As an industry leader in big data, what is so exciting about a data lake solution?

Destination Data Lake: Accelerating the Big Data Journey

Mona Patel

Most people understand that big data and analytics can have a positive impact on their business. What trips them up is how to make that happen. EMC’s answer to that complex challenge is the EMC Business Data Lake, the industry’s first fully engineered, enterprise-grade data lake that’s redefining big data. For details, check out the virtual launch event.


I spoke with Aidan O’Brien, Senior Director of EMC’s Strategic Big Data Initiative, and asked him why he’s excited about EMC Business Data Lake and why it sets a precedent in the world of big data analytics.

1.  What extraordinary outcomes can companies achieve with big data analytics?
