Hadoop Summit 2015 Reflections

Chris Harrold

CTO Big Data Solutions at EMC
Chris is responsible for the development of large-scale analytics solutions for EMC customers around emerging analytics platform technologies. Currently, he is focused on EMC Business Data Lake Solutions and delivering this solution to key EMC customer accounts.

Before the ink has even dried on HS15 in San Jose, I am sitting down in a rare moment of peace to write out some reflections on my experience and on what I have seen from the sessions, keynotes, partners, and users here at the show.

Hadoop Gets Real

The most lasting impression I got from the overall theme of the show and the people in attendance was that Hadoop is not an “emerging tool” anymore. The use cases on display, and indeed the buzz among attendees, point to massive adoption and momentum built up in the marketplace. Behind this wave of early adoption is a lot of pent-up demand that is waiting for things to stabilize and become more enterprise ready. Once the tooling around the Hadoop ecosystem is more robust, and the platforms that it runs on are more operational, there is no limit to the demand that this ecosystem can produce.

In counterpoint, there is a countercurrent: Hadoop is not “all things to all people”, and so there is a lot of discussion around the emergence of a logical successor to Hadoop as the analytics tool of record. Certainly the buzz around Spark suggests that many see it as the way of the future, and it ties into the second theme of the show that I observed in numerous conversations and sessions.

Hadoop Gets Real RealTime

The emphasis on real-time and near-real-time capability in the Hadoop ecosystem is, without question, top of mind for people. This follows the logical analytics maturity curve: testing, getting serious, operationalizing, outgrowing.

Attendees were in all four phases, but a lot of presentations from the early-adopter community were definitely focused on this last phase of “we need more than what Hadoop provides, so what’s next?” Real-time is the focus of a lot of organizations as they seek to influence and engage through data-driven applications and change outcomes on the fly, rather than simply searching for insights based on historical data sets.

This change in tooling, and the increased demands on infrastructure, were also a key focal point of the show overall. I spoke about this specifically with theCUBE host and SiliconAngle founder John Furrier. If you missed the video, it is available to watch on demand. The theme that resonated most with me, and that underpinned all the others, was my last big takeaway.

DevOps Rising

While it has a lot of connotations and nuances, the overarching point of all the sessions was that Hadoop and the emerging ecosystem of advanced analytics tools require the full embrace of the DevOps mindset. In my interview with theCUBE, I said that “The Analytics space is the ‘killer app’ that DevOps has been waiting for to mature.” This was in evidence all over the show as companies showcased how they have used the DevOps mindset and tools for rapid integration and deployment to streamline and enable advanced analytics environments.

I really enjoyed the focus on operations, and on the need for enterprise-ready platform solutions for analytics, that was all over the show. I think the time is NOW for companies to begin researching and committing to a platform solution for analytics so that they are ready when the business need catches up to them. They also need to make the shift to a data-aware, DevOps-minded culture in order to really capitalize on the momentum that analytics has and use it to harness their own data and create real value and insight.

Destination Data Lake: Accelerating the Big Data Journey

Mona Patel
Mona Patel is a Senior Manager for Big Data Marketing at EMC Corporation. After more than 15 years of working with data at The Department of Water and Power, Air Touch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at EMC, a leader in Big Data.

Most people understand that big data and analytics can have a positive impact on their business. What trips them up is how to make that happen. EMC’s answer to that complex challenge is the EMC Business Data Lake, the industry’s first fully engineered, enterprise-grade data lake that’s redefining big data. For details, check out the virtual launch event.

I spoke with Aidan O’Brien, Senior Director of EMC’s Strategic Big Data Initiative, and asked him why he’s excited about EMC Business Data Lake and why it sets a precedent in the world of big data analytics.

1. What extraordinary outcomes can companies achieve with big data analytics?

Innovating The Marketing Process With A Data Lake

Mona Patel

Like many global companies, EMC depends heavily on a CRM to manage sales and purchasing data about its vast global installed base. Over time, we realized that without big data analytics, this customer data was trapped inside our systems and providing limited value.

EMC decided that offering analytical capabilities through a data lake architecture would substantially increase the value of this data. To get there, EMC hired Todd Forsythe, EMC Vice President, Corporate Marketing, to create the Marketing Science Lab. I spoke to Todd about why he is so excited about the impact of big data and the Marketing Science Lab on sales and marketing:

1. What is the Marketing Science Lab?

Want To Build A Data Science Team? EMC Offers a Holistic Approach

Mona Patel

Many of our customers invest in big data solutions to target their sales prospects better, explore advanced medical research, and make their internal processes more efficient. The biggest obstacle to getting these initiatives out of the gate is the shortage of big data skills within their own firms and across the industry.

To address this skills gap, EMC has developed a thorough data science and big data analytics curriculum for our customers. EMC was one of the first companies to offer data science education with rigorous, live instruction using free and open source tools. As of today, more than 10,000 customers, partners, and college students have attended the training.

I spoke with EMC’s David Dietrich, who leads this unique program, to discuss his approach to data science education, which differs from more traditional product-oriented education. What I found most interesting is that in addition to his work at EMC, David has also helped design big data analytics curricula for Babson College and other universities. More recently, David has published a book, Data Science and Big Data Analytics, to help further develop data science skills and expertise in the industry.

1.  Why is EMC pushing so hard to educate and develop data scientists?

As an information company, we’re extremely attuned to the value of big data, which is exploding both in sheer volume and in the ways organizations in virtually every field and industry are using it to solve critical problems. When EMC acquired our first big data company, Greenplum, several years ago, we quickly became aware that there was a shortage of people who had the data science and business skills to help companies utilize big data.

2.  How is EMC taking a holistic approach to data science education?

We recognize that learning how to use big data technology alone does not ensure success. Senior management must make sure that appropriate people and processes are in place to drive the change and innovation necessary for valuable big data results to occur. To help companies on their journey, we offer courses for data scientists, who execute big data projects, and for business executives, who sponsor, run, and manage them.

Our goal is to educate all levels of an organization so that data scientists and business people understand one another. That way, the organization is able to roll out big data projects with greater adoption and success. In addition to offering courses to our customers, we also work closely with universities and educational institutions to help them develop their own curriculum and programs.

3.  Please describe some of the important skills for aspiring data scientists.

Working in strategy and analytics for the past 20 years, I’ve always been drawn to experimenting with data to solve problems, which is exactly the mindset you need to tackle big data. Companies often ask me how to go about using massive amounts of structured and unstructured data to solve business problems. How do they know what to choose and ignore? How do they know what algorithms to apply? Our courses encourage a culture of experimentation that leads to answering these questions. We teach our students how to test an idea with data, measure it quantitatively, learn from it, and iterate. This test-and-learn mindset is critical to becoming a talented data scientist and a data-driven organization.

4.  What are some of the challenges with evolving into a data-driven organization?

There can be a substantial divide between data scientists and the business people who manage and work with them on big data projects. Many business people lack the technical background to understand how the algorithms apply to the problem and how to test ideas with data. And some data scientists may not understand the business context. We’re trying to educate each side so they can get a clearer picture and drive toward common goals. Once you bridge that gap, you can start driving real change and solving old problems with big data or new information sources that were once unusable.

5.  What should companies expect after they have successfully made the leap to big data?

We’re educating them in how to train and staff a big data team, as well as build processes to be effective and successful. With this approach, companies can more effectively define the business problem, acquire the right data sets, experiment, communicate the results, and finally, operationalize the new processes.