Is It All About The Data Scientist?

Mona Patel
Mona Patel is a Senior Manager for Big Data Marketing at EMC Corporation. With over 15 years of working with data at The Department of Water and Power, Air Touch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at EMC, a leader in Big Data.

The answer is no. It is a holistic, team effort that involves expanding the mind and skill set of executives, business users, IT implementers, data scientists, and application developers to all work collectively to define a strategy and derive newer insight from big data.

And that is why EMC is so heavily focused on breaking down organizational silos and training professionals to become data scientists or at least think like data scientists, transforming these individuals into data savvy professionals working towards the same goal – competitive advantage.

I spoke to Louis Frolio, Advisory Technical Ed Consultant for EMC Big Data Solutions, about how he, as part of a team in EMC Education Services, is creating a massive professional transformation through a MOOC (Massive Open Online Course). The Data Lakes for Big Data MOOC gives you an opportunity to become a data-savvy professional and take on a big data or data science role in your organization at absolutely no cost.

The course kicked off May 11, but you still have plenty of time to enroll and complete the course to earn a certificate before June 8. The top 500 students (based on cumulative grade for the MOOC) will receive an electronic copy of the Data Science book just released by EMC Education Services.

1.  What is a MOOC and what is the goal of this education format? Why was it used for this course?

Over the past several years, a new trend in education delivery has slowly gained traction. “Classroom flipping”, aka MOOC (Massive Open Online Course), involves delivering courses to very large numbers of students (upwards of 40,000), all at once via the internet. The term “flipping” is used because the students consume the lectures and materials at their leisure and use the MOOC platform to do homework and assignments and to interact with other students and instructors, the opposite of a traditional classroom experience.

In a MOOC, instructors develop the curriculum but do not teach it live. The educational material can be video, interactive eLearnings, literature, plus much more. The content is delivered asynchronously, meaning the students can consume the material at their convenience. What makes a MOOC so great is that it provides a place for students to engage and learn from each other.

My role as an instructor in the “Data Lakes for Big Data” MOOC is to engage students in the forums, answer questions, and foster collaboration between the students. The Emerging Technology Education group within EMC Education Services chose a MOOC for its “Data Lakes for Big Data” course because it fits into a blended learning approach to technical education and training.

2.  Who should attend this course and what value will this education bring to the organization after the 4 week course?

This course was designed with a single demographic in mind: anyone who wants to know more about Big Data and Data Lakes. There are no prerequisites for this course outside of a genuine interest in these emerging technologies and a desire to learn.

At the conclusion of this course, students will have a solid foundation of knowledge to build upon, for example through the EMC Certified Data Science and Big Data Analytics track. More importantly, students will be fully equipped to recognize and speak to big data challenges within their organizations. We fully expect that students completing this course will be able to demonstrate the value of big data and data science to their management teams.

3.  What is your role at EMC and what other EMC Education is available for burgeoning data scientists and professionals looking to expand their skill set for big data?

I have a pretty cool job. I am part of EMC’s Big Data Solutions team, where my primary objective is to distill and impart the seemingly complex subjects of Big Data Analytics and Data Science to both technical and non-technical audiences. This is made easier for me because I am also embedded in the Data Science team in the Emerging Technologies Education group within EMC Education Services. There I help develop technical training material for EMC’s Certified Data Science curriculum. If I could sum up what I love and do best, it is this: I craft and narrate the story of Big Data Analytics and Data Science for everyone.

EMC Education Services has a full complement of courses for Big Data Analytics and Data Science. This includes certification in data science at the Associate level and, most recently, at the Specialist level. There is also training on Hadoop administration, Hadoop programming, and much more.

4.  Week 3 of the MOOC dives into the EMC Federation Business Data Lake. Explain to the readers what value this solution brings to Data Scientists, IT Implementers, and Executives.

The EMC Federation Business Data Lake (FBDL) is in the vanguard of Big Data Analytics enablement, bringing together data, analytics, and applications into a fully engineered solution. IT can deploy faster, data scientists gain insights faster, and executives see value faster.

With the rise of Big Data, we are seeing more and more software applications being defined by data and data analytics. With FBDL, it is possible to quickly build the data-driven applications that organizations need to extract more insights and realize more revenue from the data that is already in their data centers.

5.  What feedback have you received so far from the MOOC? Is it too late to enroll?

The feedback from the “Data Lakes for Big Data” MOOC has been nothing short of extraordinary! The overwhelming sentiment from the students has been extremely positive. A common thread amongst the feedback is that the material is extremely interesting and that the students are learning things about big data of which they were unaware. They are thrilled that this course is being offered now and that it is free. It is the right course at the right time.

It is not too late to join the MOOC. Students can complete the required tasks at any time during the four-week period. The only caveat is that all the tasks have to be completed by the close of the course. Although we are now in Week 2 and new students may feel that they will not be able to catch up, I can assure you that this is not the case. Week 1 can be completed in as little as 2-3 hours. We purposely kept Week 1 light in material to give students time to familiarize themselves with the MOOC environment.

I strongly recommend that you sign up today!

Destination Data Lake: Accelerating the Big Data Journey

Mona Patel

Most people understand that big data and analytics can have a positive impact on their business. What trips them up is how to make that happen. EMC’s answer to that complex challenge is the EMC Business Data Lake, the industry’s first fully engineered, enterprise-grade data lake that’s redefining big data. For details, check out the virtual launch event.


I spoke with Aidan O’Brien, Senior Director of EMC’s Strategic Big Data Initiative, and asked him why he’s excited about EMC Business Data Lake and why it sets a precedent in the world of big data analytics.

1.  What are extraordinary outcomes companies may achieve with big data analytics?


EMC CIO Takes On Big Data Problems With Big Data Analytics

Mona Patel

Every second of every day, IT generates enormous amounts of data around operational activity: system behavior, application performance, user actions, security activity, and more. Instead of viewing this data explosion as a Big Data problem, IT views it as an opportunity to use Big Data solutions such as IT Operations Analytics to improve the quality of its services.


For example, 75% of IT professionals surveyed recently said they believe that IT Operations Analytics can transform data into relevant insights and actionable plans for improvement. I spoke with EMC CIO Vic Bhagat, who described how EMC is embracing Big Data for IT Operations Analytics to solve critical problems affecting EMC IT Operations and customers.

1.  What are the biggest problems faced by IT Operations Management at EMC and how were these problems addressed before the world of Big Data?

IT generates enormous amounts of data when monitoring complex, rapidly growing and changing IT infrastructures and applications. The challenge for IT Operations Management is to leverage this data to build an adaptive system that is more proactive and less reactive. The more the system can learn from the data, the better it can identify variances and problem areas in a timely manner, helping IT fix issues before they negatively impact the business through downtime or poor performance.

In the past, we relied on traditional business intelligence and data warehousing systems to gain intelligence or insight based on historical trends. Now, with analytics, we can uncover important variables and modify them to predict an outcome. And, the more data we collect at a detailed level, the more accurate we can be.

2.  How does Big Data analytics change the game to address these problems more effectively?

It cuts down the time to gain insight. The most heavily used word after ‘selfie’ is now ‘data lake’. Everyone wants to build a data lake since it provides the right architecture and capabilities to cut down the cycle time in deriving newer, predictive insight, and then continuously integrating these results back into our business processes and decision-making. At EMC, we are moving away from data warehouses to a data lake architecture enabling us to not only gain faster insight, but also gain newer insight by bringing together and analyzing both structured and unstructured data.

For example, in a data warehouse you manage structured data such as part numbers, bay numbers, disk numbers, chassis numbers, and more. In a data lake you can manage all of this structured data in addition to unstructured data such as user manuals for each system and component. Let’s now apply this data lake solution to a use case: we continuously monitor the health of a customer’s infrastructure with our call-home systems. We can now leverage a data lake with more data sets not only to make more accurate component failure predictions, but also to provide the relevant information from user manuals needed to fix the problem in a timely manner, so the customer experiences no downtime.

3.  What is EMC’s IT Operations Analytics solution leveraging Big Data technologies and techniques?

We are leveraging the entire Pivotal Big Data Suite to ingest and store all of the structured and unstructured data: Pivotal GemFire XD, Pivotal HD, Pivotal HAWQ, and Pivotal Greenplum Database. Our Data Scientists are then able to apply advanced analytic techniques to the data they need using their choice of tools, which are MADlib, R, and Python. This Big Data environment will be part of a wider business data lake strategy, where all enterprise data will be managed, accessed, and used equally by all business applications, not just IT Operations. Only a few legacy or specialized applications will stand alone.

4. What benefits has EMC gained from this Big Data solution?

The benefits are enormous, on both the business and the technical side. Building predictive models and predicting imminent system failure reduces downtime and the number of alerts, and enables us to identify the real issues faster, shortening the cycle for making decisions and taking corrective action. This improves our performance, our productivity, and the value we gain from Big Data.

But we are only scratching the surface. The more we can optimize our Big Data environment so that it is elastic and accessible, the faster and more precise Data Scientists will be in solving problems. For example, we can now predict MS Exchange outages two hours in advance.

5. One of the biggest barriers to getting value from Big Data is the skills shortage. How does EMC IT Operations address this issue?

EMC had the foresight to build Centers of Excellence (COE) around the globe, producing the expertise and skills needed to transition into the realm of Data Science. We are fortunate to be able to leverage talent within the company, and we also use the COE to attract and acquire new Data Science talent from outside the company.

6. What books are you currently reading on your Kindle or, if you are still paper-based like me, what books are stacked on your nightstand?

I’m Kindle-based, so I read periodicals such as Techmeme and Engadget. Since we are a company that is data- and digital-driven, I am reading a book called ‘Leading Digital’. I want to help lead this digital revolution at EMC, and this book provides great examples of how digital makes significant changes in how a company operates and kills bureaucracy.

Big Data Pains & Gains From A Real Life CIO

Mona Patel

What does it take to make CIO Magazine’s Top 100 List? A Big Data victory is one answer.
Michael Cucchi, Sr Director of Product Marketing at Pivotal, had the privilege to speak with one of the winners, EMC CIO Vic Bhagat. They discussed the pains and gains of EMC’s Big Data initiative, and I have put together a summary of the interview below. EMC IT’s approach to Big Data is exactly what the EVP Federation enables organizations to do: first collect any and all data in a Data Lake, then deploy the right analytic tool that your people know how to use to analyze the data, and finally learn agile development so you can take those insights and build applications rapidly.

1. Why is Big Data important to your business?
