What Are The Real Effects Of Climate Change? EMC Utilizes Data Science To Find Out.

Mona Patel

Senior Manager, Big Data Solutions Marketing at EMC
Mona Patel is a Senior Manager for Big Data Marketing at EMC Corporation. With over 15 years of working with data at The Department of Water and Power, Air Touch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at EMC, a leader in Big Data.

Contributing to social good is now literally at everyone’s fingertips. That is why EMC and Earthwatch Institute have teamed up to encourage citizens to become data collectors, or citizen scientists. Through the collection of more data sources, data scientists can better uncover how climate change is affecting plants and animals by altering the timing of key natural events.

This collaboration is called the Whenology project, with the first study underway to investigate how climate change is affecting raptor migrations at Acadia National Park. To create awareness and encourage more participation, EMC launched a microsite that provides educational materials, tracks progress, and reports insights.


I spoke with EMC Distinguished Engineer John Cardente about the Whenology project and its potential to provide a powerful citizen science platform for collaboratively tackling virtually any large-scale, high-impact societal issue.

1.  What is the Whenology project and what are your major objectives?

The Whenology project was born out of collaboration between the EMC Corporate Sustainability Office and Earthwatch Institute. The project’s name is a play on Phenology, a field of science that studies how climate change affects the seasonal timings of plant and animal life cycles.

The goal is to help scientists bring a variety of data sets together for the first time, analyze them to better understand how climate change may be disrupting complex interactions between life-cycle events (phenophases), and improve collaboration with citizen scientists. EMC’s comprehensive portfolio of Big Data solutions and technologies puts us in a unique position to help this important scientific endeavor.

We’re kicking off Whenology with a pilot project to study phenophase changes related to raptor migrations at Acadia National Park. Acadia is an important waypoint along the Eastern Seaboard migration route and scientists are worried that changes there may prevent migrating birds from getting the nutrition needed to complete their journey.

This project relies heavily on observational data collected by citizen scientists. It would be impossible without the participation of citizen science organizations like eBird, the Hawk Migration Association of North America, and the USA National Phenology Network, which have all provided data for the project.

2.  The key to insight is not only about building the right model, but also about asking the right questions. What questions are you hoping to get answers for with big data?

The interactions between species in nature are varied and complex. But they all work off a common “clock”: climate patterns. As climate change perturbs this clock, scientists are unsure how those interspecies relationships are being affected. The fear is that a tipping point will be reached after which sudden, drastic changes will occur. That’s pretty scary. We’re hoping that by assembling a wide variety of data sets and providing Big Data tools, scientists will be able to uncover the relationships, develop models capable of forecasting phenophase changes, and initiate societal changes to prevent bad outcomes.

From a technology perspective, we’re very interested in learning more about enabling large-scale data science collaborations. Building environments that enable citizen scientists from across the globe to share data, analytics, visualizations, and insights will yield valuable lessons that EMC can in turn use to help its customers.

3.  What stage are you at with this project and what obstacles are you facing?

The team has done a lot of great work to get the pilot project going. We’ve worked with the citizen science organizations mentioned above to bring multiple data sets together in a single environment for the first time. In addition, we’ve developed a suite of analytics software to collaboratively combine, analyze, and visualize the data. We’re starting to uncover interesting insights but want to be responsible about reporting any findings and therefore are waiting until the rigorous analyses are completed.

Obtaining the data was a challenge, as we had to ensure any agreements the participating organizations had with their users were not violated. But perhaps the biggest challenge has been determining the right balance between maintaining scientific rigor and sharing findings with the public in a timely manner so that we can start to influence behaviors.

Publishing findings too early or too late could be equally harmful. We want to get that right and we’re fortunate to have great scientists involved to make sure we do. What hasn’t been a challenge is finding people to participate in the project. A lot of people at EMC care deeply about the environment and are excited by the opportunity to use their technical skills to make a difference. It’s been a profound experience.

4.  What is your role with the project and at EMC?

As a Distinguished Engineer in the Corporate CTO Office, I often get tasked with bootstrapping new initiatives. My role focuses on Big Data and Data Science so it seemed natural to support this project. To that end, I developed the initial suite of software to process, analyze, and visualize the data. I also created the dynamic data visualizations for the microsite. I’ve had a tremendous amount of fun working on this project. More importantly, I’ve developed a strong interest in data science for social good and plan to do more projects like Whenology in the future.

5.  What technologies, tools, and skills are required for this project and what are the gaps?

This project is a great example of exploratory analytics: we’re not sure what insights are hidden inside the data or how to find them! This situation requires the usual data science skills like data wrangling, feature engineering, applying machine learning techniques, and visualizing data. It also requires a lot of applied curiosity and experimentation. That’s the part of data science that I really enjoy.

The pilot project’s limited scope makes it possible to use open source tools like R, Python, Spark, and D3.js. As the project expands, however, we’re going to need more capable technologies like those provided by EMC’s Federation Business Data Lake.

Our goal is to expand the Whenology project to cover not only a wider geographic region but also other climate-change-related topics. If successful, we might even expand to hosting social good projects related to other “grand challenges” like healthcare. Accomplishing that will likely require the full complement of EMC Federation Big Data technologies.

6.  For people reading this blog story, how can they help or participate?

To start, readers can check out the “Participate” section of the Whenology microsite. It provides profiles and contact information for the citizen science organizations participating in the Whenology project. Or, check out SciStarter.com and search for a citizen science project that matches your interests and capabilities. We need your help! Together, we can make a big difference.

Is It All About The Data Scientist?

Mona Patel

The answer is no. It is a holistic, team effort that involves expanding the mind and skill set of executives, business users, IT implementers, data scientists, and application developers to all work collectively to define a strategy and derive newer insight from big data.

And that is why EMC is so heavily focused on breaking down organizational silos and training professionals to become data scientists or at least think like data scientists, transforming these individuals into data savvy professionals working towards the same goal – competitive advantage.

I spoke to Louis Frolio, Advisory Technical Ed Consultant for EMC Big Data Solutions, about how he, as part of a team in EMC Education Services, is creating a massive professional transformation through a MOOC (Massive Open Online Course). The Data Lakes for Big Data MOOC gives you an opportunity to become a data savvy professional and take on a big data or data science role in your organization at absolutely no cost.

The course kicked off May 11, but you still have plenty of time to enroll and complete the course to earn a certificate before June 8. The top 500 students (based on cumulative grade for the MOOC) will receive an electronic copy of the Data Science book just released by EMC Education Services.

1.  What is a MOOC and what is the goal of this education format? Why was it used for this course?

Destination Data Lake: Accelerating the Big Data Journey

Mona Patel

Most people understand that big data and analytics can have a positive impact on their business. What trips them up is how to make that happen. EMC’s answer to that complex challenge is the EMC Business Data Lake, the industry’s first fully engineered, enterprise-grade data lake that’s redefining big data.  For details, check out the virtual launch event.


I spoke with Aidan O’Brien, Senior Director of EMC’s Strategic Big Data Initiative, and asked him why he’s excited about EMC Business Data Lake and why it sets a precedent in the world of big data analytics.

1.  What extraordinary outcomes can companies achieve with big data analytics?

EMC CIO Takes On Big Data Problems With Big Data Analytics

Mona Patel

Every second of every day, IT generates enormous amounts of data around operational activity – system behavior, application performance, user actions, security activity, and more. Instead of viewing this data explosion as a Big Data problem, IT views it as an opportunity to use Big Data solutions such as IT Operations Analytics to improve the quality of their services.


For example, 75% of IT professionals surveyed recently said they believe that IT Operations Analytics can transform data into relevant insights and actionable plans for improvement. I asked EMC CIO Vic Bhagat to describe how EMC is embracing Big Data for IT Operations Analytics to solve critical problems affecting EMC IT Operations and customers.

1.  What are the biggest problems faced by IT Operations Management at EMC and how were these problems addressed before the world of Big Data?

IT generates enormous amounts of data when monitoring complex, rapidly growing and changing IT infrastructures and applications. The challenge for IT Operations Management is to leverage this data to build an adaptive system that is more proactive and less reactive. The more the system can learn from the data, the better it can identify variances and problem areas in a timely manner, helping IT fix issues before they negatively impact the business through downtime or poor performance.

In the past, we relied on traditional business intelligence and data warehousing systems to gain intelligence or insight based on historical trends. Now, with analytics, we can uncover important variables and modify them to predict an outcome. And, the more data we collect at a detailed level, the more accurate we can be.

2.  How does Big Data analytics change the game to address these problems more effectively?

It cuts down the time to gain insight. The most heavily used word after ‘selfie’ is now ‘data lake’. Everyone wants to build a data lake since it provides the right architecture and capabilities to cut down the cycle time in deriving newer, predictive insight, and then continuously integrating these results back into our business processes and decision-making. At EMC, we are moving away from data warehouses to a data lake architecture enabling us to not only gain faster insight, but also gain newer insight by bringing together and analyzing both structured and unstructured data.

For example, in a data warehouse you manage structured data such as part numbers, bay numbers, disk numbers, chassis numbers, and more. In a data lake you can manage all of this structured data in addition to unstructured data such as user manuals for each system and component. Let’s now apply this data lake solution to a use case: we continuously monitor the health of a customer’s infrastructure with our call home systems. We can now leverage a data lake with more data sets not only to make more accurate component failure predictions, but also to provide the relevant information needed from user manuals to fix the problem in a timely manner so the customer experiences no downtime.

3.  What is EMC’s IT Operations Analytics solution leveraging Big Data technologies and techniques?

We are leveraging the entire Pivotal Big Data Suite to ingest and store all of the structured and unstructured data – Pivotal GemFire XD, Pivotal HD, Pivotal HAWQ, and Pivotal Greenplum Database. Our Data Scientists are then able to apply advanced analytic techniques to the data they need using their choice of tools: MADlib, R, and Python. This Big Data environment will be part of a wider business data lake strategy, where all enterprise data will be managed, accessed, and used equally by all business applications, not just IT Operations. Only a few legacy or specialized applications will stand alone.

4. What benefits has EMC gained from this Big Data solution?

The benefits are enormous, on both the business and the technical side. Building predictive models and predicting imminent system failure reduces downtime and the number of alerts, and enables us to identify the real issues faster, shortening the cycle for making decisions and taking corrective action. This improves our performance, our productivity, and the value we gain from Big Data.

But we are only scratching the surface. The more we can optimize our Big Data environment so that it is elastic and accessible, the faster and more precise Data Scientists will be in solving problems. For example, we can now predict MS Exchange outages two hours in advance.

5. One of the biggest barriers to getting value from Big Data is the skills shortage. How does EMC IT Operations address this issue?

EMC had the foresight to build Centers of Excellence (COE) around the globe, producing the expertise and skills needed to transition into the realm of Data Science. We are fortunate to be able to leverage talent within the company, and we also leverage the COE to attract and acquire new Data Science talent from outside the company.

6. What books are you currently reading on your Kindle or, if you are still paper-based like me, what books are stacked on your nightstand?

I’m Kindle-based, so I read periodicals such as Techmeme and Engadget. Since we are a company that is data- and digital-driven, I am reading a book called ‘Leading Digital’. I want to help lead this digital revolution at EMC, and this book provides great examples of how digital makes significant changes in how a company operates and kills bureaucracy.