EMC CIO Takes On Big Data Problems With Big Data Analytics

Every second of every day, IT generates enormous amounts of data around operational activity – system behavior, application performance, user actions, security activity, and more. Instead of viewing this data explosion as a Big Data problem, IT views it as an opportunity to use Big Data solutions such as IT Operations Analytics to improve the quality of its services.

For example, 75% of IT professionals surveyed recently said they believe IT Operations Analytics can transform data into relevant insights and actionable plans for improvement. I spoke with EMC CIO Vic Bhagat about how EMC is embracing Big Data for IT Operations Analytics to solve critical problems affecting EMC IT Operations and customers.

1.  What are the biggest problems faced by IT Operations Management at EMC and how were these problems addressed before the world of Big Data?

IT generates enormous amounts of data when monitoring complex, rapidly growing and changing IT infrastructures and their applications. The challenge for IT Operations Management is to leverage this data to build an adaptive system that is more proactive and less reactive. The more the system can learn from the data, the better it can identify variances and problem areas in a timely manner, helping IT fix issues before they hurt the business through downtime or poor performance.

In the past, we relied on traditional business intelligence and data warehousing systems to gain intelligence or insight based on historical trends. Now, with analytics, we can uncover important variables and modify them to predict an outcome. And, the more data we collect at a detailed level, the more accurate we can be.

2.  How does Big Data analytics change the game to address these problems more effectively?

It cuts down the time to gain insight. The most heavily used word after ‘selfie’ is now ‘data lake’. Everyone wants to build a data lake since it provides the right architecture and capabilities to cut down the cycle time in deriving newer, predictive insight and then continuously integrating these results back into business processes and decision-making. At EMC, we are moving away from data warehouses to a data lake architecture, which enables us not only to gain faster insight, but also to gain newer insight by bringing together and analyzing both structured and unstructured data.

For example, in a data warehouse you manage structured data such as part numbers, bay numbers, disk numbers, chassis numbers, and more. In a data lake you can manage all of this structured data in addition to unstructured data such as user manuals for each system and component. Let’s now apply this data lake solution to a use case – we continuously monitor the health of a customer’s infrastructure with our call home systems. With more data sets in a data lake, we can not only make more accurate component failure predictions, but also provide the relevant information from user manuals needed to fix the problem in a timely manner, so the customer experiences no downtime.

3.  What is EMC’s IT Operations Analytics solution leveraging Big Data technologies and techniques?

We are leveraging the entire Pivotal Big Data Suite to ingest and store all of the structured and unstructured data – Pivotal GemFire XD, Pivotal HD, Pivotal HAWQ, and Pivotal Greenplum Database. Our Data Scientists are then able to apply advanced analytic techniques to the data they need using their choice of tools: MADlib, R, and Python. This Big Data environment will be part of a wider business data lake strategy, where all enterprise data will be managed, accessed, and used equally by all business applications, not just IT Operations. Only a few legacy or specialized applications will stand alone.

4. What benefits has EMC gained from this Big Data solution?

The benefits are enormous, on both the business and the technical side. Building predictive models and predicting imminent system failure reduces downtime and the number of alerts, and enables us to identify the real issues faster, shortening the cycle for making decisions and taking corrective action. This improves our performance, our productivity, and the value we gain from Big Data.

But we are only scratching the surface. The more we can optimize our Big Data environment so that it is elastic and accessible, the faster and more precise Data Scientists will be in solving problems. For example, we can now predict MS Exchange outages two hours in advance.

5. One of the biggest barriers to getting value from Big Data is the skills shortage. How does EMC IT Operations address this issue?

EMC had the foresight to build Centers of Excellence (COEs) around the globe, producing the expertise and skills needed to transition into the realm of Data Science. We are fortunate to be able to leverage talent within the company, and we also use the COEs to attract and acquire new Data Science talent from outside the company.

6. What books are you currently reading on your Kindle or, if you are still paper based like me, what books are stacked on your nightstand?

I’m Kindle based, so I read periodicals such as Techmeme and Engadget. Since we are a company that is data and digital driven, I am reading a book called ‘Leading Digital’. I want to help lead this digital revolution at EMC, and this book provides great examples of how digital significantly changes the way a company operates and kills bureaucracy.

All Paths Lead To A Federation Data Lake

Is your organization constrained by 2nd platform data warehouse technologies, with limited or no budget to move toward 3rd platform agile technologies such as a Data Lake? As an EMC customer, you have the advantage of leveraging existing EMC investments to develop a Federation Data Lake at minimal cost. Additionally, the Federation Data Lake will generate healthy returns, as it is packaged with the expertise needed to immediately execute on data lake use cases such as data warehouse ETL offloading and archiving.

With the release of William Schmarzo’s Five Tactics to Modernize Your Existing Data Warehouse, I wanted to explore whether the Dean of Big Data views data warehouse modernization tactics, or paths, as ultimately leading to a Federation Data Lake.

1.  What is a Data Lake and who should care?

Don’t Accept The Status Quo For Hadoop

Hadoop is Everywhere – 99% of companies will deploy or pilot Hadoop in the next 18-24 months, according to IDC. These environments will largely be based around standalone servers, resulting in added management tasks because data is spread out across many disk spindles throughout the data center. With Hadoop clusters quickly expanding, organizations are starting to experience the typical growing pains one can compare to adolescence. This raises the question: should a DAS server configuration be the accepted status quo for Hadoop deployments?

Whether you are getting started with Hadoop or growing your Hadoop deployment, EMC provides a long-term solution for Hadoop through shared storage and VMs, delivering distinct value to the business in lower TCO and faster time-to-results. I spoke with EMC Technical Advisory Architect Chris Harrold about why organizations are now turning to EMC to help transition Hadoop environments into adulthood.

1.  Almost every Hadoop deployment is based around the accepted configuration of standalone servers with DAS. What issues have you seen your customers run into with this configuration?

Cloudera Enterprise and EMC Isilon: Filling In The Hadoop Gaps

As Hadoop becomes the central component of enterprise data architectures, the open source community and technology vendors have built a large Big Data ecosystem of Hadoop platform capabilities to fill in the gaps of enterprise application requirements. For data processing, we have seen MapReduce batch processing being supplemented with additional data processing technologies such as Apache Hive, Apache Solr, and Apache Spark to fill in the gaps for SQL access, search, and streaming. For data storage, direct-attached storage (DAS) has been the common deployment configuration for Hadoop; however, the market is now looking to supplement DAS deployments with enterprise storage. Why take this approach? Organizations can HDFS-enable valuable data already managed in enterprise storage without having to copy or move this data to a separate Hadoop DAS environment.

As a leader in enterprise storage, EMC has partnered with Hadoop vendors such as Cloudera to ensure customers can fill in the Hadoop gaps through HDFS-enabled storage such as EMC Isilon. In addition to providing data protection, efficient storage utilization, and ease of import/export through multi-protocol support, EMC Isilon and Cloudera together allow organizations to quickly and easily take on new analytic workloads. With the announcement of Cloudera Enterprise certified with EMC Isilon for HDFS storage, I wanted to take the opportunity to speak with Cloudera’s Chief Strategy Officer Mike Olson about the partnership and how he sees the Hadoop ecosystem evolving over the next several years.

1.  The industry uses different terms for enterprise data architectures centered around Hadoop. EMC refers to this next-generation data architecture as a Data Lake, and Cloudera refers to it as an Enterprise Data Hub. What is the common thread?
