Don’t Accept The Status Quo For Hadoop

Hadoop is everywhere: according to IDC, 99% of companies will deploy or pilot Hadoop in the next 18-24 months. These environments will largely be built on standalone servers with direct-attached storage (DAS), which adds management work because data is spread across many disk spindles throughout the data center. With Hadoop clusters expanding quickly, organizations are starting to experience growing pains comparable to adolescence. This raises the question: should a DAS server configuration be the accepted status quo for Hadoop deployments?


Whether you are getting started with Hadoop or growing your Hadoop deployment, EMC provides a long-term solution for Hadoop through shared storage and VMs, delivering distinct business value in lower TCO and faster time-to-results. I spoke with EMC Technical Advisory Architect Chris Harrold about why organizations are now turning to EMC to help transition Hadoop environments into adulthood.

1.  Almost every Hadoop deployment is based on the accepted configuration of standalone servers with DAS. What issues have you seen with this configuration among your customers?

These environments are growing rapidly, and the ability to support them degrades quickly at larger scale. Servers with DAS tend to be harder to support because they have internal components that can fail, and they require more babysitting than an enterprise-class platform.

For example, expanding this environment is no trivial task: you have to acquire the servers, then rack, power, stand up, and configure them all. Even adding a sandbox or test environment is difficult in this standalone server model.

There is also a steep learning curve with Hadoop, not only in the analytics component but also in simply getting data in and out of Hadoop in a DAS environment.

2.  Hadoop was designed to run on a DAS architecture where compute and storage are tightly coupled. Why does EMC believe that decoupling storage and compute through shared storage, and virtualizing compute resources, is a better architecture? How does this architecture address the issues you mentioned above?

When Hadoop was first introduced in the mid-2000s, shared or enterprise storage was a high-end, costly option; therefore it was very difficult to design something like Hadoop on shared storage. Since then, shared storage has become more affordable. At the same time, networking speeds have increased, so it is now more feasible to decouple compute and storage. By deploying Hadoop on a shared storage model, you eliminate the manageability issues of DAS and gain the benefits of enterprise-class features such as virtualization, SANs, and scale-out NAS.

Also, deploying large-scale standalone architectures is really a legacy approach, as many enterprises have moved away from this to a shared architecture.  As Hadoop is becoming a key component of data architectures, it will be challenging to maintain standalone servers since many enterprises have evolved to virtualized, shared environments.

EMC is enabling organizations to leverage enterprise storage and virtualization to quickly and easily deploy and manage growing Hadoop environments. By using enterprise technologies with Hadoop, you also gain benefits such as easier data import/export, data protection, and security.

3.  What are the components of the EMC recommended architecture for Hadoop?

We give organizations a choice in how they architect Hadoop environments. For shared storage, we provide EMC Isilon HDFS-enabled storage, which is certified with all major Hadoop distributions. We also provide Software Defined Storage through EMC ViPR HDFS/Object storage, which is certified with all major storage arrays and Hadoop distributions.

For pre-integrated compute and shared storage, the EMC Data Computing Appliance provides an optimized architecture utilizing Pivotal HD and EMC Isilon.

For an integrated infrastructure approach, we partner with VCE vBlock to provide choice of compute, storage, and networking technologies from VMware, Cisco, and EMC to optimize any major Hadoop distribution deployment.

4.  Who are the ideal candidates for the EMC architecture for Hadoop and why?

All organizations benefit from this architecture. Whether you are just getting started with Hadoop or have a large-scale deployment, we make it easy to rapidly deploy and manage a Hadoop environment. In fact, with EMC storage, you can analytics-enable data already living in storage arrays. You don’t have to copy or move data to a separate standalone Hadoop DAS environment. We have several step-by-step guides to walk you through the process of configuring your Hadoop environment for HDFS-enabled storage.

5.  Although Hadoop is everywhere, with IDC estimating that 99% of companies will deploy or pilot Hadoop in the next 18-24 months, gaining ROI from the deployment is a challenge because of a lack of skills, both in identifying the right opportunity and in executing on it. How does EMC address this issue?

Yes, this is a huge problem in the industry, especially the lack of Data Science skills. EMC addresses the skills shortage through our services across the EMC Federation. Pivotal Data Labs provides access to some of the best minds in Data Science to help organizations identify opportunities and execute using the latest Big Data technologies and techniques. The EMC Vision Workshop creates a strategic Big Data blueprint that lets organizations continuously identify Big Data use cases based on their business initiatives and implementation feasibility. The workshop has been a huge success because it creates the needed organizational alignment: lines of business continuously working, communicating, and collaborating with IT to identify the right Big Data use cases.

Can Big Data Shape A Better Future? Quid is Paving the Way

World hunger, political conflict, business competition and other complex problems cannot be solved with mathematical algorithms measuring probabilities alone. However, by combining human intelligence with the best artificial intelligence, the company Quid has built software that experts are calling the world’s first augmented intelligence platform. The software uses the superior speed and storage capacity of computation to accelerate the process by which human beings typically acquire the deep pattern recognition of expertise. It does more than run simple prediction algorithms; it allows users to interact with data in an immersive, visual environment to better understand the world at a high resolution so that they can ultimately shape it and change it.

Founded in 2010, Quid is addressing a new class of problems to help organizations make strategic decisions around business innovation, public relations, foreign policy, human welfare, and more. Through advanced visualizations that interpret massive amounts of diverse internal and publicly accessible external data sets, Quid tells a unique and compelling story about the complexity of our world – trends, comparisons, multi-dimensional relationships, etc. – to change the direction of decision making.

For Quid, it’s not about man battling it out with machines, but rather, man working with machines when entering a new level of complex problem solving. For example, military intelligence may one day be able to change the direction of future conflicts by working with Quid software to analyze millions of data points from war logs and reports, news articles, and social media about the most recent casualties of war. The intelligence teams plugged into Quid would be able to see the war unfold as it happens across multiple data dimensions and uncover the mathematical patterns hidden in the data that are shaping the direction of the conflict.



I spoke with Quid Co-founder and CTO Sean Gourley about how Quid is helping organizations leverage Big Data and augmented intelligence to tackle the bigger problems they are facing in a fast-moving world.

1.  Quid applies Data Intelligence to Big Data – a very different concept than applying Data Science to Big Data. Please explain.


RSA and Pivotal: Laying the Foundation for a Wider Big Data Strategy

Building from years of security expertise, RSA was able to exploit Big Data to better detect, investigate, and understand threats with its RSA Security Analytics platform launched last year. Similarly, Pivotal leveraged its world-class Data Science team in conjunction with its Big Data platform to deliver Pivotal Network Intelligence for enhanced threat detection using statistical and machine learning techniques on Big Data. Utilizing both RSA Security Analytics and Pivotal Network Intelligence together, customers were able to identify and isolate potential threats faster than competing solutions for better risk mitigation.

As a natural next step, RSA and Pivotal last week announced the availability of the Big Data for Security Analytics reference architecture, solidifying a partnership that brings together the leaders in Security Analytics and Big Data/Data Science. Together, RSA and Pivotal will not only enhance the overall Security Analytics strategy, but also provide a foundation for a broader ‘IT Data Lake’ strategy to help organizations gain better ROI from these IT investments.

RSA’s reference architecture utilizes Pivotal HD, enabling security teams to gain access to a scalable platform with rich analytic capabilities from Pivotal tools and the Hadoop ecosystem to experiment and gain further visibility around enterprise security and threat detection. Moreover, the combined Pivotal and RSA platform allows organizations to leverage the collected data for non-security use cases such as capacity planning, mean-time-to-repair analysis, downtime impact analysis, shadow IT detection, and more.



Distributed architecture allows for enterprise scalability and deployment

I spoke with Jonathan Kingsepp, Director of Federation EVP Solutions at Pivotal, about how the RSA-Pivotal partnership allows customers to gain much wider benefits across their organization.

1.  What are the technology components of this new RSA-Pivotal reference architecture?


Revolution Analytics Boosts the Adoption of R in the Enterprise

The path to competitive advantage is being able to make predictions from Big Data. Therefore, the more you can build predictive analytics into your business processes, the more successful your organization will become. There is no doubt that open-source R is the programming language of choice for predictive analytics, and thanks to Revolution Analytics, R has the enterprise capabilities needed to drive adoption across the organization and for every employee to make data-driven decisions.

Revolution Analytics is to R what the vendor Red Hat is to the Linux operating system: a company devoted to enhancing and supporting open-source software for enterprise deployments. For example, Revolution Analytics recently released R Enterprise 7 to meet the performance demands of Big Data, with R now running natively within Hadoop and data warehouses. I spoke with David Smith, VP of Marketing at Revolution Analytics, about how Revolution Analytics has accelerated the adoption of R in the enterprise.

1.  What benefits does Revolution Analytics provide to organizations over just using open-source R?
