Cloudera Enterprise and EMC Isilon: Filling In The Hadoop Gaps

As Hadoop becomes the central component of enterprise data architectures, the open source community and technology vendors have built a large Big Data ecosystem of Hadoop platform capabilities to meet enterprise application requirements that core Hadoop does not address. For data processing, MapReduce batch processing is being supplemented with tools such as Apache Hive, Apache Solr, and Apache Spark, which fill the gaps for SQL access, search, and streaming. For data storage, direct-attached storage (DAS) has been the common deployment configuration for Hadoop; however, the market is now looking to supplement DAS deployments with enterprise storage. Why take this approach? Organizations can HDFS-enable valuable data already managed in enterprise storage without having to copy or move it to a separate Hadoop DAS environment.

As a leader in enterprise storage, EMC has partnered with Hadoop vendors such as Cloudera to ensure customers can fill in the Hadoop gaps through HDFS-enabled storage such as EMC Isilon. In addition to providing data protection, efficient storage utilization, and ease of import/export through multi-protocol support, EMC Isilon and Cloudera together allow organizations to quickly and easily take on new analytic workloads. With the announcement of Cloudera Enterprise certified with EMC Isilon for HDFS storage, I took the opportunity to speak with Cloudera’s Chief Strategy Officer Mike Olson about the partnership and how he sees the Hadoop ecosystem evolving over the next several years.

1.  The industry uses different terms for enterprise data architectures centered on Hadoop. EMC refers to this next-generation data architecture as a Data Lake, and Cloudera calls it an Enterprise Data Hub. What is the common thread?


Dear BI Users: Your Hadoop SQL Wish Has Finally Come True

To accelerate the value of Big Data, many products have been developed to make data managed in Hadoop much easier to access and analyze through SQL. First there was Hive, which provides a SQL query abstraction layer by converting SQL queries into MapReduce jobs. More recently, Cloudera announced Impala, which bypasses MapReduce to enable interactive queries on data stored in Hadoop using the same variant of SQL that Hive uses. And today, EMC Greenplum announced Pivotal HD, the only high-performing, true SQL query engine on top of Hadoop. Don’t be confused by these approaches, as there is a common thread: leveraging Hadoop as a Big Data platform for running SQL queries. The major difference with Pivotal HD is that there is now a single, scalable, flexible, and cost-effective data platform for all of your analytic needs.

I asked Greenplum Chief Scientist Milind Bhandarkar to explain this breakthrough SQL interface to Hadoop.

1. How does Pivotal HD provide a true, high-performing SQL interface to Hadoop?
