EMC Hadoop Starter Kit ViPR Edition: Creating a Smarter Data Lake

Pivotal HD offers a wide variety of data processing technologies for Hadoop – real-time, interactive, and batch. Add EMC Isilon scale-out NAS to Pivotal HD as integrated data storage and you have a shared data repository with multi-protocol support, including HDFS, to service a wide variety of data processing requests. This smells like a Data Lake to me – a general-purpose data storage and processing resource center where Big Data applications can develop and evolve. Add EMC ViPR software-defined storage to the mix and you have the smartest Data Lake in town, one that supports additional protocols and hardware and automatically adapts to changing workload demands to optimize application performance.

EMC Hadoop Starter Kit, ViPR Edition, now makes it easier to deploy this ‘smart’ Data Lake with Pivotal HD and other Hadoop distributions such as Cloudera and Hortonworks. Simply download the step-by-step guide and you can quickly deploy a Hadoop-based Big Data analytics environment, configuring Hadoop to use ViPR for HDFS, with Isilon hosting the Object/HDFS data service. Although in this guide Isilon is the storage array that ViPR deploys objects to, other storage platforms are also supported – EMC VNX, NetApp, OpenStack Swift, and Amazon S3.
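
To give a flavor of what the guide covers, the Hadoop-side change largely comes down to pointing the cluster's default file system at a ViPR HDFS endpoint in core-site.xml. The snippet below is only a sketch: the viprfs:// bucket, namespace, and site names are placeholders, and the exact property names and ViPR Hadoop client JAR to use come from the starter kit guide itself.

    <!-- core-site.xml (sketch): route HDFS traffic to a ViPR-backed bucket.
         Assumes the ViPR Hadoop client JAR is on the Hadoop classpath so the
         viprfs:// scheme resolves; bucket, namespace, and site are placeholders. -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>viprfs://mybucket.mynamespace.mysite/</value>
      </property>
    </configuration>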

I spoke with the creator of this starter kit, James F. Ruddy, Principal Architect for the EMC Office of the CTO, to explain why every organization should use this starter kit to optimize its IT infrastructure for Hadoop deployments.

1.  The original EMC Hadoop Starter Kit released last year was a huge success.  Why did you create ViPR Edition?

Continue reading

Pivotal Big Data Suite: Eliminating the Tax On A Growing Hadoop Cluster

The promise of Big Data is analyzing more data to gain unprecedented insight, but Hadoop pricing can place serious constraints on how much data can actually be stored for analysis.  Each time a node is added to a Hadoop cluster to increase storage capacity, you are charged for it.  Because this pricing model runs counter to the philosophy of Big Data, Pivotal has removed the tax on storing data in Hadoop with its announcement of Pivotal Big Data Suite.

Through a Pivotal Big Data Suite subscription, customers store as much data as they want in fully supported Pivotal HD and pay only for the value-added services they use, licensed per core – Pivotal Greenplum Database, GemFire, SQLFire, GemFire XD, and HAWQ.  The significance of this consumption model is that customers are charged only for the value they extract from Big Data, not for the volume they store.

[Figure: Pivotal Big Data Suite diagram]

*Calculate your savings with Pivotal Big Data Suite compared to traditional Enterprise Data Warehouse technologies.

Additionally, Pivotal Big Data Suite removes the guesswork from meeting the diverse data processing needs of Big Data.  With a flexible subscription covering your choice of real-time, interactive, and batch processing technologies, organizations are no longer locked into a specific technology by a contract.  At any point in time, as Big Data applications grow and Data Warehouse applications shrink, you can spin licenses up or down across the value-added services without incurring additional costs.  This pooled approach eliminates separate procurement cycles for new technologies, which otherwise delay projects, add costs, and create more data silos.

I spoke with Michael Cucchi, Senior Director of Product Marketing at Pivotal, to explain how Pivotal Big Data Suite radically redefines the economics of Big Data so organizations can achieve the Data Lake dream.

1. What Big Data challenges does Big Data Suite address and why?

Continue reading

VCE Vblock: Converging Big Data Investments To Drive More Value

As Big Data continues to demonstrate real business value, organizations are looking to leverage this high-value data across different applications and use cases. The uptake is also driving organizations to transition from siloed Big Data sandboxes to enterprise architectures, where they are mandated to address mission-critical availability and performance, security and privacy, provisioning of new services, and interoperability with the rest of the enterprise infrastructure.

Sandbox or experimental Hadoop on commodity hardware with direct-attached storage (DAS) makes it difficult to address such challenges for several reasons – data is hard to replicate across applications and data centers, IT lacks oversight and visibility into the data, multi-tenancy and virtualization are missing, and upgrades and technology migrations are difficult to streamline. As a result, VCE, the leader in converged or integrated infrastructure, is receiving an increasing number of requests on how to evolve Hadoop implementations reliant on DAS into deployments on VCE Vblock Systems – an enterprise-class infrastructure that combines servers, shared storage, network devices, virtualization, and management in a pre-integrated stack.

Formed by Cisco and EMC, with investments from VMware and Intel, VCE enables organizations to rapidly deploy business services on demand and at scale – all without triggering an explosion in capital and operating expenses. According to a recent IDC report, organizations around the world spent over $3.3 billion on converged systems in 2012, and IDC forecasts that spending will increase by 20% in 2013 and again in 2014. In fact, IDC calculated that Vblock Systems delivered a return on investment of 294% over three years and 435% over five years compared with traditional infrastructure, thanks to fast deployments, simplified operations, improved business-support agility, and cost savings, freeing staff to launch new applications, extend services, and improve user and customer satisfaction.

I spoke with Julianna DeLua from VCE Product Management to discuss how VCE’s Big Data solution enables organizations to extract more value from Big Data investments.

1.  Why are organizations interested in deploying Hadoop and Big Data applications on converged or integrated infrastructures such as Vblock?

Continue reading

You Asked, Rackspace Listened – New Big Data Hosting Options

Running Hadoop on bare metal may fit some use cases, but many organizations have data workloads that demand more storage than compute resources. To get the most efficient utilization for these types of Hadoop workloads, separating compute and storage resources makes sense and is a configuration users are asking for – for example, EMC Isilon storage for Hadoop.  In response to diverse Big Data needs such as mixed Hadoop workloads, hybrid cloud models, and heterogeneous data layers, Rackspace recently delivered new Big Data hosting options that give users more choice: run Hadoop on Rackspace-managed dedicated servers, spin up Hadoop on the public cloud, or configure your own private cloud.
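
To illustrate what separating compute and storage looks like in practice, here is a minimal sketch of the core-site.xml setting on compute-only Hadoop nodes when an Isilon cluster serves the HDFS namespace natively; the SmartConnect host name and port are placeholders, and the real values depend on the specific Isilon and Rackspace environment.

    <!-- core-site.xml (sketch) on compute-only Hadoop nodes: the HDFS namespace
         and data are served by the Isilon cluster over its HDFS interface, so no
         local NameNode or DataNode disks are required. Host and port are placeholders. -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://isilon-smartconnect.example.com:8020</value>
      </property>
    </configuration>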

[Figure: EMC Isilon reference architecture]

I spoke with Sean Anderson, Product Marketing Manager for Data Solutions at Rackspace, to talk about one particular new offering called ‘Managed Big Data Platform’, whereby customers can design the optimal configuration for their data and leave the management details to Rackspace.

1.  The Big Data lifecycle goes through various stages, each imposing different requirements. From what you are seeing with your customers, can you explain this lifecycle and the value Rackspace brings at each stage?

Continue reading