Pivotal Big Data Suite: Eliminating the Tax On A Growing Hadoop Cluster

The promise of Big Data is about analyzing more data to gain unprecedented insight, but Hadoop pricing can place serious constraints on the amount of data that can actually be stored for analysis. Each time a node is added to a Hadoop cluster to increase storage capacity, you are charged for it. Because this pricing model runs counter to the philosophy of Big Data, Pivotal has removed the tax on storing data in Hadoop with its announcement of Pivotal Big Data Suite.

Through a Pivotal Big Data Suite subscription, customers store as much data as they want in fully supported Pivotal HD, paying only for value-added services per core: Pivotal Greenplum Database, GemFire, SQLFire, GemFire XD, and HAWQ. The significance of this new consumption model is that customers can now store as much Big Data as they want, but are charged only for the value they extract from it.


*Calculate your savings with Pivotal Big Data Suite compared to traditional Enterprise Data Warehouse technologies.

Additionally, Pivotal Big Data Suite removes the guesswork associated with the diverse data processing needs of Big Data. With a flexible subscription to your choice of real-time, interactive, and batch processing technologies, organizations are no longer locked into a specific technology because of a contract. At any point in time, as Big Data applications grow and Data Warehouse applications shrink, you can spin licenses up or down across the value-added services without incurring additional costs. This pooled approach eliminates the need to procure new technologies each time requirements change, a process that results in delayed projects, additional costs, and more data silos.

I spoke with Michael Cucchi, Senior Director of Product Marketing at Pivotal, about how Pivotal Big Data Suite radically redefines the economics of Big Data so organizations can achieve the Data Lake dream.

1. What Big Data challenges does Big Data Suite address and why?

When we introduced Business Data Lake last year, the industry confirmed that we had the right vision: include real-time, interactive, and batch data ingest and processing capabilities, supported by data management technologies such as in-memory, MPP, and HDFS. The challenge for customers was how to get started with the Data Lake journey and how much budget to allocate across the breadth of data management technologies that comprise a Data Lake. Also, as data processing requirements change over time, customers want to protect IT investments and not be locked into any specific technology.

Although Pivotal has always provided enterprise-class technologies to support Business Data Lakes, customers were still challenged with how much to invest in Pivotal Greenplum Database for MPP analytical processing versus Pivotal HAWQ for interactive SQL access to HDFS versus Pivotal GemFire for real-time, in-memory database processing, etc. To take these pain points off the table, Big Data Suite offers customers a flexible, multi-year subscription to Pivotal Greenplum Database, GemFire, SQLFire, GemFire XD, HAWQ, and Pivotal HD. It includes unlimited use of Pivotal HD through a paid subscription to the value-added services: Pivotal Greenplum Database, GemFire, SQLFire, GemFire XD, and HAWQ.

The significance of this new consumption model is that customers can now store as much Big Data as they want in HDFS, but are charged only for the value they extract from the data. As an example, a customer could buy 1,000 cores' worth of Big Data Suite, and for the first year dedicate 80% of the cores to Pivotal Greenplum Database and 20% to HAWQ. Over the years, as data and insight start to expand in HDFS, the customer can spin down the use of Pivotal Greenplum Database and spin up the use of HAWQ without having to pay anything extra, as long as total core usage doesn’t exceed 1,000.
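To make the pooled-core arithmetic in this example concrete, here is a minimal sketch in Python. It is purely illustrative and not part of any Pivotal product or licensing API: the `CorePool` class, its `allocate` method, and the year-three split are hypothetical. It only models the rule described above, that cores may be redistributed freely across the value-added services as long as the total stays within the subscribed amount.

```python
from dataclasses import dataclass, field

# Value-added services named in the article; Pivotal HD itself is unlimited
# under the subscription and therefore does not consume cores from the pool.
SERVICES = {"Greenplum Database", "GemFire", "SQLFire", "GemFire XD", "HAWQ"}


@dataclass
class CorePool:
    """Hypothetical model of a pooled, per-core subscription."""
    subscribed_cores: int
    allocation: dict = field(default_factory=dict)

    def allocate(self, plan):
        """Replace the current allocation with `plan` and return unused cores.

        Raises ValueError if the plan names an unknown service, uses a
        negative core count, or exceeds the subscribed total.
        """
        unknown = set(plan) - SERVICES
        if unknown:
            raise ValueError(f"unknown services: {sorted(unknown)}")
        if any(cores < 0 for cores in plan.values()):
            raise ValueError("core counts must be non-negative")
        total = sum(plan.values())
        if total > self.subscribed_cores:
            raise ValueError(
                f"plan needs {total} cores, only {self.subscribed_cores} subscribed"
            )
        self.allocation = dict(plan)
        return self.subscribed_cores - total


if __name__ == "__main__":
    pool = CorePool(subscribed_cores=1000)
    # Year 1: the 80% / 20% split from the example above.
    print(pool.allocate({"Greenplum Database": 800, "HAWQ": 200}))
    # A later year (assumed split): shift cores toward HAWQ at no extra cost.
    print(pool.allocate({"Greenplum Database": 300, "HAWQ": 700}))
```

A plan that adds up to more than 1,000 cores is rejected, which corresponds to the point at which the customer would need a larger subscription.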

2.  What was the impetus in providing unlimited use of Pivotal HD in the Big Data Suite?

Data grows 60% per year, yet IT budgets grow 3-5% per year. Hadoop pricing does not fit limited IT budgets, as vendors charge by terabyte or node. Each time you want to add more data to your Data Lake to increase capacity, you are charged for it. We are telling customers that if they invest in Pivotal, they can grow their Data Lake or expand the HDFS footprint without being taxed for it. This allows customers to focus on more important aspects such as data analysis and operationalization through analytical database, SQL query, and in-memory technologies.
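The gap between those two growth rates compounds quickly. As a rough illustration, the Python sketch below projects both from the same starting point; the function names are made up for this example, the five-year horizon is an assumption, and the budget rate uses the upper end (5%) of the range quoted above.

```python
def compound(start, annual_rate, years):
    """Value of `start` after growing at `annual_rate` for `years` years."""
    return start * (1 + annual_rate) ** years


def growth_gap(years, data_rate=0.60, budget_rate=0.05):
    """Return (data multiple, budget multiple, ratio) after `years` years.

    Both quantities start at 1.0, so the results are multiples of today's
    data volume and today's IT budget.
    """
    data = compound(1.0, data_rate, years)
    budget = compound(1.0, budget_rate, years)
    return data, budget, data / budget


if __name__ == "__main__":
    for year in range(1, 6):
        data, budget, ratio = growth_gap(year)
        print(f"year {year}: data x{data:.2f}, budget x{budget:.2f}, gap x{ratio:.2f}")
```

Under these assumptions, data volume grows to roughly ten times its current size over five years, while the budget grows by less than a third, which is why per-terabyte or per-node pricing becomes hard to sustain.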

3.  It sounds like Pivotal Big Data Suite brings all data management technologies in line with Hadoop economics?

Yes, with Big Data Suite, we are aggressively cutting the price of Greenplum (Analytics Data Warehouse) and GemFire (In-memory data grid system) to be in line with the cost economics of Hadoop.

4.  How does Big Data Suite address Data Lake strategies?

Big Data Suite fulfills the data management needs of a Data Lake. And because each organization will have different data processing needs over time, we have designed a flexible pricing model for Big Data Suite whereby you can mix and match technologies at any point in time.

For example, a Data Lake for a Telecommunications organization will look different from a Data Lake for a Healthcare organization. The Telco may have immediate real-time requirements, whereas the Healthcare Payor may need interactive SQL access to HDFS right away and plan to prioritize real-time capabilities next year. If customers standardize on other Hadoop vendors, they may end up purchasing multi-vendor technologies for real-time, interactive, and batch processing over time simply because of pricing, creating more data silos. With Pivotal, we remove these silos through the Big Data Suite flexible consumption model.

5.  Who are the ideal candidates for the Big Data Suite?

Big Data Suite is ideal for any organization since we believe a flexible subscription model is the smart way to grow a Data Lake. I confirmed this approach with our Data Science team: when they experiment with new sets of data to solve a problem, the data processing requirements are unknown until the solution is operationalized. One use case may require an analytical database technology, whereas another may require interactive SQL access to HDFS. Therefore, the Data Lake must offer data processing options, or a toolkit, to address diverse use cases without creating additional data silos.

Calculate your savings with Pivotal Big Data Suite compared to data management in an Enterprise Data Warehouse.

Pivotal HD 2.0: Hadoop Gets Real-Time

Everything we do generates events – click on a mobile ad, pay with a credit card, tweet, measure heart rate, accelerate on the gas pedal, etc. What if an organization could feed these events into predictive models as soon as they happen, to make faster and more accurate decisions that generate more revenue, lower costs, minimize risk, and improve the quality of care? You would need deep and fast analytics provided by Big Data platforms such as Pivotal HD 2.0, announced yesterday.

Pivotal HD 2.0 brings an in-memory SQL database to Hadoop through seamless integration with Pivotal GemFire XD, enabling you to combine real-time data with historical data managed in HDFS. Closed-loop analytics, operational BI, and high-speed data ingest are now possible in a single OLTP/OLAP platform without any ETL processing required. The use cases that benefit are time-sensitive in nature. For example, telecom companies are at the forefront of applying real-time Big Data analytics to network traffic. The “store first, analyze second” method does not make sense for rapidly shifting traffic that requires immediate action when issues arise.


I spoke with Makarand Gokhale, Senior Director of Engineering at Pivotal, about the value of bringing OLTP to Hadoop, traditionally a batch processing platform.

1. Real-time solutions for Hadoop can mean many things: performing interactive queries, real-time event processing, and fast data ingest. How would you describe Pivotal HD’s real-time data services for Hadoop?


RSA and Pivotal: Laying the Foundation for a Wider Big Data Strategy

Building from years of security expertise, RSA was able to exploit Big Data to better detect, investigate, and understand threats with its RSA Security Analytics platform launched last year. Similarly, Pivotal leveraged its world-class Data Science team in conjunction with its Big Data platform to deliver Pivotal Network Intelligence for enhanced threat detection using statistical and machine learning techniques on Big Data. Utilizing both RSA Security Analytics and Pivotal Network Intelligence together, customers were able to identify and isolate potential threats faster than competing solutions for better risk mitigation.

As a natural next step, RSA and Pivotal last week announced the availability of the Big Data for Security Analytics reference architecture, solidifying a partnership that brings together the leaders in Security Analytics and Big Data/Data Science. Together, RSA and Pivotal will not only enhance the overall Security Analytics strategy, but also provide a foundation for a broader ‘IT Data Lake’ strategy to help organizations gain better ROI from these IT investments.

RSA’s reference architecture utilizes Pivotal HD, enabling security teams to gain access to a scalable platform with rich analytic capabilities from Pivotal tools and the Hadoop ecosystem to experiment and gain further visibility around enterprise security and threat detection. Moreover, the combined Pivotal and RSA platform allows organizations to leverage the collected data for non-security use cases such as capacity planning, mean-time-to-repair analysis, downtime impact analysis, shadow IT detection, and more.



Distributed architecture allows for enterprise scalability and deployment

I spoke with Jonathan Kingsepp, Director of Federation EVP Solutions at Pivotal, to discuss how the RSA-Pivotal partnership allows customers to gain much wider benefits across their organization.

1.  What are the technology components of this new RSA-Pivotal reference architecture?


EMC and RainStor Optimize Interactive SQL on Hadoop

Pivotal HAWQ was one of the most groundbreaking technologies entering the Hadoop ecosystem last year through its ability to execute complete ANSI SQL on large-scale datasets managed in Pivotal HD. This was great news for SQL users – organizations heavily reliant on SQL applications and common BI tools such as Tableau and MicroStrategy can leverage these investments to access and analyze new data sets managed in Hadoop.

Similarly, RainStor, a leading enterprise database known for its efficient data compression and built-in security, also enables organizations to run ANSI SQL queries against data in Hadoop, in this case highly compressed data. Due to the reduced footprint from extreme data compression (typically 90%+ smaller), RainStor enables users to run analytics on Hadoop much more efficiently. In fact, there are many instances where queries run significantly faster thanks to the reduced footprint plus filtering capabilities that figure out what not to read. This allows customers to minimize infrastructure costs and maximize insight from analysis of larger data sets.

Serving some of the largest telecommunications and financial services organizations, RainStor enables customers to readily query and analyze petabytes of data instead of archiving data sets to tape and then having to reload them whenever they are needed for analysis. RainStor chose to partner with EMC Isilon scale-out NAS for its storage layer to manage these petabyte-scale data environments even more efficiently. Using Isilon, the compute and storage for Hadoop workloads are decoupled, enabling organizations to balance CPU and storage capacity optimally as data volumes and the number of queries grow.


Furthermore, not only can organizations run any Hadoop distribution of choice with RainStor-Isilon, but they can also run multiple distributions of Hadoop against the same compressed data. For example, a single copy of the data managed in RainStor-Isilon can service Marketing’s Pivotal HD environment, Finance’s Cloudera environment, and HR’s Apache Hadoop environment.

To summarize, by running RainStor and Hadoop on EMC Isilon, you achieve:

  • Flexible Architecture Running Hadoop on NAS and DAS together: Companies leverage DAS local storage for hot data where performance is critical and use Isilon for mass data storage. With RainStor’s compression, you efficiently move more data across the network, essentially creating an I/O multiplier.
  • Built-in Security and Reliability: Data is securely stored with built-in encryption and data masking, in addition to user authentication and authorization. You also benefit from EMC Isilon FlexProtect, which provides a reliable, highly available Big Data environment with very little overhead.
  • Improved Query Speed: Data is queried using a variety of tools including standard SQL, BI tools, Hive, Pig, and MapReduce. With built-in filtering, queries speed up by a factor of 2 to 10 compared to Hive on HDFS/DAS.
  • Compliant WORM Solution: For absolute retention and protection of business critical data, including stringent SEC 17a-4 requirements, you leverage EMC Isilon’s SmartLock in addition to RainStor’s built-in immutable data retention capabilities.

I spoke with Jyothi Swaroop, Director of Product Marketing at RainStor, about the value of deploying EMC Isilon with RainStor and Hadoop.

1.  RainStor is known in the industry as an enterprise database architected for Big Data. Can you please explain how this technology evolved and what needs it addresses in the market?
