Get even more choice with the Ready Bundle for Hortonworks with Isilon

Brett Roberts

Data Analytics Systems Engineer at Dell EMC
Brett is the Technical Lead for Dell EMC’s Data Analytics Technology Alliances, focused on developing solutions that help customers solve their data challenges. You can find him on social media at @Broberts2261

Earlier this month marked the one-year anniversary of Dell Technologies and the coming together of Dell and EMC. Looking back, it has truly been a great year with a lot of bright spots to reflect on. I am most excited about how we have brought together two powerful product portfolios to create choice and value through the unique solutions we now build for our customers. This can be seen across the company as our new portfolio drives increased opportunities to meet specific customer needs, such as creating more value-added solutions for specific workloads. As a data analytics junkie, the one nearest and dearest to my heart is the recently released Dell EMC Ready Bundle for Hortonworks with Isilon Shared Storage.

You might ask, “Why is this so important?” First, this is a Ready Bundle and part of the Ready Solutions family, meaning you reduce your deployment risk and speed up your time to value. If you aren’t sure what Ready Solutions are, here is a whitepaper from IDG. Second, this new Ready Bundle with Isilon extends flexibility for the user more than ever before. As a heritage Dell offering, the Dell EMC Ready Bundles for Hadoop have been around for years, but traditionally they have been designed on PowerEdge servers. When you needed to scale your environment, you had to scale both compute and storage together; not a bad thing for many customers, and deployments of these Ready Bundles have been outstanding. Now, however, with heritage EMC’s Isilon added to the Ready Solutions cadre of technologies, we offer organizations the choice to decouple storage from compute and scale these two distinct components independently, while delivering the world-class data services that have earned Isilon the top spot on Gartner’s Magic Quadrant for Scale-Out File and Object Storage. We generally find this is a great option for Hadoop deployments where capacity requirements are growing much more rapidly than processing requirements.

In addition to the increased choice and data services that you get with Isilon, you still enjoy all of the benefits of the other Ready Solutions for Hortonworks Hadoop. This solution has been tested and validated for Hortonworks HDP by both Dell EMC and Hortonworks. Dell EMC and Hortonworks have continued to strengthen their partnership over the years, and this is yet another example of how we have come together to provide a unique, integrated solution to meet customers’ needs. Both Dell EMC and Hortonworks are excited about how this new Ready Bundle will help drive even more business outcomes, with customers achieving success with Hadoop much more quickly. Jeff Schmitt, Hortonworks’ Sr. Director of Channels and Alliances, had this to say about the Ready Bundle: “The Ready Bundle for Hortonworks is yet another example of joint Dell EMC and Hortonworks investment bringing increased value to customers. As HDP deployments continue to grow in scale, offering customers choice in their infrastructure deployments is critical. The Ready Bundle for Hortonworks provides a customer simplified deployment while allowing storage and compute to scale independently.”

This new Ready Bundle release is the epitome of the value that this merger has created. If you find yourself having to scale your Hadoop environment to meet capacity needs, or are looking for where to start on your Hadoop journey, the Dell EMC Ready Bundle for Hortonworks Hadoop with Isilon is a great fit. Here is the Ready Bundle Solution Overview for you to learn more about this great solution.

How Schema On Read vs. Schema On Write Started It All

Thomas Henson

Unstructured Data Engineer and Hadoop Black Belt at Dell EMC
Thomas Henson is a blogger, author, and podcaster in the Big Data Analytics Community. He is an Unstructured Data Engineer and Hadoop Black Belt at Dell EMC. Previously he worked helping Federal sector customers build their first Hadoop clusters. Thomas has been involved in the Hadoop Community since the early Hadoop 1.0 days. Connect with him @henson_tm.

Article originally appeared as Schema On Read vs. Schema On Write Explained.

Schema On Read vs. Schema On Write

What’s the difference between schema on read and schema on write?

How did schema on read shift the way data is stored?

Since the inception of relational databases in the 1970s, schema on write has been the de facto procedure for storing data to be analyzed. Recently, however, there has been a shift toward a schema on read approach, which has led to the exploding popularity of Big Data platforms and NoSQL databases. In this post, let’s take a deep dive into the differences between schema on read and schema on write.

What is Schema On Write

Schema on write is defined as creating a schema for data before writing it into the database. If you have done any kind of development with a database, you understand the structured nature of a relational database (RDBMS), because you have used Structured Query Language (SQL) to read data from it.
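To make the definition concrete, here is a minimal sketch using Python’s built-in sqlite3 module; the table name and columns are invented for illustration. The point is that the schema must be declared before a single row can be written:

```python
import sqlite3

# Schema on write: the table structure must exist before any data lands.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page_views (
        user_id   INTEGER,
        url       TEXT,
        viewed_at TEXT
    )
""")

# Every row written must line up with the columns declared above.
conn.execute(
    "INSERT INTO page_views (user_id, url, viewed_at) VALUES (?, ?, ?)",
    (42, "/home", "2017-09-01T12:00:00"),
)

rows = conn.execute("SELECT user_id, url FROM page_views").fetchall()
print(rows)  # [(42, '/home')]
```

Any query you run later can rely on that structure, which is exactly what makes SQL reads so convenient.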

One of the most time-consuming tasks in an RDBMS is doing Extract, Transform, Load (ETL) work. Remember, just because the data is structured doesn’t mean it starts out that way. Most of the data that exists is unstructured. Not only do you have to define the schema for the data, but you must also structure it based on that schema.
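As a sketch of that Transform step, assuming a made-up pipe-delimited log format: the raw lines carry no schema of their own, so each one must be split and type-converted to fit the table before it can be loaded:

```python
import sqlite3

# Hypothetical raw log lines -- unstructured until we impose a schema on them.
raw_lines = [
    "42|/home|2017-09-01T12:00:00",
    "7|/pricing|2017-09-01T12:00:05",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id INTEGER, url TEXT, viewed_at TEXT)")

# Transform each line to match the schema, then Load it.
for line in raw_lines:
    user_id, url, viewed_at = line.split("|")
    conn.execute(
        "INSERT INTO page_views VALUES (?, ?, ?)",
        (int(user_id), url, viewed_at),
    )

count = conn.execute("SELECT COUNT(*) FROM page_views").fetchone()[0]
print(count)  # 2
```

Multiply this loop by thousands of source formats and you can see where the ETL time goes.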

For example (more…)

Architecture Changes in a Bound vs. Unbound Data World

Thomas Henson

Unstructured Data Engineer and Hadoop Black Belt at Dell EMC
Thomas Henson is a blogger, author, and podcaster in the Big Data Analytics Community. He is an Unstructured Data Engineer and Hadoop Black Belt at Dell EMC. Previously he worked helping Federal sector customers build their first Hadoop clusters. Thomas has been involved in the Hadoop Community since the early Hadoop 1.0 days. Connect with him @henson_tm.

Originally posted as Bound vs. Unbound Data in Real Time Analytics.

Breaking The World of Processing

Streaming and real-time analytics are pushing the boundaries of our analytic architecture patterns. In the big data community, we now break analytics processing down into batch or streaming. If you glance at the top contributions, most of the excitement is on the streaming side (Apache Beam, Flink, and Spark).

What is causing the break in our architecture patterns?

A huge reason for the break in our existing architecture patterns is the concept of Bound vs. Unbound data. This concept is as fundamental as the Data Lake or Data Hub, and we were dealing with it long before Hadoop. Let’s break down both Bound and Unbound data.
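One way to see the distinction is a small Python sketch (the sensor feed and the five-reading window are made up for illustration): bound data is finite and complete, so a single batch pass yields one final answer, while unbound data never ends, so we can only emit rolling results as data arrives:

```python
# Bound data: a finite, complete dataset -- a batch job computes a final answer.
bound = [3, 1, 4, 1, 5]
print(sum(bound))  # 14

# Unbound data: an endless stream -- there is no "end of the data",
# so results must be emitted incrementally (here, per five-reading window).
def sensor_stream():
    n = 0
    while True:          # never terminates; stands in for a live feed
        yield n % 5
        n += 1

running_total = 0
windowed_totals = []
for i, reading in enumerate(sensor_stream()):
    running_total += reading
    if (i + 1) % 5 == 0:             # close a window every 5 readings
        windowed_totals.append(running_total)
        running_total = 0
    if len(windowed_totals) == 3:    # stop the demo; a real stream never would
        break

print(windowed_totals)  # [10, 10, 10]
```

The windowing trick is the essence of how streaming engines tame unbound data: they slice an infinite feed into finite chunks that batch-style logic can handle.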

Bound vs. Unbound Data (more…)

Distributed Analytics Meets Distributed Data with a World Wide Herd

Jean Marie Martini

Director, Data Analytics Portfolio Messaging and Strategy at Dell EMC
Jean Marie Martini is a Director of messaging and strategy across the data analytics portfolio at Dell EMC. Martini has been involved in data analytics for over ten years. Today the focus is on communicating the value of the Dell EMC solutions to enable customers to begin and advance their data analytics journeys to transform their organizations into data-driven businesses. You can follow Martini on Twitter @martinij.

Originally posted on CIO.com by Patricia Florissi, Ph.D.

What is a World Wide Herd (WWH)?

What does it mean to have “distributed analytics meet distributed data?” In short, it means forming a global virtual computing cluster, in this case one given the title of World Wide Herd. The WWH concept creates a global network of distributed Apache™ Hadoop® instances that together form a single virtual computing cluster, bringing analytics capabilities to the data. In a recent CIO.com blog, Patricia Florissi, Ph.D., vice president and global CTO for sales and a distinguished engineer for Dell EMC, details how this approach enables analysis of geographically dispersed data without requiring the data to be moved to a single location before analysis. (more…)