Archive for the ‘Data Analytics’ Category

Democratizing Artificial Intelligence, Deep Learning and Machine Learning with Dell EMC Ready Solutions

Bill Schmarzo

CTO, Dell EMC Services (aka “Dean of Big Data”)
Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science”, is responsible for setting strategy and defining the Big Data service offerings for Dell EMC’s Big Data Practice. As a CTO within Dell EMC’s 2,000+ person consulting organization, he works with organizations to identify where and how to start their big data journeys. He’s written white papers, is an avid blogger and is a frequent speaker on the use of Big Data and data science to power an organization’s key business initiatives. He is a University of San Francisco School of Management (SOM) Executive Fellow, where he teaches the “Big Data MBA” course. Bill also just completed a research paper on “Determining The Economic Value of Data”. Onalytica recently ranked Bill as the #4 Big Data Influencer worldwide. Bill has over three decades of experience in data warehousing, BI and analytics. Bill authored the Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements. Bill serves on the City of San Jose’s Technology Innovation Board, and on the faculties of The Data Warehouse Institute and Strata. Previously, Bill was vice president of Analytics at Yahoo, where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of “actionable insights” through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-defining analytic applications. Bill holds a Master of Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science and Business Administration from Coe College.

Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) are at the heart of digital transformation by enabling organizations to exploit their growing wealth of big data to optimize key business and operational use cases.

• AI is the theory and development of computer systems able to perform tasks that normally require human intelligence (e.g., visual perception, speech recognition, translation between languages).
• ML is a sub-field of AI that gives systems the ability to learn and improve from experience on their own, without being explicitly programmed.
• DL is a type of ML built on a deep hierarchy of layers, with each layer solving a different piece of a complex problem. These layers are interconnected into a “neural network.” A DL framework is software that accelerates the development and deployment of these models (a rough code sketch of this layered structure follows the list).
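
To make the “deep hierarchy of layers” idea concrete, here is a minimal sketch of a two-layer neural network written in plain Python with NumPy. The toy XOR-style data, layer sizes and learning rate are invented purely for illustration; a real DL framework automates this kind of layer wiring, gradient calculation and hardware acceleration at far larger scale.

import numpy as np

# Toy data (hypothetical): XOR-style inputs and labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # layer 1: 2 inputs -> 8 hidden units
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # layer 2: 8 hidden units -> 1 output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(5000):
    # Forward pass through the hierarchy of layers.
    h = sigmoid(X @ W1 + b1)          # hidden layer
    p = sigmoid(h @ W2 + b2)          # output layer

    # Backward pass: push the error back through each layer (backpropagation).
    grad_p = (p - y) / len(X)
    grad_W2, grad_b2 = h.T @ grad_p, grad_p.sum(axis=0)
    grad_h = (grad_p @ W2.T) * h * (1 - h)
    grad_W1, grad_b1 = X.T @ grad_h, grad_h.sum(axis=0)

    # Gradient-descent update of every layer's weights: the "learning" in deep learning.
    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    W1 -= lr * grad_W1; b1 -= lr * grad_b1

print(np.round(p.ravel(), 2))   # predictions move toward [0, 1, 1, 0] as the network learns from experience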

See “Artificial Intelligence is not Fake Intelligence” for more details on AI | ML | DL.

And the business ramifications are staggering (see Figure 1)!

Figure 1: Source: McKinsey

And senior executives seem to have gotten the word. BusinessWeek (October 23, 2017) reported a dramatic increase in mentions of (more…)

Scientific Method: Embrace the Art of Failure

Bill Schmarzo

CTO, Dell EMC Services (aka “Dean of Big Data”)

I use the phrase “fail fast / learn faster” to describe the iterative nature of the data science exploration, testing and validation process. To create the “right” analytic models, the data science team will go through multiple iterations, testing different variables, different data transformations, different data enrichments and different analytic algorithms, until they have failed enough times to feel “comfortable” with the model they have developed.
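
To give a feel for what those iterations can look like in code, here is a minimal sketch; the synthetic data set, the candidate feature subsets and the two algorithms are hypothetical stand-ins, and cross-validated accuracy is just one of many possible scoring choices. Most combinations “fail,” and the team keeps the one that holds up best before iterating again.

from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for the business data set being explored.
X, y = make_classification(n_samples=500, n_features=12, random_state=42)

# Candidate experiments: different variable subsets crossed with different algorithms.
feature_sets = {"first_six_vars": list(range(6)), "all_twelve_vars": list(range(12))}
models = {"logistic": LogisticRegression(max_iter=1000),
          "forest": RandomForestClassifier(n_estimators=100, random_state=42)}

results = []
for (fs_name, cols), (m_name, model) in product(feature_sets.items(), models.items()):
    score = cross_val_score(model, X[:, cols], y, cv=5).mean()   # test and validate each combination
    results.append((score, fs_name, m_name))
    print(f"{fs_name} + {m_name}: {score:.3f}")

# Fail fast / learn faster: most combinations are discarded, the best one survives this round.
print("best so far:", max(results))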

However, an early variant of this fail fast / learn faster process has been employed for a long time: it’s called the Scientific Method. The scientific method is a body of techniques for investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge. To be termed scientific, a method of inquiry is commonly based on empirical or measurable evidence subject to specific principles of reasoning[1] (see Figure 1).

Figure 1: The Scientific Method

The Scientific Method comprises the following components: (more…)

Get even more choice with the Ready Bundle for Hortonworks with Isilon

Brett Roberts

Data Analytics Systems Engineer at Dell EMC
Brett is the Technical Lead for Dell EMC’s Data Analytics Technology Alliances, focused on developing solutions that help customers solve their data challenges. You can find him on social media at @Broberts2261

Earlier this month marked the one-year anniversary of Dell Technologies and the coming together of Dell and EMC. Looking back, it has truly been a great year with a lot of bright spots to reflect on. I am most excited about how we have been able to bring together two powerful product portfolios to create choice and value through the unique solutions we now build for our customers. This can be seen across the company, as our new portfolio drives increased opportunities to meet specific customer needs, such as creating more value-added solutions for specific workloads. For a data analytics junkie like me, the one that is near and dear to my heart is the recently released Dell EMC Ready Bundle for Hortonworks with Isilon Shared Storage.

You might ask, “Why is this so important?” First, this is a Ready Bundle and part of the Ready Solutions family, meaning you reduce your deployment risks and speed up your time to value. If you aren’t sure what Ready Solutions are, here is a whitepaper from IDG. Second, this new Ready Bundle with Isilon extends flexibility for the user more than ever before. As a heritage Dell offering, the Dell EMC Ready Bundles for Hadoop have been around for years, but traditionally they have been designed on PowerEdge servers. When you needed to scale your environment, you had to scale both compute and storage together; not a bad thing for many customers, and deployments of these Ready Bundles have been outstanding. Now, however, with heritage EMC’s Isilon added to the Ready Solutions cadre of technologies, we offer organizations the choice to decouple storage from compute and scale these two distinct components independently, while delivering the world-class data services that have earned Isilon the top spot in Gartner’s Magic Quadrant for Scale-Out File and Object Storage. We generally find this is a great option for Hadoop deployments where capacity requirements are growing much more rapidly than processing requirements.

In addition to the increased choice and data services that you get with Isilon, you still enjoy all of the benefits of the other Ready Solutions for Hortonworks Hadoop. This solution has been tested and validated for Hortonworks HDP by both Dell EMC and Hortonworks. Dell EMC and Hortonworks have continued to strengthen their partnership over the years, and this is yet another example of how we have come together to provide a unique, integrated solution to meet customers’ needs. Both Dell EMC and Hortonworks are excited about how this new Ready Bundle will help drive even more business outcomes, with customers achieving success with Hadoop much more quickly. Jeff Schmitt, Hortonworks’ Sr. Director of Channels and Alliances, had this to say about the Ready Bundle: “The Ready Bundle for Hortonworks is yet another example of joint Dell EMC and Hortonworks investment bringing increased value to customers. As HDP deployments continue to grow in scale, offering customers choice in their infrastructure deployments is critical. The Ready Bundle for Hortonworks provides a customer simplified deployment while allowing storage and compute to scale independently.”

This new Ready Bundle release is the epitome of the value that this merger has created. If you find yourself having to scale your Hadoop environment to meet capacity needs, or are wondering where to start on your Hadoop journey, the Dell EMC Ready Bundle for Hortonworks Hadoop with Isilon is a great fit. Here is the Ready Bundle Solution Overview for you to learn more about this solution.


How Schema On Read vs. Schema On Write Started It All

Thomas Henson

Unstructured Data Engineer and Hadoop Black Belt at Dell EMC
Thomas Henson is a blogger, author, and podcaster in the Big Data Analytics community. He is an Unstructured Data Engineer and Hadoop Black Belt at Dell EMC. Previously, he helped Federal sector customers build their first Hadoop clusters. Thomas has been involved in the Hadoop community since the early Hadoop 1.0 days. Connect with him @henson_tm.

Article originally appeared as Schema On Read vs. Schema On Write Explained.

Schema On Read vs. Schema On Write

What’s the difference between schema on read and schema on write?

How did Schema on read shift the way data is stored?

Since the inception of relational databases in the 1970s, schema on write has been the de facto procedure for storing data to be analyzed. Recently, however, there has been a shift toward a schema on read approach, which has led to the exploding popularity of Big Data platforms and NoSQL databases. In this post, let’s take a deep dive into the differences between schema on read and schema on write.
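
To make the schema on read side of that shift concrete, here is a minimal, hypothetical sketch in Python: raw events land in a file exactly as they arrive, with no table definition up front, and structure is imposed only at the moment the data is read. The file name and field names are invented for illustration.

import json

# Schema on read: land the data raw; nothing about its structure is declared up front.
raw_events = [
    '{"user": "alice", "action": "click", "duration_ms": 31}',
    '{"user": "bob", "action": "view"}',                  # missing field, stored anyway
    'a malformed line that a strict database would have rejected at load time',
]
with open("events.log", "w") as f:
    f.write("\n".join(raw_events))

# The "schema" is applied only at query time, by the code doing the reading.
with open("events.log") as f:
    for line in f:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue                                      # bad rows are skipped at read, not at write
        print(event["user"], event.get("duration_ms", 0))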

What is Schema On Write

Schema on write is defined as creating a schema for the data before writing it into the database. If you have done any kind of development with a database, you understand the structured nature of a Relational Database (RDBMS), because you have used Structured Query Language (SQL) to read data from it.
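
As a minimal sketch of that write-time contract (the table and column names here are invented, and SQLite simply stands in for a full RDBMS), the schema is declared first and every write must conform to it:

import sqlite3

conn = sqlite3.connect(":memory:")

# Schema on write: the structure is declared before any data lands.
conn.execute("""
    CREATE TABLE page_views (
        user        TEXT    NOT NULL,
        action      TEXT    NOT NULL,
        duration_ms INTEGER NOT NULL
    )
""")

# A row that matches the schema is accepted...
conn.execute("INSERT INTO page_views VALUES (?, ?, ?)", ("alice", "click", 31))

# ...while a row that violates it is rejected at write time.
try:
    conn.execute("INSERT INTO page_views (user, action) VALUES (?, ?)", ("bob", "view"))
except sqlite3.IntegrityError as err:
    print("rejected:", err)   # NOT NULL constraint failed: page_views.duration_ms

print(conn.execute("SELECT * FROM page_views").fetchall())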

One of the most time-consuming tasks in an RDBMS is doing Extract, Transform, Load (ETL) work. Remember, just because the data is structured doesn’t mean it started out that way. Most of the data that exists is unstructured. Not only do you have to define the schema for the data, but you must also structure the data based on that schema.

For example (more…)
