Architectural Tenets of Deep Learning

Keith Manthey

CTO - Unstructured Storage Division
Keith has spent 25+ years building distributed computing and high-performance computing systems for the Financial Services industry and in support of the US Government. He built his first machine learning system in 2009 and has been fascinated by data-driven technology since then. Keith holds 6 issued patents, with a few more pending, around distributed analytics and high-performance computing. Keith holds degrees from Virginia Tech and the University of Georgia.

Lately, I have spent large swaths of my time focused on Deep Learning and Neural Networks (either with customers or in our lab). One of the most common questions I get concerns underperforming model training as measured in “wall clock time”. The cause usually has more to do with focusing on only one aspect of the architecture, say GPUs. As such, I will spend a little time writing about the 3 fundamental tenets of a successful Deep Learning architecture: compute, file access, and bandwidth. Hopefully this will resonate and give those customers some ideas for their journey.
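To make the point about wall clock time concrete, here is a minimal back-of-the-envelope sketch, not the author's method. It assumes each of the three tenets can be reduced to a sustained throughput in samples per second and that the stages overlap perfectly, so the slowest one sets the pace. The `PipelineStage` class, the function name and all throughput figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PipelineStage:
    """One tenet of the training pipeline, reduced to a sustained throughput."""
    name: str
    samples_per_sec: float  # hypothetical sustained rate this stage can deliver


def epoch_wall_clock_seconds(stages, samples_per_epoch):
    """Estimate epoch time when all stages run concurrently (perfect overlap).

    A pipelined loop can go no faster than its slowest stage, so that stage
    sets the wall clock time. Returns (seconds, name_of_bottleneck_stage).
    """
    bottleneck = min(stages, key=lambda s: s.samples_per_sec)
    return samples_per_epoch / bottleneck.samples_per_sec, bottleneck.name


# Hypothetical throughputs for the three tenets.
stages = [
    PipelineStage("compute (GPUs)", 4000.0),
    PipelineStage("file access", 1500.0),
    PipelineStage("network bandwidth", 2500.0),
]
print(epoch_wall_clock_seconds(stages, 1_200_000))  # (800.0, 'file access')

# Doubling GPU throughput alone does not shorten the epoch.
stages[0] = PipelineStage("compute (GPUs)", 8000.0)
print(epoch_wall_clock_seconds(stages, 1_200_000))  # (800.0, 'file access')
```

Under these assumptions, doubling GPU throughput leaves epoch time unchanged because file access remains the bottleneck. Real pipelines overlap imperfectly, so this model gives a lower bound on epoch time rather than a prediction.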

Overview

Deep Learning (DL) is certainly all the rage. We are defining DL as a type of Machine Learning (ML) built on a deep hierarchy of layers, with each layer solving different pieces of a complex problem. These layers are interconnected into a “neural network”.

The use cases that I am presented with continue to grow exponentially, with very compelling financial returns on investment. Whether it is Convolutional Neural Networks (CNNs) for Computer Vision, Recurrent Neural Networks (RNNs) for Natural Language Processing (NLP), or Deep Belief Networks (DBNs) built from stacked Restricted Boltzmann Machines (RBMs), Deep Learning has many architectural structures and acronyms. There is some great Neural Network information out there. Pic 1 is a good representation of the structural layers for Deep Learning on Neural Networks:

 

Pic 1

Orchestration

Orchestration tools like BlueData, Kubernetes, Mesosphere, or Spark Cluster Manager sit at the top of the layer cake of …

Democratizing Artificial Intelligence, Deep Learning and Machine Learning with Dell EMC Ready Solutions

Bill Schmarzo

CTO, Dell EMC Services (aka “Dean of Big Data”)
Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science”, is responsible for setting strategy and defining the Big Data service offerings for Dell EMC’s Big Data Practice. As a CTO within Dell EMC’s 2,000+ person consulting organization, he works with organizations to identify where and how to start their big data journeys. He’s written white papers, is an avid blogger and is a frequent speaker on the use of Big Data and data science to power an organization’s key business initiatives. He is a University of San Francisco School of Management (SOM) Executive Fellow where he teaches the “Big Data MBA” course. Bill also just completed a research paper on “Determining The Economic Value of Data”. Onalytica recently ranked Bill the #4 Big Data Influencer worldwide. Bill has over three decades of experience in data warehousing, BI and analytics. Bill authored the Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements. Bill serves on the City of San Jose’s Technology Innovation Board, and on the faculties of The Data Warehouse Institute and Strata. Previously, Bill was vice president of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of “actionable insights” through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-defining analytic applications. Bill holds a Master of Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science and Business Administration from Coe College.

Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) are at the heart of digital transformation by enabling organizations to exploit their growing wealth of big data to optimize key business and operational use cases.

• AI is the theory and development of computer systems able to perform tasks normally requiring human intelligence (e.g. visual perception, speech recognition, translation between languages, etc.).
• ML is a sub-field of AI that gives systems the ability to learn and improve from experience without being explicitly programmed.
• DL is a type of ML built on a deep hierarchy of layers, with each layer solving different pieces of a complex problem. These layers are interconnected into a “neural network.” A DL framework is software that accelerates the development and deployment of these models.
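As an illustration of the “deep hierarchy of layers” described above, the sketch below stacks fully connected layers in plain Python, each one consuming the previous layer's output. It is an untrained toy with hypothetical layer sizes and random weights; a DL framework supplies what is missing here, such as automatic differentiation, training loops and GPU acceleration.

```python
import random


def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: output_j = activation(sum_i w[j][i] * x[i] + b[j])."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def build_network(sizes, rng):
    """Stack layers so each one feeds the next; weights are random (untrained)."""
    layers = []
    for i, (n_in, n_out) in enumerate(zip(sizes, sizes[1:])):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        # Hidden layers use ReLU; the final layer is left linear.
        activation = identity if i == len(sizes) - 2 else relu
        layers.append((weights, biases, activation))
    return layers


def forward(x, layers):
    """Pass the input through the hierarchy, one layer at a time."""
    for weights, biases, activation in layers:
        x = dense_layer(x, weights, biases, activation)
    return x


# Hypothetical sizes: 4 inputs, two hidden layers of 8 units, 2 outputs.
network = build_network([4, 8, 8, 2], random.Random(0))
print(len(network), len(forward([0.5, -1.0, 2.0, 0.1], network)))  # 3 2
```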

See “Artificial Intelligence is not Fake Intelligence” for more details on AI | ML | DL.

And the business ramifications are staggering (see Figure 1)!

Figure 1: Source: McKinsey

And senior executives seem to have gotten the word. BusinessWeek (October 23, 2017) reported a dramatic increase in mentions of …

Scientific Method: Embrace the Art of Failure

Bill Schmarzo

I use the phrase “fail fast / learn faster” to describe the iterative nature of the data science exploration, testing and validation process. To create the “right” analytic models, the data science team will go through multiple iterations testing different variables, different data transformations, different data enrichments and different analytic algorithms until they have failed enough times to feel “comfortable” with the model that they have developed.
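A toy version of this “fail fast / learn faster” loop might look like the following sketch, which is an illustration under stated assumptions rather than a prescribed process. It tries every combination of a few hypothetical data transformations and algorithms, scores each on held-out validation data, and keeps the full ranked list so the failures stay visible. The dataset is synthetic (y = x² + 1) and chosen so that one combination clearly wins.

```python
from itertools import product
from statistics import mean


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b


# Hypothetical candidate transformations and algorithms to iterate over.
TRANSFORMS = {"raw": lambda x: x, "squared": lambda x: x * x}
ALGORITHMS = {"mean": fit_mean, "line": fit_line}


def mse(model, xs, ys):
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def fail_fast_search(train, valid):
    """Try every transform/algorithm pair; return (score, transform, algorithm) ranked best first."""
    results = []
    for (t_name, transform), (a_name, fit) in product(TRANSFORMS.items(), ALGORITHMS.items()):
        model = fit([transform(x) for x, _ in train], [y for _, y in train])
        score = mse(model, [transform(x) for x, _ in valid], [y for _, y in valid])
        results.append((score, t_name, a_name))
    return sorted(results)


# Synthetic data: y = x^2 + 1, trained on x = 0..9, validated on x = 10..12.
train = [(x, x * x + 1) for x in range(10)]
valid = [(x, x * x + 1) for x in range(10, 13)]
for score, t_name, a_name in fail_fast_search(train, valid):
    print(f"{t_name:>7} + {a_name:<4} validation MSE = {score:.2f}")
```

Every rejected combination stays in the ranked results, so the team can see not only which model won but how badly each alternative failed.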

However, an early variant of this process has been employed for a long time: it’s called the Scientific Method. The scientific method is a body of techniques for investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge. To be termed scientific, a method of inquiry is commonly based on empirical or measurable evidence subject to specific principles of reasoning[1] (see Figure 1).

Figure 1: The Scientific Method

The Scientific Method consists of the following components: …

Get even more choice with the Ready Bundle for Hortonworks with Isilon

Brett Roberts

Data Analytics Systems Engineer at Dell EMC
Brett is the Technical Lead for Dell EMC’s Data Analytics Technology Alliances, focused on developing solutions that help customers solve their data challenges. You can find him on social media at @Broberts2261.

Earlier this month marked the one-year anniversary of Dell Technologies and the coming together of Dell and EMC. Looking back, it has truly been a great year with a lot of bright spots to reflect on. I am most excited about how we have been able to bring together two powerful product portfolios to create choice and value through the unique solutions we now build for our customers. This can be seen across the company as our new portfolio drives increased opportunities to meet specific customer needs, such as more value-added solutions for specific workloads. As a data analytics junkie, the one that is nearest and dearest to my heart is the recently released Dell EMC Ready Bundle for Hortonworks with Isilon Shared Storage.

You might ask, “Why is this so important?” First, this is a Ready Bundle and part of the Ready Solutions family, meaning you reduce your deployment risks and speed up your time to value. If you aren’t sure what Ready Solutions are, here is a white paper from IDG. Second, this new Ready Bundle with Isilon gives users more flexibility than ever before. As a heritage Dell offering, the Dell EMC Ready Bundles for Hadoop have been around for years, but they have traditionally been designed on PowerEdge servers. When you needed to scale your environment, you had to scale compute and storage together; that is not a bad thing for many customers, and deployments of these Ready Bundles have been outstanding. Now, however, with heritage EMC’s Isilon added to the Ready Solutions cadre of technologies, we offer organizations the choice to decouple storage from compute and scale these two distinct components independently, while delivering world-class data services that have earned Isilon the top spot on Gartner’s Magic Quadrant for Scale-Out File and Object storage. We generally find this is a great option for Hadoop deployments where capacity requirements are growing much more rapidly than processing requirements.
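The capacity-versus-processing argument can be checked with a little arithmetic. The sketch below assumes entirely hypothetical node sizes (50 TB and 1 compute unit per converged node, 50 TB per storage node, 1 compute unit per compute node; these are not Dell EMC or Isilon specifications) and compares scaling converged nodes against sizing separate storage and compute tiers while capacity demand grows and compute demand stays flat.

```python
import math


def converged_cluster(capacity_tb, compute_units, tb_per_node, compute_per_node):
    """Storage and compute scale together: buy enough nodes for the larger need.

    Returns (nodes, idle_compute_units, idle_capacity_tb).
    """
    nodes = max(math.ceil(capacity_tb / tb_per_node),
                math.ceil(compute_units / compute_per_node))
    return nodes, nodes * compute_per_node - compute_units, nodes * tb_per_node - capacity_tb


def decoupled_cluster(capacity_tb, compute_units, tb_per_storage_node, compute_per_compute_node):
    """Storage and compute are separate tiers, each sized to its own demand.

    Returns (storage_nodes, compute_nodes).
    """
    return (math.ceil(capacity_tb / tb_per_storage_node),
            math.ceil(compute_units / compute_per_compute_node))


# Hypothetical sizing: compute demand stays at 20 units while capacity grows.
for capacity in (1000, 2000, 4000):
    nodes, idle_compute, _ = converged_cluster(capacity, 20, 50, 1)
    print(f"{capacity} TB: {nodes} converged nodes, {idle_compute} idle compute units")
# 1000 TB: 20 converged nodes, 0 idle compute units
# 2000 TB: 40 converged nodes, 20 idle compute units
# 4000 TB: 80 converged nodes, 60 idle compute units

print(decoupled_cluster(4000, 20, 50, 1))  # (80, 20): 80 storage nodes, only 20 compute nodes
```

With these made-up numbers, growing a converged cluster from 1,000 TB to 4,000 TB leaves 60 compute units idle, while the decoupled design adds storage only. Actual sizing depends on real node configurations and workload profiles.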

In addition to the increased choice and data services that you get with Isilon, you still enjoy all of the benefits of the other Ready Solutions for Hortonworks Hadoop. This solution has been tested and validated for Hortonworks HDP by both Dell EMC and Hortonworks. Dell EMC and Hortonworks have continued to strengthen their partnership over the years, and this is yet another example of how we have come together to provide a unique, integrated solution to meet customers’ needs. Both Dell EMC and Hortonworks are excited about how this new Ready Bundle will help drive even more business outcomes, with customers achieving success with Hadoop much more quickly. Jeff Schmitt, Hortonworks’ Sr. Director of Channels and Alliances, had this to say about the Ready Bundle: “The Ready Bundle for Hortonworks is yet another example of joint Dell EMC and Hortonworks investment bringing increased value to customers. As HDP deployments continue to grow in scale, offering customers choice in their infrastructure deployments is critical. The Ready Bundle for Hortonworks provides a customer simplified deployment while allowing storage and compute to scale independently.”

This new Ready Bundle release is the epitome of the value that this merger has created. If you find yourself having to scale your Hadoop environment to meet capacity needs, or are looking for where to start on your Hadoop journey, the Dell EMC Ready Bundle for Hortonworks Hadoop with Isilon is a great fit. Here is the Ready Bundle Solution Overview for you to learn more about this great solution.

 

 
