Posts Tagged ‘machine learning’

Getting started with machine-generated data

Brett Roberts


Data Analytics Systems Engineer at Dell EMC
Brett is the Technical Lead for Dell EMC’s Data Analytics Technology Alliances, focused on developing solutions that help customers solve their data challenges. You can find him on social media at @Broberts2261

By Brett Roberts with Debra Slapak

We are literally surrounded by data generated from devices and other machines—things like the phones in our pockets, vehicle sensors, the ATM at our favorite spot, cameras on the street, even the thermostats and appliances in our homes. As consumers, we benefit from insights generated when this data is analyzed and put to work for us. This, ideally, protects us or makes us more loyal to the companies that provide better experiences or outcomes for us.

Increasingly, business, government and non-profit organizations alike are generating, capturing, and analyzing massive amounts of machine-generated data to help them improve operational efficiency and customer experience. This process may look simple from the consumer side, but the reality is that transforming business and operating models using machine-generated data can be challenging. The data itself is typically a mix of structured, unstructured or semi-structured data from a wide variety of sources, and organizations often struggle with how best to collect and analyze it.

To address these problems, Dell EMC and Splunk formed a strategic partnership to architect, test and validate solutions that run Splunk on Dell EMC infrastructure. The work our teams do together simplifies decision-making and deployment of analytics solutions for machine-generated data, so our customers can focus on better experiences and outcomes for their customers.

Splunk is a platform for real-time operational intelligence using machine-generated data. It enables organizations to search, analyze and visualize the massive streams of machine data generated by highly optimized IT systems and technology infrastructure, whether physical, virtual or in the cloud. Within just a few years, Splunk has emerged as one of the fastest-growing data-focused platforms, used by more than 75 of the Fortune 100 companies to extract value from machine-generated data. Some have called Splunk the “easy button” for analytics, because it quickly collects and analyzes all varieties of data, whether it’s big data, fast data, Internet of Things (IoT) sensor data, cyber-security streaming data, or sentiment analysis data from social media. Many attribute Splunk’s rapid success to its simplified end-to-end platform. It lets users collect data from anywhere with its universal forwarding and indexing technology, and search and analyze that data using “schema on the fly” technology, resulting in real-time insights and accelerated time to value.
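For readers new to the “schema on the fly” idea, here is a minimal Python sketch of the general schema-on-read pattern: raw events are kept as plain text, and fields are extracted only when a search runs. This is not Splunk's implementation or its search language. The sample events, the key=value log format, and the `extract_fields` and `search` helpers are all hypothetical, made up purely for illustration.

```python
import re

# Hypothetical raw machine events, stored exactly as they arrived (no upfront schema).
RAW_EVENTS = [
    "2017-03-01T10:00:00Z host=truck-17 sensor=temp value=4.5 status=ok",
    "2017-03-01T10:00:05Z host=scanner-02 action=scan package=PKG123 status=ok",
    "2017-03-01T10:00:09Z host=truck-17 sensor=temp value=9.8 status=alert",
]

# key=value pairs are discovered at search time, not at ingest time.
KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def extract_fields(raw_event):
    """Parse one raw event into a dict of fields at query time."""
    timestamp, _, rest = raw_event.partition(" ")
    fields = dict(KV_PATTERN.findall(rest))
    fields["_time"] = timestamp
    fields["_raw"] = raw_event
    return fields


def search(raw_events, **criteria):
    """Yield parsed events whose extracted fields match every criterion."""
    for raw in raw_events:
        fields = extract_fields(raw)
        if all(fields.get(key) == value for key, value in criteria.items()):
            yield fields


# Events with different shapes live side by side; the "schema" is whatever
# fields each event happens to carry when it is read.
alerts = list(search(RAW_EVENTS, host="truck-17", status="alert"))
```

Because nothing is parsed at ingest, a new device can start sending events with fields nobody planned for, and those fields become searchable without a schema migration.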

To support the performance and evolving storage demands associated with creating actionable information using machine-generated data, Splunk requires powerful and flexible infrastructure that:

  • Provides processor and memory configurations based on Splunk recommendations (Splunk has a great document on this for different deployment needs.)
  • Enables a flexible, scale-out capacity consumption model
  • Includes data services like data reduction (deduplication or compression) and encryption
  • Delivers cost-effective and optimized tiered storage for hot, warm, and cold data
  • Is optimized and validated by Splunk to meet or exceed pre-determined reference hardware requirements

Let’s look at an example.

One of the world’s largest logistics companies recently embarked on a data journey to take control of fast, diverse and large amounts of machine-generated data. The company has planes, trucks, scanners, and warehouses, all creating enormous amounts of data, as much as multiple terabytes per day. In this competitive industry where minutes matter, harnessing data for a multitude of insights can mean the difference between success and failure. With so many machines generating data, capturing and leveraging that data can be massively complex. This is where the Splunk platform has delivered the power, flexibility, scalability and speed they need to tackle these challenges.

Splunk is an important half of the equation. The other half is ensuring that the infrastructure running Splunk will optimize Splunk operations in this environment. This means having a correctly sized configuration to support multi-TB-per-day ingestion, with a scale-out architecture that grows as Splunk use cases expand and as data ingestion grows. Using Splunk on optimized, Splunk-validated infrastructure that provides powerful data services and cost-effective tiering, our customer is now well on the journey to proactive insights that will drive their business farther and faster.
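To give a feel for why sizing and tiering matter at this scale, here is a back-of-the-envelope capacity sketch in Python. Every number in it is an illustrative placeholder (the 2 TB/day ingest rate, the 0.5 on-disk ratio after data reduction, and the retention periods); none of them comes from Splunk or Dell EMC sizing guidance, and the `Tier` class and `tier_capacity_tb` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    retention_days: int  # how long data stays in this tier (assumed)


def tier_capacity_tb(daily_ingest_tb, on_disk_ratio, tiers):
    """Return the estimated on-disk terabytes needed for each tier.

    on_disk_ratio is the assumed fraction of raw size that remains
    after data reduction (compression or deduplication).
    """
    return {
        tier.name: daily_ingest_tb * on_disk_ratio * tier.retention_days
        for tier in tiers
    }


# All values below are placeholders for illustration, not vendor guidance.
tiers = [Tier("hot/warm", 30), Tier("cold", 335)]
estimate = tier_capacity_tb(daily_ingest_tb=2.0, on_disk_ratio=0.5, tiers=tiers)

for name, terabytes in estimate.items():
    print(f"{name}: {terabytes:.0f} TB")
```

Under these assumptions the cold tier needs roughly eleven times the capacity of the hot/warm tier, which is why putting older data on lower-cost storage has such a large effect on total cost. For real deployments, use Splunk's own sizing documentation rather than a sketch like this.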

The figure below summarizes the key requirements for Splunk-optimized infrastructure.

 

Deploying Splunk on proven architectures that have the attributes shown in the above figure helps Splunk run efficiently and scale easily as Splunk usage evolves in an organization. This is where Dell EMC comes in. Dell EMC’s portfolio of technologies is a proven landing spot for Splunk workloads. To see many of the documented solutions that have been implemented over the past year, visit the Dell EMC partner page on Splunk.com. The strength of the partnership has led to the development of jointly validated solutions for Splunk. These solutions meet or exceed Splunk performance benchmarks, based on Splunk’s documented reference hardware. The solutions (linked below) have been configured for all types of deployment needs and use cases. With these solutions, organizations reduce the complexity and risk associated with do-it-yourself solutions and speed time to value and insights in Splunk deployments.

Deploying Splunk on Converged Infrastructure with Dell EMC Vblock540

Deploying Splunk on Dell EMC Scale-Out Hyper-converged Infrastructure

If you refer back to the checklist above, you’ll find that the Dell EMC | Splunk solutions cover the requirements listed: proper processing and compute sizing, scale-out architecture, and cost-effective tiering coupled with highly advanced data services.

Machine-generated data is everywhere and has tremendous potential value. Don’t miss out on the chance to capitalize on it. Dell EMC solutions for Splunk are ideal for getting started and, as you scale, you’ll be confident knowing the solutions will scale with you.

 

 

Is It All About The Data Scientist?

Mona Patel

Senior Manager, Big Data Solutions Marketing at EMC
Mona Patel is a Senior Manager for Big Data Marketing at EMC Corporation. With over 15 years of working with data at The Department of Water and Power, Air Touch Communications, Oracle, and MicroStrategy, Mona decided to grow her career at EMC, a leader in Big Data.

The answer is no. It is a holistic team effort that involves expanding the mindset and skill set of executives, business users, IT implementers, data scientists, and application developers so that they all work collectively to define a strategy and derive new insights from big data.

And that is why EMC is so heavily focused on breaking down organizational silos and training professionals to become data scientists, or at least think like data scientists, transforming these individuals into data-savvy professionals working towards the same goal: competitive advantage.

I spoke to Louis Frolio, Advisory Technical Ed Consultant for EMC Big Data Solutions, about how he, as part of a team in EMC Education Services, is creating a massive professional transformation through a MOOC (Massive Open Online Course). The Data Lakes for Big Data MOOC gives you an opportunity to become a data-savvy professional and take on a big data or data science role in your organization at absolutely no cost.

The course kicked off May 11, but you still have plenty of time to enroll and complete the course to earn a certificate before June 8. The top 500 students (based on cumulative grade for the MOOC) will receive an electronic copy of the Data Science book just released by EMC Education Services.

1.  What is a MOOC and what is the goal of this education format? Why was it used for this course?

(more…)

Want To Build A Data Science Team?

Mona Patel


EMC offers a holistic approach to data science. Many of our customers invest in big data solutions to target their sales prospects better, explore advanced medical research, and make their internal processes more efficient. The biggest obstacle to getting these initiatives out of the gate is the shortage of big data skills within their own firms and across the industry.

To address this skills gap, EMC has developed a thorough data science and big data analytics curriculum for our customers. EMC was one of the first companies to offer data science education with rigorous, live instruction using free and open source tools. As of today, more than 10,000 customers, partners, and college students have attended the training.


I spoke with EMC’s David Dietrich, who leads this unique program, to discuss his approach to data science education, which differs from more traditional product-oriented education. What I found most interesting is that in addition to David’s work at EMC, he has also helped design big data analytics curricula for Babson College and other universities. More recently, David has published a book, Data Science and Big Data Analytics, to help further develop data science skills and expertise in the industry.

1.  Why is EMC pushing so hard to educate and develop data scientists?

As an information company, we’re extremely attuned to the value of big data, which is exploding both in sheer volume and in the ways organizations in virtually every field and industry are using it to solve critical problems. When EMC acquired our first big data company, Greenplum, several years ago, we quickly became aware that there was a shortage of people who had the data science and business skills to help companies utilize big data.

2.  How is EMC taking a holistic approach to data science education?

We recognize that learning how to use big data technology alone does not ensure success. Senior management must make sure that appropriate people and processes are in place to drive the change and innovation necessary for valuable big data results to occur. To help companies on their journey, we offer courses for data scientists, who execute big data projects, and business executives who sponsor, run and manage them.

Our goal is to educate all levels of an organization so that data scientists and business people understand one another. That way, the organization is able to roll out big data projects with greater adoption and success. In addition to offering courses to our customers, we also work closely with universities and educational institutions to help them develop their own curriculum and programs.

3.  Please describe some of the important skills for aspiring data scientists.

Working in strategy and analytics for the past 20 years, I’ve always been drawn to experimenting with data to solve problems, which is exactly the mindset you need to tackle big data. Companies often ask me how to go about using massive amounts of structured and unstructured data to solve business problems. How do they know what to choose and ignore? How do they know what algorithms to apply? Our courses encourage a culture of experimentation that leads to answering these questions. We teach our students how to test an idea with data, measure it quantitatively, learn from it and iterate. This test-and-learn mindset is critical to becoming a talented data scientist and a data-driven organization.

4.  What are some of the challenges with evolving into a data-driven organization?

There can be a substantial divide between data scientists and business people who manage and work with them on big data projects. Many business people lack the technical background to understand how the algorithms apply to the problem and how to test ideas with data. And some data scientists may not understand the business context. We’re trying to educate each side so they can get a clearer picture and drive toward common goals. Once you bridge that gap, you can start driving real change, and solving old problems with big data or new information sources that were once unusable.

5.  What should companies expect after they have successfully made the leap to big data?

We’re educating them in how to train and staff a big data team, as well as build processes to be effective and successful. With this approach, companies can more effectively define the business problem, acquire the right data sets, experiment, communicate the results, and finally, operationalize the new processes.
