Recap – Strata+Hadoop World 2017 San Jose

Erin K. Banks


Portfolio Marketing Director at Dell EMC
Erin K. Banks has been in the IT industry for almost 20 years. She is the Portfolio Marketing Director for Big Data and Data Analytics at Dell EMC. Previously she worked at Juniper Networks in Technical Marketing for the Security Business Unit. She has also worked at VMware and EMC as an SE in the Federal Division, focused on Virtualization and Security. She holds both CISSP and CISA accreditations. Erin has a BS in Electrical Engineering and is an author, blogger, and avid runner. You can find her on social media at @banksek

I love the Strata + Hadoop World Conference (renamed Strata Data Conference), and once again the 2017 conference did not fail me in any way. I had the privilege of attending, so I wanted to take this opportunity to give you a quick recap.

I love how the keynotes are short and impactful. Wednesday delivered great insight from a conversation with Beau Cronin and Phil Keslin, CTO and Founder of Niantic, and another great session from Rajiv Maheswaran of Second Spectrum. Niantic created Pokémon GO, and one of the insights Phil shared was that they started with a strong architecture that they knew would scale. They had no idea that it would need to scale so fast or so soon. Although there were sleepless nights, they were prepared and able to immediately solve both the compute and big data problems they encountered. Rajiv’s topic was “When machines understand sports,” and it was great to see how Second Spectrum tracks the movement of the ball and players and transforms that data into facts about players, strategies and probabilities, changing the overall game in the process. We watch a great deal of basketball, especially now with the NCAA Men’s and Women’s Championships going on, and seeing data analytics applied to the game like never before is really cool. Is there anything that data analytics can’t impact?

On Thursday we got to see many of the good things that data analytics can help to achieve. For instance, Desiree Matel-Anderson from the Field Innovation Team talked about “Data in disasters: Saving lives and innovating in real time”. Desiree talked about Hurricane Sandy and the Boston Marathon bombing as well as other recent events. She explained how social media and data analytics can be used to determine the impact of an event and make it easier to follow or respond to events in the future. For instance, looking at social media, they saw that people were tweeting for “help” after Sandy had hit the coast, once the electricity was gone and the hurricane had subsided. There was no panic while the storm was making landfall. Historical data like this can help agencies like FEMA respond faster in future hurricanes and help people feel more connected. Another great session came from Maya Shankar, who worked for the White House Office of Science and Technology Policy under President Barack Obama. She spoke about “Improving Public Policy with Behavioral Insights” and provided us with four conclusions as to how those insights can guide next steps:

 

1) Convert interest into impact
2) Quantify impact
3) Celebrate small wins
4) Generate organic buy-in

These conclusions are based on many years of work and delivered as proof to Washington DC insiders that “data tells you what people are doing” and “behavioral science tells you why.”

The rest of the day was filled with expo time and some great technical sessions. The majority of the topics for the conference focused on real-time machine learning and artificial intelligence (AI). In one presentation, Rob Craft from Google explained machine learning best. He said machine learning is “one branch of the field of AI”, “a way of solving problems without explicitly codifying the solution”, and, last but not least, “a way of building systems that improve themselves over time.” Machine learning and AI are clearly the future of data analytics and a great reason why the conference was renamed Strata Data Conference. Don’t get me wrong: Apache Hadoop, Spark, Impala, Flink, Kafka, Beam, Apex, Kudu, etc. are still being talked about. Data analytics means a lot of things to people and it varies in multiple aspects, but one thing that remains the same is that data analytics drives change and impact. Whether it is changing our nation’s policies or making better video games, we were all at Strata + Hadoop World to learn more and use that information to make a difference. That’s why I love Strata + Hadoop World. There is so much to learn and so many people to learn from, and you come away realizing that, for once, our jobs are making an impact. What is better than that?

Getting started on your data analytics journey

Jean Marie Martini


Director, Data Analytics Portfolio Messaging and Strategy at Dell EMC
Jean Marie Martini is a Senior Consultant for messaging and strategy across the data analytics portfolio at Dell EMC. Martini has been involved in data analytics for over ten years and today focuses on communicating the value of the Dell EMC solutions to enable customers to begin their data analytics journey, to remain competitive throughout their journey, and to drive the insights that will transform their organizations into data-driven businesses. You can follow Martini on Twitter @martinij.

The data analytics journey begins with an understanding of use cases and solutions that can help an organization unlock the value of its data. This is the focus of two new Dell EMC resources.

In the course of my work with the Dell EMC data analytics program, I often talk with customers who are focused on extracting value from enormous amounts of data. That was certainly the case at the recent Strata + Hadoop World conference in San Jose. The conference center was filled with people looking for innovative ways to unlock the business value that is embedded in the data they capture from the Internet of Things, social media, their corporate systems and countless diverse sources.

While organizations come at the problem from different industries, they all share the goal of using data analytics to gain business insights and capitalize on the digital transformation that is under way. People understand that their enterprise data warehouses and data lakes hold the keys to achieving closer customer relationships, operational efficiencies and competitive advantages. The question then becomes, “How do you get there?”

This topic is explored in two new Dell EMC resources for organizations looking to capitalize on data for analytics. One of these assets is a white paper that explores how companies in different industries are turning to data analytics, data lakes, and the Apache™ Hadoop® platform for data collection, management and analysis.

In this paper, titled “Leveraging Data Analytics to Gain Competitive Advantage in Your Industry,” we highlight examples of diverse industry-specific and cross-industry use cases for data analytics solutions. These use cases are based on the collective experiences of Dell EMC and our partners Intel, Cloudera, and Hortonworks.

The second asset is a brochure that drills down into solutions for organizations that are ready to begin their data analytics journeys. This brochure, titled “Power New Possibilities: Solutions for Your Data Analytics Journey,” explains the capabilities and benefits of the Dell EMC options for organizations on this path.

As for those options, Dell EMC has your needs covered no matter where you are in your data analytics journey. These offerings, summarized in the brochure, include solutions for getting started with Hadoop, building a data lake for analytics, extending your analytics capabilities, and enabling and accelerating your journey.

Regardless of the path you’re on, Dell EMC can help your organization move forward with confidence. We can help you gain hands-on experience across many solutions, from initial briefings through a proof of concept and into a full production environment that leverages validated solutions and proven reference architectures.

We can also help you with the essential initial steps of aligning the goals of IT and the business to address a use case that will deliver measurable business value. For example, you might choose a marketing analytics solution that uses predictive modeling to help your sales team target the right customer at the right time. That’s a use case that we’ve put into action at Dell EMC. (Read the case study.)

While different organizations will target different needs, the key is to begin with a use case that will showcase the power of data analytics and generate measurable results — the return on information. From that starting point, you can grow over time into an organization that is truly data-driven and poised for success in the digital economy.

For a closer look at the ways that Dell EMC can help your organization unlock the value of your data, visit DellEMC.com/BigData.

Getting started with machine-generated data

Brett Roberts


Data Analytics Systems Engineer at Dell EMC
Brett is the Technical Lead for Dell EMC’s Data Analytics Technology Alliances, focused on developing solutions that help customers solve their data challenges. You can find him on social media at @Broberts2261

By Brett Roberts with Debra Slapak

We are literally surrounded by data generated from devices and other machines—things like the phones in our pockets, vehicle sensors, the ATM at our favorite spot, cameras on the street, even the thermostats and appliances in our homes. As consumers, we benefit from insights generated when this data is analyzed and put to work for us. This, ideally, protects us or makes us more loyal to the companies that provide better experiences or outcomes for us.

Increasingly, business, government and non-profit organizations alike are generating, capturing, and analyzing massive amounts of machine-generated data to help them improve operational efficiency and customer experience. This process may look simple from the consumer side, but the reality is that transforming business and operating models using machine-generated data can be challenging. The data itself is typically a mix of structured, unstructured and semi-structured data from a wide variety of sources, and organizations often struggle with how best to collect and analyze it.

To address these problems, Dell EMC and Splunk formed a strategic partnership to architect, test and validate solutions that run Splunk on Dell EMC infrastructure. The work our teams do together simplifies decision-making and deployment of analytics solutions for machine-generated data, so our customers can focus on better experiences and outcomes for their customers.

Splunk is a platform for real-time operational intelligence using machine-generated data. It enables organizations to search, analyze and visualize the massive streams of machine data generated by IT systems and technology infrastructure, whether physical, virtual or in the cloud. Within just a few years, Splunk has emerged as one of the fastest-growing data-focused platforms, used by more than 75 of the Fortune 100 companies to extract value from machine-generated data. Some have called Splunk the “easy button” for analytics, because it quickly collects and analyzes all varieties of data, whether it’s big data, fast data, Internet of Things (IoT) sensor data, cyber-security streaming data, or sentiment analysis data from social media. Many attribute Splunk’s rapid success to its simplified end-to-end platform, which enables users to collect data from anywhere with its universal forwarding and indexing technology and to search and analyze data using “schema on the fly” technology, all resulting in the delivery of real-time insights and accelerated time to value.

To support the performance and evolving storage demands associated with creating actionable information using machine-generated data, Splunk requires powerful and flexible infrastructure that:

  • Provides processor and memory configurations based on Splunk recommendations (Splunk has a great document on this for different deployment needs.)
  • Enables a flexible, scale-out capacity consumption model
  • Includes data services like data reduction (deduplication or compression) and encryption
  • Delivers cost-effective and optimized tiered storage for hot, warm, and cold data
  • Is optimized and validated by Splunk to meet or exceed pre-determined reference hardware requirements
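To make the idea of hot, warm, and cold tiering a little more concrete, here is a minimal sketch that sorts data into tiers by age alone. It is not Splunk's own bucket-handling logic, and the function name and the one-day and thirty-day thresholds are hypothetical values chosen purely for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical age thresholds; a real deployment would tune these per use case.
HOT_MAX_AGE = timedelta(days=1)
WARM_MAX_AGE = timedelta(days=30)


def storage_tier(event_time: datetime, now: datetime) -> str:
    """Return 'hot', 'warm', or 'cold' based only on how old the data is."""
    age = now - event_time
    if age <= HOT_MAX_AGE:
        return "hot"
    if age <= WARM_MAX_AGE:
        return "warm"
    return "cold"


now = datetime(2017, 3, 20)
print(storage_tier(datetime(2017, 3, 19, 12), now))  # hot (12 hours old)
print(storage_tier(datetime(2017, 3, 1), now))       # warm (19 days old)
print(storage_tier(datetime(2016, 12, 1), now))      # cold (109 days old)
```

The point of tiering is that the newest, most frequently searched data can sit on the fastest storage while older data moves to cheaper capacity.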

Let’s look at an example.

One of the world’s largest logistics companies recently embarked on a data journey to take control of fast, diverse and large amounts of machine-generated data. The company has planes, trucks, scanners, and warehouses, all creating enormous amounts of data, as much as multiple terabytes per day. In this competitive industry where minutes matter, the risk of not harnessing data for a multitude of insights can mean the difference between success and failure. With so many machines generating data, capturing and leveraging that data can be massively complex. Here is where the Splunk platform has delivered the power, flexibility, scalability and speed they need to tackle these challenges.

Splunk is an important half of the equation. The other half is ensuring that the infrastructure running Splunk will optimize Splunk operations in this environment. This means having a correctly sized configuration to support multi-TB-per-day ingestion, with a scale-out architecture that grows as Splunk use cases expand and as data ingestion grows. Using Splunk on optimized, Splunk-validated infrastructure that provides powerful data services and cost-effective tiering, our customer is now well on the journey to proactive insights that will drive their business farther and faster.
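As a rough illustration of what "correctly sized" can mean, the back-of-the-envelope sketch below multiplies daily ingestion by the retention period and by the fraction of raw size the data occupies on disk after data reduction. The function name and every input value are hypothetical, and real sizing should follow Splunk's published reference hardware guidance rather than this simplified arithmetic.

```python
def estimate_storage_tb(daily_ingest_tb: float,
                        retention_days: int,
                        stored_fraction: float) -> float:
    """Estimate capacity as daily ingest x retention x on-disk fraction."""
    return daily_ingest_tb * retention_days * stored_fraction


# Hypothetical inputs: 2 TB/day ingested, 90 days of retention,
# and data occupying 50% of its raw size after data reduction.
print(estimate_storage_tb(2.0, 90, 0.5))  # 90.0
```

Even this simple estimate shows why scale-out matters: doubling either ingestion or retention doubles the capacity required.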

The figure below summarizes the key requirements for Splunk-optimized infrastructure.

 

Deploying Splunk on proven architectures that have the attributes shown in the above figure helps Splunk run efficiently and scale easily as Splunk usage evolves in an organization. This is where Dell EMC comes in. Dell EMC’s portfolio of technologies is a proven landing spot for Splunk workloads. To see many of the documented solutions that have been implemented over the past year, visit the Dell EMC partner page on Splunk.com. The strength of the partnership has led to the development of jointly validated solutions for Splunk. These solutions meet or exceed Splunk performance benchmarks, based on Splunk’s documented reference hardware. The solutions (linked below) have been configured for all types of deployment needs and use cases. With these solutions, organizations reduce the complexity and risk associated with do-it-yourself solutions and speed time to value and insights in Splunk deployments.

Deploying Splunk on Converged Infrastructure with Dell EMC Vblock540

Deploying Splunk on Dell EMC Scale-Out Hyper-converged Infrastructure

If you refer back to the checklist above, you’ll find that the Dell EMC | Splunk solutions cover the requirements listed: proper processing and compute sizing, scale-out architecture, and cost-effective tiering coupled with highly advanced data services.

Machine-generated data is everywhere and has tremendous potential value. Don’t miss out on the chance to capitalize on it. Dell EMC solutions for Splunk are ideal for getting started and, as you scale, you’ll be confident knowing the solutions will scale with you.

 

 

Strata+Hadoop World 2017 – San Jose

Erin K. Banks


Another Strata+Hadoop World San Jose is upon us and some of us are extra excited because this is the first time we are in San Jose as Dell EMC. I was literally on a call today going over the logistics and a couple of people could not stop talking about the event and how it will be great that we get to be a part of it as Dell EMC. We are incredibly proud of the way the merger has come together and all that we have accomplished along the way. This time we get the opportunity to show you that we are not just an infrastructure company but a company that is focused on your business outcomes. We have worked with the full range of customers and helped them be successful across multiple disciplines.

What is our main message? Analytics that Drive Business

The only way that you can solve your business problems is through data analytics. Your current and future data has the potential to answer these problems, but recognizing the right questions is not that easy. The true question is how you find those questions and how you get a return on information. We want to guide you through the maze and challenges of data analytics with our services and with simple, complete offerings that take the guesswork out of the struggle. We want to allow your business to focus on what’s most important: driving your business forward.

We have two sessions occurring at Strata+Hadoop World:

Tuesday @ 1:30 pm in room 210 B/F with Bill Schmarzo ( @schmarzo )

Wednesday @ 11:00 am in room 210 B/F with yours truly, Erin Banks ( @banksek )

We will be in booth 1409, which is to the right of the main entrance, so please come on by and let us prove to you that we are more than just infrastructure. Also follow us on @DellEMCbigdata to see how the conference is going.

Some additional information can be found at our Dell EMC events site
