
Getting started with machine-generated data

Brett Roberts

Data Analytics Systems Engineer at Dell EMC
Brett is the Technical Lead for Dell EMC’s Data Analytics Technology Alliances, focused on developing solutions that help customers solve their data challenges. You can find him on social media at @Broberts2261

By Brett Roberts with Debra Slapak

We are literally surrounded by data generated from devices and other machines—things like the phones in our pockets, vehicle sensors, the ATM at our favorite spot, cameras on the street, even the thermostats and appliances in our homes. As consumers, we benefit from insights generated when this data is analyzed and put to work for us. This, ideally, protects us or makes us more loyal to the companies that provide better experiences or outcomes for us.

Increasingly, business, government and non-profit organizations alike are generating, capturing, and analyzing massive amounts of machine-generated data to help them improve operational efficiency and customer experience. This process may look simple from the consumer side, but the reality is that transforming business and operating models using machine-generated data can be challenging. The data itself is typically a mix of structured, semi-structured and unstructured data from a wide variety of sources, and organizations often struggle with how best to collect and analyze it.

To address these problems, Dell EMC and Splunk formed a strategic partnership to architect, test and validate solutions that combine Splunk on Dell EMC infrastructure. The work our teams do together simplifies decision-making and deployment of analytics solutions for machine-generated data, so our customers can focus on better experiences and outcomes for their customers.

Splunk is a platform for real-time operational intelligence using machine-generated data. It enables organizations to search, analyze and visualize the massive streams of machine data generated by IT systems and technology infrastructure—physical, virtual and in the cloud. Within just a few years, Splunk has emerged as one of the fastest-growing data-focused platforms, used by more than 75 of the Fortune 100 companies to extract value from machine-generated data. Some have called Splunk the “easy button” for analytics because it quickly collects and analyzes all varieties of data, whether big data, fast data, Internet of Things (IoT) sensor data, streaming cyber-security data, or sentiment data from social media. Many attribute Splunk’s rapid success to its simplified end-to-end platform: universal forwarding and indexing technology collects data from almost anywhere, and “schema on the fly” technology searches and analyzes it, delivering real-time insights and accelerated time to value.
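To make the collection side concrete, here is a minimal Python sketch of pushing one machine-generated event into Splunk through its HTTP Event Collector (HEC), an alternative to the universal forwarder for application-generated data. The hostname, token, index and sourcetype values are placeholders for illustration, not part of any validated solution.

```python
import json
import time

def build_hec_event(event, sourcetype="my_app:log", index="main"):
    """Build a JSON payload in the shape Splunk's HTTP Event Collector expects."""
    return json.dumps({
        "time": time.time(),        # event timestamp, epoch seconds
        "sourcetype": sourcetype,   # tells Splunk how to parse the event
        "index": index,             # target index (hot/warm/cold tiering applies per index)
        "event": event,             # the event payload itself
    })

payload = build_hec_event({"action": "atm_withdrawal", "amount": 60})

# To actually send it (requires a reachable HEC endpoint and a valid token):
# import urllib.request
# req = urllib.request.Request(
#     "https://splunk.example.com:8088/services/collector/event",
#     data=payload.encode(), method="POST",
#     headers={"Authorization": "Splunk <hec-token>"})
# urllib.request.urlopen(req)

print(payload)
```

Once indexed, the event is immediately searchable with no up-front schema, which is the “schema on the fly” behavior described above.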

To support the performance and evolving storage demands associated with creating actionable information using machine-generated data, Splunk requires powerful and flexible infrastructure that:

  • Provides processor and memory configurations based on Splunk recommendations (Splunk has a great document on this for different deployment needs.)
  • Enables a flexible, scale-out capacity consumption model
  • Includes data services like data reduction (deduplication or compression) and encryption
  • Delivers cost-effective and optimized tiered storage for hot, warm, and cold data
  • Is optimized and validated by Splunk to meet or exceed pre-determined reference hardware requirements
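The tiering requirement in the checklist above can be illustrated with a back-of-the-envelope sizing sketch. This is not Splunk’s official sizing tool: the ~50% disk factor (compressed raw data plus index files) is a commonly cited rule of thumb, and the retention periods are made-up inputs; treat every number as an assumption.

```python
def splunk_storage_gb(daily_ingest_gb, hot_warm_days, cold_days,
                      disk_factor=0.5, replication=1):
    """Rough per-tier storage estimate for a Splunk deployment.

    disk_factor ~0.5 is a rule of thumb: compressed raw data (~15% of
    original size) plus index files (~35%). Illustrative only.
    """
    per_day = daily_ingest_gb * disk_factor * replication
    hot_warm = per_day * hot_warm_days   # fast tier for recent, hot/warm buckets
    cold = per_day * cold_days           # cost-optimized tier for older buckets
    return {"hot_warm_gb": hot_warm, "cold_gb": cold,
            "total_gb": hot_warm + cold}

# e.g. 2 TB/day ingest, 30 days on hot/warm storage, 60 more days cold:
print(splunk_storage_gb(2000, hot_warm_days=30, cold_days=60))
```

A sketch like this shows why cost-effective tiering matters: the cold tier typically dwarfs the hot/warm tier, so placing it on cheaper storage drives most of the savings.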

Let’s look at an example.

One of the world’s largest logistics companies recently embarked on a data journey to take control of fast, diverse and large amounts of machine-generated data. The company has planes, trucks, scanners, and warehouses, all creating enormous amounts of data—as much as multiple terabytes per day. In this competitive industry where minutes matter, the risk of not harnessing data for a multitude of insights can mean the difference between success and failure. With so many machines generating data, capturing and leveraging that data can be massively complex. Here is where the Splunk platform has delivered the power, flexibility, scalability and speed the company needs to tackle these challenges.

Splunk is an important half of the equation. The other half is ensuring that the infrastructure running Splunk will optimize Splunk operations in this environment. This means having a correctly sized configuration to support multi-TB-per-day ingestion, with a scale-out architecture that grows as Splunk use cases expand and as data ingestion grows. Using Splunk on optimized, Splunk-validated infrastructure that provides powerful data services and cost-effective tiering, our customer is now well on the journey to proactive insights that will drive their business farther and faster.

The figure below summarizes the key requirements for Splunk-optimized infrastructure.


Deploying Splunk on proven architectures that have the attributes shown in the above figure helps Splunk run efficiently and scale easily as Splunk usage evolves in an organization. This is where Dell EMC comes in. Dell EMC’s portfolio of technologies is a proven landing spot for Splunk workloads; to see many of the documented solutions implemented over the past year, visit the Dell EMC partner page. The strength of the partnership has led to the development of jointly validated solutions for Splunk. These solutions meet or exceed Splunk performance benchmarks, based on Splunk’s documented reference hardware. The solutions (linked below) have been configured for all types of deployment needs and use cases. With these solutions, organizations reduce the complexity and risk associated with do-it-yourself solutions and speed time to value and insights in Splunk deployments.

Deploying Splunk on Converged Infrastructure with Dell EMC Vblock540

Deploying Splunk on Dell EMC Scale-Out Hyper-converged Infrastructure

If you refer back to the checklist above, you’ll find that the Dell EMC | Splunk solutions cover the requirements listed: proper processor and memory sizing, a scale-out architecture, and cost-effective tiering coupled with advanced data services.

Machine-generated data is everywhere and has tremendous potential value. Don’t miss out on the chance to capitalize on it. Dell EMC solutions for Splunk are ideal for getting started and, as you scale, you’ll be confident knowing the solutions will scale with you.



Big Data Conversation with Splunk

Erin K. Banks

Portfolio Marketing Director at Dell EMC
Erin K. Banks has been in the IT industry for almost 20 years. She is the Portfolio Marketing Director for Big Data and Data Analytics at Dell EMC. Previously she worked at Juniper Networks in Technical Marketing for the Security Business Unit. She has also worked at VMware and EMC as an SE in the Federal Division, focused on Virtualization and Security. She holds both CISSP and CISA accreditations. Erin has a BS in Electrical Engineering and is an author, blogger, and avid runner. You can find her on social media at @banksek

I had the opportunity to talk with Jon Rooney, Senior Director, IT Solutions Marketing, from Splunk a couple of weeks ago. It was a great chance for me to learn more about Splunk and, of course, I had to ask him his thoughts on Big Data. He was kind enough to allow our conversation to be a part of my Big Data Conversation series.

A little background on Splunk, in Jon’s voice: Splunk helps you make sense of machine data, and machine data is the largest and fastest-growing component of Big Data. Much of it comes from applications, devices, servers, and network endpoints, and it is often under-used because of how difficult it can be to capture, store and analyze with outdated tools. Our Big Data story is about real-time machine data. We keep your systems up and running, and we keep you more secure.


EB: How does your company define Big Data?

JR: Splunk wouldn’t define it differently than anyone else. We believe that the jumping-off point for Big Data is volume, velocity, and variety—all the data that is too unwieldy to put into traditional databases and too difficult to keep up with. The business press would discuss Big Data in terms of Hadoop and say it was all about dumping together all your e-commerce and company transactions, and developing sentiment analysis from what people wrote on Twitter and in product reviews. That is the human-generated part of Big Data, but the machine-generated part is actually the bigger portion, and the harder one to manage at scale. If you look at what people were doing with that data, like pattern recognition, you can do that through batch, but we focus on real-time data. Yes, we have the historical piece, but it is much more valuable to analyze data while it is happening than to do it post-mortem, which is the traditional way.


EB: Do you feel the majority of organizations associate Big Data with Hadoop?

JR: I don’t think our customers do, but the broader business and tech media, over the past 6–8 years, have used examples like “how does Amazon know what to recommend to you” and “how does the CDC know there are flu infections based on what it sees on Twitter”. Those are the examples that ground Big Data, versus “how do you look at millions of transactions through an API endpoint to see response time”. These are also Big Data examples, and they are what Splunk does.


EB: How do you see Big Data changing in the future?

JR: Over time, as it becomes normalized, people will see the scale of “big” change. The goalposts on what “big” means will move. People will drop the assumption that data is Big Data simply because it can’t cleanly fit into a relational database. Right now, if you have to put it into a NoSQL database, it is considered Big Data, but that is not necessarily true. There is a tight coupling between NoSQL databases and Big Data today, and I think that will change as architectures change. You need the solution that fits your architecture best, not the one chosen just because it handles petabytes of data. It becomes another storage strategy that isn’t driven solely by volume, velocity, and variety; there are other architectural considerations that can help you make a decision.


EB: What is the biggest myth about Big Data?

JR: There are a lot. One is that only a handful of businesses have figured it out and are driven by Big Data. Another is that people overestimate the sophistication of the analysis done with Big Data: everyone thinks everyone else is doing what Amazon is doing, when in fact most people are doing simple correlations.
