Unstructured Data Engineer and Hadoop Black Belt at Dell EMC
Thomas Henson is a blogger, author, and podcaster in the big data analytics community. He is an Unstructured Data Engineer and Hadoop Black Belt at Dell EMC. Previously, he helped Federal sector customers build their first Hadoop clusters. Thomas has been involved in the Hadoop community since the early Hadoop 1.0 days. Connect with him at @henson_tm.
Streaming and real-time analytics are pushing the boundaries of our analytic architecture patterns. In the big data community, we now break analytics processing down into batch or streaming. If you glance at the top contributions, most of the excitement is on the streaming side (Apache Beam, Flink, and Spark).
What is causing the break in our architecture patterns?
A huge reason for the break in our existing architecture patterns is the concept of Bound vs. Unbound data. This concept is as fundamental as the Data Lake or the Data Hub, and we have been dealing with it since long before Hadoop. Let's break down both Bound and Unbound data.
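One way to picture the distinction (a minimal sketch, not from the original post): Bound data is a finite collection you can process to completion in a batch, while Unbound data is a stream with no defined end, so processing must be incremental. The function names below are illustrative only.

```python
import itertools

def batch_total(records):
    """Bound data: the input is finite, so we can consume all of it
    and return a single, final answer."""
    return sum(records)

def streaming_totals(records):
    """Unbound data: the input may never end, so we emit a running
    result after each record instead of waiting for completion."""
    total = 0
    for r in records:
        total += r
        yield total

# Bound: a finite list has a final answer.
print(batch_total([1, 2, 3]))             # 6

# Unbound: an infinite source (itertools.count) has no final answer;
# we can only observe a window of the running totals.
stream = streaming_totals(itertools.count(1))
print(list(itertools.islice(stream, 3)))  # [1, 3, 6]
```

This is the shape of the shift the streaming engines above address: batch frameworks assume the first function's world, while Beam, Flink, and Spark's streaming APIs are built for the second.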
Director, Data Analytics Portfolio Messaging and Strategy at Dell EMC
Jean Marie Martini is a Director of messaging and strategy across the data analytics portfolio at Dell EMC. Martini has been involved in data analytics for over ten years. Today the focus is on communicating the value of the Dell EMC solutions to enable customers to begin and advance their data analytics journeys to transform their organizations into data-driven businesses. You can follow Martini on Twitter @martinij.
Originally posted on CIO.com by Patricia Florissi, Ph.D.
What is a World Wide Herd (WWH)?
What does it mean to have “Distributed analytics meet distributed data?” In short, it means forming a single global virtual computing cluster, in this case one given the title of World Wide Herd, out of geographically distributed resources. The WWH concept links distributed Apache™ Hadoop® instances into a global network that acts as one virtual computing cluster, bringing analytics capabilities to the data. In a recent CIO.com blog, Patricia Florissi, Ph.D., vice president and global CTO for sales and a distinguished engineer for Dell EMC, details how this approach enables analysis of geographically dispersed data without requiring the data to be moved to a single location before analysis. (more…)
Nicholas Wakou is a Senior Principal Performance Engineer with the Dell EMC Open Source Solutions team. His work focuses on the characterization and optimization of the performance of Dell EMC Cloud and Big Data solutions.
Nicholas is engaged with industry efforts to define performance benchmark specifications. He is active on the SPEC (www.spec.org) Cloud committee and several committees of the TPC (www.tpc.org). Nicholas represents Dell Technologies on the Board of Directors of the TPC and on its Technical Advisory Board (TAB). Previously, he was Chair of the TPC Public Relations standing committee.
Nicholas holds an M.S. in Electrical Engineering from Oklahoma State University, an M.S. in Microelectronics Technology from Middlesex University, London, and a B.Sc. in Electrical Engineering from Makerere University, Kampala, Uganda.
Dell EMC is focused on providing information that helps customers make the most of their big data technology investment. The failure rate for Hadoop big data projects is still too high given the maturity of the technology. Customers can’t afford to guess when designing and sizing a solution; they need to deliver optimal performance for their business use cases and to scale as needed. Dell EMC recently completed and published a new TPCx-BigBench (TPCx-BB) result that will help customers make the right choices for Hadoop performance and scalability. Today we are happy to announce that
Dell EMC is the industry-leading supplier of hyper-converged, converged, and “Ready” solutions by many standards. Dell EMC’s tested and validated Ready Bundle for Cloudera Hadoop, together with the right performance benchmark results, takes the guesswork out of Hadoop implementations.
The Transaction Processing Performance Council (TPC) is a non-profit corporation founded (more…)
Brett is the Technical Lead for Dell EMC’s Data Analytics Technology Alliances, focused on developing solutions that help customers solve their data challenges. You can find him on social media at @Broberts2261
Operational intelligence and machine-generated data have been very hot topics lately as organizations begin to realize how valuable this data is for the business. For the last few years, Splunk has been the leader in this space with its all-encompassing platform for collecting, searching, and analyzing machine-generated data. (Not up to speed on this yet? Check out my other blog on getting started with machine-generated data.) Dell EMC and Splunk have had a tremendous partnership over the past couple of years, based on the premise that we offer market-leading infrastructure that is optimal for Splunk’s world-class analytics platform for machine-generated data. A couple of weeks ago, we took this one step further: I’m excited to announce the release of the Solution Guide for Machine Analytics with Splunk Enterprise on VxRack Flex 1000! With it, Dell EMC now has a rack-scale, hyper-converged infrastructure solution for Splunk that has been jointly validated by Splunk and Dell EMC.
Why is this important?
Having a solution that has been jointly validated by both Splunk and Dell EMC to “meet or exceed Splunk’s performance benchmarks” gives users a higher degree of confidence in the environment. With this solution, the performance needed to run Splunk effectively, and to gain the insights that drive critical IT and business decisions, will be there. Our solutions engineering team, along with Splunk, put hundreds of engineering hours into designing specific configurations for a variety of deployment scenarios and rigorously tested them to ensure performance. The solution guide gives you not only those configurations but also implementation guidelines and deployment practices. All of this adds up to lower risk, quicker time to value, and validated performance: you can’t ask for anything better.
How is VxRack Optimal for Splunk?
VxRack provides flexible, rack-scale, hyper-converged infrastructure that lets you use the hypervisor of your choice or bare metal, and to start small but scale out to thousands of nodes. With VxRack you have the flexibility to optimize your tiering for Splunk by putting hot and warm buckets on SSD while using HDD, or even Isilon scale-out NAS, for your cold bucket needs (the solution guide shows how to use Isilon for cold tiering). You also get the benefits of software-defined storage and the data services that are essential in today’s data center. The best part is that VxRack delivers a turnkey experience that is engineered and designed to be ready to run, giving you a quicker time to insight and value. Additionally, with single support and life-cycle management for your infrastructure, you lower complexity and reduce risk and costs. All of this adds up to great performance, an economical tiering structure, and easy-to-deploy-and-manage infrastructure that is validated to run Splunk.
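As a rough illustration of that tiering layout, a Splunk `indexes.conf` stanza can point hot/warm buckets at local SSD and cold buckets at an NFS mount backed by scale-out NAS. The index name and mount paths below are hypothetical, not taken from the solution guide; consult the guide for the validated configuration.

```ini
# Hypothetical index stanza: hot and warm buckets on local SSD,
# cold buckets on an Isilon NFS mount. Names and paths are
# illustrative only.
[machine_data]
homePath       = /ssd/splunk/machine_data/db           ; hot + warm buckets (SSD tier)
coldPath       = /mnt/isilon/splunk/machine_data/colddb ; cold buckets (scale-out NAS)
thawedPath     = /ssd/splunk/machine_data/thaweddb      ; restored archive buckets
maxDataSize    = auto_high_volume                       ; larger buckets for high-volume indexes
maxWarmDBCount = 300                                    ; warm buckets kept on SSD before rolling cold
```

The `homePath`/`coldPath` split is what makes the economics work: the SSD tier serves the recent, frequently searched data, while older buckets age out onto cheaper capacity storage.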
The opinions and interests expressed on Dell EMC employee blogs are the employees' own and do not necessarily represent Dell EMC's positions, strategies or views. Dell EMC makes no representation or warranties about employee blogs or the accuracy or reliability of such blogs. When you access employee blogs, even though they may contain the Dell EMC logo and content regarding Dell EMC products and services, employee blogs are independent of Dell EMC and Dell EMC does not control their content or operation. In addition, a link to a blog does not mean that Dell EMC endorses that blog or has responsibility for its content or use.