Strata + Hadoop World New York is an event I look forward to attending each year. I like the sessions and the speakers, and I enjoy seeing how the attendees and vendors have evolved over time. When I first began attending SHW 4 or 5 years ago, attendance was only about 200 people; this year in NYC it topped 6,300! The event has grown so large that it had to move from the New York Hilton to the Javits Center.
As with Big Data itself, focusing on the sheer volume of conference attendees doesn’t tell the whole story. Here are a few of my observations from the conference.
- Early majority now joining the fray. As I mentioned, I’ve attended Strata for 4-5 years. Because I know others also return every year, I expected heavier attendance at sessions on emerging technologies that are gaining adoption, such as Spark or Kafka. That did happen, but I underestimated how many first-time attendees would be there. As a result, the ‘beginner’ sessions were packed: for every person who has been using Hadoop for years and now wants to experiment with Spark, there are scores more still trying to learn R or to work out a feasible way to deploy Hadoop.
- Hadoop is reaching an inflection point toward enterprise-grade use. It’s been said that early on, Hadoop was like the iPad: people weren’t exactly sure what to do with it, but they still wanted it. My sense is that by now most organizations have done something with Hadoop, perhaps a project or two on a small cluster (5 – 25 nodes), or more. With a few years of experience under their belts, they are now working to turn it into an enterprise-grade deployment that IT understands and is comfortable with. Getting there means addressing a host of related concerns, so people are now grappling with security, access, latency, and how to extend Hadoop to support Business Intelligence use cases, such as offloading workloads from an Enterprise Data Warehouse (EDW). (This is one of the reasons we develop and offer a Federation Business Data Lake: to address many of these points and make it easier for enterprises to work with Big Data in a controlled way.)
- Expanding Ecosystem. As the Hadoop ecosystem evolves, niche players are proliferating, each focused on specific needs and gaps across the data lifecycle. People have done the “Hello World” of word counts and now realize they need more than “simple” MapReduce and Hadoop to meet their needs in real-life situations. At Strata + Hadoop World, these emerging vendors ranged from specialists in streaming data ingest to tools for enterprise-wide collaboration, enhanced security, easier data visualization, and data conditioning by business analysts. To me, this reflects Hadoop’s growing maturity: there are still multiple Hadoop distributions, but the emphasis is shifting to the myriad other things organizations need to do to use data effectively as an asset. It’s not just about Hadoop.
Tags: analytics, big data, data science, data scientist, hadoop