Ninety-five percent of clients starting Hadoop projects don't have an established use case, so selecting the right distribution will probably be a shot in the dark. You may start off with Hortonworks for a dev/test environment, then realize that Pivotal HD is a better choice for an enterprise-class deployment. The good news is that if you start your Hadoop project on EMC Isilon scale-out NAS, no data migration is needed when you move from one Hadoop distribution to another. In fact, you can run multiple Hadoop distributions against the same data, with no duplication of data required.
All this makes sense to me: use Isilon scale-out NAS as the native storage layer for Hadoop, and the entire Hadoop environment becomes more flexible. But wait, there's more. Using Isilon storage with Hadoop instead of a traditional DAS configuration also makes the Hadoop environment easier and faster to deploy and more reliable, and in some cases it delivers a lower TCO than DAS.
Decoupling the Hadoop compute and storage layers may lead you to believe there is a performance hit. Not true. You can expect up to 100 GB/s of concurrent throughput on the Hadoop storage layer with Isilon. Additionally, by offloading storage-related HDFS overhead to Isilon, Hadoop compute farms can spend their cycles running more analysis jobs instead of managing local storage.
You may think I am biased towards Isilon because I do Big Data Marketing for EMC. Not true. I genuinely believe Isilon is a better choice for Hadoop than traditional DAS, for the reasons listed in the table below and based on my interview with Ryan Peterson, Director of Solutions Architecture at Isilon.
The minute we happily turn on our electronic devices, we voluntarily expose ourselves. Sure, we think our emails and transactions are encrypted and unreadable, our location services are turned off, and our FB settings keep unauthorized users from seeing our most private thoughts. But the truth is that we all know, deep down inside, that nothing is foolproof and that there are people and bots out there capable of accessing our personal data. Unfortunately, the NSA lacked the class and expertise of the other bad guys to hack into systems unnoticed; instead, it asked for the data directly, which made the intrusion all the more unexpected, and worse in the eyes of democracy.
As an optimist, I personally think the U.S. government embracing Big Data is a good thing. Perhaps the whole NSA Prism debacle can be thought of as a New Age Search Warrant, commanding that all necessary data in the universe be collected and analyzed for the greater good. For those of you outraged by this breach of privacy: have you stood by your beliefs and terminated your Verizon cell phone service, deleted your FB account, and eliminated all email communications? My guess is no, because you probably don't know the actual extent of the NSA violation, and the digital world is far more interesting than personal privacy. So before you go out protesting or draw conclusions based on the Patriot Act, read this interview with Technology Evangelist and Big Data influencer Theo Priestley to understand exactly how much power Prism holds over your personal data.
1. What sources of data has the NSA secured in Prism without our consent? And how is this different from corporations and data brokers misusing our personal data?