Author Archive

Erin K. Banks

Portfolio Marketing Director at Dell EMC
Erin K. Banks has been in the IT industry for almost 20 years. She is the Portfolio Marketing Director for Big Data and Data Analytics at Dell EMC. Previously she worked at Juniper Networks in Technical Marketing for the Security Business Unit. She has also worked at VMware and EMC as an SE in the Federal Division, focused on Virtualization and Security. She holds both CISSP and CISA accreditations. Erin has a BS in Electrical Engineering and is an author, blogger, and avid runner. You can find her on social media at @banksek.

Big Data Conversation with Spotify

I spoke with Eliot Van Buskirk ( @listeningpost ), Data Storyteller at Spotify, as part of my Big Data Conversations series for Dell EMC. Note that this is not a product or technology blog and certainly there are no endorsements implied from either side but, in all fairness, I am a paying Spotify user and I am in love with what they do to bring music to the masses.

Erin K. Banks: How do you utilize big data at Spotify?

photo credit: Larisa Barita

Eliot Van Buskirk: A good way to understand how I utilize data in my role at Spotify is to look at my background. I came out of journalism. I worked at CNET and Wired, reviewed the first 50-plus digital players in the world, and have covered digital music since it came onto the scene in the 1990s. I left Wired to go to a startup called The Echo Nest, a music data company. They had more individual data points about music than any other entity on the planet. This is everything from “which artists are similar to other artists”, which enabled the clients to make radio stations that make sense, down to what key the chorus of a song is in and where the beats are in a song. It is incredibly detailed data. There are attributes including acousticness, which measures how acoustic a song is versus how electronic, and mechanism, which is tied to how regular the beat is. Whatever you can think of: these guys were teaching computers to listen to music and to understand what people were saying about music on the web. If a thousand people say something is dubstep, then the system says that is probably dubstep. It draws conclusions using the acoustic data and cross-references that with the semiotic data from the web to figure out how to understand music at a scale that has pretty much never been possible before. Going from journalism to The Echo Nest, I found the sweet spot in telling stories about the data and analyzing all this incredible technology that is there to solve other problems like “what someone should listen to right now” or “will someone like a certain song.” We started to realize the potential for storytelling, things like how music from various decades is listened to now. The break-out story for us was how artists are related to the individual states. We looked at which artist is listened to disproportionately in each US state, and we made a map that got a ton of traction.
Music charts are great, and they tell us a lot about what people like, but the most popular music is popular everywhere. We started looking at what music makes things distinct from each other. You can apply this to anything, like what music is distinctive to millennials versus boomers… the answer is Skrillex versus Roy Orbison. The basic idea is that the data that people view as kind of dry is capable of producing these incredibly interesting nuggets of information, and as a journalist who was shifting into a marketing and communications role, what appealed to me was that everything we did with the data is real. It is like being a journalist where you actually do have access to all of the facts. At Spotify, I help publications like the NY Times or whomever to tell data stories, and I publish my own data stories based on the massive amounts of data at Spotify. It was great doing this at The Echo Nest because we had all the data about the music itself, and now that I am a part of the largest on-demand music service in the world, there is a lot of anonymized data about how people listen to music, so we are able to see even more, and that makes my job even more interesting.
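The "listened to disproportionately" idea can be illustrated with a small sketch. This is not Spotify's or The Echo Nest's actual method, which the interview does not describe; it simply assumes that "distinctive" means an artist's share of plays in a region divided by that artist's share of plays overall. The function name, the region codes, and the toy play counts are all hypothetical.

```python
from collections import defaultdict


def most_distinctive_artist(plays):
    """Return {region: artist} for the artist most over-represented per region.

    `plays` is an iterable of (region, artist, play_count) tuples.
    Distinctiveness is assumed to be: regional share / overall share.
    """
    region_totals = defaultdict(int)
    artist_totals = defaultdict(int)
    region_artist = defaultdict(int)
    grand_total = 0

    for region, artist, count in plays:
        region_totals[region] += count
        artist_totals[artist] += count
        region_artist[(region, artist)] += count
        grand_total += count

    best = {}
    for (region, artist), count in region_artist.items():
        regional_share = count / region_totals[region]
        overall_share = artist_totals[artist] / grand_total
        ratio = regional_share / overall_share
        if region not in best or ratio > best[region][1]:
            best[region] = (artist, ratio)

    return {region: artist for region, (artist, _) in best.items()}


# Hypothetical toy data: the chart-topper wins everywhere on raw plays,
# but a different act is distinctive in each region.
toy_plays = [
    ("TN", "Popular Act", 900), ("TN", "Local Band", 100),
    ("WA", "Popular Act", 950), ("WA", "Local Band", 5), ("WA", "Indie Duo", 45),
]
result = most_distinctive_artist(toy_plays)
print(result)
```

On this toy data the most-played artist is the same in both regions, yet the ratio surfaces a different act for each, which is the contrast between a chart and a distinctiveness map.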

Erin: You said you created a map to identify music regions. Is this map publicly available?

Eliot: This map is publicly available and went everywhere. I think USA Today almost printed it instead of the weather map. This is what I mean about it being a success: to get this information out there and to get people talking about it. When we did it originally, it was a static map and it was a snapshot of a certain time when those were the popular artists. I have gone further with the concept now and made a musical map of the world, available online, which updates twice a month so you can see the music that is distinct for a thousand cities worldwide at any given time. People from those cities will recognize local bands because people listen to them in, say, Nashville but might not have heard of them outside of Nashville. The beautiful thing about this map is that I worked with this incredibly talented guy called Glenn McDonald ( @EveryNoise ) and he maintains these playlists that update for each city. In order to create the map I linked each spot on the map to one of these playlists. This is a demonstration of what we call at Spotify “Care at Scale”. We are lavishing attention on these cities and showing exactly what makes them special compared to the rest of the world, but it scales remarkably well. We have a great response from people all over the world, like a radio station in Napoli, Italy where they have a bi-weekly show just dedicated to the Naples playlist on this map. I get emails from people all over the place who can tell that the care is there. It doesn’t come off as “here is the Top Ten globally.” Big data can essentially allow for a more nuanced view of the world, and again, it is important to emphasize that this is all based on anonymized and aggregated listening data.

Erin: Your background is journalism but do you consider yourself a data scientist? Are there data scientists there? Lately I have been hearing that it is not about just the data scientist. When it comes down to it, it is regular people searching through the data trying to get insights. Do you have thoughts on that?

Eliot: We definitely do have data scientists here and I do not consider myself one of them. I also don’t consider myself a data journalist, which is another title. I have been a journalist before and this is not journalism. I came up with “data storyteller” as the title, which I think is the sweet spot for describing what I do. I have a math and science background from a long time ago. I was very much into calculus and physics and became an English major because it was more interesting to me at that point. I have sort of a split early background between the humanities and more math and sciences and did a tiny amount of programming but really nothing after the age of ten or eleven. I taught myself SQL with the help of people at Spotify who are legendary programmers, and I am finally at the point where I can get direct access to data, write my own queries, and crunch my own data. It is actually empowering, and I recommend it to anyone. The data can be a bottleneck, as we all know, because only certain people know how to analyze it, and without the ability to do that, I couldn’t do my job.

Erin: Do you pull from the data scientists or do you do it yourself? What is the difference between what you are doing and what data scientists are doing?

Eliot: The data scientists have PhDs and patents and I was an English major who was a former journalist. The skill level is very different, but there is a democratizing aspect to this where once you learn your way around the tools, you are looking at the same data as everyone else, so if you learn how to use the data, the results are going to be similar to them analyzing it and that is the great thing about technology. Regardless of whether you have a PhD or not… if you figure something out, you figure it out. I would never put myself on the same level as they are and I think there need to be more of us who are in between various fields and data science. At Spotify, the data scientists are solving major problems and doing very difficult work. They keep this whole thing running as they optimize everything and provide the best music recommendations. For example, it would not make sense for them to be working on a data-driven story about what happens to Aerosmith listening when there is a comet in the news. Turns out that people listen to “I Don’t Want to Miss a Thing” every time there is a comet event. I consider that fun and interesting from a news and PR point of view, but that isn’t something you need a data scientist working full time on.

Erin: I think it says a lot that you know that whenever there is news about a comet, people listen to that song and that it turns into something else. How has big data impacted your business? Did Spotify start off as a streaming business and then they added analytics in order to provide different services?

Eliot: Spotify was always data driven. The Echo Nest brought a different approach. They combined what people said with what the music actually is. To have a machine ear that says this song sounds happy or this song sounds sad… this is an actual number that we have, an emotional valence. It varies between 0, which is completely unhappy, and 1, which is completely happy. I don’t think anyone else had been able to scale up like that. Spotify is scaling up massively in terms of usage and The Echo Nest scales very well for understanding music and how people listen to music. I would like to add that when people hear about big data being associated with music, they can understandably get the wrong idea that it is just algorithms: what do they know, and why would a robot make a good radio station? That is completely the wrong way to look at it. Thinking of music as data and algorithms is misleading. People make music and people listen to music and it is all about humans being involved. The comparison I often make: I used to review music. That is one person’s opinion, and you can base something on that, or you can base something on how 75 million people listen to music. I argue that 75 million people can make a data-driven approach more human than one person can. The algorithms are a way of getting at very human things through a route that is sometimes mistakenly viewed as artificial.

Erin: What is the biggest myth about big data? Do you think that it is that big data is not looking at the human aspect?

Eliot: Yes, that would be a fair answer to that question. A great example is the “Discover Weekly” feature on Spotify. Every Monday, for every single user of Spotify, Discover Weekly creates a personalized playlist of music that the listener has never heard before on Spotify and that we know they are going to like. It seems impossible, but when you have that many people listening to music, making playlists, adding things to their collections, it is possible to say that for every single user, here is music that you will enjoy and we know that you have never listened to it before on Spotify, so here it is. That happens every Monday, and people’s minds are blown. It is a similar feeling to a friend making a mix tape. You feel like “There is no way this thing knows me this well”. That might seem strange or magical or impossible if you are just viewing it as an algorithm, but once you understand that it is all about people, the human element, their responses and reactions, then it makes sense.


Erin: Now on Monday you have people interested in listening to new music and you are offering it up to them based on everything they have given you before. You have a portfolio on me, as a Spotify user, this is what I listen to and this is what I like and on Monday you give me something different based on me and others.

Eliot: Yes, that is essentially how it works. It is a great demonstration of how big data or algorithms actually provide a very human experience. Once you understand that it is about people, it makes sense.

Erin: So, big data allows your business to be more human. Instead of being a streaming music service you are now creating human interaction with us by gathering all this data around us. You are changing how I listen to music and how I discover music.

Eliot: I think that is true. People really love this feature. People return every Monday saying “give me more of this please.” I think it is one of the most powerful ways that we are using data to improve people’s listening experiences right now.

Erin: Does data allow you to create additional products and services?

Eliot: For sure, as we discussed, there is “Discover Weekly” and there is a degree of outreach being done. If we can tell that you are really into a certain band, then we can tell you if they are going to tour where you live. There is no end to where you can go with the data. I am not involved in the product here, but would guess that we don’t want to “over hit” people with offers. But, if we could help artists find people that are interested in them, only good things can come from that.

Erin: I feel people are not seeing all the power of the data and the additional products and services it can provide. We tend to get stuck in our bubble and not see outside of it.

Eliot: I think we are seeing a lot of examples where people are in bubbles more and more with technology, but that data can expand people’s spheres. Discover Weekly is a great example of that. A similar approach could be used on social networks to bring us outside of our bubbles. People can use the amount of data that we generate in everything we do these days in a number of different ways. The part that is interesting to me is using it to expand people’s exposure to things, rather than keeping them in their bubbles.

Erin: What can you leave us with that is important to note?

Eliot: Another of the offers that we have that we couldn’t have done without big data is “Fresh Finds,” which we release every Wednesday, and it is all about expanding people’s spheres. It is introducing bands that are not in the mainstream. In one case, a local bar band in Florida was featured on the list. They were found entirely through big data, by picking up very subtle signals that this band was about to take off. They were then showcased on Fresh Finds and then toured in NY and I think scheduled shows all around the United States. There are ways to use big data to personalize things, or ways to use it to identify very early trends, in our case music trends, but it can be applied to every kind of market.

Erin: When you said that people are talking about them are you bringing in social media?

Eliot: The whole web is sitting right there if you want to know about music. If people suddenly start tweeting about bands, there is a way to do semantic analysis of social networks to see what is bubbling up. What I think is powerful is to combine lots of different signals like listening data, what regions are heating up, social media, and who is opening for what band. The number of signals is almost without limit, and that is what is so powerful about big data and the ability to analyze it: you can come close to understanding all of this, resulting in a playlist that feels magical. It is the result of lots of hard work and data processing and analysis of anonymized listening. The more signals you can factor in the better. Then you can continue to apply new data and get more results.
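As a rough illustration of combining several signals into one ranking, here is a minimal sketch that assumes each signal has already been normalized to the range 0 to 1 and that a simple weighted sum is good enough. The interview does not say how Spotify actually combines its signals; the signal names, the weights, and the band names below are all hypothetical.

```python
def breakout_score(signals, weights):
    """Weighted sum of normalized signals; missing signals count as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())


def rank_bands(band_signals, weights):
    """Return band names ordered from highest to lowest combined score."""
    return sorted(
        band_signals,
        key=lambda band: breakout_score(band_signals[band], weights),
        reverse=True,
    )


# Hypothetical weights and normalized (0..1) signal values.
weights = {"listening_growth": 0.5, "social_mentions": 0.3, "regional_heat": 0.2}
band_signals = {
    "Bar Band": {"listening_growth": 0.9, "social_mentions": 0.4, "regional_heat": 0.8},
    "Established Act": {"listening_growth": 0.1, "social_mentions": 0.9, "regional_heat": 0.2},
    "Garage Trio": {"listening_growth": 0.6, "social_mentions": 0.5},
}
ranked = rank_bands(band_signals, weights)
print(ranked)
```

Adding another signal in this sketch only means adding a key to `weights` and to each band's dictionary, which mirrors the point that more signals can be factored in over time.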

Erin: Can bands receive additional services through Spotify based on big data?

Eliot: We released the Spotify for Artists site and service where artists can access data that they couldn’t previously get, at levels beyond what they can see in the Spotify app, to get an understanding of where their fans are, and so on. It is another great opportunity from big data.

Erin: It isn’t just healthcare companies or banks. It is more than that; we are talking about bands.

Eliot: Yes, and we’re still talking about data. If you are a band in Seattle and you are trying to decide if you should drive to San Francisco to play, there are ways to answer that question… even that question can be answered using data.

Recap – Strata+Hadoop World 2017 San Jose

I love the Strata + Hadoop World Conference (renamed Strata Data Conference) and once again the 2017 conference did not fail me in any way. I wanted to take this opportunity to give you a quick recap since I had the privilege of attending.

I love how the keynotes are short and impactful. Wednesday delivered great insight from a conversation with Beau Cronin and Phil Keslin, CTO and Founder of Niantic. Another great session was from Rajiv Maheswaran from Second Spectrum. Niantic created Pokémon GO, and some of the great insight that Phil brought was how they started with a strong architecture that they knew would scale. They had no idea that it would need to scale so fast or so soon. Although there were sleepless nights, they were prepared and able to solve both the compute and big data problems they encountered immediately. Rajiv’s topic was “When machines understand sports” and it was great to see how they were able to track the movement of the ball and players and how they could transform the data into facts about players, strategies and probabilities as well as changing the overall game. We watch a great deal of basketball, especially now with the NCAA Men’s and Women’s Championships going on, and to see data analytics applied to the game like never before is really cool. Is there anything that data analytics can’t impact?

On Thursday we got to see many of the good things that data analytics can help to achieve. For instance, Desiree Matel-Anderson from the Field Innovation Team talked about “Data in disasters: Saving lives and innovating in real time”. Desiree talked about Hurricane Sandy and the Boston Marathon bombing as well as other recent events. She talked about how we can use social media and data analytics to determine the impact of the event and how they make it easier to better follow or respond to events in the future. For instance, looking at social media, they saw that people were tweeting for “help” after Sandy hit the coast, electricity was gone and the hurricane had subsided. There was no panic occurring while the storm was hitting the land. These facts can help agencies like FEMA in future hurricanes: with this historical data, they can help people feel more connected and respond to them faster. Another great session was from Maya Shankar, who worked for the White House Office of Science and Technology Policy under President Barack Obama. She spoke about “Improving Public Policy with Behavioral Insights” and provided us with four conclusions as to how those insights can guide next steps…


1) convert interest into impact

2) quantify impact

3) celebrate small wins

4) generate organic buy-in

These conclusions are based on many years of work and delivered as proof to Washington DC insiders that “data tells you what people are doing” and “behavioral science tells you why.”

The rest of the day was filled with expo time and some great technical sessions. The majority of the topics for the conference focused on real-time machine learning and Artificial Intelligence (AI). In one presentation, Rob Craft from Google explained machine learning best… he said, machine learning is “one branch of the field of AI”, “a way of solving problems without explicitly codifying the solution”, and, last but not least, “a way of building systems that improve themselves over time.” Machine learning and AI are clearly the future with regard to data analytics and a great reason why they changed the name of the conference to Strata Data Conference. Don’t get me wrong… Apache Hadoop, Spark, Impala, Flink, Kafka, Beam, Apex, Kudu, etc. are still being talked about. Data analytics means a lot of things to people and it varies in multiple aspects, but one thing that remains the same is the fact that data analytics drives change and impact. Whether it is changing our nation’s policies or making better video games, we were all at Strata + Hadoop World to learn more and use that information to make a difference. That’s why I love Strata + Hadoop World. So much to learn, so many people to learn from, and realizing that for once, our jobs are making an impact and what is better than that?

Strata+Hadoop World 2017 – San Jose

Another Strata+Hadoop World San Jose is upon us and some of us are extra excited because this is the first time we are in San Jose as Dell EMC. I was literally on a call today going over the logistics and a couple of people could not stop talking about the event and how it will be great that we get to be a part of it as Dell EMC. We are incredibly proud of the way the merger has come together and all that we have accomplished along the way. This time we get the opportunity to show you that we are not just an infrastructure company but a company that is focused on your business outcomes. We have worked with the full range of customers and helped them be successful across multiple disciplines.

What is our main message? Analytics that Drive Business

The only way that you can solve your business problems is through data analytics. Your current and future data has the potential to answer these problems, but recognizing the questions is not that easy. The true question is how you find these questions and how you get a return on information. We want to guide you through the maze and challenges of data analytics through our services and simple and complete offerings that take the guesswork out of the struggle. We want to allow your business to focus on what’s most important, driving your business forward.

We have two sessions occurring at Strata+Hadoop World:

Tuesday @ 1:30 pm in room 210 B/F with Bill Schmarzo ( @schmarzo )

Wednesday @ 11:00 am in room 210 B/F with yours truly, Erin Banks ( @banksek )

We will be in booth 1409, which is to the right of the main entrance, so please come on by and let us prove to you that we are more than just infrastructure. Also follow us on @DellEMCbigdata to see how the conference is going.

Some additional information can be found at our Dell EMC events site

Big Data Analytics and Its Impact on Holiday Shopping

It was my niece Kajsa’s birthday in October. It was so crazy at work that I almost completely forgot to get her a gift. Please don’t tell her!!! My biggest problem, outside of forgetting, is the struggle to find a gift for an 8-year-old when you live in two separate states and you want to get her something unique that you really hope she will like.

Shopping can be difficult and not many people like to do it, especially when you need to get something for someone else. What worried me more… there were two more months until holiday shopping occurred. Now I need to buy for her brother Tennison and Kajsa AGAIN. What is the appropriate age to start giving gift cards? Well, until that age, I need a fallback plan, which is of course online shopping. If I didn’t have time to remember her birthday, I certainly don’t have time to drive to the mall. We all have our favorite sites and personally my favorites include reviews and most importantly… recommendations.

So how does my problem help you and your business? If your business is not applying any data analytics to your inventory and your sales, you are not being competitive. There are different types of data analytics that you can use within your business but the most impactful are predictive and prescriptive analytics.

Predictive analytics helps you understand what will happen. Looking into the past and how people reacted in certain conditions can help you predict what will happen in the future. For instance, if I buy a home gaming system then more than likely other accessories will go with it and be recommended. Does your business have the ability to “upsell”? Does your business understand the correlation of the buyers to what they have bought over time? Buyers like me are looking for the recommendation; they are looking for ways to easily find out what else is out there. Your business more than likely has all of this past data. You might be using it for your inventory tracking or just your quarterly sales. Now with predictive analytics, you have the ability to sell even more products or services.
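To make the gaming-system example concrete, here is a minimal sketch of one very simple form of recommendation: counting which items have historically been bought together and suggesting the most frequent companions. It is only an illustration under that assumption, not a description of how any particular retailer or product works, and the order history and item names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_purchase_counts(orders):
    """Count how often each pair of items appears in the same past order."""
    co_counts = defaultdict(Counter)
    for order in orders:
        # set() ignores duplicate items within one order.
        for first, second in combinations(sorted(set(order)), 2):
            co_counts[first][second] += 1
            co_counts[second][first] += 1
    return co_counts


def recommend(co_counts, item, top_n=2):
    """Suggest the items most often bought together with `item`."""
    companions = co_counts.get(item, Counter())
    return [other for other, _ in companions.most_common(top_n)]


# Hypothetical order history.
past_orders = [
    ["console", "controller", "headset"],
    ["console", "controller"],
    ["console", "controller", "game"],
    ["console", "headset"],
]
co_counts = build_co_purchase_counts(past_orders)
suggestions = recommend(co_counts, "console")
print(suggestions)
```

The same past sales data that feeds inventory tracking is the only input here, which is the point of the paragraph above: the history a business already holds can drive the upsell.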

Prescriptive analytics takes predictive analytics one step further and takes your business even closer into the big data analytics realm. Taking multiple points of information not only from purchases on your site but other sites as well as credit card purchases, and the attributes of the buyer, can help your business find the right options for your customers as well as drive newer products to them. Prescriptive analytics identifies the best course of action that an individual should take. It is almost like you do the thinking for me because your business inputs all the points that are necessary to make the right decision. For example, maybe there is a trend going on in that state and therefore I want to choose a gift that is something she truly will want.

Understanding the data and applying it in such a way that it helps drive sales is not a new concept. Oftentimes when we work with someone face to face they make recommendations on what others liked or paired with something. Now businesses have the opportunity to pull from many sources and therefore give you a richer insight. Not just the ability to know what we will more than likely buy but an understanding of what we should buy based on the information accumulated from people like me. Pulling information from other people buying an 8-year-old a gift will more than likely make sure that I buy the perfect gift. Isn’t that what we all want anyway… to be the perfect aunt or uncle who knows just what the kids want. Happy Holidays and happy online shopping this season. Just make sure you shop at a site that does data analytics.

