Spotify Wrapped unwrapped: The Power of User Data & music recommendations systems.

  • Discussing Spotify Wrapped's Impact
  • Covering how Spotify's Music suggestion algorithm works with vectors
  • Final thoughts regarding how you use usage and user data within your business

Christmas is almost here, and with it comes Santa, Eggnog & Spotify Wrapped. 

Even if you’ve been living under a rock without a friend’s Spotify account to borrow, you likely haven’t been able to get away from the shock and awe bombardment that is Spotify’s genius ad campaign. 
Confused? Let me remind you. 

Hard Rock Spotify Wrapped
Spotify Wrapped - Top 5 Artists & Songs
Stormerz top statistics on Spotify Wrapped
Top 10 Country Albums Streamed on Spotify 2021

The sheer power of the listening data from Spotify’s 172 million paid premium subscribers (Q3 2021) might not have been realized without the right person seeing it as a marketing opportunity.
Spotify Wrapped has managed to turn this data into a social media advertising phenomenon. The listening data of the platform and of its individual users is distributed in a stylized, visual format that is ripe for consumption on social media.

Looking at Google Trends, we can see that Spotify has managed to eclipse last year’s interest over time and hit the mythical 100 rating. This is no small feat for a large company that has already had major peaks in the news cycle.
Something major must have happened, and that something is Spotify Wrapped.

Google Trends for #SpotifyWrapped in 2021

Take a look at these usage statistics [source]:

Spotify usage time by region

Region                  Average daily usage (minutes)
Asia Pacific            110
Latin America           117
Middle East & Africa    124
North America           140

With COVID-19 accelerating people’s dependency on the internet for work, entertainment and communication, I would argue that these figures are quite strong and show how much people depend on paid streaming services like Spotify, Apple Music & YouTube Music to get their music fix.

As Spotify’s importance and market share have increased, there is a huge wealth of data that can be gleaned from your average user.  

Now, as a marketer myself, I can safely say that most successful businesses find a way to tell a story. Whether it’s a true story or not, if it’s engaging, it will get attention.

Tapping into the passion that people feel as superfans of certain bands, singers and genres, Spotify Wrapped has transformed the simple act of listening to music into a performative expression of one’s music tastes.
Spotify’s neat, crisp visuals shape this data into an annual narrative, showing people the stats on the music & podcasts that they love. This form of interpretative data, in a shareable visual medium, is gold on social media.

There has been post after post on Reddit by users sharing which artists they were a top listener of. Peep the upvotes next to the main thread & comments. This specific thread hit the front page, which is no easy feat.

Reddit talking about Spotify Wrapped
Reddit talking about Spotify Wrapped 1
Reddit talking about Spotify Wrapped 2
Reddit talking about Spotify Wrapped 3

You can see that not only is there highly positive sentiment and engagement around Spotify Wrapped, it’s also abundantly clear that artists take these listeners seriously, giving away tickets & promotional gifts as a thank you for being a top listener.

Limitations of Spotify’s data set

Interestingly, this data comes with a major caveat: how much does Spotify’s recommendation system influence the annual Wrapped data?

Is it logical to assume that the algorithm suggests Aesop Rock to this user, influencing them to listen to 78,010 minutes of Aesop Rock?

Aesop Rock Drums on the Wheel - Spotify Wrapped

Or that a user simply decided that all they wanted to do was listen to Blink 182 ad nauseam to enter the top 0.05% of listeners?

Blink 182 Top Listener Spotify Wrapped

I ask this question as it is important to understand the power of AI. In some ways, the mathematical models of recommendation systems program our tastes, or at the very least influence them.  

Netflix spends millions upon millions on fine-tuning and configuring its recommendation system, going as far as to personalize the movie artwork for each end user. This gives us a clue as to how important search, discovery and recommendation algorithms are to software companies, as well as the underlying data that fuels them.

Netflix Suggestion Algorithm

Deeper mining of Spotify Wrapped. 

The prospect of mining the data set of Spotify’s user base, or at the very least their artists’ music data and statistics, is mouthwatering. Alas, there is no publicly accessible API for the entire platform’s data (there’s an idea for you, Spotify!). However, their API does let you request data for accounts you can access.
The Spotify REST API is primarily used to pull JSON metadata regarding songs, artists & albums on the platform.
Recognizing that user data is incredibly useful and engaging, they have expanded it to serve requests for user-related data, like playlists and music that the user saves in the Your Music library.

To access this data for a specific user, you must authorize it via Spotify Accounts Service. Without this, even if you specify a Spotify user ID, it will not return the data you are asking for.  
For a great overview on the deeper processes involved, it’s highly recommended you view Pavan Sanagapati’s tutorial on Spotify Music API – Data Extraction 
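
As a rough illustration of the authorization step described above, here is a minimal Python sketch that only builds the two requests involved, without sending anything. The endpoint paths, the user-library-read scope and the query parameters reflect my understanding of Spotify's public Web API documentation and should be checked against it; the client ID, redirect URI and token are hypothetical placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoints, taken from Spotify's public Web API documentation.
AUTHORIZE_URL = "https://accounts.spotify.com/authorize"
API_BASE = "https://api.spotify.com/v1"


def build_authorize_url(client_id, redirect_uri, scopes):
    """Return the URL a user visits to grant this app access to their data."""
    params = {
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


def build_saved_tracks_request(access_token, limit=20):
    """Prepare (but do not send) a request for the user's saved tracks."""
    url = f"{API_BASE}/me/tracks?" + urlencode({"limit": limit})
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


# Hypothetical placeholder values, for illustration only.
login_url = build_authorize_url(
    "my-client-id", "http://localhost:8888/callback", ["user-library-read"]
)
request = build_saved_tracks_request("ACCESS_TOKEN_FROM_ACCOUNTS_SERVICE", limit=5)
```

The user opens login_url, approves the requested scope, and the app exchanges the returned code for an access token before the second request can succeed. Without that token, the request is rejected regardless of which user ID you name.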

Technical Commentary by Michelangiolo Mazzeschi – Data Evangelist at Relevance AI 

Spotify has not released information regarding the details of its recommendation system. There are several ways in which Machine Learning can be used to enhance the user experience based on a user’s historical preferences, from Collaborative Filtering to Neural Network based recommendations.

In this post, we are going to explore one of the most advanced recommendation approaches, which uses vector-based search as its core search engine. The advantage of using vectors to encode songs into a multi-dimensional space is being able to find the relationships between different songs, meaning that we can immediately group songs that are similar. Once a model has understood a user’s main preferences, it can suggest the songs that best correspond to their musical taste.

Encoding Spotify’s songs 

Before being able to recommend any song to a user, the main issue is that all the data is categorical. Even by using tags or song content, the data is too complex to represent: how would you convert a song lyric into a number? For this particular purpose we use something called an encoder, a complex algorithm that is able to convert categorical data into a series of numbers that we call a vector.
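
To make the idea of an encoder concrete, here is a deliberately tiny sketch. It is not how Spotify or any production encoder works (those are typically learned neural models); it simply turns a list of tags into a multi-hot vector. The tags and the song "Back in Black" are assumptions chosen for illustration.

```python
def build_vocabulary(songs):
    """Map every distinct tag to a fixed position in the vector."""
    tags = sorted({tag for song_tags in songs.values() for tag in song_tags})
    return {tag: index for index, tag in enumerate(tags)}


def encode(tags, vocabulary):
    """Turn a list of tags into a multi-hot vector of 0.0 / 1.0 values."""
    vector = [0.0] * len(vocabulary)
    for tag in tags:
        if tag in vocabulary:
            vector[vocabulary[tag]] = 1.0
    return vector


# Hypothetical tags, invented for this example.
songs = {
    "Highway to Hell": ["rock", "hard rock", "1970s"],
    "Back in Black": ["rock", "hard rock", "1980s"],
    "It's My Party": ["pop", "1960s"],
}

vocabulary = build_vocabulary(songs)
vectors = {title: encode(tags, vocabulary) for title, tags in songs.items()}
```

Each song is now a list of six numbers, one per known tag. A real encoder would produce dense vectors that also capture lyrics, audio and descriptions, but the principle is the same: categorical input goes in, numbers come out.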

Once we have converted all the songs in Spotify into vectors, we can visualize them on a Cartesian plane (note that in this example I am only using two AC/DC songs and one song from Lesley Gore):

Spotify Vector Example

As we can immediately see, songs that are similar to each other in terms of tags, content, description and lyrics occupy the same region in space. For example, I can assume that the cluster of blue dots represents Heavy Metal songs, while the turquoise dots are likely to be vintage songs.
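
"Occupying the same region in space" can be measured directly. The sketch below uses cosine similarity, one common choice among several, on hand-picked 2D coordinates. The coordinates and the second AC/DC title are assumptions made up for illustration, not values from a real encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_title, vectors):
    """Return the title whose vector is closest in direction to the query song."""
    query = vectors[query_title]
    candidates = [
        (cosine_similarity(query, vector), title)
        for title, vector in vectors.items()
        if title != query_title
    ]
    return max(candidates)[1]


# Hypothetical 2D embeddings, invented for this example.
song_vectors = {
    "Highway to Hell": [0.9, 0.2],
    "Back in Black": [0.8, 0.3],
    "It's My Party": [0.1, 0.9],
}

nearest = most_similar("Highway to Hell", song_vectors)
```

With these coordinates, the nearest neighbour of "Highway to Hell" is the other AC/DC song rather than the Lesley Gore one, which is exactly the clustering behaviour the plot shows.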

Multi-vector search 

The difficulty of encoding songs and podcasts into a vector space is that there are several features that provide valuable information on the relationships between elements. The content of a song would be the main feature, but each song is also given one or more tags, belongs to an album or a collection, and may have one or more artists who contributed to its creation.

To take advantage of multiple features, we use a technology called multi-vector search, which is able to perform a search that takes into consideration the various features of each item, each one corresponding to a separate vector space.
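
One simple way to picture multi-vector search is a weighted sum of per-feature similarities. The sketch below assumes that approach; actual multi-vector engines may combine scores differently. The field names, weights and vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multi_vector_score(query, item, weights):
    """Weighted sum of similarities, one similarity per feature space."""
    return sum(
        weight * cosine_similarity(query[field], item[field])
        for field, weight in weights.items()
    )


def multi_vector_search(query, catalogue, weights):
    """Rank catalogue titles from best to worst match for the query."""
    return sorted(
        catalogue,
        key=lambda title: multi_vector_score(query, catalogue[title], weights),
        reverse=True,
    )


# Hypothetical per-feature vectors: one small space per feature.
catalogue = {
    "Highway to Hell": {"content": [0.9, 0.2], "tags": [1.0, 0.0], "artist": [1.0, 0.0]},
    "Back in Black": {"content": [0.8, 0.3], "tags": [1.0, 0.0], "artist": [1.0, 0.0]},
    "It's My Party": {"content": [0.1, 0.9], "tags": [0.0, 1.0], "artist": [0.0, 1.0]},
}
query = {"content": [0.85, 0.25], "tags": [1.0, 0.0], "artist": [0.0, 1.0]}

by_content = multi_vector_search(query, catalogue, {"content": 0.6, "tags": 0.3, "artist": 0.1})
by_artist = multi_vector_search(query, catalogue, {"artist": 1.0})
```

The same query ranks the Lesley Gore song last when content and tags dominate, and first when only the artist space is considered. Choosing the weights is therefore a product decision as much as a technical one.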

Multi-modal embedding 

The way a vector-based recommendation system works is by creating a dedicated vector space for each user on the platform. This dedicated space only hosts vectors that are part of the user’s history. Depending on the complexity we wish to adopt and our storage limitations, we can weight these vectors differently according to several factors, sometimes even ignoring some of them.

Given all of a user’s preferences, captured in their dedicated embedding, we can identify one location that best represents the average of all their previous preferences.

Spotify Multi-modal embedding 2
Spotify Multi-modal embedding

We can then use this location in the general item embedding to find the best matches. Note that we are not limited to a single representative user vector, but we can use multiple, if we wish. 

Spotify Vector Multi-modal embedding

The representative dot of user1 is closer to It’s My Party, while the representative dot of user2 is closer to Highway to Hell: we have just simulated a vector-based recommendation system.
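
The simulation above can be sketched in a few lines of Python. This is a minimal version under stated assumptions: the representative vector is a (optionally weighted) average of the user's history, the match is the nearest item by Euclidean distance, and all coordinates and listening histories are invented for illustration.

```python
import math


def representative_vector(history, weights=None):
    """Weighted average of the vectors in a user's listening history."""
    if weights is None:
        weights = [1.0] * len(history)
    total = sum(weights)
    dimensions = len(history[0])
    return [
        sum(weight * vector[i] for weight, vector in zip(weights, history)) / total
        for i in range(dimensions)
    ]


def recommend(user_vector, item_vectors):
    """Return the title whose vector lies closest to the user's representative dot."""
    return min(item_vectors, key=lambda title: math.dist(user_vector, item_vectors[title]))


# Hypothetical 2D item embeddings and user histories.
item_vectors = {
    "Highway to Hell": [0.9, 0.2],
    "Back in Black": [0.8, 0.3],
    "It's My Party": [0.1, 0.9],
}
user1_history = [[0.2, 0.8], [0.0, 1.0], [0.3, 0.7]]
user2_history = [[1.0, 0.1], [0.9, 0.2]]

user1_pick = recommend(representative_vector(user1_history), item_vectors)
user2_pick = recommend(representative_vector(user2_history), item_vectors)
```

Here user1 is matched to "It's My Party" and user2 to "Highway to Hell". Passing weights to representative_vector, for example to favour recent listens, shifts the representative dot and can change the recommendation.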

Key takeaways and considerations 

It is clear that, despite valid privacy considerations and concerns, the benefits of big data are too large to ignore. The world is richer for the insights drawn from our collective data, whether from a user experience, healthcare, business analytics or legislation point of view.

With great power comes great responsibility, yet with the right ethical considerations, businesses can put the anonymized data they collect to effective use.

Data such as: 

  • Initial trial account usage  
  • User firmographics 
  • User demographics 
  • UI engagements 
  • API & SDK Requests  
  • Usage predictions based on usage data 
  • Customer churn data 
  • Product Purchasing data  

Can be used for: 

  • Data as content  
  • Marketing Strategy 
  • Business Strategy 
  • Understanding weaknesses in the UI / UX to flag for improvement  
  • Buyer Persona Insights 
  • Conversion rate optimization  
  • Content customization 
  • Sales and Marketing automation 
  • Development & Product Roadmaps  

With this in mind, and the clear power of Spotify’s data as content, we ask you: 

1) What useful and important data does your business capture currently? 
2) And how will you leverage this data in an ethical manner within your product, development, sales, marketing, or general business activities?  

Benedek Zajkas
December 22, 2021