It’s clear that the triangle on the right is more compact than the triangle on the left. The red, blue, and green lines are much shorter.
The Cluster Spread metric captures that difference. The value for the right triangle is much smaller than the value for the left triangle, which means it’s more compact.
It’s easy to see how this works in two dimensions. While I can’t visualize how this works in eight dimensions, the logic extends perfectly. I can use my Cluster Spread metric to measure the spread of any set of points in 8-D space just like I did in the 2-D space above.
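To make the "works in any number of dimensions" point concrete, here is a minimal sketch. It assumes a stand-in definition of spread, the average Euclidean distance from each point to the group's centroid, which may differ from the exact Cluster Spread formula used in this post. The function names and the sample points are made up for illustration.

```python
import math


def centroid(points):
    """Return the coordinate-wise mean of a list of equal-length points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def cluster_spread(points):
    """Assumed stand-in metric: mean Euclidean distance to the centroid.

    Nothing here depends on the number of dimensions, so the same
    function handles 2-D triangles and 8-D song vectors alike.
    """
    center = centroid(points)
    return sum(math.dist(p, center) for p in points) / len(points)


# Two made-up triangles: the second is the first shrunk to a quarter size.
wide_triangle = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
tight_triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.75)]

# Two made-up 8-D points, just to show the call is identical.
points_8d = [
    (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8),
    (0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8),
]

print(cluster_spread(wide_triangle) > cluster_spread(tight_triangle))
print(cluster_spread(points_8d))
```

The only part that "knows" about dimensions is `math.dist`, which accepts points of any matching length, so moving from two features to eight needs no code changes.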
With the math taken care of, it’s time to move on to the music.
The Spotify API is awesome. Most importantly for this project, Spotify provides access to audio features for any song in their collection. Audio features are characteristics of a song, such as danceability, energy, and loudness, that Spotify represents with numerical values.
I decided to use eight of the features. I picked ones that naturally lend themselves to being ranked numerically (so I didn’t use things like the key and time signature, but did include things like how acoustic the song is). The eight characteristics I chose are:
With these features, I can represent each song as an 8-dimensional vector. I did a little bit of data transformation to scale all the features to between 0 and 1, since I don’t want any one feature to dominate the spread metric.
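For readers who want to see what that scaling step might look like, here is a small sketch using min-max scaling. This is an assumption: the post only says the features were scaled to between 0 and 1, not how. The two columns and their values below are invented for illustration (a 0-to-1 feature like danceability and a decibel feature like loudness).

```python
def min_max_scale(rows):
    """Rescale every column of `rows` to the range [0, 1].

    Each row is one song and each column is one feature. A column whose
    values are all identical is mapped to 0.0 to avoid dividing by zero.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


# Made-up songs: column 0 is already on a 0-1 scale, column 1 is in decibels.
raw_features = [
    [0.80, -5.0],
    [0.60, -11.0],
    [0.70, -8.0],
]

scaled_features = min_max_scale(raw_features)
for row in scaled_features:
    print(row)
```

Without a step like this, a feature measured in decibels would swamp features that already sit between 0 and 1 whenever distances are computed. Scaling against fixed, documented ranges instead of the observed minimum and maximum would be an equally reasonable choice.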
And that’s all there is to it. With my Cluster Spread formula, my features chosen, and the data from Spotify’s API, I’m ready to start ranking artists.
At the end of every year from 2012 to 2016, Spotify released a playlist of their Global Top 100 Tracks of the Year. There are several hundred artists featured on these playlists, and I ranked all of the artists above a baseline popularity threshold.
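The filter-then-rank step could be sketched roughly as follows. Everything here is hypothetical: the artist names, the spread and popularity numbers, the threshold value, and the dictionary layout are placeholders, not data or code from the actual notebook.

```python
def rank_most_one_dimensional(artists, min_popularity, top_n=10):
    """Drop artists below the popularity threshold, then rank by spread.

    A lower spread means the artist's songs sit closer together, so the
    list is sorted in ascending order and rank 1 is the most one-dimensional.
    """
    eligible = [a for a in artists if a["popularity"] >= min_popularity]
    ranked = sorted(eligible, key=lambda a: a["spread"])
    return [
        (rank, a["name"], a["spread"])
        for rank, a in enumerate(ranked[:top_n], start=1)
    ]


# Placeholder records; none of these values come from Spotify.
sample_artists = [
    {"name": "Artist A", "popularity": 80, "spread": 2.4},
    {"name": "Artist B", "popularity": 35, "spread": 1.1},
    {"name": "Artist C", "popularity": 72, "spread": 1.9},
]

# 50 is an arbitrary threshold chosen only for this example.
for rank, name, spread in rank_most_one_dimensional(sample_artists, min_popularity=50):
    print(rank, name, spread)
```

Note that "Artist B" has the smallest spread but never makes the list, because the popularity filter runs before the ranking.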
In total, I ranked about 200 artists. The 10 most one-dimensional artists with top hits from 2012-2016 are:
Rank | Artist | Cluster Spread Value |
---|---|---|
1 | Kesha | 1.72 |
2 | Foster The People | 1.86 |
3 | Fall Out Boy | 1.93 |
4 | American Authors | 1.94 |
5 | PSY | 1.95 |
6 | DNCE | 1.98 |
7 | Carly Rae Jepsen | 1.99 |
8 | Alesso | 2.03 |
9 | WALK THE MOON | 2.03 |
10 | Robin Schulz | 2.07 |
These look pretty good to me! Artists like Kesha, PSY, and Carly Rae Jepsen are classic examples of one-dimensional acts you might have picked before reading this.
While I won’t say this is the definitive ranking, and I chose the audio features I personally thought were relevant, I think this is the best anyone has done so far.
After ranking artists from the 2012-2016 Top 100 Tracks playlists, I wanted to evaluate some classic artists across time periods.
I didn’t calculate Cluster Spread for every major group in history, but I did try a lot of bands (at least another 75-100). In a nail-biter, Kesha managed to beat out Creed (a heavy favorite going into this process) as the most one-dimensional of all time.
But don’t worry, Creed fans. There’s always a chance they’ll come out of retirement and write a new hit that will take them higher.
For those interested in recreating or expanding this analysis, the Jupyter Notebook with all the code can be found in the GitHub repository for this post.