Ben Nour

Using Spotify's Web API to analyse my music listening habits

Since 2015 I’ve been adding my most-liked rock songs to the aptly named playlist Favourite Rock.

I thought it’d be fun to use content metadata from Spotify’s Web API to learn more about my taste in rock music, such as favourite music era and how often I’ve added to the playlist over the years.

Getting the data

import os

import requests
import pandas as pd

# Getting the API access token.
url = "https://accounts.spotify.com/api/token"
headers = {
    "Content-Type": "application/x-www-form-urlencoded"
}
data = {
    "grant_type": "client_credentials",
    "client_id": os.environ['SPOTIFY_CLIENT_ID'],
    "client_secret": os.environ['SPOTIFY_CLIENT_SECRET']
}

response = requests.post(url, headers=headers, data=data)
token = response.json()['access_token']
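One thing worth noting: if the client ID or secret is wrong, Spotify returns an error body with no access_token key, so the last line above fails with a KeyError rather than a helpful message. A minimal defensive version of the same request (a sketch, reusing the url, headers and data defined above, not part of the original notebook) might look like this:

# Sketch only: same token request as above, but failing fast on errors.
response = requests.post(url, headers=headers, data=data, timeout=10)
response.raise_for_status()  # raises if Spotify returned a 4xx/5xx status
token = response.json()["access_token"]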

The Get Playlist Items endpoint returns a maximum of 50 items (tracks/songs).

However, the call accepts an offset parameter that specifies the index of the first item to return, which you can use to make multiple API calls and capture every song on the playlist:

# Getting all the songs in my Favourite Rock playlist.
headers = {
    "Authorization": f"Bearer {token}"
}

tracks = []
for number in range(0, 520, 50):
    response = requests.get(f"https://api.spotify.com/v1/playlists/{os.environ['PLAYLIST_ID']}/tracks?limit=50&offset={number}", headers=headers)
    data = response.json()
    tracks.extend(data['items'])

# Double check that my list contains all the playlist songs.
data['total'] == len(tracks) 
True
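The range(0, 520, 50) above is hard-coded to the playlist's current length. Each response also carries a next field holding the URL of the following page (alongside the total field checked above), so a more general loop (a sketch, not the code I actually ran) can simply follow next until it runs out:

# Sketch: paginate by following the 'next' URL in each response instead
# of hard-coding the playlist length.
tracks = []
next_url = f"https://api.spotify.com/v1/playlists/{os.environ['PLAYLIST_ID']}/tracks?limit=50"
while next_url:
    page = requests.get(next_url, headers=headers).json()
    tracks.extend(page['items'])
    next_url = page['next']  # None once the final page has been fetched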

Transforming the data

Not all of the metadata returned from the API is relevant, as you can infer from a look at the JSON keys below (for example, ‘video_thumbnail’), nor is it structured in a way that can be passed straight to a pandas DataFrame.

tracks[0].keys()
dict_keys(['added_at', 'added_by', 'is_local', 'primary_color', 'track', 'video_thumbnail'])

The artist name is nested within the ‘artists’ key.

tracks[0]['track']['artists']
[{'external_urls': {'spotify': 'https://open.spotify.com/artist/6bUJpbekaIlq2fT5FMV2mQ'},
  'href': 'https://api.spotify.com/v1/artists/6bUJpbekaIlq2fT5FMV2mQ',
  'id': '6bUJpbekaIlq2fT5FMV2mQ',
  'name': 'Wavves',
  'type': 'artist',
  'uri': 'spotify:artist:6bUJpbekaIlq2fT5FMV2mQ'}]
# Creating a list of track dictionaries.
new_tracks = []
for track in tracks:
    new_track = track['track']
    new_track['added_at'] = track['added_at']
    new_track['release_date'] = track['track']['album']['release_date']
    new_track['artist_name'] = track['track']['artists'][0]['name']
    new_tracks.append(new_track)
df = pd.DataFrame(new_tracks)
# Keeping only relevant columns.
df = df[['artist_name', 'name', 'duration_ms', 'release_date', 'added_at']]
df = df.rename(columns={'name': 'track_name'})
df
                 artist_name                                track_name  duration_ms release_date              added_at
0                     Wavves                              Way Too Much       153640   2015-07-21  2015-11-01T13:20:26Z
1                 The Rubens                                     Hoops       158973   2015-08-07  2015-11-01T13:17:48Z
2              Lurch & Chief                          Keep It Together       236280   2014-10-17  2015-11-01T13:23:21Z
3               Violent Soho                         Covered in Chrome       212546   2013-09-06  2015-11-03T07:05:03Z
4               Foo Fighters                                  Everlong       250546   1997-05-20  2015-11-01T13:17:59Z
..                       ...                                       ...          ...          ...                   ...
515            Ainslie Wills                                     Drive       301910   2015-09-14  2023-11-05T12:28:19Z
516              Sonic Youth                                    Sunday       292306   1998-01-01  2023-11-05T12:31:29Z
517                Brand New  The Quiet Things That No One Ever Knows       241640         2003  2023-11-05T12:36:23Z
518            Chastity Belt                                     Lydia       239826   2015-03-23  2023-11-05T12:40:02Z
519  The Velvet Underground                         Oh! Sweet Nuthin'       444778         1970  2023-11-07T08:31:32Z

520 rows × 5 columns
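Both added_at and release_date are kept as strings, which is all the string slicing later on needs. If you wanted proper date types, added_at is a full ISO-8601 timestamp that pd.to_datetime parses directly, while release_date varies in precision (some albums only report a year, e.g. 2003 above), so a safe sketch is to extract just the year:

# Sketch (not used in the analysis below): typed date columns.
df['added_at_dt'] = pd.to_datetime(df['added_at'])           # full timestamps
df['release_year'] = df['release_date'].str[:4].astype(int)  # the year is always present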

Data analysis

Now that the playlist is in a pandas DataFrame I can start doing some data analysis.

Which artist appears the most frequently?

I was curious to see which artist has the most songs featured on the playlist.

Unsurprisingly, it was my favourite band - Nirvana!

most_popular_artists = df['artist_name'].value_counts()
most_popular_artists.head(1)
artist_name
Nirvana    15
Name: count, dtype: int64

What is the shortest and longest song in the playlist?

# Creating a more readable song length column.
df['song_length'] = round(df['duration_ms'] / 60000, 2)
max_length_index = df['song_length'].idxmax()
max_song = df.loc[max_length_index]
print(f"Song: {max_song.iloc[1]}, Artist: {max_song.iloc[0]}, Song length: {max_song.iloc[5]}")
Song: Beach Life-In-Death, Artist: Car Seat Headrest, Song length: 13.31

13 minutes!

min_length_index = df['song_length'].idxmin()
min_song = df.loc[min_length_index]
print(f"Song: {min_song.iloc[1]}, Artist: {min_song.iloc[0]}, Song length: {min_song.iloc[5]}")
Song: We See U, Artist: Speed, Song length: 1.06
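The two extremes only tell part of the story; a one-liner like describe() (a quick aside, output omitted) summarises the whole distribution of song lengths:

# Sketch: summary statistics (mean, quartiles, etc.) for song length in minutes.
df['song_length'].describe()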

How many songs have I added to the playlist over the years?

I was curious to see if I added a similar number of songs per year since creating the playlist.

df['year_added'] = df['added_at'].str[:4] # Creating a year_added column.
songs_per_years = df['year_added'].value_counts().sort_index()
songs_per_years.plot()
<Axes: xlabel='year_added'>

What era of rock music is most represented?

Perhaps unsurprisingly given my age (30 at the time of writing), the 2010s are the most represented decade when the songs on the playlist are counted by release decade.

df['release_date_decade'] = df['release_date'].str[:3] + "0s"
rock_era_song_count = df['release_date_decade'].value_counts().sort_index()
rock_era_song_count.plot(kind="bar", rot=0)
<Axes: xlabel='release_date_decade'>
era_artists = df.groupby(['release_date_decade', 'artist_name']).size().reset_index(name='song_count')
max_count_id = era_artists.groupby('release_date_decade')['song_count'].idxmax()
top_artists_per_era = era_artists.loc[max_count_id]
top_artists_per_era
    release_date_decade            artist_name  song_count
0                 1960s            The Beatles           1
3                 1970s            David Bowie           2
15                1980s            Descendents           5
44                1990s                Nirvana          10
87                2000s                   NOFX           8
283               2010s           Violent Soho           8
294               2020s  Amyl and The Sniffers           6
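One caveat: idxmax keeps only the first row per decade, so any artists tied for the top spot are silently dropped. A sketch that keeps every tied artist instead:

# Sketch: keep all artists that tie for the most songs in their decade.
max_per_decade = era_artists.groupby('release_date_decade')['song_count'].transform('max')
top_artists_per_era_with_ties = era_artists[era_artists['song_count'] == max_per_decade]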

Conclusion

Some interesting insights came out of this analysis, my favourites of which include:

- Nirvana is the artist that appears most often on the playlist, with 15 songs.
- The 2010s are the most represented decade of rock.
- The longest song on the playlist, Car Seat Headrest's Beach Life-In-Death, runs to over 13 minutes.

Thanks for reading!

You can find the Jupyter notebook here.

#Python #Pandas #Data-Analysis