Prince Song Recommender

Building a song recommender using the Million Song Dataset

Telvis Calhoun
technicalelvis.com

Goals

Getting and Cleaning the Data

  • The entire dataset is 280GB and is stored in Amazon S3 in HDF5 format.
  • We extracted a subset of the HDF5 fields into a serialized R file (data/songs.rds) containing the features.
df <- readRDS("data/songs.rds")
names(df)
##  [1] "artist_name"               "title"                    
##  [3] "release"                   "song_hotttnesss"          
##  [5] "tempo"                     "loudness"                 
##  [7] "energy"                    "danceability"             
##  [9] "duration"                  "artist_familiarity"       
## [11] "artist_hotttnesss"         "artist_latitude"          
## [13] "artist_location"           "artist_longitude"         
## [15] "end_of_fade_in"            "key"                      
## [17] "key_confidence"            "song_id"                  
## [19] "start_of_fade_out"         "time_signature"           
## [21] "time_signature_confidence" "track_id"

Feature Selection

[Figure: plot of chunk unnamed-chunk-3]

[Figure: plot of chunk unnamed-chunk-4]

Recommender

  • We use song_hotttnesss and loudness as numeric features to place each song in a 2-dimensional space.
  • A user queries the recommender by selecting values for song_hotttnesss and loudness.
  • The recommender returns the 5 songs nearest to the query point in the 2D space, using the Euclidean distance metric.
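
The nearest-song lookup described above can be sketched in a few lines of base R. This is a minimal illustration, not the application's actual code: the helper name nearest_songs and the toy data frame are assumptions made up for the example, and only the column names song_hotttnesss and loudness come from the dataset.

```r
# Toy stand-in for the songs data frame; all values are invented for illustration.
songs <- data.frame(
  title = c("Song A", "Song B", "Song C", "Song D", "Song E", "Song F", "Song G"),
  song_hotttnesss = c(0.20, 0.45, 0.50, 0.55, 0.70, 0.90, 0.52),
  loudness = c(-20.0, -9.5, -10.0, -8.0, -6.0, -4.0, -30.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the k songs closest to the query point.
nearest_songs <- function(df, hotttnesss, loudness, k = 5) {
  # Drop rows with missing feature values so every distance is defined.
  df <- df[complete.cases(df[, c("song_hotttnesss", "loudness")]), ]
  # Euclidean distance from each song to the query in the 2D feature space.
  df$distance <- sqrt((df$song_hotttnesss - hotttnesss)^2 +
                      (df$loudness - loudness)^2)
  # Sort ascending by distance and keep the first k rows.
  head(df[order(df$distance), ], k)
}

result <- nearest_songs(songs, hotttnesss = 0.5, loudness = -10)
print(result)
```

If the two features sit on very different numeric ranges, the one with the larger range will tend to dominate the distance. Standardizing both columns first (for example with scale()) is one option, though whether that suits this recommender is a design choice the slides do not settle.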

[Figure: plot of chunk unnamed-chunk-5]

Shiny Application: Prince Song Recommender