As I write this, I am listening to my very own personalized radio station on Pandora. Pandora is a website that uses your musical tastes to design a customized "radio station" that serves up only the music you love. I have been listening to it, on and off, for about a couple of weeks now, and I must say, with apologies to McDonald's Corp., that I'm lovin' it.
Now, had I not been a programmer, I would have simply been impressed, accepted Arthur C. Clarke's third law - "Any sufficiently advanced technology is indistinguishable from magic" - and gone on with my life. But being one, and being a sucker for this kind of stuff, I keep thinking about how they do it, rather than just accepting and enjoying the fact that they do such a bang-up job. So, having nothing more concrete to write about this week, here is my analysis. But be warned: it's probably far enough from the mark to have the Pandora guys rolling on their office floor in laughter at my naivete.
The founders of Pandora are also the originators of the Music Genome Project. Each song in Pandora's collection is classified along 400+ attributes, called its 'genes'. Assuming exactly 400 attributes, each song becomes a point in a 400-dimensional space.
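To make the "point in a 400-dimensional space" idea concrete, here is a minimal sketch in which a song is just a mapping from attribute name to a number. The attribute names, the values, and the `to_vector` helper are all made up for illustration; I have no idea what the real genes look like or how they are scored.

```python
# Hypothetical attribute names, invented for illustration only.
ATTRIBUTES = ["tempo", "vocal_grit", "acoustic_guitar", "syncopation"]

def to_vector(song, attributes=ATTRIBUTES):
    """Turn a song's attribute mapping into a point, one coordinate per attribute."""
    # Assumption: an attribute the song was never scored on counts as 0.0.
    return [float(song.get(name, 0.0)) for name in attributes]

# A made-up song scored on three of the four attributes.
song = {"tempo": 0.8, "vocal_grit": 0.3, "syncopation": 0.6}
point = to_vector(song)
print(point)  # one coordinate per attribute: 4 here, 400+ at Pandora's scale
```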
When you register on Pandora, they ask you for three things - your age, gender, and your choice of songs for your station. The last can be an artist or a band, a genre or a period. While your age and gender are probably not song attributes on their own, they are very strong indicators of the "type" of music you like to listen to. The reason for this is that we tend to listen to most of our music during our teens, and our preferred genre of music usually happens to be whatever was most popular during this time. As to gender, boys and girls typically listen to different artists, even within the same genre, although the distinction may not be as clear-cut as with age.
So, by the time you register and set up your station, Pandora already knows about 5-10 of the attributes of the songs you would probably like. Assuming 7 known attributes, the songs that you are most likely to enjoy are the ones that lie closest to you in the 7-dimensional space defined by these attributes. This could be a simple Euclidean distance calculation:
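    from math import sqrt

    for song in songs:
        # Sum the squared differences over the attributes known for this user
        distance = 0
        for attribute in user.attributes:
            distance += (song[attribute].value - attribute.value) ** 2
        distance = sqrt(distance)
        # Play only the songs that fall within the cutoff
        if distance < closeness_cutoff:
            song.play()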
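For anyone who wants to try the distance filter described above, here is a self-contained sketch of the same idea. The song titles, attribute names, values, and the `playlist` helper are hypothetical, and treating a missing attribute as 0.0 is my own assumption, not something I know about Pandora.

```python
from math import sqrt

def distance(user, song):
    """Euclidean distance over only the attributes the user profile knows about."""
    # Assumption: an attribute missing from the song counts as 0.0.
    return sqrt(sum((song.get(name, 0.0) - value) ** 2
                    for name, value in user.items()))

def playlist(user, songs, closeness_cutoff):
    """Return the titles of songs within the cutoff, nearest first."""
    scored = [(distance(user, attrs), title) for title, attrs in songs.items()]
    return [title for d, title in sorted(scored) if d < closeness_cutoff]

# Made-up profile and catalog.
user = {"tempo": 0.7, "vocal_grit": 0.2}
songs = {
    "Song A": {"tempo": 0.8, "vocal_grit": 0.3, "syncopation": 0.9},
    "Song B": {"tempo": 0.1, "vocal_grit": 0.9},
    "Song C": {"tempo": 0.7, "vocal_grit": 0.2},
}
print(playlist(user, songs, closeness_cutoff=0.5))  # → ['Song C', 'Song A']
```

Note that "Song A" still makes the cut even though its syncopation is high, because the profile does not know about syncopation yet and so ignores that dimension.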
As they stream the songs to you, Pandora asks that you optionally rate each song with either a thumbs-up or a thumbs-down. What this does is weight your user object with the attributes of the song you just rated. The process can be thought of as "evolving" your user object to have more "genes" or attributes. So, something like this:
    for attribute in song.attributes:
        # A thumbs-up pulls the user toward the song's value, a thumbs-down pushes away
        if song.ratedAs(thumbs_up):
            user[attribute].value += attribute.value
        else:
            user[attribute].value -= attribute.value
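The rating update described above can be sketched as a small runnable function. The `rate` helper and the numbers are hypothetical. I am also assuming that an attribute the profile has not seen yet starts at 0.0, which is how the profile would pick up new "genes".

```python
def rate(user, song, thumbs_up):
    """Return a new profile nudged toward (thumbs-up) or away from (thumbs-down) the song."""
    sign = 1 if thumbs_up else -1
    updated = dict(user)  # copy, so the original profile is left untouched
    for name, value in song.items():
        # Assumption: unseen attributes start at 0.0, so rating a song
        # is also how the profile gains new dimensions.
        updated[name] = updated.get(name, 0.0) + sign * value
    return updated

# Made-up profile and song.
user = {"tempo": 0.5}
song = {"tempo": 0.25, "syncopation": 0.75}

print(rate(user, song, thumbs_up=True))   # → {'tempo': 0.75, 'syncopation': 0.75}
print(rate(user, song, thumbs_up=False))  # → {'tempo': 0.25, 'syncopation': -0.75}
```

Adding and subtracting raw values like this lets the profile drift without bound after many ratings. A running average would probably behave better, but that is a guess on my part.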
Applying your "evolved" user object to filter out the songs will now result in distance calculations across more dimensions, and thus make it possible for Pandora to give you results which are closer to your tastes.
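Here is a tiny illustration of why the extra dimensions help, again with made-up songs and attribute names. Before the profile knows about syncopation, the two songs are indistinguishable. Once it does, one of them is clearly the better match.

```python
from math import sqrt

def distance(profile, song):
    """Euclidean distance over the attributes the profile knows about."""
    return sqrt(sum((song.get(name, 0.0) - value) ** 2
                    for name, value in profile.items()))

# Two hypothetical songs that differ only in syncopation.
song_a = {"tempo": 0.5, "syncopation": 1.0}
song_b = {"tempo": 0.5, "syncopation": 0.0}

before = {"tempo": 0.5}                      # profile right after sign-up
after = {"tempo": 0.5, "syncopation": 1.0}   # profile after some ratings

print(distance(before, song_a), distance(before, song_b))  # → 0.0 0.0
print(distance(after, song_a), distance(after, song_b))    # → 0.0 1.0
```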
Even if my analysis is completely off the mark, I think the idea of a radio station customized per user is really cool. It's like having your personal music collection wherever you have a connection to the Internet. Kudos to the Pandora team for cooking this up.