Re: Netflix Prize data - Mailing list pgsql-hackers

From Mark Woodward
Subject Re: Netflix Prize data
Msg-id 21733.24.91.171.78.1160002282.squirrel@mail.mohawksoft.com
In response to Re: Netflix Prize data  ("Greg Sabino Mullane" <greg@turnstep.com>)
List pgsql-hackers
>> I signed up for the Netflix Prize. (www.netflixprize.com)
>> and downloaded their data and have imported it into PostgreSQL.
>> Here is how I created the table:
>
> I signed up as well, but have the table as follows:
>
> CREATE TABLE rating (
>   movie  SMALLINT NOT NULL,
>   person INTEGER  NOT NULL,
>   rating SMALLINT NOT NULL,
>   viewed DATE     NOT NULL
> );
>
> I also recommend not loading the entire file until you get further
> along in the algorithm solution. :)
>
> Not that I have time to really play with this....

As luck would have it, I wrote a recommendations system based on music
ratings a few years ago.

After reading the NYT article, it seems that one or more of the guys
behind "Net Perceptions" is either helping Netflix or built their system;
I'm not sure which. I wrote my own system because Net Perceptions was too
slow and did a lousy job.

I think the notion of "communities" in general is an interesting study in
statistics, but everything I've seen in the form of bad recommendations
shows that while [N] people may share certain tastes, that doesn't
necessarily mean that what one likes, the others will like too. This is
especially flawed with movie rentals because there is seldom a 1:1 ratio
of movies to people: there are often multiple people in a household, and
movies are almost always picked for multiple people.
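
To make that concrete, here is a minimal sketch in Python. The names,
movie ids and ratings are all made up for illustration (they are not from
the Netflix data), and cosine similarity is just one common way of
measuring "shared taste", not necessarily what Netflix or Net Perceptions
use. Two people agree perfectly on three movies, yet copying one person's
rating for a fourth movie misses badly:

```python
from math import sqrt

# Hypothetical 1-5 star ratings, invented for this example only.
ratings = {
    "alice": {"m1": 5, "m2": 4, "m3": 1, "m4": 5},
    "bob":   {"m1": 5, "m2": 4, "m3": 1, "m4": 1},
}

def cosine_on_shared(a, b, exclude=()):
    """Cosine similarity over movies both people rated, skipping `exclude`."""
    shared = [m for m in a if m in b and m not in exclude]
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Similarity measured on everything except the movie we want to predict.
sim = cosine_on_shared(ratings["alice"], ratings["bob"], exclude=("m4",))

# Naive "community" prediction: assume bob rates m4 the way alice did.
predicted = ratings["alice"]["m4"]
actual = ratings["bob"]["m4"]
error = abs(predicted - actual)

print(f"similarity on shared movies: {sim:.3f}")
print(f"predicted {predicted}, actual {actual}, off by {error} stars")
```

A household sharing one account makes this worse, since a single "person"
id in the data may really be several people's ratings mixed together.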

Anyway, good luck! (Not better than me, of course :-)

