Unacceptable postgres performance vs. Microsoft sqlserver - Mailing list pgsql-general

From tosbalok@gmail.com
Subject Unacceptable postgres performance vs. Microsoft sqlserver
Date
Msg-id 5229123d-a7ad-4e3e-800b-49d925b13a49@s50g2000hsb.googlegroups.com
Whole thread Raw
List pgsql-general
Let me just start off by saying that I *want* to use postgresql.
That's my goal.  I do not want to use SQLServer.  I'm posting this
message not to slam postgres, but to ask for someone to help me figure
out what I'm doing wrong.

I've used postgres for several years as the backend to web
applications. I'm not entirely new to it, but I've never needed to
bother with performance tuning.  Netflix is running a contest in which
they will give you a set of movie ratings and you try to develop an
algorithm to predict future ratings.  I've imported this data into a
postgresql database, and the performance is abysmal.  Here's my setup:

Core 2 Quad with 4GB RAM and two SATAII hard drives in RAID 0

I install debian linux and postgresql 8.1 (this is the latest version
that is available through the debian package manager).  I import the
Netflix data into a table with the following characteristics:

Create table rating (movieid int, userid int, rating int4, ratingdate
date)

There are 180 million rows.  I've done the import 3 times, it takes on
average 815 seconds.  I'm not too worried about that.  Now for a test,
I run the query, select count(*) from rating;

This takes an average of 172 seconds.  Call it three minutes.

Now, on the same machine, I install windows XP and Microsoft SQL
server 2000.  The import took 742 seconds.  The count(*) query took 22
seconds.

22 seconds.  What's gong on?

Another test.  In postgres I added an index to the userid column and
then counted distinct userids.  The average run time over three
queries was 4666 seconds, or 78 minutes.  Unbelievable.

On SQL Server, with *no* index, the same query takes on average 414
seconds, or about 7 minutes.  Ten times faster!

I'm sure someone will flame me and say that I'm a worthless noob and
if only I was smart I would know what I'm doing wrong.  So let me just
repeat: I want to use postgres.  That's my goal.  Please tell me what
I can do to make the performance of this acceptable.  It's not a
complicated setup.  One table.  A couple of count(*) queries.


pgsql-general by date:

Previous
From: Colin Wetherbee
Date:
Subject: Re: Suggestion for psql command interpretation
Next
From: Greg Smith
Date:
Subject: Re: Which Python library - psycopg2 or pygresql?