Re: Unacceptable postgres performance vs. Microsoft sqlserver - Mailing list pgsql-general

From Chris Browne
Subject Re: Unacceptable postgres performance vs. Microsoft sqlserver
Date
Msg-id 60zlrwuyqb.fsf@dba2.int.libertyrms.com
Whole thread Raw
Responses Re: Unacceptable postgres performance vs. Microsoft sqlserver  ("David Wilson" <david.t.wilson@gmail.com>)
Re: Unacceptable postgres performance vs. Microsoft sqlserver  (Ivan Sergio Borgonovo <mail@webthatworks.it>)
Re: Unacceptable postgres performance vs. Microsoft sqlserver  (Andreas 'ads' Scherbaum <adsmail@wars-nicht.de>)
List pgsql-general
"tosbalok@gmail.com" <tosbalok@gmail.com> writes:
> Let me just start off by saying that I *want* to use postgresql.
> That's my goal.  I do not want to use SQLServer.  I'm posting this
> message not to slam postgres, but to ask for someone to help me figure
> out what I'm doing wrong.
>
> I've used postgres for several years as the backend to web
> applications. I'm not entirely new to it, but I've never needed to
> bother with performance tuning.  Netflix is running a contest in which
> they will give you a set of movie ratings and you try to develop an
> algorithm to predict future ratings.  I've imported this data into a
> postgresql database, and the performance is abysmal.  Here's my setup:
>
> Core 2 Quad with 4GB RAM and two SATAII hard drives in RAID 0
>
> I install debian linux and postgresql 8.1 (this is the latest version
> that is available through the debian package manager).  I import the
> Netflix data into a table with the following characteristics:
>
> Create table rating (movieid int, userid int, rating int4, ratingdate
> date)
>
> There are 180 million rows.  I've done the import 3 times, it takes on
> average 815 seconds.  I'm not too worried about that.  Now for a test,
> I run the query, select count(*) from rating;
>
> This takes an average of 172 seconds.  Call it three minutes.
>
> Now, on the same machine, I install windows XP and Microsoft SQL
> server 2000.  The import took 742 seconds.  The count(*) query took 22
> seconds.
>
> 22 seconds.  What's gong on?
>
> Another test.  In postgres I added an index to the userid column and
> then counted distinct userids.  The average run time over three
> queries was 4666 seconds, or 78 minutes.  Unbelievable.
>
> On SQL Server, with *no* index, the same query takes on average 414
> seconds, or about 7 minutes.  Ten times faster!
>
> I'm sure someone will flame me and say that I'm a worthless noob and
> if only I was smart I would know what I'm doing wrong.  So let me just
> repeat: I want to use postgres.  That's my goal.  Please tell me what
> I can do to make the performance of this acceptable.  It's not a
> complicated setup.  One table.  A couple of count(*) queries.

This is a well-understood issue...

Some database systems are able to do some optimization where either:
a) They collect some statistics to answer such queries in O(1) time, or
b) They can use index accesses and only look at an index.

The MVCC strategy in PostgreSQL, which allows it to avoid the need for
readers to block writers, and vice-versa, has the result that running
"count(*)" without any WHERE clause requires a scan across the entire
table.

If the entire purpose of your application is to run COUNT(*) against
the entireties of a table with 180M rows, then PostgreSQL may not be
the right database for your application.

If, on the other hand, this is a poor benchmark of your actual
requirements, then it would be a bad idea to draw any conclusions
based on the performance of "select count(*) from some_table;"
--
(format nil "~S@~S" "cbbrowne" "linuxfinances.info")
http://www3.sympatico.ca/cbbrowne/oses.html
"If a word in the dictionary  were misspelled, how would we know?"
-- Steven Wright

pgsql-general by date:

Previous
From: "Roberts, Jon"
Date:
Subject: Re: How do I make sure that an employee and supervisor belong to the same company?
Next
From: Robert Treat
Date:
Subject: Re: pgcrypto and dblink