Re: Performance of count(*) - Mailing list pgsql-performance

From Andreas Kostyrka
Subject Re: Performance of count(*)
Date
Msg-id 20070322122944.GD4439@andi-lap.la.revver.com
Whole thread Raw
In response to Re: Performance of count(*)  (Andreas Tille <tillea@rki.de>)
Responses Re: Performance of count(*)
Re: Performance of count(*)
List pgsql-performance
* Andreas Tille <tillea@rki.de> [070322 13:24]:
> On Thu, 22 Mar 2007, Andreas Kostyrka wrote:
>
> >Which version of PG?
>
> Ahh, sorry, forgot that.  The issue occurs in Debian (Etch) packaged
> version 7.4.16.  I plan to switch soon to 8.1.8.
I'd recommend 8.2 if at all possible :)
>
> >That's the reason why PG (check the newest releases, I seem to
> >remember that there has been some aggregate optimizations there),
>
> I'll verify this once I moved to the new version.
8.1 won't help you I'd guess. ;)

>
> >does
> >a SeqScan for select count(*) from table. btw, depending upon your
> >data, doing a select count(*) from table where user=X will use an
> >Index, but will still need to fetch the rows proper to validate them.
>
> I have an index on three (out of 7 columns) of this table.  Is there
> any chance to optimize indexing regarding this.
Well, that depends upon you query pattern. It's an art and a science
at the same time ;)
>
> Well, to be honest I'm not really interested in the performance of
> count(*).  I was just discussing general performance issues on the
> phone line and when my colleague asked me about the size of the
> database he just wonderd why this takes so long for a job his
> MS-SQL server is much faster.  So in principle I was just asking
> a first question that is easy to ask.  Perhaps I come up with
> more difficult optimisation questions.

Simple. MSSQL is optimized for this case, and uses "older"
datastructures. PG uses a MVCC storage, which is not optimized for
this usecase. It's quite fast for different kinds of queries.

The basic trouble here is that mvcc makes it a little harder to decide
what is valid for your transaction, plus the indexes seems to be
designed for lookup, not for data fetching. (Basically, PG can use
indexes only to locate potential data, but cannot return data directly
out of an index)

Andreas

pgsql-performance by date:

Previous
From: David Brain
Date:
Subject: Potential memory usage issue
Next
From: Bill Moran
Date:
Subject: Re: Performance of count(*)