Re: Strange count(*) implementation? - Mailing list pgsql-general

From Henk Ernst Blok
Subject Re: Strange count(*) implementation?
Date
Msg-id 417E3B79.8080902@utwente.nl
Whole thread Raw
In response to Re: Strange count(*) implementation?  (Tino Wildenhain <tino@wildenhain.de>)
Responses Re: Strange count(*) implementation?  (Neil Conway <neilc@samurai.com>)
Re: Strange count(*) implementation?  (Alvaro Herrera <alvherre@dcc.uchile.cl>)
Re: Strange count(*) implementation?  (Tino Wildenhain <tino@wildenhain.de>)
List pgsql-general
Hi,

My question was more of a fundamental nature as this count by scan seemed to contradict the theory about how to optimize it.

I assume(d) the more expensive statistics (e.g., value distribution info) are updated only when outdated too much or on request (manual vacuum). Usually, other/cheap statistics can easily be maintained incrementally and thus reflect actual table state after each update. Of course, the MVCC principle seems to make things a bit more complicated I understand now. But tracking whether statistics are dirty has to be in the system anyway. How does it otherwise decide when to do its own statistics updates? So  if explain can get the most recent count, why not use it in the count as well if you know the statistics are still acurate?

By the way, a count(*) without any where does occur very frequently if you are dealing with an OLAP load, which is the case in my setting. So, I indeed already 'fixed' the performance problem by precomputing all counts I can predict to be of any use.

Anyway, I understood this issue has been subject to discusion before I was on the list (searching the archive/website was/is not very effective, so I didn't know until someone told me so, sorry). So, I leave it to the developers what to do with this topic.

Regards,



Henk Ernst


Tino Wildenhain wrote:
hi,

On Tue, 2004-10-26 at 10:16, Henk Ernst Blok wrote: 
Hi Posgres users/developers,

Can anyone explain why PosgreSQL (version 7.4.5 on Linux) does a full 
table scan to compute a count(*) on a base table after a vacuum analyze 
has been done with no following updates that might have outdated any 
statistics. Strangly the explain command does give the correct number of 
tuples instantaniously from the catalog, as one would expect. Still the 
optimizer thinks it needs a full table scan to do count.
   
... 
The consequence of this seemingly odd count implementation is a very 
very slow count.   
How should the query planner know the vacuum was recent enough and there
were no modifications to the table since?

If you are interested in rough numbers you could read the system tables
for the last vacuum statistics. If you need fast count and can spend
some cycles on inserts, just make a buffer table with count results
after insert.

Unqualified count e.g. without a WHERE clause should not need to 
be used a lot.

Regards
Tino 

-- 
address: DB group, Computer Science, EEMCS Dept., University of Twente,        PO Box 217, 7500 AE, ENSCHEDE, THE NETHERLANDS
phone:   ++31 (0)53 489 3754 (if no response: 3690)
email:   h.e.blok@utwente.nl
WWW:     http://www.cs.utwente.nl/~blokh

pgsql-general by date:

Previous
From: Ian Barwick
Date:
Subject: Re: compatibilityissues from 7.1 to 7.4
Next
From: Neil Conway
Date:
Subject: Re: Strange count(*) implementation?