Thread: Strange count(*) implementation?

Strange count(*) implementation?

From
Henk Ernst Blok
Date:
Hi Postgres users/developers,

Can anyone explain why PostgreSQL (version 7.4.5 on Linux) does a full
table scan to compute a count(*) on a base table after a vacuum analyze
has been done, with no subsequent updates that might have outdated any
statistics? Strangely, the explain command does give the correct number of
tuples instantaneously from the catalog, as one would expect. Still the
optimizer thinks it needs a full table scan to do the count.

See example below:

------8<---------------------------------------------------------------------------------------------

TestDB=# \d test_tbl;
 Table "public.test_tbl"
 Column |  Type   | Modifiers
--------+---------+-----------
 pre    | integer | not null
 name   | text    | not null
Indexes:
    "test_tbl_pkey" primary key, btree (pre)
    "test_tbl_pre_index" unique, btree (pre)
    "test_tbl_name_index" btree (name)

TestDB=# explain select count(*) from test_tbl;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=34293.60..34293.60 rows=1 width=0)
   ->  Seq Scan on test_tbl  (cost=0.00..34293.60 rows=166558 width=0)
(2 rows)

Time: 25.188 ms
TestDB=# select count(*) from test_tbl;
 count
--------
 166558
(1 row)

Time: 1024.803 ms
TestDB=#

------8<---------------------------------------------------------------------------------------------

The consequence of this seemingly odd count implementation is a very,
very slow count.

Regards,


Henk Ernst Blok

--
address: DB group, Computer Science, EEMCS Dept., University of Twente,
         PO Box 217, 7500 AE, ENSCHEDE, THE NETHERLANDS
phone:   ++31 (0)53 489 3754 (if no response: 3690)
email:   h.e.blok@utwente.nl
WWW:     http://www.cs.utwente.nl/~blokh


Re: Strange count(*) implementation?

From
Richard Huxton
Date:
Henk Ernst Blok wrote:
> Hi Postgres users/developers,
>
> Can anyone explain why PostgreSQL (version 7.4.5 on Linux) does a full
> table scan to compute a count(*) on a base table after a vacuum analyze
> has been done, with no subsequent updates that might have outdated any
> statistics? Strangely, the explain command does give the correct number of
> tuples instantaneously from the catalog, as one would expect. Still the
> optimizer thinks it needs a full table scan to do the count.

> The consequence of this seemingly odd count implementation is a very
> very slow count.

To put it simply, count() doesn't look at the statistics because most of
the time they are out of date. In any case, they aren't useful for any
query with a WHERE clause.

The next most obvious choice is to use the primary-key index rather than
scanning the table. However, MVCC means that we can have multiple views
of the table, with some backends seeing a different number of rows than
others. So either we would need to store multiple index-entry versions as
well as multiple row versions, or we need to check the actual row in
these cases. PostgreSQL does the second, which results in the full scan
you see.
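
For example (a minimal sketch with two concurrent psql sessions at the
default isolation level), the same count(*) can legitimately return two
different answers at the same instant, so no single stored number could
serve both:

-- session 1
BEGIN;
DELETE FROM test_tbl WHERE pre < 1000;
SELECT count(*) FROM test_tbl;   -- sees the table minus the deleted rows

-- session 2, before session 1 commits
SELECT count(*) FROM test_tbl;   -- still sees all 166558 rows

-- session 1
ROLLBACK;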

There is plenty of discussion of this (and also max()/min() aggregate
functions) in the mailing list archives.

HTH
--
   Richard Huxton
   Archonet Ltd

Re: Strange count(*) implementation?

From
Tino Wildenhain
Date:
hi,

On Tue, 2004-10-26 at 10:16, Henk Ernst Blok wrote:
> Hi Postgres users/developers,
>
> Can anyone explain why PostgreSQL (version 7.4.5 on Linux) does a full
> table scan to compute a count(*) on a base table after a vacuum analyze
> has been done, with no subsequent updates that might have outdated any
> statistics? Strangely, the explain command does give the correct number of
> tuples instantaneously from the catalog, as one would expect. Still the
> optimizer thinks it needs a full table scan to do the count.
>
...
> The consequence of this seemingly odd count implementation is a very
> very slow count.

How should the query planner know the vacuum was recent enough and there
were no modifications to the table since?

If you are interested in rough numbers you could read the system tables
for the last vacuum statistics. If you need a fast count and can spend
some cycles on inserts, just make a buffer table with count results
after insert.
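
Something like this (a rough sketch; reltuples is only as fresh as the
last VACUUM/ANALYZE and is an estimate, not an exact count):

SELECT relname, reltuples
  FROM pg_class
 WHERE relname = 'test_tbl';
-- reltuples is the same per-table estimate the planner used for the
-- "rows=166558" figure in your EXPLAIN output.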

An unqualified count, i.e. without a WHERE clause, should not need to
be used a lot.

Regards
Tino


Re: Strange count(*) implementation?

From
Henk Ernst Blok
Date:
Hi,

My question was more of a fundamental nature, as this count-by-scan seemed to contradict the theory about how to optimize it.

I assume(d) the more expensive statistics (e.g., value distribution info) are updated only when outdated too much or on request (manual vacuum). Usually, other/cheap statistics can easily be maintained incrementally and thus reflect the actual table state after each update. Of course, the MVCC principle seems to make things a bit more complicated, I understand now. But tracking whether statistics are dirty has to be in the system anyway; how does it otherwise decide when to do its own statistics updates? So if explain can get the most recent count, why not use it in the count as well, if you know the statistics are still accurate?

By the way, a count(*) without any WHERE clause does occur very frequently if you are dealing with an OLAP load, which is the case in my setting. So, I indeed already 'fixed' the performance problem by precomputing all counts I can predict to be of any use.

Anyway, I understood this issue had been subject to discussion before I was on the list (searching the archive/website was/is not very effective, so I didn't know until someone told me so, sorry). So, I leave it to the developers to decide what to do with this topic.

Regards,



Henk Ernst


Tino Wildenhain wrote:
> hi,
>
> On Tue, 2004-10-26 at 10:16, Henk Ernst Blok wrote:
>> Hi Postgres users/developers,
>>
>> Can anyone explain why PostgreSQL (version 7.4.5 on Linux) does a full
>> table scan to compute a count(*) on a base table after a vacuum analyze
>> has been done, with no subsequent updates that might have outdated any
>> statistics? Strangely, the explain command does give the correct number of
>> tuples instantaneously from the catalog, as one would expect. Still the
>> optimizer thinks it needs a full table scan to do the count.
> ...
>> The consequence of this seemingly odd count implementation is a very,
>> very slow count.
>
> How should the query planner know the vacuum was recent enough and there
> were no modifications to the table since?
>
> If you are interested in rough numbers you could read the system tables
> for the last vacuum statistics. If you need a fast count and can spend
> some cycles on inserts, just make a buffer table with count results
> after insert.
>
> An unqualified count, i.e. without a WHERE clause, should not need to
> be used a lot.
>
> Regards
> Tino

-- 
address: DB group, Computer Science, EEMCS Dept., University of Twente,
         PO Box 217, 7500 AE, ENSCHEDE, THE NETHERLANDS
phone:   ++31 (0)53 489 3754 (if no response: 3690)
email:   h.e.blok@utwente.nl
WWW:     http://www.cs.utwente.nl/~blokh

Re: Strange count(*) implementation?

From
Neil Conway
Date:
Henk Ernst Blok wrote:
> I assume(d) the more expensive statistics (e.g., value distribution
> info) are updated only when outdated too much or on request (manual
> vacuum).

They are only updated on request -- i.e. when an ANALYZE is issued.

> So if explain can get the most recent count, why
> not use it in the count as well if you know the statistics are still
> accurate?

Aside from the issue of stale statistics, there is another problem:
optimizer statistics are designed to be approximations. They are not
necessarily precise, even if ANALYZE has just been run (for example,
pg_class.reltuples is stored as a floating point number).
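
You can see this in the catalog itself (just an illustration):

SELECT attname, format_type(atttypid, atttypmod)
  FROM pg_attribute
 WHERE attrelid = 'pg_class'::regclass
   AND attname = 'reltuples';
-- reltuples is of type "real", a 4-byte float, so for very large tables
-- it cannot even represent the exact row count.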

A practical problem is that aggregates like count() are implemented via
a general-purpose API; there is currently no provision for bypassing the
API in certain special case scenarios. See here for more info:

http://developer.postgresql.org/docs/postgres/functions-aggregate.html
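
To illustrate the point with a hypothetical user-defined aggregate (not
how the built-in count() is actually defined, just the same general
mechanism): every aggregate boils down to a state value plus a transition
function that the executor calls once per input row, so there is nothing
the planner can shortcut.

CREATE FUNCTION my_count_step(bigint, integer) RETURNS bigint
    AS 'SELECT $1 + 1' LANGUAGE SQL IMMUTABLE;

CREATE AGGREGATE my_count (
    basetype = integer,        -- argument type
    sfunc    = my_count_step,  -- called once per visible row
    stype    = bigint,         -- running state
    initcond = '0'
);

SELECT my_count(pre) FROM test_tbl;   -- still has to scan every visible row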

-Neil

Re: Strange count(*) implementation?

From
Alvaro Herrera
Date:
On Tue, Oct 26, 2004 at 01:56:41PM +0200, Henk Ernst Blok wrote:

Hi,

> I assume(d) the more expensive statistics (e.g., value distribution
> info) are updated only when outdated too much or on request (manual
> vacuum). Usually, other/cheap statistics can easily be maintained
> incrementally and thus reflect actual table state after each update. Of
> course, the MVCC principle seems to make things a bit more complicated I
> understand now.

It's not only MVCC; it's also the fact that aggregates are extensible.
So to the system they are just opaque functions and it doesn't know how
to optimize them.

Of course this can be done, e.g. by supplying an optimizing function with
each aggregate that would try to convert the step-by-step aggregate
execution plan into a completely different operation (say by looking at
an index to obtain a max() value), but as you see this is not a trivial
task, and more importantly, it hasn't been done.  Which is to say, if
you want to try and submit a patch, it could be improved in the future.
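
The classic hand-written workaround gives a feel for what such an
optimizing function would have to produce automatically (a sketch using
the test_tbl example from earlier in this thread):

-- goes through the generic aggregate machinery, so: sequential scan
SELECT max(pre) FROM test_tbl;

-- equivalent rewrite that can walk the btree index on pre instead
SELECT pre FROM test_tbl ORDER BY pre DESC LIMIT 1;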

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"There is evil in the world. There are dark, awful things. Occasionally, we get
a glimpse of them. But there are dark corners; horrors almost impossible to
imagine... even in our worst nightmares." (Van Helsing, Dracula A.D. 1972)


Re: Strange count(*) implementation?

From
Tino Wildenhain
Date:
On Tue, 2004-10-26 at 13:56, Henk Ernst Blok wrote:
> Hi,
>
> My question was more of a fundamental nature as this count by scan
> seemed to contradict the theory about how to optimize it.

It is hard or next to impossible to optimize count(), or aggregates more
generally, in an MVCC environment. Note also all the cases where
you have a qualified SELECT to calculate the aggregate.

> I assume(d) the more expensive statistics (e.g., value distribution
> info) are updated only when outdated too much or on request (manual
> vacuum). Usually, other/cheap statistics can easily be maintained
> incrementally and thus reflect actual table state after each update.

I remember some discussion about this. But here too, MVCC and
the general call for performance lead to the current solution,
where statistics are only updated on vacuum. With PG 8.0 you have
vacuum strategies with better performance which can run more often,
as I understand it - so while not giving you exact figures, your
count() could at least be estimated.

> Of course, the MVCC principle seems to make things a bit more
> complicated, I understand now. But tracking whether statistics are
> dirty has to be in the system anyway. How does it otherwise decide
> when to do its own statistics updates? So if explain can get the most
> recent count, why not use it in the count as well if you know the
> statistics are still accurate?

The point is: you don't know that. There is currently no 'mark statistics
dirty if the table has new or removed tuples' mechanism.

> By the way, a count(*) without any where does occur very frequently if
> you are dealing with an OLAP load, which is the case in my setting.
> So, I indeed already 'fixed' the performance problem by precomputing
> all counts I can predict to be of any use.

I'm not familiar with OLAP specifics, so what is the meaning of the
count() here? What is done with this information?

> Anyway, I understood this issue has been subject to discussion before I
> was on the list (searching the archive/website was/is not very
> effective, so I didn't know until someone told me so, sorry). So, I
> leave it to the developers what to do with this topic.

Yes, very very often I can tell you :-)
Really this is an FAQ :-)

Regards
Tino


Re: Strange count(*) implementation?

From
Henk Ernst Blok
Date:
Hi Tino,

Tino Wildenhain wrote:
>> I assume(d) the more expensive statistics (e.g., value distribution
>> info) are updated only when outdated too much or on request (manual
>> vacuum). Usually, other/cheap statistics can easily be maintained
>> incrementally and thus reflect the actual table state after each update.
>
> I remember some discussion about this. But here too, MVCC and
> the general call for performance lead to the current solution,
> where statistics are only updated on vacuum. With PG 8.0 you have
> vacuum strategies with better performance which can run more often,
> as I understand it - so while not giving you exact figures, your
> count() could at least be estimated.

I need exact figures at the moment, due to the type of some scientific experiments I'm running. Approximations are in the pipeline, but I need a baseline run without any tricks that affect the accuracy of the numbers involved.

>> Of course, the MVCC principle seems to make things a bit more
>> complicated, I understand now. But tracking whether statistics are
>> dirty has to be in the system anyway. How does it otherwise decide
>> when to do its own statistics updates? So if explain can get the most
>> recent count, why not use it in the count as well if you know the
>> statistics are still accurate?
>
> The point is: you don't know that. There is currently no 'mark statistics
> dirty if the table has new or removed tuples' mechanism.

OK, I didn't know that, but by now I had got the impression it worked that way in Postgres.

>> By the way, a count(*) without any WHERE clause does occur very frequently
>> if you are dealing with an OLAP load, which is the case in my setting.
>> So, I indeed already 'fixed' the performance problem by precomputing
>> all counts I can predict to be of any use.
>
> I'm not familiar with OLAP specifics, so what is the meaning of the
> count() here? What is done with this information?

OLAP stands for OnLine Analytical Processing, typically meaning heavy queries over a lot of data (for typical examples, see http://www.tpc.org/, the TPC-H query set in particular). So decision support and data mining are in that area, for instance. My topic of interest is IR (information retrieval) in a database context. My experiments behave like an OLAP load at the moment. My current experiments involve a lot of counting and expensive joins, as I have to compute certain estimators in a mathematical model I'm working on, hence the importance of the count... ;)
On MySQL, each of the 30 queries I have to run took on average about 24 hours. As my queries are getting even more complex, I'm now trying to find out whether Postgres can do a better job.

Regards,


Henk Ernst
-- 
address: DB group, Computer Science, EEMCS Dept., University of Twente,
         PO Box 217, 7500 AE, ENSCHEDE, THE NETHERLANDS
phone:   ++31 (0)53 489 3754 (if no response: 3690)
email:   h.e.blok@utwente.nl
WWW:     http://www.cs.utwente.nl/~blokh

Re: Strange count(*) implementation?

From
Tino Wildenhain
Date:
Hi,

On Tue, 2004-10-26 at 15:25, Henk Ernst Blok wrote:
...

> the TPC-H query set in particular). So decision support and data mining
> are in that area, for instance. My topic of interest is IR (information
> retrieval) in a database context. My experiments behave like an OLAP
> load at the moment. My current experiments involve a lot of counting
> and expensive joins, as I have to compute certain estimators in a
> mathematical model I'm working on, hence the importance of the
> count... ;)
> On MySQL, each of the 30 queries I have to run took on average about 24
> hours. As my queries are getting even more complex, I'm now trying to find
> out whether Postgres can do a better job.

In your specific application, if you do not have many inserts,
or have one phase where you do the inserts and another distinct
phase where you do the analysis, you should be able to
count() just before your analysis. If not, you can always
experiment with triggers to maintain the count in an optimized way,
using just another table to store the count value
for every table you need.

Do INSERT/DELETE via a function, or use a trigger and/or RULES;
this should do the trick.
Maybe you can precalculate a lot more, depending on
the algorithms you use.
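
A minimal sketch of that approach (the row_counts table and the
function/trigger names are made up for illustration; it assumes the
plpgsql language is installed in the database):

CREATE TABLE row_counts (
    table_name text PRIMARY KEY,
    n_rows     bigint NOT NULL
);

-- seed once with an exact count
INSERT INTO row_counts
    SELECT 'test_tbl', count(*) FROM test_tbl;

CREATE FUNCTION track_test_tbl_count() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        UPDATE row_counts SET n_rows = n_rows + 1
         WHERE table_name = ''test_tbl'';
    ELSE  -- DELETE
        UPDATE row_counts SET n_rows = n_rows - 1
         WHERE table_name = ''test_tbl'';
    END IF;
    RETURN NULL;   -- return value of an AFTER trigger is ignored
END;
' LANGUAGE plpgsql;

CREATE TRIGGER test_tbl_count
    AFTER INSERT OR DELETE ON test_tbl
    FOR EACH ROW EXECUTE PROCEDURE track_test_tbl_count();

-- a cheap lookup instead of a full scan:
SELECT n_rows FROM row_counts WHERE table_name = 'test_tbl';

Note that concurrent writers will serialize on the single row_counts row,
which is the "spend some cycles on inserts" cost mentioned earlier in this
thread.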

Regards
Tino


Re: Strange count(*) implementation?

From
Henk Ernst Blok
Date:
Tino,

Thanks for the suggestion about exploiting the rules system. I hadn't thought about that option yet. Currently I'm trying to precompute as much as possible.

Regards,


Henk Ernst

Tino Wildenhain wrote:
> Hi,
>
> On Tue, 2004-10-26 at 15:25, Henk Ernst Blok wrote:
> ...
>> the TPC-H query set in particular). So decision support and data mining
>> are in that area, for instance. My topic of interest is IR (information
>> retrieval) in a database context. My experiments behave like an OLAP
>> load at the moment. My current experiments involve a lot of counting
>> and expensive joins, as I have to compute certain estimators in a
>> mathematical model I'm working on, hence the importance of the
>> count... ;)
>> On MySQL, each of the 30 queries I have to run took on average about 24
>> hours. As my queries are getting even more complex, I'm now trying to find
>> out whether Postgres can do a better job.
>
> In your specific application, if you do not have many inserts,
> or have one phase where you do the inserts and another distinct
> phase where you do the analysis, you should be able to
> count() just before your analysis. If not, you can always
> experiment with triggers to maintain the count in an optimized way,
> using just another table to store the count value
> for every table you need.
>
> Do INSERT/DELETE via a function, or use a trigger and/or RULES;
> this should do the trick.
> Maybe you can precalculate a lot more, depending on
> the algorithms you use.
>
> Regards
> Tino



-- 
address: DB group, Computer Science, EEMCS Dept., University of Twente,
         PO Box 217, 7500 AE, ENSCHEDE, THE NETHERLANDS
phone:   ++31 (0)53 489 3754 (if no response: 3690)
email:   h.e.blok@utwente.nl
WWW:     http://www.cs.utwente.nl/~blokh

Re: Strange count(*) implementation?

From
Henk Ernst Blok
Date:
Hi Alvaro,

I used to do some research on the extensibility of query optimizers to match the extensibility of the operators. However, it's not really in the focus of my research anymore, so I can't spend much time on it, unfortunately. I'll keep it in mind in case a student of the group where I'm working is looking for a project.

Regards,


Henk Ernst.

Alvaro Herrera wrote:
> On Tue, Oct 26, 2004 at 01:56:41PM +0200, Henk Ernst Blok wrote:
>
> Hi,
>
>> I assume(d) the more expensive statistics (e.g., value distribution
>> info) are updated only when outdated too much or on request (manual
>> vacuum). Usually, other/cheap statistics can easily be maintained
>> incrementally and thus reflect the actual table state after each update. Of
>> course, the MVCC principle seems to make things a bit more complicated, I
>> understand now.
>
> It's not only MVCC; it's also the fact that aggregates are extensible.
> So to the system they are just opaque functions and it doesn't know how
> to optimize them.
>
> Of course this can be done, e.g. by supplying an optimizing function with
> each aggregate that would try to convert the step-by-step aggregate
> execution plan into a completely different operation (say by looking at
> an index to obtain a max() value), but as you see this is not a trivial
> task, and more importantly, it hasn't been done.  Which is to say, if
> you want to try and submit a patch, it could be improved in the future.

-- 
address: DB group, Computer Science, EEMCS Dept., University of Twente,
         PO Box 217, 7500 AE, ENSCHEDE, THE NETHERLANDS
phone:   ++31 (0)53 489 3754 (if no response: 3690)
email:   h.e.blok@utwente.nl
WWW:     http://www.cs.utwente.nl/~blokh