Thread: Strange count(*) implementation?
Hi Postgres users/developers,

Can anyone explain why PostgreSQL (version 7.4.5 on Linux) does a full table scan to compute a count(*) on a base table after a VACUUM ANALYZE has been done, with no subsequent updates that might have outdated any statistics? Strangely, the EXPLAIN command does give the correct number of tuples instantaneously from the catalog, as one would expect. Still, the optimizer thinks it needs a full table scan to do the count. See the example below:

------8<---------------------------------------------------------------------------------------------
TestDB=# \d test_tbl;
        Table "public.test_tbl"
 Column |  Type   | Modifiers
--------+---------+-----------
 pre    | integer | not null
 name   | text    | not null
Indexes:
    "test_tbl_pkey" primary key, btree (pre)
    "test_tbl_pre_index" unique, btree (pre)
    "test_tbl_name_index" btree (name)

TestDB=# explain select count(*) from test_tbl;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=34293.60..34293.60 rows=1 width=0)
   ->  Seq Scan on test_tbl  (cost=0.00..34293.60 rows=166558 width=0)
(2 rows)

Time: 25.188 ms

TestDB=# select count(*) from test_tbl;
 count
--------
 166558
(1 row)

Time: 1024.803 ms

TestDB=#
------8<---------------------------------------------------------------------------------------------

The consequence of this seemingly odd count implementation is a very, very slow count.

Regards,

Henk Ernst Blok

--
address: DB group, Computer Science, EEMCS Dept., University of Twente,
         PO Box 217, 7500 AE, ENSCHEDE, THE NETHERLANDS
phone:   ++31 (0)53 489 3754 (if no response: 3690)
email:   h.e.blok@utwente.nl
WWW:     http://www.cs.utwente.nl/~blokh
Henk Ernst Blok wrote:
> Can anyone explain why PostgreSQL (version 7.4.5 on Linux) does a full
> table scan to compute a count(*) on a base table after a VACUUM ANALYZE
> has been done, with no subsequent updates that might have outdated any
> statistics? Strangely, the EXPLAIN command does give the correct number
> of tuples instantaneously from the catalog, as one would expect. Still,
> the optimizer thinks it needs a full table scan to do the count.
>
> The consequence of this seemingly odd count implementation is a very,
> very slow count.

To put it simply, count() doesn't look at the statistics because most of the time they are out of date. In any case, they aren't useful for any query with a WHERE clause.

The next most obvious choice is to use the primary-key index rather than scanning the table. However, MVCC means that we can have multiple views of the table, with some backends seeing a different number of rows than others. So either we need to store multiple index-entry versions as well as multiple row versions, or we need to check the actual row in these cases. PostgreSQL does the second, which results in the full scan you see.

There is plenty of discussion of this (and also of the max()/min() aggregate functions) in the mailing list archives.

HTH
--
  Richard Huxton
  Archonet Ltd
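A quick illustration of the MVCC point, using the table from the original post: two concurrent sessions can legitimately see different counts at the same moment, so an index alone cannot answer count(*).

-- Session A:
BEGIN;
DELETE FROM test_tbl WHERE pre = 1;
SELECT count(*) FROM test_tbl;   -- A already sees one row less

-- Session B, at the same time, before A commits:
SELECT count(*) FROM test_tbl;   -- B still sees the old row

-- Only after A commits (and B takes a new snapshot) do both agree again.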
Hi,

On Tue, 2004-10-26 at 10:16, Henk Ernst Blok wrote:
> Can anyone explain why PostgreSQL (version 7.4.5 on Linux) does a full
> table scan to compute a count(*) on a base table after a VACUUM ANALYZE
> has been done, with no subsequent updates that might have outdated any
> statistics?
> ...
> The consequence of this seemingly odd count implementation is a very
> very slow count.

How should the query planner know that the vacuum was recent enough and that there have been no modifications to the table since? If you are interested in rough numbers, you could read the last vacuum statistics from the system tables. If you need a fast count and can spend some cycles on inserts, just maintain a buffer table with the count result, updated after each insert. An unqualified count, i.e. one without a WHERE clause, should not need to be used a lot.

Regards
Tino
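As a rough sketch of what Tino suggests, the planner's row estimate can be read straight from the system catalog. Note that reltuples is a float and only as fresh as the last VACUUM/ANALYZE, so this gives an approximation, not an exact count; 'test_tbl' is just the table from the original post.

-- Planner's row estimate as of the last VACUUM/ANALYZE (approximate only).
SELECT relname, reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'test_tbl';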
Hi,
My question was more of a fundamental nature, as this count-by-scan seemed to contradict the theory about how to optimize it.

I assume(d) the more expensive statistics (e.g., value distribution info) are updated only when they are outdated too much or on request (manual vacuum). Usually, other, cheaper statistics can easily be maintained incrementally and thus reflect the actual table state after each update. Of course, the MVCC principle makes things a bit more complicated, I understand now. But tracking whether statistics are dirty has to be in the system anyway; how else does it decide when to do its own statistics updates? So if EXPLAIN can get the most recent count, why not use it for count(*) as well, when you know the statistics are still accurate?

By the way, a count(*) without any WHERE does occur very frequently if you are dealing with an OLAP load, which is the case in my setting. So I have indeed already 'fixed' the performance problem by precomputing all counts I can predict to be of any use.

Anyway, I understand this issue has been the subject of discussion from before I was on the list (searching the archive/website was/is not very effective, so I didn't know until someone told me so, sorry). So I leave it to the developers to decide what to do with this topic.
Regards,
Henk Ernst
Tino Wildenhain wrote:
> Hi,
>
> How should the query planner know that the vacuum was recent enough and
> that there have been no modifications to the table since? If you are
> interested in rough numbers, you could read the last vacuum statistics
> from the system tables. If you need a fast count and can spend some
> cycles on inserts, just maintain a buffer table with the count result,
> updated after each insert. An unqualified count, i.e. one without a
> WHERE clause, should not need to be used a lot.
>
> Regards
> Tino
Henk Ernst Blok wrote:
> I assume(d) the more expensive statistics (e.g., value distribution
> info) are updated only when they are outdated too much or on request
> (manual vacuum).

They are only updated on request -- i.e. when an ANALYZE is issued.

> So if EXPLAIN can get the most recent count, why not use it for count(*)
> as well, when you know the statistics are still accurate?

Aside from the issue of stale statistics, there is another problem: optimizer statistics are designed to be approximations. They are not necessarily precise, even if ANALYZE has just been run (for example, pg_class.reltuples is stored as a floating-point number).

A practical problem is that aggregates like count() are implemented via a general-purpose API; there is currently no provision for bypassing the API in certain special-case scenarios. See here for more info:

    http://developer.postgresql.org/docs/postgres/functions-aggregate.html

-Neil
On Tue, Oct 26, 2004 at 01:56:41PM +0200, Henk Ernst Blok wrote:
> I assume(d) the more expensive statistics (e.g., value distribution
> info) are updated only when they are outdated too much or on request
> (manual vacuum). Usually, other, cheaper statistics can easily be
> maintained incrementally and thus reflect the actual table state after
> each update. Of course, the MVCC principle makes things a bit more
> complicated, I understand now.

It's not only MVCC; it's also the fact that aggregates are extensible, so to the system they are just opaque functions and it doesn't know how to optimize them. Of course this could be done, e.g. by supplying an optimizing function with each aggregate that would try to convert the step-by-step aggregate execution plan into a completely different operation (say, by looking at an index to obtain a max() value), but as you can see this is not a trivial task and, more importantly, it hasn't been done. Which is to say: if you want to try and submit a patch, it could be improved in the future.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"There is evil in the world. There are dark, awful things. Occasionally, we
get a glimpse of them. But there are dark corners; horrors almost impossible
to imagine... even in our worst nightmares." (Van Helsing, Dracula A.D. 1972)
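To illustrate the extensibility Alvaro mentions: a user-defined aggregate is declared roughly like this. The names and the trivial step function are made up for illustration only, but to the planner such an aggregate looks just as opaque as count() does.

-- Hypothetical counting aggregate; the planner only sees an opaque
-- state-transition function, so there is nothing generic to optimize.
CREATE FUNCTION my_count_step(bigint, integer) RETURNS bigint
    AS 'SELECT $1 + 1' LANGUAGE SQL;

CREATE AGGREGATE my_count (
    BASETYPE = integer,        -- input type
    SFUNC    = my_count_step,  -- called once per input row
    STYPE    = bigint,         -- running state
    INITCOND = '0'
);

-- Used like any built-in aggregate:
SELECT my_count(pre) FROM test_tbl;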
On Tue, 2004-10-26 at 13:56, Henk Ernst Blok wrote:
> My question was more of a fundamental nature, as this count-by-scan
> seemed to contradict the theory about how to optimize it.

It is hard, or next to impossible, to optimize count() -- or aggregates more generally -- in an MVCC environment. Note also all the cases where you have a qualified select to calculate the aggregate.

> I assume(d) the more expensive statistics (e.g., value distribution
> info) are updated only when they are outdated too much or on request
> (manual vacuum). Usually, other, cheaper statistics can easily be
> maintained incrementally and thus reflect the actual table state after
> each update.

I remember some discussion about this. But here too, MVCC and the general call for performance lead to the current solution, where statistics are only updated on vacuum. With PG 8.0 you have vacuum strategies with better performance which can run more often, as I understand it, so while not giving you exact figures, your count() could at least be estimated.

> Of course, the MVCC principle makes things a bit more complicated,
> I understand now. But tracking whether statistics are dirty has to be
> in the system anyway; how else does it decide when to do its own
> statistics updates? So if EXPLAIN can get the most recent count, why
> not use it for count(*) as well, when you know the statistics are
> still accurate?

The point is: you don't know it. There is currently no "mark statistics dirty if the table has new tuples or tuples removed".

> By the way, a count(*) without any WHERE does occur very frequently if
> you are dealing with an OLAP load, which is the case in my setting.
> So I have indeed already 'fixed' the performance problem by precomputing
> all counts I can predict to be of any use.

I'm not familiar with OLAP specifics, so what is the meaning of the count() here? What is done with this information?

> Anyway, I understand this issue has been the subject of discussion from
> before I was on the list (searching the archive/website was/is not very
> effective, so I didn't know until someone told me so, sorry). So I leave
> it to the developers to decide what to do with this topic.

Yes, very very often I can tell you :-) Really, this is an FAQ :-)

Regards
Tino
Hi Tino,
Tino Wildenhain wrote:
> I remember some discussion about this. But here too, MVCC and the general
> call for performance lead to the current solution, where statistics are
> only updated on vacuum. With PG 8.0 you have vacuum strategies with better
> performance which can run more often, as I understand it, so while not
> giving you exact figures, your count() could at least be estimated.

I need exact figures at the moment due to the type of some scientific experiments I'm running. Approximations are in the pipeline, but I need a baseline run without any tricks that affect the accuracy of the numbers involved.

> The point is: you don't know it. There is currently no "mark statistics
> dirty if the table has new tuples or tuples removed".

OK, I didn't know that, but by now I had got the impression it worked that way in Postgres.

> I'm not familiar with OLAP specifics, so what is the meaning of the
> count() here? What is done with this information?

OLAP stands for OnLine Analytical Processing, typically meaning heavy queries over a lot of data (for typical examples, see http://www.tpc.org/, the TPC-H query set in particular). So decision support and data mining are in that area, for instance. My topic of interest is IR (information retrieval) in a database context, and my experiments behave like an OLAP load at the moment. They involve a lot of counting and expensive joins, as I have to compute certain estimators in a mathematical model I'm working on, hence the importance of the count... ;)

On MySQL, each of the 30 queries I have to run took on average about 24 h. As my queries are getting even more complex, I'm now trying to find out whether Postgres can do a better job.

Regards,

Henk Ernst
Hi,

On Tue, 2004-10-26 at 15:25, Henk Ernst Blok wrote:
> ... My experiments involve a lot of counting and expensive joins, as I
> have to compute certain estimators in a mathematical model I'm working
> on, hence the importance of the count... ;)
> On MySQL, each of the 30 queries I have to run took on average about 24 h.
> As my queries are getting even more complex, I'm now trying to find out
> whether Postgres can do a better job.

In your specific application, if you do not have many inserts, or if you have one phase where you do the inserts and another, distinct phase where you do the analysis, you should be able to run count() just before your analysis. If not, you can always experiment with triggers to maintain the count in an optimized way, using just another table to store the count value for every table you need. Do INSERT/DELETE via a function, or use a trigger and/or RULES. This should do the trick. Maybe you can precalculate a lot more, depending on the algorithms you use.

Regards
Tino
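A minimal sketch of the trigger-maintained counter table described above. The table and function names are made up for illustration, it assumes the plpgsql language is installed in the database, and note that all writers update the same counter row, which serializes concurrent inserts on that table.

-- One row per tracked table, kept up to date by a row-level trigger.
CREATE TABLE row_counts (
    table_name text PRIMARY KEY,
    n          bigint NOT NULL
);

CREATE FUNCTION maintain_count() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        UPDATE row_counts SET n = n + 1 WHERE table_name = TG_RELNAME;
    ELSE  -- DELETE
        UPDATE row_counts SET n = n - 1 WHERE table_name = TG_RELNAME;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
' LANGUAGE plpgsql;

CREATE TRIGGER test_tbl_count
    AFTER INSERT OR DELETE ON test_tbl
    FOR EACH ROW EXECUTE PROCEDURE maintain_count();

-- Seed the counter once, then read it instead of running a full count(*):
INSERT INTO row_counts SELECT 'test_tbl', count(*) FROM test_tbl;
SELECT n FROM row_counts WHERE table_name = 'test_tbl';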
Tino,
Thanks for the suggestion about exploiting the rules system; I hadn't thought about that option yet. Currently I'm trying to precompute as much as possible.
Regards,
Henk Ernst
Tino Wildenhain wrote:
> In your specific application, if you do not have many inserts, or if you
> have one phase where you do the inserts and another, distinct phase where
> you do the analysis, you should be able to run count() just before your
> analysis. If not, you can always experiment with triggers to maintain the
> count in an optimized way, using just another table to store the count
> value for every table you need. Do INSERT/DELETE via a function, or use a
> trigger and/or RULES. This should do the trick. Maybe you can precalculate
> a lot more, depending on the algorithms you use.
Hi Alvaro,
I used to do some research on the extensibility of query optimizers to match the extensibility of the operators. However, it's not really in the focus of my research anymore, so I can't spend much time on it, unfortunately. I'll keep it in mind in case a student of the group where I'm working is looking for a project.
Regards,
Henk Ernst.
Alvaro Herrera wrote:
Alvaro Herrera wrote:
> It's not only MVCC; it's also the fact that aggregates are extensible, so
> to the system they are just opaque functions and it doesn't know how to
> optimize them. Of course this could be done, e.g. by supplying an
> optimizing function with each aggregate that would try to convert the
> step-by-step aggregate execution plan into a completely different
> operation (say, by looking at an index to obtain a max() value), but as
> you can see this is not a trivial task and, more importantly, it hasn't
> been done. Which is to say: if you want to try and submit a patch, it
> could be improved in the future.