Thread: Performance of count(*) on large tables vs SQL Server
Doing some rather crude comparative performance tests between PG 8.0.1 on Windows XP and SQL Server 2000, PG whips SQL Server's ass on insert into junk (select * from junk) on a one column table defined as int. If we start with a 1 row table and repeatedly execute this command, PG can take the table from 500K rows to 1M rows in 20 seconds; SQL Server is at least twice as slow. BUT... SQL Server can do select count(*) on junk in almost no time at all, probably because this query can be optimised to go back and use catalogue statistics. PG, on the other hand, appears to do a full table scan to answer this question, taking nearly 4 seconds to process the query. Doing an ANALYZE on the table and also VACUUM did not seem to affect this. Can PG find a table's row count more efficiently?. This is not an unusual practice in commercial applications which assume that count(*) with no WHERE clause will be a cheap query - and use it to test if a table is empty, for instance. (because for Oracle/Sybase/SQL Server, count(*) is cheap). (sure, I appreciate there are other ways of doing this, but I am curious about the way PG works here).
On Tuesday 01 Feb 2005 6:11 pm, Andrew Mayo wrote: > PG, on the other hand, appears to do a full table scan > to answer this question, taking nearly 4 seconds to > process the query. > > Doing an ANALYZE on the table and also VACUUM did not > seem to affect this. > > Can PG find a table's row count more efficiently?. > This is not an unusual practice in commercial > applications which assume that count(*) with no WHERE > clause will be a cheap query - and use it to test if > a table is empty, for instance. (because for > Oracle/Sybase/SQL Server, count(*) is cheap). First of all, such an assumption is no good. It should hit concurrency under heavy load but I know people do use it. For the specific question, after a vacuum analyze, you can use select reltuples from pg_class where relname='Foo'; Remember, you will get different results between 'analyze' and 'vacuum analyze', since later actually visit every page in the table and hence is expected to be more accurate. > (sure, I appreciate there are other ways of doing > this, but I am curious about the way PG works here). Answer is MVCC and PG's inability use index alone. This has been a FAQ for a loong time.. Furthermore PG has custom aggregates to complicate the matter.. Most of the pg developers/users think that unqualified select count(*) is of no use. You can search the archives for more details.. HTH Shridhar
Hello Andrew, Everything that Shridhar says makes perfect sense, and, speaking from experience in dealing with this type of 'problem', everything you say does as well. Such is life really :) I would not be at -all- surprised if Sybase and Oracle did query re-writing behind the scene's to send un-defined count's to a temporary table which holds the row count. For an example of such done in postgreSQL (using triggers and a custom procedure) look into the 'General Bits' newsletter. Specifically http://www.varlena.com/varlena/GeneralBits/49.php I know, giving a URL as an answer 'sucks', but, well, it simply repeats my experience. Triggers and Procedures. Regards Steph On Tue, Feb 01, 2005 at 06:32:56PM +0530, Shridhar Daithankar wrote: > On Tuesday 01 Feb 2005 6:11 pm, Andrew Mayo wrote: > > PG, on the other hand, appears to do a full table scan > > to answer this question, taking nearly 4 seconds to > > process the query. > > > > Doing an ANALYZE on the table and also VACUUM did not > > seem to affect this. > > > > Can PG find a table's row count more efficiently?. > > This is not an unusual practice in commercial > > applications which assume that count(*) with no WHERE > > clause will be a cheap query - and use it to test if > > a table is empty, for instance. (because for > > Oracle/Sybase/SQL Server, count(*) is cheap). > > First of all, such an assumption is no good. It should hit concurrency under > heavy load but I know people do use it. > > For the specific question, after a vacuum analyze, you can use > > select reltuples from pg_class where relname='Foo'; > > Remember, you will get different results between 'analyze' and 'vacuum > analyze', since later actually visit every page in the table and hence is > expected to be more accurate. > > > (sure, I appreciate there are other ways of doing > > this, but I am curious about the way PG works here). > > Answer is MVCC and PG's inability use index alone. This has been a FAQ for a > loong time.. Furthermore PG has custom aggregates to complicate the matter.. > > Most of the pg developers/users think that unqualified select count(*) is of > no use. You can search the archives for more details.. > > HTH > > Shridhar > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
Attachment
> clause will be a cheap query - and use it to test if > a table is empty, for instance. (because for > Oracle/Sybase/SQL Server, count(*) is cheap). To test if a table is empty, use a SELECT EXISTS or whatever SELECT with a LIMIT 1...