Thread: constant time count(*) ?
We're looking into moving some data from mysql to postgresql, and notice that count(*) does not seem to be a constant-time function as it seems to be in mysql. planb=# explain select count(*) from assets; QUERY PLAN ---------------------------------------------------------------- Aggregate (cost=22.50..22.50 rows=1 width=0) -> Seq Scan on assets (cost=0.00..20.00 rows=1000 width=0) (2 rows) Is there a way to optimize count(*) such that it does not have to do a sequential scan? We use this on some big tables and it is slowing down processing quite a lot. Thanks! Mark -- Mark Harrison Pixar Animation Studios
Mark Harrison writes: > Is there a way to optimize count(*) such that it does not have > to do a sequential scan? No. If you need to count a lot, you need to store the information separately. -- Peter Eisentraut peter_e@gmx.net
This should definitely be a FAQ. The semantics of MVCC (multi-version concurrency control) means that you can't just store a number somewhere in the header of the table like some other database systems do. Try a count(*) on Oracle and you will see similar behavior. They use MVCC also. > -----Original Message----- > From: Mark Harrison [mailto:mh@pixar.com] > Sent: Wednesday, October 15, 2003 11:00 AM > To: pgsql-general@postgresql.org > Subject: [GENERAL] constant time count(*) ? > > > We're looking into moving some data from mysql to postgresql, > and notice that count(*) does not seem to be a constant-time > function as it seems to be in mysql. > > planb=# explain select count(*) from assets; > QUERY PLAN > ---------------------------------------------------------------- > Aggregate (cost=22.50..22.50 rows=1 width=0) > -> Seq Scan on assets (cost=0.00..20.00 rows=1000 > width=0) (2 rows) > > Is there a way to optimize count(*) such that it does not > have to do a sequential scan? We use this on some big tables > and it is slowing down processing quite a lot. > > Thanks! > Mark > > -- > Mark Harrison > Pixar Animation Studios > > > ---------------------------(end of > broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to > majordomo@postgresql.org) >
On Wed, Oct 15, 2003 at 11:00:10AM -0700, Mark Harrison wrote: > > Is there a way to optimize count(*) such that it does not have > to do a sequential scan? We use this on some big tables and it > is slowing down processing quite a lot. No. There's a busload of discussion on this topic in the archives. If you need an approximate value, you can get it from the system tables. A -- ---- Andrew Sullivan 204-4141 Yonge Street Afilias Canada Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
Hi Mark, Mark Harrison wrote: > We're looking into moving some data from mysql to postgresql, and > notice that count(*) does not seem to be a constant-time function > as it seems to be in mysql. > > planb=# explain select count(*) from assets; > QUERY PLAN > ---------------------------------------------------------------- > Aggregate (cost=22.50..22.50 rows=1 width=0) > -> Seq Scan on assets (cost=0.00..20.00 rows=1000 width=0) > (2 rows) > > Is there a way to optimize count(*) such that it does not have > to do a sequential scan? We use this on some big tables and it > is slowing down processing quite a lot. How do you need an unqualified select count(*) on a table so often it is making a problem? Regards Tino