Thread: constant time count(*) ?

constant time count(*) ?

From
Mark Harrison
Date:
We're looking into moving some data from mysql to postgresql, and
notice that count(*) does not seem to be a constant-time function
as it seems to be in mysql.

planb=# explain select count(*) from assets;
                            QUERY PLAN
----------------------------------------------------------------
  Aggregate  (cost=22.50..22.50 rows=1 width=0)
    ->  Seq Scan on assets  (cost=0.00..20.00 rows=1000 width=0)
(2 rows)

Is there a way to optimize count(*) such that it does not have
to do a sequential scan?  We use this on some big tables and it
is slowing down processing quite a lot.

Thanks!
Mark

--
Mark Harrison
Pixar Animation Studios


Re: constant time count(*) ?

From
Peter Eisentraut
Date:
Mark Harrison writes:

> Is there a way to optimize count(*) such that it does not have
> to do a sequential scan?

No.  If you need to count a lot, you need to store the information
separately.

--
Peter Eisentraut   peter_e@gmx.net


Re: constant time count(*) ?

From
"Dann Corbit"
Date:
This should definitely be a FAQ.

The semantics of MVCC (multi-version concurrency control) means that you
can't just store a number somewhere in the header of the table like some
other database systems do.

Try a count(*) on Oracle and you will see similar behavior.  They use
MVCC also.

> -----Original Message-----
> From: Mark Harrison [mailto:mh@pixar.com]
> Sent: Wednesday, October 15, 2003 11:00 AM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] constant time count(*) ?
>
>
> We're looking into moving some data from mysql to postgresql,
> and notice that count(*) does not seem to be a constant-time
> function as it seems to be in mysql.
>
> planb=# explain select count(*) from assets;
>                             QUERY PLAN
> ----------------------------------------------------------------
>   Aggregate  (cost=22.50..22.50 rows=1 width=0)
>     ->  Seq Scan on assets  (cost=0.00..20.00 rows=1000
> width=0) (2 rows)
>
> Is there a way to optimize count(*) such that it does not
> have to do a sequential scan?  We use this on some big tables
> and it is slowing down processing quite a lot.
>
> Thanks!
> Mark
>
> --
> Mark Harrison
> Pixar Animation Studios
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to
> majordomo@postgresql.org)
>

Re: constant time count(*) ?

From
Andrew Sullivan
Date:
On Wed, Oct 15, 2003 at 11:00:10AM -0700, Mark Harrison wrote:
>
> Is there a way to optimize count(*) such that it does not have
> to do a sequential scan?  We use this on some big tables and it
> is slowing down processing quite a lot.

No.  There's a busload of discussion on this topic in the archives.
If you need an approximate value, you can get it from the system
tables.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: constant time count(*) ?

From
Tino Wildenhain
Date:
Hi Mark,

Mark Harrison wrote:
> We're looking into moving some data from mysql to postgresql, and
> notice that count(*) does not seem to be a constant-time function
> as it seems to be in mysql.
>
> planb=# explain select count(*) from assets;
>                            QUERY PLAN
> ----------------------------------------------------------------
>  Aggregate  (cost=22.50..22.50 rows=1 width=0)
>    ->  Seq Scan on assets  (cost=0.00..20.00 rows=1000 width=0)
> (2 rows)
>
> Is there a way to optimize count(*) such that it does not have
> to do a sequential scan?  We use this on some big tables and it
> is slowing down processing quite a lot.

How do you need an unqualified
select count(*) on a table so often
it is making a problem?

Regards
Tino