Thread: counting rows

counting rows

From
pgboy@guthrie.charm.net
Date:
i need to get a row count on a large table. it appears that
  select count(*) from <table>;
always does a table scan, whether i have an index or not, and that is too
slow.

so i found this:
  select reltuples from pg_class where relname = '<table>';

this is nice and fast, but not all that accurate at any given time. but, i
*did* notice that it is pretty accurate if i run analyze just before.

so, i am wondering if analyze also does a table scan, in which case i am
back to where i started. it *seems* to run fairly quickly, even on a large
table, but i am just wall-clocking all of this.

or any other ideas on counting rows are welcome.

thanks.
pg


Re: counting rows

From
greg@turnstep.com
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> i need to get a row count on a large table. it appears that
>   select count(*) from <table>;
> always does a table scan, whether i have an index or not, and that is too
> slow.


http://www.gtsm.com/oscon2003/findrows.html


- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200308060951
-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE/MQgmvJuQZxSWSsgRAkFDAJwPz5C7OljCg4TYqxvw4/WjLr8TlgCgovtS
xQPuL/l+QCPgmwVDeX0yocM=
=XtnA
-----END PGP SIGNATURE-----



Re: counting rows

From
Tom Lane
Date:
pgboy@guthrie.charm.net writes:
> so, i am wondering if analyze also does a table scan,

It does not --- it does a statistical sampling, and it's quite possible
that the estimate will be badly off.

            regards, tom lane

Re: counting rows

From
pgboy@guthrie.charm.net
Date:
nice stuff (the whole presentation, full of stuff i did not know), thanks.

of course, a question...

the bit that reads:
  If you know the maximum value of an integer column and have an index on
it:
  SELECT COUNT(*) FROM t2 WHERE c2 < 1000000;

made me think that the above would do an index scan vice a table scan. but
when i try it with my table (currently has almost 70,000,000 rows) it
still does a table scan (at least, EXPLAIN says it will do a table scan)
... well, sometimes. the table has two integer fields, SECONDS and MICROS,
and i have created an index on (SECONDS, MICROS). ANALYZEd the table so
the index would be considered useful, tested that pg was happy by running:

  EXPLAIN SELECT count(*) FROM tess WHERE SECONDS < 100;

sure enough, query plan says it'll use an index scan. but...

  EXPLAIN SELECT count(*) from tess where SECONDS < 1100000000;

query plan here says it will use a table scan.

the cutoff, the highest value at which it will still do an index scan, is
1060183451. given that this value is typically gotten from the system call
gettimeofday and the current maximum value is 1060188816 (about noon
today), this is, er, problematic.

any ideas?

thanks.
pg



On Wed, 6 Aug 2003 greg@turnstep.com wrote:

>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> > i need to get a row count on a large table. it appears that
> >   select count(*) from <table>;
> > always does a table scan, whether i have an index or not, and that is too
> > slow.
>
>
> http://www.gtsm.com/oscon2003/findrows.html
>
>
> - --
> Greg Sabino Mullane greg@turnstep.com
> PGP Key: 0x14964AC8 200308060951
> -----BEGIN PGP SIGNATURE-----
> Comment: http://www.turnstep.com/pgp.html
>
> iD8DBQE/MQgmvJuQZxSWSsgRAkFDAJwPz5C7OljCg4TYqxvw4/WjLr8TlgCgovtS
> xQPuL/l+QCPgmwVDeX0yocM=
> =XtnA
> -----END PGP SIGNATURE-----
>
>