Thread: Quick estimate of num of rows & table size

Quick estimate of num of rows & table size

From
Thalis Kalfigkopoulos
Date:
Hi all,

I read somewhere that the following query gives a quick estimate of the # of rows in a table regardless of the table's size (which would matter in a simple SELECT count(*)?):

SELECT (CASE WHEN reltuples > 0 THEN
pg_relation_size('mytable')/(8192*relpages/reltuples)
ELSE 0
END)::bigint AS estimated_row_count
FROM pg_class
WHERE oid = 'mytable'::regclass;

If relpages & reltuples are recorded accurately each time VACUUM is run, wouldn't it be the same to just grab directly the value of reltuples like:

SELECT reltuples FROM pg_class WHERE oid='mytable'::regclass;

In the same manner, are pg_relation_size('mytable') and 8192*relpages the same?

I run both assumptions against a freshly VACUUMed table and they seem correct.

TIA,
Thalis K.

Re: Quick estimate of num of rows & table size

From
Lonni J Friedman
Date:
On Mon, Nov 5, 2012 at 2:02 PM, Thalis Kalfigkopoulos
<tkalfigo@gmail.com> wrote:
> Hi all,
>
> I read somewhere that the following query gives a quick estimate of the # of
> rows in a table regardless of the table's size (which would matter in a
> simple SELECT count(*)?):
>
> SELECT (CASE WHEN reltuples > 0 THEN
> pg_relation_size('mytable')/(8192*relpages/reltuples)
> ELSE 0
> END)::bigint AS estimated_row_count
> FROM pg_class
> WHERE oid = 'mytable'::regclass;
>
> If relpages & reltuples are recorded accurately each time VACUUM is run,
> wouldn't it be the same to just grab directly the value of reltuples like:
>
> SELECT reltuples FROM pg_class WHERE oid='mytable'::regclass;
>
> In the same manner, are pg_relation_size('mytable') and 8192*relpages the
> same?
>
> I run both assumptions against a freshly VACUUMed table and they seem
> correct.

This doesn't seem to work for me.  I get an estimated row_count of 0
on a table that I know has millions of rows.


Re: Quick estimate of num of rows & table size

From
Thalis Kalfigkopoulos
Date:

On Mon, Nov 5, 2012 at 7:14 PM, Lonni J Friedman <netllama@gmail.com> wrote:
On Mon, Nov 5, 2012 at 2:02 PM, Thalis Kalfigkopoulos
<tkalfigo@gmail.com> wrote:
> Hi all,
>
> I read somewhere that the following query gives a quick estimate of the # of
> rows in a table regardless of the table's size (which would matter in a
> simple SELECT count(*)?):
>
> SELECT (CASE WHEN reltuples > 0 THEN
> pg_relation_size('mytable')/(8192*relpages/reltuples)
> ELSE 0
> END)::bigint AS estimated_row_count
> FROM pg_class
> WHERE oid = 'mytable'::regclass;
>
> If relpages & reltuples are recorded accurately each time VACUUM is run,
> wouldn't it be the same to just grab directly the value of reltuples like:
>
> SELECT reltuples FROM pg_class WHERE oid='mytable'::regclass;
>
> In the same manner, are pg_relation_size('mytable') and 8192*relpages the
> same?
>
> I run both assumptions against a freshly VACUUMed table and they seem
> correct.

This doesn't seem to work for me.  I get an estimated row_count of 0
on a table that I know has millions of rows.

Which one doesn't work exactly? The larger query? Are you on a 9.x?


regards,
thalis k.

Re: Quick estimate of num of rows & table size

From
Lonni J Friedman
Date:
On Mon, Nov 5, 2012 at 3:56 PM, Thalis Kalfigkopoulos
<tkalfigo@gmail.com> wrote:
>
> On Mon, Nov 5, 2012 at 7:14 PM, Lonni J Friedman <netllama@gmail.com> wrote:
>>
>> On Mon, Nov 5, 2012 at 2:02 PM, Thalis Kalfigkopoulos
>> <tkalfigo@gmail.com> wrote:
>> > Hi all,
>> >
>> > I read somewhere that the following query gives a quick estimate of the
>> > # of
>> > rows in a table regardless of the table's size (which would matter in a
>> > simple SELECT count(*)?):
>> >
>> > SELECT (CASE WHEN reltuples > 0 THEN
>> > pg_relation_size('mytable')/(8192*relpages/reltuples)
>> > ELSE 0
>> > END)::bigint AS estimated_row_count
>> > FROM pg_class
>> > WHERE oid = 'mytable'::regclass;
>> >
>> > If relpages & reltuples are recorded accurately each time VACUUM is run,
>> > wouldn't it be the same to just grab directly the value of reltuples
>> > like:
>> >
>> > SELECT reltuples FROM pg_class WHERE oid='mytable'::regclass;
>> >
>> > In the same manner, are pg_relation_size('mytable') and 8192*relpages
>> > the
>> > same?
>> >
>> > I run both assumptions against a freshly VACUUMed table and they seem
>> > correct.
>>
>> This doesn't seem to work for me.  I get an estimated row_count of 0
>> on a table that I know has millions of rows.
>
>
> Which one doesn't work exactly? The larger query? Are you on a 9.x?

doh, sorry.  The first/larger doesn't work.  As it turns out the 2nd
actually does work well.  I'm on 9.1.x.


Re: Quick estimate of num of rows & table size

From
Thalis Kalfigkopoulos
Date:



On Mon, Nov 5, 2012 at 9:04 PM, Lonni J Friedman <netllama@gmail.com> wrote:
On Mon, Nov 5, 2012 at 3:56 PM, Thalis Kalfigkopoulos
<tkalfigo@gmail.com> wrote:
>
> On Mon, Nov 5, 2012 at 7:14 PM, Lonni J Friedman <netllama@gmail.com> wrote:
>>
>> On Mon, Nov 5, 2012 at 2:02 PM, Thalis Kalfigkopoulos
>> <tkalfigo@gmail.com> wrote:
>> > Hi all,
>> >
>> > I read somewhere that the following query gives a quick estimate of the
>> > # of
>> > rows in a table regardless of the table's size (which would matter in a
>> > simple SELECT count(*)?):
>> >
>> > SELECT (CASE WHEN reltuples > 0 THEN
>> > pg_relation_size('mytable')/(8192*relpages/reltuples)
>> > ELSE 0
>> > END)::bigint AS estimated_row_count
>> > FROM pg_class
>> > WHERE oid = 'mytable'::regclass;
>> >
>> > If relpages & reltuples are recorded accurately each time VACUUM is run,
>> > wouldn't it be the same to just grab directly the value of reltuples
>> > like:
>> >
>> > SELECT reltuples FROM pg_class WHERE oid='mytable'::regclass;
>> >
>> > In the same manner, are pg_relation_size('mytable') and 8192*relpages
>> > the
>> > same?
>> >
>> > I run both assumptions against a freshly VACUUMed table and they seem
>> > correct.
>>
>> This doesn't seem to work for me.  I get an estimated row_count of 0
>> on a table that I know has millions of rows.
>
>
> Which one doesn't work exactly? The larger query? Are you on a 9.x?

doh, sorry.  The first/larger doesn't work.  As it turns out the 2nd
actually does work well.  I'm on 9.1.x.

That's weird. I'm on a 9.1.5 as well and it works fine (!?)
Still the question remains: are they equivalent?

Re: Quick estimate of num of rows & table size

From
Tom Lane
Date:
Thalis Kalfigkopoulos <tkalfigo@gmail.com> writes:
> I read somewhere that the following query gives a quick estimate of the #
> of rows in a table regardless of the table's size (which would matter in a
> simple SELECT count(*)?):

> SELECT (CASE WHEN reltuples > 0 THEN
> pg_relation_size('mytable')/(8192*relpages/reltuples)
> ELSE 0
> END)::bigint AS estimated_row_count
> FROM pg_class
> WHERE oid = 'mytable'::regclass;

This seems a bit dubious, as it's protecting against only one of two
possible zero-divide conditions, and ignoring the risk of integer
overflow of 8192*relpages.  It also seems rather inconvenient to have to
specify 'mytable' twice.  I'd try something like

select
  case when relpages > 0 then
    (pg_relation_size(oid)::float8 * reltuples / relpages / 8192)::bigint
  else 0::bigint end
from pg_class
  where oid = 'mytable'::regclass;

> If relpages & reltuples are recorded accurately each time VACUUM is run,
> wouldn't it be the same to just grab directly the value of reltuples like:
> SELECT reltuples FROM pg_class WHERE oid='mytable'::regclass;

Uh, no.  The whole point of the more complicated query is to scale the
reltuples/relpages tuple-density ratio up to the table's current
physical size, so that you get an estimate that is not too far off even
if vacuum hasn't been run lately.  It's also worth noting that in recent
PG versions, reltuples and relpages are themselves only moving-average
estimates, and so your premise is faulty anyway: they are *not*
necessarily exact even immediately after a vacuum.

            regards, tom lane