Thread: does "select count(*) from mytable" always do a seq scan?

does "select count(*) from mytable" always do a seq scan?

From
Culley Harrelson
Date:
Hi,

I am using Postgresql 7.4.  I have a table with 1.5 million rows.  It
has a primary key. VACUUM FULL ANALYZE is run every night.  There are
2000-5000 inserts on this table every day but very few updates and
deletes.  When I select count(*) from this table it is using a
sequence scan.  Is this just life or is there some way to get this to
do an index scan?

culley

Re: does "select count(*) from mytable" always do a seq

From
Tino Wildenhain
Date:
Am Freitag, den 07.01.2005, 06:45 -0800 schrieb Culley Harrelson:
> Hi,
>
> I am using Postgresql 7.4.  I have a table with 1.5 million rows.  It
> has a primary key. VACUUM FULL ANALYZE is run every night.  There are
> 2000-5000 inserts on this table every day but very few updates and
> deletes.  When I select count(*) from this table it is using a
> sequence scan.  Is this just life or is there some way to get this to
> do an index scan?

How do you think an index would help if you do an unconditional
count(*)?

Regards
Tino


Re: does "select count(*) from mytable" always do a seq

From
Bruno Wolff III
Date:
On Fri, Jan 07, 2005 at 16:17:16 +0100,
  Tino Wildenhain <tino@wildenhain.de> wrote:
> Am Freitag, den 07.01.2005, 06:45 -0800 schrieb Culley Harrelson:
> > Hi,
> >
> > I am using Postgresql 7.4.  I have a table with 1.5 million rows.  It
> > has a primary key. VACUUM FULL ANALYZE is run every night.  There are
> > 2000-5000 inserts on this table every day but very few updates and
> > deletes.  When I select count(*) from this table it is using a
> > sequence scan.  Is this just life or is there some way to get this to
> > do an index scan?
>
> How do you think an index would help if you do an unconditional
> count(*)?

Some systems can just run through the index without having to access the
tuples. This can result in you having to read significantly fewer disk blocks
to get the count. Unfortunately, postgres still needs to check visibility
for each tuple and so an using index scan for count will be slower than
a sequential scan if a significant fraction of the table is being counted.

If an approximate answer is OK there is some information calculated when
you vacuum a table and you could query this value in the pg catalog.
I don't remember the name of what you want, but this should be in the
archives.

Another solution is to use a trigger to keep a count in another table.
from what you say above, this might be a practical solution for you.
Doing this has also been discussed in the archives.

Re: does "select count(*) from mytable" always do a seq scan?

From
Culley Harrelson
Date:
On Fri, 07 Jan 2005 16:17:16 +0100, Tino Wildenhain <tino@wildenhain.de> wrote:
>
> How do you think an index would help if you do an unconditional
> count(*)?

I really don't know <grin>.  I don't know the inner workings of
database internals but I would guess that there would be some
optimized way of counting the nodes in an index tree that would be
faster than sequentially going through a table.... I suppose there is
no free lunch.

One row, two rows, three rows, four rows, five rows.... <snore>

culley

Re: does "select count(*) from mytable" always do a seq scan?

From
Alex Turner
Date:
This is interesting... Perhaps a more knowledgable person for pgsql
could help us here...

I seem to remember something to do with the fact that You can't use
aggregate functions over an index...  I'm not sure why though.

You can do:
create index foo on my_table (lower(my_column))

but not
create index foo on my_table(min(my_column)) - I guess it wouldn't be
much of an index - it would be a single value.
You could reproduce that functionality with a trigger that updated a
table that contained the value of (min(my_column)), and I guess you
could do the same fo count(*) too.

I guess what I"m really asking is why can't you run aggregates over an index?

Alex Turner
NetEconomist

On Fri, 7 Jan 2005 09:09:49 -0800, Culley Harrelson <harrelson@gmail.com> wrote:
> On Fri, 07 Jan 2005 16:17:16 +0100, Tino Wildenhain <tino@wildenhain.de> wrote:
> >
> > How do you think an index would help if you do an unconditional
> > count(*)?
>
> I really don't know <grin>.  I don't know the inner workings of
> database internals but I would guess that there would be some
> optimized way of counting the nodes in an index tree that would be
> faster than sequentially going through a table.... I suppose there is
> no free lunch.
>
> One row, two rows, three rows, four rows, five rows.... <snore>
>
> culley
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

Re: does "select count(*) from mytable" always do a seq

From
Scott Ribe
Date:
> I guess what I"m really asking is why can't you run aggregates over an index?

It's got to do with MVCC and transaction consistency. Running count(*) or an
aggregate function on an index could include records that should not be
visible to your current transaction.


--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 665-7007 voice



Re: does "select count(*) from mytable" always do a seq

From
Alex Turner
Date:
No offense or anything, but that doesn't make any sense.  If you are
running count(*) against a table, it still has to worry about MVCC,
and which rows are visible to your transaction.  What difference does
it make, table or index, the system still has to figure out which rows
are visible in the current transaction, so why not use the index?

(The example is really count(pkey) because count(*) is always going to
do a seq scan I reckon - and could probably never use an index).

Alex Turner
NetEconomist


On Fri, 07 Jan 2005 11:17:32 -0700, Scott Ribe
<scott_ribe@killerbytes.com> wrote:
> > I guess what I"m really asking is why can't you run aggregates over an index?
>
> It's got to do with MVCC and transaction consistency. Running count(*) or an
> aggregate function on an index could include records that should not be
> visible to your current transaction.
>
> --
> Scott Ribe
> scott_ribe@killerbytes.com
> http://www.killerbytes.com/
> (303) 665-7007 voice
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>

Re: does "select count(*) from mytable" always do a seq

From
Scott Ribe
Date:
> No offense or anything, but that doesn't make any sense.  If you are
> running count(*) against a table, it still has to worry about MVCC,
> and which rows are visible to your transaction.  What difference does
> it make, table or index, the system still has to figure out which rows
> are visible in the current transaction, so why not use the index?

Your mistake seems to be assuming that row visibility is tracked in the
index. As was stated earlier in the thread, row visibility information is
not available in the index, therefore rows have to be looked at to determine
whether they're visible. What this means is that using the index would only
add an additional unnecessary step.

> (The example is really count(pkey) because count(*) is always going to
> do a seq scan I reckon - and could probably never use an index).

No, if there is an index on a column that is required, such as a primary
key, then count(pkey) is equal to count(*). Many databases make use of this
fact to optimize performance of count(*) by using an index scan.


--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 665-7007 voice



Re: does "select count(*) from mytable" always do a seq

From
Pierre-Frédéric Caillaud
Date:
> (The example is really count(pkey) because count(*) is always going to
> do a seq scan I reckon - and could probably never use an index).

    postgres knows that count(*) is just "count the rows", you can use
count(1), it makes no difference...

>
> Alex Turner
> NetEconomist
>
>
> On Fri, 07 Jan 2005 11:17:32 -0700, Scott Ribe
> <scott_ribe@killerbytes.com> wrote:
>> > I guess what I"m really asking is why can't you run aggregates over
>> an index?
>>
>> It's got to do with MVCC and transaction consistency. Running count(*)
>> or an
>> aggregate function on an index could include records that should not be
>> visible to your current transaction.
>>
>> --
>> Scott Ribe
>> scott_ribe@killerbytes.com
>> http://www.killerbytes.com/
>> (303) 665-7007 voice
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 8: explain analyze is your friend
>>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>



Re: does "select count(*) from mytable" always do a seq

From
Alex Turner
Date:
Forgive my ignorance, but I'm still learning about much of this stuff.
 If you perform:

select an_id, int_value from my_table where int_value>400;

The table has an index on int_value and there are enough rows to
warrant using it.  Doesn't the database perform in index scan on
int_value followed by a retrieve for the datablocks with relavent oids
to get the an_id field?

If another transaction has inserted rows into this table, won't the
index have been updated, and contain new row references?  Does this
imply that the database must retrieve the row information to determine
if the row is a row from a different transaction?

thanks,

Alex Turner


On Sat, 08 Jan 2005 12:39:41 -0700, Scott Ribe
<scott_ribe@killerbytes.com> wrote:
> > No offense or anything, but that doesn't make any sense.  If you are
> > running count(*) against a table, it still has to worry about MVCC,
> > and which rows are visible to your transaction.  What difference does
> > it make, table or index, the system still has to figure out which rows
> > are visible in the current transaction, so why not use the index?
>
> Your mistake seems to be assuming that row visibility is tracked in the
> index. As was stated earlier in the thread, row visibility information is
> not available in the index, therefore rows have to be looked at to determine
> whether they're visible. What this means is that using the index would only
> add an additional unnecessary step.
>
> > (The example is really count(pkey) because count(*) is always going to
> > do a seq scan I reckon - and could probably never use an index).
>
> No, if there is an index on a column that is required, such as a primary
> key, then count(pkey) is equal to count(*). Many databases make use of this
> fact to optimize performance of count(*) by using an index scan.
>
> --
> Scott Ribe
> scott_ribe@killerbytes.com
> http://www.killerbytes.com/
> (303) 665-7007 voice
>
>

Re: does "select count(*) from mytable" always do a seq

From
Bruno Wolff III
Date:
On Mon, Jan 10, 2005 at 10:26:46 -0500,
  Alex Turner <armtuk@gmail.com> wrote:
> Forgive my ignorance, but I'm still learning about much of this stuff.
>  If you perform:
>
> select an_id, int_value from my_table where int_value>400;
>
> The table has an index on int_value and there are enough rows to
> warrant using it.  Doesn't the database perform in index scan on
> int_value followed by a retrieve for the datablocks with relavent oids
> to get the an_id field?

I don't think that oids are used in the process, but if the planner thinks
an index scan would be better it will use one.

> If another transaction has inserted rows into this table, won't the
> index have been updated, and contain new row references?  Does this
> imply that the database must retrieve the row information to determine
> if the row is a row from a different transaction?

When doing an index scan, the heap tuples still need to be checked for
visibility to the current transaction.

Re: does "select count(*) from mytable" always do a seq

From
Alex Turner
Date:
I'm no database writing guru, but wouldn't it just be a matter of
adding a transaction number to an index entry so as to determine it's
newness and only retrieve entries with an older transaction number?

I'm guessing that the theory is that most insert transactions will be
committed, or only contain a small number of rows relative to the
overall size of the table, and therefore the extra overhead of
checking newer tuples won't impact the overall performance that much?

I know I'm asking kind of deep questions that really don't affect much
of anything, but I'm a devilishly curious individual, and I like
understanding things that I use well.  Feel free to tell me that it's
irrelavant, or that I'm full of hot air and I don't have a good
question ;)

Alex Turner
NetEconomist


On Mon, 10 Jan 2005 10:34:46 -0600, Bruno Wolff III <bruno@wolff.to> wrote:
> On Mon, Jan 10, 2005 at 10:26:46 -0500,
>   Alex Turner <armtuk@gmail.com> wrote:
> > Forgive my ignorance, but I'm still learning about much of this stuff.
> >  If you perform:
> >
> > select an_id, int_value from my_table where int_value>400;
> >
> > The table has an index on int_value and there are enough rows to
> > warrant using it.  Doesn't the database perform in index scan on
> > int_value followed by a retrieve for the datablocks with relavent oids
> > to get the an_id field?
>
> I don't think that oids are used in the process, but if the planner thinks
> an index scan would be better it will use one.
>
> > If another transaction has inserted rows into this table, won't the
> > index have been updated, and contain new row references?  Does this
> > imply that the database must retrieve the row information to determine
> > if the row is a row from a different transaction?
>
> When doing an index scan, the heap tuples still need to be checked for
> visibility to the current transaction.
>
>

Re: does "select count(*) from mytable" always do a seq

From
Alvaro Herrera
Date:
On Mon, Jan 10, 2005 at 11:51:51AM -0500, Alex Turner wrote:
> I'm no database writing guru, but wouldn't it just be a matter of
> adding a transaction number to an index entry so as to determine it's
> newness and only retrieve entries with an older transaction number?

No, it's more complex than that.  Index entries would have to be labeled
with both a creation transaction Id and a destruction transaction Id
(xmin and xmax.  Probably it'd also need Cmin and Cmax to be completely
consistent.)  Keeping them in sync would be prone to deadlock because
it'd have to simultaneously update the table proper and the possibly
multiple indexes.  Plus, having all those identifiers in the index file
would imply more I/O costs.

> I'm guessing that the theory is that most insert transactions will be
> committed, or only contain a small number of rows relative to the
> overall size of the table, and therefore the extra overhead of
> checking newer tuples won't impact the overall performance that much?

The choice is yours (or whoever's): if you absolutely need exact
numbers, you pay the cost of having a trigger.  OTOH if you can do with
estimates, you can use the reltuples column from pg_class.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"There is evil in the world. There are dark, awful things. Occasionally, we get
a glimpse of them. But there are dark corners; horrors almost impossible to
imagine... even in our worst nightmares." (Van Helsing, Dracula A.D. 1972)

Re: does "select count(*) from mytable" always do a seq

From
Bruno Wolff III
Date:
On Mon, Jan 10, 2005 at 11:51:51 -0500,
  Alex Turner <armtuk@gmail.com> wrote:
> I'm no database writing guru, but wouldn't it just be a matter of
> adding a transaction number to an index entry so as to determine it's
> newness and only retrieve entries with an older transaction number?

No, because transactions don't complete in the order they are started.
So you will need to be able to maintain some kind of list to cover
exceptions.

> I'm guessing that the theory is that most insert transactions will be
> committed, or only contain a small number of rows relative to the
> overall size of the table, and therefore the extra overhead of
> checking newer tuples won't impact the overall performance that much?
>
> I know I'm asking kind of deep questions that really don't affect much
> of anything, but I'm a devilishly curious individual, and I like
> understanding things that I use well.  Feel free to tell me that it's
> irrelavant, or that I'm full of hot air and I don't have a good
> question ;)

There have been discussions in the past about why the core developers
feel that moving visibility status into indexes would be a net loss
on average. I don't think there has been one for a while, but you can
try searching the hackers archive.

Re: does "select count(*) from mytable" always do a seq

From
Michael Fuhr
Date:
On Mon, Jan 10, 2005 at 11:23:42AM -0600, Bruno Wolff III wrote:

> There have been discussions in the past about why the core developers
> feel that moving visibility status into indexes would be a net loss
> on average. I don't think there has been one for a while, but you can
> try searching the hackers archive.

I wonder if a technical discussion or a link to one could find its
way into the FAQ.  Has anybody ever put together a single document
that describes in detail the issue, various proposed solutions and
their tradeoffs, and why the developers think the chosen implementation
is best?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/