Thread: Re: [GENERAL] Yet Another (Simple) Case of Index not used

Re: [GENERAL] Yet Another (Simple) Case of Index not used

From

"Dann Corbit"

Date:

09 April 2003, 15:17:45

> -----Original Message-----
> From: Denis [mailto:denis@next2me.com]
> Sent: Tuesday, April 08, 2003 12:57 PM
> To: pgsql-performance@postgresql.org;
> pgsql-general@postgresql.org; pgsql-sql@postgresql.org
> Subject: [GENERAL] Yet Another (Simple) Case of Index not used
>
>
> Hi there,
> I'm running into a quite puzzling simple example where the
> index I've created on a fairly big table (465K entries) is
> not used, against all common sense expectations: The query I
> am trying to do (fast) is:
>
> select count(*) from addresses;
>
> This takes more than a second to complete, because, as the
> 'explain' command shows me, the index created on 'addresses'
> is not used, and a seq scan is being used.

As well it should be.

> One would assume
> that the creation of an index would allow the counting of the
> number of entries in a table to be instantaneous?

Traversing the index to perform the count will definitely make the query
many times slower.

A general rule of thumb (not sure if it is true with PostgreSQL) is that
if you have to traverse more than 10% of the data with an index then a
full table scan will be faster.  This is especially true when there is
highly redundant data in the index fields.  If there were an index on a
bit column, with the values split evenly between 1 and 0, an index scan
of the table would be disastrous.

To simply scan the table, we will just sequentially read pages until the
data is exhausted.  If we follow the index, we will randomly jump from
page to page, defeating the read buffering.
[snip]
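
The sequential-versus-random argument above can be put into rough numbers. What follows is only a toy cost model, not PostgreSQL's actual planner arithmetic: the function names are hypothetical, 50 rows per page is an assumed figure, and the 4:1 cost ratio is borrowed from PostgreSQL's default random_page_cost relative to a sequential page read. It also assumes the worst case, where every row fetched through the index lands on a different heap page and nothing is cached.

```python
import math

def seq_scan_cost(n_rows, rows_per_page, seq_page_cost=1.0):
    """Cost of reading every heap page once, in order."""
    pages = math.ceil(n_rows / rows_per_page)
    return pages * seq_page_cost

def index_scan_cost(n_rows, fraction, random_page_cost=4.0):
    """Pessimistic cost of fetching `fraction` of the rows via an index:
    one random heap page read per matching row, index pages ignored."""
    matching_rows = math.ceil(n_rows * fraction)
    return matching_rows * random_page_cost

n_rows = 465_000        # table size from the original question
rows_per_page = 50      # assumption: depends on row width and page size

full_scan = seq_scan_cost(n_rows, rows_per_page)
for fraction in (0.001, 0.01, 0.1, 0.5, 1.0):
    via_index = index_scan_cost(n_rows, fraction)
    winner = "index" if via_index < full_scan else "seq scan"
    print(f"{fraction:>6.1%} of rows: index={via_index:>10.0f} "
          f"seq={full_scan:>8.0f} -> {winner}")
```

Under these deliberately pessimistic assumptions the break-even point falls well below 10%; caching and physical ordering of the rows move it upward, which is why the figure is a rule of thumb and not a constant. For count(*) the fraction is 100%, so the sequential scan wins by a wide margin in this model.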

Re: [GENERAL] Yet Another (Simple) Case of Index not used

From

Dennis Gearon

Date:

09 April 2003, 15:17:57

from mysql manual:
-------------------------------------------------------------
"COUNT(*) is optimized to return very quickly if the SELECT retrieves from one
table, no other columns are retrieved, and there is no WHERE clause. For example:

mysql> select COUNT(*) from student;"
-------------------------------------------------------------

A nice little optimization, maybe not possible in a MVCC system.

Dann Corbit wrote:
>>-----Original Message-----
>>From: Denis [mailto:denis@next2me.com]
>>Sent: Tuesday, April 08, 2003 12:57 PM
>>To: pgsql-performance@postgresql.org;
>>pgsql-general@postgresql.org; pgsql-sql@postgresql.org
>>Subject: [GENERAL] Yet Another (Simple) Case of Index not used
>>
>>
>>Hi there,
>>I'm running into a quite puzzling simple example where the
>>index I've created on a fairly big table (465K entries) is
>>not used, against all common sense expectations: The query I
>>am trying to do (fast) is:
>>
>>select count(*) from addresses;
>>
>>This takes more than a second to complete, because, as the
>>'explain' command shows me, the index created on 'addresses'
>>is not used, and a seq scan is being used.
>
>
> As well it should be.
>
>
>>One would assume
>>that the creation of an index would allow the counting of the
>>number of entries in a table to be instantaneous?
>
>
> Traversing the index to perform the count will definitely make the query
> many times slower.
>
> A general rule of thumb (not sure if it is true with PostgreSQL) is that
> if you have to traverse more than 10% of the data with an index then a
> full table scan will be faster.  This is especially true when there is
> highly redundant data in the index fields.  If there were an index on
> bit data type, and you have half and half 1 and 0, an index scan of the
> table will be disastrous.
>
> To simply scan the table, we will just sequentially read pages until the
> data is exhausted.  If we follow the index, we will randomly jump from
> page to page, defeating the read buffering.
> [snip]
>

Re: [PERFORM] [GENERAL] Yet Another (Simple) Case of Index not used

From

Bruce Momjian

Date:

15 April 2003, 13:23:45

Dennis Gearon wrote:
> from mysql manual:
> -------------------------------------------------------------
> "COUNT(*) is optimized to return very quickly if the SELECT retrieves from one
> table, no other columns are retrieved, and there is no WHERE clause. For example:
>
> mysql> select COUNT(*) from student;"
> -------------------------------------------------------------
>
> A nice little optimization, maybe not possible in a MVCC system.

I think the only thing you can do with MVCC is to cache the value and
transaction id for "SELECT AGG(*) FROM tab" and make the cached value
visible to transaction ids greater than the one that executed the
query, and invalidate the cache every time the table is modified.

In fact, don't clear the cache, just record the transaction id of the
table modification command so we can use standard visibility routines to
make the cache usable as long as possible.

The cleanest way would probably be to create an aggregate cache system
table, and to insert into it when someone does an unqualified aggregate,
and to delete from it when someone modifies the table --- the MVCC tuple
visibility rules are handled automatically.  Queries can look in there
to see if a visible cached value already exists. Of course, the big
question is whether this would be a big win, and whether the cost of
upkeep would justify it.
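
To make the idea concrete, here is a small Python simulation of such a cache. It is a sketch only, not PostgreSQL code: the class and method names are hypothetical, and visibility is reduced to a single rule, namely that transaction ids are integers, every transaction commits, and a snapshot sees exactly the transactions whose id is not greater than its own. Real visibility checks also have to deal with in-progress and aborted transactions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class CachedAggregate:
    value: int
    xmin: int                    # xid of the transaction that computed the value
    xmax: Optional[int] = None   # xid of the first later modification, if any

    def visible_to(self, snapshot_xid: int) -> bool:
        # Simplified MVCC rule: the entry must have been created at or before
        # the snapshot, and must not have been invalidated at or before it.
        created = self.xmin <= snapshot_xid
        still_valid = self.xmax is None or self.xmax > snapshot_xid
        return created and still_valid

class AggregateCache:
    """Toy stand-in for the proposed aggregate cache system table."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[CachedAggregate]] = {}

    def store(self, table: str, value: int, xid: int) -> None:
        # Corresponds to inserting a row after an unqualified aggregate.
        self._rows.setdefault(table, []).append(CachedAggregate(value, xid))

    def note_modification(self, table: str, xid: int) -> None:
        # Do not clear the cache; just stamp live entries with the modifier's xid.
        for row in self._rows.get(table, []):
            if row.xmax is None:
                row.xmax = xid

    def lookup(self, table: str, snapshot_xid: int) -> Optional[int]:
        for row in self._rows.get(table, []):
            if row.visible_to(snapshot_xid):
                return row.value
        return None  # caller falls back to a real scan

cache = AggregateCache()
cache.store("addresses", 465_000, xid=100)    # count computed by xid 100
cache.note_modification("addresses", xid=110) # table modified by xid 110

older_reader = cache.lookup("addresses", snapshot_xid=105)
newer_reader = cache.lookup("addresses", snapshot_xid=120)
early_reader = cache.lookup("addresses", snapshot_xid=90)
```

A snapshot taken between the count and the modification can still use the cached value, while later snapshots (and ones older than the cached count) get nothing back and must scan. The upkeep cost mentioned above shows up as the work done in note_modification on every write.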

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [PERFORM] [GENERAL] Yet Another (Simple) Case of Index not used

From

Richard Huxton

Date:

15 April 2003, 15:29:42

On Tuesday 15 Apr 2003 3:23 pm, Bruce Momjian wrote:
> Dennis Gearon wrote:
> > from mysql manual:
> > -------------------------------------------------------------
> > "COUNT(*) is optimized to return very quickly if the SELECT retrieves
> > from one table, no other columns are retrieved, and there is no WHERE
> > clause. For example:
> >
> > mysql> select COUNT(*) from student;"
> > -------------------------------------------------------------

> The cleanest way would probably be to create an aggregate cache system
> table, and to insert into it when someone does an unqualified aggregate,
> and to delete from it when someone modifies the table --- the MVCC tuple
> visibility rules are handled automatically.  Queries can look in there
> to see if a visible cached value already exists. Of course, the big
> question is whether this would be a big win, and whether the cost of
> upkeep would justify it.

If the rule system could handle something like:

CREATE RULE quick_foo_count AS ON SELECT count(*) FROM foo
DO INSTEAD
SELECT quick_count FROM agg_cache WHERE tbl_name='foo';

The whole thing could be handled by user-space triggers/rules and still
invisible to the end-user.
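
The trigger half of this can be tried out today. The sketch below uses Python's sqlite3 module with an in-memory database purely as a convenient stand-in; PostgreSQL would need a trigger function and different CREATE TRIGGER syntax. The table and column names agg_cache, quick_count, tbl_name and foo come from the example above, while the trigger names and the columns of foo are made up. The ON SELECT rule itself is not shown, since it is only hypothetical here, so the reader queries agg_cache directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT);

-- One row per counted table, kept current by the triggers below.
CREATE TABLE agg_cache (
    tbl_name    TEXT PRIMARY KEY,
    quick_count INTEGER NOT NULL
);
INSERT INTO agg_cache VALUES ('foo', 0);

CREATE TRIGGER foo_count_ins AFTER INSERT ON foo
BEGIN
    UPDATE agg_cache SET quick_count = quick_count + 1
    WHERE tbl_name = 'foo';
END;

CREATE TRIGGER foo_count_del AFTER DELETE ON foo
BEGIN
    UPDATE agg_cache SET quick_count = quick_count - 1
    WHERE tbl_name = 'foo';
END;
""")

# Three inserts and one delete; each fires the matching trigger per row.
conn.executemany("INSERT INTO foo (name) VALUES (?)", [("a",), ("b",), ("c",)])
conn.execute("DELETE FROM foo WHERE name = 'b'")

quick = conn.execute(
    "SELECT quick_count FROM agg_cache WHERE tbl_name = 'foo'"
).fetchone()[0]
exact = conn.execute("SELECT count(*) FROM foo").fetchone()[0]
print(quick, exact)
```

The cached count and the real count agree after the inserts and the delete. The price is an extra UPDATE on every write to foo, and on a busy system the single counter row is something every writer has to update.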

--
  Richard Huxton

Re: [PERFORM] [GENERAL] Yet Another (Simple) Case of Index not used

From

Bruce Momjian

Date:

31 May 2003, 01:32:18

Added to TODO:

    * Consider using MVCC to cache count(*) queries with no WHERE
      clause

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Dennis Gearon wrote:
> > from mysql manual:
> > -------------------------------------------------------------
> > "COUNT(*) is optimized to return very quickly if the SELECT retrieves from one
> > table, no other columns are retrieved, and there is no WHERE clause. For example:
> >
> > mysql> select COUNT(*) from student;"
> > -------------------------------------------------------------
> >
> > A nice little optimization, maybe not possible in a MVCC system.
>
> I think the only thing you can do with MVCC is to cache the value and
> transaction id for "SELECT AGG(*) FROM tab" and make the cached value
> visible to transaction ids greater than the one that executed the
> query, and invalidate the cache every time the table is modified.
>
> In fact, don't clear the cache, just record the transaction id of the
> table modification command so we can use standard visibility routines to
> make the cache usable as long as possible.
>
> The cleanest way would probably be to create an aggregate cache system
> table, and to insert into it when someone does an unqualified aggregate,
> and to delete from it when someone modifies the table --- the MVCC tuple
> visibility rules are handled automatically.  Queries can look in there
> to see if a visible cached value already exists. Of course, the big
> question is whether this would be a big win, and whether the cost of
> upkeep would justify it.
