Thread: Use of index in 7.0 vs 6.5

Use of index in 7.0 vs 6.5

From
Ryan Bradetich
Date:
Tom (Or anyone else who is good with PostgreSQL statistics),

I am in the process of transitioning from PostgreSQL 6.5.3 to
PostgreSQL 7.0.  I ran into an issue where a sequential scan
is being chosen on PostgreSQL 7.0 where an index scan was
chosen on PostgreSQL 6.5.3.

Note: All tables have been freshly vacuumed and analyzed.
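
For reference, the refresh amounts to (more or less):

procman=# vacuum analyze;
VACUUM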

procman=# select version();
                              version
-------------------------------------------------------------------
 PostgreSQL 7.0.0 on hppa2.0w-hp-hpux11.00, compiled by gcc 2.95.2
(1 row)

procman=# explain select count(catagory) from medusa where host_id = 404
and catagory like 'A%';
NOTICE:  QUERY PLAN:
Aggregate  (cost=189546.19..189546.19 rows=1 width=12)
  ->  Seq Scan on medusa  (cost=0.00..189529.43 rows=6704 width=12)
EXPLAIN

Note: The above query produces an index scan on PostgreSQL 6.5.3.

procman=# set enable_seqscan = off;
SET VARIABLE

procman=# explain select count(catagory) from medusa where host_id = 404
and catagory like 'A%';
NOTICE:  QUERY PLAN:
Aggregate  (cost=207347.36..207347.36 rows=1 width=12)
  ->  Index Scan using medusa_host_id_key on medusa
        (cost=0.00..207330.60 rows=6704 width=12)
EXPLAIN

Here are the statistics:

procman=# select attname,attdisbursion,s.*
procman-# from pg_statistic s, pg_attribute a, pg_class c
procman-# where starelid = c.oid and attrelid = c.oid and staattnum = attnum
procman-# and relname = 'medusa';
 attname | attdisbursion | starelid | staattnum | staop | stanullfrac | stacommonfrac | stacommonval | staloval | stahival
---------+---------------+----------+-----------+-------+-------------+---------------+--------------+----------+----------
 host_id |    0.00621312 | 30874288 |         1 |    97 |           0 |     0.0279425 | 446          | 0        | 11011
(1 row)


Here is my analysis of the statistics (based on the examples in the
archive).

The most common host_id value in the table is 446, with a row fraction
of ~2.8%.  The estimated number of rows from the index scan is 6704.
The table has 4,630,229 entries.

Hopefully this analysis is correct; if not, please correct me :)

I do not understand why the planner would choose a seqscan over the
index scan, since 6704/4,630,229 is only ~0.15%.
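
For the record, the arithmetic behind that figure:

procman=# select 6704.0 / 4630229;   -- ~0.00145, i.e. roughly 0.15%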

Thanks for your time,

Ryan




Re: Use of index in 7.0 vs 6.5

From
Tom Lane
Date:
Ryan Bradetich <ryan_bradetich@hp.com> writes:
> I am in the process of transitioning from PostgreSQL 6.5.3 to
> PostgreSQL 7.0.  I ran into an issue where a sequential scan
> is being chosen on PostgreSQL 7.0 where an index scan was
> chosen on PostgreSQL 6.5.3.

Since you're complaining, I assume the seqscan is slower ;-).
But you didn't say how much slower --- what are the actual timings?

Basically what's going on here is that we need to tune the fudge-factor
constants in the cost model so that they have something to do with
reality on as wide a variety of systems as possible.  You did an
excellent job of showing the estimates the planner computed --- but
what we really need here is to see how those relate to reality.

> I do not understand why the planner would choose a seqscan over the
> index scan, since 6704/4,630,229 is only ~0.15%.

I'm a bit surprised too.  What is the average tuple width on this table?
(Actually, probably a better question is how many pages and tuples
are in the relation according to its pg_class entry.  Try "select * from
pg_class where relname = 'medusa'".)
        regards, tom lane


Re: Use of index in 7.0 vs 6.5

From
Ryan Bradetich
Date:
Tom Lane wrote:

> Ryan Bradetich <ryan_bradetich@hp.com> writes:
> > I am in the process of transitioning from PostgreSQL 6.5.3 to
> > PostgreSQL 7.0.  I ran into an issue where a sequential scan
> > is being chosen on PostgreSQL 7.0 where an index scan was
> > chosen on PostgreSQL 6.5.3.
>
> Since you're complaining, I assume the seqscan is slower ;-).
> But you didn't say how much slower --- what are the actual timings?

Oops... I had them written down, just forgot to include them in the email :)

with enable_seqscan = on:    real 18.05   sys 0.01   user 0.02

with enable_seqscan = off:   real  0.08   sys 0.01   user 0.02

I stopped and restarted the postmaster daemon between these timings to
flush the cache.
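
The timings were taken with the shell's time command, along these lines
(an illustrative invocation, not the exact command line):

time psql procman -c "select count(catagory) from medusa \
    where host_id = 404 and catagory like 'A%';"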


> Basically what's going on here is that we need to tune the fudge-factor
> constants in the cost model so that they have something to do with
> reality on as wide a variety of systems as possible.  You did an
> excellent job of showing the estimates the planner computed --- but
> what we really need here is to see how those relate to reality.
>
> > I do not understand why the planner would choose a seqscan over the
> > index scan, since 6704/4,630,229 is only ~0.15%.
>
> I'm a bit surprised too.  What is the average tuple width on this table?
> (Actually, probably a better question is how many pages and tuples
> are in the relation according to its pg_class entry.  Try "select * from
> pg_class where relname = 'medusa'".)
>
>                         regards, tom lane

procman=# select * from pg_class where relname = 'medusa';
 relname | reltype | relowner | relam | relpages | reltuples | rellongrelid | relhasindex | relisshared | relkind | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhaspkey | relhasrules | relacl
---------+---------+----------+-------+----------+-----------+--------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+-------------+--------
 medusa  |       0 |    36000 |     0 |   120076 |   4630229 |            0 | t           | f           | r       |        6 |         0 |           0 |        0 |        0 |       0 | f          | f           |
(1 row)

procman=# \d medusa
         Table "medusa"
 Attribute |   Type    | Modifier
-----------+-----------+----------
 host_id   | integer   |
 timestamp | timestamp |
 current   | integer   |
 catagory  | text      |
 cat_desc  | text      |
 anomaly   | text      |

This table has two fairly large text fields, cat_desc and anomaly.
The catagory field is very short, in the format [ABC][0-9][0-9].

Thanks for the help,

- Ryan



Re: Use of index in 7.0 vs 6.5

From
Tom Lane
Date:
Ryan Bradetich <ryan_bradetich@hp.com> writes:
> procman=# explain select count(catagory) from medusa where host_id = 404
> and catagory like 'A%';

> Here is my analysis of the statistics (based on the examples in the
> archive).

> The most common host_id value in the table is 446, with a row fraction
> of ~2.8%.  The estimated number of rows from the index scan is 6704.
> The table has 4,630,229 entries.
> I do not understand why the planner would choose a seqscan over the
> index scan, since 6704/4,630,229 is only ~0.15%.

I see at least part of the answer.  The 'rows=' number in the EXPLAIN
output is the planner's estimate of the net number of rows out after
applying all available WHERE restrictions.  In this case we've got a
clause "host_id = 404" which can be used with the index on host_id,
and then we have another clause "catagory like 'A%'" which will be
applied on-the-fly to the tuples returned by the indexscan.  The
rows number tells you about the estimated number of rows out after
that second filter step.  However, the cost of the indexscan depends
mainly on the number of tuples that have to be fetched, and that is
determined by the selectivity of just the "host_id = 404" clause.

I made a dummy table with the schema you showed and then inserted
the statistics you reported into the system tables (who's afraid
of "update pg_class ..." ;-)).  If I didn't mess up, you should
be able to reproduce these EXPLAIN results:

set enable_seqscan = off;
explain select count(catagory) from medusa where host_id = 404;
NOTICE:  QUERY PLAN:

Aggregate  (cost=206943.69..206943.69 rows=1 width=12)
  ->  Index Scan using medusa_host_id_key on medusa
        (cost=0.00..206781.97 rows=64690 width=12)

set enable_seqscan = on;
explain select count(catagory) from medusa where host_id = 404;
NOTICE:  QUERY PLAN:

Aggregate  (cost=178115.59..178115.59 rows=1 width=12)
  ->  Seq Scan on medusa  (cost=0.00..177953.86 rows=64690 width=12)


This shows that the planner is actually estimating that the indexscan
will fetch about 64690 rows (of which it guesses only 6704 will remain
after applying the catagory clause, but that's not really relevant to
the cost estimate).  Since there are 120076 pages in the table, that
would mean pretty nearly one separate page fetch for each retrieved
tuple, if the matching tuples are randomly distributed --- and that
very probably *would* take more time than reading the whole table
sequentially.  So the planner's chain of logic holds up if all these
assumptions are correct.
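
As a back-of-envelope cross-check of those numbers (a sketch assuming
the 7.0 defaults cpu_tuple_cost = 0.01, cpu_operator_cost = 0.0025 and
random_page_cost = 4.0, not the exact cost functions):

-- seqscan: one unit per sequential page read, plus per-tuple CPU and
-- one operator evaluation per tuple for the "host_id = 404" clause
select 120076 + (0.01 + 0.0025) * 4630229;  -- 177953.86, matching the estimate

-- indexscan: ~64690 mostly-random tuple fetches; back-solving, the
-- estimate works out to an effective ~3.2 cost units per fetch
select 64690 * 3.2;                         -- ~207,008, near 206,781.97

-- and the row estimate itself is reltuples * stacommonfrac * the
-- hardwired 0.5 fudge factor:
select 4630229 * 0.0279425 * 0.5;           -- ~64690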

Since you find that in reality the indexscan method is very quick,
I'm guessing that there are actually fairly few tuples matching
host_id = 404.  Could you run a quick "select count(*)" to check?

This seems to point up (once again) the deficiency of assuming that
the most-common value in the table is a good guide to the frequency
of typical values.  You showed that host_id = 446 occurs in 2.8% of
the rows in this table; a search for 446 very probably would be faster
as a seqscan than as an indexscan (you might care to try it and see).
But that's probably a statistical outlier that's not got much to do
with the frequency of typical values in the table.

The only really good answer to this problem is to collect more-detailed
statistics in VACUUM ANALYZE, which I hope to see us doing in a release
or so.  In the meantime I am wondering about deliberately skewing the
cost model in favor of indexscans, because I sure haven't heard many
complaints about erroneous selection of indexscans...

One way to put a thumb on the scales is to reduce the value of the SET
variable random_page_cost.  The default value is 4.0, which seems to
correspond more or less to reality, but reducing it to 3 or so would
shift the planner pretty nicely in the direction of indexscans.
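
For instance, just

set random_page_cost = 3;

at the top of a session or script.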
        regards, tom lane


RE: Use of index in 7.0 vs 6.5

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: pgsql-sql-owner@hub.org [mailto:pgsql-sql-owner@hub.org]On Behalf
> Of Tom Lane
> 

[snip]
> 
> This seems to point up (once again) the deficiency of assuming that
> the most-common value in the table is a good guide to the frequency
> of typical values.  You showed that host_id = 446 occurs in 2.8% of
> the rows in this table; a search for 446 very probably would be faster
> as a seqscan than as an indexscan (you might care to try it and see).
> But that's probably a statistical outlier that's not got much to do
> with the frequency of typical values in the table.
> 
> The only really good answer to this problem is to collect more-detailed
> statistics in VACUUM ANALYZE, which I hope to see us doing in a release
> or so.

For example, we could count up the distinct values for the first column
of an index by scanning its index relation.
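
Conceptually that statistic is just

select count(distinct host_id) from medusa;

though running it as a query means a full scan and sort; the point is
that walking the already-sorted index would get the same number much
more cheaply.
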
> In the meantime I am wondering about deliberately skewing the
> cost model in favor of indexscans, because I sure haven't heard many
> complaints about erroneous selection of indexscans...
> 
> One way to put a thumb on the scales is to reduce the value of the SET
> variable random_page_cost.  The default value is 4.0, which seems to
> correspond more or less to reality, but reducing it to 3 or so would
> shift the planner pretty nicely in the direction of indexscans.
>

Or how about changing the current fudge factor?
For example, from 0.5 to 0.2, which is the fudge factor used in the
attdisbursion calculation.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



RE: Use of index in 7.0 vs 6.5

From
"Mikheev, Vadim"
Date:
> For example, we could count up the distinct values for the first
> column of an index by scanning its index relation.

Oracle has an index key distribution page... And we have a nearly empty
meta-page for each index AM... And the backend reads this page on
each insert...

Vadim


Re: Use of index in 7.0 vs 6.5

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> One way to put a thumb on the scales is to reduce the value of the SET
>> variable random_page_cost.  The default value is 4.0, which seems to
>> correspond more or less to reality, but reducing it to 3 or so would
>> shift the planner pretty nicely in the direction of indexscans.

> Or how about changing the current fudge factor?
> For example, from 0.5 to 0.2, which is the fudge factor used in the
> attdisbursion calculation.

Yes, that's another way --- and probably more defensible than changing
random_page_cost, now that I think about it.  Unfortunately it's a
hardwired constant and so not as easily experimented with :-(.
        regards, tom lane


Re: Use of index in 7.0 vs 6.5

From
Bruce Momjian
Date:
> > For example, we could count up the distinct values for the first
> > column of an index by scanning its index relation.
> 
> Oracle has an index key distribution page... And we have a nearly empty
> meta-page for each index AM... And the backend reads this page on
> each insert...

That certainly would make sense.  We have hesitated to gather more
statistics because of the time involved.  Fuller statistics on just the
indexed columns could be a big win and be done fairly quickly because
the rows are already sorted in the index.
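
For example, per-value frequencies of the sort

select host_id, count(*) from medusa group by host_id;

(just an illustration) would fall out of a single ordered walk over the
index, with no sort step.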

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: Use of index in 7.0 vs 6.5

From
Jeff Hoffmann
Date:
Tom Lane wrote:
> One way to put a thumb on the scales is to reduce the value of the SET
> variable random_page_cost.  The default value is 4.0, which seems to
> correspond more or less to reality, but reducing it to 3 or so would
> shift the planner pretty nicely in the direction of indexscans.
> 

As you may know, I've gone through a lot of things related to cost
estimates, and I've convinced myself that (on paper) they're pretty good
(other than selectivity estimates, of course).  This may be heavily
influenced by my choice of OS and the corresponding disk cache
management scheme, but it seems to me that the sequential scan cost is a
lot closer to reality because the cache doesn't come into play as much
as it does for index scans, which (probably) are heavily influenced by
the disk cache.  Unfortunately, you'd have to simulate some very high
load conditions to negate the disk cache effect and convince people that
the numbers are actually fairly correct.  Even though I like to think I
have very important and demanding, end-all, be-all applications and that
PostgreSQL should cater to my situation, I, like almost all other users,
operate in a very low load environment for the most part.  Therefore,
you may want to consider placing the defaults where they are more
advantageous to the casual user (or at least to the user who has a small
number of backends and is fairly dedicated to operating as a database
server).

Anyway, it seems to me that if you're going to fudge a number,
random_page_cost is probably the best place to do it, because it factors
so heavily into the total cost.  Fudging the selectivity can be
frustrating because you can fudge for a certain table distribution, but
who's to say the same distribution will hold for all tables?  Since the
selectivity doesn't have a lot of real-world relevance, what do you gain
in the general case by fudging it?  At least by adjusting
random_page_cost, you know that on paper the value of 4 is reasonably
close to reality and that you're factoring in the disk cache, which
should be fairly constant on your system (assuming a similar volume of
disk accesses).

On another topic, is there a list somewhere of the variables that can be
adjusted with a SET command?  Some of the variables have been invaluable
in learning about cost estimates, but I can't remember seeing them
documented anywhere -- I've just stumbled onto them, more or less.

Jeff


Re: Use of index in 7.0 vs 6.5

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> That certainly would make sense.  We have hesitated to gather more
> statistics because of the time involved.  Fuller statistics on just the
> indexed columns could be a big win and be done fairly quickly because
> the rows are already sorted in the index.

Yeah, a scan over just the index itself would be a perfect way to
gather stats.  The normal objection to it (can't tell whether entries
correspond to currently-valid tuples) doesn't apply, because we don't
really care whether the stats are perfectly accurate.

Should put this in TODO, along with something about splitting the
ANALYZE function out of VACUUM and making it invokable as a separate
statement.
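
That is, so one could eventually write just

ANALYZE medusa;

(hypothetical syntax for the split-out statement) without taking
VACUUM's locks.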
        regards, tom lane


Re: Use of index in 7.0 vs 6.5

From
Tom Lane
Date:
Jeff Hoffmann <jeff@propertykey.com> writes:
> on another topic, is there a list somewhere of the variables that you
> can adjust with a SET command?

SET command reference page is fairly complete, I think (but if you want
The Whole Truth And Nothing But, see src/backend/commands/variable.c).
        regards, tom lane


Re: Use of index in 7.0 vs 6.5

From
Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > That certainly would make sense.  We have hesitated to gather more
> > statistics because of the time involved.  Fuller statistics on just the
> > indexed columns could be a big win and be done fairly quickly because
> > the rows are already sorted in the index.
> 
> Yeah, a scan over just the index itself would be a perfect way to
> gather stats.  The normal objection to it (can't tell whether entries
> correspond to currently-valid tuples) doesn't apply, because we don't
> really care whether the stats are perfectly accurate.
> 
> Should put this in TODO, along with something about splitting the
> ANALYZE function out of VACUUM and making it invokable as a separate
> statement.

Added:

* Remove ANALYZE from VACUUM so it can be run separately without locks
* Gather more accurate statistics using indexes


--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


RE: Use of index in 7.0 vs 6.5

From
"Mikheev, Vadim"
Date:
> > Oracle has an index key distribution page... And we have a nearly empty
> > meta-page for each index AM... And the backend reads this page on
> > each insert...
> 
> That certainly would make sense.  We have hesitated to gather more
> statistics because of the time involved.  Fuller statistics 
> on just the
> indexed columns could be a big win and be done fairly quickly because
> the rows are already sorted in the index.

So, add a new item to TODO -:)
(I'm sure I've mentioned the index key distribution page before -
I'm getting tired of repeating the same thing -:))

Vadim


RE: Use of index in 7.0 vs 6.5

From
"Mikheev, Vadim"
Date:
> Yeah, a scan over just the index itself would be a perfect way to
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I believe that Oracle updates index statistics on the fly...
The meta-page is always in cache for inserts, so there would be no
additional reads.

> gather stats.  The normal objection to it (can't tell whether entries
> correspond to currently-valid tuples) doesn't apply, because we don't
> really care whether the stats are perfectly accurate.

Vadim


Re: Use of index in 7.0 vs 6.5

From
Ryan Bradetich
Date:
Tom Lane wrote:

> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> One way to put a thumb on the scales is to reduce the value of the SET
> >> variable random_page_cost.  The default value is 4.0, which seems to
> >> correspond more or less to reality, but reducing it to 3 or so would
> >> shift the planner pretty nicely in the direction of indexscans.

This worked great!  Is there a place I can change the default to 3?
I do not want to change all the scripts to include this :)

> > Or how about changing the current fudge factor?
> > For example, from 0.5 to 0.2, which is the fudge factor used in the
> > attdisbursion calculation.
>
> Yes, that's another way --- and probably more defensible than changing
> random_page_cost, now that I think about it.  Unfortunately it's a
> hardwired constant and so not as easily experimented with :-(.
>
>                         regards, tom lane

Can you give me more information about this?  I do not have a problem
re-compiling the database and performing more testing if you would like.


Tom,

To answer your question in a previous post:
> Since you find that in reality the indexscan method is very quick,
> I'm guessing that there are actually fairly few tuples matching
> host_id = 404.  Could you run a quick "select count(*)" to check?

procman=# select count(*) from medusa where host_id = 404;
 count
-------
   680
(1 row)

procman=# select count(catagory) from medusa where host_id = 404 and
catagory like 'A%';
 count
-------
     4
(1 row)
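
So the planner's 64690-row estimate for host_id = 404 was nearly 100x
too high:

procman=# select 680.0 / 4630229;   -- ~0.000147: a true selectivity of
                                    -- ~0.015%, vs. the ~1.4% assumed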


Thanks again everyone for all the help!  Now that I am finished with
school for the semester, I should have time to make contributions
again ... :)

Ryan





RE: Use of index in 7.0 vs 6.5

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
> 
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > That certainly would make sense.  We have hesitated to gather more
> > > statistics because of the time involved.  Fuller statistics 
> on just the
> > > indexed columns could be a big win and be done fairly quickly because
> > > the rows are already sorted in the index.
> > 
> > Yeah, a scan over just the index itself would be a perfect way to
> > gather stats.  The normal objection to it (can't tell whether entries
> > correspond to currently-valid tuples) doesn't apply, because we don't
> > really care whether the stats are perfectly accurate.
> > 
> > Should put this in TODO, along with something about splitting the
> > ANALYZE function out of VACUUM and making it invokable as a separate
> > statement.
> 
> Added:
> 
> * Remove ANALYZE from VACUUM so it can be run separately without locks
> * Gather more accurate statistics using indexes
>

Gathering statistics using indexes on the fly would be best.
However, VACUUM (without ANALYZE) already scans all indexes using
vc_scanoneind()/vc_vaconeind().  Isn't that available anyway?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: Use of index in 7.0 vs 6.5

From
Tom Lane
Date:
Ryan Bradetich <ryan_bradetich@hp.com> writes:
> This worked great!  Is there a place I can change the default to 3?
> I do not want to change all the scripts to include this :)

See src/include/optimizer/cost.h.  However, I am currently thinking of
taking Hiroshi's advice instead.  Lowering RANDOM_PAGE_COST seems like
a bad idea --- if anything, we might want to raise it ;-)

>>>> Or how about changing the current fudge factor?
>>>> For example, from 0.5 to 0.2, which is the fudge factor used in the
>>>> attdisbursion calculation.
>> 
>> Yes, that's another way --- and probably more defensible than changing
>> random_page_cost, now that I think about it.  Unfortunately it's a
>> hardwired constant and so not as easily experimented with :-(.

> Can you give me more information about this?  I do not have a problem
> re-compiling the database and performing more testing if you would like.

The fudge factor in question is currently 0.5, and is used in two places
in src/backend/utils/adt/selfuncs.c (looks like lines 193 and 212 in 7.0
sources).  I was thinking of dropping it to 0.25 or 0.1.
        regards, tom lane


Re: Use of index in 7.0 vs 6.5

From
Ryan Bradetich
Date:
Tom Lane wrote:

> Ryan Bradetich <ryan_bradetich@hp.com> writes:
> > This worked great!  Is there a place I can change the default to 3?
> > I do not want to change all the scripts to include this :)
>
> See src/include/optimizer/cost.h.  However, I am currently thinking of
> taking Hiroshi's advice instead.  Lowering RANDOM_PAGE_COST seems like
> a bad idea --- if anything, we might want to raise it ;-)
>
> >>>> Or how about changing the current fudge factor?
> >>>> For example, from 0.5 to 0.2, which is the fudge factor used in the
> >>>> attdisbursion calculation.
> >>
> >> Yes, that's another way --- and probably more defensible than changing
> >> random_page_cost, now that I think about it.  Unfortunately it's a
> >> hardwired constant and so not as easily experimented with :-(.
>
> > Can you give me more information about this?  I do not have a problem
> > re-compiling the database and performing more testing if you would like.
>
> The fudge factor in question is currently 0.5, and is used in two places
> in src/backend/utils/adt/selfuncs.c (looks like lines 193 and 212 in 7.0
> sources).  I was thinking of dropping it to 0.25 or 0.1.
>
>                         regards, tom lane

Tom and Hiroshi,

I modified the fudge factor in the backend to 0.1, and it has been
working great!  Thanks again for the suggestion, and I'll let you know
if we run into any problems.

Thanks again!

Ryan