Re: Yet another "Why won't PostgreSQL use my index?" - Mailing list pgsql-general

From Gregory Wood
Subject Re: Yet another "Why won't PostgreSQL use my index?"
Date
Msg-id 003f01c2188c$b40093e0$7889ffcc@comstock.com
Whole thread Raw
In response to Yet another "Why won't PostgreSQL use my index?"  ("Gregory Wood" <gregw@com-stock.com>)
Responses Re: Yet another "Why won't PostgreSQL use my index?"  (Manfred Koizar <mkoi-pg@aon.at>)
Re: Yet another "Why won't PostgreSQL use my index?"  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Yet another "Why won't PostgreSQL use my index?"  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
List pgsql-general
----- Original Message -----
From: "Manfred Koizar" <mkoi-pg@aon.at>
To: "Gregory Wood" <gregw@com-stock.com>
Cc: "PostgreSQL-General" <pgsql-general@postgresql.org>
Sent: Thursday, June 20, 2002 12:55 PM
Subject: Re: [GENERAL] Yet another "Why won't PostgreSQL use my index?"


> On Thu, 20 Jun 2002 12:05:53 -0400, "Gregory Wood"
> <gregw@com-stock.com> wrote:
> >cns=# explain analyze select * from re_site_listings_index where
> >idx_siteid=237;
> >NOTICE:  QUERY PLAN:
> >
> >Seq Scan on re_site_listings_index  (cost=0.00..41050.76 rows=12749
> >width=302) (actual time=158.57..2839.78 rows=354 loops=1)
> >Total runtime: 2841.60 msec
> >
> >EXPLAIN
> >cns=# set enable_seqscan=false;
> >SET VARIABLE
> >cns=# explain analyze select * from re_site_listings_index where
> >idx_siteid=237;
> >NOTICE:  QUERY PLAN:
> >
> >Index Scan using bill_idx_siteid on re_site_listings_index
> >(cost=0.00..48402.08 rows=12749 width=302) (actual time=0.30..3.95
rows=354
> >loops=1)
> >Total runtime: 5.76 msec
>
> Greg,
>
> apparently random_page_cost is set to the default value of 4.
> The planner assumes that the rows are scattered all over the table and
> that it has to do 12000 random page reads;  the total cost is
> calculated to be approx. 12000 * random_page_cost = 48000, which is
> more than the estimated 41000 for a seq scan.  So a seq scan looks
> cheaper.
>
> SET random_page_cost=3;
>
> and try again.  Experiment with other values, I guess you will see a
> change somewhere between 3.3 and 3.5.
>
> In fact the tuples seem to be close to each other, so several of them
> fit on the same page, but the planner does not know this.  I'm sorry,
> I don't know how to tell it.
>
> But as long as setting random_page_cost to a lower value helps, this
> should be ok.  The default value of 4 seems to be too high for many
> situations, anyway.
>
> Servus
>  Manfred

Very, very helpful information... thank you Manfred!

I'll have to play around with this setting a bit to find what values are
best; I'm just worried that I might set it too far in the other direction
and thereby screw things up.

I guess the best way to approach that particular tuning problem is to find a
query where the estimated row numbers is close to the actual page numbers
and then try different values until the random page reads start to become
slower than the sequential scan. Fun fun.

Of course if PostgreSQL were estimating the number of rows correctly, that
would be less of a problem. Seems that our data is throwing off the
statistics... we have some values that appear tens of thousands of times and
others that appear only a few times, with a few values (such as the example
I sent) in between. Perhaps it's time to look at TABLE SET STATISTICS...

Greg


pgsql-general by date:

Previous
From: "Elmshauser, Erik"
Date:
Subject: selecting all records where a column is null
Next
From: Andrew Sullivan
Date:
Subject: Re: database size