Re: Tell postgres which index to use? - Mailing list pgsql-performance

From Greg Stark
Subject Re: Tell postgres which index to use?
Date
Msg-id 873bw5e9vl.fsf@stark.xeocode.com
Whole thread Raw
In response to Re: Tell postgres which index to use?  (John Arbash Meinel <john@arbash-meinel.com>)
List pgsql-performance
John Arbash Meinel <john@arbash-meinel.com> writes:

> >    ->  Hash  (cost=1418.68..1418.68 rows=3226 width=4) (actual
> > time=77.062..77.062 rows=0 loops=1)
>
> This seems to be at least one of the problems. The planner thinks there
> are going to be 3000+ rows, but in reality there are 0.

No, that's a red herring. Hash nodes always report 0 rows.

> >  Nested Loop  (cost=0.00..23849.81 rows=7533 width=8) (actual time=0.341..198.162 rows=5798 loops=1)
> >    ->  Seq Scan on pdb_entry  (cost=0.00..1418.68 rows=3226 width=4) (actual time=0.145..78.177 rows=3329 loops=1)
> >          Filter: ((resolution > 0::double precision) AND (resolution < 1.7::double precision))
> >    ->  Index Scan using chain_pdb_id_ind on "chain"  (cost=0.00..6.87 rows=6 width=8) (actual time=0.021..0.027
rows=2loops=3329) 
> >          Index Cond: ("outer".id = "chain".pdb_id)

The actual number of records is pretty close to the estimated number. And the
difference seems to come primarily from selectivity of the join where it
thinks an average of 6 rows will match every row whereas in fact an average of
about 2 rows matches.

So it thinks it's going to read about 18,000 records out of 67,000 or about
25%. In that case the sequential scan is almost certainly better. In fact it's
going to read about 6,000 or just under 10%, in which case the sequential scan
is probably still better but it's not so clear.

I suspect the only reason you're seeing such a big difference when I would
expect it to be about even is because nearly all the data is cached. In that
case the non-sequential access pattern of the nested loop has little effect.

You might get away with lowering random_page_cost but since it thinks it's
going to read 25% of the table I suspect you'll have to get very close to 1
before it switches over, if it does even then. Be careful about tuning
settings like this based on a single query, especially to unrealistically low
values.

You might also want to try raising the statistics target on pdb_entry. See if
that makes the estimate go down from 6 to closer to 2.

--
greg

pgsql-performance by date:

Previous
From: "Paul Johnson"
Date:
Subject: Re: Solaris 9 tuning
Next
From: Greg Stark
Date:
Subject: Re: Performance Tuning