Re: BUG #13824: EXISTS sometimes uses seq scan instead of index - Mailing list pgsql-bugs

From Kevin Grittner
Subject Re: BUG #13824: EXISTS sometimes uses seq scan instead of index
Date
Msg-id CACjxUsNp4uEx3xsunw4wVpBDVomas7o6hnv_49bSbaz-HAVdyA@mail.gmail.com
Whole thread Raw
In response to Re: BUG #13824: EXISTS sometimes uses seq scan instead of index  (Grzegorz Garlewicz <grzegorz@thulium.pl>)
List pgsql-bugs
On Mon, Dec 21, 2015 at 9:06 AM, Grzegorz Garlewicz <grzegorz@thulium.pl> wrote:
> Could you please take a look at this one once again?
>
> On Fri, Dec 18, 2015 at 10:23 AM, Grzegorz Garlewicz <grzegorz@thulium.pl> wrote:
>>
>> I did just what you said - reduced random_page cost from 4 to 2
>> then 1 and then 0.5.

It never really makes sense to set random_page_cost less than
seq_page_cost; although setting them to equal values can make sense
if your data is heavily cached.

>> If I'm not mistaken, the issue seems to originate from the
>> planner's thinking it needs to look up all the rows for EXISTS
>> clause, not just a single one, so it thinks the cost would be
>> much bigger.

No, it does not think that.  It knows that an EXISTS test can stop
after finding a single matching row.

What it does think is that values are fairly evenly distributed
(i.e., if a value is 1% of a table you will only need to read about
100 rows before seeing one, rather than finding them all clumped at
the end of the table) and that there is no correlation between
values (i.e., any given id_status value is not more common for one
id_outbound value than another).  There is work in progress to try
to allow for correlated values, and ideas on how to deal with
uneven distribution.

One thing that might help, in addition to reducing random_page_cost
to be equal to or just above seq_page_cost, is to increase
cpu_tuple_cost to something like 0.03 to 0.05.  Benchmarking with
real-world applications I was running showed better plans chosen
with numbers in that range than with lower numbers.

Of course, if the statistics are not up-to-date it doesn't have
much chance of using accurate numbers and is likely to choose a bad
plan.  Make sure that autovacuum is tuned to be aggressive enough,
and if there are big changes to a table you may want to use an
explicit ANALYZE (or VACUUM ANALYZE) before running queries which
reference data in the table.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-bugs by date:

Previous
From: henrik.pauli@uhusystems.com
Date:
Subject: BUG #13829: Exponentiation operator is left-associative
Next
From: Tom Lane
Date:
Subject: Re: BUG #13829: Exponentiation operator is left-associative