Re: Bad query plan inside EXISTS clause - Mailing list pgsql-performance

From Tom Lane
Subject Re: Bad query plan inside EXISTS clause
Date
Msg-id 15947.1268235860@sss.pgh.pa.us
Whole thread Raw
In response to Bad query plan inside EXISTS clause  (Benoit Delbosc <bdelbosc@nuxeo.com>)
Responses Re: Bad query plan inside EXISTS clause
List pgsql-performance
Benoit Delbosc <bdelbosc@nuxeo.com> writes:
> I am trying to understand why inside an EXISTS clause the query planner
>   does not use the index:

I'm not sure this plan is as bad as all that.  The key point is that the
planner is expecting 52517 rows that match that users_md5 value (and the
true number is evidently 51446, so that estimate isn't far off).  That's
about 1/48th of the table.  It knows that the EXISTS case can stop as
soon as it finds one match, so it's betting that a plain seqscan will
hit a match faster than an index lookup would be able to, ie,
seqscanning about 48 tuples is faster than one index lookup.  This might
be a bad bet if the users_md5 values are correlated with physical order,
ie the matches are not randomly scattered but are all towards the end of
the table.  Barring that, though, it could be a good bet if the table
isn't swapped in.  Which is what the default cost parameters are set
up to assume.

I suspect your real complaint is that you expect the table to be swapped
in, in which case what you ought to be doing is adjusting the planner's
cost parameters.  Some playing around here with a similar case suggests
that even a small reduction in random_page_cost would make it prefer an
indexscan for this type of situation.

            regards, tom lane

pgsql-performance by date:

Previous
From: Benoit Delbosc
Date:
Subject: Re: Bad query plan inside EXISTS clause
Next
From: Harald Fuchs
Date:
Subject: Re: Strange workaround for slow query