Re: Bad query plan inside EXISTS clause - Mailing list pgsql-performance

From Benoit Delbosc
Subject Re: Bad query plan inside EXISTS clause
Date
Msg-id 4B97CFFA.6030000@nuxeo.com
Whole thread Raw
In response to Re: Bad query plan inside EXISTS clause  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Tom Lane a écrit :
> Benoit Delbosc <bdelbosc@nuxeo.com> writes:
>> I am trying to understand why inside an EXISTS clause the query planner
>>   does not use the index:
>
> I'm not sure this plan is as bad as all that.  The key point is that the
> planner is expecting 52517 rows that match that users_md5 value (and the
> true number is evidently 51446, so that estimate isn't far off).  That's
> about 1/48th of the table.  It knows that the EXISTS case can stop as
> soon as it finds one match, so it's betting that a plain seqscan will
> hit a match faster than an index lookup would be able to, ie,
> seqscanning about 48 tuples is faster than one index lookup.  This might
> be a bad bet if the users_md5 values are correlated with physical order,
> ie the matches are not randomly scattered but are all towards the end of
> the table.
exact, the data is not randomly scattered but ordered this explains why
in my case seq scan is a bad bet

  Barring that, though, it could be a good bet if the table
> isn't swapped in.  Which is what the default cost parameters are set
> up to assume.
there are lots of shared buffers and effective memory on this instance,
the query is executed many times I can assume that the table isn't
swapped in right ?

> I suspect your real complaint is that you expect the table to be swapped
> in, in which case what you ought to be doing is adjusting the planner's
> cost parameters.  Some playing around here with a similar case suggests
> that even a small reduction in random_page_cost would make it prefer an
> indexscan for this type of situation.
excellent !

Changing the random_page_cost from 4 to 2 do the trick

SET random_page_cost = 2;
EXPLAIN ANALYZE SELECT 1 WHERE EXISTS (SELECT 1 FROM read_acls_cache
WHERE users_md5 = '9bc9012eb29c0bb2ae3cc7b5e78c2acf');


    QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------
  Result  (cost=1.06..1.07 rows=1 width=0) (actual time=0.048..0.048
rows=1 loops=1)
    One-Time Filter: $0
    InitPlan 1 (returns $0)
      ->  Index Scan using read_acls_cache_users_md5_idx on
read_acls_cache  (cost=0.00..55664.21 rows=52517 width=0) (actual
time=0.045..0.045 rows=1 loops=1)
            Index Cond: ((users_md5)::text =
'9bc9012eb29c0bb2ae3cc7b5e78c2acf'::text)
  Total runtime: 0.087 ms
(6 rows)

466/0.087 = 5360 thanks !

kind regards

ben

pgsql-performance by date:

Previous
From: Harald Fuchs
Date:
Subject: Re: Strange workaround for slow query
Next
From: Robert Haas
Date:
Subject: Re: Strange workaround for slow query