Re: Bad query plan inside EXISTS clause - Mailing list pgsql-performance

From Benoit Delbosc
Subject Re: Bad query plan inside EXISTS clause
Date
Msg-id 4B97BC66.2030500@nuxeo.com
Whole thread Raw
In response to Re: Bad query plan inside EXISTS clause  (Yeb Havinga <yebhavinga@gmail.com>)
List pgsql-performance
Yeb Havinga a écrit :
> Yeb Havinga wrote:
>> Kenneth Marshall wrote:
>>> EXISTS matches NULLs too and since they are not indexed a
>>> sequential scan is needed to check for them. Try using
>>> IN instead.
>>>
>> This is nonsense in more than one way.
> Hit ctrl-return a bit too slow - exists does not match null but a set of
> records, that is either empty or not empty. Also it is possible to index
> table columns with nulls, and then the indexes can still be used.
> Besides filtering record sets with expressions, indexes are also used
> for ordering. There the effect of indexes with nulls can be seen: where
> to put them: in front or after the non nulls? So indexes can be
> perfectly used in conjunction with nulls. I found the original mail
> rather intriguing and played with an example myself a bit, but could not
> repeat the behavior (9.0 devel version), in my case the exists used an
> index. Maybe it has something to do with the fact that the planner
> estimates to return 50000 rows, even when the actual numbers list only 1
> hit. In the exists case, it can stop at the first hit. In the select all
> rows case, it must return all rows. Maybe a better plan emerges with
> better statistics?

Thanks for your quick investigation.

Changing the target statistics to 1000 for the column and analyzing the
table has not changed the query plan.

I just notice that using a "LIMIT 1" also gives a bad query plan

EXPLAIN ANALYZE SELECT 1 FROM read_acls_cache WHERE users_md5 =
'9bc9012eb29c0bb2ae3cc7b5e78c2acf' LIMIT 1;
                                                         QUERY PLAN

---------------------------------------------------------------------------------------
  Limit  (cost=0.00..1.19 rows=1 width=0) (actual time=366.771..366.772
rows=1 loops=1)
    ->  Seq Scan on read_acls_cache  (cost=0.00..62637.01 rows=52517
width=0) (actual time=366.769..366.769 rows=1 loops=1)
          Filter: ((users_md5)::text =
'9bc9012eb29c0bb2ae3cc7b5e78c2acf'::text)
  Total runtime: 366.806 ms
(4 rows)

Perhaps the EXISTS clause is also trying to limit the sub select query.

There are no NULL value on this column but only 49 distinct values for
2.5m rows, around 51k rows per value.

What I am trying to do is to know if a value is present or not in the
table, so far the fastest way is to use a COUNT:

EXPLAIN ANALYZE SELECT COUNT(1) FROM read_acls_cache WHERE users_md5 =
'9bc9012eb29c0bb2ae3cc7b5e78c2acf';
                                      QUERY PLAN
---------------------------------------------------------------------------
  Aggregate  (cost=35154.28..35154.29 rows=1 width=0) (actual
time=20.242..20.242 rows=1 loops=1)
    ->  Bitmap Heap Scan on read_acls_cache  (cost=2176.10..35022.98
rows=52517 width=0) (actual time=6.937..15.025 rows=51446 loops=1)
          Recheck Cond: ((users_md5)::text =
'9bc9012eb29c0bb2ae3cc7b5e78c2acf'::text)
          ->  Bitmap Index Scan on read_acls_cache_users_md5_idx
(cost=0.00..2162.97 rows=52517 width=0) (actual time=6.835..6.835
rows=51446 loops=1)
                Index Cond: ((users_md5)::text =
'9bc9012eb29c0bb2ae3cc7b5e78c2acf'::text)
  Total runtime: 20.295 ms
(6 rows)

ben

pgsql-performance by date:

Previous
From: Yeb Havinga
Date:
Subject: Re: Bad query plan inside EXISTS clause
Next
From: Tom Lane
Date:
Subject: Re: Bad query plan inside EXISTS clause