BUG #6335: Weird planner decision with exists (a join b) condition - Mailing list pgsql-bugs

From maxim.boguk@gmail.com
Subject BUG #6335: Weird planner decision with exists (a join b) condition
Date
Msg-id E1RamiQ-0001jT-V9@wrigleys.postgresql.org
Whole thread Raw
Responses Re: BUG #6335: Weird planner decision with exists (a join b) condition
Re: BUG #6335: Weird planner decision with exists (a join b) condition
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      6335
Logged by:          Maksym Boguk
Email address:      maxim.boguk@gmail.com
PostgreSQL version: 9.0.4
Operating system:   Linux Ubuntu
Description:=20=20=20=20=20=20=20=20

I was explored reasons of high DB load and I localized the next problem
query:

That is correct version:

EXPLAIN ANALYZE  select *
from applicant_adv_subscription aas
where
aas.user_id in (5112699)
and exists (
SELECT * from resume
join resume_view_history using (resume_id)
where
resume.user_id =3D aas.user_id
);



=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20
     QUERY PLAN=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20
=20=20=20=20=20=20=20=20=20=20=20=20=20=20
---------------------------------------------------------------------------=
---------------------------------------------------------------------------=
------------------------
 Nested Loop Semi Join  (cost=3D0.00..20.51 rows=3D1 width=3D65) (actual
time=3D0.031..0.032 rows=3D1 loops=3D1)
   ->  Index Scan using applicant_adv_subscription_user_id_key on
applicant_adv_subscription aas  (cost=3D0.00..0.02 rows=3D1 width=3D65) (ac=
tual
time=3D0.011..0.012 rows=3D1 loops=3D1)
         Index Cond: (user_id =3D 5112699)
   ->  Nested Loop  (cost=3D0.00..20.49 rows=3D3118 width=3D4) (actual
time=3D0.018..0.018 rows=3D1 loops=3D1)
         ->  Index Scan using resume_user_id_key on resume  (cost=3D0.00..0=
.13
rows=3D18 width=3D8) (actual time=3D0.008..0.008 rows=3D1 loops=3D1)
               Index Cond: (user_id =3D 5112699)
         ->  Index Scan using resume_view_history_fk73b63ccd36b06a5 on
resume_view_history  (cost=3D0.00..0.95 rows=3D173 width=3D4) (actual
time=3D0.009..0.009 rows=3D1 loops=3D1)
               Index Cond: (resume_view_history.resume_id =3D
resume.resume_id)
 Total runtime: 0.080 ms


But once I add second value into IN list plan become completely screwed:

EXPLAIN analyze select *
from applicant_adv_subscription aas
where
aas.user_id in (5112699,7995496)
and exists (
SELECT * from resume
join resume_view_history using (resume_id)
where
resume.user_id =3D aas.user_id
);


                                                                     QUERY
PLAN
---------------------------------------------------------------------------=
--------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=3D38967.39..735076.65 rows=3D2 width=3D65) (a=
ctual
time=3D14656.388..389866.211 rows=3D1 loops=3D1)
   Join Filter: (aas.user_id =3D resume.user_id)
   ->  Bitmap Heap Scan on applicant_adv_subscription aas  (cost=3D0.02..0.=
04
rows=3D2 width=3D65) (actual time=3D0.018..0.021 rows=3D2 loops=3D1)
         Recheck Cond: (user_id =3D ANY ('{5112699,7995496}'::integer[]))
         ->  Bitmap Index Scan on applicant_adv_subscription_user_id_key=20
(cost=3D0.00..0.02 rows=3D2 width=3D0) (actual time=3D0.014..0.014 rows=3D2=
 loops=3D1)
               Index Cond: (user_id =3D ANY ('{5112699,7995496}'::integer[]=
))
   ->  Hash Join  (cost=3D38967.36..726839.23 rows=3D272203680 width=3D4) (=
actual
time=3D13267.456..182842.841 rows=3D136136926 loops=3D2)
         Hash Cond: (resume_view_history.resume_id =3D resume.resume_id)
         ->  Seq Scan on resume_view_history  (cost=3D0.00..282228.92
rows=3D272203680 width=3D4) (actual time=3D0.004..25574.666 rows=3D136161776
loops=3D2)
         ->  Hash  (cost=3D21737.05..21737.05 rows=3D16110150 width=3D8) (a=
ctual
time=3D13260.145..13260.145 rows=3D16114222 loops=3D2)
               Buckets: 2097152  Batches: 2  Memory Usage: 314776kB
               ->  Seq Scan on resume  (cost=3D0.00..21737.05 rows=3D161101=
50
width=3D8) (actual time=3D0.005..8839.480 rows=3D16114222 loops=3D2)
 Total runtime: 389866.374 ms


Ooops.

Changes in *_cost settings have no effect.
Disabling seq_scan/merge_join/hash_join doest not help as well.

Nothing special about tables, all required indexes on the place.

pgsql-bugs by date:

Previous
From: Wilfried.Weiss@nsg.com
Date:
Subject: BUG #6334: initdb not working
Next
From: Alvaro Herrera
Date:
Subject: Re: BUG #6335: Weird planner decision with exists (a join b) condition