Re: *very* inefficient choice made by the planner (regarding - Mailing list pgsql-performance
From | Jean-Luc Lachance |
---|---|
Subject | Re: *very* inefficient choice made by the planner (regarding |
Date | |
Msg-id | 40C881FE.5000301@sympatico.ca Whole thread Raw |
In response to | *very* inefficient choice made by the planner (regarding IN(...)) (Frank van Vugt <ftm.van.vugt@foxi.nl>) |
Responses |
Re: *very* inefficient choice made by the planner (regarding
|
List | pgsql-performance |
The real question is: If the two statments are functionally equivalent, why can't PG rewrite the "NOT IN" version into the more efficient "NOT EXISTS"? Frank van Vugt wrote: > L.S. > > Could anybody explain why the planner is doing what it is doing? > > What could I do to make it easier to choose a better plan? > > > > ********* > Summary > ********* > On a freshly vacuum/analysed pair of tables with 7389 and 64333 records, this: > > select id from location where id not in (select location_id from > location_carrier); > > takes 581546,497 ms > > > While a variant like: > > select id from location where not exists (select 1 from location_carrier where > location_id = location.id); > > takes only 124,625 ms > > > ********* > Details > ********* > =# select version(); > version > --------------------------------------------------------------------- > PostgreSQL 7.4.2 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66 > (1 row) > > > =# \d location > Table "public.location" > Column | Type | Modifiers > ------------+-----------------------------+----------- > id | integer | not null > Indexes: > "location_pkey" primary key, btree (id) > > > =# select count(*) from location; > count > ------- > 7389 > (1 row) > > > =# \d location_carrier > Table "public.location_carrier" > Column | Type | Modifiers > ---------------------+-----------------------------+----------- > location_id | integer | not null > carrier_id | integer | not null > Indexes: > "location_carrier_pkey" primary key, btree (location_id, carrier_id) > > > =# select count(*) from location_carrier; > count > ------- > 64333 > (1 row) > > > =# explain select id from location where id not in (select location_id from > location_carrier); > QUERY PLAN > ------------------------------------------------------------------------------- > Seq Scan on "location" (cost=0.00..5077093.72 rows=3695 width=4) > Filter: (NOT (subplan)) > SubPlan > -> Seq Scan on location_carrier (cost=0.00..1213.33 rows=64333 width=4) > (4 rows) > > > =# explain analyse select id from location where id not in (select location_id > from location_carrier); > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------------------- > Seq Scan on "location" (cost=0.00..5077093.72 rows=3695 width=4) (actual > time=248.310..581541.483 rows=240 loops=1) > Filter: (NOT (subplan)) > SubPlan > -> Seq Scan on location_carrier (cost=0.00..1213.33 rows=64333 width=4) > (actual time=0.007..48.517 rows=19364 loops=7389) > Total runtime: 581542.560 ms > (5 rows) > > Time: 581546,497 ms > > > =# explain analyse select id from location l left outer join location_carrier > lc on l.id = lc.location_id where lc.location_id is null; > QUERY > PLAN > -------------------------------------------------------------------------------------------------------------------------------------------------------------- > Merge Left Join (cost=0.00..3022.51 rows=7389 width=4) (actual > time=0.083..435.841 rows=240 loops=1) > Merge Cond: ("outer".id = "inner".location_id) > Filter: ("inner".location_id IS NULL) > -> Index Scan using location_pkey on "location" l (cost=0.00..258.85 > rows=7389 width=4) (actual time=0.041..26.211 rows=7389 loops=1) > -> Index Scan using location_carrier_pkey on location_carrier lc > (cost=0.00..1941.22 rows=64333 width=4) (actual time=0.015..238.305 > rows=64333 loops=1) > Total runtime: 436.213 ms > (6 rows) > > Time: 440,787 ms > > > megafox=# explain analyse select id from location where not exists (select 1 > from location_carrier where location_id = location.id); > QUERY > PLAN > ----------------------------------------------------------------------------------------------------------------------------------------------------- > Seq Scan on "location" (cost=0.00..13242.14 rows=3695 width=4) (actual > time=0.078..120.785 rows=240 loops=1) > Filter: (NOT (subplan)) > SubPlan > -> Index Scan using location_carrier_pkey on location_carrier > (cost=0.00..17.61 rows=10 width=0) (actual time=0.011..0.011 rows=1 > loops=7389) > Index Cond: (location_id = $0) > Total runtime: 121.165 ms > (6 rows) > > Time: 124,625 ms > > > > > > >
pgsql-performance by date: