*very* inefficient choice made by the planner (regarding IN(...)) - Mailing list pgsql-performance
From | Frank van Vugt |
---|---|
Subject | *very* inefficient choice made by the planner (regarding IN(...)) |
Date | |
Msg-id | 200406101624.21739.ftm.van.vugt@foxi.nl Whole thread Raw |
Responses |
Re: *very* inefficient choice made by the planner (regarding
Re: *very* inefficient choice made by the planner (regarding IN(...)) Re: *very* inefficient choice made by the planner (regarding |
List | pgsql-performance |
L.S. Could anybody explain why the planner is doing what it is doing? What could I do to make it easier to choose a better plan? ********* Summary ********* On a freshly vacuum/analysed pair of tables with 7389 and 64333 records, this: select id from location where id not in (select location_id from location_carrier); takes 581546,497 ms While a variant like: select id from location where not exists (select 1 from location_carrier where location_id = location.id); takes only 124,625 ms ********* Details ********* =# select version(); version --------------------------------------------------------------------- PostgreSQL 7.4.2 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66 (1 row) =# \d location Table "public.location" Column | Type | Modifiers ------------+-----------------------------+----------- id | integer | not null Indexes: "location_pkey" primary key, btree (id) =# select count(*) from location; count ------- 7389 (1 row) =# \d location_carrier Table "public.location_carrier" Column | Type | Modifiers ---------------------+-----------------------------+----------- location_id | integer | not null carrier_id | integer | not null Indexes: "location_carrier_pkey" primary key, btree (location_id, carrier_id) =# select count(*) from location_carrier; count ------- 64333 (1 row) =# explain select id from location where id not in (select location_id from location_carrier); QUERY PLAN ------------------------------------------------------------------------------- Seq Scan on "location" (cost=0.00..5077093.72 rows=3695 width=4) Filter: (NOT (subplan)) SubPlan -> Seq Scan on location_carrier (cost=0.00..1213.33 rows=64333 width=4) (4 rows) =# explain analyse select id from location where id not in (select location_id from location_carrier); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------- Seq Scan on "location" (cost=0.00..5077093.72 rows=3695 width=4) (actual time=248.310..581541.483 rows=240 loops=1) Filter: (NOT (subplan)) SubPlan -> Seq Scan on location_carrier (cost=0.00..1213.33 rows=64333 width=4) (actual time=0.007..48.517 rows=19364 loops=7389) Total runtime: 581542.560 ms (5 rows) Time: 581546,497 ms =# explain analyse select id from location l left outer join location_carrier lc on l.id = lc.location_id where lc.location_id is null; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------- Merge Left Join (cost=0.00..3022.51 rows=7389 width=4) (actual time=0.083..435.841 rows=240 loops=1) Merge Cond: ("outer".id = "inner".location_id) Filter: ("inner".location_id IS NULL) -> Index Scan using location_pkey on "location" l (cost=0.00..258.85 rows=7389 width=4) (actual time=0.041..26.211 rows=7389 loops=1) -> Index Scan using location_carrier_pkey on location_carrier lc (cost=0.00..1941.22 rows=64333 width=4) (actual time=0.015..238.305 rows=64333 loops=1) Total runtime: 436.213 ms (6 rows) Time: 440,787 ms megafox=# explain analyse select id from location where not exists (select 1 from location_carrier where location_id = location.id); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on "location" (cost=0.00..13242.14 rows=3695 width=4) (actual time=0.078..120.785 rows=240 loops=1) Filter: (NOT (subplan)) SubPlan -> Index Scan using location_carrier_pkey on location_carrier (cost=0.00..17.61 rows=10 width=0) (actual time=0.011..0.011 rows=1 loops=7389) Index Cond: (location_id = $0) Total runtime: 121.165 ms (6 rows) Time: 124,625 ms -- Best, Frank.
pgsql-performance by date: