Re: Bad query plan - Mailing list pgsql-performance
From: Gavin Flower
Subject: Re: Bad query plan
Date:
Msg-id: 4E2C7D33.1000104@archidevsys.co.nz
In response to: Bad query plan (Дмитрий Васильев &lt;dmitry.vasil@gmail.com&gt;)
List: pgsql-performance
On 25/07/11 02:06, Дмитрий Васильев wrote:
> I have a problem with a poor query plan.
>
> My PostgreSQL is "PostgreSQL 8.4.8, compiled by Visual C++ build 1400,
> 32-bit", installed by the EnterpriseDB installer on Windows 7 32-bit.
>
> Steps to reproduce:
>
> Start with a fresh installation and execute the following:
>
> drop table if exists small;
> drop table if exists large;
>
> CREATE TABLE small
> (
>   id bigint,
>   primary key(id)
> );
>
> CREATE TABLE large
> (
>   id bigint,
>   primary key(id)
> );
>
> -- Insert 100000 rows into large
> CREATE or replace FUNCTION populate_large() RETURNS bigint AS $$
> DECLARE
>   id1 bigint := 0;
> BEGIN
>   LOOP
>     insert into large(id) values(id1);
>     id1 := id1 + 1;
>     if id1 > 100000 then
>       exit;
>     end if;
>   END LOOP;
>   return id1;
> END
> $$ LANGUAGE plpgsql;
>
> -- Insert 1000 rows into small
> CREATE or replace FUNCTION populate_small() RETURNS bigint AS $$
> DECLARE
>   id1 bigint := 0;
> BEGIN
>   LOOP
>     insert into small(id) values(id1);
>     id1 := id1 + 1;
>     if id1 > 1000 then
>       exit;
>     end if;
>   END LOOP;
>   return id1;
> END
> $$ LANGUAGE plpgsql;
>
> select populate_large(), populate_small();
> analyze;
>
> Then execute
>
> explain analyze insert into large(id) select id from small where id
> not in (select id from large);
>
> It gives
>
> "Seq Scan on small  (cost=1934.01..823278.28 rows=500 width=8) (actual time=6263.588..6263.588 rows=0 loops=1)"
> "  Filter: (NOT (SubPlan 1))"
> "  SubPlan 1"
> "    ->  Materialize  (cost=1934.01..3325.02 rows=100001 width=8) (actual time=0.007..3.012 rows=501 loops=1001)"
> "          ->  Seq Scan on large  (cost=0.00..1443.01 rows=100001 width=8) (actual time=0.010..5.810 rows=1001 loops=1)"
> "Total runtime: 6263.703 ms"
>
> But
>
> explain analyze insert into large(id) select id from small where not
> exists (select id from large l where small.id = l.id);
>
> executes much faster:
>
> "Merge Anti Join  (cost=0.00..85.58 rows=1 width=8) (actual time=15.793..15.793 rows=0 loops=1)"
> "  Merge Cond: (small.id = l.id)"
> "  ->  Index Scan using small_pkey on small  (cost=0.00..43.27 rows=1001 width=8) (actual time=0.025..3.515 rows=1001 loops=1)"
> "  ->  Index Scan using large_pkey on large l  (cost=0.00..3050.28 rows=100001 width=8) (actual time=0.017..2.932 rows=1001 loops=1)"
> "Total runtime: 15.863 ms"
>
> Both queries are semantically the same.

Out of interest, I ran your code on my existing 9.1beta3 installation.

Notes:
(1) The second SELECT still ran faster than the first.
(2) Both plans are different from the ones you got.

$ psql
psql (9.1beta3)
[...]
gavin=> explain analyze insert into large(id) select id from small where id
gavin-> not in(select id from large);
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Insert on large  (cost=1543.01..1559.02 rows=500 width=8) (actual time=51.090..51.090 rows=0 loops=1)
   ->  Seq Scan on small  (cost=1543.01..1559.02 rows=500 width=8) (actual time=51.087..51.087 rows=0 loops=1)
         Filter: (NOT (hashed SubPlan 1))
         SubPlan 1
           ->  Seq Scan on large  (cost=0.00..1443.01 rows=100001 width=8) (actual time=0.008..13.867 rows=100001 loops=1)
 Total runtime: 51.582 ms
(6 rows)

gavin=> explain analyze insert into large(id) select id from small where not
gavin-> exists (select id from large l where small.id=l.id);
                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Insert on large  (cost=0.00..80.94 rows=1 width=8) (actual time=0.907..0.907 rows=0 loops=1)
   ->  Merge Anti Join  (cost=0.00..80.94 rows=1 width=8) (actual time=0.906..0.906 rows=0 loops=1)
         Merge Cond: (small.id = l.id)
         ->  Index Scan using small_pkey on small  (cost=0.00..40.61 rows=1001 width=8) (actual time=0.010..0.225 rows=1001 loops=1)
         ->  Index Scan using large_pkey on large l  (cost=0.00..2800.12 rows=100001 width=8) (actual time=0.006..0.235 rows=1001 loops=1)
 Total runtime: 1.000 ms
(6 rows)

postgresql.conf parameters changed:

shared_buffers = 2GB
temp_buffers = 64MB
work_mem = 16MB
maintenance_work_mem = 512MB
max_stack_depth = 6MB
checkpoint_segments = 8
cpu_index_tuple_cost = 0.0025
cpu_operator_cost = 0.001
effective_cache_size = 2GB
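[Editorial note] The usual explanation for why the two queries get such different plans, even though they return the same rows here, is SQL's three-valued logic: if the NOT IN subquery could produce a NULL, `x NOT IN (...)` can never evaluate to true, only false or unknown, so the planner cannot in general rewrite NOT IN as an anti-join the way it can with NOT EXISTS. A minimal Python sketch of those semantics (function names are illustrative, not PostgreSQL internals; `None` stands in for SQL NULL):

```python
def sql_not_in(value, subquery_rows):
    """x NOT IN (a, b, ...) under SQL three-valued logic.

    Returns True, False, or None (UNKNOWN). If any subquery row is NULL
    and no row equals x, the result is UNKNOWN -- and a WHERE clause
    filters out UNKNOWN rows just like FALSE ones.
    """
    if value is None:
        # IN over an empty set is FALSE even for NULL x, so NOT IN is TRUE;
        # otherwise NULL compared with anything is UNKNOWN.
        return True if not subquery_rows else None
    saw_null = False
    for row in subquery_rows:
        if row is None:
            saw_null = True
        elif row == value:
            return False          # definite match: NOT IN is FALSE
    return None if saw_null else True


def sql_not_exists(value, subquery_rows):
    """NOT EXISTS (SELECT ... WHERE inner = x): always two-valued.

    NULL rows simply fail the equality test, so the result is plain
    True/False and maps directly onto an anti-join.
    """
    return not any(row is not None and row == value for row in subquery_rows)
```

In the test case above both `id` columns are primary keys and therefore NOT NULL, but the plans shown are consistent with the planner still treating NOT IN as a (hashed or materialized) subplan rather than an anti-join, while NOT EXISTS gets the Merge Anti Join.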