Re: Bad query plan - Mailing list pgsql-performance
From: Gavin Flower
Subject: Re: Bad query plan
Date:
Msg-id: 4E2C7D33.1000104@archidevsys.co.nz
In response to: Bad query plan (Дмитрий Васильев &lt;dmitry.vasil@gmail.com&gt;)
List: pgsql-performance
On 25/07/11 02:06, Дмитрий Васильев wrote:
> I have a problem with a poor query plan.
>
> My PostgreSQL is "PostgreSQL 8.4.8, compiled by Visual C++ build 1400,
> 32-bit", installed by the EnterpriseDB installer on Windows 7 32-bit.
>
> Steps to reproduce:
>
> Start with a fresh installation and execute the following:
>
> drop table if exists small;
> drop table if exists large;
>
> CREATE TABLE small
> (
>   id bigint,
>   primary key(id)
> );
>
> CREATE TABLE large
> (
>   id bigint,
>   primary key(id)
> );
>
> -- Insert 100000 rows into large
> CREATE or replace FUNCTION populate_large() RETURNS bigint AS $$
> DECLARE
>   id1 bigint := 0;
> BEGIN
>   LOOP
>     insert into large(id) values(id1);
>     id1 := id1 + 1;
>     if id1 > 100000 then
>       exit;
>     end if;
>   END LOOP;
>   return id1;
> END
> $$ LANGUAGE plpgsql;
>
> -- Insert 1000 rows into small
> CREATE or replace FUNCTION populate_small() RETURNS bigint AS $$
> DECLARE
>   id1 bigint := 0;
> BEGIN
>   LOOP
>     insert into small(id) values(id1);
>     id1 := id1 + 1;
>     if id1 > 1000 then
>       exit;
>     end if;
>   END LOOP;
>   return id1;
> END
> $$ LANGUAGE plpgsql;
>
> select populate_large(), populate_small();
> analyze;
>
> Then execute
>
> explain analyze insert into large(id) select id from small where id
> not in (select id from large);
>
> It gives
>
> "Seq Scan on small  (cost=1934.01..823278.28 rows=500 width=8) (actual time=6263.588..6263.588 rows=0 loops=1)"
> "  Filter: (NOT (SubPlan 1))"
> "  SubPlan 1"
> "    ->  Materialize  (cost=1934.01..3325.02 rows=100001 width=8) (actual time=0.007..3.012 rows=501 loops=1001)"
> "          ->  Seq Scan on large  (cost=0.00..1443.01 rows=100001 width=8) (actual time=0.010..5.810 rows=1001 loops=1)"
> "Total runtime: 6263.703 ms"
>
> But
>
> explain analyze insert into large(id) select id from small where not
> exists (select id from large l where small.id = l.id);
>
> executes much faster:
>
> "Merge Anti Join  (cost=0.00..85.58 rows=1 width=8) (actual time=15.793..15.793 rows=0 loops=1)"
> "  Merge Cond: (small.id = l.id)"
> "  ->  Index Scan using small_pkey on small  (cost=0.00..43.27 rows=1001 width=8) (actual time=0.025..3.515 rows=1001 loops=1)"
> "  ->  Index Scan using large_pkey on large l  (cost=0.00..3050.28 rows=100001 width=8) (actual time=0.017..2.932 rows=1001 loops=1)"
> "Total runtime: 15.863 ms"
>
> Both queries are semantically the same.

Out of interest, I ran your code on my existing 9.1beta3 installation.

Notes:
(1) The second SELECT still ran faster than the first.
(2) Both plans are different from the ones you got.

$ psql
psql (9.1beta3)
[...]
gavin=> explain analyze insert into large(id) select id from small where id
gavin-> not in(select id from large);
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Insert on large  (cost=1543.01..1559.02 rows=500 width=8) (actual time=51.090..51.090 rows=0 loops=1)
   ->  Seq Scan on small  (cost=1543.01..1559.02 rows=500 width=8) (actual time=51.087..51.087 rows=0 loops=1)
         Filter: (NOT (hashed SubPlan 1))
         SubPlan 1
           ->  Seq Scan on large  (cost=0.00..1443.01 rows=100001 width=8) (actual time=0.008..13.867 rows=100001 loops=1)
 Total runtime: 51.582 ms
(6 rows)

gavin=> explain analyze insert into large(id) select id from small where not
gavin-> exists (select id from large l where small.id=l.id);
                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Insert on large  (cost=0.00..80.94 rows=1 width=8) (actual time=0.907..0.907 rows=0 loops=1)
   ->  Merge Anti Join  (cost=0.00..80.94 rows=1 width=8) (actual time=0.906..0.906 rows=0 loops=1)
         Merge Cond: (small.id = l.id)
         ->  Index Scan using small_pkey on small  (cost=0.00..40.61 rows=1001 width=8) (actual time=0.010..0.225 rows=1001 loops=1)
         ->  Index Scan using large_pkey on large l  (cost=0.00..2800.12 rows=100001 width=8) (actual time=0.006..0.235 rows=1001 loops=1)
 Total runtime: 1.000 ms
(6 rows)

postgresql.conf parameters changed:

shared_buffers = 2GB
temp_buffers = 64MB
work_mem = 16MB
maintenance_work_mem = 512MB
max_stack_depth = 6MB
checkpoint_segments = 8
cpu_index_tuple_cost = 0.0025
cpu_operator_cost = 0.001
effective_cache_size = 2GB
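[Editorial note] The usual explanation for why the two queries get such different plans, even though they return the same rows here, is SQL's three-valued logic: if the NOT IN subquery could produce a NULL, `x NOT IN (...)` can never evaluate to true, only false or unknown, so the planner cannot in general rewrite NOT IN as an anti-join the way it can with NOT EXISTS. A minimal Python sketch of those semantics (function names are illustrative, not PostgreSQL internals; `None` stands in for SQL NULL):

```python
def sql_not_in(value, subquery_rows):
    """x NOT IN (a, b, ...) under SQL three-valued logic.

    Returns True, False, or None (UNKNOWN). If any subquery row is NULL
    and no row equals x, the result is UNKNOWN -- and a WHERE clause
    filters out UNKNOWN rows just like FALSE ones.
    """
    if value is None:
        # IN over an empty set is FALSE even for NULL x, so NOT IN is TRUE;
        # otherwise NULL compared with anything is UNKNOWN.
        return True if not subquery_rows else None
    saw_null = False
    for row in subquery_rows:
        if row is None:
            saw_null = True
        elif row == value:
            return False          # definite match: NOT IN is FALSE
    return None if saw_null else True


def sql_not_exists(value, subquery_rows):
    """NOT EXISTS (SELECT ... WHERE inner = x): always two-valued.

    NULL rows simply fail the equality test, so the result is plain
    True/False and maps directly onto an anti-join.
    """
    return not any(row is not None and row == value for row in subquery_rows)
```

In the test case above both `id` columns are primary keys and therefore NOT NULL, but the plans shown are consistent with the planner still treating NOT IN as a (hashed or materialized) subplan rather than an anti-join, while NOT EXISTS gets the Merge Anti Join.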