Home > mailing lists

plan variations: join vs. exists vs. row comparison - Mailing list pgsql-performance

From	Jon Nelson
Subject	plan variations: join vs. exists vs. row comparison
Date	March 7, 2011 15:15:22
Msg-id	AANLkTikAH2aWpo78cW13AJDvxwgu=NsEokiTfjWQdjp3@mail.gmail.com Whole thread Raw
Responses	Re: plan variations: join vs. exists vs. row comparison Re: plan variations: join vs. exists vs. row comparison
List	pgsql-performance

Tree view

Originally, I posted to -general but I found some time to write some
samples, and realized it's probably more of a performance question.

The original post is here:
http://archives.postgresql.org/pgsql-general/2011-03/msg00198.php

I was hoping that somebody could help me understand the differences
between three plans.
All of the plans are updating a table using a second table, and should
be logically equivalent.
Two of the plans use joins, and one uses an exists subquery.
One of the plans uses row constructors and IS NOT DISTINCT FROM. It is
this plan which has really awful performance.
Clearly it is due to the nested loop, but why would the planner choose
that approach?

I also don't understand why in the 'exists' plan the planner thinks
the index scan will provide 1019978 rows, when there are only 1000000,
but that is a lesser issue.

Here is a sample SQL file which demonstrates the issues and includes
all three variations.

begin;
create temporary table t7 (
  i BIGINT NOT NULL,
  k BIGINT
);

create temporary table t8 (
  i BIGINT NOT NULL,
  j INT
);

CREATE FUNCTION populate_t8()
RETURNS VOID
LANGUAGE SQL
AS
$$
truncate t8;
insert into t8
SELECT i, 1 from t7
ORDER BY i LIMIT 10000;

insert into t8
SELECT i, 2 from t7
WHERE i > 10000
ORDER BY i LIMIT 10000;

SELECT i, 3 from t7
WHERE i > 20000
ORDER BY i LIMIT 20000;

analyze t8;
$$;

INSERT INTO t7
select x, x + 10 from generate_series(1,1000000) as x ;
analyze t7;

select populate_t8();

explain analyze verbose
update
  t7
SET
  k = 1
FROM
  t8
WHERE
  t7.i = t8.i
  AND
  (
    t8.j = 2
    OR
    t8.j = 1
  );

select populate_t8();

explain analyze verbose
update
  t7
SET
  k = 1
WHERE
  EXISTS (
    SELECT 1 FROM t8
    WHERE t8.i = t7.i
    AND
    (
      t8.j = 2
      OR
      t8.j = 1
    )
  );

select populate_t8();

explain
update
  t7
SET
  k = 1
FROM
  t8
WHERE
  ROW(t7.i) IS NOT DISTINCT FROM ROW(t8.i)
  AND
  (
    t8.j = 2
    OR
    t8.j = 1
  );

explain analyze verbose
update
  t7
SET
  k = 1
FROM
  t8
WHERE
  ROW(t7.i) IS NOT DISTINCT FROM ROW(t8.i)
  AND
  (
    t8.j = 2
    OR
    t8.j = 1
  );

rollback;





--
Jon

pgsql-performance by date:

From: Cédric Villemain
Date: 07 March 2011, 11:48:35
Subject: Re: Anyone tried Flashcache with PostgreSQL?

From: Robert Haas
Date: 07 March 2011, 15:26:00
Subject: Re: Performance trouble finding records through related records

plan variations: join vs. exists vs. row comparison - Mailing list pgsql-performance

Previous

Next