plan variations: join vs. exists vs. row comparison - Mailing list pgsql-performance

From Jon Nelson
Subject plan variations: join vs. exists vs. row comparison
Date
Msg-id AANLkTikAH2aWpo78cW13AJDvxwgu=NsEokiTfjWQdjp3@mail.gmail.com
Whole thread Raw
Responses Re: plan variations: join vs. exists vs. row comparison  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: plan variations: join vs. exists vs. row comparison  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-performance
Originally, I posted to -general but I found some time to write some
samples, and realized it's probably more of a performance question.

The original post is here:
http://archives.postgresql.org/pgsql-general/2011-03/msg00198.php

I was hoping that somebody could help me understand the differences
between three plans.
All of the plans are updating a table using a second table, and should
be logically equivalent.
Two of the plans use joins, and one uses an exists subquery.
One of the plans uses row constructors and IS NOT DISTINCT FROM. It is
this plan which has really awful performance.
Clearly it is due to the nested loop, but why would the planner choose
that approach?

I also don't understand why in the 'exists' plan the planner thinks
the index scan will provide 1019978 rows, when there are only 1000000,
but that is a lesser issue.

Here is a sample SQL file which demonstrates the issues and includes
all three variations.

begin;
create temporary table t7 (
  i BIGINT NOT NULL,
  k BIGINT
);

create temporary table t8 (
  i BIGINT NOT NULL,
  j INT
);

CREATE FUNCTION populate_t8()
RETURNS VOID
LANGUAGE SQL
AS
$$
truncate t8;
insert into t8
SELECT i, 1 from t7
ORDER BY i LIMIT 10000;

insert into t8
SELECT i, 2 from t7
WHERE i > 10000
ORDER BY i LIMIT 10000;

SELECT i, 3 from t7
WHERE i > 20000
ORDER BY i LIMIT 20000;

analyze t8;
$$;

INSERT INTO t7
select x, x + 10 from generate_series(1,1000000) as x ;
analyze t7;

select populate_t8();

explain analyze verbose
update
  t7
SET
  k = 1
FROM
  t8
WHERE
  t7.i = t8.i
  AND
  (
    t8.j = 2
    OR
    t8.j = 1
  );

select populate_t8();

explain analyze verbose
update
  t7
SET
  k = 1
WHERE
  EXISTS (
    SELECT 1 FROM t8
    WHERE t8.i = t7.i
    AND
    (
      t8.j = 2
      OR
      t8.j = 1
    )
  );

select populate_t8();

explain
update
  t7
SET
  k = 1
FROM
  t8
WHERE
  ROW(t7.i) IS NOT DISTINCT FROM ROW(t8.i)
  AND
  (
    t8.j = 2
    OR
    t8.j = 1
  );

explain analyze verbose
update
  t7
SET
  k = 1
FROM
  t8
WHERE
  ROW(t7.i) IS NOT DISTINCT FROM ROW(t8.i)
  AND
  (
    t8.j = 2
    OR
    t8.j = 1
  );

rollback;





--
Jon

pgsql-performance by date:

Previous
From: Cédric Villemain
Date:
Subject: Re: Anyone tried Flashcache with PostgreSQL?
Next
From: Robert Haas
Date:
Subject: Re: Performance trouble finding records through related records