Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...) - Mailing list pgsql-general

From pbj@cmicdo.com
Subject Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)
Date
Msg-id 1415032438.18065.YahooMailBasic@web161703.mail.bf1.yahoo.com
Whole thread Raw
Responses Re: Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)  (Shaun Thomas <sthomas@optionshouse.com>)
List pgsql-general
Why does the UPDATE SET = FROM choose a more poorly performing plan than
the UPDATE SET = (SELECT ...)?  It seems to me that it is the same join.

I'm using 9.3.5.

CREATE TABLE orig
(
        key1    VARCHAR(11) PRIMARY KEY,
        time1   TIME
);

INSERT INTO orig (key1, time1)
SELECT
        a::TEXT,
        (((random()*100)::INT % 24)::TEXT || ':' ||
        ((random()*100)::INT % 60)::TEXT)::TIME
FROM generate_series(80000000000, 80002000000) a;

CREATE INDEX odx ON orig(key1);

CREATE TABLE second (LIKE orig);

INSERT INTO second (key1)
        SELECT (80000000000+(((random()*1000000)::INT) % 1000000))::TEXT
        FROM generate_series(1,400000);

EXPLAIN ANALYZE
UPDATE second SET time1 = orig.time1
FROM orig
WHERE second.key1 = orig.key1;

                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
 Update on second  (cost=69461.02..106082.02 rows=400000 width=32) (actual time=16033.023..16033.023 rows=0 loops=1)
   ->  Hash Join  (cost=69461.02..106082.02 rows=400000 width=32) (actual time=7698.445..12992.039 rows=400000 loops=1)
         Hash Cond: ((second.key1)::text = (orig.key1)::text)
         ->  Seq Scan on second  (cost=0.00..12627.00 rows=400000 width=18) (actual time=49.820..791.397 rows=400000
loops=1)
         ->  Hash  (cost=31765.01..31765.01 rows=2000001 width=26) (actual time=7648.540..7648.540 rows=2000001
loops=1)
               Buckets: 4096  Batches: 128  Memory Usage: 717kB
               ->  Seq Scan on orig  (cost=0.00..31765.01 rows=2000001 width=26) (actual time=0.014..3655.844
rows=2000001loops=1) 
 Total runtime: 16033.193 ms
(8 rows)

UPDATE second SET time1 = NULL;

EXPLAIN ANALYZE
UPDATE second SET time1 = (SELECT orig.time1 FROM orig,second
                        WHERE orig.key1 = second.key1 LIMIT 1);


                                                               QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
 Update on second  (cost=3.60..19078.19 rows=1279959 width=18) (actual time=4642.453..4642.453 rows=0 loops=1)
   InitPlan 1 (returns $1)
     ->  Limit  (cost=0.43..3.60 rows=1 width=8) (actual time=2.611..2.613 rows=1 loops=1)
           ->  Nested Loop  (cost=0.43..4056331.83 rows=1279959 width=8) (actual time=2.606..2.606 rows=1 loops=1)
                 ->  Seq Scan on second second_1  (cost=0.00..19074.59 rows=1279959 width=12) (actual time=2.487..2.487
rows=1loops=1) 
                 ->  Index Scan using odx on orig  (cost=0.43..3.14 rows=1 width=20) (actual time=0.098..0.098 rows=1
loops=1)
                       Index Cond: ((key1)::text = (second_1.key1)::text)
   ->  Seq Scan on second  (cost=0.00..19074.59 rows=1279959 width=18) (actual time=6.420..817.739 rows=400000 loops=1)
 Total runtime: 4642.561 ms
(9 rows)




pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: How to implent the CONVERT ( data_type [ ( length ) ] , expression ) function in PostgreSQL
Next
From: Sven Wegener
Date:
Subject: COPY TO returning empty result with parallel ALTER TABLE