Thread: Re: [SPAM] - Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)
-----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of pbj@cmicdo.com Sent: Monday, November 03, 2014 11:34 AM To: pgsql-general@postgresql.org Subject: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...) Why does the UPDATE SET = FROM choose a more poorly performing plan than the UPDATE SET = (SELECT ...)? It seems to me thatit is the same join. I'm using 9.3.5. CREATE TABLE orig ( key1 VARCHAR(11) PRIMARY KEY, time1 TIME ); INSERT INTO orig (key1, time1) SELECT a::TEXT, (((random()*100)::INT % 24)::TEXT || ':' || ((random()*100)::INT % 60)::TEXT)::TIME FROM generate_series(80000000000, 80002000000) a; CREATE INDEX odx ON orig(key1); CREATE TABLE second (LIKE orig); INSERT INTO second (key1) SELECT (80000000000+(((random()*1000000)::INT) % 1000000))::TEXT FROM generate_series(1,400000); EXPLAIN ANALYZE UPDATE second SET time1 = orig.time1 FROM orig WHERE second.key1 = orig.key1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Update on second (cost=69461.02..106082.02 rows=400000 width=32) (actual time=16033.023..16033.023 rows=0 loops=1) -> Hash Join (cost=69461.02..106082.02 rows=400000 width=32) (actual time=7698.445..12992.039 rows=400000 loops=1) Hash Cond: ((second.key1)::text = (orig.key1)::text) -> Seq Scan on second (cost=0.00..12627.00 rows=400000 width=18) (actual time=49.820..791.397 rows=400000 loops=1) -> Hash (cost=31765.01..31765.01 rows=2000001 width=26) (actual time=7648.540..7648.540 rows=2000001 loops=1) Buckets: 4096 Batches: 128 Memory Usage: 717kB -> Seq Scan on orig (cost=0.00..31765.01 rows=2000001 width=26) (actual time=0.014..3655.844 rows=2000001loops=1) Total runtime: 16033.193 ms (8 rows) UPDATE second SET time1 = NULL; EXPLAIN ANALYZE UPDATE second SET time1 = (SELECT orig.time1 FROM orig,second WHERE orig.key1 = second.key1 LIMIT 1); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Update on second (cost=3.60..19078.19 rows=1279959 width=18) (actual time=4642.453..4642.453 rows=0 loops=1) InitPlan 1 (returns $1) -> Limit (cost=0.43..3.60 rows=1 width=8) (actual time=2.611..2.613 rows=1 loops=1) -> Nested Loop (cost=0.43..4056331.83 rows=1279959 width=8) (actual time=2.606..2.606 rows=1 loops=1) -> Seq Scan on second second_1 (cost=0.00..19074.59 rows=1279959 width=12) (actual time=2.487..2.487 rows=1loops=1) -> Index Scan using odx on orig (cost=0.43..3.14 rows=1 width=20) (actual time=0.098..0.098 rows=1 loops=1) Index Cond: ((key1)::text = (second_1.key1)::text) -> Seq Scan on second (cost=0.00..19074.59 rows=1279959 width=18) (actual time=6.420..817.739 rows=400000 loops=1) Total runtime: 4642.561 ms (9 rows) These 2 queries are not the same. The first query updates rows in the "second" table with the orig.time1 values based on key1 column match. The second query finds first possible match (based on key1 column) and assigns orig.time1 value from the matched row to everyrecord in "second" table. Regards, Igor Neyman
Re: [SPAM] - Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)
From
pbj@cmicdo.com
Date:
> > On Mon, 11/3/14, Igor Neyman <ineyman@perceptron.com> wrote: > > -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org] > On Behalf Of pbj@cmicdo.com > Sent: Monday, November 03, 2014 11:34 AM > To: pgsql-general@postgresql.org > Subject: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM > vs UPDATE SET = (SELECT ...) > > Why does the UPDATE SET = FROM choose a more poorly > performing plan than the UPDATE SET = (SELECT ...)? It > seems to me that it is the same join. > > I'm using 9.3.5. > > CREATE TABLE orig > ( > key1 VARCHAR(11) > PRIMARY KEY, > time1 TIME > ); > > INSERT INTO orig (key1, time1) > SELECT > a::TEXT, > (((random()*100)::INT % > 24)::TEXT || ':' || > ((random()*100)::INT % > 60)::TEXT)::TIME FROM generate_series(80000000000, > 80002000000) a; > > CREATE INDEX odx ON orig(key1); > > CREATE TABLE second (LIKE orig); > > INSERT INTO second (key1) > SELECT > (80000000000+(((random()*1000000)::INT) % 1000000))::TEXT > FROM generate_series(1,400000); > > EXPLAIN ANALYZE > UPDATE second SET time1 = orig.time1 > FROM orig > WHERE second.key1 = orig.key1; > [.....] > > UPDATE second SET time1 = NULL; > > EXPLAIN ANALYZE > UPDATE second SET time1 = (SELECT orig.time1 FROM > orig,second > > WHERE orig.key1 = second.key1 > LIMIT 1); > [.....] > > These 2 queries are not the same. > > The first query updates rows in the "second" table with the > orig.time1 values based on key1 column match. > The second query finds first possible match (based on key1 > column) and assigns orig.time1 value from the matched row to > every record in "second" table. > > Regards, > Igor Neyman I see that now. I was trying to reproduce something from work from memory and got tripped up on a sublety of UPDATE ... SELECT. The query I ran at work was like this: EXPLAIN ANALYZE UPDATE second se SET time1 = (SELECT time1 FROM orig WHERE orig.key1 = se.key1); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ Update on second se (cost=0.00..3390627.00 rows=400000 width=18) (actual time=18698.795..18698.795 rows=0 loops=1) -> Seq Scan on second se (cost=0.00..3390627.00 rows=400000 width=18) (actual time=7.558..16694.600 rows=400000 loops=1) SubPlan 1 -> Index Scan using odx on orig (cost=0.43..8.45 rows=1 width=8) (actual time=0.033..0.035 rows=1 loops=400000) Index Cond: ((key1)::text = (se.key1)::text) Total runtime: 18698.865 ms (6 rows) This does correctly match and update all of the second table entries. The plan actually runs longer than the UPDATE ... FROM, which squares with a comment the fine manual. Thanks! PJ