Thread: update from select
<div class="Section1"><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB" style="font-size: 9.0pt;font-family:Verdana">Hello</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB" style="font-size: 9.0pt;font-family:Verdana"> </span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB" style="font-size: 9.0pt;font-family:Verdana">I have a performance problem with an SQL statement.</span></font><p class="MsoNormal"><font face="Verdana"size="1"><span lang="EN-GB" style="font-size: 9.0pt;font-family:Verdana">Is there a better way to do this update:</span></font><p class="MsoNormal"><font face="Verdana"size="1"><span lang="EN-GB" style="font-size: 9.0pt;font-family:Verdana"> </span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB" style="font-size: 9.0pt;font-family:Verdana">UPDATE table1 SET column2 = temp_table.column2, column3 = temp_table.column3, column4 = CAST(temp_table.column4AS date) FROM</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB" style="font-size: 9.0pt;font-family:Verdana">(</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB" style="font-size: 9.0pt;font-family:Verdana"> SELECT DISTINCT</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"style="font-size: 9.0pt;font-family:Verdana"> table2.column1,</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"style="font-size: 9.0pt;font-family:Verdana"> table2.column2,</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"style="font-size: 9.0pt;font-family:Verdana"> table2.column3,</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"style="font-size: 9.0pt;font-family:Verdana"> table2.column4</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"style="font-size: 9.0pt;font-family:Verdana"> FROM table2 WHERE column4 IS NOT NULL AND column4 <> '' AND (length(column4) = 10 OR length(column4)= 23) </span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB" style="font-size: 9.0pt;font-family:Verdana">) AS temp_table</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"style="font-size: 9.0pt;font-family:Verdana">WHERE table1.column1 = temp_table.column1;</span></font><p class="MsoNormal"><font face="Verdana"size="1"><span lang="EN-GB" style="font-size: 9.0pt;font-family:Verdana"> </span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB" style="font-size: 9.0pt;font-family:Verdana">The select by it’s own takes around 1 second. The Update is around 120’000 rows. I got an indexon column1. The whole query needs around 16 minutes.</span></font><p class="MsoNormal"><font face="Verdana" size="1"><spanlang="EN-GB" style="font-size: 9.0pt;font-family:Verdana">The same procedure on MSSQL needs around 30 seconds. I hope to get it too in Postgres…</span></font><pclass="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB" style="font-size: 9.0pt;font-family:Verdana"> </span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB" style="font-size: 9.0pt;font-family:Verdana">Please help me.</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"style="font-size: 9.0pt;font-family:Verdana"> </span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB" style="font-size: 9.0pt;font-family:Verdana">Regards</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB" style="font-size: 9.0pt;font-family:Verdana"> </span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB" style="font-size: 9.0pt;font-family:Verdana">Reto</span></font><p class="MsoNormal"><font face="Times New Roman" size="3"><span style="font-size: 12.0pt"> </span></font></div>
am Mon, dem 29.10.2007, um 10:18:38 +0100 mailte dev@kbsolutions.ch folgendes: > > WHERE table1.column1 = temp_table.column1; table1.column1 and temp_table.column1 have the same type? > > > > The select by it?s own takes around 1 second. The Update is around 120?000 > rows. I got an index on column1. The whole query needs around 16 minutes. Show us the EXPLAIN ANALYSE - result. Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
Hello you use corelated subquery and that is slow for thausands rows. Use PostgreSQL's extension UPDATE table1 SET column2 = t,colum2, .... FROM table2 t WHERE table1.column1 = t.column1 and t.column4 is not null and ... http://www.postgresql.org/docs/8.2/interactive/sql-update.html Regards Pavel Stehule > Hello > > > > I have a performance problem with an SQL statement. > > Is there a better way to do this update: > > > > UPDATE table1 SET column2 = temp_table.column2, column3 = > temp_table.column3, column4 = CAST(temp_table.column4 AS date) FROM > > ( > > SELECT DISTINCT > > table2.column1, > > table2.column2, > > table2.column3, > > table2.column4 > > FROM table2 WHERE column4 IS NOT NULL AND column4 <> '' AND > (length(column4) = 10 OR length(column4) = 23) > > ) AS temp_table > > WHERE table1.column1 = temp_table.column1; > > > > The select by it's own takes around 1 second. The Update is around 120'000 > rows. I got an index on column1. The whole query needs around 16 minutes. > > The same procedure on MSSQL needs around 30 seconds. I hope to get it too in > Postgres… > > > > Please help me. > > > > Regards > > > > Reto > >
<dev@kbsolutions.ch> writes: > Is there a better way to do this update: > UPDATE table1 SET column2 = temp_table.column2, column3 = > temp_table.column3, column4 = CAST(temp_table.column4 AS date) FROM > ( > SELECT DISTINCT > table2.column1, > table2.column2, > table2.column3, > table2.column4 > FROM table2 WHERE column4 IS NOT NULL AND column4 <> '' AND > (length(column4) = 10 OR length(column4) = 23) > ) AS temp_table > WHERE table1.column1 = temp_table.column1; This looks seriously fishy. Is table2.column1 unique? If it is then you don't need the DISTINCT. If it isn't, you are in great danger of trying to update (some) table1 rows multiple times; which is bad, both because it wastes cycles and because you have no idea which of the matching table2 rows will "win" the update. I think you first need to think clearly about what you're doing ... regards, tom lane
Yes, both have varchar(50). Query: UPDATE owner SET picturemedium = dvds.picturemedium, title = dvds.title, titleOrder = dvds.title, releasedate = CAST(dvds.releasedate AS date) FROM (SELECT DISTINCTdetail_dvd.asin,detail_dvd.picturemedium,detail_dvd.title,detail_dvd.releasedateFROM detail_dvd WHERE releasedateIS NOT NULL AND releasedate <> '' AND (length(releasedate) = 10 OR length(releasedate) = 23) ) AS dvds WHERE owner.asin = dvds.asin; *********************************************************** EXPLAIN ANALYZE: Hash Join (cost=10827.45..25950.05 rows=4906 width=1191) (actual time=586.251..2852.691 rows=111306 loops=1) " Hash Cond: ((""owner"".asin)::text = (dvds.asin)::text)" " -> Seq Scan on ""owner"" (cost=0.00..14148.98 rows=230198 width=101) (actual time=0.050..968.028 rows=230198 loops=1)" -> Hash (cost=10825.02..10825.02 rows=194 width=1208) (actual time=584.463..584.463 rows=19489 loops=1) -> Subquery Scan dvds (cost=10820.66..10825.02 rows=194 width=1208) (actual time=435.005..545.213 rows=19489 loops=1) -> Unique (cost=10820.66..10823.08 rows=194 width=110) (actual time=435.002..520.725 rows=19489 loops=1) -> Sort (cost=10820.66..10821.14 rows=194 width=110) (actual time=434.998..491.487 rows=19489 loops=1) Sort Key: asin, picturemedium, title, releasedate -> Seq Scan on detail_dvd (cost=0.00..10813.29 rows=194 width=110) (actual time=0.042..166.493 rows=19489 loops=1) Filter: ((releasedate ISNOT NULL) AND ((releasedate)::text <> ''::text) AND ((length((releasedate)::text) = 10) OR (length((releasedate)::text) = 23))) Total runtime: 633548.404 ms *********************************************************** He is not using the index on asin? When I reduce the SELECT to 100, he is using the index! perhaps to many rows are affected? The DISTINCT in the SELECT is not really necessary. It's just for security reasons. And I did check it. It's unique! But as I said before. The SELECT takes around 1 second! I have 13 indices on the UPDATE table. So I did delete the one I dont need for this query... Now I have 3 left! And it takes around 2 Minutes! But thats also a problem, because I need the Indexes again! Is it possible to set the way, Postgres is building the indices? Or is the only way deleting the indeces before UPDATE and then creating them again? I also tried this query (PostgreSQL's extension): UPDATE owner SET picturemedium = detail_dvd.picturemedium, title = detail_dvd.title, titleOrder = detail_dvd.title, releasedate = CAST(detail_dvd.releasedate AS date) FROM detail_dvd WHERE owner.asin = detail_dvd.asin AND detail_dvd.releasedate IS NOT NULL AND detail_dvd.releasedate <> '' AND (length(detail_dvd.releasedate) = 10 OR length(detail_dvd.releasedate) = 23); But its also to slow: *********************************************************** EXPLAIN ANALYZE: Nested Loop (cost=0.00..28175.75 rows=2006 width=195) (actual time=0.138..127695.132 rows=111306 loops=1) -> Seq Scan on detail_dvd (cost=0.00..10813.29 rows=194 width=110) (actual time=0.035..615.511 rows=19489 loops=1) Filter: ((releasedate IS NOT NULL) AND ((releasedate)::text <> ''::text) AND ((length((releasedate)::text) = 10) OR (length((releasedate)::text) = 23))) " -> Index Scan using ""iidx-owner-asin"" on ""owner"" (cost=0.00..89.04 rows=26 width=99) (actual time=2.848..6.485 rows=6 loops=19489)" " Index Cond: ((""owner"".asin)::text = (detail_dvd.asin)::text)" Total runtime: 1039998.325 ms *********************************************************** Thaks for helping!! Bye the way, we are changing our system from MSSQL2000 to Postgres :-)! Regards Reto -----Ursprüngliche Nachricht----- Von: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] Im Auftrag von A. Kretschmer Gesendet: Montag, 29. Oktober 2007 11:45 An: pgsql-sql@postgresql.org Betreff: Re: [SQL] update from select am Mon, dem 29.10.2007, um 10:18:38 +0100 mailte dev@kbsolutions.ch folgendes: > > WHERE table1.column1 = temp_table.column1; table1.column1 and temp_table.column1 have the same type? > > > > The select by it?s own takes around 1 second. The Update is around 120?000 > rows. I got an index on column1. The whole query needs around 16 minutes. Show us the EXPLAIN ANALYSE - result. Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
2007/10/29, dev@kbsolutions.ch <dev@kbsolutions.ch>: > > Yes, both have varchar(50). > > Query: > > UPDATE owner SET picturemedium = dvds.picturemedium, title = dvds.title, > titleOrder = dvds.title, releasedate = CAST(dvds.releasedate AS date) FROM > ( > SELECT DISTINCT > detail_dvd.asin, > detail_dvd.picturemedium, > detail_dvd.title, > detail_dvd.releasedate > FROM detail_dvd > WHERE releasedate IS NOT NULL AND releasedate <> '' AND > (length(releasedate) = 10 OR length(releasedate) = 23) > ) > AS dvds WHERE owner.asin = dvds.asin; > > *********************************************************** > EXPLAIN ANALYZE: > > Hash Join (cost=10827.45..25950.05 rows=4906 width=1191) (actual > time=586.251..2852.691 rows=111306 loops=1) > " Hash Cond: ((""owner"".asin)::text = (dvds.asin)::text)" > " -> Seq Scan on ""owner"" (cost=0.00..14148.98 rows=230198 width=101) > (actual time=0.050..968.028 rows=230198 loops=1)" > -> Hash (cost=10825.02..10825.02 rows=194 width=1208) (actual > time=584.463..584.463 rows=19489 loops=1) > -> Subquery Scan dvds (cost=10820.66..10825.02 rows=194 > width=1208) (actual time=435.005..545.213 rows=19489 loops=1) > -> Unique (cost=10820.66..10823.08 rows=194 width=110) > (actual time=435.002..520.725 rows=19489 loops=1) > -> Sort (cost=10820.66..10821.14 rows=194 width=110) > (actual time=434.998..491.487 rows=19489 loops=1) > Sort Key: asin, picturemedium, title, releasedate > -> Seq Scan on detail_dvd (cost=0.00..10813.29 > rows=194 width=110) (actual time=0.042..166.493 rows=19489 loops=1) > Filter: ((releasedate IS NOT NULL) AND > ((releasedate)::text <> ''::text) AND ((length((releasedate)::text) = 10) OR > (length((releasedate)::text) = 23))) > Total runtime: 633548.404 ms > *********************************************************** > > He is not using the index on asin? When I reduce the SELECT to 100, he is > using the index! perhaps to many rows are affected? > The DISTINCT in the SELECT is not really necessary. It's just for security > reasons. And I did check it. It's unique! But as I said before. The SELECT > takes around 1 second! I have 13 indices on the UPDATE table. So I did > delete the one I don't need for this query... Now I have 3 left! And it > takes around 2 Minutes! But that's also a problem, because I need the > Indexes again! Is it possible to set the way, Postgres is building the > indices? Or is the only way deleting the indeces before UPDATE and then > creating them again? > > > I also tried this query (PostgreSQL's extension): > > UPDATE owner SET picturemedium = detail_dvd.picturemedium, title = > detail_dvd.title, titleOrder = detail_dvd.title, releasedate = > CAST(detail_dvd.releasedate AS date) > FROM detail_dvd > WHERE owner.asin = detail_dvd.asin > AND detail_dvd.releasedate IS NOT NULL > AND detail_dvd.releasedate <> '' > AND (length(detail_dvd.releasedate) = 10 OR length(detail_dvd.releasedate) = > 23); > > But its also to slow: > > *********************************************************** > EXPLAIN ANALYZE: > > Nested Loop (cost=0.00..28175.75 rows=2006 width=195) (actual > time=0.138..127695.132 rows=111306 loops=1) > -> Seq Scan on detail_dvd (cost=0.00..10813.29 rows=194 width=110) > (actual time=0.035..615.511 rows=19489 loops=1) > Filter: ((releasedate IS NOT NULL) AND ((releasedate)::text <> > ''::text) AND ((length((releasedate)::text) = 10) OR > (length((releasedate)::text) = 23))) > " -> Index Scan using ""iidx-owner-asin"" on ""owner"" (cost=0.00..89.04 > rows=26 width=99) (actual time=2.848..6.485 rows=6 loops=19489)" > " Index Cond: ((""owner"".asin)::text = (detail_dvd.asin)::text)" > Total runtime: 1039998.325 ms > *********************************************************** > try to up statististics on table detail_dvd.release_date. Maybe there is other problem. The casting from (probably) date to text in releasedate column. Is it correct? what is original type for releasedate column? Pavel > > Thaks for helping!! Bye the way, we are changing our system from MSSQL2000 > to Postgres :-)! > > Regards > Reto > > > -----Ursprüngliche Nachricht----- > Von: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] > Im Auftrag von A. Kretschmer > Gesendet: Montag, 29. Oktober 2007 11:45 > An: pgsql-sql@postgresql.org > Betreff: Re: [SQL] update from select > > am Mon, dem 29.10.2007, um 10:18:38 +0100 mailte dev@kbsolutions.ch > folgendes: > > > > WHERE table1.column1 = temp_table.column1; > > table1.column1 and temp_table.column1 have the same type? > > > > > > > > > The select by it?s own takes around 1 second. The Update is around 120?000 > > rows. I got an index on column1. The whole query needs around 16 minutes. > > Show us the EXPLAIN ANALYSE - result. > > > Andreas > -- > Andreas Kretschmer > Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) > GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >
<dev@kbsolutions.ch> writes: > Hash Join (cost=10827.45..25950.05 rows=4906 width=1191) (actual > time=586.251..2852.691 rows=111306 loops=1) > ... > Total runtime: 633548.404 ms So you're worried about the wrong thing entirely. The query is taking less than 3 seconds, which may be reasonable considering it's producing 111000 join rows. The big problem is the other 630 seconds, which is evidently update overhead. I'm wondering if you have any triggers or foreign keys leading to or from this table. regards, tom lane