Thread: update from select

update from select

From
Date:
<div class="Section1"><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB" style="font-size:
9.0pt;font-family:Verdana">Hello</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"
style="font-size:
9.0pt;font-family:Verdana"> </span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"
style="font-size:
9.0pt;font-family:Verdana">I have a performance problem with an SQL statement.</span></font><p class="MsoNormal"><font
face="Verdana"size="1"><span lang="EN-GB" style="font-size: 
9.0pt;font-family:Verdana">Is there a better way to do this update:</span></font><p class="MsoNormal"><font
face="Verdana"size="1"><span lang="EN-GB" style="font-size: 
9.0pt;font-family:Verdana"> </span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"
style="font-size:
9.0pt;font-family:Verdana">UPDATE table1 SET column2 = temp_table.column2, column3 = temp_table.column3, column4 =
CAST(temp_table.column4AS date) FROM</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"
style="font-size:
9.0pt;font-family:Verdana">(</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"
style="font-size:
9.0pt;font-family:Verdana"> SELECT DISTINCT</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span
lang="EN-GB"style="font-size: 
9.0pt;font-family:Verdana"> table2.column1,</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span
lang="EN-GB"style="font-size: 
9.0pt;font-family:Verdana"> table2.column2,</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span
lang="EN-GB"style="font-size: 
9.0pt;font-family:Verdana"> table2.column3,</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span
lang="EN-GB"style="font-size: 
9.0pt;font-family:Verdana"> table2.column4</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span
lang="EN-GB"style="font-size: 
9.0pt;font-family:Verdana"> FROM table2 WHERE column4 IS NOT NULL AND column4 <> '' AND (length(column4) = 10 OR
length(column4)= 23) </span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"
style="font-size:
9.0pt;font-family:Verdana">) AS temp_table</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span
lang="EN-GB"style="font-size: 
9.0pt;font-family:Verdana">WHERE table1.column1 = temp_table.column1;</span></font><p class="MsoNormal"><font
face="Verdana"size="1"><span lang="EN-GB" style="font-size: 
9.0pt;font-family:Verdana"> </span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"
style="font-size:
9.0pt;font-family:Verdana">The select by it’s own takes around 1 second. The Update is around 120’000 rows. I got an
indexon column1. The whole query needs around 16 minutes.</span></font><p class="MsoNormal"><font face="Verdana"
size="1"><spanlang="EN-GB" style="font-size: 
9.0pt;font-family:Verdana">The same procedure on MSSQL needs around 30 seconds. I hope to get it too in
Postgres…</span></font><pclass="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB" style="font-size: 
9.0pt;font-family:Verdana"> </span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"
style="font-size:
9.0pt;font-family:Verdana">Please help me.</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span
lang="EN-GB"style="font-size: 
9.0pt;font-family:Verdana"> </span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"
style="font-size:
9.0pt;font-family:Verdana">Regards</span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"
style="font-size:
9.0pt;font-family:Verdana"> </span></font><p class="MsoNormal"><font face="Verdana" size="1"><span lang="EN-GB"
style="font-size:
9.0pt;font-family:Verdana">Reto</span></font><p class="MsoNormal"><font face="Times New Roman" size="3"><span
style="font-size:
12.0pt"> </span></font></div>

Re: update from select

From
"A. Kretschmer"
Date:
am  Mon, dem 29.10.2007, um 10:18:38 +0100 mailte dev@kbsolutions.ch folgendes:
> 
> WHERE table1.column1 = temp_table.column1;

table1.column1 and temp_table.column1 have the same type?

> 
>  
> 
> The select by it?s own takes around 1 second. The Update is around 120?000
> rows. I got an index on column1. The whole query needs around 16 minutes.

Show us the EXPLAIN ANALYSE - result.


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net


Re: update from select

From
"Pavel Stehule"
Date:
Hello

you use corelated subquery and that is slow for thausands rows. Use
PostgreSQL's extension

UPDATE table1 SET column2 = t,colum2, ....
FROM table2 t
WHERE table1.column1 = t.column1 and t.column4 is not null and ...

http://www.postgresql.org/docs/8.2/interactive/sql-update.html

Regards
Pavel Stehule

> Hello
>
>
>
> I have a performance problem with an SQL statement.
>
> Is there a better way to do this update:
>
>
>
> UPDATE table1 SET column2 = temp_table.column2, column3 =
> temp_table.column3, column4 = CAST(temp_table.column4 AS date) FROM
>
> (
>
>  SELECT DISTINCT
>
>  table2.column1,
>
>  table2.column2,
>
>  table2.column3,
>
>  table2.column4
>
>  FROM table2 WHERE column4 IS NOT NULL AND column4 <> '' AND
> (length(column4) = 10 OR length(column4) = 23)
>
> ) AS temp_table
>
> WHERE table1.column1 = temp_table.column1;
>
>
>
> The select by it's own takes around 1 second. The Update is around 120'000
> rows. I got an index on column1. The whole query needs around 16 minutes.
>
> The same procedure on MSSQL needs around 30 seconds. I hope to get it too in
> Postgres…
>
>
>
> Please help me.
>
>
>
> Regards
>
>
>
> Reto
>
>

Re: update from select

From
Tom Lane
Date:
<dev@kbsolutions.ch> writes:
> Is there a better way to do this update:

> UPDATE table1 SET column2 = temp_table.column2, column3 =
> temp_table.column3, column4 = CAST(temp_table.column4 AS date) FROM
> (
>  SELECT DISTINCT
>  table2.column1,
>  table2.column2,
>  table2.column3,
>  table2.column4
>  FROM table2 WHERE column4 IS NOT NULL AND column4 <> '' AND
> (length(column4) = 10 OR length(column4) = 23) 
> ) AS temp_table
> WHERE table1.column1 = temp_table.column1;

This looks seriously fishy.  Is table2.column1 unique?  If it is then
you don't need the DISTINCT.  If it isn't, you are in great danger of
trying to update (some) table1 rows multiple times; which is bad,
both because it wastes cycles and because you have no idea which of
the matching table2 rows will "win" the update.

I think you first need to think clearly about what you're doing ...
        regards, tom lane


Re: update from select

From
Date:
Yes, both have varchar(50).

Query:

UPDATE owner SET picturemedium = dvds.picturemedium, title = dvds.title,
titleOrder = dvds.title, releasedate = CAST(dvds.releasedate AS date) FROM
(SELECT DISTINCTdetail_dvd.asin,detail_dvd.picturemedium,detail_dvd.title,detail_dvd.releasedateFROM detail_dvd WHERE
releasedateIS NOT NULL AND releasedate <> '' AND 
(length(releasedate) = 10 OR length(releasedate) = 23)
)
AS dvds WHERE owner.asin = dvds.asin;

***********************************************************
EXPLAIN ANALYZE:

Hash Join  (cost=10827.45..25950.05 rows=4906 width=1191) (actual
time=586.251..2852.691 rows=111306 loops=1)
"  Hash Cond: ((""owner"".asin)::text = (dvds.asin)::text)"
"  ->  Seq Scan on ""owner""  (cost=0.00..14148.98 rows=230198 width=101)
(actual time=0.050..968.028 rows=230198 loops=1)" ->  Hash  (cost=10825.02..10825.02 rows=194 width=1208) (actual
time=584.463..584.463 rows=19489 loops=1)       ->  Subquery Scan dvds  (cost=10820.66..10825.02 rows=194
width=1208) (actual time=435.005..545.213 rows=19489 loops=1)             ->  Unique  (cost=10820.66..10823.08 rows=194
width=110)
(actual time=435.002..520.725 rows=19489 loops=1)                   ->  Sort  (cost=10820.66..10821.14 rows=194
width=110)
(actual time=434.998..491.487 rows=19489 loops=1)                         Sort Key: asin, picturemedium, title,
releasedate                        ->  Seq Scan on detail_dvd  (cost=0.00..10813.29 
rows=194 width=110) (actual time=0.042..166.493 rows=19489 loops=1)                               Filter: ((releasedate
ISNOT NULL) AND 
((releasedate)::text <> ''::text) AND ((length((releasedate)::text) = 10) OR
(length((releasedate)::text) = 23)))
Total runtime: 633548.404 ms
***********************************************************

He is not using the index on asin? When I reduce the SELECT to 100, he is
using the index! perhaps to many rows are affected?
The DISTINCT in the SELECT is not really necessary. It's just for security
reasons. And I did check it. It's unique! But as I said before. The SELECT
takes around 1 second! I have 13 indices on the UPDATE table. So I did
delete the one I don’t need for this query... Now I have 3 left! And it
takes around 2 Minutes! But that’s also a problem, because I need the
Indexes again! Is it possible to set the way, Postgres is building the
indices? Or is the only way deleting the indeces before UPDATE and then
creating them again?


I also tried this query (PostgreSQL's extension):

UPDATE owner SET picturemedium = detail_dvd.picturemedium, title =
detail_dvd.title, titleOrder = detail_dvd.title, releasedate =
CAST(detail_dvd.releasedate AS date)
FROM detail_dvd
WHERE owner.asin = detail_dvd.asin
AND detail_dvd.releasedate IS NOT NULL
AND detail_dvd.releasedate <> ''
AND (length(detail_dvd.releasedate) = 10 OR length(detail_dvd.releasedate) =
23);

But its also to slow:

***********************************************************
EXPLAIN ANALYZE:

Nested Loop  (cost=0.00..28175.75 rows=2006 width=195) (actual
time=0.138..127695.132 rows=111306 loops=1) ->  Seq Scan on detail_dvd  (cost=0.00..10813.29 rows=194 width=110)
(actual time=0.035..615.511 rows=19489 loops=1)       Filter: ((releasedate IS NOT NULL) AND ((releasedate)::text <>
''::text) AND ((length((releasedate)::text) = 10) OR
(length((releasedate)::text) = 23)))
"  ->  Index Scan using ""iidx-owner-asin"" on ""owner""  (cost=0.00..89.04
rows=26 width=99) (actual time=2.848..6.485 rows=6 loops=19489)"
"        Index Cond: ((""owner"".asin)::text = (detail_dvd.asin)::text)"
Total runtime: 1039998.325 ms
***********************************************************


Thaks for helping!! Bye the way, we are changing our system from MSSQL2000
to Postgres :-)!

Regards
Reto


-----Ursprüngliche Nachricht-----
Von: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org]
Im Auftrag von A. Kretschmer
Gesendet: Montag, 29. Oktober 2007 11:45
An: pgsql-sql@postgresql.org
Betreff: Re: [SQL] update from select

am  Mon, dem 29.10.2007, um 10:18:38 +0100 mailte dev@kbsolutions.ch
folgendes:
>
> WHERE table1.column1 = temp_table.column1;

table1.column1 and temp_table.column1 have the same type?

>
>
>
> The select by it?s own takes around 1 second. The Update is around 120?000
> rows. I got an index on column1. The whole query needs around 16 minutes.

Show us the EXPLAIN ANALYSE - result.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at
               http://www.postgresql.org/about/donate



Re: update from select

From
"Pavel Stehule"
Date:
2007/10/29, dev@kbsolutions.ch <dev@kbsolutions.ch>:
>
> Yes, both have varchar(50).
>
> Query:
>
> UPDATE owner SET picturemedium = dvds.picturemedium, title = dvds.title,
> titleOrder = dvds.title, releasedate = CAST(dvds.releasedate AS date) FROM
> (
>         SELECT DISTINCT
>         detail_dvd.asin,
>         detail_dvd.picturemedium,
>         detail_dvd.title,
>         detail_dvd.releasedate
>         FROM detail_dvd
>         WHERE releasedate IS NOT NULL AND releasedate <> '' AND
> (length(releasedate) = 10 OR length(releasedate) = 23)
> )
> AS dvds WHERE owner.asin = dvds.asin;
>
> ***********************************************************
> EXPLAIN ANALYZE:
>
> Hash Join  (cost=10827.45..25950.05 rows=4906 width=1191) (actual
> time=586.251..2852.691 rows=111306 loops=1)
> "  Hash Cond: ((""owner"".asin)::text = (dvds.asin)::text)"
> "  ->  Seq Scan on ""owner""  (cost=0.00..14148.98 rows=230198 width=101)
> (actual time=0.050..968.028 rows=230198 loops=1)"
>   ->  Hash  (cost=10825.02..10825.02 rows=194 width=1208) (actual
> time=584.463..584.463 rows=19489 loops=1)
>         ->  Subquery Scan dvds  (cost=10820.66..10825.02 rows=194
> width=1208) (actual time=435.005..545.213 rows=19489 loops=1)
>               ->  Unique  (cost=10820.66..10823.08 rows=194 width=110)
> (actual time=435.002..520.725 rows=19489 loops=1)
>                     ->  Sort  (cost=10820.66..10821.14 rows=194 width=110)
> (actual time=434.998..491.487 rows=19489 loops=1)
>                           Sort Key: asin, picturemedium, title, releasedate
>                           ->  Seq Scan on detail_dvd  (cost=0.00..10813.29
> rows=194 width=110) (actual time=0.042..166.493 rows=19489 loops=1)
>                                 Filter: ((releasedate IS NOT NULL) AND
> ((releasedate)::text <> ''::text) AND ((length((releasedate)::text) = 10) OR
> (length((releasedate)::text) = 23)))
> Total runtime: 633548.404 ms
> ***********************************************************
>
> He is not using the index on asin? When I reduce the SELECT to 100, he is
> using the index! perhaps to many rows are affected?
> The DISTINCT in the SELECT is not really necessary. It's just for security
> reasons. And I did check it. It's unique! But as I said before. The SELECT
> takes around 1 second! I have 13 indices on the UPDATE table. So I did
> delete the one I don't need for this query... Now I have 3 left! And it
> takes around 2 Minutes! But that's also a problem, because I need the
> Indexes again! Is it possible to set the way, Postgres is building the
> indices? Or is the only way deleting the indeces before UPDATE and then
> creating them again?
>
>
> I also tried this query (PostgreSQL's extension):
>
> UPDATE owner SET picturemedium = detail_dvd.picturemedium, title =
> detail_dvd.title, titleOrder = detail_dvd.title, releasedate =
> CAST(detail_dvd.releasedate AS date)
> FROM detail_dvd
> WHERE owner.asin = detail_dvd.asin
> AND detail_dvd.releasedate IS NOT NULL
> AND detail_dvd.releasedate <> ''
> AND (length(detail_dvd.releasedate) = 10 OR length(detail_dvd.releasedate) =
> 23);
>
> But its also to slow:
>
> ***********************************************************
> EXPLAIN ANALYZE:
>
> Nested Loop  (cost=0.00..28175.75 rows=2006 width=195) (actual
> time=0.138..127695.132 rows=111306 loops=1)
>   ->  Seq Scan on detail_dvd  (cost=0.00..10813.29 rows=194 width=110)
> (actual time=0.035..615.511 rows=19489 loops=1)
>         Filter: ((releasedate IS NOT NULL) AND ((releasedate)::text <>
> ''::text) AND ((length((releasedate)::text) = 10) OR
> (length((releasedate)::text) = 23)))
> "  ->  Index Scan using ""iidx-owner-asin"" on ""owner""  (cost=0.00..89.04
> rows=26 width=99) (actual time=2.848..6.485 rows=6 loops=19489)"
> "        Index Cond: ((""owner"".asin)::text = (detail_dvd.asin)::text)"
> Total runtime: 1039998.325 ms
> ***********************************************************
>

try to up statististics on table detail_dvd.release_date. Maybe there
is other problem. The casting from (probably) date to text in
releasedate column. Is it correct?

what is original type for releasedate column?

Pavel
>
> Thaks for helping!! Bye the way, we are changing our system from MSSQL2000
> to Postgres :-)!
>
> Regards
> Reto
>
>
> -----Ursprüngliche Nachricht-----
> Von: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org]
> Im Auftrag von A. Kretschmer
> Gesendet: Montag, 29. Oktober 2007 11:45
> An: pgsql-sql@postgresql.org
> Betreff: Re: [SQL] update from select
>
> am  Mon, dem 29.10.2007, um 10:18:38 +0100 mailte dev@kbsolutions.ch
> folgendes:
> >
> > WHERE table1.column1 = temp_table.column1;
>
> table1.column1 and temp_table.column1 have the same type?
>
> >
> >
> >
> > The select by it?s own takes around 1 second. The Update is around 120?000
> > rows. I got an index on column1. The whole query needs around 16 minutes.
>
> Show us the EXPLAIN ANALYSE - result.
>
>
> Andreas
> --
> Andreas Kretschmer
> Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
> GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
>                 http://www.postgresql.org/about/donate
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
>

Re: update from select

From
Tom Lane
Date:
<dev@kbsolutions.ch> writes:
> Hash Join  (cost=10827.45..25950.05 rows=4906 width=1191) (actual
> time=586.251..2852.691 rows=111306 loops=1)
> ...
> Total runtime: 633548.404 ms

So you're worried about the wrong thing entirely.  The query is taking
less than 3 seconds, which may be reasonable considering it's producing
111000 join rows.  The big problem is the other 630 seconds, which is
evidently update overhead.  I'm wondering if you have any triggers or
foreign keys leading to or from this table.
        regards, tom lane