Thread: Update join performance issues
Hi All; I have a query that wants to update a table based on a join like this: update test_one set f_key = t.f_key from upd_temp1 t, test_one t2 where t.id_number = t2.id_number upd_temp1 has 248,762 rows test_one has 248,762 rows test_one has an index on f_key and an index on id_number upd_temp1 has an index on id_number The explain plan looks like this: Update (cost=0.00..3212284472.90 rows=256978208226 width=121) -> Nested Loop (cost=0.00..3212284472.90 rows=256978208226 width=121) -> Merge Join (cost=0.00..51952.68 rows=1033028 width=20) Merge Cond: ((t.id_number)::text = (t2.id_number)::text) -> Index Scan using idx_tmp_001a on upd_temp1 t (cost=0.00..12642.71 rows=248762 width= 52) -> Materialize (cost=0.00..23814.54 rows=248762 width=17) -> Index Scan using index_idx1 on test_one t2 (cost=0.00..23192.64 rows =248762 width=17) -> Materialize (cost=0.00..6750.43 rows=248762 width=101) -> Seq Scan on test_one (cost=0.00..5506.62 rows=248762 width=101) (9 rows) The update never finishes, we always stop it after about 30min to an hour. Anyone have any thoughts per boosting performance? Thanks in advance
Kevin Kempter <cs_dba@consistentstate.com> wrote: > update test_one > set f_key = t.f_key > from > upd_temp1 t, > test_one t2 > where > t.id_number = t2.id_number As written above, it is joining the two table references in the FROM clause and updating every row in test_one with every row in the JOIN -- which is probably not what you want. Having a FROM clause on an UPDATE statement is not something which is covered by the standard, and different products have implemented different semantics for that. For example, under MS SQL Server, the first reference in the FROM clause to the target of the UPDATE is considered to be the same reference; so the above statement would be accepted, but do something very different. You probably want this: update test_one t2 set f_key = t.f_key from upd_temp1 t where t.id_number = t2.id_number -Kevin
On 04/03/2012 01:29 PM, Kevin Kempter wrote: > Hi All; > > I have a query that wants to update a table based on a join like this: > > update test_one > set f_key = t.f_key > from > upd_temp1 t, > test_one t2 > where > t.id_number = t2.id_number Why is test_one in the from clause? update joins whatever is in the from clause to the table being updated. You almost never need it repeated in the from clause. cheers andrew
Andrew Dunstan <andrew@dunslane.net> wrote: > Why is test_one in the from clause? update joins whatever is in > the from clause to the table being updated. You almost never need > it repeated in the from clause. This is actually one of the nastier "gotchas" in converting from Sybase ASE or MS SQL Server to PostgreSQL -- there are syntactically identical UPDATE statements with very different semantics when a FROM clause is used in an UPDATE statement. You need to do what the OP was showing to use an alias with the target table under those other products. I suppose it might be possible to generate a warning when it appears that someone is making this mistake, but it wouldn't be easy and would probably not be worth the carrying cost. The test would need to be something like: (1) The relation which is the target of the UPDATE has no alias. (2) There is a FROM clause which included the target relation (with an alias). (3) There aren't any joining references between the UPDATE target and the relation(s) in the FROM clause. -Kevin
Kevin Kempter wrote on 03.04.2012 19:29: > Hi All; > > I have a query that wants to update a table based on a join like this: > > update test_one > set f_key = t.f_key > from > upd_temp1 t, > test_one t2 > where > t.id_number = t2.id_number > > upd_temp1 has 248,762 rows > test_one has 248,762 rows > To extend on what Kevin has already answere: Quote from the manual: "Note that the target table must not appear in the from_list, unless you intend a self-join"
On Tue, Apr 3, 2012 at 12:29 PM, Kevin Kempter <cs_dba@consistentstate.com> wrote: > Hi All; > > I have a query that wants to update a table based on a join like this: > > update test_one > set f_key = t.f_key > from > upd_temp1 t, > test_one t2 > where > t.id_number = t2.id_number > > upd_temp1 has 248,762 rows > test_one has 248,762 rows > > test_one has an index on f_key and an index on id_number > upd_temp1 has an index on id_number > > > The explain plan looks like this: > Update (cost=0.00..3212284472.90 rows=256978208226 width=121) > -> Nested Loop (cost=0.00..3212284472.90 rows=256978208226 width=121) > -> Merge Join (cost=0.00..51952.68 rows=1033028 width=20) > Merge Cond: ((t.id_number)::text = (t2.id_number)::text) > -> Index Scan using idx_tmp_001a on upd_temp1 t > (cost=0.00..12642.71 rows=248762 width= > 52) > -> Materialize (cost=0.00..23814.54 rows=248762 width=17) > -> Index Scan using index_idx1 on test_one t2 > (cost=0.00..23192.64 rows > =248762 width=17) > -> Materialize (cost=0.00..6750.43 rows=248762 width=101) > -> Seq Scan on test_one (cost=0.00..5506.62 rows=248762 > width=101) > (9 rows) > > > The update never finishes, we always stop it after about 30min to an hour. > > Anyone have any thoughts per boosting performance? to add: reading explain output is an art form all onto itself but the following is a giant screaming red flag: rows=256978208226 unless of course you're trying to update that many rows, this is telling you that there is an unconstrained join in there somewhere as others have noted. merlin