Thread: Update join performance issues

Update join performance issues

From
Kevin Kempter
Date:
Hi All;

I have a query that wants to update a table based on a join like this:

update test_one
set f_key = t.f_key
from
     upd_temp1 t,
     test_one t2
where
     t.id_number = t2.id_number

upd_temp1 has 248,762 rows
test_one has 248,762 rows

test_one has an index on f_key and an index on id_number
upd_temp1 has an index on id_number


The explain plan looks like this:
  Update  (cost=0.00..3212284472.90 rows=256978208226 width=121)
    ->  Nested Loop  (cost=0.00..3212284472.90 rows=256978208226 width=121)
          ->  Merge Join  (cost=0.00..51952.68 rows=1033028 width=20)
                Merge Cond: ((t.id_number)::text = (t2.id_number)::text)
                ->  Index Scan using idx_tmp_001a on upd_temp1 t
(cost=0.00..12642.71 rows=248762 width=
52)
                ->  Materialize  (cost=0.00..23814.54 rows=248762 width=17)
                      ->  Index Scan using index_idx1 on test_one t2
(cost=0.00..23192.64 rows
=248762 width=17)
          ->  Materialize  (cost=0.00..6750.43 rows=248762 width=101)
                ->  Seq Scan on test_one  (cost=0.00..5506.62
rows=248762 width=101)
(9 rows)


The update never finishes, we always stop it after about 30min to an hour.

Anyone have any thoughts per boosting performance?

Thanks in advance




Re: Update join performance issues

From
"Kevin Grittner"
Date:
Kevin Kempter <cs_dba@consistentstate.com> wrote:

> update test_one
> set f_key = t.f_key
> from
>      upd_temp1 t,
>      test_one t2
> where
>      t.id_number = t2.id_number

As written above, it is joining the two table references in the FROM
clause and updating every row in test_one with every row in the JOIN
-- which is probably not what you want.  Having a FROM clause on an
UPDATE statement is not something which is covered by the standard,
and different products have implemented different semantics for
that.  For example, under MS SQL Server, the first reference in the
FROM clause to the target of the UPDATE is considered to be the same
reference; so the above statement would be accepted, but do
something very different.

You probably want this:

update test_one t2
set f_key = t.f_key
from
     upd_temp1 t
where
     t.id_number = t2.id_number

-Kevin

Re: Update join performance issues

From
Andrew Dunstan
Date:

On 04/03/2012 01:29 PM, Kevin Kempter wrote:
> Hi All;
>
> I have a query that wants to update a table based on a join like this:
>
> update test_one
> set f_key = t.f_key
> from
>     upd_temp1 t,
>     test_one t2
> where
>     t.id_number = t2.id_number


Why is test_one in the from clause? update joins whatever is in the from
clause to the table being updated. You almost never need it repeated in
the from clause.


cheers

andrew




Re: Update join performance issues

From
"Kevin Grittner"
Date:
Andrew Dunstan <andrew@dunslane.net> wrote:

> Why is test_one in the from clause? update joins whatever is in
> the from clause to the table being updated. You almost never need
> it repeated in the from clause.

This is actually one of the nastier "gotchas" in converting from
Sybase ASE or MS SQL Server to PostgreSQL -- there are syntactically
identical UPDATE statements with very different semantics when a
FROM clause is used in an UPDATE statement.  You need to do what the
OP was showing to use an alias with the target table under those
other products.

I suppose it might be possible to generate a warning when it appears
that someone is making this mistake, but it wouldn't be easy and
would probably not be worth the carrying cost.  The test would need
to be something like:

(1)  The relation which is the target of the UPDATE has no alias.
(2)  There is a FROM clause which included the target relation (with
     an alias).
(3)  There aren't any joining references between the UPDATE target
     and the relation(s) in the FROM clause.

-Kevin

Re: Update join performance issues

From
Thomas Kellerer
Date:
Kevin Kempter wrote on 03.04.2012 19:29:
> Hi All;
>
> I have a query that wants to update a table based on a join like this:
>
> update test_one
> set f_key = t.f_key
> from
> upd_temp1 t,
> test_one t2
> where
> t.id_number = t2.id_number
>
> upd_temp1 has 248,762 rows
> test_one has 248,762 rows
>

To extend on what Kevin has already answere:

Quote from the manual:
   "Note that the target table must not appear in the from_list, unless you intend a self-join"


Re: Update join performance issues

From
Merlin Moncure
Date:
On Tue, Apr 3, 2012 at 12:29 PM, Kevin Kempter
<cs_dba@consistentstate.com> wrote:
> Hi All;
>
> I have a query that wants to update a table based on a join like this:
>
> update test_one
> set f_key = t.f_key
> from
>    upd_temp1 t,
>    test_one t2
> where
>    t.id_number = t2.id_number
>
> upd_temp1 has 248,762 rows
> test_one has 248,762 rows
>
> test_one has an index on f_key and an index on id_number
> upd_temp1 has an index on id_number
>
>
> The explain plan looks like this:
>  Update  (cost=0.00..3212284472.90 rows=256978208226 width=121)
>   ->  Nested Loop  (cost=0.00..3212284472.90 rows=256978208226 width=121)
>         ->  Merge Join  (cost=0.00..51952.68 rows=1033028 width=20)
>               Merge Cond: ((t.id_number)::text = (t2.id_number)::text)
>               ->  Index Scan using idx_tmp_001a on upd_temp1 t
>  (cost=0.00..12642.71 rows=248762 width=
> 52)
>               ->  Materialize  (cost=0.00..23814.54 rows=248762 width=17)
>                     ->  Index Scan using index_idx1 on test_one t2
>  (cost=0.00..23192.64 rows
> =248762 width=17)
>         ->  Materialize  (cost=0.00..6750.43 rows=248762 width=101)
>               ->  Seq Scan on test_one  (cost=0.00..5506.62 rows=248762
> width=101)
> (9 rows)
>
>
> The update never finishes, we always stop it after about 30min to an hour.
>
> Anyone have any thoughts per boosting performance?

to add:

reading explain output is an art form all onto itself but the
following is a giant screaming red flag:
rows=256978208226

unless of course you're trying to update that many rows, this is
telling you that there is an unconstrained join in there somewhere as
others have noted.

merlin