Re: how to efficiently update tuple in many-to-many relationship? - Mailing list pgsql-performance

From Drew Wilson
Subject Re: how to efficiently update tuple in many-to-many relationship?
Date
Msg-id FCEF7DE2-913F-4C36-B117-8363EDE39BF3@gmail.com
In response to Re: how to efficiently update tuple in many-to-many relationship?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: how to efficiently update tuple in many-to-many relationship?
List pgsql-performance
On Apr 10, 2007, at 6:54 AM, Tom Lane wrote:

> Drew Wilson <drewmwilson@gmail.com> writes:
>> The SELECT is not slow, so it's a side effect of the update... Looking
>> at the table definition, there is a "BEFORE ON DELETE" trigger
>> defined, two CHECK constraints for this table, and three foreign
>> keys. Nothing looks suspicious to me.
>
> Since this is an update we can ignore the before-delete trigger, and
> the check constraints don't look expensive to test.  Outgoing foreign
> key references are normally not a problem either, since there must
> be an index on the other end.  But *incoming* foreign key references
> might be an issue --- are there any linking to this table?
There is only one incoming foreign key - the one coming in from the
many-to-many join table ('instance').
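
(For anyone checking the same thing: the index Tom is describing sits on the *referencing* column of the join table, so the RI check for each updated row is an index probe rather than a scan. The index name below is illustrative; the table and column names are the ones used elsewhere in this thread.)

```sql
-- Incoming FK from the many-to-many join table. Without an index on
-- the referencing column, every referential check against an updated
-- translation_pair_data row would have to scan 'instance'.
CREATE INDEX instance_translation_pair_id_idx
    ON instance (translation_pair_id);
```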

>
> Also, the seven indexes seem a bit excessive.  I'm not sure if that's
> where the update time is going, but they sure aren't helping, and
> some of them seem redundant anyway.  In particular I think that the
> partial index WHERE obsolete IS NOT TRUE is probably a waste (do you
> have any queries you know use it? what do they look like?) and you
> probably don't need all three combinations of source_id and
> translation_id --- see discussion here:
> http://www.postgresql.org/docs/8.2/static/indexes-bitmap-scans.html
99% of our queries use obsolete IS NOT TRUE, so we have an index on
this.
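
(Roughly what that partial index looks like; the indexed columns here are an assumption on my part, only the predicate is established in this thread. The planner will only use it when a query's WHERE clause lets it prove the predicate holds.)

```sql
-- Partial index covering the 99% case. Queries must include a clause
-- implying "obsolete IS NOT TRUE" (e.g. that exact expression) for the
-- planner to consider this index.
CREATE INDEX translation_pair_not_obsolete_idx
    ON translation_pair_data (source_id)
    WHERE obsolete IS NOT TRUE;
```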

> BTW, I don't think you ever mentioned what PG version this is exactly?
> If it's 8.1 or later it would be worth slogging through EXPLAIN ANALYZE
> on the update, or maybe an update of 10% or so of the rows if you're
> impatient.  That would break out the time spent in the triggers, which
> would let us eliminate them (or not) as the cause of the problem.
Sorry. I'm using 8.2.3 on Mac OS X 10.4.9, w/ a 2 GHz Intel Core Duo
and 2 GB RAM.

If I understand the EXPLAIN ANALYZE results below correctly, the time
spent applying the "SET is_public = true" is far greater than the time
spent fetching the rows. I don't see any triggers firing. Is there
something else I can look for in the logs?

Here is the explain analyze output:
MatchBox=# EXPLAIN ANALYZE UPDATE translation_pair_data SET is_public = true
          WHERE translation_pair_id IN
             (SELECT translation_pair_id FROM translation_pair_data
                  JOIN instance i USING (translation_pair_id)
                  JOIN loc_submission ls USING (loc_submission_id)
                      WHERE ls.is_public = true);
                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Hash IN Join  (cost=324546.91..457218.64 rows=1698329 width=90) (actual time=12891.309..33621.801 rows=637712 loops=1)
   Hash Cond: (public.translation_pair_data.translation_pair_id = public.translation_pair_data.translation_pair_id)
   ->  Seq Scan on translation_pair_data  (cost=0.00..38494.29 rows=1698329 width=90) (actual time=0.045..19352.184 rows=1690272 loops=1)
   ->  Hash  (cost=290643.93..290643.93 rows=2006718 width=8) (actual time=10510.411..10510.411 rows=1207161 loops=1)
         ->  Hash Join  (cost=66710.78..290643.93 rows=2006718 width=8) (actual time=1810.299..9821.862 rows=1207161 loops=1)
               Hash Cond: (i.translation_pair_id = public.translation_pair_data.translation_pair_id)
               ->  Hash Join  (cost=352.38..169363.36 rows=2006718 width=4) (actual time=11.369..6273.439 rows=1207161 loops=1)
                     Hash Cond: (i.loc_submission_id = ls.loc_submission_id)
                     ->  Seq Scan on instance i  (cost=0.00..99016.16 rows=5706016 width=8) (actual time=0.029..3774.705 rows=5705932 loops=1)
                     ->  Hash  (cost=283.23..283.23 rows=5532 width=4) (actual time=11.277..11.277 rows=5563 loops=1)
                           ->  Index Scan using loc_submission_is_public on loc_submission ls  (cost=0.00..283.23 rows=5532 width=4) (actual time=0.110..7.717 rows=5563 loops=1)
                                 Index Cond: (is_public = true)
                                 Filter: is_public
               ->  Hash  (cost=38494.29..38494.29 rows=1698329 width=4) (actual time=1796.574..1796.574 rows=1690272 loops=1)
                     ->  Seq Scan on translation_pair_data  (cost=0.00..38494.29 rows=1698329 width=4) (actual time=0.012..917.006 rows=1690272 loops=1)
 Total runtime: 1008985.005 ms
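
(Note the gap: the join itself finishes in ~33.6 s, but total runtime is ~1009 s, so the bulk of the time is in the per-row update work, e.g. maintaining the seven indexes and any constraint checks. A rough way to get Tom's suggested 10%-slice trigger breakdown without committing anything, assuming translation_pair_id is an integer; the modulo predicate is just an arbitrary way to pick roughly a tenth of the rows:)

```sql
BEGIN;
-- On 8.1+, EXPLAIN ANALYZE reports trigger times separately,
-- which would confirm or rule out the FK checks as the cost.
EXPLAIN ANALYZE UPDATE translation_pair_data SET is_public = true
    WHERE translation_pair_id % 10 = 0;  -- illustrative ~10% slice
ROLLBACK;
```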

Thanks for your help,

Drew
