how to efficiently update tuple in many-to-many relationship? - Mailing list pgsql-performance

From Drew Wilson
Subject how to efficiently update tuple in many-to-many relationship?
Date
Msg-id 233A9987-ABA5-4491-A304-FDA1F8FCD962@gmail.com
Whole thread Raw
Responses Re: how to efficiently update tuple in many-to-many relationship?
Re: how to efficiently update tuple in many-to-many relationship?
List pgsql-performance
I have 2 tables (A,B) joined in a many-to-many relationship via a
join table ("membership"), where updating table A based on table B
takes a very long time.

Tables A and B have oid primary keys (a_id and b_id respectively).
The join table, "membership", has its own oid primary key
(membership_id), as well as foreign keys "a_id" and "b_id".

A SELECT query across all 3 tables takes 12 seconds.
"SELECT count(*) FROM a JOIN membership USING(a_id) JOIN b USING
(b_id) WHERE b.is_public = true"

But a simple UPDATE using the same SELECT query takes 30 minutes to
an hour.
"UPDATE A set is_public=true WHERE a_id IN (SELECT count(*) FROM a
JOIN membership USING(a_id) JOIN b USING(b_id) WHERE b.is_public =
true)".

What am I doing wrong here? I'm not sure how to diagnose this further.

Here's the output from explain:
db=# EXPLAIN SELECT a_id  FROM a JOIN membership USING(a_id) JOIN b
USING(b_id) WHERE b.is_public = true;
------------------------------------------------------------------------
-----------------------------------------
Hash Join  (cost=167154.78..173749.48 rows=51345 width=4)
    Hash Cond: (a.a_id = membership.a_id)
    ->  Function Scan on a  (cost=0.00..12.50 rows=1000 width=4)
    ->  Hash  (cost=144406.48..144406.48 rows=1819864 width=4)
          ->  Hash Join  (cost=417.91..144406.48 rows=1819864 width=4)
                Hash Cond: (membership.b_id = b.b_id)
                ->  Seq Scan on membership  (cost=0.00..83623.83
rows=4818983 width=8)
                ->  Hash  (cost=348.52..348.52 rows=5551 width=4)
                      ->  Index Scan using b_is_public on b
(cost=0.00..348.52 rows=5551 width=4)
                            Index Cond: (is_public = true)
                            Filter: is_public
(11 rows)


db=# EXPLAIN UPDATE a SET is_public = true WHERE a_id IN
         ( SELECT a_id FROM a JOIN membership USING(a_id) JOIN b USING
(b_id) WHERE b.is_public = true);
------------------------------------------------------------------------
-----------------------------------------
hash in join  (cost=281680.17..370835.63 rows=1819864 width=90)
    hash cond: (public.a.a_id = public.a.a_id)
    ->  seq scan on a  (cost=0.00..47362.09 rows=2097309 width=90)
    ->  hash  (cost=258931.87..258931.87 rows=1819864 width=8)
          ->  hash join  (cost=73996.36..258931.87 rows=1819864 width=8)
                hash cond: (membership.a_id = public.a.a_id)
                ->  hash join  (cost=417.91..144406.48 rows=1819864
width=4)
                      hash cond: (membership.b_id = b.b_id)
                      ->  seq scan on membership
(cost=0.00..83623.83 rows=4818983 width=8)
                      ->  hash  (cost=348.52..348.52 rows=5551 width=4)
                            ->  index scan using
loc_submission_is_public on b  (cost=0.00..348.52 rows=5551 width=4)
                                  index cond: (is_public = true)
                                  filter: is_public
                ->  hash  (cost=47362.09..47362.09 rows=2097309 width=4)
                      ->  seq scan on a  (cost=0.00..47362.09
rows=2097309 width=4)

Thanks,

Drew

pgsql-performance by date:

Previous
From: "Alex Deucher"
Date:
Subject: Re: postgres 8.2 seems to prefer Seq Scan
Next
From: Tom Lane
Date:
Subject: Re: DELETE with filter on ctid