Re: Very slow update + not using clustered index - Mailing list pgsql-performance

From Tom Lane
Subject Re: Very slow update + not using clustered index
Date
Msg-id 2966.1073016371@sss.pgh.pa.us
Whole thread Raw
In response to Very slow update + not using clustered index  (Mike Glover <mpg4@duluoz.net>)
Responses Re: Very slow update + not using clustered index
List pgsql-performance
Mike Glover <mpg4@duluoz.net> writes:
> I want to run the following query, but it takes a *very* long time.
> Like this:
> bookshelf=> explain analyze update summary set price_min=0,
> availability=2, condition=9 where isbn = inventory.isbn and price_min =
> inventory.price;
> ...
> Total runtime: 3162319.477 ms(9 rows)

> Running what I believe to be the comparable select query is more
> reasonable:

> bookshelf=> explain analyze select s.* from summary s, inventory i where
> s.isbn = i.isbn and s.price_min = i.price;
> ...
> Total runtime: 216324.171 ms

AFAICS these plans are identical, and therefore the difference in
runtime must be ascribed to the time spent actually doing the updates.
It seems unlikely that the raw row inserts and updating the single
index could be quite that slow --- perhaps you have a foreign key
or trigger performance problem?

> So, my first question is: why is the planner still sorting on price when
> isbn seems (considerably) quicker, and how can I force it to sort by
> isbn(if I even should)?

Is this PG 7.4?  It looks to me like the planner *should* consider both
possible orderings of the mergejoin sort keys.  I'm not sure that it
knows enough to realize that the key with more distinct values should be
put first, however.

A quick experiment shows that if the planner does not have any reason to
prefer one ordering over another, the current coding will put the last
WHERE clause first:

regression=# create table t1(f1 int, f2 int);
CREATE TABLE
regression=# create table t2(f1 int, f2 int);
CREATE TABLE
regression=# explain select * from t1,t2 where t1.f1=t2.f1 and t1.f2=t2.f2;
                               QUERY PLAN
-------------------------------------------------------------------------
 Merge Join  (cost=139.66..154.91 rows=25 width=16)
   Merge Cond: (("outer".f2 = "inner".f2) AND ("outer".f1 = "inner".f1))
   ->  Sort  (cost=69.83..72.33 rows=1000 width=8)
         Sort Key: t1.f2, t1.f1
         ->  Seq Scan on t1  (cost=0.00..20.00 rows=1000 width=8)
   ->  Sort  (cost=69.83..72.33 rows=1000 width=8)
         Sort Key: t2.f2, t2.f1
         ->  Seq Scan on t2  (cost=0.00..20.00 rows=1000 width=8)
(8 rows)

regression=# explain select * from t1,t2 where t1.f2=t2.f2 and t1.f1=t2.f1;
                               QUERY PLAN
-------------------------------------------------------------------------
 Merge Join  (cost=139.66..154.91 rows=25 width=16)
   Merge Cond: (("outer".f1 = "inner".f1) AND ("outer".f2 = "inner".f2))
   ->  Sort  (cost=69.83..72.33 rows=1000 width=8)
         Sort Key: t1.f1, t1.f2
         ->  Seq Scan on t1  (cost=0.00..20.00 rows=1000 width=8)
   ->  Sort  (cost=69.83..72.33 rows=1000 width=8)
         Sort Key: t2.f1, t2.f2
         ->  Seq Scan on t2  (cost=0.00..20.00 rows=1000 width=8)
(8 rows)

and so you could probably improve matters just by switching the order of
your WHERE clauses.  Of course this answer will break as soon as anyone
touches any part of the related code, so I'd like to try to fix it so
that there is actually a principled choice made.  Could you send along
the pg_stats rows for these columns?

> The second question is:  why, oh why does the update take such and
> obscenely long time to complete?

See above --- the problem is not within the plan, but must be sought
elsewhere.

            regards, tom lane

pgsql-performance by date:

Previous
From: Mike Glover
Date:
Subject: Very slow update + not using clustered index
Next
From: Mike Glover
Date:
Subject: Re: Very slow update + not using clustered index