Re: Very slow update + not using clustered index - Mailing list pgsql-performance
From | Tom Lane |
---|---|
Subject | Re: Very slow update + not using clustered index |
Date | |
Msg-id | 2966.1073016371@sss.pgh.pa.us |
In response to | Very slow update + not using clustered index (Mike Glover <mpg4@duluoz.net>) |
Responses | Re: Very slow update + not using clustered index |
List | pgsql-performance |
Mike Glover <mpg4@duluoz.net> writes:
> I want to run the following query, but it takes a *very* long time.
> Like this:
> bookshelf=> explain analyze update summary set price_min=0,
> availability=2, condition=9 where isbn = inventory.isbn and price_min =
> inventory.price;
> ...
> Total runtime: 3162319.477 ms
> (9 rows)

> Running what I believe to be the comparable select query is more
> reasonable:
> bookshelf=> explain analyze select s.* from summary s, inventory i where
> s.isbn = i.isbn and s.price_min = i.price;
> ...
> Total runtime: 216324.171 ms

AFAICS these plans are identical, and therefore the difference in runtime must be ascribed to the time spent actually doing the updates. It seems unlikely that the raw row inserts and updating the single index could be quite that slow --- perhaps you have a foreign key or trigger performance problem?

> So, my first question is: why is the planner still sorting on price when
> isbn seems (considerably) quicker, and how can I force it to sort by
> isbn (if I even should)?

Is this PG 7.4? It looks to me like the planner *should* consider both possible orderings of the mergejoin sort keys. I'm not sure that it knows enough to realize that the key with more distinct values should be put first, however.
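The version question and the foreign key/trigger suspicion raised above can both be checked from psql. The following is a minimal sketch using the standard system catalogs; it assumes `summary` is reachable on the default search_path, and catalog column details may differ between releases.

```sql
-- Server version (answers "Is this PG 7.4?").
SELECT version();

-- Triggers defined on summary. In older releases, foreign keys were
-- enforced by internal RI triggers, so those show up here as well.
SELECT tgname
FROM pg_trigger
WHERE tgrelid = 'summary'::regclass;

-- Foreign-key constraints involving summary in either direction:
-- summary referencing another table, or another table referencing summary.
SELECT conname,
       conrelid::regclass  AS on_table,
       confrelid::regclass AS references_table
FROM pg_constraint
WHERE contype = 'f'
  AND (conrelid = 'summary'::regclass OR confrelid = 'summary'::regclass);
```

In psql, `\d summary` gives a quicker, if less targeted, view of the same information.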
A quick experiment shows that if the planner does not have any reason to prefer one ordering over another, the current coding will put the last WHERE clause first:

    regression=# create table t1(f1 int, f2 int);
    CREATE TABLE
    regression=# create table t2(f1 int, f2 int);
    CREATE TABLE
    regression=# explain select * from t1,t2 where t1.f1=t2.f1 and t1.f2=t2.f2;
                                    QUERY PLAN
    -------------------------------------------------------------------------
     Merge Join  (cost=139.66..154.91 rows=25 width=16)
       Merge Cond: (("outer".f2 = "inner".f2) AND ("outer".f1 = "inner".f1))
       ->  Sort  (cost=69.83..72.33 rows=1000 width=8)
             Sort Key: t1.f2, t1.f1
             ->  Seq Scan on t1  (cost=0.00..20.00 rows=1000 width=8)
       ->  Sort  (cost=69.83..72.33 rows=1000 width=8)
             Sort Key: t2.f2, t2.f1
             ->  Seq Scan on t2  (cost=0.00..20.00 rows=1000 width=8)
    (8 rows)

    regression=# explain select * from t1,t2 where t1.f2=t2.f2 and t1.f1=t2.f1;
                                    QUERY PLAN
    -------------------------------------------------------------------------
     Merge Join  (cost=139.66..154.91 rows=25 width=16)
       Merge Cond: (("outer".f1 = "inner".f1) AND ("outer".f2 = "inner".f2))
       ->  Sort  (cost=69.83..72.33 rows=1000 width=8)
             Sort Key: t1.f1, t1.f2
             ->  Seq Scan on t1  (cost=0.00..20.00 rows=1000 width=8)
       ->  Sort  (cost=69.83..72.33 rows=1000 width=8)
             Sort Key: t2.f1, t2.f2
             ->  Seq Scan on t2  (cost=0.00..20.00 rows=1000 width=8)
    (8 rows)

and so you could probably improve matters just by switching the order of your WHERE clauses. Of course this answer will break as soon as anyone touches any part of the related code, so I'd like to try to fix it so that there is actually a principled choice made. Could you send along the pg_stats rows for these columns?

> The second question is: why, oh why does the update take such an
> obscenely long time to complete?

See above --- the problem is not within the plan, but must be sought elsewhere.

regards, tom lane
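The two follow-ups suggested in this reply can be sketched as below. The first statement reruns the UPDATE with the join clauses swapped so that the isbn comparison comes last. Per the experiment, that clause would then be expected to become the leading merge-join sort key, but only in a planner version that behaves as shown. The second statement fetches the requested pg_stats rows. The explicit `FROM inventory` clause is an assumption of this sketch, since the original statement relied on an implicit FROM entry for `inventory`; table and column names are taken from the quoted queries.

```sql
-- Swapped clause order: isbn last, so (per the experiment) isbn should
-- become the first merge-join sort key when nothing else decides it.
-- EXPLAIN ANALYZE really executes the UPDATE, so roll it back afterwards.
BEGIN;
EXPLAIN ANALYZE
UPDATE summary
SET price_min = 0, availability = 2, condition = 9
FROM inventory
WHERE summary.price_min = inventory.price
  AND summary.isbn = inventory.isbn;
ROLLBACK;

-- Planner statistics for the join columns on both tables.
SELECT tablename, attname, null_frac, n_distinct, correlation
FROM pg_stats
WHERE (tablename = 'summary'   AND attname IN ('isbn', 'price_min'))
   OR (tablename = 'inventory' AND attname IN ('isbn', 'price'));
```

Swapping the clauses can affect only the join's sort order. If most of the time goes into the updates themselves, as argued above, the total runtime may barely change.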