Very slow update + not using clustered index - Mailing list pgsql-performance
From | Mike Glover |
---|---|
Subject | Very slow update + not using clustered index |
Date | |
Msg-id | 20040101193401.6640420b.mpg4@duluoz.net Whole thread Raw |
Responses |
Re: Very slow update + not using clustered index
|
List | pgsql-performance |
I have these two tables: Table "de.summary" Column | Type | Modifiers --------------+-----------------------------+--------------- isbn | character varying(10) | not null source | character varying(20) | not null condition | smallint | availability | smallint | price_list | numeric(11,2) | price_min | numeric(11,2) | last_update | timestamp without time zone | default now() Indexes: "summary_pkey" primary key, btree (isbn, source) Table "de.inventory" Column | Type | Modifiers --------------+-----------------------+----------- isbn | character varying(10) | condition | integer | availability | integer | price | numeric(9,2) | Indexes: "inventory_isbn_idx" btree (isbn) Both tables are clustered on their respective indexes. The entire database has been freshly VACUUM FULL'd and ANALYZE'd (after clustering). I want to run the following query, but it takes a *very* long time. Like this: bookshelf=> explain analyze update summary set price_min=0, availability=2, condition=9 where isbn = inventory.isbn and price_min = inventory.price; QUERY PLAN ----------------------------------------------------------------- ----------------------------------------------------------- Merge Join (cost=496170.66..517271.50 rows=5051 width=51) (actual time=226940.723..292247.643 rows=419277 loops=1) Merge Cond: (("outer".price_min = "inner".price) AND ("outer"."?column7?" = "inner"."?column3?")) -> Sort (cost=366877.05..371990.05 rows=2045201 width=61) (actual time=162681.929..177216.158 rows=2045200 loops=1) Sort Key: summary.price_min, (summary.isbn)::text -> Seq Scan on summary (cost=0.00..44651.01 rows=2045201 width=61) (actual time=8.139..22179.379 rows=2045201 loops=1) -> Sort(cost=129293.61..131499.09 rows=882192 width=25) (actual time=64213.663..67563.175 rows=882192 loops=1) Sort Key: inventory.price, (inventory.isbn)::text -> Seq Scan on inventory(cost=0.00..16173.92 rows=882192 width=25)(actual time=5.773..21548.942 rows=882192 loops=1) Total runtime: 3162319.477 ms(9 rows) Running what I believe to be the comparable select query is more reasonable: bookshelf=> explain analyze select s.* from summary s, inventory i where s.isbn = i.isbn and s.price_min = i.price; QUERY PLAN ----------------------------------------------------------------------- Merge Join (cost=495960.66..517061.50 rows=5051 width=59) (actual time=194048.974..215835.982 rows=419277 loops=1) Merge Cond: (("outer".price_min = "inner".price) AND ("outer"."?column8?" ="inner"."?column3?")) -> Sort (cost=366667.05..371780.05 rows=2045201 width=59) (actual time=147678.109..149945.170 rows=2045200 loops=1) Sort Key: s.price_min, (s.isbn)::text -> Seq Scan on summary s (cost=0.00..49431.01 rows=2045201 width=59) (actual time=0.056..9304.803 rows=2045201 loops=1) -> Sort (cost=129293.61..131499.09 rows=882192 width=25) (actual time=46338.696..47183.739 rows=882192 loops=1) Sort Key: i.price, (i.isbn)::text -> Seq Scan on inventory i (cost=0.00..16173.92 rows=882192 width=25) (actual time=0.089..2419.187 rows=882192 loops=1) Total runtime: 216324.171 ms I had figured that the tables would get sorted on isbn, because of the clustering. I understand why price might get chosen (fewer matches), but the planner seems to be making the wrong choice: bookshelf=> explain analyze select s.* from summary s, inventory i where s.isbn = i.isbn; QUERY PLAN ----------------------------------------------------------------------- Merge Join (cost=489500.66..512953.69 rows=882192 width=59) (actual time=152247.741..174408.812 rows=882192 loops=1) Merge Cond: ("outer"."?column8?" = "inner"."?column2?") -> Sort (cost=366667.05..371780.05 rows=2045201 width=59) (actual time=118562.097..120817.894 rows=2045146 loops=1) Sort Key:(s.isbn)::text -> Seq Scan on summary s (cost=0.00..49431.01 rows=2045201 width=59) (actual time=0.062..8766.683 rows=2045201 loops=1) -> Sort (cost=122833.61..125039.09 rows=882192 width=14)(actual time=33685.455..34480.190 rows=882192 loops=1) Sort Key:(i.isbn)::text -> Seq Scan on inventory i (cost=0.00..16173.92 rows=882192 width=14) (actual time=0.088..1942.173 rows=882192 loops=1) Total runtime: 174926.115 ms So, my first question is: why is the planner still sorting on price when isbn seems (considerably) quicker, and how can I force it to sort by isbn(if I even should)? The second question is: why, oh why does the update take such and obscenely long time to complete? The 175s (and even 216s) for the select seems reasonable given the size of the tables, but not 3000s to update the same rows. The processor (AMD 1.3GHz) is 90%+ utilization for most of the execution time. I can post more information if it would be helpful, but this post is long enough already. TIA, and happy new year. -mike -- Mike Glover Key ID BFD19F2C <mpg4@duluoz.net>
Attachment
pgsql-performance by date: