Very slow update + not using clustered index - Mailing list pgsql-performance

From Mike Glover
Subject Very slow update + not using clustered index
Date
Msg-id 20040101193401.6640420b.mpg4@duluoz.net
Whole thread Raw
Responses Re: Very slow update + not using clustered index
List pgsql-performance
I have these two tables:

                     Table "de.summary"
    Column    |            Type             |   Modifiers
--------------+-----------------------------+---------------
 isbn         | character varying(10)       | not null
 source       | character varying(20)       | not null
 condition    | smallint                    |
 availability | smallint                    |
 price_list   | numeric(11,2)               |
 price_min    | numeric(11,2)               |
 last_update  | timestamp without time zone | default now()
Indexes:
    "summary_pkey" primary key, btree (isbn, source)

               Table "de.inventory"
    Column    |         Type          | Modifiers
--------------+-----------------------+-----------
 isbn         | character varying(10) |
 condition    | integer               |
 availability | integer               |
 price        | numeric(9,2)          |
Indexes:
    "inventory_isbn_idx" btree (isbn)


Both tables are clustered on their respective indexes.  The entire
database has been freshly VACUUM FULL'd and ANALYZE'd (after
clustering).

I want to run the following query, but it takes a *very* long time.
Like this:

bookshelf=> explain analyze update summary set price_min=0,
availability=2, condition=9 where isbn = inventory.isbn and price_min =
inventory.price;
   QUERY PLAN
-----------------------------------------------------------------
-----------------------------------------------------------
Merge Join (cost=496170.66..517271.50 rows=5051 width=51) (actual
time=226940.723..292247.643 rows=419277 loops=1)
  Merge Cond: (("outer".price_min = "inner".price) AND
("outer"."?column7?" = "inner"."?column3?"))
   ->  Sort (cost=366877.05..371990.05 rows=2045201 width=61) (actual
time=162681.929..177216.158 rows=2045200 loops=1)
         Sort Key: summary.price_min, (summary.isbn)::text
   -> Seq Scan on summary (cost=0.00..44651.01 rows=2045201 width=61)
(actual time=8.139..22179.379 rows=2045201 loops=1)
   -> Sort(cost=129293.61..131499.09 rows=882192 width=25) (actual
time=64213.663..67563.175 rows=882192 loops=1)
        Sort Key: inventory.price, (inventory.isbn)::text
   -> Seq Scan on inventory(cost=0.00..16173.92 rows=882192
width=25)(actual time=5.773..21548.942 rows=882192 loops=1)
Total runtime: 3162319.477 ms(9 rows)

Running what I believe to be the comparable select query is more
reasonable:

bookshelf=> explain analyze select s.* from summary s, inventory i where
s.isbn = i.isbn and s.price_min = i.price;
                             QUERY PLAN
-----------------------------------------------------------------------
Merge Join (cost=495960.66..517061.50 rows=5051 width=59) (actual
time=194048.974..215835.982 rows=419277 loops=1)
   Merge Cond: (("outer".price_min = "inner".price) AND
("outer"."?column8?" ="inner"."?column3?"))
   ->  Sort (cost=366667.05..371780.05 rows=2045201 width=59) (actual
time=147678.109..149945.170 rows=2045200 loops=1)
         Sort Key: s.price_min, (s.isbn)::text
         ->  Seq Scan on summary s (cost=0.00..49431.01 rows=2045201
width=59) (actual time=0.056..9304.803 rows=2045201 loops=1)
   ->  Sort (cost=129293.61..131499.09 rows=882192 width=25) (actual
time=46338.696..47183.739 rows=882192 loops=1)
        Sort Key: i.price, (i.isbn)::text
         ->  Seq Scan on inventory i (cost=0.00..16173.92 rows=882192
width=25) (actual time=0.089..2419.187 rows=882192 loops=1)
Total runtime: 216324.171 ms


I had figured that the tables would get sorted on isbn, because of the
clustering.  I understand why price might get chosen (fewer matches),
but the planner seems to be making the wrong choice:

bookshelf=> explain analyze select s.* from summary s, inventory i where
s.isbn = i.isbn;
   QUERY PLAN
-----------------------------------------------------------------------
Merge Join (cost=489500.66..512953.69 rows=882192 width=59) (actual
time=152247.741..174408.812 rows=882192 loops=1)
   Merge Cond: ("outer"."?column8?" = "inner"."?column2?")
   ->  Sort (cost=366667.05..371780.05 rows=2045201 width=59) (actual
time=118562.097..120817.894 rows=2045146 loops=1)
         Sort Key:(s.isbn)::text
         ->  Seq Scan on summary s  (cost=0.00..49431.01 rows=2045201
width=59) (actual time=0.062..8766.683 rows=2045201 loops=1)
   ->  Sort (cost=122833.61..125039.09 rows=882192 width=14)(actual
time=33685.455..34480.190 rows=882192 loops=1)
         Sort Key:(i.isbn)::text
         ->  Seq Scan on inventory i (cost=0.00..16173.92 rows=882192
width=14) (actual time=0.088..1942.173 rows=882192 loops=1)
 Total runtime: 174926.115 ms

So, my first question is: why is the planner still sorting on price when
isbn seems (considerably) quicker, and how can I force it to sort by
isbn(if I even should)?

The second question is:  why, oh why does the update take such and
obscenely long time to complete?  The 175s (and even 216s) for the
select seems reasonable given the size of the tables, but not 3000s to
update the same rows. The processor (AMD 1.3GHz) is 90%+ utilization for
most of the execution time.

I can post more information if it would be helpful, but this post is
long enough already.

TIA, and happy new year.

-mike


--
Mike Glover
Key ID BFD19F2C <mpg4@duluoz.net>

Attachment

pgsql-performance by date:

Previous
From: "cnliou"
Date:
Subject: pg_restore makes disk busy
Next
From: Tom Lane
Date:
Subject: Re: Very slow update + not using clustered index