Re: atrocious update performance - Mailing list pgsql-performance

From: Rosser Schwarz
Subject: Re: atrocious update performance
Msg-id: 004301c411be$74004650$2500fa0a@CardServices.TCI.com
In response to: Re: atrocious update performance (Greg Spiegelberg <gspiegelberg@cranel.com>)
List: pgsql-performance
Greg Spiegelberg wrote:

> > Will advise.

After creating 100, 1K, 10K, 100K and 1M-row subsets of account.cust and
the corresponding rows/tables with foreign key constraints referring to
the table, I'm unable to reproduce the behavior at issue.
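
(A minimal sketch of how such a subset can be built; the key column and
child-table names below are placeholders, not the actual schema:)

create table test.cust100 as
    select * from account.cust limit 100;
alter table test.cust100 add primary key (custid);   -- placeholder key
create index ix_origid_cust100 on test.cust100 (origid);
analyze test.cust100;

-- likewise for each table with a foreign key into cust, e.g.:
create table test.child100 as
    select ch.* from account.child ch
    join test.cust100 c using (custid);
alter table test.child100
    add foreign key (custid) references test.cust100;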

explain analyze looks like the following, showing the query run with the
join column indexed and not, respectively:

# explain analyze update test.cust100 set prodid = tempprod.prodid,
subprodid = tempprod.subprodid from tempprod where origid = tempprod.debtid;
-- with index
                        QUERY PLAN
-----------------------------------------------------------------------
 Merge Join  (cost=0.00..25.64 rows=500 width=220) (actual
 time=0.241..13.091 rows=100 loops=1)
   Merge Cond: (("outer".origid)::text = ("inner".debtid)::text)
   ->  Index Scan using ix_origid_cust100 on cust100  (cost=0.00..11.50
       rows=500 width=204) (actual time=0.125..6.465 rows=100 loops=1)
   ->  Index Scan using ix_debtid on tempprod  (cost=0.00..66916.71
       rows=4731410 width=26) (actual time=0.057..1.497 rows=101 loops=1)
 Total runtime: 34.067 ms
(5 rows)

-- without index
                        QUERY PLAN
----------------------------------------------------------------------
 Merge Join  (cost=7.32..16.71 rows=100 width=220) (actual
 time=4.415..10.918 rows=100 loops=1)
   Merge Cond: (("outer".debtid)::text = "inner"."?column22?")
   ->  Index Scan using ix_debtid on tempprod  (cost=0.00..66916.71
       rows=4731410 width=26) (actual time=0.051..1.291 rows=101 loops=1)
   ->  Sort  (cost=7.32..7.57 rows=100 width=204) (actual
       time=4.311..4.450 rows=100 loops=1)
         Sort Key: (cust100.origid)::text
         ->  Seq Scan on cust100  (cost=0.00..4.00 rows=100 width=204)
             (actual time=0.235..2.615 rows=100 loops=1)
 Total runtime: 25.031 ms
(7 rows)

With the join column indexed, it takes roughly .32 ms/row on the first
four tests (100 through 100K rows), and about .48 ms/row at 1M rows.
Without the index, it runs 100 rows @ .25 ms/row, 1K @ .26, 10K @ .27,
100K @ .48 and 1M @ .5.  (Even at .5 ms/row, the full 4.7M rows of
tempprod would extrapolate to only around forty minutes.)

In no case does the query plan reflect foreign key validation.  Failing
any other suggestions for diagnosis in the near term, I'm going to nuke
the PostgreSQL install, scour it from the machine and start from scratch.
Failing that, I'm going to come in some weekend and re-do the machine.

> Problem is when I recreate the indexes and add the constraints back
> on ORIG, I end up with the same long-running process.  The original
> UPDATE runs for about 30 minutes on a table of 400,000 rows, with the
> WHERE matching about 70% of them.  The above runs for about 2 minutes
> without adding the constraints or indexes; however, adding the
> constraints and creating the dropped indexes negates any gain.

Is this a frequently-run update?

In my experience, with my seemingly mutant install, dropping indices
and constraints to shave 14/15ths off the update time would be worth
the effort.  Just script the drops, the update and the re-creations
into one large transaction, along the lines of the sketch below.  It's
a symptom-level fix, but re-creating the fifteen indices on one of our
5M-row tables doesn't take 28 minutes, and your hardware looks to be
rather less IO- and CPU-bound than ours.  I'd also second Tom's
suggestion of moving to 7.4.
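
Something like this, where the index, constraint and column names are
placeholders for whatever ORIG actually has (the SET/WHERE clauses just
reuse my own update for illustration):

begin;
-- drop everything that must be maintained row-by-row during the update
drop index ix_orig_origid;                          -- repeat per index
alter table orig drop constraint orig_prodid_fkey;  -- repeat per FK

update orig
   set prodid    = tempprod.prodid,
       subprodid = tempprod.subprodid
  from tempprod
 where orig.origid = tempprod.debtid;

-- rebuild in bulk, far cheaper than per-row index/FK maintenance
create index ix_orig_origid on orig (origid);
alter table orig add constraint orig_prodid_fkey
    foreign key (prodid) references prod (prodid);
commit;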

/rls

