Re: atrocious update performance - Mailing list pgsql-performance
From:           Rosser Schwarz
Subject:        Re: atrocious update performance
Date:
Msg-id:         001e01c40b91$181d66c0$2500fa0a@CardServices.TCI.com
In response to: Re: atrocious update performance (Tom Lane <tgl@sss.pgh.pa.us>)
Responses:      Re: atrocious update performance
List:           pgsql-performance
while you weren't looking, Tom Lane wrote:

> ...slow FK checks could be your problem if the application is set
> up to issue multiple UPDATEs affecting the same row(s) during a
> single transaction.  I'm not clear on whether that applies to you
> or not.

It shouldn't.  It's just one large batch update that should be hitting
every row serially.

> And anyway the bottom line is: have you got indexes on the columns
> *referencing* account.cust.custid?

No.  I'd've sworn I had one on account.acct.custid, since that table is
populated (currently ~500K rows), but it's not.

$ time psql tci -c "explain analyze select * from account.acct where
  custid = 257458"
                               QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on acct  (cost=0.00..7166.68 rows=2 width=71)
   (actual time=1047.122..1047.122 rows=0 loops=1)
   Filter: (custid = 257458)
 Total runtime: 1047.362 ms
(3 rows)

real    0m1.083s
user    0m0.010s
sys     0m0.000s

If it is looking up the custid in account.acct for each row, that's,
say, one second per lookup, for 4.7 million lookups, or, if my math is
right (4,731,410 / 3600 / 24), some 54 days.  I suppose that tracks,
but that doesn't make sense, given what you said about the fk checks
above.

Of course, if I index the column and amend the query to say "where
custid = 194752::bigint", I get back much saner numbers:

                               QUERY PLAN
----------------------------------------------------------------------
 Index Scan using ix_fk_acct_custid on acct  (cost=0.00..3.34 rows=2
   width=71) (actual time=0.126..0.141 rows=2 loops=1)
   Index Cond: (custid = 194752::bigint)
 Total runtime: 0.314 ms
(3 rows)

real    0m0.036s
user    0m0.010s
sys     0m0.000s

Which would still take just under two days.

$ time psql tci -c "explain analyze update account.cust set prodid =
  tempprod.prodid, subprodid = tempprod.subprodid where origid =
  tempprod.debtid"

But if I'm not touching the column referenced from account.acct, why
would it be looking there at all?  I've got an explain analyze of the
update running now, but until it finishes, I can't say for certain what
it's doing.  explain, alone, says:

$ time psql tci -c "explain update account.cust set prodid =
  tempprod.prodid, subprodid = tempprod.subprodid where origid =
  tempprod.debtid;"
                              QUERY PLAN
---------------------------------------------------------------------
 Merge Join  (cost=0.00..232764.69 rows=4731410 width=252)
   Merge Cond: (("outer".origid)::text = ("inner".debtid)::text)
   ->  Index Scan using ix_origid on cust
         (cost=0.00..94876.83 rows=4731410 width=236)
   ->  Index Scan using ix_debtid on tempprod
         (cost=0.00..66916.71 rows=4731410 width=26)
(4 rows)

real    0m26.965s
user    0m0.010s
sys     0m0.000s

which shows it not hitting account.acct at all.  (And why did it take
the planner 20-some seconds to come up with that query plan?)

tempprod doesn't have an index either, but then it doesn't reference
account.cust; instead, the update would be done by joining the two on
debtid/origid, which map one-to-one, are both indexed, and with both
tables clustered on those indices--exactly like the CREATE TABLE AS
Aaron suggested elsethread.

Unfortunately, this isn't the only large update we'll have to do.  We
receive a daily file of ~100K rows that may have new values for any
field of any row in account.cust, .acct or sundry other tables.  The
process of updating from that file is time-critical; it must run in
minutes, at the outside.

/rls

--
Rosser Schwarz
Total Card, Inc.
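[A minimal sketch of the index fix discussed above, using only names
that already appear in the plans (account.acct, custid,
ix_fk_acct_custid); the ::bigint cast is kept because the faster plan
was obtained with both the new index and the cast on the literal.]

-- create the index on the referencing column, then refresh stats
create index ix_fk_acct_custid on account.acct (custid);
analyze account.acct;

-- re-run the probe; this should now give the index scan shown above
explain analyze
select * from account.acct where custid = 194752::bigint;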
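[For the daily ~100K-row file, one way to keep the load to a single
indexed, joined update is a staging table.  The sketch below is
hypothetical: stage_daily, its column types, and ix_stage_debtid are
illustrative names, not part of the schema above, and it assumes the
file is loaded with COPY first.]

-- hypothetical staging table for the daily file; column types are guesses
create temp table stage_daily (
    debtid     varchar not null,
    prodid     bigint,
    subprodid  bigint
);
-- load the daily file here, e.g. via COPY stage_daily FROM ...

-- index and analyze the staging table so the join can use it
create index ix_stage_debtid on stage_daily (debtid);
analyze stage_daily;

-- one joined update instead of 100K row-at-a-time updates
update account.cust
   set prodid    = s.prodid,
       subprodid = s.subprodid
  from stage_daily s
 where cust.origid = s.debtid;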