Re: atrocious update performance - Mailing list pgsql-performance
From:           Rosser Schwarz
Subject:        Re: atrocious update performance
Date:
Msg-id:         001e01c40b91$181d66c0$2500fa0a@CardServices.TCI.com
In response to: Re: atrocious update performance (Tom Lane <tgl@sss.pgh.pa.us>)
Responses:      Re: atrocious update performance
List:           pgsql-performance
while you weren't looking, Tom Lane wrote:

> ...slow FK checks could be your problem if the application is set
> up to issue multiple UPDATEs affecting the same row(s) during a
> single transaction.  I'm not clear on whether that applies to you
> or not.

It shouldn't.  It's just one large batch update that should be hitting
every row serially.

> And anyway the bottom line is: have you got indexes on the columns
> *referencing* account.cust.custid?

No.  I'd've sworn I had one on account.acct.custid, since that table is
populated (currently ~500K rows), but it's not.

$ time psql tci -c "explain analyze select * from account.acct where
  custid = 257458"
                               QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on acct  (cost=0.00..7166.68 rows=2 width=71)
   (actual time=1047.122..1047.122 rows=0 loops=1)
   Filter: (custid = 257458)
 Total runtime: 1047.362 ms
(3 rows)

real    0m1.083s
user    0m0.010s
sys     0m0.000s

If it is looking up the custid in account.acct for each row, that's,
say, one second per lookup, for 4.7 million lookups, or, if my math is
right (4,731,410 / 3600 / 24), some 54 days.  I suppose that tracks,
but that doesn't make sense, given what you said about the fk checks
above.

Of course, if I index the column and amend the query to say "where
custid = 194752::bigint", I get back much saner numbers:

                               QUERY PLAN
----------------------------------------------------------------------
 Index Scan using ix_fk_acct_custid on acct  (cost=0.00..3.34 rows=2
   width=71) (actual time=0.126..0.141 rows=2 loops=1)
   Index Cond: (custid = 194752::bigint)
 Total runtime: 0.314 ms
(3 rows)

real    0m0.036s
user    0m0.010s
sys     0m0.000s

Which would still take just under two days.

$ time psql tci -c "explain analyze update account.cust set prodid =
  tempprod.prodid, subprodid = tempprod.subprodid where origid =
  tempprod.debtid"

But if I'm not touching the column referenced from account.acct, why
would it be looking there at all?  I've got an explain analyze of the
update running now, but until it finishes, I can't say for certain what
it's doing.  explain, alone, says:

$ time psql tci -c "explain update account.cust set prodid =
  tempprod.prodid, subprodid = tempprod.subprodid where origid =
  tempprod.debtid;"
                              QUERY PLAN
---------------------------------------------------------------------
 Merge Join  (cost=0.00..232764.69 rows=4731410 width=252)
   Merge Cond: (("outer".origid)::text = ("inner".debtid)::text)
   ->  Index Scan using ix_origid on cust
         (cost=0.00..94876.83 rows=4731410 width=236)
   ->  Index Scan using ix_debtid on tempprod
         (cost=0.00..66916.71 rows=4731410 width=26)
(4 rows)

real    0m26.965s
user    0m0.010s
sys     0m0.000s

which shows it not hitting account.acct at all.  (And why did it take
the planner 20-some seconds to come up with that query plan?)

tempprod doesn't have an index either, but then it doesn't reference
account.cust; instead, the update would be done by joining the two on
debtid/origid, which map one-to-one, are both indexed, and with both
tables clustered on those indices--exactly like the CREATE TABLE AS
Aaron suggested elsethread.

Unfortunately, this isn't the only large update we'll have to do.  We
receive a daily file of ~100K rows that may have new values for any
field of any row in account.cust, .acct or sundry other tables.  The
process of updating from that file is time-critical; it must run in
minutes, at the outside.

/rls

--
Rosser Schwarz
Total Card, Inc.
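[A minimal sketch of the index fix discussed above, using only names
that already appear in the plans (account.acct, custid,
ix_fk_acct_custid); the ::bigint cast is kept because the faster plan
was obtained with both the new index and the cast on the literal.]

-- create the index on the referencing column, then refresh stats
create index ix_fk_acct_custid on account.acct (custid);
analyze account.acct;

-- re-run the probe; this should now give the index scan shown above
explain analyze
select * from account.acct where custid = 194752::bigint;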
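[For the daily ~100K-row file, one way to keep the load to a single
indexed, joined update is a staging table.  The sketch below is
hypothetical: stage_daily, its column types, and ix_stage_debtid are
illustrative names, not part of the schema above, and it assumes the
file is loaded with COPY first.]

-- hypothetical staging table for the daily file; column types are guesses
create temp table stage_daily (
    debtid     varchar not null,
    prodid     bigint,
    subprodid  bigint
);
-- load the daily file here, e.g. via COPY stage_daily FROM ...

-- index and analyze the staging table so the join can use it
create index ix_stage_debtid on stage_daily (debtid);
analyze stage_daily;

-- one joined update instead of 100K row-at-a-time updates
update account.cust
   set prodid    = s.prodid,
       subprodid = s.subprodid
  from stage_daily s
 where cust.origid = s.debtid;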