atrocious update performance - Mailing list pgsql-performance
From:      Rosser Schwarz
Subject:   atrocious update performance
Date:
Msg-id:    001401c40acc$1e4e7180$2500fa0a@CardServices.TCI.com
Responses: Re: atrocious update performance
           Re: atrocious update performance
List:      pgsql-performance
We're in the throes of an MS SQL to PostgreSQL migration; our databases
include a number of ~5M row tables. We decided to take this opportunity
to clean up and slightly re-normalize our schemas, given what we've
learned about the data over its lifetime and such; otherwise we wouldn't
be experiencing any of the following (we could instead just dump and
`copy from`).

We have a temporary table, public.tempprod, containing 4.7M rows, one
for each row in account.cust. account.cust has, among others, two
columns, prodid and subprodid, which we're trying to update from
tempprod joined against prod. The update takes unnecessarily long--
rather, we've had to kill it after it had run obscenely long.

The table:

# \d account.cust
                          Table "account.cust"
  Column   |            Type             |             Modifiers
-----------+-----------------------------+--------------------------------------
 custid    | bigint                      | not null default
           |                             | nextval('account.custid_seq'::text)
 ownerid   | integer                     | not null
 origid    | text                        | not null
 pname     | text                        |
 fname     | text                        |
 mname     | text                        |
 lname     | text                        |
 suffix    | text                        |
 addr1     | text                        |
 addr2     | text                        |
 addr3     | text                        |
 city      | text                        |
 state     | text                        |
 zip       | text                        |
 zipplus   | text                        |
 homeph    | text                        |
 workph    | text                        |
 otherph   | text                        |
 ssn       | text                        |
 isactive  | boolean                     | default true
 createddt | timestamp without time zone | default now()
 prodid    | bigint                      |
 subprodid | bigint                      |
Indexes:
    "cust_pkey" primary key, btree (custid)
    "ix_addr1" btree (addr1) WHERE (addr1 IS NOT NULL)
    "ix_addr2" btree (addr2) WHERE (addr2 IS NOT NULL)
    "ix_city" btree (city) WHERE (city IS NOT NULL)
    "ix_fname" btree (fname) WHERE (fname IS NOT NULL)
    "ix_homeph" btree (homeph) WHERE (homeph IS NOT NULL)
    "ix_lname" btree (lname) WHERE (lname IS NOT NULL)
    "ix_mname" btree (mname) WHERE (mname IS NOT NULL)
    "ix_origid" btree (origid)
    "ix_ssn" btree (ssn) WHERE (ssn IS NOT NULL)
    "ix_state" btree (state) WHERE (state IS NOT NULL)
    "ix_workph" btree (workph) WHERE (workph IS NOT NULL)
    "ix_zip" btree (zip) WHERE (zip IS NOT NULL)

We're currently running on a dual Xeon 700 (I know, I know; it's what
we've got) with 2.5GB RAM and 4x36GB SCSI in hardware RAID 5 (Dell
Perc3 something-or-other controller). If we can demonstrate that
PostgreSQL will meet our needs, we'll be going production on a dual
Opteron, maxed memory, with a 12-disk Fibre Channel array.

The query is:

update account.cust
   set prodid = (select p.prodid
                   from account.prod p
                   join public.tempprod t on t.pool = p.origid
                  where custid = t.did)

And then, upon its completion, s/prod/subprod/. That shouldn't run
overnight, should it, let alone for -days-?

In experimenting with ways of making the updates take less time, we
tried adding product and subproduct columns to tempprod and updating
those. That seemed to work marginally better:

explain analyze
update public.tempprod
   set prodid = (select account.prod.prodid::bigint
                   from account.prod
                  where public.tempprod.pool::text =
                        account.prod.origid::text)

Seq Scan on tempprod  (cost=0.00..9637101.35 rows=4731410 width=56)
                      (actual time=24273.467..16090470.438 rows=4731410 loops=1)
  SubPlan
    ->  Limit  (cost=0.00..2.02 rows=2 width=8)
               (actual time=0.134..0.315 rows=1 loops=4731410)
          ->  Seq Scan on prod  (cost=0.00..2.02 rows=2 width=8)
                                (actual time=0.126..0.305 rows=1 loops=4731410)
                Filter: (($0)::text = (origid)::text)
Total runtime: 2284551.962 ms

But then going from public.tempprod to account.cust again takes days. I
just cancelled an update that's been running since last Thursday. Alas,
given how long the queries take to run, I can't supply an `explain
analyze`.
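(If a plan sample would help, one thing I could try is running the
join-style update over a small slice of the table inside a transaction
and rolling it back. A minimal sketch--the custid cutoff is made up,
and it assumes the prodid column we added to tempprod above:)

begin;
explain analyze
update account.cust
   set prodid = t.prodid
  from public.tempprod t          -- explicit join, no correlated subquery
 where t.did = cust.origid
   and cust.custid < 10000;       -- hypothetical slice, to keep the run short
rollback;                         -- throw the trial update away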
The `explain` seems reasonable enough:

# explain update account.cust set prodid = tempprod.prodid
    where tempprod.did = origid;

Merge Join  (cost=0.00..232764.69 rows=4731410 width=252)
  Merge Cond: (("outer".origid)::text = ("inner".did)::text)
  ->  Index Scan using ix_origid on cust
        (cost=0.00..94876.83 rows=4731410 width=244)
  ->  Index Scan using ix_did on tempprod
        (cost=0.00..66916.71 rows=4731410 width=18)

The relevant bits from my postgresql.conf (note: we built with a BLCKSZ
of 16K):

shared_buffers = 4096
sort_mem = 32768
vacuum_mem = 32768
wal_buffers = 16384
checkpoint_segments = 64
checkpoint_timeout = 1800
checkpoint_warning = 30
commit_delay = 50000
effective_cache_size = 131072

Any advice, suggestions or comments of the "You bleeding idiot, why do
you have frob set to x?!" sort are welcome. Unfortunately, if we can't
improve this significantly, the powers that be will probably pass on
PostgreSQL, even though everything we've done so far--with this marked
exception--performs pretty spectacularly, all told.

/rls

--
Rosser Schwarz
Total Card, Inc.
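P.S. For reference, and assuming I have the units right--shared_buffers,
wal_buffers and effective_cache_size are counted in blocks, while
sort_mem and vacuum_mem are in kilobytes--with our 16K blocks those
settings work out to roughly:

shared_buffers       =   4096 blocks x 16KB =  64MB
sort_mem             =  32768 KB            =  32MB (per sort)
vacuum_mem           =  32768 KB            =  32MB
wal_buffers          =  16384 blocks x 16KB = 256MB
effective_cache_size = 131072 blocks x 16KB =   2GB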