Thread: atrocious update performance
We're in the throes of an MS SQL to PostgreSQL migration; our databases include a number of ~5M row tables. We decided to take this opportunity to clean up and slightly re-normalize our schemas, given what we've learned about the data over its lifetime; otherwise we wouldn't be experiencing any of the following (we could instead just dump and `copy from`).

We have a temporary table, public.tempprod, containing 4.7M rows, one for each row in account.cust. account.cust has, among others, two columns, prodid and subprodid, which we're trying to update from tempprod joined against prod. The update takes unreasonably long -- rather, we've finally had to kill it after it ran obscenely long.

The table:

# \d account.cust
                           Table "account.cust"
  Column   |            Type             |             Modifiers
-----------+-----------------------------+--------------------------------------
 custid    | bigint                      | not null default
           |                             | nextval('account.custid_seq'::text)
 ownerid   | integer                     | not null
 origid    | text                        | not null
 pname     | text                        |
 fname     | text                        |
 mname     | text                        |
 lname     | text                        |
 suffix    | text                        |
 addr1     | text                        |
 addr2     | text                        |
 addr3     | text                        |
 city      | text                        |
 state     | text                        |
 zip       | text                        |
 zipplus   | text                        |
 homeph    | text                        |
 workph    | text                        |
 otherph   | text                        |
 ssn       | text                        |
 isactive  | boolean                     | default true
 createddt | timestamp without time zone | default now()
 prodid    | bigint                      |
 subprodid | bigint                      |
Indexes:
    "cust_pkey" primary key, btree (custid)
    "ix_addr1" btree (addr1) WHERE (addr1 IS NOT NULL)
    "ix_addr2" btree (addr2) WHERE (addr2 IS NOT NULL)
    "ix_city" btree (city) WHERE (city IS NOT NULL)
    "ix_fname" btree (fname) WHERE (fname IS NOT NULL)
    "ix_homeph" btree (homeph) WHERE (homeph IS NOT NULL)
    "ix_lname" btree (lname) WHERE (lname IS NOT NULL)
    "ix_mname" btree (mname) WHERE (mname IS NOT NULL)
    "ix_origid" btree (origid)
    "ix_ssn" btree (ssn) WHERE (ssn IS NOT NULL)
    "ix_state" btree (state) WHERE (state IS NOT NULL)
    "ix_workph" btree (workph) WHERE (workph IS NOT NULL)
    "ix_zip" btree (zip) WHERE (zip IS NOT NULL)

We're currently running on a dual Xeon 700 (I know, I know; it's what we've got) with 2.5GB RAM and 4x36GB SCSI in hardware RAID 5 (Dell Perc3 something-or-other controller). If we can demonstrate that PostgreSQL will meet our needs, we'll be going production on a dual Opteron, maxed memory, with a 12-disk Fibre Channel array.

The query is:

update account.cust set prodid =
    (select p.prodid from account.prod p
        join public.tempprod t on t.pool = p.origid
        where custid = t.did)

And then, upon its completion, s/prod/subprod/. That shouldn't run overnight, should it, let alone for -days-?

In experimenting with ways of making the updates take less time, we tried adding product and subproduct columns to tempprod, and updating those. That seemed to work marginally better:

explain analyze update public.tempprod set prodid =
    (select account.prod.prodid::bigint
        from account.prod
        where public.tempprod.pool::text = account.prod.origid::text)

 Seq Scan on tempprod (cost=0.00..9637101.35 rows=4731410 width=56)
   (actual time=24273.467..16090470.438 rows=4731410 loops=1)
   SubPlan
     -> Limit (cost=0.00..2.02 rows=2 width=8)
          (actual time=0.134..0.315 rows=1 loops=4731410)
        -> Seq Scan on prod (cost=0.00..2.02 rows=2 width=8)
             (actual time=0.126..0.305 rows=1 loops=4731410)
           Filter: (($0)::text = (origid)::text)
 Total runtime: 2284551.962 ms

But then going from public.tempprod to account.cust again takes days. I just cancelled an update that's been running since last Thursday. Alas, given how long the queries take to run, I can't supply an `explain analyze`.
The `explain` seems reasonable enough:

# explain update account.cust set prodid = tempprod.prodid
    where tempprod.did = origid;

 Merge Join (cost=0.00..232764.69 rows=4731410 width=252)
   Merge Cond: (("outer".origid)::text = ("inner".did)::text)
   -> Index Scan using ix_origid on cust (cost=0.00..94876.83
      rows=4731410 width=244)
   -> Index Scan using ix_did on tempprod (cost=0.00..66916.71
      rows=4731410 width=18)

The relevant bits from my postgresql.conf (note, we built with a BLCKSZ of 16K):

shared_buffers = 4096
sort_mem = 32768
vacuum_mem = 32768
wal_buffers = 16384
checkpoint_segments = 64
checkpoint_timeout = 1800
checkpoint_warning = 30
commit_delay = 50000
effective_cache_size = 131072

Any advice, suggestions or comments of the "You bleeding idiot, why do you have frob set to x?!" sort welcome. Unfortunately, if we can't improve this significantly, the powers what be will probably pass on PostgreSQL, even though everything we've done so far--with this marked exception--performs pretty spectacularly, all told.

/rls

--
Rosser Schwarz
Total Card, Inc.
> # explain update account.cust set prodid = tempprod.prodid
>     where tempprod.did = origid;
>
>  Merge Join (cost=0.00..232764.69 rows=4731410 width=252)
>    Merge Cond: (("outer".origid)::text = ("inner".did)::text)
>    -> Index Scan using ix_origid on cust (cost=0.00..94876.83
>       rows=4731410 width=244)
>    -> Index Scan using ix_did on tempprod (cost=0.00..66916.71
>       rows=4731410 width=18)

I'm going to hazard a guess and say you have a number of foreign keys that refer to account.cust.prodid? This is probably the time consuming part -- perhaps even a missing index on one of those keys that refers to this field.

Going the other way should be just as good for your purposes, and much faster since you're not updating several foreign key'd fields bound to account.cust.prodid:

UPDATE tempprod
   SET prodid = cust.prodid
  FROM account.cust
 WHERE tempprod.did = cust.origid;
> > # explain update account.cust set prodid = tempprod.prodid
> >     where tempprod.did = origid;
> >
> >  Merge Join (cost=0.00..232764.69 rows=4731410 width=252)
> >    Merge Cond: (("outer".origid)::text = ("inner".did)::text)
> >    -> Index Scan using ix_origid on cust (cost=0.00..94876.83
> >       rows=4731410 width=244)
> >    -> Index Scan using ix_did on tempprod (cost=0.00..66916.71
> >       rows=4731410 width=18)

> I'm going to hazard a guess and say you have a number of foreign keys
> that refer to account.cust.prodid? This is probably the time consuming
> part -- perhaps even a missing index on one of those keys that refers
> to this field.

Actually, there are no foreign keys to those columns. Once they're populated, I'll apply a foreign key constraint and they'll refer to the appropriate row in the prod and subprod tables, but nothing will reference account.cust.[sub]prodid. There are, of course, several foreign keys referencing account.cust.custid.

> Going the other way should be just as good for your purposes, and much
> faster since you're not updating several foreign key'd fields bound to
> account.cust.prodid:
>
> UPDATE tempprod
>    SET prodid = cust.prodid
>   FROM account.cust
>  WHERE tempprod.did = cust.origid;

Not quite. Without this update, account.cust.[sub]prodid are null. The data was strewn across multiple tables in MS SQL; we're normalizing it into one, hence the need to populate the two columns independently.

/rls

--
Rosser Schwarz
Total Card, Inc.
Bulk updates are generally dogs (not just in pg), so I avoid doing them by doing faster selects and inserts. You can create a new table using 'create table as' to produce your target results. This is real fast -- avoiding the row iteration in insert, allowing the select optimizer to run and no index overhead. Then alter/rename, add indexes and whatever else hangs off the table (or if you're lazy do an insert/select into the original target table). I often see 2 orders of magnitude improvement doing this, and no need to vacuum.

/Aaron
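For concreteness, a minimal sketch of the approach Aaron describes, using the names from the explain above (the LEFT JOIN and the exact column list are assumptions on my part; indexes, defaults and constraints would be re-created afterwards):

CREATE TABLE account.cust_new AS
SELECT c.custid, c.ownerid, c.origid, c.pname, c.fname, c.mname,
       c.lname, c.suffix, c.addr1, c.addr2, c.addr3, c.city, c.state,
       c.zip, c.zipplus, c.homeph, c.workph, c.otherph, c.ssn,
       c.isactive, c.createddt,
       t.prodid, t.subprodid              -- the new values come from tempprod
  FROM account.cust c
  LEFT JOIN public.tempprod t ON t.did = c.origid;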
On Mon, 2004-03-15 at 16:15, Rosser Schwarz wrote:
> > > # explain update account.cust set prodid = tempprod.prodid
> > >     where tempprod.did = origid;
> > >
> > >  Merge Join (cost=0.00..232764.69 rows=4731410 width=252)
> > >    Merge Cond: (("outer".origid)::text = ("inner".did)::text)
> > >    -> Index Scan using ix_origid on cust (cost=0.00..94876.83
> > >       rows=4731410 width=244)
> > >    -> Index Scan using ix_did on tempprod (cost=0.00..66916.71
> > >       rows=4731410 width=18)
>
> > I'm going to hazard a guess and say you have a number of foreign keys
> > that refer to account.cust.prodid? This is probably the time consuming
> > part -- perhaps even a missing index on one of those keys that refers
> > to this field.
>
> Actually, there are no foreign keys to those columns. Once they're
> populated, I'll apply a foreign key constraint and they'll refer to the
> appropriate row in the prod and subprod tables, but nothing will
> reference account.cust.[sub]prodid. There are, of course, several foreign
> keys referencing account.cust.custid.

If there are no foreign keys to it, I wouldn't expect it to take more than 10 minutes on slow hardware.

Fresh out of ideas here.
> You can create a new table using 'create table as' to produce your
> target results. This is real fast ...
> I often see 2 orders of magnitude improvement doing this, and no
> need to vacuum.

Indeed:

"Query returned successfully with no result in 582761 ms."

Though I must say, ten minutes is nominally more than two orders of magnitude performance improvement, versus several days.

Many thanks, Aaron.

/rls

--
Rosser Schwarz
Total Card, Inc.
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes: >> You can create a new table using 'create table as' to produce your >> target results. This is real fast ... >> I often see 2 orders of magnitude improvement doing this, and no >> need to vacuum. > Indeed: > "Query returned successfully with no result in 582761 ms." > Though I must say, ten minutes is nominally more than two orders of > mangitude performance improvement, versus several days. Hm. There is no way that inserting a row is two orders of magnitude faster than updating a row --- they both require storing a new row and making whatever index entries are needed. The only additional cost of the update is finding the old row (not a very big deal AFAICS in the examples you gave) and marking it deleted (definitely cheap). So there's something awfully fishy going on here. I'm inclined to suspect an issue with foreign-key checking. You didn't give us any details about foreign key relationships your "cust" table is involved in --- could we see those? And the schemas of the other tables involved? Also, exactly which PG version is this? regards, tom lane
The original point was about a very slow update of an entire table with a plan that looped, and over a dozen conditional indices -- vs. a 'create as' in a CPU starved environment. I stand by my statement about observing the orders of magnitude difference. In theory I agree that the update should be in the same order of magnitude as the create as, but in practice I disagree.

I also think something is wrong on the logical side (besides FKs, are there any triggers?) but was responding to the Gordian knot issue of bailing out of pg.

Can you post a sample extract, Rosser? Otherwise, I'll try to put together a sample of a slow mass join update.

/Aaron
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes: > Actually, there are no foreign keys to those columns. Once they're > populated, I'll apply a foreign key constraint and they'll refer to the > appropriate row in the prod and subprod tables, but nothing will > reference account.cust.[sub]prodid. There are, of course, several foreign > keys referencing account.cust.custid. Just to be clear, the foreign key constraints they're worrying about are not constraints on the table you're updating. They're constraints on other tables referring to the table you're updating. Since you're updating the column here postgres has to be sure nothing is referring to the old value you're obliterating, and to do that it has to select for possible records in the referencing tables referring to the value. If there are any references in other tables referring to this column then you need an index on the column in the referencing table to be able to update the column in referenced table efficiently. -- greg
Rosser Schwarz wrote:
> shared_buffers = 4096
> sort_mem = 32768
> vacuum_mem = 32768
> wal_buffers = 16384
> checkpoint_segments = 64
> checkpoint_timeout = 1800
> checkpoint_warning = 30
> commit_delay = 50000
> effective_cache_size = 131072

You didn't mention the OS so I would take it as either linux/freeBSD.

First of all, your shared buffers are low. 4096 is 64MB with 16K block size. I would say at least push them to 150-200MB.

Secondly your sort mem is too high. Note that it is per sort per query. You could build a massive swap storm with such a setting.

Similarly pull down vacuum and WAL buffers to around 512-1024 each.

I know that your problem is solved by using insert rather than updates. But I just want to point out that you still need to analyze the table to update the statistics, or further queries will not plan well.

And lastly, you can bundle the entire thing, including creating the duplicate table, populating it, renaming the original table etc., in a single transaction and nobody will notice it. I am almost sure MS-SQL can not do that. Not many databases have transaction-safe DDL out there.

HTH

Shridhar
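As a sketch of that bundling (the column list is abbreviated and the individual statements are assumptions, not something anyone in the thread posted):

BEGIN;

CREATE TABLE account.cust_new AS
  SELECT c.custid, c.ownerid, c.origid,       -- ...and the rest of cust's columns
         t.prodid, t.subprodid
    FROM account.cust c
    LEFT JOIN public.tempprod t ON t.did = c.origid;

ALTER TABLE account.cust RENAME TO cust_old;
ALTER TABLE account.cust_new RENAME TO cust;

ALTER TABLE account.cust ADD CONSTRAINT cust_pkey PRIMARY KEY (custid);
CREATE INDEX ix_origid ON account.cust (origid);
-- ...plus the remaining indexes, defaults and foreign keys

COMMIT;

ANALYZE account.cust;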
Shridhar Daithankar <shridhar@frodo.hserus.net> writes:
> Rosser Schwarz wrote:
>> shared_buffers = 4096
>> sort_mem = 32768
>> vacuum_mem = 32768
>> wal_buffers = 16384
>> checkpoint_segments = 64
>> checkpoint_timeout = 1800
>> checkpoint_warning = 30
>> commit_delay = 50000
>> effective_cache_size = 131072

> First of all, your shared buffers are low. 4096 is 64MB with 16K block
> size. I would say at least push them to 150-200MB.

Check. Much more than that isn't necessarily better though. shared_buffers = 10000 is frequently mentioned as a "sweet spot".

> Secondly your sort mem is too high. Note that it is per sort per query.
> You could build a massive swap storm with such a setting.

Agreed, but I doubt that has anything to do with the immediate problem, since he's not testing parallel queries.

> Similarly pull down vacuum and WAL buffers to around 512-1024 each.

The vacuum_mem setting here is 32MB, which seems okay to me, if not on the low side. Again though it's not his immediate problem.

I agree that the wal_buffers setting is outlandishly large; I can't see any plausible reason for it to be more than a few dozen. I don't know whether oversized wal_buffers can directly cause any performance issues, but it's certainly not a well-tested scenario.

The other setting I was going to comment on is checkpoint_warning; it seems mighty low in comparison to checkpoint_timeout. If you are targeting a checkpoint every half hour, I'd think you'd want the system to complain about checkpoints spaced more closely than several minutes.

But with the possible exception of wal_buffers, I can't see anything in these settings that explains the originally complained-of performance problem. I'm still wondering about foreign key checks.

			regards, tom lane
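Pulling that advice together, a revised postgresql.conf for this box might look roughly like the following; apart from shared_buffers, the specific numbers are illustrative guesses rather than values anyone in the thread recommended:

shared_buffers = 10000          # ~160MB at BLCKSZ=16K; the oft-cited "sweet spot"
sort_mem = 8192                 # guess: well below the original 32768
vacuum_mem = 32768              # unchanged; 32MB seems okay
wal_buffers = 32                # "a few dozen" rather than 16384
checkpoint_segments = 64
checkpoint_timeout = 1800
checkpoint_warning = 300        # guess: "several minutes" rather than 30 seconds
effective_cache_size = 131072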
while you weren't looking, Tom Lane wrote:

> But with the possible exception of wal_buffers, I can't see anything in
> these settings that explains the originally complained-of performance
> problem. I'm still wondering about foreign key checks.

Many of the configs I posted were fairly wild values, set to gather data points for further tweaking. Unfortunately, with this query there hasn't been time for many passes, and I've too much else on my plate to try concocting demonstration cases. The postmaster's been hupped with more sane values, but I experienced this same issue with the defaults.

As for foreign keys, three tables refer to account.cust; all of them refer to account.cust.custid, the pk. One of those tables has several hundred thousand rows, many more to come; the others are empty. Unless I've woefully misunderstood, the presence or absence of a foreign key referring to one column should be moot for updates writing another column, shouldn't it?

To answer your (and others') question, Tom, 7.4.1 on 2.4.20-18.9smp. Red Hat, I believe. I was handed the machine, which is also in use for lightweight production stuff: intranet webserver, rinky-dink MySQL doo-dads, &c. I'm sure that has an impact, usurping the disk heads and such--maybe even more than I'd expect--but I can't imagine that'd cause an update to one 4.7M row table, from another 4.7M row table, both clustered on a join column that maps one-to-one between them, to take days.

I'm baffled; everything else is perfectly snappy, given the hardware. Anything requiring a sequential scan over one of the big tables is a slog, but that's to be expected and hence all the indices.

Watching iostat, I've observed a moderately cyclic read-big, write-big pattern, wavelengths generally out of phase, interspersed with smaller, almost epicycles--from the machine's other tasks, I'm sure. top has postmaster's cpu usage rarely breaking 25% over the course of the query's execution, and spending most of its time much lower; memory usage hovers somewhere north of 500MB. In what little time I had to stare at a disturbingly matrix-esque array of terminals scrolling sundry metrics, I didn't notice a correlation between cpu usage spikes and peaks in the IO cycle's waveforms. For whatever that's worth.
The other tables involved are:

# \d account.acct
                           Table "account.acct"
   Column   |            Type             |             Modifiers
------------+-----------------------------+--------------------------------------
 acctid     | bigint                      | not null default
            |                             | nextval('account.acctid_seq'::text)
 custid     | bigint                      |
 acctstatid | integer                     | not null
 acctno     | character varying(50)       |
 bal        | money                       |
 begdt      | timestamp without time zone | not null
 enddt      | timestamp without time zone |
 debtid     | character varying(50)       |
Indexes:
    "acct_pkey" primary key, btree (acctid)
    "ix_acctno" btree (acctno) WHERE (acctno IS NOT NULL)
Foreign-key constraints:
    "$1" FOREIGN KEY (custid) REFERENCES account.cust(custid)
        ON UPDATE CASCADE ON DELETE RESTRICT
    "$2" FOREIGN KEY (acctstatid) REFERENCES account.acctstat(acctstatid)
        ON UPDATE CASCADE ON DELETE RESTRICT

# \d account.note
                           Table "account.note"
  Column   |            Type             |             Modifiers
-----------+-----------------------------+--------------------------------------
 noteid    | bigint                      | not null default
           |                             | nextval('account.noteid_seq'::text)
 custid    | bigint                      | not null
 note      | text                        | not null
 createddt | timestamp without time zone | not null default now()
Indexes:
    "note_pkey" primary key, btree (noteid)
Foreign-key constraints:
    "$1" FOREIGN KEY (custid) REFERENCES account.cust(custid)
        ON UPDATE CASCADE ON DELETE RESTRICT

# \d account.origacct
             Table "account.origacct"
   Column    |            Type             | Modifiers
-------------+-----------------------------+-----------
 custid      | bigint                      |
 lender      | character varying(50)       |
 chgoffdt    | timestamp without time zone |
 opendt      | timestamp without time zone |
 offbureaudt | timestamp without time zone |
 princbal    | money                       |
 intbal      | money                       |
 totbal      | money                       |
 lastpayamt  | money                       |
 lastpaydt   | timestamp without time zone |
 debttype    | integer                     |
 debtid      | character varying(10)       |
 acctno      | character varying(50)       |
Foreign-key constraints:
    "$1" FOREIGN KEY (custid) REFERENCES account.cust(custid)
        ON UPDATE CASCADE ON DELETE RESTRICT

And the table we were joining to get the new values for prodid and subprodid:

# \d tempprod
          Table "public.tempprod"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 debtid    | character varying(10) | not null
 pool      | character varying(10) | not null
 port      | character varying(10) | not null
 subprodid | bigint                |
 prodid    | bigint                |
Indexes:
    "ix_debtid" btree (debtid)

/rls

--
Rosser Schwarz
Total Card, Inc.
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes: > As for foreign keys, three tables refer to account.cust; all of them > refer to account.cust.custid, the pk. One of those tables has several > hundred thousand rows, many more to come; the others are empty. Unless > I've woefully misunderstood, the presence or absence of a foreign key > referring to one column should be moot for updates writing another > column, shouldn't it? Well, that is the crux of the issue, and also why I was asking about versions. It's only been since 7.3.4 or so that we skip checking FKs on update. Looking at the code, though, the update check is only skipped if the previous version of the row predates the current transaction. (Otherwise we can't be sure that the value was ever checked.) This means that slow FK checks could be your problem if the application is set up to issue multiple UPDATEs affecting the same row(s) during a single transaction. I'm not clear on whether that applies to you or not. And anyway the bottom line is: have you got indexes on the columns *referencing* account.cust.custid? If not you probably ought to add them, since without them there will definitely be some slow cases. regards, tom lane
while you weren't looking, Tom Lane wrote:

> ...slow FK checks could be your problem if the application is set
> up to issue multiple UPDATEs affecting the same row(s) during a
> single transaction. I'm not clear on whether that applies to you
> or not.

It shouldn't. It's just one large batch update that should be hitting every row serially.

> And anyway the bottom line is: have you got indexes on the columns
> *referencing* account.cust.custid?

No. I'd've sworn I had one on account.acct.custid, since that table is populated (currently ~500K rows), but it's not there.

$ time psql tci -c "explain analyze select * from account.acct where custid = 257458"

                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on acct (cost=0.00..7166.68 rows=2 width=71)
   (actual time=1047.122..1047.122 rows=0 loops=1)
   Filter: (custid = 257458)
 Total runtime: 1047.362 ms
(3 rows)

real    0m1.083s
user    0m0.010s
sys     0m0.000s

If it is looking up the custid in account.acct for each row, that's, say, one second per lookup, for 4.7 million lookups -- if my math is right (4,731,410 / 3600 / 24), some 54 days. I suppose that tracks, but that doesn't make sense, given what you said about the fk checks, above.

Of course, if I index the column and amend the query to say "where custid = 194752::bigint" I get back much saner numbers:

                              QUERY PLAN
----------------------------------------------------------------------
 Index Scan using ix_fk_acct_custid on acct (cost=0.00..3.34 rows=2 width=71)
   (actual time=0.126..0.141 rows=2 loops=1)
   Index Cond: (custid = 194752::bigint)
 Total runtime: 0.314 ms
(3 rows)

real    0m0.036s
user    0m0.010s
sys     0m0.000s

Which would still take just under two days.

$ time psql tci -c "explain analyze update account.cust set prodid =
    tempprod.prodid, subprodid = tempprod.subprodid where origid =
    tempprod.debtid"

But if I'm not touching the column referenced from account.acct, why would it be looking there at all? I've got an explain analyze of the update running now, but until it finishes, I can't say for certain what it's doing. explain, alone, says:

$ time psql tci -c "explain update account.cust set prodid =
    tempprod.prodid, subprodid = tempprod.subprodid where origid =
    tempprod.debtid;"

                             QUERY PLAN
---------------------------------------------------------------------
 Merge Join (cost=0.00..232764.69 rows=4731410 width=252)
   Merge Cond: (("outer".origid)::text = ("inner".debtid)::text)
   -> Index Scan using ix_origid on cust (cost=0.00..94876.83
      rows=4731410 width=236)
   -> Index Scan using ix_debtid on tempprod (cost=0.00..66916.71
      rows=4731410 width=26)
(4 rows)

real    0m26.965s
user    0m0.010s
sys     0m0.000s

which shows it not hitting account.acct at all. (And why did it take the planner 20-some seconds to come up with that query plan?)

tempprod doesn't have an index either, but then it doesn't reference account.cust; instead, the update would be done by joining the two on debtid/origid, which map one-to-one, are both indexed, and with both tables clustered on those indices--exactly as was the CREATE TABLE AS Aaron suggested elsethread.

Unfortunately, this isn't the only large update we'll have to do. We receive a daily, ~100K row file that may have new values for any field of any row in account.cust, .acct or sundry other tables. The process of updating from that file is time-critical; it must run in minutes, at the outside.

/rls

--
Rosser Schwarz
Total Card, Inc.
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes: > But if I'm not touching the column referenced from account.acct, why > would it be looking there at all? I've got an explain analyze of the > update running now, but until it finishes, I can't say for certain > what it's doing. explain, alone, says: EXPLAIN won't tell you anything about triggers that might get fired during the UPDATE, so it's not much help for investigating possible FK performance problems. EXPLAIN ANALYZE will give you some indirect evidence: the difference between the total query time and the total time reported for the topmost plan node represents the time spent running triggers and physically updating the tuples. I suspect we are going to see a big difference. > which shows it not hitting account.acct at all. (And why did it take > the planner 20-some seconds to come up with that query plan?) It took 20 seconds to EXPLAIN? That's pretty darn odd in itself. I'm starting to think there must be something quite whacked-out about your installation, but I haven't got any real good ideas about what. (I'm assuming of course that there weren't a ton of other jobs eating CPU while you tried to do the EXPLAIN.) [ thinks for awhile... ] The only theory that comes to mind for making the planner so slow is oodles of dead tuples in pg_statistic. Could I trouble you to run vacuum full verbose pg_statistic; and send along the output? regards, tom lane
while you weren't looking, Tom Lane wrote:

> EXPLAIN won't tell you anything about triggers that might get fired
> during the UPDATE, so it's not much help for investigating possible
> FK performance problems. EXPLAIN ANALYZE will give you some indirect
> evidence: the difference between the total query time and the total time
> reported for the topmost plan node represents the time spent running
> triggers and physically updating the tuples. I suspect we are going
> to see a big difference.

It's still running.

> It took 20 seconds to EXPLAIN? That's pretty darn odd in itself.

It struck me, too.

> I'm starting to think there must be something quite whacked-out about
> your installation, but I haven't got any real good ideas about what.

Built from source. configure arguments:

./configure --prefix=/var/postgresql --bindir=/usr/bin
    --enable-thread-safety --with-perl --with-python --with-openssl
    --with-krb5=/usr/kerberos

I can answer more specific questions; otherwise, I'm not sure what to look for, either. If we could take the machine out of production (oh, hell; I think I just volunteered myself for weekend work) long enough to reinstall everything to get a fair comparison... So far as I know, though, it's a more or less stock Red Hat. 2.4.20-something.

> (I'm assuming of course that there weren't a ton of other jobs eating
> CPU while you tried to do the EXPLAIN.)

CPU spiked sporadically, which throttled everything else, but it never stays high. top shows the current explain analyze running between 50-ish% and negligible. iostat -k 3 shows an average of 3K/sec written, for a hundred-odd tps. I can't get any finer-grained than that, unfortunately; the machine was handed to me with a single, contiguous filesystem, in production use.

> [ thinks for awhile... ] The only theory that comes to mind for making
> the planner so slow is oodles of dead tuples in pg_statistic. Could I
> trouble you to run
>	vacuum full verbose pg_statistic;
> and send along the output?

INFO:  vacuuming "pg_catalog.pg_statistic"
INFO:  "pg_statistic": found 215 removable, 349 nonremovable row versions in 7 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 72 to 8132 bytes long.
There were 3 unused item pointers.
Total free space (including removable row versions) is 91572 bytes.
0 pages are or will become empty, including 0 at the end of the table.
7 pages containing 91572 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.71 sec.
INFO:  index "pg_statistic_relid_att_index" now contains 349 row versions in 2 pages
DETAIL:  215 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_statistic": moved 120 row versions, truncated 7 to 5 pages
DETAIL:  CPU 0.03s/0.01u sec elapsed 0.17 sec.
INFO:  index "pg_statistic_relid_att_index" now contains 349 row versions in 2 pages
DETAIL:  120 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_16408"
INFO:  "pg_toast_16408": found 12 removable, 12 nonremovable row versions in 5 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 660 to 8178 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 91576 bytes.
2 pages are or will become empty, including 0 at the end of the table.
5 pages containing 91576 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.27 sec.
INFO:  index "pg_toast_16408_index" now contains 12 row versions in 2 pages
DETAIL:  12 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.05 sec.
INFO:  "pg_toast_16408": moved 10 row versions, truncated 5 to 3 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  index "pg_toast_16408_index" now contains 12 row versions in 2 pages
DETAIL:  10 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

Having never more than glanced at the output of "vacuum verbose", I can't say whether that makes the cut for oodles. My suspicion is no.

/rls

--
Rosser Schwarz
Total Card, Inc.
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes: > Having never more than glanced at the output of "vacuum verbose", I > can't say whether that makes the cut for oodles. My suspicion is no. Nope, it sure doesn't. We occasionally see people who don't know they need to vacuum regularly and have accumulated hundreds or thousands of dead tuples for every live one :-(. That's clearly not the issue here. I'm fresh out of ideas, and the fact that this is a live server kinda limits what we can do experimentally ... but clearly, *something* is very wrong. Well, when you don't know what to look for, you still have to look. One possibly useful idea is to trace the kernel calls of the backend process while it does that ridiculously long EXPLAIN --- think you could try that? regards, tom lane
while you weren't looking, Tom Lane wrote: [trace] `strace -p 21882` run behind the below query and plan ... below that. # explain update account.cust set prodid = tempprod.prodid, subprodid = tempprod.subprodid where origid = tempprod.debtid; QUERY PLAN ------------------------------------------------------------------------- Merge Join (cost=0.00..232764.69 rows=4731410 width=252) Merge Cond: (("outer".origid)::text = ("inner".debtid)::text) -> Index Scan using ix_origid on cust (cost=0.00..94876.83 rows=4731410 width=236) -> Index Scan using ix_debtid on tempprod (cost=0.00..66916.71 rows=4731410 width=26) (4 rows) ---------- recv(9, "Q\0\0\0}explain update account.cust"..., 8192, 0) = 126 gettimeofday({1079482151, 106228}, NULL) = 0 brk(0) = 0x82d9000 brk(0x82db000) = 0x82db000 open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 8 _llseek(8, 212402176, [212402176], SEEK_SET) = 0 write(8, "\342\1\0\0\0\314\374\6\24\0\0\0\214\7pG\360\177\1\200\320"..., 32768) = 32768 close(8) = 0 open("/var/lib/pgsql/data/base/495616/16635", O_RDWR|O_LARGEFILE) = 8 read(8, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 10 _llseek(10, 60817408, [60817408], SEEK_SET) = 0 write(10, "\342\1\0\0`\334\5\7\24\0\0\0t\0010x\360\177\1\200\330\377"..., 32768) = 32768 close(10) = 0 read(8, "\334\1\0\0h\217\270n\24\0\0\0H\0H|\360\177\1\200@\376\220"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834165", O_RDWR|O_LARGEFILE) = 10 _llseek(10, 130777088, [130777088], SEEK_SET) = 0 write(10, "\342\1\0\0<\341\7\7\24\0\0\0004\t0I\360\177\1\200\330\377"..., 32768) = 32768 close(10) = 0 open("/var/lib/pgsql/data/base/495616/16595", O_RDWR|O_LARGEFILE) = 10 read(10, "\334\1\0\0\360\216\270n\24\0\0\0X\0@y\0\200\1\200\320\371"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 11 _llseek(11, 145915904, [145915904], SEEK_SET) = 0 write(11, "\342\1\0\0\300\350\n\7\24\0\0\0\224\6\310Z\360\177\1\200"..., 32768) = 32768 close(11) = 0 open("/var/lib/pgsql/data/base/495616/16614", O_RDWR|O_LARGEFILE) = 11 read(11, "\0\0\0\0\24\231P\306\16\0\0\0\24\0\360\177\360\177\1\200"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 12 _llseek(12, 148570112, [148570112], SEEK_SET) = 0 write(12, "\342\1\0\0\274\365\22\7\24\0\0\0X\3\234o\360\177\1\200"..., 32768) = 32768 close(12) = 0 _llseek(11, 98304, [98304], SEEK_SET) = 0 read(11, "\0\0\0\0\24\231P\306\16\0\0\0\34\0\234\177\360\177\1\200"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 12 _llseek(12, 251789312, [251789312], SEEK_SET) = 0 write(12, "\342\1\0\0l\366\23\7\24\0\0\0\364\10\260J\360\177\1\200"..., 32768) = 32768 close(12) = 0 _llseek(11, 32768, [32768], SEEK_SET) = 0 read(11, "\340\1\0\0\324\231\273\241\24\0\0\0\234\5\330\26\360\177"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834165", O_RDWR|O_LARGEFILE) = 12 _llseek(12, 117309440, [117309440], SEEK_SET) = 0 write(12, "\342\1\0\0d\36)\7\24\0\0\0000\tHI\360\177\1\200\330\377"..., 32768) = 32768 close(12) = 0 open("/var/lib/pgsql/data/base/495616/1259", O_RDWR|O_LARGEFILE) = 12 _llseek(12, 32768, [32768], SEEK_SET) = 0 read(12, "\334\1\0\0\324v-p\24\0\0\0000\3\304\3\0\200\1\200<\377"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834173", O_RDWR|O_LARGEFILE) = 13 _llseek(13, 247824384, [247824384], SEEK_SET) = 0 write(13, "\342\1\0\0h 
*\7\24\0\0\0\204\4dm\360\177\1\200\340\377"..., 32768) = 32768 close(13) = 0 open("/var/lib/pgsql/data/base/495616/16613", O_RDWR|O_LARGEFILE) = 13 read(13, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 14 _llseek(14, 204472320, [204472320], SEEK_SET) = 0 write(14, "\342\1\0\0\314\272:\7\24\0\0\0\324\t\354K\360\177\1\200"..., 32768) = 32768 close(14) = 0 read(13, "\340\1\0\0X\231\273\241\24\0\0\0\370\6Dk\360\177\1\200"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 14 _llseek(14, 152010752, [152010752], SEEK_SET) = 0 write(14, "\342\1\0\0p\277<\7\24\0\0\0\364\n\220I\360\177\1\200\334"..., 32768) = 32768 close(14) = 0 open("/var/lib/pgsql/data/base/495616/16610", O_RDWR|O_LARGEFILE) = 14 read(14, "\0\0\0\0\10\317\27\t\16\0\0\0\24\0\360\177\360\177\1\200"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 15 _llseek(15, 86441984, [86441984], SEEK_SET) = 0 write(15, "\342\1\0\0\330B?\7\24\0\0\0\370\6 N\360\177\1\200\310\377"..., 32768) = 32768 close(15) = 0 _llseek(14, 98304, [98304], SEEK_SET) = 0 read(14, "\340\1\0\0,l\257\241\24\0\0\0(\0\250\177\360\177\1\200"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 15 _llseek(15, 121896960, [121896960], SEEK_SET) = 0 write(15, "\342\1\0\0\264\303?\7\24\0\0\0\234\tHP\360\177\1\200\334"..., 32768) = 32768 close(15) = 0 _llseek(14, 65536, [65536], SEEK_SET) = 0 read(14, "\334\1\0\0\310u\252n\23\0\0\0\234\20\320=\360\177\1\200"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834173", O_RDWR|O_LARGEFILE) = 15 _llseek(15, 41549824, [41549824], SEEK_SET) = 0 write(15, "\342\1\0\0\0\312B\7\24\0\0\0\234\7\350T\360\177\1\200\330"..., 32768) = 32768 close(15) = 0 open("/var/lib/pgsql/data/base/495616/1249", O_RDWR|O_LARGEFILE) = 15 _llseek(15, 229376, [229376], SEEK_SET) = 0 read(15, "O\1\0\0\214\241\200\0\23\0\0\0\364\3\0\4\0\200\1\200\200"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 16 _llseek(16, 57147392, [57147392], SEEK_SET) = 0 write(16, "\342\1\0\0004\320G\7\24\0\0\0\374\7\200P\360\177\1\200"..., 32768) = 32768 close(16) = 0 _llseek(15, 163840, [163840], SEEK_SET) = 0 read(15, "\21\1\0\0\214\3\224R\23\0\0\0\364\3\0\4\0\200\1\200\200"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 16 _llseek(16, 241893376, [241893376], SEEK_SET) = 0 write(16, "\342\1\0\0\220TK\7\24\0\0\0,\t`I\360\177\1\200\330\377"..., 32768) = 32768 close(16) = 0 _llseek(12, 0, [0], SEEK_SET) = 0 read(12, "O\1\0\0\350\340\316,\23\0\0\0X\3\230\3\0\200\1\200d\304"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834171", O_RDWR|O_LARGEFILE) = 16 _llseek(16, 88702976, [88702976], SEEK_SET) = 0 write(16, "\342\1\0\0\324\326K\7\24\0\0\0`\v\370E\360\177\1\200\334"..., 32768) = 32768 close(16) = 0 _llseek(14, 32768, [32768], SEEK_SET) = 0 read(14, "\0\0\0\0\10\317\27\t\16\0\0\0\234\20\320=\360\177\1\200"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834173", O_RDWR|O_LARGEFILE) = 16 _llseek(16, 152043520, [152043520], SEEK_SET) = 0 write(16, "\342\1\0\0\220fU\7\24\0\0\0l\n\320K\360\177\1\200\334\377"..., 32768) = 32768 close(16) = 0 _llseek(15, 0, [0], SEEK_SET) = 0 read(15, "\0\0\0\0\20\0\0\0\1\0\0\0\364\3\0\4\0\200\1\200\200\377"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 16 _llseek(16, 70025216, [70025216], 
SEEK_SET) = 0 write(16, "\342\1\0\0\370\rk\7\24\0\0\0 \10\250O\360\177\1\200\330"..., 32768) = 32768 close(16) = 0 read(15, "\0\0\0\0\20\0\0\0\1\0\0\0\364\3\0\4\0\200\1\200\200\377"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 16 _llseek(16, 152764416, [152764416], SEEK_SET) = 0 write(16, "\342\1\0\0008\222m\7\24\0\0\0\370\10\230J\360\177\1\200"..., 32768) = 32768 close(16) = 0 open("/var/lib/pgsql/data/base/495616/16630", O_RDWR|O_LARGEFILE) = 16 read(16, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 17 _llseek(17, 143753216, [143753216], SEEK_SET) = 0 write(17, "\342\1\0\0\314!w\7\24\0\0\0\20\t\10J\360\177\1\200\330"..., 32768) = 32768 close(17) = 0 read(16, "\340\1\0\0\340\204\264\241\24\0\0\0H\2Ty\360\177\1\200"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 17 _llseek(17, 192512000, [192512000], SEEK_SET) = 0 write(17, "\342\1\0\0`\253y\7\24\0\0\0\250\7\330G\360\177\1\200\324"..., 32768) = 32768 close(17) = 0 open("/var/lib/pgsql/data/base/495616/16390", O_RDWR|O_LARGEFILE) = 17 read(17, "\334\1\0\0t\242\23p\24\0\0\0\0\2\210\2\0\200\1\200\24\377"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/16396", O_RDWR|O_LARGEFILE) = 18 _llseek(18, 0, [32768], SEEK_END) = 0 open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 19 _llseek(19, 63471616, [63471616], SEEK_SET) = 0 write(19, "\342\1\0\0\2444\200\7\24\0\0\0$\10\240O\360\177\1\200\330"..., 32768) = 32768 close(19) = 0 _llseek(18, 0, [0], SEEK_SET) = 0 read(18, "\0\0\0\0\20\0\0\0\1\0\0\0$\0\240}\0\200\1\200h\3770\1\320"..., 32768) = 32768 brk(0) = 0x82db000 brk(0x82dd000) = 0x82dd000 open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 19 _llseek(19, 64290816, [64290816], SEEK_SET) = 0 write(19, "\342\1\0\0<\265\200\7\24\0\0\0d\t`Q\360\177\1\200\334\377"..., 32768) = 32768 close(19) = 0 open("/var/lib/pgsql/data/base/495616/16605", O_RDWR|O_LARGEFILE) = 19 read(19, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 20 _llseek(20, 152731648, [152731648], SEEK_SET) = 0 write(20, "\342\1\0\0\264=\206\7\24\0\0\0\370\10\230J\360\177\1\200"..., 32768) = 32768 close(20) = 0 read(19, "\0\0\0\0\20\0\0\0\1\0\0\0\264\3pq\360\177\1\200\300\363"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 20 _llseek(20, 150274048, [150274048], SEEK_SET) = 0 write(20, "\342\1\0\0\230\310\212\7\24\0\0\0\210\7lO\360\177\1\200"..., 32768) = 32768 close(20) = 0 open("/var/lib/pgsql/data/base/495616/16398", O_RDWR|O_LARGEFILE) = 20 read(20, "\0\0\0\0\20\0\0\0\1\0\0\0\264\3`_\0\200\1\200\334\377H"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 21 _llseek(21, 260046848, [260046848], SEEK_SET) = 0 write(21, "\342\1\0\0\4\322\220\7\24\0\0\0\264\2\320r\360\177\1\200"..., 32768) = 32768 close(21) = 0 open("/var/lib/pgsql/data/base/495616/16639", O_RDWR|O_LARGEFILE) = 21 read(21, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 22 _llseek(22, 174424064, [174424064], SEEK_SET) = 0 write(22, "\342\1\0\0\200\\\225\7\24\0\0\0D\t$H\360\177\1\200\330"..., 32768) = 32768 close(22) = 0 read(21, "\0\0\0\0\20\0\0\0\1\0\0\0\200\t\254c\360\177\1\200\344"..., 32768) = 32768 
open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 22 _llseek(22, 109084672, [109084672], SEEK_SET) = 0 write(22, "\342\1\0\0\310\335\226\7\24\0\0\0 \10\250O\360\177\1\200"..., 32768) = 32768 close(22) = 0 open("/var/lib/pgsql/data/base/495616/16392", O_RDWR|O_LARGEFILE) = 22 read(22, "\0\0\0\0\20\0\0\0\1\0\0\0X\3\350\3\0\200\1\200h\3770\1"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 23 _llseek(23, 200900608, [200900608], SEEK_SET) = 0 write(23, "\342\1\0\0\314\344\232\7\24\0\0\0\344\7\304G\360\177\1"..., 32768) = 32768 close(23) = 0 open("/var/lib/pgsql/data/base/495616/16606", O_RDWR|O_LARGEFILE) = 23 read(23, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 24 _llseek(24, 85426176, [85426176], SEEK_SET) = 0 write(24, "\342\1\0\0\30\345\232\7\24\0\0\0\264\7\360V\360\177\1\200"..., 32768) = 32768 close(24) = 0 read(23, "\0\0\0\0\20\0\0\0\1\0\0\0H\1 {\360\177\1\200P\377 \0@\377"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 24 _llseek(24, 156729344, [156729344], SEEK_SET) = 0 write(24, "\342\1\0\0\260e\233\7\24\0\0\0\30\n\334M\360\177\1\200"..., 32768) = 32768 close(24) = 0 open("/var/lib/pgsql/data/base/495616/16400", O_RDWR|O_LARGEFILE) = 24 read(24, "\0\0\0\0\20\0\0\0\1\0\0\0H\1,u\0\200\1\200\334\377H\0\270"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 25 _llseek(25, 92995584, [92995584], SEEK_SET) = 0 write(25, "\342\1\0\0\244i\235\7\24\0\0\0\360\ttO\360\177\1\200\324"..., 32768) = 32768 close(25) = 0 open("/var/lib/pgsql/data/base/495616/16607", O_RDWR|O_LARGEFILE) = 25 read(25, "\0\0\0\0\264\341\v\t\16\0\0\0\24\0\360\177\360\177\1\200"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 26 _llseek(26, 209387520, [209387520], SEEK_SET) = 0 write(26, "\342\1\0\0<m\237\7\24\0\0\0\\\7\214H\360\177\1\200\320"..., 32768) = 32768 close(26) = 0 read(25, "N\1\0\0X\227`\236\23\0\0\0\334\0\320|\360\177\1\200\340"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 26 _llseek(26, 108363776, [108363776], SEEK_SET) = 0 write(26, "\342\1\0\0\334\375\251\7\24\0\0\0\24\10`K\360\177\1\200"..., 32768) = 32768 close(26) = 0 brk(0) = 0x82dd000 brk(0x82de000) = 0x82de000 open("/var/lib/pgsql/data/base/495616/16384", O_RDWR|O_LARGEFILE) = 26 read(26, "N\1\0\0\244\1H\332\23\0\0\0\360\0\244N\0\200\1\200\270"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834173", O_RDWR|O_LARGEFILE) = 27 _llseek(27, 85000192, [85000192], SEEK_SET) = 0 write(27, "\342\1\0\0\364\0\254\7\24\0\0\0008\txQ\360\177\1\200\334"..., 32768) = 32768 close(27) = 0 read(11, "\334\1\0\0\344\3422.\23\0\0\0t\1\320e\360\177\1\200\244"..., 32768) = 32768 mmap2(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x404b8000 open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 27 _llseek(27, 255524864, [255524864], SEEK_SET) = 0 write(27, "\342\1\0\0\34\3\256\7\24\0\0\0\374\10\200J\360\177\1\200"..., 32768) = 32768 close(27) = 0 open("/var/lib/pgsql/data/base/495616/16640", O_RDWR|O_LARGEFILE) = 27 read(27, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 28 _llseek(28, 202047488, [202047488], SEEK_SET) = 0 write(28, "\342\1\0\0\240\20\263\7\24\0\0\0\220\7\30H\360\177\1\200"..., 32768) = 
32768 close(28) = 0 _llseek(27, 98304, [98304], SEEK_SET) = 0 read(27, "\0\0\0\0\20\0\0\0\1\0\0\0\34\0\224\177\360\177\1\200\350"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 28 _llseek(28, 141524992, [141524992], SEEK_SET) = 0 write(28, "\342\1\0\0$\36\274\7\24\0\0\0p\10(L\360\177\1\200\324\377"..., 32768) = 32768 close(28) = 0 _llseek(27, 32768, [32768], SEEK_SET) = 0 read(27, "\0\0\0\0\20\0\0\0\1\0\0\0\314\5\330\7\360\177\1\200\234"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 28 _llseek(28, 149422080, [149422080], SEEK_SET) = 0 write(28, "\342\1\0\0\250\36\274\7\24\0\0\0\214\3\230n\360\177\1\200"..., 32768) = 32768 close(28) = 0 _llseek(22, 65536, [65536], SEEK_SET) = 0 read(22, "\0\0\0\0\20\0\0\0\1\0\0\0\370\2(\22\0\200\1\200h\3770\1"..., 32768) = 32768 brk(0) = 0x82de000 brk(0x82e2000) = 0x82e2000 open("/var/lib/pgsql/data/base/495616/6834165", O_RDWR|O_LARGEFILE) = 28 _llseek(28, 125075456, [125075456], SEEK_SET) = 0 write(28, "\342\1\0\0\0\237\274\7\24\0\0\0<\t\0I\360\177\1\200\330"..., 32768) = 32768 close(28) = 0 read(27, "\0\0\0\0\20\0\0\0\1\0\0\0\314\3\3301\360\177\1\200\234"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 28 _llseek(28, 199000064, [199000064], SEEK_SET) = 0 write(28, "\342\1\0\0\304&\301\7\24\0\0\0\310\nlJ\360\177\1\200\334"..., 32768) = 32768 close(28) = 0 _llseek(22, 32768, [32768], SEEK_SET) = 0 read(22, "\0\0\0\0\20\0\0\0\1\0\0\0X\3\350\3\0\200\1\200h\3770\1"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 28 _llseek(28, 146145280, [146145280], SEEK_SET) = 0 write(28, "\342\1\0\0\224\252\303\7\24\0\0\0\200\2 r\360\177\1\200"..., 32768) = 32768 close(28) = 0 open("/var/lib/pgsql/data/base/495616/16652", O_RDWR|O_LARGEFILE) = 28 read(28, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 29 _llseek(29, 244252672, [244252672], SEEK_SET) = 0 write(29, "\342\1\0\0\0003\310\7\24\0\0\0\260\0074H\360\177\1\200"..., 32768) = 32768 close(29) = 0 read(28, "\340\1\0\0\30*\262\241\24\0\0\0\210\4\224r\360\177\1\200"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 29 _llseek(29, 2228224, [2228224], SEEK_SET) = 0 write(29, "\342\1\0\0h\264\310\7\24\0\0\0\34\10\300O\360\177\1\200"..., 32768) = 32768 close(29) = 0 open("/var/lib/pgsql/data/base/495616/1247", O_RDWR|O_LARGEFILE) = 29 read(29, "\0\0\0\0\244\5\201\0\v\0\0\0H\3\224\3\0\200\1\200h\377"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 30 _llseek(30, 20316160, [20316160], SEEK_SET) = 0 write(30, "\342\1\0\0@\270\312\7\24\0\0\0P\10\210N\360\177\1\200\330"..., 32768) = 32768 close(30) = 0 open("/var/lib/pgsql/data/base/495616/16612", O_RDWR|O_LARGEFILE) = 30 read(30, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834173", O_RDWR|O_LARGEFILE) = 31 _llseek(31, 12058624, [12058624], SEEK_SET) = 0 write(31, "\342\1\0\0\340\301\320\7\24\0\0\0l\7 N\360\177\1\200\334"..., 32768) = 32768 close(31) = 0 read(30, "\0\0\0\0\20\0\0\0\1\0\0\0\320\2\0u\360\177\1\200p\372 "..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 31 _llseek(31, 281968640, [281968640], SEEK_SET) = 0 write(31, "\342\1\0\0$\317\331\7\24\0\0\0\270\1,w\360\177\1\200\334"..., 32768) = 32768 close(31) = 0 
open("/var/lib/pgsql/data/base/495616/16418", O_RDWR|O_LARGEFILE) = 31 read(31, "\0\0\0\0\20\0\0\0\1\0\0\0\320\2\354a\0\200\1\200\324\377"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 32 _llseek(32, 336003072, [336003072], SEEK_SET) = 0 write(32, "\342\1\0\0\264Z\340\7\24\0\0\0D\0104H\360\177\1\200\330"..., 32768) = 32768 close(32) = 0 open("/var/lib/pgsql/data/base/495616/16641", O_RDWR|O_LARGEFILE) = 32 read(32, "\0\0\0\0\\\242I\0\10\0\0\0\24\0\360\177\360\177\1\200b"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 33 _llseek(33, 138903552, [138903552], SEEK_SET) = 0 write(33, "\342\1\0\0\300u\355\7\24\0\0\0\334\4\10h\360\177\1\200"..., 32768) = 32768 close(33) = 0 _llseek(32, 98304, [98304], SEEK_SET) = 0 read(32, "\0\0\0\0\\\242I\0\10\0\0\0\34\0\334\177\360\177\1\200\350"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834165", O_RDWR|O_LARGEFILE) = 33 _llseek(33, 6062080, [6062080], SEEK_SET) = 0 write(33, "\342\1\0\0t~\360\7\24\0\0\0\4\10PP\360\177\1\200\330\377"..., 32768) = 32768 close(33) = 0 _llseek(32, 32768, [32768], SEEK_SET) = 0 read(32, "\0\0\0\0\200\20\276\0\v\0\0\0\0\17,S\360\177\1\200\344"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 33 _llseek(33, 17596416, [17596416], SEEK_SET) = 0 write(33, "\342\1\0\0\314\376\360\7\24\0\0\0\24\10\360O\360\177\1"..., 32768) = 32768 close(33) = 0 open("/var/lib/pgsql/data/base/495616/1255", O_RDWR|O_LARGEFILE) = 33 _llseek(33, 458752, [458752], SEEK_SET) = 0 read(33, "\0\0\0\0\270\10\276\0\v\0\0\0\300\1\370\1\0\200\1\2000"..., 32768) = 32768 brk(0) = 0x82e2000 brk(0x82e4000) = 0x82e4000 brk(0) = 0x82e4000 brk(0x82e6000) = 0x82e6000 brk(0) = 0x82e6000 brk(0x82e7000) = 0x82e7000 open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 34 _llseek(34, 242122752, [242122752], SEEK_SET) = 0 write(34, "\342\1\0\0\340\201\362\7\24\0\0\0\224\2\334r\360\177\1"..., 32768) = 32768 close(34) = 0 read(15, "\0\0\0\0\320r\37\0\5\0\0\0\364\3\0\4\0\200\1\200\200\377"..., 32768) = 32768 brk(0) = 0x82e7000 brk(0x82e8000) = 0x82e8000 open("/var/lib/pgsql/data/base/495616/6834166", O_RDWR|O_LARGEFILE) = 34 _llseek(34, 242810880, [242810880], SEEK_SET) = 0 write(34, "\342\1\0\0d\202\362\7\24\0\0\0P\3\304o\360\177\1\200\334"..., 32768) = 32768 close(34) = 0 open("/var/lib/pgsql/data/base/495616/16629", O_RDWR|O_LARGEFILE) = 34 read(34, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834165", O_RDWR|O_LARGEFILE) = 35 _llseek(35, 33292288, [33292288], SEEK_SET) = 0 write(35, "\342\1\0\0\234\231\375\7\24\0\0\0(\10xO\360\177\1\200\330"..., 32768) = 32768 close(35) = 0 read(34, "\340\1\0\0\244\204\264\241\24\0\0\0H\2Ty\360\177\1\200"..., 32768) = 32768 brk(0) = 0x82e8000 brk(0x82e9000) = 0x82e9000 brk(0) = 0x82e9000 brk(0x82eb000) = 0x82eb000 brk(0) = 0x82eb000 brk(0x82ec000) = 0x82ec000 brk(0) = 0x82ec000 brk(0x82ed000) = 0x82ed000 open("/var/lib/pgsql/data/base/495616/6834163", O_RDWR|O_LARGEFILE) = 35 _llseek(35, 4456448, [4456448], SEEK_SET) = 0 write(35, "\342\1\0\0\364\31\376\7\24\0\0\0\34\10\300O\360\177\1\200"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/16647", O_RDWR|O_LARGEFILE) = 36 read(36, "\0\0\0\0\4\307}\0\v\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768) = 32768 brk(0) = 0x82ed000 brk(0x82ee000) = 0x82ee000 open("/var/lib/pgsql/data/base/495616/6834170", O_RDWR|O_LARGEFILE) = 37 _llseek(37, 265158656, [265158656], 
SEEK_SET) = 0 write(37, "\342\1\0\0@\34\377\7\24\0\0\0\224\7lG\360\177\1\200\324"..., 32768) = 32768 read(36, "\336\1\0\0000\327V\272\24\0\0\0\210\5 j\360\177\1\200@"..., 32768) = 32768 _llseek(35, 161415168, [161415168], SEEK_SET) = 0 write(35, "\342\1\0\0p\35\0\10\24\0\0\0D\t\320H\360\177\1\200\330"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/16408", O_RDWR|O_LARGEFILE) = 38 read(38, "\336\1\0\0\224\273V\272\24\0\0\0H\2h\2\0\200\1\2004\377"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834173", O_RDWR|O_LARGEFILE) = 39 _llseek(39, 133332992, [133332992], SEEK_SET) = 0 write(39, "\342\1\0\0\34\340(\10\24\0\0\0p\3\334l\360\177\1\200\330"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/16604", O_RDWR|O_LARGEFILE) = 40 read(40, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768) = 32768 _llseek(39, 244875264, [244875264], SEEK_SET) = 0 write(39, "\342\1\0\0\264\343*\10\24\0\0\0L\2Ht\360\177\1\200\334"..., 32768) = 32768 read(40, "\0\0\0\0\20\0\0\0\1\0\0\0\264\3pq\360\177\1\200\340\366"..., 32768) = 32768 _llseek(37, 119635968, [119635968], SEEK_SET) = 0 write(37, "\342\1\0\0\350k.\10\24\0\0\0\334\6(N\360\177\1\200\324"..., 32768) = 32768 _llseek(38, 65536, [65536], SEEK_SET) = 0 read(38, "\336\1\0\0\214\272V\272\24\0\0\0\334\0\370\0\0\200\1\200"..., 32768) = 32768 _llseek(37, 103841792, [103841792], SEEK_SET) = 0 write(37, "\342\1\0\0\24t3\10\24\0\0\0\300\6@M\360\177\1\200\324\377"..., 32768) = 32768 _llseek(38, 32768, [32768], SEEK_SET) = 0 read(38, "\336\1\0\0\260\325V\272\24\0\0\0000\2\200\2\0\200\1\200"..., 32768) = 32768 brk(0) = 0x82ee000 brk(0x82f0000) = 0x82f0000 brk(0) = 0x82f0000 brk(0x82f2000) = 0x82f2000 gettimeofday({1079482178, 920849}, NULL) = 0 brk(0) = 0x82f2000 brk(0x82f4000) = 0x82f4000 brk(0) = 0x82f4000 brk(0x82f6000) = 0x82f6000 brk(0) = 0x82f6000 brk(0x82fa000) = 0x82fa000 brk(0) = 0x82fa000 brk(0x8302000) = 0x8302000 _llseek(37, 79331328, [79331328], SEEK_SET) = 0 write(37, "\342\1\0\0\200\3747\10\24\0\0\0\300\0068N\360\177\1\200"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/16653", O_RDWR|O_LARGEFILE) = 41 read(41, "\0\0\0\0\20\0\0\0\1\0\0\0\24\0\360\177\360\177\1\200b1"..., 32768) = 32768 open("/var/lib/pgsql/data/base/495616/6834168", O_RDWR|O_LARGEFILE) = 42 _llseek(42, 262144, [262144], SEEK_SET) = 0 write(42, "\342\1\0\0000\3758\10\24\0\0\0\0\6Xb\360\177\1\200\320"..., 32768) = 32768 read(41, "\340\1\0\0\224*\262\241\24\0\0\0\210\4T+\360\177\1\200"..., 32768) = 32768 brk(0) = 0x8302000 brk(0x8304000) = 0x8304000 gettimeofday({1079482178, 957454}, NULL) = 0 gettimeofday({1079482178, 957580}, NULL) = 0 send(6, "\4\0\0\0\334\3\0\0\7\0\0\0zU\0\0\0\220\7\0\1\0\0\0\16\0"..., 988, 0) = 988 send(6, "\4\0\0\0\334\3\0\0\7\0\0\0zU\0\0\0\220\7\0\1\0\0\0\16\0"..., 988, 0) = 988 send(6, "\4\0\0\0\334\3\0\0\7\0\0\0zU\0\0\0\220\7\0\1\0\0\0\16\0"..., 988, 0) = 988 send(6, "\4\0\0\0\274\1\0\0\7\0\0\0zU\0\0\0\220\7\0\1\0\0\0\6\0"..., 444, 0) = 444 send(9, "T\0\0\0#\0\1QUERY PLAN\0\0\0\0\0\0\0\0\0\0\31\377\377\377"..., 394, 0) = 394 recv(9, "X\0\0\0\4", 8192, 0) = 5 exit_group(0) = ? /rls -- Rosser Schwarz Total Card, Inc.
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes: > `strace -p 21882` run behind the below query and plan ... below that. Hmm ... that took 20 seconds eh? It is a fairly interesting trace. It shows that the backend needed to read 63 system catalog pages (that weren't already in shared memory), which is not too unreasonable I think ... though I wonder if more of them shouldn't have been in memory already. The odd thing is that for *every single read* it was necessary to first dump out a dirty page in order to make a buffer free. That says you are running with the entire contents of shared buffer space dirty at all times. That's probably not the regime you want to be operating in. I think we already suggested increasing shared_buffers. You might also want to think about not using such a large checkpoint interval. (The background-writing logic already committed for 7.5 should help this problem, but it's not there in 7.4.) Another interesting fact is that the bulk of the writes were "blind writes", involving an open()/write()/close() sequence instead of keeping the open file descriptor around for re-use. This is not too surprising in a freshly started backend, I guess; it's unlikely to have had reason to create a relation descriptor for the relations it may have to dump pages for. In some Unixen, particularly Solaris, open() is fairly expensive and so blind writes are bad news. I didn't think it was a big problem in Linux though. (This is another area we've improved for 7.5: there are no more blind writes. But that won't help you today.) What's not immediately evident is whether the excess I/O accounted for all of the slowdown. Could you retry the strace with -r and -T options so we can see how much time is being spent inside and outside the syscalls? regards, tom lane
Quick observations:

1. We have an explanation for what's going on, based on the message being exactly 666 lines long :-)

2. I'm clueless on the output, but perhaps Tom can see something. A quick glance shows that the strace seemed to run 27 seconds, during which it did:

 count | call
-------+---------
    84 | _llseek
    40 | brk
    54 | close
    88 | open
    63 | read

in other words, nothing much (though it did *a lot* of opens and closes of db files to do nothing).

Can you do another strace for a few minutes against the actual update query, adding the -c/-t options, and control-c out?

----- Original Message -----
From: "Rosser Schwarz" <rschwarz@totalcardinc.com>
To: "'Tom Lane'" <tgl@sss.pgh.pa.us>
Cc: <pgsql-performance@postgresql.org>
Sent: Tuesday, March 16, 2004 7:20 PM
Subject: Re: [PERFORM] atrocious update performance

while you weren't looking, Tom Lane wrote:

[trace]

`strace -p 21882` run behind the below query and plan ... below that.

# explain update account.cust set prodid = tempprod.prodid,
  subprodid = tempprod.subprodid where origid = tempprod.debtid;
On Tuesday 16 March 2004 00:08, Tom Lane wrote: > > I'm inclined to suspect an issue with foreign-key checking. You didn't > give us any details about foreign key relationships your "cust" table is > involved in --- could we see those? And the schemas of the other tables > involved? Two questions Tom: 1. Do the stats tables record FK checks, or just explicit table accesses? 2. If not, should they? If the only real activity is this update then simple before/after views of the stats might be revealing. -- Richard Huxton Archonet Ltd
Richard Huxton <dev@archonet.com> writes: > Two questions Tom: > 1. Do the stats tables record FK checks, or just explicit table accesses? The stats record everything, IIRC. > If the only real activity is this update then simple before/after > views of the stats might be revealing. That's quite a good thought, though since Rosser's system is live it might be hard to get a view of just one query's activity. regards, tom lane
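For the record, a before/after snapshot of that sort might look like the sketch below. This is illustrative only: the temp table name is made up, and the per-table counters in pg_stat_user_tables are only populated when the stats collector (and row-level stats) is enabled.

-- snapshot the per-table counters before starting the update
create temp table stats_before as
  select relname, seq_scan, idx_scan, n_tup_upd
    from pg_stat_user_tables;

-- ... run the update ...

-- diff the counters afterward to see which tables were actually touched
select s.relname,
       s.seq_scan  - b.seq_scan  as seq_scans,
       s.idx_scan  - b.idx_scan  as idx_scans,
       s.n_tup_upd - b.n_tup_upd as rows_updated
  from pg_stat_user_tables s
  join stats_before b using (relname);

On a busy system the diff includes everyone's activity, which is the caveat Tom raises above; on a box where this update is the only real work, it should be a reasonable picture of what the query (and any FK checks it triggers) actually did.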
while you weren't looking, Tom Lane wrote: > What's not immediately evident is whether the excess I/O accounted for > all of the slowdown. Could you retry the strace with -r and -T options > so we can see how much time is being spent inside and outside the > syscalls? Unlike the previous run (this is a trace of the explain), this one went immediately. No delay. I also have, per Aaron's request, a trace -ct against the backend running the explain analyze. I killed it well before "a few minutes"; it's just shy of 900K. I don't think I'll be forwarding that on to the list, though I can put it up on a web server somewhere easily enough. Try <http://www.totalcardinc.com/pg/postmaster.trace>. # `strace -rT -p 25075` 0.000000 read(0, "\r", 1) = 1 <5.514983> 5.516215 write(1, "\n", 1) = 1 <0.000034> 0.000545 rt_sigprocmask(SIG_BLOCK, [INT], [33], 8) = 0 <0.000013> 0.000200 ioctl(0, SNDCTL_TMR_STOP, {B38400 opost isig icanon echo ...}) = 0 <0.000032> 0.000162 rt_sigprocmask(SIG_SETMASK, [33], NULL, 8) = 0 <0.000013> 0.000120 rt_sigaction(SIGINT, {0x804d404, [], SA_RESTORER|SA_RESTART, 0x420 276f8}, {0x401ec910, [], SA_RESTORER, 0x420276f8}, 8) = 0 <0.000015> 0.000154 rt_sigaction(SIGTERM, {SIG_DFL}, {0x401ec910, [], SA_RESTORER, 0x4 20276f8}, 8) = 0 <0.000014> 0.000136 rt_sigaction(SIGQUIT, {SIG_DFL}, {0x401ec910, [], SA_RESTORER, 0x4 20276f8}, 8) = 0 <0.000013> 0.000134 rt_sigaction(SIGALRM, {SIG_DFL}, {0x401ec910, [], SA_RESTORER, 0x4 20276f8}, 8) = 0 <0.000012> 0.000164 rt_sigaction(SIGTSTP, {SIG_DFL}, {0x401ec910, [], SA_RESTORER, 0x4 20276f8}, 8) = 0 <0.000013> 0.000140 rt_sigaction(SIGTTOU, {SIG_DFL}, {0x401ec910, [], SA_RESTORER, 0x4 20276f8}, 8) = 0 <0.000013> 0.000135 rt_sigaction(SIGTTIN, {SIG_DFL}, {0x401ec910, [], SA_RESTORER, 0x4 20276f8}, 8) = 0 <0.000013> 0.000135 rt_sigaction(SIGWINCH, {SIG_DFL}, {0x401ec9d0, [], SA_RESTORER, 0x 420276f8}, 8) = 0 <0.000014> 0.000250 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0 <0.000013> 0.000138 send(3, "Q\0\0\0}explain update account.cust"..., 126, 0) = 126 <0 .000032> 0.000164 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0 <0.000013> 0.000132 poll([{fd=3, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1 < 0.222093> 0.222388 recv(3, "T\0\0\0#\0\1QUERY PLAN\0\0\0\0\0\0\0\0\0\0\31\377\377\377 "..., 16384, 0) = 394 <0.000031> 0.000360 ioctl(0, SNDCTL_TMR_TIMEBASE, {B38400 opost isig icanon echo ...}) = 0 <0.000019> 0.000137 ioctl(1, SNDCTL_TMR_TIMEBASE, {B38400 opost isig icanon echo ...}) = 0 <0.000013> 0.000135 ioctl(1, TIOCGWINSZ, {ws_row=64, ws_col=80, ws_xpixel=0, ws_ypixel =0}) = 0 <0.000015> 0.000175 write(1, " "..., 92) = 92 <0.000038 > 0.000184 write(1, "--------------------------------"..., 92) = 92 <0.000025 > 0.000154 write(1, " Merge Join (cost=0.00..232764."..., 59) = 59 <0.000023 > 0.000136 write(1, " Merge Cond: ((\"outer\".origid)"..., 65) = 65 <0.0000 23> 0.000134 write(1, " -> Index Scan using ix_origi"..., 88) = 88 <0.000025 > 0.000129 write(1, " -> Index Scan using ix_debti"..., 91) = 91 <0.000025 > 0.000136 write(1, "(4 rows)\n", 9) = 9 <0.000022> 0.000116 write(1, "\n", 1) = 1 <0.000021> 0.000144 rt_sigprocmask(SIG_BLOCK, NULL, [33], 8) = 0 <0.000013> 0.000121 rt_sigaction(SIGINT, {0x804d404, [], SA_RESTORER|SA_RESTART, 0x420 276f8}, {0x804d404, [], SA_RESTORER|SA_RESTART, 0x420276f8}, 8) = 0 <0.000015> 0.000208 rt_sigprocmask(SIG_BLOCK, [INT], [33], 8) = 0 <0.000013> 0.000129 ioctl(0, TIOCGWINSZ, {ws_row=64, ws_col=80, ws_xpixel=0, ws_ypixel =0}) = 0 <0.000013> 0.000102 ioctl(0, TIOCSWINSZ, {ws_row=64, ws_col=80, 
ws_xpixel=0, ws_ypixel =0}) = 0 <0.000014> 0.000105 ioctl(0, SNDCTL_TMR_TIMEBASE, {B38400 opost isig icanon echo ...}) = 0 <0.000013> 0.000127 ioctl(0, SNDCTL_TMR_STOP, {B38400 opost isig -icanon -echo ...}) = 0 <0.000028> 0.000147 rt_sigprocmask(SIG_SETMASK, [33], NULL, 8) = 0 <0.000012> 0.000114 rt_sigaction(SIGINT, {0x401ec910, [], SA_RESTORER, 0x420276f8}, {0 x804d404, [], SA_RESTORER|SA_RESTART, 0x420276f8}, 8) = 0 <0.000012> 0.000149 rt_sigaction(SIGTERM, {0x401ec910, [], SA_RESTORER, 0x420276f8}, { SIG_DFL}, 8) = 0 <0.000013> 0.000136 rt_sigaction(SIGQUIT, {0x401ec910, [], SA_RESTORER, 0x420276f8}, { SIG_DFL}, 8) = 0 <0.000012> 0.000136 rt_sigaction(SIGALRM, {0x401ec910, [], SA_RESTORER, 0x420276f8}, { SIG_DFL}, 8) = 0 <0.000012> 0.000136 rt_sigaction(SIGTSTP, {0x401ec910, [], SA_RESTORER, 0x420276f8}, { SIG_DFL}, 8) = 0 <0.000013> 0.000136 rt_sigaction(SIGTTOU, {0x401ec910, [], SA_RESTORER, 0x420276f8}, { SIG_DFL}, 8) = 0 <0.000012> 0.000136 rt_sigaction(SIGTTIN, {0x401ec910, [], SA_RESTORER, 0x420276f8}, { SIG_DFL}, 8) = 0 <0.000013> 0.000212 rt_sigaction(SIGWINCH, {0x401ec9d0, [], SA_RESTORER, 0x420276f8}, {SIG_DFL}, 8) = 0 <0.000012> 0.000188 write(1, "\r\rtci=# \rtci=# ", 15) = 15 <0.000019> 0.000112 rt_sigprocmask(SIG_BLOCK, NULL, [33], 8) = 0 <0.000012> 0.000110 read(0, "\\", 1) = 1 <18.366895> 18.368284 write(1, "\rtci=# \\\rtci=# \\", 16) = 16 <0.000029> 0.000134 rt_sigprocmask(SIG_BLOCK, NULL, [33], 8) = 0 <0.000013> 0.000125 read(0, "q", 1) = 1 <0.117572> 0.117719 write(1, "\rtci=# \\q\rtci=# \\q", 18) = 18 <0.000020> 0.000118 rt_sigprocmask(SIG_BLOCK, NULL, [33], 8) = 0 <0.000012> 0.000107 read(0, "\r", 1) = 1 <1.767409> 1.767604 write(1, "\n", 1) = 1 <0.000032> 0.000140 rt_sigprocmask(SIG_BLOCK, [INT], [33], 8) = 0 <0.000013> 0.000138 ioctl(0, SNDCTL_TMR_STOP, {B38400 opost isig icanon echo ...}) = 0 <0.000030> 0.000143 rt_sigprocmask(SIG_SETMASK, [33], NULL, 8) = 0 <0.000013> 0.000111 rt_sigaction(SIGINT, {0x804d404, [], SA_RESTORER|SA_RESTART, 0x420 276f8}, {0x401ec910, [], SA_RESTORER, 0x420276f8}, 8) = 0 <0.000014> 0.000153 rt_sigaction(SIGTERM, {SIG_DFL}, {0x401ec910, [], SA_RESTORER, 0x4 20276f8}, 8) = 0 <0.000013> 0.000134 rt_sigaction(SIGQUIT, {SIG_DFL}, {0x401ec910, [], SA_RESTORER, 0x4 20276f8}, 8) = 0 <0.000013> 0.000134 rt_sigaction(SIGALRM, {SIG_DFL}, {0x401ec910, [], SA_RESTORER, 0x4 20276f8}, 8) = 0 <0.000013> 0.000133 rt_sigaction(SIGTSTP, {SIG_DFL}, {0x401ec910, [], SA_RESTORER, 0x4 20276f8}, 8) = 0 <0.000013> 0.000134 rt_sigaction(SIGTTOU, {SIG_DFL}, {0x401ec910, [], SA_RESTORER, 0x4 20276f8}, 8) = 0 <0.000013> 0.000134 rt_sigaction(SIGTTIN, {SIG_DFL}, {0x401ec910, [], SA_RESTORER, 0x4 20276f8}, 8) = 0 <0.000012> 0.000134 rt_sigaction(SIGWINCH, {SIG_DFL}, {0x401ec9d0, [], SA_RESTORER, 0x 420276f8}, 8) = 0 <0.000014> 0.001271 rt_sigaction(SIGINT, {SIG_DFL}, {0x804d404, [], SA_RESTORER|SA_RES TART, 0x420276f8}, 8) = 0 <0.000013> 0.000532 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0 <0.000014> 0.000145 send(3, "X\0\0\0\4", 5, 0) = 5 <0.000028> 0.000126 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0 <0.000013> 0.000140 close(3) = 0 <0.000033> 0.000147 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_DFL}, 8) = 0 <0.000013> 0.000197 open("/var/lib/pgsql/.psql_history", O_WRONLY|O_CREAT|O_TRUNC, 060 0) = 3 <0.000168> 0.000694 write(3, "\\d payment.batch\nalter sequence "..., 16712) = 16712 < 0.000209> 0.000311 close(3) = 0 <0.000057> 0.055587 munmap(0x40030000, 4096) = 0 <0.000032> 0.000130 exit_group(0) = ? /rls -- Rosser Schwarz Total Card, Inc.
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes: >> Could you retry the strace with -r and -T options > Unlike the previous run (this is a trace of the explain), this one went > immediately. No delay. Hm. It looks like you mistakenly traced psql rather than the backend, but since the delay went away we wouldn't have learned anything anyhow. Have you got any idea what conditions may have changed between seeing delay and not seeing delay? > I also have, per Aaron's request, a trace -ct against the backend running > the explain analyze. I killed it well before "a few minutes"; it's just > shy of 900K. I don't think I'll be forwarding that on to the list, though > I can put it up on a web server somewhere easily enough. > Try <http://www.totalcardinc.com/pg/postmaster.trace>. This is pretty odd too. It looks like it's doing checkpoints every so often (look for the writes to pg_control), which a backend engaged in a long-running query surely ought not be doing. Need to think about why that might be... regards, tom lane
while you weren't looking, Tom Lane wrote: > Hm. It looks like you mistakenly traced psql rather than the backend, > but since the delay went away we wouldn't have learned > anything anyhow. > Have you got any idea what conditions may have changed between seeing > delay and not seeing delay? None, offhand. I have noticed that when a large query is running, the machine can sporadically just freeze--or at least take inordinately long for some other process, be it top or ls, another query, or whatever to start. Nothing looks saturated when it happens, and, while you can count on it to happen, it's not consistent enough to reproduce. > This is pretty odd too. It looks like it's doing checkpoints every so > often (look for the writes to pg_control), which a backend engaged in > a long-running query surely ought not be doing. Need to think about > why that might be... Does the fact that all the reads and writes are 32K mean anything out of the ordinary? $PGSRC/src/include/pg_config_manual.h has BLCKSZ #defined to 16384. I was running previously with a 32K BLCKSZ, but that turned out to be rather sub-optimal for as heavily indexed as our tables are. I've dumped and rebuilt several times since then. /rls -- Rosser Schwarz Total Card, Inc.
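One simple check worth making at this point (a suggestion, not something from the thread) is to ask the running server what it believes the buffer, WAL and checkpoint settings are, since a stale postgresql.conf or a different binary would show up here:

show shared_buffers;
show wal_buffers;
show checkpoint_segments;
show checkpoint_timeout;

In 7.4 these are set in postgresql.conf and mostly need a restart (or at least a reload) to change, so confirming them from a live session rules out the config file and the running server having drifted apart.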
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes: > while you weren't looking, Tom Lane wrote: >> Have you got any idea what conditions may have changed between seeing >> delay and not seeing delay? > None, offhand. I have noticed that when a large query is running, > the machine can sporadically just freeze--or at least take inordinately > long for some other process, be it top or ls, another query, or whatever > to start. Nothing looks saturated when it happens, and, while you can > count on it to happen, it's not consistent enough to reproduce. Interesting. You should leave "vmstat 1" running in the background and see if you can correlate these freezes with bursts of disk I/O or swap. I saw a couple of delays in your big strace that seemed odd --- a couple of one-second-plus intervals, and a four-second-plus interval, with no obvious reason for them. Perhaps the same issue? > Does the fact that all the reads and writes are 32K mean anything out > of the ordinary? $PGSRC/src/include/pg_config_manual.h has BLCKSZ > #defined to 16384. I was running previously with a 32K BLCKSZ, but > that turned out to be rather sub-optimal for as heavily indexed as our > tables are. I've dumped and rebuilt several times since then. I hate to break it to you, but that most definitely means you are running with BLCKSZ = 32K. Whatever you thought you were rebuilding didn't take effect. I agree that the larger blocksize is of dubious value. People used to do that back when the blocksize limited your row width, but these days I think you're probably best off with the standard 8K. Another thing that's fairly striking is the huge bursts of WAL activity --- your trace shows that the thing is writing entire WAL segments (16 MB) at one go, rather than dribbling it out a page or two at a time as the code is intended to do. I think what is happening is that you have wal_buffers = 1024 (correct?) yielding 32MB of WAL buffers, and since there are no other transactions happening, nothing gets written until you hit the "write when the buffers are half full" heuristic. I would suggest knocking wal_buffers down to something closer to the default (maybe 4 or 8 buffers) to reduce these I/O storms. (Memo to hackers: we need to see to it that the new background writer process takes some responsibility for pushing out filled WAL buffers, not only data buffers.) If the big EXPLAIN ANALYZE is still running, would you get a dump of its open files (see "lsof -p") and correlate those with the tables being used in the query? I'm trying to figure out what the different writes and reads represent. regards, tom lane
while you weren't looking, Tom Lane wrote: > I hate to break it to you, but that most definitely means you are > running with BLCKSZ = 32K. Whatever you thought you were rebuilding > didn't take effect. I saw that and thought so. The other day, I was rooting around in $PGDATA, and saw a lot of 32K files and wondered for a moment, too. If that's the case, though, that's ... weird. > I agree that the larger blocksize is of dubious value. People used to > do that back when the blocksize limited your row width, but these days > I think you're probably best off with the standard 8K. I'd been experimenting with larger blocksizes after we started seeing a lot of seqscans in query plans. 32K proved quickly that it hurts index scan performance, so I was--I thought--trying 16. > If the big EXPLAIN ANALYZE is still running, would you get a dump of its > open files (see "lsof -p") and correlate those with the tables being > used in the query? I'm trying to figure out what the different writes > and reads represent. It looks rather like it's hitting the foreign keys; one of the files that shows is the account.note table, which has an fk to the pk of the table being updated. The file's zero size, but it's open. The only reason it should be open is if foreign keys are being checked, yes? You'd said that the foreign keys were only checked if last-change is after current-query, as of 7.3.4, yes? `rpm -qa postgresql` comes up with 7.3.2-3, which makes no sense, 'cos I know I removed it before installing current; I remember making sure no-one was using pg on this machine, and remember saying rpm -e. Regardless, something thinks it's still there. Is there any way that it is, and that I've somehow been running 7.3.2 all along? `which psql`, &c show the bindir from my configure, but I'm not sure that's sufficient. How would I tell? I don't remember any of the binaries having a --version argument. /rls -- Rosser Schwarz Total Card, Inc.
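As a sketch of how to do that correlation from SQL: the numeric file names under base/ in the strace (6834163, 6834165, and so on) are relfilenode values, and 495616 in the path is the database's OID, so something along these lines maps the open files back to relations:

select relname, relkind, relfilenode
  from pg_class
 where relfilenode in (6834163, 6834165, 6834166, 6834168, 6834170, 6834173);

relkind distinguishes tables ('r') from indexes ('i'), which helps separate heap writes from index maintenance when reading the trace.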
I wrote: > Regardless, something thinks it's still there. Is there any way that > it is, and that I've somehow been running 7.3.2 all along? `which > psql`, &c show the bindir from my configure, but I'm not sure that's > sufficient. The weird thing is that I know I never built 7.3.anything with 32K BLCKSZ, never built 7.3.anything at all. If 7.3 were installed, would it have any problem reading a 7.4 cluster? /rls -- Rosser Schwarz Total Card, Inc.
"Rosser Schwarz" <rschwarz@totalcardinc.com> writes: >> Regardless, something thinks it's still there. Is there any way that >> it is, and that I've somehow been running 7.3.2 all along? `which >> psql`, &c show the bindir from my configure, but I'm not sure that's >> sufficient. "select version()" is the definitive test for backend version. > The weird thing is that I know I never built 7.3.anything with 32K > BLCKSZ, never built 7.3.anything at all. If 7.3 were installed, would > it have any problem reading a 7.4 cluster? 7.3 would refuse to start on a 7.4 cluster, and vice versa. regards, tom lane
I've been following this thread closely as I have the same problem with an UPDATE. Everything is identical here right down to the strace output.

Has anyone found a workaround or resolved the problem? If not, I have test systems here which I can use to help us test and explore.

Greg

--
Greg Spiegelberg
Sr. Product Development Engineer
Cranel, Incorporated.
Phone: 614.318.4314
Fax: 614.431.8388
Email: gspiegelberg@Cranel.com
Cranel. Technology. Integrity. Focus.
Greg Spiegelberg wrote: > I've been following this thread closely as I have the same problem > with an UPDATE. Everything is identical here right down to the > strace output. > Has anyone found a workaround or resolved the problem? If not, > I have test systems here which I can use to help up test and explore. I'm still gathering data. The explain analyze I'd expected to finish Thursday afternoon hasn't yet. I'm going to kill it and try a few smaller runs, increasing in size, until the behavior manifests. Will advise. /rls
Rosser Schwarz wrote:
> Greg Spiegelberg wrote:
>
>> I've been following this thread closely as I have the same problem
>> with an UPDATE. Everything is identical here right down to the
>> strace output.
>
>> Has anyone found a workaround or resolved the problem? If not,
>> I have test systems here which I can use to help us test and explore.
>
> I'm still gathering data. The explain analyze I'd expected to finish
> Thursday afternoon hasn't yet. I'm going to kill it and try a few
> smaller runs, increasing in size, until the behavior manifests.
>
> Will advise.

I've replaced my atrocious UPDATE with the following.

begin;
-- Drop all constraints
alter table ORIG drop constraint ...;
-- Drop all indexes
drop index ...;
-- Update
update ORIG set column=... where ...;
commit;

The problem is that when I recreate the indexes and add the constraints back on ORIG, I end up with the same long-running process. The original UPDATE runs for about 30 minutes on a table of 400,000 rows, with the WHERE matching about 70% of them. The above runs for about 2 minutes without the constraints or indexes; however, adding the constraints and recreating the dropped indexes negates any gain.

RedHat 7.3 + Kernel 2.4.24 + ext3 + PostgreSQL 7.3.5
Dual PIII 1.3-ish GHz, 2GB Memory
U160 OS drives and a 1Gbps test SAN on a Hitachi 9910

Greg

--
Greg Spiegelberg
Sr. Product Development Engineer
Cranel, Incorporated.
Phone: 614.318.4314
Fax: 614.431.8388
Email: gspiegelberg@Cranel.com
Cranel. Technology. Integrity. Focus.
Greg Spiegelberg <gspiegelberg@cranel.com> writes: > RedHat 7.3 + Kernel 2.4.24 + ext3 + PostgreSQL 7.3.5 ^^^^^^^^^^^^^^^^ Please try 7.4. regards, tom lane
Greg Spiegelberg wrote:
>> Will advise.

After creating 100, 1K, 10K, 100K and 1M-row subsets of account.cust and the corresponding rows/tables with foreign key constraints referring to the table, I'm unable to reproduce the behavior at issue. explain analyze looks like the following, showing the query run with the join column indexed and not, respectively:

# explain analyze update test.cust100 set prodid = tempprod.prodid, subprodid = tempprod.subprodid where origid = tempprod.debtid;

-- with index
                               QUERY PLAN
-----------------------------------------------------------------------
 Merge Join  (cost=0.00..25.64 rows=500 width=220) (actual time=0.241..13.091 rows=100 loops=1)
   Merge Cond: (("outer".origid)::text = ("inner".debtid)::text)
   ->  Index Scan using ix_origid_cust100 on cust100  (cost=0.00..11.50 rows=500 width=204) (actual time=0.125..6.465 rows=100 loops=1)
   ->  Index Scan using ix_debtid on tempprod  (cost=0.00..66916.71 rows=4731410 width=26) (actual time=0.057..1.497 rows=101 loops=1)
 Total runtime: 34.067 ms
(5 rows)

-- without index
                               QUERY PLAN
----------------------------------------------------------------------
 Merge Join  (cost=7.32..16.71 rows=100 width=220) (actual time=4.415..10.918 rows=100 loops=1)
   Merge Cond: (("outer".debtid)::text = "inner"."?column22?")
   ->  Index Scan using ix_debtid on tempprod  (cost=0.00..66916.71 rows=4731410 width=26) (actual time=0.051..1.291 rows=101 loops=1)
   ->  Sort  (cost=7.32..7.57 rows=100 width=204) (actual time=4.311..4.450 rows=100 loops=1)
         Sort Key: (cust100.origid)::text
         ->  Seq Scan on cust100  (cost=0.00..4.00 rows=100 width=204) (actual time=0.235..2.615 rows=100 loops=1)
 Total runtime: 25.031 ms
(7 rows)

With the join column indexed, it takes roughly .32ms/row on the first four tests (100..100K), and about .48ms/row on 1M rows. Without the index, it runs 100 rows @ .25/row, 1000 @ .26, 10000 @ .27, 100000 @ .48 and .5 @ 1M rows. In no case does the query plan reflect foreign key validation.

Failing any other suggestions for diagnosis in the near term, I'm going to nuke the PostgreSQL install, scour it from the machine and start from scratch. Failing that, I'm going to come in some weekend and re-do the machine.

> The problem is that when I recreate the indexes and add the constraints back
> on ORIG, I end up with the same long-running process. The original
> UPDATE runs for about 30 minutes on a table of 400,000 rows, with the
> WHERE matching about 70% of them. The above runs for about 2
> minutes without the constraints or indexes; however, adding the
> constraints and recreating the dropped indexes negates any gain.

Is this a frequently-run update? In my experience, with my seemingly mutant install, dropping indices and constraints to shave 14/15 off the update time would be worth the effort. Just script dropping, updating and recreating into one large transaction. It's a symptom-level fix, but re-creating the fifteen indices on one of our 5M row tables doesn't take 28 minutes, and your hardware looks to be rather less IO and CPU bound than ours.

I'd also second Tom's suggestion of moving to 7.4.

/rls
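A minimal sketch of such a transaction, using this thread's account.cust/tempprod names but with a hypothetical foreign key name and only one of the dropped indexes shown (the real constraint and index names would need to be substituted):

begin;

-- hypothetical constraint name; account.note is one of the referencing tables
alter table account.note drop constraint note_custid_fkey;
drop index account.ix_origid;

update account.cust
   set prodid = tempprod.prodid,
       subprodid = tempprod.subprodid
  from public.tempprod
 where account.cust.origid = tempprod.debtid;

create index ix_origid on account.cust (origid);

-- re-adding the constraint re-validates every referencing row
alter table account.note add constraint note_custid_fkey
  foreign key (custid) references account.cust (custid);

commit;

The re-validation on ADD CONSTRAINT is presumably the cost Greg sees when he puts the constraints back, so this approach only wins if recreating the indexes and constraints is genuinely cheaper than maintaining them during the update.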
After deinstalling and scrubbing PostgreSQL from my server and doing a clean build using a vanilla 7.4.2 tree, I'm rather more confident that foreign key validation is the cause of my performance problems.

I recreated my schemas and ran the original update, with foreign keys referring to the identity column of the target table. The update took roughly two days, as I'd predicted based on my analysis of the previous installation. (I can't say how long with certainty, beyond that it finished some time between when I left work one night and came in the next morning, the second day after starting the query.) I'm not sure what was wrong with the previous install, such that the update took several days; two-ish days is long enough.

Just this morning, however, I created a copy of the target table (all 4.7M rows), with absolutely no foreign keys referring to it, and ran the update against the copy. That update took 2300 seconds. The join columns were indexed in both cases.

I'm in the process of migrating the machine to run kernel 2.6.4, following the thread started by Gary, though I suspect that the kernel revision is moot with respect to whether or not foreign keys are being incorrectly validated. I can keep the 2.4 kernel and modules around to run using the current versions for testing purposes, though any such work would necessarily be off-hours.

Please advise of anything I can do to help narrow down the specific cause of the issue; I know just enough C to be mildly dangerous.

/rls

--
Rosser Schwarz
Total Card, Inc.
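For reference, the control test described above could look roughly like this (a sketch only: the test.cust_copy table and its index name are made up; the rest of the names come from the thread):

-- build a copy of the target table that nothing references
create table test.cust_copy as
  select * from account.cust;

create index ix_copy_origid on test.cust_copy (origid);
analyze test.cust_copy;

-- the same update as the original, but against the unreferenced copy
update test.cust_copy
   set prodid = tempprod.prodid,
       subprodid = tempprod.subprodid
  from public.tempprod
 where test.cust_copy.origid = tempprod.debtid;

Comparing the runtime of this against the original update isolates the cost of whatever the foreign keys are doing, since the plans should otherwise be essentially identical.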
On 5 Apr 2004 at 12:05, Rosser Schwarz wrote:

> Just this morning, however, I created a copy of the target table (all
> 4.7M rows), with absolutely no foreign keys referring to it, and ran
> the update against the copy. That update took 2300 seconds. The
> join columns were indexed in both cases.

Have you added indexes on the custid column for the tables account.acct, account.orgacct and note?

I haven't followed the entire thread, but if you have cascading FKs on those tables without an index on that column, that could cause your delay.

Kevin Barnard
SpeedFC
while you weren't looking, Kevin Barnard wrote: > Have you added indexes for the custid column for tables > account.acct accunt.orgacct and note? They were indexed in the original case, yes. There was no need to index them in today's test case, as that was done purely in attempt to rule in or out foreign key validation as the cause of the performance hit. No foreign keys that might be validated, no need to index the foreign key columns. > I haven't followed the entire thread but it you have > cascading FK on those tables without an index on the > column that could cause your delay. The issue is that the foreign keys are being validated at all, when the column being referenced by those foreign keys (account.cust.custid) is never touched. Regardless of whether or not the referencing columns are indexed, validating them at all--in this specific case--is broken. The column they refer to is never touched; they should remain utterly ignorant of whatever happens to other columns in the same row. /rls -- Rosser Schwarz Total Card, Inc.
On Mon, 5 Apr 2004, Kevin Barnard wrote:

> On 5 Apr 2004 at 12:05, Rosser Schwarz wrote:
>
> > Just this morning, however, I created a copy of the target table (all
> > 4.7M rows), with absolutely no foreign keys referring to it, and ran
> > the update against the copy. That update took 2300 seconds. The
> > join columns were indexed in both cases.
>
> Have you added indexes on the custid column for the tables account.acct,
> account.orgacct and note?
>
> I haven't followed the entire thread, but if you have cascading FKs on those
> tables without an index on that column, that could cause your delay.

Also make sure the fk/pk types match, or the index likely won't get used anyway.
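Both suggestions are straightforward to act on. A sketch, with hypothetical index names and assuming the referencing tables named above live in the account schema:

-- index the referencing columns so cascaded checks can use an index scan
create index ix_acct_custid    on account.acct (custid);
create index ix_orgacct_custid on account.orgacct (custid);
create index ix_note_custid    on account.note (custid);

-- confirm the referencing columns' types match account.cust.custid (bigint)
select c.relname, a.attname, format_type(a.atttypid, a.atttypmod) as coltype
  from pg_class c
  join pg_attribute a on a.attrelid = c.oid
 where c.relname in ('acct', 'orgacct', 'note')
   and a.attname = 'custid';

If any of those come back as, say, integer rather than bigint, the planner may refuse the index for the RI check even though the index exists.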
Hi,

We have got a G5 64-bit processor to replace an old G4 32-bit processor. Given everything else equal, should we see a big improvement in PG's performance?

The other question I have is that when I tried different sizes for shared_buffers (I used 10,000, 1,000, 528, 256) with max_connections = 32, it gave me an error when I tried to start PG using pg_ctl start as postgres. It kept saying the request is bigger than the system's shared memory limit. So finally I started PG using SystemStarter start PostgreSQL, and it seems to start OK. Any idea?

Thanks a lot!

Qing Zhao
On Mon, 5 Apr 2004, Rosser Schwarz wrote:

> while you weren't looking, Kevin Barnard wrote:
>
> > Have you added indexes on the custid column for the tables
> > account.acct, account.orgacct and note?
>
> They were indexed in the original case, yes. There was no
> need to index them in today's test case, as that was done
> purely in attempt to rule in or out foreign key validation
> as the cause of the performance hit. No foreign keys that
> might be validated, no need to index the foreign key columns.
>
> > I haven't followed the entire thread, but if you have
> > cascading FKs on those tables without an index on that
> > column, that could cause your delay.
>
> The issue is that the foreign keys are being validated at
> all, when the column being referenced by those foreign keys
> (account.cust.custid) is never touched.
>
> Regardless of whether or not the referencing columns are
> indexed, validating them at all--in this specific case--is
> broken. The column they refer to is never touched; they
> should remain utterly ignorant of whatever happens to other
> columns in the same row.

It shouldn't be checking the other table if the values of the key column haven't changed. The ri_KeysEqual check should be causing it to return just before actually doing the check on the other table (it still does a few things before then, but nothing that should be particularly expensive). In some simple tests on my 7.4.2 machine, this appears to work for me on pk cascade updates.

It would be interesting to know if it's actually doing any checks for you; you might be able to poke around the triggers (backend/utils/adt/ri_triggers.c).
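A catalog-level way to see which RI triggers are attached, short of reading ri_triggers.c (a sketch; only the account.cust name is taken from the thread):

select tgname,
       tgrelid::regclass as fires_on,
       tgfoid::regproc   as trigger_func
  from pg_trigger
 where tgrelid = 'account.cust'::regclass
    or tgconstrrelid = 'account.cust'::regclass;

The RI_ConstraintTrigger_* entries attached to account.cust itself are the ones that fire when its rows are updated; whether they actually go on to check the referencing tables when the key value is unchanged is exactly what the ri_KeysEqual shortcut described above is supposed to prevent.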
Qing Zhao <qzhao@quotefx.net> writes: > We have got a G5 64-bit processor to replace an old G4 32-bit > processor. Given everything else equal, should we see a big > improvement on PG's performance? Nope. Database performance typically depends on disk performance first, and RAM size second. A 64-bit processor might help by allowing you to install more RAM, but you didn't say that you had. > The other question I have is that, when I tried different size for > shared_buffer ( i used 10,000, 1,000, 528, 256) and Max > connections=32, it gives me error when I tried to start PG using > pg_ctl start as postgres. It kept saying this is bigger than the > system Shared Memory. Out-of-the-box, Mac OS X has a very low SHMMAX limit. See the PG admin docs or the mail list archives about how to increase it. You should do this --- most people find that you want to set shared_buffers to 1000 or 10000 or so for best performance. regards, tom lane
On Tue, Apr 06, 2004 at 01:47:22AM -0400, Tom Lane wrote: > Qing Zhao <qzhao@quotefx.net> writes: > > We have got a G5 64-bit processor to replace an old G4 32-bit > > processor. Given everything else equal, should we see a big > > improvement on PG's performance? > > Nope. Database performance typically depends on disk performance first, > and RAM size second. A 64-bit processor might help by allowing you to > install more RAM, but you didn't say that you had. Memory bandwidth is a consideration too, so you might see some performance improvements on a G5. We recently debated between Xeons and Opterons in a new PGSQL server and a little poking around on the lists indicated that the Opterons did perform better, presumably due to the increased memory bandwidth. Incidentally, this is why you need about 2x the CPUs on Sun hardware vs RS6000 hardware for database stuff (and that gets expensive if you're paying per CPU!). -- Jim C. Nasby, Database Consultant jim@nasby.net Member: Triangle Fraternity, Sports Car Club of America Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: "Qing Zhao" <qzhao@quotefx.net> Cc: <pgsql-performance@postgresql.org> Sent: Tuesday, April 06, 2004 1:47 AM Subject: Re: [PERFORM] possible improvement between G4 and G5 > Qing Zhao <qzhao@quotefx.net> writes: > > We have got a G5 64-bit processor to replace an old G4 32-bit > > processor. Given everything else equal, should we see a big > > improvement on PG's performance? > > Nope. Database performance typically depends on disk performance first, > and RAM size second. I'm surprised by this thought. I tend to hit CPU bottlenecks more often than I/O ones. In most applications, db I/O is a combination of buffer misses and logging, which are both reasonably constrained. RAM size seems to me to be the best way to improve performance, and then CPU which is needed to perform the in-memory searching, locking, versioning, and processing, and finally I/O (this is not the case in small I/O subsystems - if you have less than a dozen drives, you're easily I/O bound). I/O is often the thing I tune first, because I can do it in place without buying hardware. Conceptually, an RDBMS converts slow random I/O into in memory processing and sequential logging writes. If successful, it should reduce the I/O overhead. /Aaron
Aaron,

> I'm surprised by this thought. I tend to hit CPU bottlenecks more often than
> I/O ones. In most applications, db I/O is a combination of buffer misses and
> logging, which are both reasonably constrained.

Not my experience at all. In fact, the only times I've seen modern platforms max out the CPU were when:
a) I had bad queries with bad plans, or
b) I had reporting queries that did a lot of calculation for display (think OLAP).

Otherwise, on the numerous servers I administrate, RAM spikes, and I/O bottlenecks, but the CPU stays almost flat.

Of course, most of my apps are large databases (i.e. too big for RAM) with a heavy transaction-processing component.

What kind of applications are you running?

--
-Josh Berkus
Aglio Database Solutions
San Francisco
----- Original Message ----- From: "Josh Berkus" <josh@agliodbs.com> To: "Aaron Werman" <awerman2@hotmail.com>; "Qing Zhao" <qzhao@quotefx.net>; "Tom Lane" <tgl@sss.pgh.pa.us> Cc: <pgsql-performance@postgresql.org> Sent: Tuesday, April 06, 2004 2:52 PM Subject: Re: [PERFORM] possible improvement between G4 and G5 > Aaron, > > > I'm surprised by this thought. I tend to hit CPU bottlenecks more often than > > I/O ones. In most applications, db I/O is a combination of buffer misses and > > logging, which are both reasonably constrained. > > Not my experience at all. In fact, the only times I've seen modern platforms > max out the CPU was when: > a) I had bad queries with bad plans, or > b) I had reporting queires that did a lot of calculation for display (think > OLAP). > > Otherwise, on the numerous servers I administrate, RAM spikes, and I/O > bottlenecks, but the CPU stays almost flat. > > Of course, most of my apps are large databases (i.e. too big for RAM) with a > heavy transaction-processing component. > > What kind of applications are you running? > > -- > -Josh Berkus > Aglio Database Solutions > San Francisco > > <hot air> I do consulting, so they're all over the place and tend to be complex. Very few fit in RAM, but still are very buffered. These are almost all backed with very high end I/O subsystems, with dozens of spindles with battery backed up writethrough cache and gigs of buffers, which may be why I worry so much about CPU. I have had this issue with multiple servers. Consider an analysis db with 10G data. Of that, 98% of the access is read and only 2% write (that is normal for almost anything that is not order entry, even transaction processing with thorough cross validation). Almost all the queries access 10%, or 1G of the data. Of the reads, they average ~3 level b-trees, with the first 2 levels certainly cached, and the last ones often cached. Virtually all the I/O activity is logical reads against buffer. A system with a 100 transactions which on average access 200 rows does 98% of 200 rows x 100 transactions x 3 logical I/Os per read = 58,800 logical reads, of which actually maybe a hundred are physical reads. It also does 2% of 200 rows x 100 transactions x (1 table logical I/O and say 2 index logical writes) per write = 1,200 logical writes to log, of which there are 100 transaction commit synch writes, and in reality less than that because of queuing against logs (there are also 1,200 logical writes deferred to checkpoint, of which it is likely to only be 40 physical writes because of page overlaps). Transaction processing is a spectrum between activity logging, and database centric design. The former, where actions are stored in the database is totally I/O bound with the engine acting as a thin layer of logical to physical mapping. Database centric processing makes the engine a functional server of discrete actions - and is a big CPU hog. What my CPU tends to be doing is a combination of general processing, complex SQL processing: nested loops and sorting and hashing and triggers and SPs. I'm curious about you having flat CPU, which is not my experience. Are your apps mature and stable? </hot air> /Aaron
Aaron,

> I do consulting, so they're all over the place and tend to be complex. Very
> few fit in RAM, but still are very buffered. These are almost all backed
> with very high end I/O subsystems, with dozens of spindles with battery
> backed up writethrough cache and gigs of buffers, which may be why I worry
> so much about CPU. I have had this issue with multiple servers.

Aha, I think this is the difference. I never seem to be able to get my clients to fork out for adequate disk support. They are always running off single or double SCSI RAID in the host server; not the sort of setup you have.

> What my CPU tends to be doing is a combination of general processing,
> complex SQL processing: nested loops and sorting and hashing and triggers
> and SPs.

I haven't noticed SPs to be particularly CPU-hoggish, more RAM.

> I'm curious about you having flat CPU, which is not my experience. Are your
> apps mature and stable?

Well, "flat" was a bit of an exaggeration ... there are spikes ... but average CPU load is < 30%. I think the difference is that your clients listen to you about disk access. Mine are all too liable to purchase a quad-Xeon machine but with an Adaptec RAID-5 card and 4 drives, and *then* call me and ask for advice. As a result, most intensive operations don't tend to swamp the CPU because they are waiting for disk.

I have noticed the limitations on RAM for 64 vs. 32, as I find it easier to convince a client to get 8GB RAM than four-channel RAID with 12 drives, mostly because the former is cheaper. Linux 2.4 + Bigmem just doesn't cut it for making effective use of > 3GB of RAM.

--
Josh Berkus
Aglio Database Solutions
San Francisco
>>>>> "JB" == Josh Berkus <josh@agliodbs.com> writes: JB> Aaron, >> I do consulting, so they're all over the place and tend to be complex. Very >> few fit in RAM, but still are very buffered. These are almost all backed >> with very high end I/O subsystems, with dozens of spindles with battery >> backed up writethrough cache and gigs of buffers, which may be why I worry >> so much about CPU. I have had this issue with multiple servers. JB> Aha, I think this is the difference. I never seem to be able to JB> get my clients to fork out for adequate disk support. They are JB> always running off single or double SCSI RAID in the host server; JB> not the sort of setup you have. Even when I upgraded my system to a 14-spindle RAID5 with 128M cache and 4GB RAM on a dual Xeon system, I still wind up being I/O bound quite often. I think it depends on what your "working set" turns out to be. My workload really spans a lot more of the DB than I can end up caching. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-301-869-4449 x806 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/