Re: Postgres performance slowly gets worse over a month - Mailing list pgsql-admin
From | Marc Mitchell
---|---
Subject | Re: Postgres performance slowly gets worse over a month
Date |
Msg-id | 015d01c23281$2ce57380$6f01050a@eisolution.com
In response to | Re: Postgres performance slowly gets worse over a month (Naomi Walker <nwalker@eldocomp.com>)
Responses | Re: Postgres performance slowly gets worse over a month
List | pgsql-admin
We recently ran into a similar problem with 7.1. Let me start by saying: knowing the SQL you are running, knowing EXPLAIN, and knowing pgmonitor go a LONG way toward helping you identify the problem.

In our case, the problem came down to this: we had a table ("TABLE_A") with ~200,000 rows that was heavily used (1,000 rows added per day). The most common access to this table was via a foreign key of type int4, where a select for a specific value would normally return 1 row and never more than 10. Naturally, there was an index on this foreign key, and the index was used (resulting in an "INDEX SCAN") in most cases. However, the problem was that half the rows in TABLE_A had no foreign key relationship and so had this field set to zero. No one would ever do a select asking for all rows where foreign key = 0, as that didn't make "business sense". But since the database isn't aware of the business meaning of our data, the stats gathered by ANALYZE concluded that any single value of this column would match between 2,000 and 3,000 rows overall. Mathematically correct, but in reality not useful.

The result was that as the table grew, and/or depending on the sample of rows seen by the most recent nightly VACUUM ANALYZE, the estimated cost of an INDEX SCAN (# of estimated rows * random_page_cost) would exceed the cost of a straight SEQ SCAN (# of data pages in the table). Thus, many applications would start scanning the entire table. While any one scan took only 4 seconds (vs. a fraction of a second using the index) and that wasn't too bad by itself, many users doing this many times quickly made our machine exhibit the same behavior that you describe.

Our short-term fix was to turn down the value of random_page_cost. However, as Tom Lane very rightly noted in response to a similar posting, this is a total hack. Our goal is to switch to 7.2 in the hope that the "WHERE" extension to the "CREATE INDEX" command, coupled with greater control over the sample space used for statistics, will be the true answer (see the sketch below my signature). We also intend to look into whether setting this optional foreign key to NULL instead of zero may affect things in 7.1.

Overall, my point is that we were in a position where Postgres was operating exactly as it was intended. There really wasn't any tuning to be corrected, or a log file with a silver-bullet message stating what to change. It was all a matter of seeing what the load on the system was and why it was taking as long as it was. My advice: EXPLAIN is your friend. pgmonitor is your friend.

Of course, that's just my opinion - I could be wrong...

Marc Mitchell - Senior Technical Architect
Enterprise Information Solutions, Inc.
4910 Main Street
Downers Grove, IL 60515
marcm@eisolution.com
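P.S. To make the 7.2 plan concrete, here is roughly the sort of thing we have in mind. The table and column names below are invented for illustration; verify each step with EXPLAIN against your own schema:

    -- 7.1-era stopgap: make random page fetches look cheaper to the
    -- planner so the index scan wins again (default is 4; this can
    -- also be set in postgresql.conf)
    SET random_page_cost = 2;

    -- 7.2: a partial index that simply leaves out the "no relationship"
    -- rows; lookups for real key values (> 0) can use it, and the
    -- zeros never bloat it or its statistics
    CREATE INDEX table_a_fk_idx ON table_a (fk_col) WHERE fk_col > 0;

    -- 7.2: widen the statistics sample for the skewed column, then
    -- re-gather stats
    ALTER TABLE table_a ALTER COLUMN fk_col SET STATISTICS 100;
    ANALYZE table_a;

    -- and always check what the planner actually chooses
    EXPLAIN SELECT * FROM table_a WHERE fk_col = 12345;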
----- Original Message -----
From: "Robert M. Meyer" <rmeyer@installs.com>
To: "Naomi Walker" <nwalker@eldocomp.com>
Cc: <pgsql-admin@postgresql.org>
Sent: Tuesday, July 23, 2002 1:29 PM
Subject: Re: [ADMIN] Postgres performance slowly gets worse over a month

> Well, we're running a hardware three-disk RAID5 on an
> sym53c896-0-<2,*>: FAST-20 WIDE SCSI 40.0 MB/s interface. This is a
> Compaq 3500 system with a CR3500 RAID controller. An md5sum of a 1.2Gig
> file takes less than two minutes.
>
> We tried rebuilding the indices but that didn't seem to help. We had an
> outside consultant do the rebuild and he's not available now, so I don't
> know what command he ran to do it.
>
> I've never used 'sar'. If you can believe it, I've been poking around
> with Unix for the last 20 years and I've never even seen the 'man' page
> for 'sar'. I probably should look into it. What flags would give me
> the most information to help figger out what's going on here?
>
> Of course, the troubleshooting increment is going to be a month or more,
> so this will probably take some time to resolve :-)
>
> Cheers!
>
> Bob
>
> On Tue, 2002-07-23 at 14:08, Naomi Walker wrote:
> > > Nightly, we're doing a 'vacuumdb -a -z' after stopping and restarting
> > > the database. The performance will get so bad after a month that we
> > > start to see load spikes in excess of 30. Normally, we don't see load
> > > over 2.5 during the heaviest activity and generally less than 1.0 most
> > > of the time.
> >
> > Typically, performance is linked to your I/O, but my first guess in this
> > case has to do with your indices. As a test, next time performance gets
> > really rotten, drop your indices and rebuild them. It cannot hurt, and
> > might just help.
> >
> > The trick here is to see what is happening while it is tanking. What does
> > your disk configuration look like? Is it a RAID or stripe where reads are
> > spread out among more than one controller? Do sar reports point to
> > anything in particular?
> >
> > Naomi Walker
> > Eldorado Computing, Inc
> > Chief Information Officer
> > nwalker@eldocomp.com
> > 602-604-3100 x242
>
> --
> Robert M. Meyer
> Sr. Network Administrator
> DigiVision Satellite Services
> 14 Lafayette Sq, Ste 410
> Buffalo, NY 14203-1904
> (716)332-1451
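P.P.S. On the sar question in the quoted message: the flags differ between the old System V sar and the Linux sysstat version, so check your local man page, but something like the following (12 samples at 5-second intervals) covers the usual suspects:

    sar -u 5 12    # CPU utilization; %wio on some platforms shows I/O wait
    sar -q 5 12    # run queue length - where your load spikes will show up
    sar -b 5 12    # buffer and I/O transfer rates
    sar -d 5 12    # per-device disk activity (not supported on every build)

High I/O wait during the slowdowns points at the disks; high user CPU with quiet disks points back at something like the seq scan problem described above.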