Thread: postgresql meltdown on PlanetMath.org
Hi all.

As the topic suggests, I am having fairly critical troubles with postgresql on PlanetMath.org (a site which I run). You can go there and try to pull up some entries and you will see the problem: everything is incredibly slow. It is hard to pinpoint when this began happening, but I've tried a variety of optimizations to fix it, all of which have failed.

First: the machine. The machine is not too spectacular, but it is not so bad that the performance currently witnessed should be happening. It is a dual PIII-650 with 512MB of RAM and a 20gb IDE drive (yes, DMA is on). There is plenty of free space on the drive.

Now, the optimizations I have tried:

- Using hash indices everywhere. A few months ago, I did this, and there was a dramatic and instant speed up. However, this began degenerating. I also noticed in the logs that there was deadlock happening all over the place. The server response time was intolerable, so I figured the deadlock might have something to do with this, and eliminated all hash indices (replaced with normal BTree indices).

- Going back to BTrees yielded a temporary respite, but soon enough the server was back to half a minute to pull up an already-cached entry, which is of course crazy.

- I then tried increasing the machine's shared memory max to 75% of the physical memory, and scaled postgresql's buffers accordingly. This also sped things up for a while, but again resulted in eventual degeneration. Even worse, there were occasional crashes due to running out of memory that (according to my calculations) shouldn't have been happening.

- Lastly, I tried reducing the shared memory max and limiting postgresql to more conservative values, although still not to the out-of-box values. Right now shared memory max on the system is 128mb, postgres's shared buffers are at 64mb, sort_mem is at 16mb, and effective cache size is at 10mb.

For perspective, the size of the PlanetMath database dump is 24mb. It should be able to fit in memory easily, so I'm not sure what I'm doing wrong regarding the caching.

For the most trivial request, Postgresql takes up basically all the CPU for the duration of the request. The load average of the machine is over unity (above 1.0) at all times, sometimes as bad as being in the 30s. None of this happens without postgres running, so it is definitely the culprit.

The site averages about one hit every twenty seconds. This should not be an overwhelming load, especially for what is just pulling up cached information 99% of the time.

Given this scenario, can anyone advise? I am particularly puzzled as to why everything I tried initially helped, but always degenerated rather rapidly to a near standstill. It seems to me that everything should be able to be cached in memory with no problem; perhaps I need to force this more explicitly.

My next step, if I cannot fix this, is to try mysql =(

Anyway, whoever helps would be doing a great service to many who use PlanetMath =) It'd be much appreciated.

Aaron Krowne
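For reference, a sketch of what those numbers look like in postgresql.conf on a 7.2-era server, where shared_buffers and effective_cache_size are counted in 8 KB pages and sort_mem in kilobytes; the values below simply restate the figures quoted above:

    shared_buffers = 8192            # 8192 x 8 KB buffers = 64 MB
    sort_mem = 16384                 # in KB, so 16 MB per sort
    effective_cache_size = 1280      # 1280 x 8 KB pages = 10 MB

Note that effective_cache_size is only a hint to the planner about how much of the OS file cache it can count on; on a 512 MB box that mostly runs Postgres it could arguably be set far higher than 10 MB.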
> As the topic suggests, I am having fairly critical troubles with
> postgresql on PlanetMath.org (a site which I run). You can go there and
> try to pull up some entries and you will see the problem: everything is
> incredibly slow.

Have you read the following?

http://developer.postgresql.org/docs/postgres/performance-tips.html

> First: the machine. The machine is not too spectacular, but it is not
> so bad that the performance currently witnessed should be happening. It
> is a dual PIII-650 with 512MB of RAM and a 20gb IDE drive (yes, DMA is
> on). There is plenty of free space on the drive.

This shouldn't be an issue for the load you describe. A P-100 should be
okay, but it depends on the queries you're performing.

> Now, the optimizations I have tried:

*) Stick with btree's.

> - I then tried increasing the machine's shared memory max to 75% of the
>   physical memory, and scaled postgresql's buffers accordingly. This
>   also sped things up for a while, but again resulted in eventual
>   degeneration. Even worse, there were occasional crashes due to
>   running out of memory that (according to my calculations) shouldn't
>   have been happening.

*) Don't do this, go back to near default levels. I bet this is hurting
your setup.

> - Lastly, I tried reducing the shared memory max and limiting postgresql
>   to more conservative values, although still not to the out-of-box
>   values. Right now shared memory max on the system is 128mb,
>   postgres's shared buffers are at 64mb, sort_mem is at 16mb, and
>   effective cache size is at 10mb.

*) You shouldn't have to do this either.

> For perspective, the size of the PlanetMath database dump is 24mb.
> It should be able to fit in memory easily, so I'm not sure what I'm
> doing wrong regarding the caching.

I hate to say this, but this sounds like a config error. :-/

> For the most trivial request, Postgresql takes up basically all the
> CPU for the duration of the request. The load average of the machine
> is over unity at all times, sometimes as bad as being in the 30s.
> None of this happens without postgres running, so it is definitely
> the culprit.

*) Send an EXPLAIN statement as specified here:

http://developer.postgresql.org/docs/postgres/performance-tips.html#USING-EXPLAIN

> The site averages about one hit every twenty seconds. This should not
> be an overwhelming load, especially for what is just pulling up cached
> information 99% of the time.

*) Have you done a vacuum analyze?

http://developer.postgresql.org/docs/postgres/populate.html#POPULATE-ANALYZE

> Given this scenario, can anyone advise? I am particularly puzzled
> as to why everything I tried initially helped, but always
> degenerated rather rapidly to a near standstill. It seems to me
> that everything should be able to be cached in memory with no
> problem; perhaps I need to force this more explicitly.

*) Send the EXPLAIN output and we can work from there.

> My next step, if I cannot fix this, is to try mysql =(

Bah, don't throw down the gauntlet, it's pretty clear this is a local
issue and not a problem with the DB. :)

-sc

--
Sean Chittenden
Aaron Krowne <akrowne@vt.edu> writes:
> As the topic suggests, I am having fairly critical troubles with
> postgresql on PlanetMath.org (a site which I run).

Um ... not meaning to insult your intelligence, but how often do you
vacuum? Also, exactly what Postgres version are you running? Can you
show us EXPLAIN ANALYZE results for some of the slow queries?

			regards, tom lane
Aaron Krowne wrote:
> Given this scenario, can anyone advise? I am particularly puzzled as to
> why everything I tried initially helped, but always degenerated rather
> rapidly to a near standstill. It seems to me that everything should be
> able to be cached in memory with no problem, perhaps I need to force
> this more explicitly.

Basic guidance:

- Keep shared memory use reasonable; your final settings of 64M shared
  buffers and 16M sort_mem sound OK. In any case, be sure you're not
  disk-swapping.
- If you don't already, run VACUUM ANALYZE on some regular schedule
  (how often depends on your data turnover rate)
- Possibly consider running REINDEX periodically
- Post the SQL and EXPLAIN ANALYZE output for the queries causing the
  worst of your woes to the list

Explanations of these can be found by searching the list archives and
reading the related sections of the manual.

A few questions:

- What version of Postgres?
- Have you run VACUUM FULL ANALYZE lately (or at least VACUUM ANALYZE)?
- Does the database see mostly SELECTs and INSERTs, or are there many
  UPDATEs and/or DELETEs too?
- Are all queries slow, or particular ones?

HTH,

Joe
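In concrete terms, the maintenance and diagnostics listed above boil down to a handful of commands; the table and query here are placeholders rather than PlanetMath's actual schema:

    VACUUM ANALYZE;                              -- routine maintenance; no exclusive lock
    REINDEX TABLE objects;                       -- occasional, if an index has bloated
    EXPLAIN ANALYZE
        SELECT * FROM objects WHERE id = 42;     -- the output worth posting to the list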
> - Keep shared memory use reasonable; your final settings of 64M shared
>   buffers and 16M sort_mem sound OK. In any case, be sure you're not
>   disk-swapping.

Yeah, those seem like reasonable values to me. But I am not sure I'm not
disk-swapping; in fact it is almost certainly going on here bigtime.

> - If you don't already, run VACUUM ANALYZE on some regular schedule
>   (how often depends on your data turnover rate)

I've done it here and there, especially when things seem slow. Never
seems to help much; the data turnover isn't high.

> - Possibly consider running REINDEX periodically

Ok, that's a new one, I'll try that out.

> - Post the SQL and EXPLAIN ANALYZE output for the queries causing the
>   worst of your woes to the list
> - Are all queries slow, or particular ones?

I'm grouping two separate things together to reply to, because the
second point answers the first: there's really no single culprit. Every
SELECT has a lag on the scale of a second; resolving all of the foreign
keys in various tables to construct a typical data-rich page piles up
many of these. I'm assuming the badness of this depends on how much
swapping is going on.

> Explanations of these can be found by searching the list archives and
> reading the related sections of the manual.

Will check that out, thanks.

> A few questions:
> - What version of Postgres?

7.2.1

> - Have you run VACUUM FULL ANALYZE lately (or at least VACUUM ANALYZE)?

Yes, after a particularly bad slowdown... it didn't seem to fix things.

> - Does the database see mostly SELECTs and INSERTs, or are there many
>   UPDATEs and/or DELETEs too?

Almost exclusively SELECTs.

OK, I have just run a VACUUM FULL ANALYZE and things seem much better...
which would be the first time it's really made a difference =) I tried
comparing an EXPLAIN ANALYZE of a single-row select on the main objects
table before and after the vacuum, and the plan didn't change
(sequential scan still), but the response time went from ~1 second to
~5 msec! I'm not really sure what could have happened here behind the
scenes, since it didn't start using the index, and there probably
weren't more than 10% updated/added rows since the last VACUUM.

I actually thought I had a task scheduled which was running a VACUUM
periodically, but maybe it broke for some reason or another. Still, I
have not been getting consistent results from running VACUUMs, so I'm
not entirely confident that the book is closed on the problem.

Thanks for your help.

apk
> Have you read the following?
> http://developer.postgresql.org/docs/postgres/performance-tips.html

Yup. I would never go and bother real people without first checking the
manual, but I bet you get a lot of that =)

> This shouldn't be an issue for the load you describe. A P-100 should
> be okay, but it depends on the queries you're performing.

Mostly just gather-retrieval based on unique identifier keys in a bunch
of tables. Really mundane stuff.

> *) Stick with btree's.

Yeah, that saddens me, though =) When I initially switched to hashes,
things were blazing. This application makes heavy use of keys and equal
comparisons on indices, so hashes are really the optimal index
structure. I'd like to be able to go back to using them some day... if
not for the concurrency issue, which seems like it should be fixable
(even having mutually exclusive locking on the entire index would
probably be fine for this application and would prevent deadlock).

> > - I then tried increasing the machine's shared memory max to 75% of
> >   the physical memory, and scaled postgresql's buffers accordingly.
>
> *) Don't do this, go back to near default levels. I bet this is
>    hurting your setup.
>
> > - Lastly, I tried reducing the shared memory max and limiting postgresql
> >   to more conservative values, although still not to the out-of-box
> >   values. Right now shared memory max on the system is 128mb,
> >   postgres's shared buffers are at 64mb, sort_mem is at 16mb, and
> >   effective cache size is at 10mb.
>
> *) You shouldn't have to do this either.

Well, I've now been advised that the best way is all 3 that I have tried
(among aggressive buffering, moderate buffering, and default
conservative buffering).

Perhaps you could explain to me why the system shouldn't be ok with the
moderate set of buffer sizes on a 512mb machine? I don't really know
enough about the internals of postgres to be doing anything but voodoo
when I change the values.

> I hate to say this, but this sounds like a config error. :-/

That's better than a hardware error! This is what I wanted to hear =)

> *) Have you done a vacuum analyze?

See previous message to list (summary: it worked this time, but usually
it does not help.)

Thanks,

Aaron Krowne
> > > - Lastly, I tried reducing the shared memory max and limiting postgresql
> > >   to more conservative values, although still not to the out-of-box
> > >   values. Right now shared memory max on the system is 128mb,
> > >   postgres's shared buffers are at 64mb, sort_mem is at 16mb, and
> > >   effective cache size is at 10mb.
> > *) You shouldn't have to do this either.
>
> Well, I've now been advised that the best way is all 3 that I have
> tried (among aggressive buffering, moderate buffering, and default
> conservative buffering).
>
> Perhaps you could explain to me why the system shouldn't be ok with
> the moderate set of buffer sizes on a 512mb machine? I don't really
> know enough about the internals of postgres to be doing anything but
> voodoo when I change the values.

Honestly? The defaults are small, but they're not small enough to give
you the lousy performance you were describing. If your buffers are too
high, or there are enough things using up KVM/system memory, contention
can cause thrashing/swapping, and it wasn't clear that that wasn't
happening. The defaults shouldn't, under any non-embedded circumstance,
cause problems with machines >233MHz; they're just too conservative to
do any harm. :)

> > *) Have you done a vacuum analyze?
>
> See previous message to list (summary: it worked this time, but
> usually it does not help.)

Hrmm... ENOTFREEBSD, eh?

http://www.freebsd.org/cgi/cvsweb.cgi/ports/databases/postgresql7/files/502.pgsql?rev=1.5&content-type=text/x-cvsweb-markup

You may want to set up a nightly vacuum/backup procedure. Palle
Girgensohn <girgen@pingpong.net> has written a really nice and simple
script that's been in use for ages on FreeBSD PostgreSQL installations
for making sure that you don't have this problem.

Actually, it'd be really cool to lobby to get this script added to the
base PostgreSQL installation; that way you wouldn't have this problem...
it'd also dramatically increase the number of nightly backups performed
for folks if a default script does this along with vacuuming.

-sc

--
Sean Chittenden
> You may want to set up a nightly vacuum/backup procedure. Palle
> Girgensohn <girgen@pingpong.net> has written a really nice and simple
> script that's been in use for ages on FreeBSD PostgreSQL installations
> for making sure that you don't have this problem.
>
> Actually, it'd be really cool to lobby to get this script added to the
> base PostgreSQL installation; that way you wouldn't have this problem...
> it'd also dramatically increase the number of nightly backups performed
> for folks if a default script does this along with vacuuming. -sc

*Actually*, I just double checked, and I was not hallucinating: I *do*
have a nightly vacuum script... because Debian postgres comes with it =)

So, either it is broken, or doing a VACUUM FULL ANALYZE rather than just
VACUUM ANALYZE made all the difference. Is this possible (the latter, we
know the former is possible...)?

apk
> > You may want to set up a nightly vacuum/backup procedure. Palle
> > Girgensohn <girgen@pingpong.net> has written a really nice and simple
> > script that's been in use for ages on FreeBSD PostgreSQL installations
> > for making sure that you don't have this problem.
> >
> > Actually, it'd be really cool to lobby to get this script added to the
> > base PostgreSQL installation; that way you wouldn't have this problem...
> > it'd also dramatically increase the number of nightly backups performed
> > for folks if a default script does this along with vacuuming. -sc
>
> *Actually*, I just double checked, and I was not hallucinating: I *do*
> have a nightly vacuum script... because Debian postgres comes with it =)

Cool, glad to hear other installations are picking up doing this.

> So, either it is broken, or doing a VACUUM FULL ANALYZE rather than just
> VACUUM ANALYZE made all the difference. Is this possible (the latter,
> we know the former is possible...)?

You shouldn't have to do a VACUUM FULL. Upgrade your PostgreSQL
installation if you can (most recent if possible); there have been many
performance updates and VACUUM fixes worth noting. Check the release
notes starting with your version and read through them up to the current
release... you'll be amazed at all the work that's been done, some of
which it looks like may affect your installation.

http://developer.postgresql.org/docs/postgres/release.html

-sc

--
Sean Chittenden
Aaron Krowne <akrowne@vt.edu> writes:
> So, either it is broken, or doing a VACUUM FULL ANALYZE rather than just
> VACUUM ANALYZE made all the difference. Is this possible (the latter,
> we know the former is possible...)?

If your FSM parameters in postgresql.conf are too small, then plain
vacuums might have failed to keep up with the available free space,
leading to a situation where vacuum full is essential. Did you happen
to notice whether the vacuum full shrunk the database's disk footprint
noticeably?

			regards, tom lane
Aaron Krowne wrote:
>> - What version of Postgres?
>
> 7.2.1

You should definitely look at upgrading, at least to 7.2.4 (which you
can do without requiring a dump/reload cycle), but better yet to 7.3.2
(which will require a dump/reload cycle). I don't know that it will fix
your specific issue, but there were some critical bug fixes between
7.2.1 and 7.2.4.

>> - Does the database see mostly SELECTs and INSERTs, or are there many
>>   UPDATEs and/or DELETEs too?
>
> Almost exclusively SELECTs.
>
> OK, I have just run a VACUUM FULL ANALYZE and things seem much better...

Hmmm, do you periodically do large updates or otherwise turn over rows
in batches?

> which would be the first time it's really made a difference =) I tried
> comparing an EXPLAIN ANALYZE of a single-row select on the main objects
> table before and after the vacuum, and the plan didn't change
> (sequential scan still), but the response time went from ~1 second to
> ~5 msec! I'm not really sure what could have happened here behind the
> scenes, since it didn't start using the index, and there probably
> weren't more than 10% updated/added rows since the last VACUUM.

If your app is mostly doing equi-lookups by primary key, and indexes
aren't being used (I think I saw you mention that on another post), then
something else is still wrong. Please pick one or two typical queries
that are doing seq scans and post the related table definitions,
indexes, SQL, and EXPLAIN ANALYZE. I'd bet you are getting bitten by a
datatype mismatch or something.

Joe
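A classic instance of the kind of datatype mismatch being alluded to here, on 7.2/7.3-era servers (the table and column below are hypothetical, purely for illustration): an index on an int8 or int2 column is ignored when the literal in the WHERE clause is typed as a plain int4.

    -- hypothetical table: "filesize" is a bigint with an index on it
    CREATE INDEX files_size_idx ON files (filesize);

    -- the planner of that era types 1000000 as int4, so the int8 index
    -- is not considered and this falls back to a sequential scan:
    EXPLAIN SELECT * FROM files WHERE filesize = 1000000;

    -- casting (or quoting) the literal lets the index be used:
    EXPLAIN SELECT * FROM files WHERE filesize = 1000000::int8;
    EXPLAIN SELECT * FROM files WHERE filesize = '1000000';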
> - Lastly, I tried reducing the shared memory max and limiting postgresql
>   to more conservative values, although still not to the out-of-box
>   values. Right now shared memory max on the system is 128mb,
>   postgres's shared buffers are at 64mb, sort_mem is at 16mb, and
>   effective cache size is at 10mb.

I found that 5000 shared buffers gave the best performance on my system.

However, your problems are probably due to maybe not running vacuum,
analyze, reindex, etc. Your queries may not be effectively indexed -
EXPLAIN ANALYZE them all.

Chris
I don't know what your definition of "high" is, but I do find that
turnover can degrade performance over time. Perhaps one of the devs can
enlighten me, but I have a database that turns over ~100,000 rows/day
that does appear to slowly get worse.

The updates are done in batches and I "VACUUM" and "VACUUM ANALYZE"
after each batch (three/day), but I found that over time simple queries
would start to hit the disk more and more. A "select count(*) FROM
tblwordidx" initially took about 1 second to return a count of 2 million,
but after a few months it took several minutes of really hard HDD
grinding. Also, the database only had a couple hundred megs of data in
it, but the db was taking up 8-9 GB of disk space. I'm thinking data
fragmentation is ruining cache performance?

When I did a dump/restore and updated from 7.2.1 to 7.3.1, queries were
zippy again. But now it is starting to slow... I have yet to measure
the effects of a VACUUM FULL, however. I'll try it and report back...

Logan Bowers

On Sun, 16 Mar 2003, Aaron Krowne wrote:

<snip>
> I've done it here and there, especially when things seem slow. Never
> seems to help much; the data turnover isn't high.
<snip>
> I don't know what your definition of "high" is, but I do find that
> turnover can degrade performance over time. Perhaps one of the devs
> can enlighten me, but I have a database that turns over ~100,000
> rows/day that does appear to slowly get worse. The updates are done
> in batches and I "VACUUM" and "VACUUM ANALYZE" after each batch
> (three/day) but I found that over time simple queries would start to
> hit the disk more and more.

Creeping index syndrome. Tom recently fixed this in HEAD. Try the
latest copy from the repo and see if this solves your problems.

> A "select count(*) FROM tblwordidx" initially took about 1 second to
> return a count of 2 million but after a few months it took several
> minutes of really hard HDD grinding.

That's because there are dead entries in the index that weren't being
reused or cleaned up. As I said, this has been fixed.

-sc

PS It's good to see you around again. :)

--
Sean Chittenden
Sean Chittenden said:
>> A "select count(*) FROM tblwordidx" initially took about 1 second to
>> return a count of 2 million but after a few months it took several
>> minutes of really hard HDD grinding.
>
> That's because there are dead entries in the index that weren't being
> reused or cleaned up. As I said, this has been fixed.

That's doubtful: "select count(*) FROM foo" won't use an index. There
are a bunch of other factors (e.g. dead heap tuples, changes in the
pages cached in the buffer, disk fragmentation, etc.) that could affect
performance in that situation, however.

Cheers,

Neil
> >> A "select count(*) FROM tblwordidx" initially took about 1 second to > >> return a count of 2 million but after a few months it took several > >> minutes of really hard HDD grinding. > > > > That's because there are dead entries in the index that weren't being > > reused or cleaned up. As I said, this has been fixed. > > That's doubtful: "select count(*) FROM foo" won't use an > index. There are a bunch of other factors (e.g. dead heap tuples, > changes in the pages cached in the buffer, disk fragmentation, etc.) > that could effect performance in that situation, however. *blush* Yeah, jumped the gun on that when I read that queries were getting slower (churn of an index == slow creaping death for performance). A SELECT COUNT(*), however, wouldn't be affected by the index growth problem. Is the COUNT() on a view that uses an index? I haven't had any real problems with this kind of degredation outside of indexes. :-/ -sc -- Sean Chittenden
"Neil Conway" <neilc@samurai.com> writes: > Sean Chittenden said: > A "select count(*) FROM tblwordidx" initially took about 1 second to > return a count of 2 million but after a few months it took several > minutes of really hard HDD grinding. >> >> That's because there are dead entries in the index that weren't being >> reused or cleaned up. As I said, this has been fixed. > That's doubtful: "select count(*) FROM foo" won't use an index. To know what's going on, as opposed to guessing about it, we'd need to know something about the physical sizes of the table and its indexes. "vacuum verbose" output would be instructive... But my best theorizing-in-advance-of-the-data guess is that Logan's FSM settings are too small, causing free space to be leaked over time. If a vacuum full restores the original performance then that's probably the right answer. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Aaron Krowne <akrowne@vt.edu> writes:
> > So, either it is broken, or doing a VACUUM FULL ANALYZE rather than just
> > VACUUM ANALYZE made all the difference. Is this possible (the latter,
> > we know the former is possible...)?
>
> If your FSM parameters in postgresql.conf are too small, then plain
> vacuums might have failed to keep up with the available free space,
> leading to a situation where vacuum full is essential. Did you happen
> to notice whether the vacuum full shrunk the database's disk footprint
> noticeably?

This seems to be a frequent problem. Is there any easy way to check an
existing table for lost free space?

Is there any way vacuum could do this check and print a warning
suggesting using vacuum full and/or increasing FSM parameters if it
finds such?

--
greg
Greg Stark <gsstark@mit.edu> writes:
> Is there any easy way to check an existing table for lost free space?

contrib/pgstattuple gives a pretty good set of statistics. (I thought
VACUUM VERBOSE printed something about total free space in a table, but
apparently only VACUUM FULL VERBOSE does. Maybe should change that.)

> Is there any way vacuum could do this check and print a warning
> suggesting using vacuum full and/or increasing FSM parameters if it
> finds such?

In CVS tip, a whole-database VACUUM VERBOSE gives info about the free
space map occupancy, eg

INFO:  Free space map: 224 relations, 450 pages stored; 3776 total pages needed.
        Allocated FSM size: 1000 relations + 20000 pages = 178 KB shared mem.

If the "pages needed" number is drastically larger than the allocated
FSM size, you've got a problem. (I don't think you need to panic if
it's just a little larger, though. 10X bigger would be time to do
something, 2X bigger maybe not.)

			regards, tom lane
On Mon, 17 Mar 2003, Tom Lane wrote:

> In CVS tip, a whole-database VACUUM VERBOSE gives info about the free
> space map occupancy, eg
>
> INFO:  Free space map: 224 relations, 450 pages stored; 3776 total pages needed.
>         Allocated FSM size: 1000 relations + 20000 pages = 178 KB shared mem.

How do you get this information?

I just ran VACUUM VERBOSE and it spit out a bunch of information per
relation, but nothing about total relations and FSM space. We are
running 7.3.2.

Chris
On Sun, Mar 16, 2003 at 03:37:32AM -0500, Tom Lane wrote:
> Aaron Krowne <akrowne@vt.edu> writes:
> > So, either it is broken, or doing a VACUUM FULL ANALYZE rather than just
> > VACUUM ANALYZE made all the difference. Is this possible (the latter,
> > we know the former is possible...)?
>
> If your FSM parameters in postgresql.conf are too small, then plain
> vacuums might have failed to keep up with the available free space,
> leading to a situation where vacuum full is essential. Did you happen
> to notice whether the vacuum full shrunk the database's disk footprint
> noticeably?

I was having a similar problem a couple of threads ago, and a VACUUM
FULL reduced my database from 3.9 gigs to 2.1 gigs!

So my question is: how to (smartly) choose an FSM size?

thanks,

max`
Chris Sutton said:
> On Mon, 17 Mar 2003, Tom Lane wrote:
>> In CVS tip, a whole-database VACUUM VERBOSE gives info about the free
>> space map occupancy, eg
>
> How do you get this information?
>
> I just ran VACUUM VERBOSE and it spit out a bunch of information per
> relation, but nothing about total relations and FSM space. We are
> running 7.3.2.

As Tom mentioned, that information is printed by a database-wide VACUUM
VERBOSE "in CVS tip" -- i.e. in the development code that will
eventually become PostgreSQL 7.4.

Cheers,

Neil
Chris Sutton <chris@smoothcorp.com> writes:
> On Mon, 17 Mar 2003, Tom Lane wrote:
>> In CVS tip, a whole-database VACUUM VERBOSE gives info about the free
>> space map occupancy, eg
>> INFO:  Free space map: 224 relations, 450 pages stored; 3776 total pages needed.
>>         Allocated FSM size: 1000 relations + 20000 pages = 178 KB shared mem.
>
> How do you get this information?

Before CVS tip, you don't.

[ thinks... ] Perhaps we could back-port the FSM changes into 7.3 ...
it would be a larger change than I'd usually consider reasonable for a
stable branch, though. Particularly considering that it would be hard
to call it a bug fix. By any sane definition this is a new feature, and
we have a policy against putting new features in stable branches.

			regards, tom lane
On Mon, Mar 17, 2003 at 02:26:00PM -0500, Tom Lane wrote:
> [ thinks... ] Perhaps we could back-port the FSM changes into 7.3 ...

For what it's worth, I think that'd be a terrible precedent. Perhaps
making a patch file akin to what the Postgres-R folks do, for people who
really want it. But there is just no way it's a bug fix, and one of the
things I _really really_ like about Postgres is the way "stable" means
stable. Introducing such a new feature to 7.3.x now smacks to me of the
direction the Linux kernel has gone, where major new functionality gets
"merged"[1] in dot-releases of the so-called stable version.

[1] This is the meaning of "merge" also used in Toronto on the 401 at
rush hour. 8 lanes of traffic jam and growing.

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                             Toronto, Ontario Canada
<andrew@libertyrms.info>                M2P 2A8
                                        +1 416 646 3304 x110
Andrew Sullivan <andrew@libertyrms.info> writes:
> On Mon, Mar 17, 2003 at 02:26:00PM -0500, Tom Lane wrote:
>> [ thinks... ] Perhaps we could back-port the FSM changes into 7.3 ...
>
> For what it's worth, I think that'd be a terrible precedent.

Oh, I quite agree. I was just throwing up the option to see if anyone
thought the issue was important enough to take risks for. I do not...

			regards, tom lane
What is the structure of your table? Are the data types in the table
the same as in the SQL?

Did you create the index after loading the table? Cluster the table
around the most used index....

Is your web site on the same box your database is on?

telnet www.planetmath.org 5432

oh, $hit... never mind........

If you have another box, please put the database on it. The web server
may be killing the database, but this depends on the amount of traffic.
And block the port.........

How fast is your hard drive? 5400rpm? :S

k=n^r/ck, SCJP
Or at least restrict TCP/IP connections from localhost only, and use SSH
tunnels if you must have direct external access (for pgAdmin, etc.) to
the DB.

Lucas.
> What is the structure of your table?
> Are the data types in the table the same as in the SQL?
>
> Did you create the index after loading the table?
> Cluster the table around the most used index....

There is no point clustering a table around the most used index, unless
access to the index is non-random, eg. you are picking up more than one
consecutive entry from the index at a time. eg. Indexes on foreign keys
are excellent for clustering.

Chris
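For illustration, the 7.2/7.3-era CLUSTER syntax, applied to the word-index table that comes up later in this thread (a whole-word search fetches many consecutive index entries for the same word, which is exactly the access pattern that benefits):

    -- rewrite tblwordidx in the physical order of the (word, fid) index;
    -- takes an exclusive lock for the duration, and the ordering decays
    -- as new rows arrive, so it has to be repeated periodically
    CLUSTER wordidxw ON tblwordidx;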
I should have paid more attention to the disk space before... but it
looks like somewhere between half a gig and a gig was freed! The disk
footprint is about a gig now.

Aaron Krowne

On Sun, Mar 16, 2003 at 03:37:32AM -0500, Tom Lane wrote:
> Aaron Krowne <akrowne@vt.edu> writes:
> > So, either it is broken, or doing a VACUUM FULL ANALYZE rather than just
> > VACUUM ANALYZE made all the difference. Is this possible (the latter,
> > we know the former is possible...)?
>
> If your FSM parameters in postgresql.conf are too small, then plain
> vacuums might have failed to keep up with the available free space,
> leading to a situation where vacuum full is essential. Did you happen
> to notice whether the vacuum full shrunk the database's disk footprint
> noticeably?
>
> 			regards, tom lane
All right, I performed a VACUUM FULL last night, and after about 3 hours
I tried running a select count(*) FROM tblwordidx, and that did help
things considerably (it runs in ~20 seconds instead of 1-2 minutes).
Not as good as originally, but close.

But, here's the breakdown of the db: I'm using the database as a
keyword-based file search engine (not the most efficient method, I know,
but it works well for my purposes). The biggest and most relevant tables
are a table of files and one of words. The basic operation is that each
file has a set of keywords associated with it; I do a whole-word search
on tblwordidx and join with tblfiles (I know, the naming scheme sucks,
sorry!).

Three times a day I scan the network and update the database. I insert
about 180,000 rows into a temporary table (tbltmp) and then use it to
update the main tables. With the aid of a few other tables, I clean up
tblfiles so that existing rows have an updated timestamp in tblseen and
files with a timestamp older than 1 day are removed. Then, I take the
new rows in tblfiles and use a perl script to add more words to
tblwordidx. After each update I do a VACUUM and VACUUM ANALYZE, which
usually grinds for 10 to 15 minutes.

I'm running this db on a celeron 450Mhz with 256MB RAM and a 60GB HDD
(7200 rpm). For the most part I have the db running "well enough." Over
time however, I find that performance degrades; the count(*) above is an
example of a command that does worse over time. It gets run once an hour
for stats collection. When I first migrated the db to v7.3.1 it would
take about 5-10 seconds (which it is close to now after a VACUUM FULL),
but after a few weeks it would take over a minute of really intense HDD
activity. Also of note is that when I first loaded the data it would
cache very well, with the query taking maybe 15 seconds if I had just
started the db after a reboot, but when it was in its "slow" state
repeating the query didn't noticeably use the disk less (nor did it take
less time).

I've attached a VACUUM VERBOSE and my conf file (which is pretty
vanilla, I haven't tweaked it since updating). If you have any
suggestions on how I can correct this situation through config changes,
that would be ideal, and thanks for your help. If it is just a case of
doing lots of VACUUM FULLs, I can definitely see it as a performance
bottleneck for postgres. Fortunately I can afford the huge performance
penalty of a VACUUM FULL, but I can certainly think of apps that can't.
Logan Bowers

\d tblfiles: (219,248 rows)

  Column  |            Type             |                 Modifiers
----------+-----------------------------+-------------------------------------------
 fid      | integer                     | not null default nextval('fileids'::text)
 hid      | integer                     | not null
 pid      | integer                     | not null
 name     | character varying(256)      | not null
 size     | bigint                      | not null
Indexes: temp_fid_key unique btree (fid),
         filediridx btree (hid, pid, name, size, fid),
         fileidx btree (name, hid, pid, fid),
         fileidxfid btree (fid, name, pid)

\d tblwordidx: (1,739,481 rows)

        Table "public.tblwordidx"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 fid    | integer                | not null
 word   | character varying(128) | not null
 todel  | boolean                |
Indexes: wordidxfid btree (fid, word),
         wordidxfidonly btree (fid),
         wordidxw btree (word, fid)

On Mon, 17 Mar 2003, Tom Lane wrote:

> "Neil Conway" <neilc@samurai.com> writes:
>> Sean Chittenden said:
>>>> A "select count(*) FROM tblwordidx" initially took about 1 second to
>>>> return a count of 2 million but after a few months it took several
>>>> minutes of really hard HDD grinding.
>>>
>>> That's because there are dead entries in the index that weren't being
>>> reused or cleaned up. As I said, this has been fixed.
>>
>> That's doubtful: "select count(*) FROM foo" won't use an index.
>
> To know what's going on, as opposed to guessing about it, we'd need to
> know something about the physical sizes of the table and its indexes.
> "vacuum verbose" output would be instructive...
>
> But my best theorizing-in-advance-of-the-data guess is that Logan's
> FSM settings are too small, causing free space to be leaked over time.
> If a vacuum full restores the original performance then that's probably
> the right answer.
>
> 			regards, tom lane
[Attachment: VACUUM VERBOSE output and postgresql.conf]
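As a rough sketch of the thrice-daily refresh Logan describes above, under the assumption that tblseen holds (fid, lastseen); that table's definition is not shown in the thread, so the column names here are guesses:

    BEGIN;
    -- files found in this scan are marked as still present
    UPDATE tblseen SET lastseen = now()
     WHERE fid IN (SELECT fid FROM tbltmp);
    -- files not seen for more than a day are dropped
    DELETE FROM tblfiles
     WHERE fid IN (SELECT fid FROM tblseen
                    WHERE lastseen < now() - interval '1 day');
    DELETE FROM tblseen
     WHERE lastseen < now() - interval '1 day';
    COMMIT;
    -- followed by the VACUUM / VACUUM ANALYZE pass mentioned above
    VACUUM ANALYZE tblfiles;
    VACUUM ANALYZE tblseen;

Each pass like this leaves the deleted and updated rows behind as dead tuples, which is exactly the free space the FSM has to be large enough to track.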
> I'm running this db on a celeron 450Mhz with 256MB RAM and a 60GB HDD
> (7200 rpm). For the most part I have the db running "well enough." Over
> time however, I find that performance degrades; the count(*) above is an
> example of a command that does worse over time. It gets run once an hour
> for stats collection. When I first migrated the db to v7.3.1 it would
> take about 5-10 seconds (which it is close to now after a VACUUM FULL),
> but after a few weeks it would take over a minute of really intense HDD
> activity. Also of note is that when I first loaded the data it would
> cache very well, with the query taking maybe 15 seconds if I had just
> started the db after a reboot, but when it was in its "slow" state
> repeating the query didn't noticeably use the disk less (nor did it take
> less time).

To speed up your COUNT(*), how about doing this:

- Create a separate table to hold a single integer.
- Add a trigger after insert on your table to increment the counter in
  the other table.
- Add a trigger after delete on your table to decrement the counter in
  the other table.

That way you always have an O(1) count...

Chris
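A minimal sketch of that idea in PL/pgSQL, assuming the language is already installed in the database (createlang plpgsql) and using tblwordidx as the counted table; the other names are made up:

    CREATE TABLE tblwordidx_count (n bigint NOT NULL);
    INSERT INTO tblwordidx_count SELECT count(*) FROM tblwordidx;

    CREATE FUNCTION maintain_wordidx_count() RETURNS trigger AS '
    BEGIN
        IF TG_OP = ''INSERT'' THEN
            UPDATE tblwordidx_count SET n = n + 1;
        ELSIF TG_OP = ''DELETE'' THEN
            UPDATE tblwordidx_count SET n = n - 1;
        END IF;
        RETURN NULL;   -- return value is ignored for AFTER triggers
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER trg_wordidx_count
        AFTER INSERT OR DELETE ON tblwordidx
        FOR EACH ROW EXECUTE PROCEDURE maintain_wordidx_count();

    -- O(1) replacement for SELECT count(*) FROM tblwordidx:
    SELECT n FROM tblwordidx_count;

The trade-off is that every insert or delete now also updates the single counter row, which serializes concurrent writers on that row and itself generates dead tuples, so it suits batch-loaded tables like this one better than highly concurrent ones.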
Logan Bowers <logan@datacurrent.com> writes:
> I've attached a VACUUM VERBOSE and my conf file (which is pretty
> vanilla, I haven't tweaked it since updating).

You definitely need to increase the fsm shared memory parameters. The
default max_fsm_relations is just plain too small (try 1000) and the
default max_fsm_pages is really only enough for perhaps a 100Mb
database. I'd try bumping it to 100,000. Note you need a postmaster
restart to make these changes take effect.

			regards, tom lane
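In postgresql.conf terms, that advice amounts to something like the following (illustrative values only; remember the postmaster restart):

    max_fsm_relations = 1000     # one slot per table/index whose free space is tracked
    max_fsm_pages = 100000       # 100,000 pages x 8 KB = room to track ~800 MB worth
                                 # of pages containing reusable free space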
On Mon, 17 Mar 2003, Logan Bowers wrote:

> \d tblfiles: (219,248 rows)
>   Column  |            Type             |                 Modifiers
> ----------+-----------------------------+-------------------------------------------
>  fid      | integer                     | not null default nextval('fileids'::text)
>  hid      | integer                     | not null
>  pid      | integer                     | not null
>  name     | character varying(256)      | not null
>  size     | bigint                      | not null
> Indexes: temp_fid_key unique btree (fid),
>          filediridx btree (hid, pid, name, size, fid),
>          fileidx btree (name, hid, pid, fid),
>          fileidxfid btree (fid, name, pid)

I'm no expert on indexes, but I seem to remember reading that creating
multicolumn indexes on more than 2 or 3 columns gets sort of pointless:

http://www.us.postgresql.org/users-lounge/docs/7.3/postgres/indexes-multicolumn.html

There is probably a ton of disk space and CPU used to keep all these
multicolumn indexes. Might be part of the problem.

> \d tblwordidx: (1,739,481 rows)
>         Table "public.tblwordidx"
>  Column |          Type          | Modifiers
> --------+------------------------+-----------
>  fid    | integer                | not null
>  word   | character varying(128) | not null
>  todel  | boolean                |
> Indexes: wordidxfid btree (fid, word),
>          wordidxfidonly btree (fid),
>          wordidxw btree (word, fid)

Another index question for the pros. When creating a multicolumn index,
do you need to do it both ways:

wordidxfid btree (fid, word)
wordidxw btree (word, fid)

We have a very similar "dictionary" table here for searching. It's
about 1.7 million rows and takes about 80mb of disk space. There is one
multicolumn index on the table which uses about 50mb of disk space.

To find out how much disk space you are using, the hard way is:

select relfilenode from pg_class where relname='tblwordidx';
select relfilenode from pg_class where relname='wordidxw';

relfilenode is the name of the file in your data directory. I'm pretty
sure there is an easier way to do this with a function I saw in contrib.

Just some thoughts.

Chris
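An alternative that avoids hunting for files on disk: pg_class also keeps a page count per table and index, so an approximate size report can be had in SQL alone. The relation names below are the ones from Logan's schema, and the figures are only as fresh as the last VACUUM or ANALYZE:

    -- 1 page = 8 KB with the default block size
    SELECT relname, relkind, relpages, relpages * 8 AS approx_kb
      FROM pg_class
     WHERE relname IN ('tblwordidx', 'wordidxfid', 'wordidxfidonly', 'wordidxw')
     ORDER BY relpages DESC;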
Clustering is good for queries that return multiple values:

select this, that from tableA where this = 'whatever';

If there are multiple values, the location of the first record is found
in the indexFile.

Then dataFile is scanned until this != 'whatever';

This will decrease disk activity, which is the bottleneck in database
performance.

k=n^r/ck, SCJP
Logan,

> I'm running this db on a celeron 450Mhz with 256MB RAM and a 60GB HDD
> (7200 rpm). For the most part I have the db running "well enough." Over

Hmmm ... actually, considering your hardware, I'd say the database
performance you're getting is excellent. You're facing 3 bottlenecks:

1) The Celeron II's lack of on-chip cache will slow down even moderately
complex queries as much as 50% over a comparably-clocked Pentium or AMD
chip, in my experience.

2) 256mb RAM is small enough that if you are running Apache on the same
machine, Apache & Postgres could be contesting for RAM during busy
periods.

3) (most noticeable) You have pretty much the bare minimum of disk. For
a one-gb database, a Linux RAID array or mirror would be a lot better ...

Of course, that's all relative. What I'm saying is, if you want your
database to "scream" you're going to have to put some money into
hardware. If you're just looking for adequate performance, then that
can be had with a little tweaking and maintenance.

--
Josh Berkus
Aglio Database Solutions
San Francisco
On Tue, 18 Mar 2003 09:19:44 -0600, "Kendrick C. Wilson"
<kendrick_wilson@hotmail.com> wrote:
> If there are multiple values, the location of the first record is found
> in the indexFile.
>
> Then dataFile is scanned until this != 'whatever';

Nice, but unfortunately not true for Postgres. When you do the first
UPDATE after CLUSTER the new version of the changed row(s) are written
to the end of the dataFile (heap relation in Postgres speech). So the
*index* has to be scanned until this != 'whatever'.

> Clustering is good for queries that return multiple [rows with the same
> search] values.

Yes. With clustering you can expect that most of the tuples you want
are near to each other and you find several of them in the same page.

Servus
Manfred
Tom Lane <tgl@sss.pgh.pa.us> writes:
> You definitely need to increase the fsm shared memory parameters. The
> default max_fsm_relations is just plain too small (try 1000) and the
> default max_fsm_pages is really only enough for perhaps a 100Mb
> database. I'd try bumping it to 100,000. Note you need a postmaster
> restart to make these changes take effect.

Hmm, are there any guidelines for choosing these values?

We have a database with a table into which we insert about 4,000,000
rows each day, and delete another 4,000,000 rows. The total row count
is around 40 million, I guess, and the rows are about 150 bytes long.
(VACUUM FULL is running at the moment, so I can't check.) The database
is used as a research tool, and we run moderately complex ad-hoc queries
on it. As a consequence, I don't see much room for optimization.

One of the columns is time-based and indexed, so we suffer from the
creeping index syndrome. A nightly index rebuild followed by a VACUUM
ANALYZE isn't a problem (it takes less than six hours), but this doesn't
seem to be enough (we seem to lose disk space nevertheless). I can't
afford a regular VACUUM FULL because it takes down the database for over
ten hours, and this starts to cut into the working hours no matter when
it starts.

Can you suggest some tweaks to the FSM values so that we can avoid the
full VACUUM?

The database runs 7.3.2 and resides on a 4-way Xeon box with 4 GB of RAM
and a severely underpowered disk subsystem (Linux software RAID1 on two
10k 36 GB SCSI drives -- don't ask, this database application is nothing
but an accident which happened after purchase of the box).

--
Florian Weimer                          Weimer@CERT.Uni-Stuttgart.DE
University of Stuttgart                 http://CERT.Uni-Stuttgart.DE/people/fw/
RUS-CERT                                fax +49-711-685-5898
On Friday 21 Mar 2003 4:31 am, Florian Weimer wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> > You definitely need to increase the fsm shared memory parameters. The
> > default max_fsm_relations is just plain too small (try 1000) and the
> > default max_fsm_pages is really only enough for perhaps a 100Mb
> > database. I'd try bumping it to 100,000. Note you need a postmaster
> > restart to make these changes take effect.
>
> Hmm, are there any guidelines for choosing these values?
>
> We have a database with a table into which we insert about 4,000,000
> rows each day, and delete another 4,000,000 rows. The total row count
> is around 40 million, I guess, and the rows are about 150 bytes long.
> (VACUUM FULL is running at the moment, so I can't check.)

I suggest you split your table into a set of identical child tables
using inheritance; see the sketch below. Your queries won't be affected,
as you can run them against the parent table and get the same result.
But as far as vacuuming goes, you can probably drop a child table
entirely and recreate it as a fast alternative to vacuum. The only catch
is, I don't know if inherited tables would use their respective indexes;
otherwise your queries might be slow as anything.

> One of the columns is time-based and indexed, so we suffer from the
> creeping index syndrome. A nightly index rebuild followed by a VACUUM
> ANALYZE isn't a problem (it takes less than six hours), but this
> doesn't seem to be enough (we seem to lose disk space nevertheless).

I am sure a "select * from table into another table; drop table; rename
temp table" kind of hack would be faster than vacuuming in this case.

This is just a suggestion. Good if this works for you..

Shridhar
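A minimal sketch of the inheritance idea with invented table and column names (on 7.3, child tables are included automatically when you query the parent, and each child needs its own indexes, which is the caveat raised above):

    CREATE TABLE events (
        ts      timestamp NOT NULL,
        payload text
    );

    -- one child per day; structure is inherited, indexes are not
    CREATE TABLE events_2003_03_20 () INHERITS (events);
    CREATE TABLE events_2003_03_21 () INHERITS (events);
    CREATE INDEX events_2003_03_20_ts ON events_2003_03_20 (ts);
    CREATE INDEX events_2003_03_21_ts ON events_2003_03_21 (ts);

    -- loads go into the child for the current day
    INSERT INTO events_2003_03_21 VALUES (now(), 'some row');

    -- queries against the parent see all children
    SELECT count(*) FROM events WHERE ts >= '2003-03-20';

    -- expiring a day becomes a DROP TABLE instead of a 4,000,000-row
    -- DELETE plus the vacuum needed to reclaim the space
    DROP TABLE events_2003_03_20;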
Florian Weimer <Weimer@CERT.Uni-Stuttgart.DE> writes:
> Hmm, are there any guidelines for choosing these values?
>
> We have a database with a table into which we insert about 4,000,000
> rows each day, and delete another 4,000,000 rows. The total row count
> is around 40 million, I guess, and the rows are about 150 bytes long.

If you are replacing 10% of the rows in the table every day, then it's
a pretty good bet that every single page of the table contains free
space. Accordingly, you'd better set max_fsm_pages large enough to
have a FSM slot for every page of the table. (1 page = 8Kb normally)

You could possibly get away with a smaller FSM if you do (non-FULL)
vacuums more often than once a day. Some people find they can run
background vacuums without hurting performance too much, some don't ---
I suspect it depends on how much spare disk bandwidth you have.

			regards, tom lane
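As a rough back-of-the-envelope for the table described above (ignoring per-tuple and per-page overhead, so the real number will be somewhat higher), plus a query to get the actual figure; the table name is a placeholder:

    -- 40,000,000 rows * ~150 bytes  =  ~6 GB of heap
    -- 6 GB / 8 KB per page          =  ~730,000 pages
    -- so max_fsm_pages should be on the order of 750,000 or more

    -- the authoritative page count, as of the last VACUUM/ANALYZE:
    SELECT relname, relpages
      FROM pg_class
     WHERE relname = 'your_big_table';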