Thread: "critical mass" reached?
Using 7.0.3, I've got a database that has about 30 tables. One in particular seems to be giving us problems. It's a pretty simple table with one index that logs certain web site activity. It gets about 100K insertions per day, and now has about 6 million records.

All of a sudden (yesterday), we seem to have reached a "critical mass". No other tables or code have changed significantly (or more than normal). However, the database performance has abruptly become abysmal -- the server, which previously hadn't broken a load average of 4, now spikes continuously between 20 and 40, rarely dropping below 10. Web server logs show normal activity. Also, in the pgsql/data/base/dbname directory, I'm getting hundreds of pg_sorttemp and pg_noname files lying around.

I thought there might be some data or index corruption, so I've even gone so far as to initdb and recreate the database from scratch, using a previous pg_dump output, but it has not helped.

Six million tables doesn't seem like it should be too much of a problem, but we suspect this table might be the cause of the problem because it's the only one that changes significantly from day to day. Memory is ok, there is no swapping, disk space is plentiful, and I don't know where else to look. Any ideas?

--
Alex Howansky
Wankwood Associates
http://www.wankwood.com/
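A quick way to put numbers on the symptoms described above is to count the leftover temp files and watch the machine while the slowdown is happening. This is only a sketch; the data directory path is an example and should be replaced with the actual $PGDATA and database directory.

    # Count leftover sort-temp files in the database's directory
    # (adjust the path to your own installation).
    ls /usr/local/pgsql/data/base/dbname | grep -c pg_sorttemp
    ls /usr/local/pgsql/data/base/dbname | grep -c pg_noname

    # Watch load and check for swapping while the problem is occurring.
    uptime
    vmstat 5 5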
From: "Alex Howansky" <alex@wankwood.com> > Using 7.0.3, I've got a database that has about 30 tables. One in particular > seems to be giving us problems. It's a pretty simple table with one index that > logs certain web site activity. It gets about 100K insertions per day, and now > has about 6 million records. > > All of a sudden (yesterday), we seem to have reached a "critical mass". No > other tables or code have changed significantly (or more than normal). However, > the database performance has abruptly become abyssmal -- the server which > previously hadn't broken a load average of 4 now spikes continuously between 20 > and 40, rarely dropping below 10. Web server logs show normal activity. Also, > in the pgsql/data/base/dbname directory, I'm getting hundreds of pg_sorttemp > and pg_noname files lying around. Presumably you're running vacuum analyze regularly (at least once a day I'd guess) so I can only suspect that something has tipped the balance in the cost estimations. Is there a particular query that's slow and can you post an EXPLAIN? > I thought there might be some data or index corruption, so I've even gone so > far as to initdb and recreate the database from scratch, using a previous > pg_dump output, but it has not helped. Looks like you've ruled out damage to the DB. What happens if you delete 3 million of the records in your log-table? > Six million tables doesn't seem like it should be too much of a problem, but we > suspect this table might be the cause of the problem because it's the only one > that changes significantly from day to day. Memory is ok, there is no swapping, > disk space is plentiful, I don't know where else to look. Any ideas? Six million _tables_ is a lot, but you're right 6M records is pretty small compared to what some people are using. See if you can't post an EXPLAIN of a problem query and the relevant table defs. - Richard Huxton
> Presumably you're running vacuum analyze regularly (at least once a day I'd
> guess) so I can only suspect that something has tipped the balance in the
> cost estimations. Is there a particular query that's slow and can you post
> an EXPLAIN?

Oops, yes, sorry, forgot to mention that. Vacuum analyze is run nightly. There is not just one particular query that runs slow -- it's the database as a whole (while apparently under the same average everyday load).

> Looks like you've ruled out damage to the DB. What happens if you delete 3
> million of the records in your log-table?

We haven't got that far yet. I was hoping to get some other ideas prior to doing something so drastic, but we'll try it...

> Six million _tables_ is a lot, but you're right 6M records is pretty small
> compared to what some people are using.

Oops again. I gotta stop trying to debug at 3am... :)

--
Alex Howansky
Wankwood Associates
http://www.wankwood.com/
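If the test Richard suggests goes ahead, it might look roughly like this. The table name, timestamp column, and cutoff date are made up for the example; on 7.0 a VACUUM ANALYZE afterwards is needed so that the space is actually reclaimed and the planner statistics reflect the smaller table.

    # Hypothetical example: delete the older portion of the log table, then vacuum.
    # "activity_log" and "logged_at" are placeholder names.
    psql -d mydb -c "DELETE FROM activity_log WHERE logged_at < '2001-01-01';"
    psql -d mydb -c "VACUUM ANALYZE activity_log;"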
> Hm. As Richard remarks, 6M records is not an especially big table;
> there are people running larger ones. The leftover sorttemp files sound
> like you are suffering backend crashes --- but you didn't mention
> anything about unexpected disconnects.

I haven't noticed any myself, but the majority of our connections come from a public web based app -- so I can't really tell if the consumer is experiencing problems or not.

> The postmaster log would be a good place to look for more info (if
> you're not keeping one, turn it on).

I have a debug level 2 log of almost the entire day's activity. I scanned it briefly but found nothing (it's 180 meg). Is there anything in particular I should be looking for?

> Also, make sure the postmaster is not being run with an environment of
> "ulimit -c 0" ... if the backends are crashing, we want to get some core
> files so we can see what's happening.

Ok, will verify. Thanks.

--
Alex Howansky
Wankwood Associates
http://www.wankwood.com/
Alex Howansky <alex@wankwood.com> writes:
> [ lots of bad stuff ]

Hm. As Richard remarks, 6M records is not an especially big table; there are people running larger ones. The leftover sorttemp files sound like you are suffering backend crashes --- but you didn't mention anything about unexpected disconnects.

The postmaster log would be a good place to look for more info (if you're not keeping one, turn it on).

Also, make sure the postmaster is not being run with an environment of "ulimit -c 0" ... if the backends are crashing, we want to get some core files so we can see what's happening.

regards, tom lane
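A sketch of what Tom suggests here: restart the postmaster from a shell that allows core dumps and captures a log. The paths and debug level below are examples, not the configuration used in this thread.

    # Allow core dumps in the shell that starts the postmaster.
    ulimit -c unlimited

    # Start the postmaster with debug output redirected to a log file
    # (data directory and log path are examples).
    nohup postmaster -D /usr/local/pgsql/data -d 2 > /var/log/pgsql/server.log 2>&1 &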
Alex Howansky <alex@wankwood.com> writes:
>> The postmaster log would be a good place to look for more info (if
>> you're not keeping one, turn it on).

> I have a debug level 2 log of almost the entire day's activity.

Oh good. Hang onto that.

> I scanned it
> briefly but found nothing (it's 180 meg). Is there anything in particular I
> should be looking for?

Unusual errors, reports of subprocess crashes ...

regards, tom lane
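One way to skim a 180 MB debug log for the kind of messages Tom mentions; the log path is an example, and the patterns are just a rough net for error and crash reports.

    # Pull out errors and any notices about backends exiting abnormally.
    grep -n -i -E 'error|fatal|abnormal|signal|terminat' /var/log/pgsql/server.log | less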