Re: panic on 7.3 - Mailing list pgsql-hackers
From | Rick Gigger |
---|---|
Subject | Re: panic on 7.3 |
Date | |
Msg-id | C9474450-4F32-411E-A053-39E789613098@alpinenetworking.com |
In response to | Re: panic on 7.3 (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: panic on 7.3 |
List | pgsql-hackers |
> Rick Gigger <rick@alpinenetworking.com> writes:
>> Postgres version 7.3.4
>
> You realize of course that that's pretty old ...

Yes. I will be upgrading immediately.

>> That is right now. Right after it started up it went up to 0292.
>
> So it was the latest file eh? I thought maybe you had some problem
> with a corrupted XID leading to trying to touch a clog file
> out-of-order, but that seems ruled out.

Well that sounds like a good thing.

>>> 2006-01-20 11:50:51 PANIC: creation of file /var/lib/pgsql/data/pg_clog/0292 failed: File exists
>
> Digging in the 7.3 sources, it seems that error message could only have
> come from here:
>
>     fd = BasicOpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
>     if (fd < 0)
>     {
>         if (errno != ENOENT)
>             elog(PANIC, "open of %s failed: %m", path);
>         fd = BasicOpenFile(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
>                            S_IRUSR | S_IWUSR);
>         if (fd < 0)
>             elog(PANIC, "creation of file %s failed: %m", path);
>     }

Yes, I found that too (by accident with google) but didn't really have
the slightest clue what exactly would have caused it.

> AFAICS, it is simply not possible for the second open() to fail with
> that errno, *unless* someone else created the same file in the
> microseconds between the two open calls.
>
> The code doing this has a lock on the particular clog buffer it's trying
> to write out, so no-one else could be trying to write the same buffer;
> but now that I look at it, it's entirely legal for someone else to be
> trying to write a different clog buffer. This leads to the following
> theory:
>
> 1. The clog page that would be first in the 0292 segment got created in
> clog buffers, but there was no reason to write it out for awhile. (In
> normal operation, only a checkpoint would be cause to write out the
> frontmost page of clog.)
>
> 2. More than 2K transactions elapsed, so the page that would be second
> in the 0292 segment also got set up in clog buffers. (Rick, is the load
> on your machine such that several thousand transactions might have
> elapsed between checkpoints?) Perhaps there were even enough
> transactions so that more than two pages were dirty and pending write
> in the clog buffers, yet the file hadn't been created yet.

I don't know if 2K could have passed since the last checkpoint. Part of
the reason I haven't upgraded in so long is that it has been running
like a champ for about 3 years. Recently though the load on the site
just shot through the roof. Not only am I going to upgrade the version
of postgres but I need to do some major tuning. I am still using a lot
of defaults. I am using the default checkpoint settings but I am not
sure how often they are happening. Actually, now that I think about it,
I was getting about 400 page requests / minute and each of those would
have been doing at least 2 transactions, so yes, I guess that is very
likely.

> 3. Two different backends decided to try to write different clog pages
> concurrently. Probably one was writing the frontmost page because it
> was doing a checkpoint, and another needed to read in an older clog page
> so it had to swap out one of the other dirty buffers.
>
> 4. Kaboom.

Yeah, Kaboom. It was really bad timing too. :)

> If this theory is correct, the bug has been there since the clog code
> was first written. But the conditions for having it happen are narrow
> enough that it's not too surprising we haven't seen it before.

Wow, it's great to be the first.

> I think that a sufficient fix might just be to remove the O_EXCL flag
> from the second open() call --- ie, if someone else creates the file
> in this narrow window, it should be considered OK. Comments?

Well, just a little FYI, I don't know if any of this will help, but I was
suffering from severe table bloat. The data in my session table is
unfortunately quite large and being updated constantly, so the session
table and its two indexes and especially its toast table were impossible
to vacuum. Also, the vacuum and fsm settings were the defaults, making
the problem worse.

> regards, tom lane

Thanks so much for the help.

Rick
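
For reference, the fix Tom proposes can be sketched outside the backend roughly as follows. This is only an illustration under stated assumptions, not the 7.3 source: it uses plain POSIX open() in place of BasicOpenFile, leaves out PG_BINARY, returns -1 instead of calling elog(PANIC), and the function name open_or_create_segment is made up for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL): open a segment file,
 * creating it if it does not exist yet.
 * Returns a file descriptor, or -1 with errno set on failure.
 */
int
open_or_create_segment(const char *path)
{
    int fd = open(path, O_RDWR, S_IRUSR | S_IWUSR);

    if (fd < 0)
    {
        /* Anything other than "no such file" is a real failure. */
        if (errno != ENOENT)
            return -1;

        /*
         * No O_EXCL here. If another process created the file in the
         * window between the two open() calls, O_CREAT just opens the
         * existing file instead of failing with EEXIST.
         */
        fd = open(path, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
    }
    return fd;
}
```

Calling this twice on the same path succeeds both times. A second open() with O_CREAT | O_EXCL on that path would instead fail with EEXIST, which is the "File exists" seen in the PANIC message.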