Re: panic on 7.3 - Mailing list pgsql-hackers
| From | Rick Gigger |
|---|---|
| Subject | Re: panic on 7.3 |
| Date | |
| Msg-id | EEDAB265-2A6F-4D0A-A10D-F7C6D2E6E544@alpinenetworking.com |
| In response to | Re: panic on 7.3 (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-hackers |
On Jan 20, 2006, at 6:02 PM, Tom Lane wrote:

> Rick Gigger <rick@alpinenetworking.com> writes:
>> Postgres version 7.3.4
>
> You realize of course that that's pretty old ...
>
>> That is right now. Right after it started up it went up to 0292.
>
> So it was the latest file eh? I thought maybe you had some problem
> with a corrupted XID leading to trying to touch a clog file
> out-of-order, but that seems ruled out.
>
>>> 2006-01-20 11:50:51 PANIC: creation of file /var/lib/pgsql/data/pg_clog/0292 failed: File exists
>
> Digging in the 7.3 sources, it seems that error message could only have
> come from here:
>
>     fd = BasicOpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
>     if (fd < 0)
>     {
>         if (errno != ENOENT)
>             elog(PANIC, "open of %s failed: %m", path);
>         fd = BasicOpenFile(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
>                            S_IRUSR | S_IWUSR);
>         if (fd < 0)
>             elog(PANIC, "creation of file %s failed: %m", path);
>     }
>
> AFAICS, it is simply not possible for the second open() to fail with
> that errno, *unless* someone else created the same file in the
> microseconds between the two open calls.
>
> The code doing this has a lock on the particular clog buffer it's trying
> to write out, so no-one else could be trying to write the same buffer;
> but now that I look at it, it's entirely legal for someone else to be
> trying to write a different clog buffer. This leads to the following
> theory:
>
> 1. The clog page that would be first in the 0292 segment got created in
> clog buffers, but there was no reason to write it out for awhile. (In
> normal operation, only a checkpoint would be cause to write out the
> frontmost page of clog.)
>
> 2. More than 2K transactions elapsed, so the page that would be second
> in the 0292 segment also got set up in clog buffers. (Rick, is the load
> on your machine such that several thousand transactions might have
> elapsed between checkpoints?)
> Perhaps there were even enough transactions so that more than two pages
> were dirty and pending write in the clog buffers, yet the file hadn't
> been created yet.

So what I think I'm getting killed on right now are the disk writes. So I
was thinking of changing away from the default checkpoint settings. My
load is going to continue to go up, so 2000+ transactions are going to
start happening pretty fast. I have lots of disk space, so I was going to
increase the time between checkpoints. Will that increase the chances of
this happening again, or was this such a strange freak-of-nature
coincidence that it can't really happen again?

Also, I've decided to upgrade all the way to the latest 8.1 code.

> 3. Two different backends decided to try to write different clog pages
> concurrently. Probably one was writing the frontmost page because it
> was doing a checkpoint, and another needed to read in an older clog page
> so it had to swap out one of the other dirty buffers.
>
> 4. Kaboom.
>
> If this theory is correct, the bug has been there since the clog code
> was first written. But the conditions for having it happen are narrow
> enough that it's not too surprising we haven't seen it before.
>
> I think that a sufficient fix might just be to remove the O_EXCL flag
> from the second open() call --- ie, if someone else creates the file
> in this narrow window, it should be considered OK. Comments?
>
> regards, tom lane