Re: panic on 7.3 - Mailing list pgsql-hackers

From Rick Gigger
Subject Re: panic on 7.3
Date
Msg-id EEDAB265-2A6F-4D0A-A10D-F7C6D2E6E544@alpinenetworking.com
In response to Re: panic on 7.3  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Jan 20, 2006, at 6:02 PM, Tom Lane wrote:

> Rick Gigger <rick@alpinenetworking.com> writes:
>> Postgres version 7.3.4
>
> You realize of course that that's pretty old ...
>
>> That is right now.  Right after it started up it went up to 0292.
>
> So it was the latest file eh?  I thought maybe you had some problem
> with a corrupted XID leading to trying to touch a clog file
> out-of-order, but that seems ruled out.
>
>>> 2006-01-20 11:50:51 PANIC:  creation of file /var/lib/pgsql/data/pg_clog/0292 failed: File exists
>
> Digging in the 7.3 sources, it seems that error message could only have
> come from here:
>
>     fd = BasicOpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
>     if (fd < 0)
>     {
>         if (errno != ENOENT)
>             elog(PANIC, "open of %s failed: %m", path);
>         fd = BasicOpenFile(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
>                            S_IRUSR | S_IWUSR);
>         if (fd < 0)
>             elog(PANIC, "creation of file %s failed: %m", path);
>     }
>
> AFAICS, it is simply not possible for the second open() to fail with
> that errno, *unless* someone else created the same file in the
> microseconds between the two open calls.
>
> The code doing this has a lock on the particular clog buffer it's trying
> to write out, so no-one else could be trying to write the same buffer;
> but now that I look at it, it's entirely legal for someone else to be
> trying to write a different clog buffer.  This leads to the following
> theory:
>
> 1. The clog page that would be first in the 0292 segment got created in
> clog buffers, but there was no reason to write it out for awhile.  (In
> normal operation, only a checkpoint would be cause to write out the
> frontmost page of clog.)
>
> 2. More than 2K transactions elapsed, so the page that would be second
> in the 0292 segment also got set up in clog buffers.  (Rick, is the load
> on your machine such that several thousand transactions might have
> elapsed between checkpoints?)  Perhaps there were even enough
> transactions so that more than two pages were dirty and pending write
> in the clog buffers, yet the file hadn't been created yet.

So what I think I'm getting killed on right now is disk writes, so I was
thinking of changing away from the default checkpoint settings.  My load
is going to continue to go up, so 2000+ transactions are going to start
happening pretty fast.  I have lots of disk space, so I was going to
increase the time between checkpoints.  Will that increase the chances
of this happening again, or was this such a freak-of-nature coincidence
that it can't really happen again?  Also, I've decided to upgrade all
the way to the latest 8.1 code.
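
The checkpoint spacing in question is governed by two postgresql.conf
settings (names as of 8.1; the values below are purely illustrative,
not recommendations):

```ini
# postgresql.conf -- illustrative values only
checkpoint_segments = 16    # WAL segments between automatic checkpoints (default 3)
checkpoint_timeout = 600    # maximum seconds between checkpoints (default 300)
```

Note that widening either setting lengthens the stretch during which
several dirty clog pages can accumulate before the segment file exists
on disk, which is exactly the window Tom's theory below depends on.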

> 3. Two different backends decided to try to write different clog pages
> concurrently.  Probably one was writing the frontmost page because it
> was doing a checkpoint, and another needed to read in an older clog page
> so it had to swap out one of the other dirty buffers.
>
> 4. Kaboom.
>
> If this theory is correct, the bug has been there since the clog code
> was first written.  But the conditions for having it happen are narrow
> enough that it's not too surprising we haven't seen it before.
>
> I think that a sufficient fix might just be to remove the O_EXCL flag
> from the second open() call --- ie, if someone else creates the file
> in this narrow window, it should be considered OK.  Comments?
>
>             regards, tom lane
>


