Re: Hard limit on WAL space used (because PANIC sucks) - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Hard limit on WAL space used (because PANIC sucks)
Date
Msg-id CAMkU=1wjB+rE4Yn6fBou0WrpfA3r2gZpTfL4Tcc_Mex1o8e41g@mail.gmail.com
In response to Re: Hard limit on WAL space used (because PANIC sucks)  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Hard limit on WAL space used (because PANIC sucks)
List pgsql-hackers
On Sat, Jun 8, 2013 at 11:27 AM, Andres Freund <andres@2ndquadrant.com> wrote:

You know, the PANIC isn't there just because we like to piss off
users. There are actual technical reasons that don't just go away by
judging the PANIC as stupid.
At the points where the XLogInsert()s happen we're in critical sections
out of which we *cannot* ERROR out because we may already have made
modifications that cannot be allowed to be performed
partially/unlogged. That's why we're throwing a PANIC which will force a
cluster-wide restart including *NOT* writing any further buffers from
s_b out.

If archiving is on and the failure is due to lack of space, could we just keep retrying XLogFileInit for a couple of minutes to give archiving a chance to do its thing? Doing that while holding locks and sitting inside a critical section would be unfortunate, but if the alternative is a PANIC, it might be acceptable.

The problem is that even if the file is being kept only so that it can be archived, I think it is not removed immediately once archiving succeeds, but only at the next checkpoint, which will never happen while the locks are still held.
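
To make the retry idea concrete, here is a rough standalone sketch in C. It is not backend code and does not call XLogFileInit: retry_on_enospc, create_segment_fn, struct fake_disk and fake_create are all hypothetical names made up for illustration, and the attempt count and sleep interval are arbitrary parameters. It only shows the shape of the loop: retry while the failure is ENOSPC, give up on any other error, and stop after a bounded number of attempts.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#endif

#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Hypothetical callback standing in for "create the next WAL segment".
 * Returns 0 on success, -1 with errno set on failure.
 */
typedef int (*create_segment_fn)(void *arg);

/*
 * Call create() until it succeeds, fails with something other than
 * ENOSPC, or max_attempts is exhausted. Sleeps sleep_ms milliseconds
 * between attempts (no sleep if sleep_ms <= 0).
 * Returns 0 on success, -1 on failure with errno describing the cause.
 */
int
retry_on_enospc(create_segment_fn create, void *arg,
                int max_attempts, long sleep_ms)
{
    assert(create != NULL);

    for (int attempt = 1; attempt <= max_attempts; attempt++)
    {
        if (create(arg) == 0)
            return 0;           /* space became available */

        if (errno != ENOSPC)
            return -1;          /* not a space problem: do not retry */

        if (attempt < max_attempts && sleep_ms > 0)
        {
            struct timespec ts;

            ts.tv_sec = sleep_ms / 1000;
            ts.tv_nsec = (sleep_ms % 1000) * 1000000L;
            nanosleep(&ts, NULL);   /* give the archiver time to work */
        }
    }

    errno = ENOSPC;             /* still out of space after all attempts */
    return -1;
}

/*
 * Hypothetical stand-in for a disk that is full for the first
 * failures_left calls and has room afterwards.
 */
struct fake_disk
{
    int failures_left;
    int calls;
};

int
fake_create(void *arg)
{
    struct fake_disk *disk = arg;

    disk->calls++;
    if (disk->failures_left > 0)
    {
        disk->failures_left--;
        errno = ENOSPC;
        return -1;
    }
    return 0;
}
```

Note that this loop only helps if something actually frees space while we wait. Per the problem described above, archiving a segment does not by itself remove it, so in the real backend the loop could well spin until the limit and end in the same PANIC.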

Cheers,

Jeff
