Re: Hard limit on WAL space used (because PANIC sucks) - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Hard limit on WAL space used (because PANIC sucks)
Date
Msg-id CA+U5nMKaZGHofGY6O=ZUvz_V+n=ooh3CmO4cQhy=2dKrcuNiRg@mail.gmail.com
In response to Re: Hard limit on WAL space used (because PANIC sucks)  (Jim Nasby <jim@nasby.net>)
Responses Re: Hard limit on WAL space used (because PANIC sucks)  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On 23 January 2014 01:19, Jim Nasby <jim@nasby.net> wrote:
> On 1/21/14, 6:46 PM, Andres Freund wrote:
>>
>> On 2014-01-21 16:34:45 -0800, Peter Geoghegan wrote:
>>> On Tue, Jan 21, 2014 at 3:43 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>>>> I personally think this isn't worth complicating the code for.
>>>
>>> You're probably right. However, I don't see why the bar has to be very
>>> high when we're considering the trade-off between taking some
>>> emergency precaution against having a PANIC shutdown, and an assured
>>> PANIC shutdown
>>
>> Well, the problem is that the tradeoff would very likely include making
>> already complex code even more complex. None of the proposals, even the
>> one just decreasing the likelihood of a PANIC, look like they'd end up
>> being simple implementation-wise.
>> And that additional complexity would hurt robustness and prevent things
>> I find much more important than this.
>
>
> If we're not looking for perfection, what's wrong with Peter's idea of a
> ballast file? Presumably the check to see if that file still exists would be
> cheap so we can do that before entering the appropriate critical section.
>
> There's still a small chance that we'd end up panicking, but it's better than
> today. I'd argue that even if it doesn't work for CoW filesystems it'd still
> be a win.

I grant that it does sound simple enough for a partial stop gap.

My concern is that it provides only a short delay before the eventual
disk-full situation, which it doesn't actually prevent.
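
For readers following along, the ballast-file idea under discussion can be
sketched roughly as follows. This is only an illustration in Python, not
PostgreSQL code: the function names, the file path, and the 1 MiB size are
all hypothetical, and nothing like this exists in the server. It shows a
file being preallocated, a cheap existence check of the kind Jim describes,
and deletion of the file to hand its space back.

```python
import os

BALLAST_SIZE = 1024 * 1024  # hypothetical reserve size (1 MiB)


def create_ballast(path, size=BALLAST_SIZE):
    """Reserve `size` bytes by writing real data rather than a sparse file.

    Assumption: the filesystem actually allocates the blocks. On CoW or
    compressing filesystems this may not hold, as noted in the thread.
    """
    with open(path, "wb") as f:
        f.write(b"\0" * size)
        f.flush()
        os.fsync(f.fileno())


def ballast_present(path):
    """Cheap check meant to run before entering a critical section."""
    return os.path.exists(path)


def release_ballast(path):
    """Delete the ballast file and return the number of bytes released.

    Returns 0 if the ballast has already been used up.
    """
    try:
        size = os.path.getsize(path)
        os.remove(path)
        return size
    except FileNotFoundError:
        return 0
```

Note that releasing the ballast returns a fixed amount of space only once.
If WAL keeps accumulating, the disk fills again, which is the concern
raised above.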

IMHO the main issue now is how we clear down old WAL files. We need to
perform a checkpoint to do that - and as has been pointed out in
relation to my proposal, we cannot complete that because of locks that
will be held for some time when we do eventually lock up.

That issue is not solved by having one or more ballast files.

IMHO we need to resolve the deadlock inherent in the
disk-full/WAL lock-up/checkpoint situation. My view is that it can be
solved in a similar way to the way the buffer pin deadlock was
resolved for Hot Standby.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services