Re: Hard limit on WAL space used (because PANIC sucks) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Hard limit on WAL space used (because PANIC sucks)
Msg-id 20140122003449.GE32729@awork2.anarazel.de
In response to Re: Hard limit on WAL space used (because PANIC sucks)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Hard limit on WAL space used (because PANIC sucks)
List pgsql-hackers
On 2014-01-21 19:23:57 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-01-21 18:59:13 -0500, Tom Lane wrote:
> >> Another thing to think about is whether we couldn't put a hard limit on
> >> WAL record size somehow.  Multi-megabyte WAL records are an abuse of the
> >> design anyway, when you get right down to it.  So for example maybe we
> >> could split up commit records, with most of the bulky information dumped
> >> into separate records that appear before the "real commit".  This would
> >> complicate replay --- in particular, if we abort the transaction after
> >> writing a few such records, how does the replayer realize that it can
> >> forget about those records?  But that sounds probably surmountable.
> 
> > I think removing the list of subtransactions from commit records would
> > essentially require not truncating pg_subtrans after a restart
> > anymore.
> 
> I'm not suggesting that we stop providing that information!  I'm just
> saying that we perhaps don't need to store it all in one WAL record,
> if instead we put the onus on WAL replay to be able to reconstruct what
> it needs from a series of WAL records.
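To make the "hard limit on record size" idea concrete, here is a minimal C sketch that splits a long subtransaction-XID list into bounded-size subsidiary records, written before a small final commit record. Everything in it is hypothetical: `ChunkRecord`, `MAX_XIDS_PER_RECORD` and `emit_commit_chunks` are illustrative names, not PostgreSQL's actual WAL format or API, and the cap of 4 XIDs is chosen only to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical cap on XIDs per subsidiary record; this bounds record size. */
#define MAX_XIDS_PER_RECORD 4

/* Hypothetical subsidiary record: one bounded slice of the subxact list. */
typedef struct ChunkRecord
{
    TransactionId top_xid;      /* top-level transaction this chunk belongs to */
    size_t        nxids;        /* number of valid entries in xids[] */
    TransactionId xids[MAX_XIDS_PER_RECORD];
} ChunkRecord;

/*
 * Split subxids[0..nsubxids) into bounded chunks, stored in out[] in order.
 * The caller would then emit a small "real" commit record after them.
 * Returns the number of chunk records produced (at most max_out).
 */
static size_t
emit_commit_chunks(TransactionId top_xid,
                   const TransactionId *subxids, size_t nsubxids,
                   ChunkRecord *out, size_t max_out)
{
    size_t nrec = 0;

    for (size_t i = 0; i < nsubxids; i += MAX_XIDS_PER_RECORD)
    {
        size_t n = nsubxids - i;

        if (n > MAX_XIDS_PER_RECORD)
            n = MAX_XIDS_PER_RECORD;
        assert(nrec < max_out);     /* caller must size out[] adequately */

        out[nrec].top_xid = top_xid;
        out[nrec].nxids = n;
        for (size_t j = 0; j < n; j++)
            out[nrec].xids[j] = subxids[i + j];
        nrec++;
    }
    return nrec;
}
```

With ten subtransactions and a cap of four, this yields three records holding 4, 4 and 2 XIDs, so no single record grows with the transaction's size.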

That'd likely require something similar to the incomplete actions used
in btrees (and until recently in more places). I think that is/was a
disaster I really don't want to extend.

> > We could relatively easily split off logging the dropped files from
> > commit records and log them in groups afterwards; we already have
> > several races that can leak files.
> 
> I was thinking the other way around: emit the subsidiary records before the
> atomic commit or abort record, indeed before we've actually committed.
> Part of the point is to reduce the risk that lack of WAL space would
> prevent us from fully committing.
> Replay would then involve either accumulating the subsidiary records in
> memory, or being willing to go back and re-read them when the real commit
> or abort record is seen.
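The "accumulate in memory" variant of replay can be sketched as follows, assuming chunk records shaped like a top-level XID plus a slice of subtransaction XIDs. This is a hypothetical illustration, not PostgreSQL code: `PendingXact`, `replay_chunk`, `replay_commit`, `replay_abort`, `replay_end_of_recovery` and the fixed `MAX_PENDING` table are all invented names. It answers the "how does the replayer forget those records" question by dropping pending state on abort, or at end of recovery when no commit or abort record ever appeared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t TransactionId;

/* Hypothetical replay-side state: subxids seen so far for one top-level xact. */
typedef struct PendingXact
{
    TransactionId  top_xid;
    TransactionId *subxids;     /* grown with realloc as chunks arrive */
    size_t         nsubxids;
    bool           in_use;
} PendingXact;

#define MAX_PENDING 8           /* hypothetical fixed table size */

static PendingXact pending[MAX_PENDING];

/* Find the entry for xid; optionally claim a free slot if none exists. */
static PendingXact *
lookup(TransactionId xid, bool create)
{
    PendingXact *free_slot = NULL;

    for (int i = 0; i < MAX_PENDING; i++)
    {
        if (pending[i].in_use && pending[i].top_xid == xid)
            return &pending[i];
        if (!pending[i].in_use && free_slot == NULL)
            free_slot = &pending[i];
    }
    if (!create)
        return NULL;
    assert(free_slot != NULL);
    free_slot->in_use = true;
    free_slot->top_xid = xid;
    free_slot->subxids = NULL;
    free_slot->nsubxids = 0;
    return free_slot;
}

/* Replay a subsidiary record: only remember its contents. */
static void
replay_chunk(TransactionId top_xid, const TransactionId *xids, size_t n)
{
    PendingXact *p = lookup(top_xid, true);
    TransactionId *grown;

    if (n == 0)
        return;
    grown = realloc(p->subxids, (p->nsubxids + n) * sizeof(TransactionId));
    assert(grown != NULL);
    memcpy(grown + p->nsubxids, xids, n * sizeof(TransactionId));
    p->subxids = grown;
    p->nsubxids += n;
}

/* Drop accumulated state for one transaction. */
static void
forget(PendingXact *p)
{
    free(p->subxids);
    p->subxids = NULL;
    p->nsubxids = 0;
    p->in_use = false;
}

/* Replay the real commit record; returns how many subxids it covers. */
static size_t
replay_commit(TransactionId top_xid)
{
    PendingXact *p = lookup(top_xid, false);
    size_t n = 0;

    if (p != NULL)
    {
        n = p->nsubxids;    /* a real implementation would mark them committed here */
        forget(p);
    }
    return n;
}

/* Replay an abort record: the earlier chunks are simply forgotten. */
static void
replay_abort(TransactionId top_xid)
{
    PendingXact *p = lookup(top_xid, false);

    if (p != NULL)
        forget(p);
}

/* At end of recovery, xacts with no commit/abort record never committed. */
static void
replay_end_of_recovery(void)
{
    for (int i = 0; i < MAX_PENDING; i++)
        if (pending[i].in_use)
            forget(&pending[i]);
}
```

The pending table here is exactly the kind of replay-side bookkeeping Andres compares to the btree incomplete-actions handling above; the alternative Tom mentions, re-reading the chunks once the final record is seen, would trade this memory for extra WAL reads.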

Well, the reason I suggested doing it the other way round is that we
wouldn't need to reassemble anything (apart from cache invalidations,
which I don't know how to handle that way). I think that is a
significant increase in robustness and decrease in complexity.

> Also, writing those records afterwards
> increases the risk of a post-commit failure, which is a bad thing.

Well, most of those could be done outside of a critical section,
possibly just FATALing out. Beats PANICing.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services