Re: clog_redo causing very long recovery time - Mailing list pgsql-hackers

From Joe Conway
Subject Re: clog_redo causing very long recovery time
Msg-id 4DC36DD6.8030405@joeconway.com
In response to Re: clog_redo causing very long recovery time  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: clog_redo causing very long recovery time  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 05/05/2011 08:22 PM, Tom Lane wrote:
> Joseph Conway <mail@joeconway.com> writes:
>> The attached fix-clogredo diff is my proposal for a fix for this.
>
> That seems pretty grotty :-(
>
> I think a more elegant fix might be to just swap the order of the
> ExtendCLOG and ExtendSUBTRANS calls in GetNewTransactionId.  The
> reason that would help is that pg_subtrans isn't WAL-logged, so if
> we succeed doing ExtendSUBTRANS and then fail in ExtendCLOG, we
> won't have written any XLOG entry, and thus repeated failures will not
> result in repeated XLOG entries.  I seem to recall having considered
> exactly that point when the clog WAL support was first done, but the
> scenario evidently wasn't considered when subtransactions were stuck
> in :-(.

Yes, that does seem much nicer :-)

> It would probably also help to put in a comment admonishing people
> to not add stuff right there.  I see the SSI guys have fallen into
> the same trap.

Right -- I think another similar problem exists in GetNewMultiXactId
where ExtendMultiXactOffset could succeed and write an XLOG entry and
then ExtendMultiXactMember could fail before advancing nextMXact. The
problem in this case is that they both write XLOG entries, so a simple
reversal doesn't help.

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support

