Thread: HS and clog

HS and clog

From
Andres Freund
Date:
Hi,

During the investigation of
http://archives.postgresql.org/message-id/CAL_0b1t%3DWuM6roO8dki%3Dw8DhH8P8whhohbPjReymmQUrOcNT2A%40mail.gmail.com
I noticed that during HS we do the following in RecordKnownAssignedTransactionIds:if (TransactionIdFollows(xid,
latestObservedXid)){   TransactionId next_expected_xid;
 
    /*     * Extend clog and subtrans like we do in GetNewTransactionId() during     * normal operation using
individualextend steps. Typical case     * requires almost no activity.     */    next_expected_xid =
latestObservedXid;   TransactionIdAdvance(next_expected_xid);    while
(TransactionIdPrecedesOrEquals(next_expected_xid,xid))    {        ExtendCLOG(next_expected_xid);
ExtendSUBTRANS(next_expected_xid);
        TransactionIdAdvance(next_expected_xid);    }

Extending subtrans is fine, that's required since its truncated after
restart and because its not really WAL logged, but extending CLOG? Thats
strange, isn't it, since clog is actually WAL logged, so all required
pages will be zeroed from their wal records anyway.
The commit introducing HS changed ExtendCLOG to do/* Zero the page and make an XLOG entry about it
*/ZeroCLOGPage(pageno,!InRecovery);
 
to make that even work during recovery.

Imo this shouldn't be needed.

Simon, do you remember why you added that?  It makes me uneasy doing
something like that only during HS but not during normal crash
recovery/disabled HS.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: HS and clog

From
Simon Riggs
Date:
On 30 March 2013 18:20, Andres Freund <andres@2ndquadrant.com> wrote:

> Simon, do you remember why you added that?

That doesn't look like code I added, but I'll look some more.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: HS and clog

From
Simon Riggs
Date:
On 30 March 2013 18:20, Andres Freund <andres@2ndquadrant.com> wrote:

> Imo this shouldn't be needed.

In principle, I think your premise looks correct. ExtendCLOG() is not needed.

If the xid truly is "known assigned" at a point in time, then the clog
should already have been extended to allow that.

I can't recall any special case there, but there may be one.

Given that it seems to work, changing it requires more care...

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: HS and clog

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 30 March 2013 18:20, Andres Freund <andres@2ndquadrant.com> wrote:
>> Imo this shouldn't be needed.

> In principle, I think your premise looks correct. ExtendCLOG() is not needed.

> If the xid truly is "known assigned" at a point in time, then the clog
> should already have been extended to allow that.

IIRC, the slru/clog code assumes, or at least did at one time assume,
that requests to initialize new pages are consecutive after startup.
Which is OK in normal operation since we don't skip over assignment
of any XIDs.

What I thought Andres was worried about here was that there might be
a gap in the extension requests when doing HS replay, and that maybe
that led to failure to create files or portions of files that later
the code would try to reference.

But maybe I misunderstood.
        regards, tom lane



Re: HS and clog

From
Andres Freund
Date:
On 2013-03-30 18:24:44 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On 30 March 2013 18:20, Andres Freund <andres@2ndquadrant.com> wrote:
> >> Imo this shouldn't be needed.
> 
> > In principle, I think your premise looks correct. ExtendCLOG() is not needed.
> 
> > If the xid truly is "known assigned" at a point in time, then the clog
> > should already have been extended to allow that.

I think thats pretty much guaranteed since xids should only be generated by
GetNewTransactionId() which calls ExtendCLOG() which emits CLOG_ZEROPAGE. So
any failure here would lead to broken crash recovery.

What I am worried about is mostly that it could lead to

a) errors with overwriting existing clog entries if the known assigned xids  machinery lost track somehow. Check for
examplethe bug that  triggered me looking at the code referenced upthread.
 
b) future coding errors which only work with either HS enabled/disabled since  the behaviour is now different between
both.

> IIRC, the slru/clog code assumes, or at least did at one time assume,
> that requests to initialize new pages are consecutive after startup.
> Which is OK in normal operation since we don't skip over assignment
> of any XIDs.

It at least expects that ExtendClog() gets called for every xid since it
only zeroes it for the first xid on a page.

> What I thought Andres was worried about here was that there might be
> a gap in the extension requests when doing HS replay, and that maybe
> that led to failure to create files or portions of files that later
> the code would try to reference.

No, I am not particularly worried about that since I don't think the HS
specific call to ExtendCLOG() can happen before a CLOG_ZEROPAGE record
has been replayed, so I can't see how it ever can do something
necessary. Its more that I am worried that it could overwrite stuff in
edge-cases.

I am pretty sure that the scenario you describe happens for SUBTRANS
tho. Leading to the reported bug.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: HS and clog

From
Andres Freund
Date:
On 2013-03-30 23:58:26 +0100, Andres Freund wrote:
> I am pretty sure that the scenario you describe happens for SUBTRANS
> tho. Leading to the reported bug.

For a slightly too long explanation see:
http://archives.postgresql.org/message-id/20130330172144.GI28736%40alap2.anarazel.de

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services