Re: Change pg_last_xlog_receive_location not to move backwards - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Change pg_last_xlog_receive_location not to move backwards
Date
Msg-id AANLkTin2UJax+L8z9jYSJ+KJM2xs3LivF77Q8zdpYmFK@mail.gmail.com
Whole thread Raw
In response to Re: Change pg_last_xlog_receive_location not to move backwards  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Change pg_last_xlog_receive_location not to move backwards  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Mon, Jan 24, 2011 at 11:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jan 13, 2011 at 2:40 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Thu, Jan 13, 2011 at 4:24 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> So I'm thinking to change pg_last_xlog_receive_location not to
>>> move backwards.
>>
>> The attached patch does that.
>
> It looks to me like this is changing more than just the return value
> of pg_last_xlog_receive_location.  receivedUpto isn't only used for
> that one function, and there's no explanation in your email or in the
> patch of why the new behavior is correct and/or better for the other
> places where it's used.

I believe the new walrcv->receiveStart was introduced to divide up those
behaviors that should go backwards from those that should not.

The non-retreating value is used in 3 places (via GetWalRcvWriteRecPtr)
in xlog.c.  (And some other places I haven't examined yet.)

One place is to decide whether to remove/recycle XLog files, and I think the
non-retreating value is at least as suitable for this usage.

One is to feed the pg_last_xlog_receive_location() function.

The third seems more problematic.  In the XLogPageRead,
it checks to see if more records have been received beyond what
has been applied.  By using the non-retreating value here, it seems
like the xlog replay could start replaying records that the wal
receiver is in the process of overwriting.  Now, I've argued to myself
that this is not a problem, because the receiver is overwriting them
with identical data to that which is already there.

But by that logic, why does any part of it (walrcv->receiveStart in
the patch, walrcv->receivedUpto unpatched) need to retreat?  From
the previous discussion, I understand that the concern is that we don't
want to retrieve and write out partial xlog files.  What I don't understand
is how we could get our selves into the position in which we are doing
that, other than by someone going in and doing impermissible things to
the PGDATA directory behind our backs.

I've been spinning my wheels on that part for a while.  Since I don't understand
what we are defending against in the original code, I can't evaluate
if the patch
maintains that defense.

Cheers,

Jeff


pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: SSI patch version 14
Next
From: Jeff Davis
Date:
Subject: Re: SSI patch version 14