Thread: xlog.c: removing ReadRecPtr and EndRecPtr

xlog.c: removing ReadRecPtr and EndRecPtr

From
Robert Haas
Date:
I spent a lot of time trying to figure out why xlog.c has global
variables ReadRecPtr and EndRecPtr instead of just relying on the
eponymous structure members inside the XLogReaderState. I concluded
that the values are the same at most points in the code, and thus that
we could just use xlogreaderstate->{Read,End}RecPtr instead. There are
two places where this wouldn't produce the same results we're getting
today. Both of those appear to be bugs.

The reason why it's generally the case that ReadRecPtr ==
xlogreaderstate->ReadRecPtr and likewise for EndRecPtr is that
ReadRecord() calls XLogReadRecord(), which sets the values inside the
XLogReaderState, and then immediately assigns the values stored there
to the global variables. There's no other code that changes the other
global variables, and the only other code that changes the structure
members is XLogBeginRead(). So the values can be unequal from the time
XLogBeginRead() is called up until the time that the XLogReadRecord()
call inside ReadRecord() returns. In practice, StartupXLOG() always
calls ReadRecord() right after it calls XLogBeginRead(), and
ReadRecord() does not reference either global variable before calling
XLogReadRecord(), so the problem surface is limited to code that runs
underneath XLogReadRecord(). XLogReadRecord() is part of xlogreader.c,
but it uses a callback interface: the callback is XLogPageRead(),
which itself references EndRecPtr, and also calls
WaitForWALToBecomeAvailable(), which in turn calls
rescanLatestTimeLine(), which also references EndRecPtr. So these are
the two problem cases: XLogPageRead(), and rescanLatestTimeLine().

In rescanLatestTimeLine(), the problem is IMHO probably serious enough
to justify a separate commit with back-patching. The problem is that
EndRecPtr is being used here to reject impermissible attempts to
switch to a bad timeline, but if pg_wal starts out empty, EndRecPtr
will be 0 here, which causes the code to fail to detect a case that
should be prohibited. Consider the following test case:

- create a primary
- create standby #1 from the primary
- start standby #1 and promote it
- take a backup from the primary using -Xnone to create standby #2
- clear primary_conninfo on standby #2 and then start it
- copy 00000002.history from standby #1 to standby #2

You get:

2021-11-17 15:34:26.213 EST [7474] LOG:  selected new timeline ID: 2

But with the attached patch, you get:

2021-11-17 16:12:01.566 EST [20900] LOG:  new timeline 2 forked off
current database system timeline 1 before current recovery point
0/A000060

Had the problem occurred at some later point in the WAL stream rather
than before fetching the very first record, I think everything is
fine; at that point, I think that the global variable EndRecPtr will
be initialized. I'm not entirely sure that it contains exactly the
right value, but it's someplace in the right ballpark, at least.

In XLogPageRead(), the problem is just cosmetic. We're only using
EndRecPtr as an argument to emode_for_corrupt_record(), which is all
about suppressing duplicate complaints about the same LSN. But if the
xlogreader has been repositioned using XLogBeginRead() since the last
call to ReadRecord(), or if there are no preceding calls to
ReadRecord(), then the value of EndRecPtr here is left over from the
previous read position and is not particularly related to the record
we're reading now. xlogreader->EndRecPtr, OTOH, is. This doesn't seem
worth a separate commit to me, or a back-patch, but it seems worth
fixing while I'm cleaning up these global variables.

Review appreciated.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachment

Re: xlog.c: removing ReadRecPtr and EndRecPtr

From
Amul Sul
Date:
On Thu, Nov 18, 2021 at 3:31 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> I spent a lot of time trying to figure out why xlog.c has global
> variables ReadRecPtr and EndRecPtr instead of just relying on the
> eponymous structure members inside the XLogReaderState. I concluded
> that the values are the same at most points in the code, and thus that
> we could just use xlogreaderstate->{Read,End}RecPtr instead. There are
> two places where this wouldn't produce the same results we're getting
> today. Both of those appear to be bugs.
>
> The reason why it's generally the case that ReadRecPtr ==
> xlogreaderstate->ReadRecPtr and likewise for EndRecPtr is that
> ReadRecord() calls XLogReadRecord(), which sets the values inside the
> XLogReaderState, and then immediately assigns the values stored there
> to the global variables. There's no other code that changes the other
> global variables, and the only other code that changes the structure
> members is XLogBeginRead(). So the values can be unequal from the time
> XLogBeginRead() is called up until the time that the XLogReadRecord()
> call inside ReadRecord() returns. In practice, StartupXLOG() always
> calls ReadRecord() right after it calls XLogBeginRead(), and
> ReadRecord() does not reference either global variable before calling
> XLogReadRecord(), so the problem surface is limited to code that runs
> underneath XLogReadRecord(). XLogReadRecord() is part of xlogreader.c,
> but it uses a callback interface: the callback is XLogPageRead(),
> which itself references EndRecPtr, and also calls
> WaitForWALToBecomeAvailable(), which in turn calls
> rescanLatestTimeLine(), which also references EndRecPtr. So these are
> the two problem cases: XLogPageRead(), and rescanLatestTimeLine().
>
> In rescanLatestTimeLine(), the problem is IMHO probably serious enough
> to justify a separate commit with back-patching. The problem is that
> EndRecPtr is being used here to reject impermissible attempts to
> switch to a bad timeline, but if pg_wal starts out empty, EndRecPtr
> will be 0 here, which causes the code to fail to detect a case that
> should be prohibited. Consider the following test case:
>
> - create a primary
> - create standby #1 from the primary
> - start standby #1 and promote it
> - take a backup from the primary using -Xnone to create standby #2
> - clear primary_conninfo on standby #2 and then start it
> - copy 00000002.history from standby #1 to standby #2
>
> You get:
>
> 2021-11-17 15:34:26.213 EST [7474] LOG:  selected new timeline ID: 2
>
> But with the attached patch, you get:
>
> 2021-11-17 16:12:01.566 EST [20900] LOG:  new timeline 2 forked off
> current database system timeline 1 before current recovery point
> 0/A000060
>

Somehow with and without patch I am getting the same log.

> Had the problem occurred at some later point in the WAL stream rather
> than before fetching the very first record, I think everything is
> fine; at that point, I think that the global variable EndRecPtr will
> be initialized. I'm not entirely sure that it contains exactly the
> right value, but it's someplace in the right ballpark, at least.
>

Agree, change seems pretty much reasonable.

> In XLogPageRead(), the problem is just cosmetic. We're only using
> EndRecPtr as an argument to emode_for_corrupt_record(), which is all
> about suppressing duplicate complaints about the same LSN. But if the
> xlogreader has been repositioned using XLogBeginRead() since the last
> call to ReadRecord(), or if there are no preceding calls to
> ReadRecord(), then the value of EndRecPtr here is left over from the
> previous read position and is not particularly related to the record
> we're reading now. xlogreader->EndRecPtr, OTOH, is. This doesn't seem
> worth a separate commit to me, or a back-patch, but it seems worth
> fixing while I'm cleaning up these global variables.
>
LGTM.

Regards,
Amul



Re: xlog.c: removing ReadRecPtr and EndRecPtr

From
Robert Haas
Date:
On Thu, Nov 18, 2021 at 7:30 AM Amul Sul <sulamul@gmail.com> wrote:
> Somehow with and without patch I am getting the same log.

Try applying the attached 0001-dubious-test-cast.patch for you and see
if that fails. It does for me. If so, then try applying
0002-fix-the-bug.patch and see if that makes it pass.

Unfortunately, this test case isn't remotely committable as-is, and I
don't know how to make it so. The main problem is that, although you
can start up a server with nothing in pg_wal, no restore_command, and
no archive command, pg_ctl will not believe that it has started. I
worked around that problem by telling pg_ctl to ignore failures, but
it still waits for a long time before timing out, which sucks both
because (1) hackers are impatient and (2) some hackers run extremely
slow buildfarm machines where almost any timeout won't be long enough.
There's a second place where the patch needs to wait for something
also, and that one I've crudely kludged with sleep(10). If anybody
around here who is good at figuring out how to write clever TAP tests
can tell me how to fix this test to be non-stupid, I will happily do
so.

Otherwise, I think I will just need to commit and back-patch the
actual bug fix without a test, and then rebase the rest of the patch I
posted previously over top of those changes for master only.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachment

Re: xlog.c: removing ReadRecPtr and EndRecPtr

From
Alvaro Herrera
Date:
On 2021-Nov-18, Robert Haas wrote:

> Unfortunately, this test case isn't remotely committable as-is, and I
> don't know how to make it so. The main problem is that, although you
> can start up a server with nothing in pg_wal, no restore_command, and
> no archive command, pg_ctl will not believe that it has started.

Would it work to start postmaster directly instad of using pg_ctl, and
then rely on (say) pg_isready?

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"El que vive para el futuro es un iluso, y el que vive para el pasado,
un imbécil" (Luis Adler, "Los tripulantes de la noche")



Re: xlog.c: removing ReadRecPtr and EndRecPtr

From
Robert Haas
Date:
On Thu, Nov 18, 2021 at 2:21 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Would it work to start postmaster directly instad of using pg_ctl, and
> then rely on (say) pg_isready?

I *think* that pg_isready would also fail, because the documentation
says "pg_isready returns 0 to the shell if the server is accepting
connections normally, 1 if the server is rejecting connections (for
example during startup) ..." and I believe the "during startup" case
would apply here.

Starting postmaster directly is a thought. Is there any existing
precedent for that approach?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: xlog.c: removing ReadRecPtr and EndRecPtr

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> There's a second place where the patch needs to wait for something
> also, and that one I've crudely kludged with sleep(10). If anybody
> around here who is good at figuring out how to write clever TAP tests
> can tell me how to fix this test to be non-stupid, I will happily do
> so.

As far as that goes, if you conceptualize it as "wait for this text
to appear in the log file", there's prior art in existing TAP tests.
Basically, sleep for some reasonable short period and check the
log file; if not there, repeat until timeout.

I'm a little dubious that this test case is valuable enough to
mess around with a nonstandard postmaster startup protocol, though.
The main reason I dislike that idea is that any fixes we apply to
the TAP tests' normal postmaster-startup code would almost inevitably
miss fixing this test.  IIRC there have been security-related fixes in
that area (e.g. where do we put the postmaster's socket), so I find
that prospect pretty scary.

            regards, tom lane



Re: xlog.c: removing ReadRecPtr and EndRecPtr

From
Robert Haas
Date:
On Thu, Nov 18, 2021 at 3:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > There's a second place where the patch needs to wait for something
> > also, and that one I've crudely kludged with sleep(10). If anybody
> > around here who is good at figuring out how to write clever TAP tests
> > can tell me how to fix this test to be non-stupid, I will happily do
> > so.
>
> As far as that goes, if you conceptualize it as "wait for this text
> to appear in the log file", there's prior art in existing TAP tests.
> Basically, sleep for some reasonable short period and check the
> log file; if not there, repeat until timeout.

Yeah, there's something like that in the form of find_in_log in
019_replslot_init.pl. I thought about copying that, but that didn't
seem great, and I also thought about trying to move into a common
module, which seems maybe better but also more work, and thus not
worth doing unless we have agreement that it's what we should do.

> I'm a little dubious that this test case is valuable enough to
> mess around with a nonstandard postmaster startup protocol, though.
> The main reason I dislike that idea is that any fixes we apply to
> the TAP tests' normal postmaster-startup code would almost inevitably
> miss fixing this test.  IIRC there have been security-related fixes in
> that area (e.g. where do we put the postmaster's socket), so I find
> that prospect pretty scary.

The problem that I have with the present situation is that the test
coverage of xlog.c is pretty abysmal. It actually doesn't look bad if
you just run a coverage report, but there are a shazillion flag
variables in that file and bugs like this make it quite clear that we
don't come close to testing all the possible combinations. It's really
borderline unmaintainable. I don't know whether there's a specific
individual who wrote most of this code and didn't get the memo that
global variables are best avoided, or whether this is sort of case
where we started over 1 or 2 and then it just gradually ballooned into
the giant mess that it now is, but the present situation is pretty
outrageous. It's taking me weeks of time to figure out how to make
changes that would normally take days, or maybe hours. We clearly need
to try both to get the number of cases under control by eliminating
stupid variables that are almost but not quite the same as something
else, and also get proper test coverage for the things that remain so
that it's possible to modify the code without excesive risk of
shooting ourselves in the foot.

That said, I'm not wedded to this particular test case, either. It's
an extremely specific bug that is unlikely to reappear once squashed,
and making the test case robust enough to avoid having the buildfarm
complain seems fraught.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: xlog.c: removing ReadRecPtr and EndRecPtr

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Nov 18, 2021 at 3:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm a little dubious that this test case is valuable enough to
>> mess around with a nonstandard postmaster startup protocol, though.

> The problem that I have with the present situation is that the test
> coverage of xlog.c is pretty abysmal.

Agreed, but this one test case isn't going to move the needle much.
To get to reasonable coverage we're going to need more tests, and
I definitely don't want multiple versions of ad-hoc postmaster startup
code.  If we need that, it'd be smarter to extend Cluster.pm so that
the mainline code could do what's needful.

Having said that, it wasn't entirely clear to me why you needed a
weird startup method.  Why couldn't you drop the bogus history file
into place *before* starting the charlie postmaster?  If you have
to do it after, aren't there race/timing problems anyway?

            regards, tom lane



Re: xlog.c: removing ReadRecPtr and EndRecPtr

From
Robert Haas
Date:
On Thu, Nov 18, 2021 at 4:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Thu, Nov 18, 2021 at 3:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I'm a little dubious that this test case is valuable enough to
> >> mess around with a nonstandard postmaster startup protocol, though.
>
> > The problem that I have with the present situation is that the test
> > coverage of xlog.c is pretty abysmal.
>
> Agreed, but this one test case isn't going to move the needle much.
> To get to reasonable coverage we're going to need more tests, and
> I definitely don't want multiple versions of ad-hoc postmaster startup
> code.  If we need that, it'd be smarter to extend Cluster.pm so that
> the mainline code could do what's needful.

Perhaps so. I don't have a clear view on what a full set of good tests
would look like, so it's hard for me to guess which needs are general
and which are not.

> Having said that, it wasn't entirely clear to me why you needed a
> weird startup method.  Why couldn't you drop the bogus history file
> into place *before* starting the charlie postmaster?  If you have
> to do it after, aren't there race/timing problems anyway?

Well, I need rescanLatestTimeLine() to be called. I'm not sure that
will happen if the file is there originally -- that sounds like more
of a scan than a rescan, but I haven't poked at that angle.  I also
think it doesn't matter whether the file is dropped in or whether it
is restored via restore_command, so having the server restore the file
rather than discover that it is appeared might be another and more
satisfying option, but I also have not tested whether that reproduces
the issue. This has been extremely time-consuming to hunt down.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: xlog.c: removing ReadRecPtr and EndRecPtr

From
Alvaro Herrera
Date:
On 2021-Nov-18, Robert Haas wrote:

> On Thu, Nov 18, 2021 at 2:21 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Would it work to start postmaster directly instad of using pg_ctl, and
> > then rely on (say) pg_isready?
> 
> I *think* that pg_isready would also fail, because the documentation
> says "pg_isready returns 0 to the shell if the server is accepting
> connections normally, 1 if the server is rejecting connections (for
> example during startup) ..." and I believe the "during startup" case
> would apply here.

Hmm, right ... I suppose there are other ways to check, but I'm not sure
that the value of adding this particular test is large enough to justify
such hacks.

I think one possibly useful technique might be Alexander Korotkov's stop
events[1], except that it is designed around having working SQL access
to the server in order to control it.  You'd need some frosting on top
in order to control the startup sequence without SQL access.

> Starting postmaster directly is a thought. Is there any existing
> precedent for that approach?

Not as far as I can see.

[1] https://postgr.es/m/CAPpHfdtSEOHX8dSk9Qp+Z++i4BGQoffKip6JDWngEA+g7Z-XmQ@mail.gmail.com

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Entristecido, Wutra                     (canción de Las Barreras)
echa a Freyr a rodar
y a nosotros al mar"



Re: xlog.c: removing ReadRecPtr and EndRecPtr

From
Amul Sul
Date:
On Fri, Nov 19, 2021 at 12:43 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Nov 18, 2021 at 7:30 AM Amul Sul <sulamul@gmail.com> wrote:
> > Somehow with and without patch I am getting the same log.
>
> Try applying the attached 0001-dubious-test-cast.patch for you and see
> if that fails. It does for me. If so, then try applying
> 0002-fix-the-bug.patch and see if that makes it pass.
>

Thanks,  I can see the reported behavior -- 0001 alone fails & 0002
corrects that.

> Unfortunately, this test case isn't remotely committable as-is, and I
> don't know how to make it so. The main problem is that, although you
> can start up a server with nothing in pg_wal, no restore_command, and
> no archive command, pg_ctl will not believe that it has started. I
> worked around that problem by telling pg_ctl to ignore failures, but
> it still waits for a long time before timing out, which sucks both
> because (1) hackers are impatient and (2) some hackers run extremely
> slow buildfarm machines where almost any timeout won't be long enough.

Yeah :(

> There's a second place where the patch needs to wait for something
> also, and that one I've crudely kludged with sleep(10). If anybody
> around here who is good at figuring out how to write clever TAP tests
> can tell me how to fix this test to be non-stupid, I will happily do
> so.
>
> Otherwise, I think I will just need to commit and back-patch the
> actual bug fix without a test, and then rebase the rest of the patch I
> posted previously over top of those changes for master only.
>

+1.

Regards,
Amul



Re: xlog.c: removing ReadRecPtr and EndRecPtr

From
Robert Haas
Date:
On Wed, Nov 17, 2021 at 5:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
> In rescanLatestTimeLine(), the problem is IMHO probably serious enough
> to justify a separate commit with back-patching.

On second thought, I think it's better not to back-patch this fix. It
turns out that, while it's easy enough to back-patch the proposed
patch, it doesn't make the test case pass in pre-v14 versions. The
test case itself requires some changes to work at all because of the
perl module renaming, but that's not a big deal. The issue is that, in
the back-branches, when starting up the server without any local WAL,
rescanLatestTimeLine() is checking with not only the wrong LSN but
also with the wrong TLI. That got fixed in master by commit
4a92a1c3d1c361ffb031ed05bf65b801241d7cdd even though, rather
unfortunately, the commit message does not say so. So to back-patch
this commit we would need to also back-patch much of that commit. But
that commit depends on the other commits that reduced use of
ThisTimeLineID. Untangling that seems like an undue amount of work and
risk for a corner-case bug fix that was discovered in the lab rather
than in the field and which won't matter anyway if you do things
correctly. So now I'm intending to commit to just to master only.

Attached please find the test case not for commit as
v2-0001-dubious-test-case.patch; the fix, for commit and now with a
proper commit message as
v2-0002-Fix-corner-case-failure-to-detect-improper-timeli.patch; and
the back-ported test case for v14 as v14-0001-dubious-test-case.patch
in case anyone wants to play around with that. (with apologies for
using the idea of a version number in two different and conflicting
senses)

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachment

Re: xlog.c: removing ReadRecPtr and EndRecPtr

From
Robert Haas
Date:
On Tue, Nov 23, 2021 at 1:36 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Attached please find the test case not for commit as
> v2-0001-dubious-test-case.patch; the fix, for commit and now with a
> proper commit message as
> v2-0002-Fix-corner-case-failure-to-detect-improper-timeli.patch; and
> the back-ported test case for v14 as v14-0001-dubious-test-case.patch
> in case anyone wants to play around with that. (with apologies for
> using the idea of a version number in two different and conflicting
> senses)

OK, I have committed this patch, rebased the original combined patch
over top of that, and committed that too. By my count that means I've
now removed a total of 3 global variables from this file and Amul got
rid of one as well so that makes 4 in total. If we continue at this
brisk pace the code may become understandable and maintainable
sometime prior to the heat death of the universe, which I think would
be rather nice.

-- 
Robert Haas
EDB: http://www.enterprisedb.com