Thread: pg13: xlogreader API adjust

pg13: xlogreader API adjust

From
Alvaro Herrera
Date:
Hello

Per discussion in thread [1], I propose the following patch to give
another adjustment to the xlogreader API.  This results in a small but
not insignificat net reduction of lines of code.  What this patch does
is adjust the signature of these new xlogreader callbacks, making the
API simpler.  The changes are:

* the segment_open callback installs the FD in xlogreader state itself,
  instead of passing the FD back.  This was suggested by Kyotaro
  Horiguchi in that thread[2].

* We no longer pass segcxt to segment_open; it's in XLogReaderState, 
  which is already an argument.

* We no longer pass seg/segcxt to WALRead; instead, that function takes
  them from XLogReaderState, which is already an argument.
  (This means XLogSendPhysical has to drink more of the fake_xlogreader
  kool-aid.)

I claim the reason to do it now instead of pg14 is to make it simpler
for third-party xlogreader callers to adjust.

(Some might be thinking that I do this to avoid an API change later, but
my guts tell me that we'll adjust xlogreader again in pg14 for the
encryption stuff and other reasons, so.)


[1] https://postgr.es/m/20200406025651.fpzdb5yyb7qyhqko@alap3.anarazel.de
[2] https://postgr.es/m/20200508.114228.963995144765118400.horikyota.ntt@gmail.com

-- 
Álvaro Herrera                         Developer, https://www.PostgreSQL.org/

Attachment

Re: pg13: xlogreader API adjust

From
Kyotaro Horiguchi
Date:
At Mon, 11 May 2020 16:33:36 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in 
> Hello
> 
> Per discussion in thread [1], I propose the following patch to give
> another adjustment to the xlogreader API.  This results in a small but
> not insignificat net reduction of lines of code.  What this patch does
> is adjust the signature of these new xlogreader callbacks, making the
> API simpler.  The changes are:
> 
> * the segment_open callback installs the FD in xlogreader state itself,
>   instead of passing the FD back.  This was suggested by Kyotaro
>   Horiguchi in that thread[2].
> 
> * We no longer pass segcxt to segment_open; it's in XLogReaderState, 
>   which is already an argument.
> 
> * We no longer pass seg/segcxt to WALRead; instead, that function takes
>   them from XLogReaderState, which is already an argument.
>   (This means XLogSendPhysical has to drink more of the fake_xlogreader
>   kool-aid.)
> 
> I claim the reason to do it now instead of pg14 is to make it simpler
> for third-party xlogreader callers to adjust.
> 
> (Some might be thinking that I do this to avoid an API change later, but
> my guts tell me that we'll adjust xlogreader again in pg14 for the
> encryption stuff and other reasons, so.)
> 
> 
> [1] https://postgr.es/m/20200406025651.fpzdb5yyb7qyhqko@alap3.anarazel.de
> [2] https://postgr.es/m/20200508.114228.963995144765118400.horikyota.ntt@gmail.com

The simplified interface of WALRead looks far better to me since it no
longer has unreasonable duplicates of parameters.  I agree to the
discussion about third-party xlogreader callers but not sure about
back-patching burden.

I'm not sure the reason for wal_segment_open and WalSndSegmentOpen
being modified different way about error handling of BasicOpenFile, I
prefer the WalSndSegmentOpen way.  However, that difference doesn't
harm anything so I'm fine with the current patch.


+    fake_xlogreader.seg = *sendSeg;
+    fake_xlogreader.segcxt = *sendCxt;

fake_xlogreader.seg is a different instance from *sendSeg.  WALRead
modifies fake_xlogreader.seg but does not modify *sendSeg. Thus the
change doesn't persist.  On the other hand WalSndSegmentOpen reads
*sendSeg, which is not under control of WALRead.

Maybe we had better to make fake_xlogreader be a global variable of
walsender.c that covers the current sendSeg and sendCxt.

regards.


-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg13: xlogreader API adjust

From
Alvaro Herrera
Date:
On 2020-May-12, Kyotaro Horiguchi wrote:

> I'm not sure the reason for wal_segment_open and WalSndSegmentOpen
> being modified different way about error handling of BasicOpenFile, I
> prefer the WalSndSegmentOpen way.  However, that difference doesn't
> harm anything so I'm fine with the current patch.

Yeah, I couldn't decide which style I liked the most.  I used the one
you suggested.

> +    fake_xlogreader.seg = *sendSeg;
> +    fake_xlogreader.segcxt = *sendCxt;
> 
> fake_xlogreader.seg is a different instance from *sendSeg.  WALRead
> modifies fake_xlogreader.seg but does not modify *sendSeg. Thus the
> change doesn't persist.  On the other hand WalSndSegmentOpen reads
> *sendSeg, which is not under control of WALRead.
> 
> Maybe we had better to make fake_xlogreader be a global variable of
> walsender.c that covers the current sendSeg and sendCxt.

I tried that.  I was about to leave it at just modifying physical
walsender (simple enough, and it passed tests), but I noticed that
WalSndErrorCleanup() would be a problem because we don't know if it's
physical or logical walsender.  So in the end I added a global
'xlogreader' pointer in walsender.c -- logical walsender sets it to the
true xlogreader it has inside the logical decoding context, and physical
walsender sets it to its fake xlogreader.  That seems to work nicely.
sendSeg/sendCxt are gone entirely.  Logical walsender was doing
WALOpenSegmentInit() uselessly during InitWalSender(), since it was
using the separate sendSeg/sendCxt structs instead of the ones in its
xlogreader.  (Some mysteries become clearer!)  

It's slightly disquieting that the segment_close call in
WalSndErrorCleanup is not covered, but in any case this should work well
AFAICS.  I think this is simpler to understand than formerly.

Now the only silliness remaining is the fact that different users of the
xlogreader interface are doing different things about the TLI.
Hopefully we can unify everything to something sensible one day .. but
that's not going to happen in pg13.

I'll get this pushed tomorrow, unless there are further objections.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: pg13: xlogreader API adjust

From
Alvaro Herrera
Date:
Pushed.  Thanks for the help!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg13: xlogreader API adjust

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Pushed.  Thanks for the help!

This seems to have fixed bowerbird.  Were you expecting that?

            regards, tom lane



Re: pg13: xlogreader API adjust

From
Alvaro Herrera
Date:
On 2020-May-13, Tom Lane wrote:

> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Pushed.  Thanks for the help!
> 
> This seems to have fixed bowerbird.  Were you expecting that?

Hm, not really.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg13: xlogreader API adjust

From
"Bossart, Nathan"
Date:
I think I've discovered a problem with 850196b6.  The following steps
can be used to trigger a segfault:

        # wal_level = logical
        psql postgres -c "create database testdb;"
        psql testdb -c "select pg_create_logical_replication_slot('slot', 'test_decoding');"
        psql "dbname=postgres replication=database" -c "START_REPLICATION SLOT slot LOGICAL 0/0;"

From a quick glance, I think the problem starts in
StartLogicalReplication() in walsender.c.  The call to
CreateDecodingContext() may ERROR before xlogreader is initialized in
the next line, so the subsequent call to WalSndErrorCleanup()
segfaults when it attempts to access xlogreader.

Nathan


Re: pg13: xlogreader API adjust

From
Kyotaro Horiguchi
Date:
At Thu, 14 May 2020 01:03:48 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in 
> I think I've discovered a problem with 850196b6.  The following steps
> can be used to trigger a segfault:
> 
>         # wal_level = logical
>         psql postgres -c "create database testdb;"
>         psql testdb -c "select pg_create_logical_replication_slot('slot', 'test_decoding');"
>         psql "dbname=postgres replication=database" -c "START_REPLICATION SLOT slot LOGICAL 0/0;"
> 
> From a quick glance, I think the problem starts in
> StartLogicalReplication() in walsender.c.  The call to
> CreateDecodingContext() may ERROR before xlogreader is initialized in
> the next line, so the subsequent call to WalSndErrorCleanup()
> segfaults when it attempts to access xlogreader.

Good catch!  That's not only for CreateDecodingContet. That happens
everywhere in the query loop in PostgresMain() until logreader is
initialized.  So that also happens, for example, by starting logical
replication using invalidated slot. Checking xlogreader != NULL in
WalSndErrorCleanup is sufficient.  It doesn't make actual difference,
but the attached explicitly initialize the pointer with NULL.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 5421f5b0e63abfe18b6f09d75c44697cdac699f3 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Thu, 14 May 2020 13:37:50 +0900
Subject: [PATCH] Don't check uninitialized xlog reader state

WanSndErrorCleanup may be called before initialization of the variable
xlogreader and crashes in that case.  Let the function not to examine
the xlogreader state if not yet initialized.
---
 src/backend/replication/walsender.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3367aa98f8..661fa12a94 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -136,7 +136,7 @@ bool        wake_wal_senders = false;
  * routines.
  */
 static XLogReaderState fake_xlogreader;
-static XLogReaderState *xlogreader;
+static XLogReaderState *xlogreader = NULL;
 
 /*
  * These variables keep track of the state of the timeline we're currently
@@ -315,7 +315,7 @@ WalSndErrorCleanup(void)
     ConditionVariableCancelSleep();
     pgstat_report_wait_end();
 
-    if (xlogreader->seg.ws_file >= 0)
+    if (xlogreader != NULL && xlogreader->seg.ws_file >= 0)
         wal_segment_close(xlogreader);
 
     if (MyReplicationSlot != NULL)
-- 
2.18.2


Re: pg13: xlogreader API adjust

From
Michael Paquier
Date:
On Thu, May 14, 2020 at 02:12:25PM +0900, Kyotaro Horiguchi wrote:
> Good catch!  That's not only for CreateDecodingContet. That happens
> everywhere in the query loop in PostgresMain() until logreader is
> initialized.  So that also happens, for example, by starting logical
> replication using invalidated slot. Checking xlogreader != NULL in
> WalSndErrorCleanup is sufficient.  It doesn't make actual difference,
> but the attached explicitly initialize the pointer with NULL.

Alvaro, are you planning to look at that?  Should we have an open item
for this matter?
--
Michael

Attachment

Re: pg13: xlogreader API adjust

From
Alvaro Herrera
Date:
On 2020-May-15, Michael Paquier wrote:

> On Thu, May 14, 2020 at 02:12:25PM +0900, Kyotaro Horiguchi wrote:
> > Good catch!  That's not only for CreateDecodingContet. That happens
> > everywhere in the query loop in PostgresMain() until logreader is
> > initialized.  So that also happens, for example, by starting logical
> > replication using invalidated slot. Checking xlogreader != NULL in
> > WalSndErrorCleanup is sufficient.  It doesn't make actual difference,
> > but the attached explicitly initialize the pointer with NULL.
> 
> Alvaro, are you planning to look at that?  Should we have an open item
> for this matter?

On it now.  I'm trying to add a test for this (needs a small change to
PostgresNode->psql), but I'm probably doing something stupid in the Perl
side, because it doesn't detect things as well as I'd like.  Still
trying, but I may be asked to evict the office soon ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: pg13: xlogreader API adjust

From
Kyotaro Horiguchi
Date:
At Fri, 15 May 2020 19:24:28 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in 
> On 2020-May-15, Michael Paquier wrote:
> 
> > On Thu, May 14, 2020 at 02:12:25PM +0900, Kyotaro Horiguchi wrote:
> > > Good catch!  That's not only for CreateDecodingContet. That happens
> > > everywhere in the query loop in PostgresMain() until logreader is
> > > initialized.  So that also happens, for example, by starting logical
> > > replication using invalidated slot. Checking xlogreader != NULL in
> > > WalSndErrorCleanup is sufficient.  It doesn't make actual difference,
> > > but the attached explicitly initialize the pointer with NULL.
> > 
> > Alvaro, are you planning to look at that?  Should we have an open item
> > for this matter?
> 
> On it now.  I'm trying to add a test for this (needs a small change to
> PostgresNode->psql), but I'm probably doing something stupid in the Perl
> side, because it doesn't detect things as well as I'd like.  Still
> trying, but I may be asked to evict the office soon ...

FWIW, and I'm not sure which of the mail and the commit 1d3743023e was
earlier, but I confirmed that the committed test in
006_logical_decoding.pl causes a crash, and the crash is fixed by the
change of code.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center