Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical() at walsender.c:2762 - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical() at walsender.c:2762
Date
Msg-id 20200602.150527.1892561724587572107.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical ()at walsender.c:2762  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
At Tue, 2 Jun 2020 13:24:56 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Fri, May 29, 2020 at 06:09:06PM +0900, Masahiko Sawada wrote:
> > Yes. Conversely, if we start logical replication in a physical
> > replication connection (i.g. replication=true) we got an error before
> > staring replication:
> > 
> > ERROR:  logical decoding requires a database connection
> > 
> > I think we can prevent that SEGV in a similar way.
> 
> Still unconvinced as this restriction stands for logical decoding
> requiring a database connection but it is not necessarily true now as
> physical replication has less restrictions than a logical one.

If we deliberately allow physical replication on a
database-replication connection, we should revise the documentation
that way. On the other hand physical replication has wider access to a
database cluster than logical replication.  Thus allowing to start
physical replication on a logical replication connection could
introduce a problem related to privileges.  So I think it might be
better that physical and logical replication have separate pg_hba
lines.

Once we explicitly allow physical replication on a logical replication
connection in documentation, it would be far harder to change the
behavior than now.

If we are sure that that cannot be a problem, I don't object the
change in documented behavior.

> Looking at the code, I think that there is some confusion with the
> fake WAL reader used as base reference in InitWalSender() where we
> assume that it could only be used in the context of a non-database WAL
> sender.  However, this initialization happens when the WAL sender
> connection is initialized, and what I think this misses is that we 
> should try to initialize a WAL reader when actually going through a
> START_REPLICATION command.

At first fake_xlogreader was really a fake one that only provides
callback routines, but it should have been changed to a real
xlogreader at the time it began to store segment information. In that
sense moving to real xlogreader makes sense to me separately from
whether we allow physicalrep on logicalrep connections.

> I can note as well that StartLogicalReplication() moves in this sense
> by setting xlogreader to be the one from logical_decoding_ctx once the
> decoding context has been created.
> 
> This results in the attached.  The extra test from upthread to check
> that logical decoding is not allowed in a non-database WAL sender is a
> good idea, so I have kept it.

+        ereport(ERROR,
+                (errcode(ERRCODE_OUT_OF_MEMORY),
+                 errmsg("out of memory")));

The same error message is accompanied by a DETAILS in some other
places. Don't we need one for this?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Small doc improvement about spilled txn tracking
Next
From: Amit Kapila
Date:
Subject: Re: Small doc improvement about spilled txn tracking