Home > mailing lists

Re: Fix for segfault in logical replication on master - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: Fix for segfault in logical replication on master
Date	June 18, 2021 03:48:58
Msg-id	CAA4eK1+qZO7n+JOJVLN4mBVPu6CUbuXo4NVZry4c+p2nzCNhWw@mail.gmail.com Whole thread
In response to	Re: Fix for segfault in logical replication on master (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses	Re: Fix for segfault in logical replication on master
List	pgsql-hackers

Tree view

On Thu, Jun 17, 2021 at 9:26 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
>
> > On Jun 17, 2021, at 6:40 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I think such a problem won't happen because we are using historic
> > snapshots in this context. We rely on that in a similar way in
> > reorderbuffer.c, see ReorderBufferProcessTXN.
>
> I think you are right, but that's the part I have trouble fully convincing myself is safe.  We certainly have an
historicsnapshot when we call RelationGetIndexList, but that has an early exit if the relation already has fields set,
andwe don't know if those fields were set before or after the historic snapshot was taken.  Within the context of the
pluggableinfrastructure, I think we're safe.  The only caller of RelationGetIdentityKeyBitmap() in core is
logicalrep_write_attrs(),which is only called by logicalrep_write_rel(), which is only called by
send_relation_and_attrs(),which is only called by maybe_send_schema(), which is called by pgoutput_change() and
pgoutput_truncate(),both being callbacks in core's logical replication plugin. 
>
> ReorderBufferProcessTXN calls SetupHistoricSnapshot before opening the relation and then calling
ReorderBufferApplyChangeto invoke the plugin on that opened relation, so the relation's fields could not have been
setupbefore the snapshot was taken.  Any other plugin would similarly get invoked after that same logic, so they'd be
fine,too.  The problem would only be if somebody called RelationGetIdentityKeyBitmap() or one of its calling functions
fromoutside that infrastructure.  Is that worth worrying about?  The function comments for those mention having an
historicsnapshot, and the Assert will catch if code doesn't have one, but I wonder how much of a trap for the unwary
thatis, considering that somebody might open the relation and lookup indexes for the relation before taking an historic
snapshotand calling these functions. 
>

I think in such a case the caller must call InvalidateSystemCaches
before setting up a historic snapshot, otherwise, there could be other
problems as well.

> I thought it was cheap enough to check that the relation we open is an index, because if it is not, we'll segfault
whenaccessing fields of the relation->rd_index struct.  I wouldn't necessarily advocate doing any really expensive
checkshere, but a quick sanity check seemed worth the effort. 
>

I am not telling you anything about the cost of these sanity checks. I
suggest you raise elog rather than return NULL because if this happens
there is definitely some problem and continuing won't be a good idea.

--
With Regards,
Amit Kapila.

pgsql-hackers by date:

From: Ajin Cherian
Date: 18 June 2021, 03:11:52
Subject: Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command

From: Noah Misch
Date: 18 June 2021, 03:49:31
Subject: Re: snapshot too old issues, first around wraparound and then more.

Re: Fix for segfault in logical replication on master - Mailing list pgsql-hackers

Previous

Next