Re: pgsql: Avoid duplicate XIDs at recovery when building initialsnapshot - Mailing list pgsql-hackers

From Andres Freund
Subject Re: pgsql: Avoid duplicate XIDs at recovery when building initialsnapshot
Date
Msg-id 20181022174155.wxxu2qt3bcrbrqy5@alap3.anarazel.de
Whole thread Raw
In response to Re: pgsql: Avoid duplicate XIDs at recovery when building initialsnapshot  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: pgsql: Avoid duplicate XIDs at recovery when building initialsnapshot  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Hi,

On 2018-10-22 12:36:25 -0300, Alvaro Herrera wrote:
> On 2018-Oct-14, Andres Freund wrote:
> 
> > On 2018-10-14 13:26:24 +0000, Michael Paquier wrote:
> > > Avoid duplicate XIDs at recovery when building initial snapshot
> 
> > I'm unhappy this approach was taken over objections. Without a real
> > warning.   Even leaving the crummyness aside, did you check other users
> > of XLOG_RUNNING_XACTS, e.g. logical decoding?
> 
> Mumble.  Is there a real problem here -- I mean, did this break logical
> decoding?  Maybe we need some more tests to ensure this stuff works
> sanely for logical decoding.

Hm? My point is that this fix just puts a band-aid onto *one* of the
places that read a XLOG_RUNNING_XACTS. Which still leaves the contents
of WAL record corrupted. There's not even a note at the WAL-record's
definition or its logging denoting that the contents are not what you'd
expect.  I don't mean that the fix would break logical decoding, but
that it's possible that an equivalent of the problem affecting hot
standby also affects logical decoding. And even leaving those two users
aside, it's possible that there will be further vulernable internal
users or extensions parsing the WAL.


> FWIW and not directly related, I recently became aware (because of a
> customer question) that txid_current_snapshot() is oblivious of
> XLOG_RUNNING_XACTS in a standby.  AFAICS it's not a really serious
> concern, but it was surprising.

That's more fundamental than just XLOG_RUNNING_XACTS though, no? The
whole way running transactions (i.e. including those that are just
detected by looking at their xid) are handled in the known xids struct
and in snapshots seems incompatible with that, no?

Greetings,

Andres Freund


pgsql-hackers by date:

Previous
From: legrand legrand
Date:
Subject: Multiple Wait Events for extensions
Next
From: Andrey Lepikhov
Date:
Subject: Re: [PATCH] XLogReadRecord returns pointer to currently read page