Handling of concurrent aborts in logical decoding of in-progress xacts - Mailing list pgsql-hackers

From Amit Kapila
Subject Handling of concurrent aborts in logical decoding of in-progress xacts
Msg-id CAA4eK1KWSJKm_1kGBMmTttHyx_M_XJ_pj9_HRV8=91j_4dhA=Q@mail.gmail.com
List pgsql-hackers
One of the problems in allowing logical decoding of in-progress (or
prepared) transactions is that the transaction we are decoding can
abort concurrently without our detecting it, in which case we can get
the wrong version of a row from the system catalogs.  Decoding WAL
with wrong catalog information can then lead to unpredictable
behavior.

The problem occurs when we have a sequence of operations like
Alter-Abort-Alter, as discussed previously in the context of two-phase
transactions [1].  I am restating the problem here to ease the
discussion for in-progress transactions.

Suppose we create a table (mytable) and then, in some transaction (say
with txid-200), alter it and afterwards abort that txn.  pg_class will
then have something like this:

xmin | xmax | relname
100  | 200  | mytable
200  | 0    | mytable

After that abort, tuple (100,200,mytable) becomes visible to other
transactions, and if we alter the table again (say with txid-300), the
xmax of the first tuple will be set to the current xid, resulting in
the following table:

xmin | xmax | relname
100  | 300  | mytable
200  | 0    | mytable
300  | 0    | mytable

At that moment we’ve lost the information that the first tuple was
deleted by our txn (txid-200).  From the POV of the historic snapshot,
the first tuple (100, 300, mytable) becomes visible and is used to
decode the WAL, although the second tuple should have been used.
Moreover, such a snapshot could see both tuples (first and second),
violating oid uniqueness.
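
To make the scenario concrete, here is a small toy model of the two
pg_class listings above.  It is only a sketch: CatalogTuple and
visible_to_decoder are hypothetical names, and the visibility rule is a
deliberate simplification (a row version counts as created or deleted
if the xid is the one being decoded or is in a set of committed xids),
not the actual historic-snapshot check in PostgreSQL.

```python
from dataclasses import dataclass


@dataclass
class CatalogTuple:
    xmin: int      # xid that created this row version
    xmax: int      # xid that deleted it, 0 if none
    relname: str


def visible_to_decoder(tup, decoded_xid, committed):
    """Toy rule: is `tup` visible while decoding `decoded_xid`?

    A simplification for illustration, not PostgreSQL's real check.
    """
    created = tup.xmin == decoded_xid or tup.xmin in committed
    deleted = tup.xmax != 0 and (
        tup.xmax == decoded_xid or tup.xmax in committed
    )
    return created and not deleted


committed = {100}  # only the transaction that created the table

# pg_class after txid-200 altered the table (first listing).
after_first_alter = [
    CatalogTuple(100, 200, "mytable"),
    CatalogTuple(200, 0, "mytable"),
]

# pg_class after txid-200 aborted and txid-300 altered the table again.
after_second_alter = [
    CatalogTuple(100, 300, "mytable"),
    CatalogTuple(200, 0, "mytable"),
    CatalogTuple(300, 0, "mytable"),
]

seen_before = [t for t in after_first_alter
               if visible_to_decoder(t, 200, committed)]
seen_after = [t for t in after_second_alter
              if visible_to_decoder(t, 200, committed)]

print([(t.xmin, t.xmax) for t in seen_before])  # [(200, 0)]
print([(t.xmin, t.xmax) for t in seen_after])   # [(100, 300), (200, 0)]
```

In this model the decoder of txid-200 sees exactly one row version
before the second alter, but two afterwards, because the xmax that
recorded the deletion by txid-200 has been overwritten with 300.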

Now, we are planning to handle such a situation by returning the
ERRCODE_TRANSACTION_ROLLBACK sqlerrcode from the system table scan APIs
to the backend decoding a specific uncommitted transaction.  On
receipt of such an sqlerrcode, the decoding logic aborts the ongoing
decoding, discards the changes of the current (sub)xact, and continues
processing the changes of other txns.  The basic idea is that while
setting up the historic snapshot (SetupHistoricSnapshot), we remember
the xid corresponding to the ReorderBufferTXN (provided it's not
committed yet) whose changes we are going to decode.  Then, after
getting a tuple from a system table scan, we check whether the
remembered XID has aborted, and if so we return the specific error
ERRCODE_TRANSACTION_ROLLBACK, which lets the caller proceed gracefully
by discarding the changes of that transaction.  This is based on the
idea/patch [2][3] discussed between Andres Freund and Nikhil Sontakke
in the context of logical decoding of two-phase transactions.
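
The control flow described above can be sketched as follows.  This is
a toy Python model under stated assumptions, not the patch itself:
ConcurrentAbort stands in for an error carrying
ERRCODE_TRANSACTION_ROLLBACK, and aborted_xids, HistoricSnapshot,
catalog_scan and decode_all are hypothetical names invented for the
illustration.

```python
class ConcurrentAbort(Exception):
    """Stand-in for an error carrying ERRCODE_TRANSACTION_ROLLBACK."""


# Hypothetical stand-in for a transaction-status lookup.
aborted_xids = set()


class HistoricSnapshot:
    def __init__(self, xid, committed):
        # Remember the xid only if the transaction is not committed yet.
        self.checked_xid = None if committed else xid


def catalog_scan(snapshot, relname):
    """Pretend to fetch a catalog tuple, then check for a concurrent abort."""
    tup = {"relname": relname}
    xid = snapshot.checked_xid
    if xid is not None and xid in aborted_xids:
        raise ConcurrentAbort(xid)
    return tup


def decode_all(txns):
    """Decode each txn; drop the changes of any that aborted meanwhile."""
    decoded = {}
    for xid, changes in txns.items():
        snapshot = HistoricSnapshot(xid, committed=False)
        out = []
        try:
            for relname in changes:
                catalog_scan(snapshot, relname)
                out.append("change on " + relname)
        except ConcurrentAbort:
            continue  # discard partial output, move on to other txns
        decoded[xid] = out
    return decoded


def changes_of_200():
    yield "mytable"
    aborted_xids.add(200)  # the abort arrives while we are decoding
    yield "mytable"


result = decode_all({200: changes_of_200(), 201: ["other"]})
print(result)  # {201: ['change on other']}
```

The first change of txid-200 is decoded normally, but the scan for its
second change notices the abort, so everything collected for txid-200
is thrown away while txid-201 is still decoded.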

We need to deal with this problem to allow logical decoding of
in-progress transactions, which is being discussed in a separate
thread [4].  I previously tried to discuss this in the main thread [5]
but didn't get much response, so I am trying again.

Thoughts?

[1] - https://www.postgresql.org/message-id/EEBD82AA-61EE-46F4-845E-05B94168E8F2%40postgrespro.ru
[2] - https://www.postgresql.org/message-id/20180720145836.gxwhbftuoyx5h4gc%40alap3.anarazel.de
[3] - https://www.postgresql.org/message-id/CAMGcDxcBmN6jNeQkgWddfhX8HbSjQpW%3DUo70iBY3P_EPdp%2BLTQ%40mail.gmail.com
[4] - https://www.postgresql.org/message-id/688b0b7f-2f6c-d827-c27b-216a8e3ea700%402ndquadrant.com
[5] - https://www.postgresql.org/message-id/CAA4eK1KtxLpSp2rP6Rt8izQnPmhiA%3D2QUpLk%2BvoagTjKowc0HA%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


