pgsql: Fix catalog lookup with the wrong snapshot during logical decodi - Mailing list pgsql-committers

From Amit Kapila
Subject pgsql: Fix catalog lookup with the wrong snapshot during logical decodi
Msg-id E1oM0BC-000EF8-Eg@gemulon.postgresql.org
List pgsql-committers
Fix catalog lookup with the wrong snapshot during logical decoding.

Previously, we relied on HEAP2_NEW_CID records and XACT_INVALIDATION
records to know if a transaction has modified the catalog, and that
information is not serialized to the snapshot. Therefore, after a restart,
if logical decoding decodes only the commit record of a transaction
that has actually modified a catalog, we will miss adding its XID to the
snapshot. Thus, we will end up looking at catalogs with the wrong
snapshot.

To fix this problem, this changes the snapshot builder so that it
remembers the last-running-xacts list of the decoded RUNNING_XACTS record
after restoring the previously serialized snapshot. Then, we mark the
transaction as containing catalog changes if it's in the list of initial
running transactions and its commit record has XACT_XINFO_HAS_INVALS. To
avoid ABI breakage, we store the array of the initial running transactions
in the static variables InitialRunningXacts and NInitialRunningXacts,
instead of storing those in SnapBuild or ReorderBuffer.
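
The check described above can be sketched roughly as follows. This is a
standalone illustration, not the code from the commit: the sketch_*
function names and the SKETCH_XINFO_HAS_INVALS constant are hypothetical,
the flag value is assumed for illustration, and plain integer comparison
stands in for PostgreSQL's wraparound-aware XID comparison. Only the names
InitialRunningXacts and NInitialRunningXacts come from the message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t TransactionId;

/* Assumed flag bit for this sketch; the real XACT_XINFO_HAS_INVALS
 * is defined in PostgreSQL's headers. */
#define SKETCH_XINFO_HAS_INVALS (1U << 3)

/* Static storage, mirroring the message's choice to avoid changing
 * SnapBuild or ReorderBuffer on back branches. */
static TransactionId *InitialRunningXacts = NULL;
static size_t NInitialRunningXacts = 0;

/* Simplification: ignores XID wraparound. */
static int
sketch_xid_cmp(const void *a, const void *b)
{
    TransactionId x = *(const TransactionId *) a;
    TransactionId y = *(const TransactionId *) b;

    return (x > y) - (x < y);
}

/* Hypothetical helper: remember the running-xacts list seen after
 * restoring a serialized snapshot, sorted for binary search. */
static void
sketch_remember_initial_running(const TransactionId *xids, size_t n)
{
    free(InitialRunningXacts);
    InitialRunningXacts = NULL;
    NInitialRunningXacts = 0;

    if (n == 0)
        return;

    InitialRunningXacts = malloc(n * sizeof(TransactionId));
    if (InitialRunningXacts == NULL)
        abort();
    memcpy(InitialRunningXacts, xids, n * sizeof(TransactionId));
    qsort(InitialRunningXacts, n, sizeof(TransactionId), sketch_xid_cmp);
    NInitialRunningXacts = n;
}

/* Hypothetical check applied at commit decoding: treat the transaction
 * as containing catalog changes if it was initially running and its
 * commit record carries invalidation messages. */
static bool
sketch_xid_has_catalog_changes(TransactionId xid, uint32_t xinfo)
{
    if (NInitialRunningXacts == 0)
        return false;
    if ((xinfo & SKETCH_XINFO_HAS_INVALS) == 0)
        return false;

    return bsearch(&xid, InitialRunningXacts, NInitialRunningXacts,
                   sizeof(TransactionId), sketch_xid_cmp) != NULL;
}
```

In this sketch, a transaction that appears in the remembered list but whose
commit record has no invalidations is not marked, and neither is a
transaction with invalidations that was not in the initial running list.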

This approach can produce false positives; we could end up adding a
transaction that didn't change the catalog to the snapshot, since we cannot
tell whether the transaction has catalog changes only by checking the COMMIT
record. It doesn't have the information on which (sub) transaction has
catalog changes, and XACT_XINFO_HAS_INVALS doesn't necessarily indicate
that the transaction has catalog changes. But that won't be a problem since
we use the snapshot built during decoding only to read system catalogs.

On the master branch, we took a more future-proof approach by writing
catalog-modifying transactions to the serialized snapshot, which avoids the
above false positives. But we cannot backpatch it because it requires a
change to SnapBuild.

Reported-by: Mike Oh
Author: Masahiko Sawada
Reviewed-by: Amit Kapila, Shi yu, Takamichi Osumi, Kyotaro Horiguchi, Bertrand Drouvot, Ahsan Hadi
Backpatch-through: 10
Discussion: https://postgr.es/m/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2%40amazon.com

Branch
------
REL_14_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/68dcce247f1a13318613a0e27782b2ca21a4ceb7

Modified Files
--------------
contrib/test_decoding/Makefile                     |   2 +-
.../expected/catalog_change_snapshot.out           |  44 +++++++
.../specs/catalog_change_snapshot.spec             |  39 ++++++
src/backend/replication/logical/decode.c           |  15 +++
src/backend/replication/logical/snapbuild.c        | 137 +++++++++++++++++++--
src/include/replication/snapbuild.h                |   3 +
6 files changed, 232 insertions(+), 8 deletions(-)

