Re: Error "initial slot snapshot too large" in create replication slot - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Error "initial slot snapshot too large" in create replication slot
Date
Msg-id 20211011.195937.2061830189397708214.horikyota.ntt@gmail.com
Whole thread Raw
In response to Error "initial slot snapshot too large" in create replication slot  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Error "initial slot snapshot too large" in create replication slot  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
At Mon, 11 Oct 2021 11:49:41 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in 
> While creating an "export snapshot" I don't see any protection why the
> number of xids in the snapshot can not cross the
> "GetMaxSnapshotXidCount()"?.
> 
> Basically, while converting the HISTORIC snapshot to the MVCC snapshot
> in "SnapBuildInitialSnapshot()", we add all the xids between
> snap->xmin to snap->xmax to the MVCC snap->xip array (xids for which
> commit were not recorded).  The problem is that we add both topxids as
> well as the subxids into the same array and expect that the "xid"
> count does not cross the "GetMaxSnapshotXidCount()".  So it seems like
> an issue but I am not sure what is the fix for this, some options are

It seems to me that it is a compromise between the restriction of the
legitimate snapshot and snapshots created by snapbuild.  If the xids
overflow, the resulting snapshot may lose a siginificant xid, i.e, a
top-level xid.

> a) Don't limit the xid count in the exported snapshot and dynamically
> resize the array b) Increase the limit to GetMaxSnapshotXidCount() +
> GetMaxSnapshotSubxidCount().  But in option b) there would still be a
> problem that how do we handle the overflowed subtransaction?

I'm afraid that we shouldn't expand the size limits.  If I understand
it correctly, we only need the top-level xids in the exported snapshot
and reorder buffer knows whether a xid is a top-level or not after
establishing a consistent snapshot.

The attached diff tries to make SnapBuildInitialSnapshot exclude
subtransaction from generated snapshots.  It seems working fine for
you repro. (Of course, I'm not confident that it is the correct thing,
though..)

What do you think about this?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 46e66608cf..4e452cce7c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3326,6 +3326,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
 }
 
 
+/*
+ * ReorderBufferXidIsKnownSubXact
+ *        Returns true if the xid is a known subtransaction.
+ */
+bool
+ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
+{
+    ReorderBufferTXN *txn;
+
+    txn = ReorderBufferTXNByXid(rb, xid, false,
+                                NULL, InvalidXLogRecPtr, false);
+
+    /* a known subtxn? */
+    if (txn && rbtxn_is_known_subxact(txn))
+        return true;
+
+    return false;
+}
+
+
 /*
  * ---------------------------------------
  * Disk serialization support
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a5333349a8..12d283f4de 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -591,12 +591,18 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
         if (test == NULL)
         {
-            if (newxcnt >= GetMaxSnapshotXidCount())
-                ereport(ERROR,
-                        (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-                         errmsg("initial slot snapshot too large")));
+            bool issubxact =
+                ReorderBufferXidIsKnownSubXact(builder->reorder, xid);
 
-            newxip[newxcnt++] = xid;
+            if (!issubxact)
+            {                
+                if (newxcnt >= GetMaxSnapshotXidCount())
+                    ereport(ERROR,
+                            (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+                             errmsg("initial slot snapshot too large")));
+                
+                newxip[newxcnt++] = xid;
+            }
         }
 
         TransactionIdAdvance(xid);
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5b40ff75f7..e5fa1051d7 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -669,6 +669,7 @@ void        ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
 bool        ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
 bool        ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
 
+bool        ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid);
 bool        ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
                                              XLogRecPtr prepare_lsn, XLogRecPtr end_lsn,
                                              TimestampTz prepare_time,

pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: Inconsistency in startup process's MyBackendId and procsignal array registration with ProcSignalInit()
Next
From: Dilip Kumar
Date:
Subject: Re: Error "initial slot snapshot too large" in create replication slot