Thread: Error "initial slot snapshot too large" in create replication slot
While creating an "export snapshot" I don't see any protection why the number of xids in the snapshot can not cross the "GetMaxSnapshotXidCount()"?. Basically, while converting the HISTORIC snapshot to the MVCC snapshot in "SnapBuildInitialSnapshot()", we add all the xids between snap->xmin to snap->xmax to the MVCC snap->xip array (xids for which commit were not recorded). The problem is that we add both topxids as well as the subxids into the same array and expect that the "xid" count does not cross the "GetMaxSnapshotXidCount()". So it seems like an issue but I am not sure what is the fix for this, some options are a) Don't limit the xid count in the exported snapshot and dynamically resize the array b) Increase the limit to GetMaxSnapshotXidCount() + GetMaxSnapshotSubxidCount(). But in option b) there would still be a problem that how do we handle the overflowed subtransaction? I have locally, reproduced the issue, 1. Configuration max_connections= 5 autovacuum = off max_worker_processes = 0 2.Then from pgbench I have run the attached script (test.sql) from 5 clients. ./pgbench -i postgres ./pgbench -c4 -j4 -T 3000 -f test1.sql -P1 postgres 3. Concurrently, create replication slot, [dilipkumar@localhost bin]$ ./psql "dbname=postgres replication=database" postgres[7367]=# postgres[6463]=# CREATE_REPLICATION_SLOT "slot" LOGICAL "test_decoding"; ERROR: 40001: initial slot snapshot too large LOCATION: SnapBuildInitialSnapshot, snapbuild.c:597 postgres[6463]=# CREATE_REPLICATION_SLOT "slot" LOGICAL "test_decoding"; ERROR: XX000: clearing exported snapshot in wrong transaction state LOCATION: SnapBuildClearExportedSnapshot, snapbuild.c:690 I could reproduce this issue, at least once in 8-10 attempts of creating the replication slot. Note: After that issue, I have noticed one more issue "clearing exported snapshot in wrong transaction state", that is because the "ExportInProgress" is not cleared on the transaction abort, for this, a simple fix is we can clear this state on the transaction abort, maybe I will raise this as a separate issue? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
At Mon, 11 Oct 2021 11:49:41 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in > While creating an "export snapshot" I don't see any protection why the > number of xids in the snapshot can not cross the > "GetMaxSnapshotXidCount()"?. > > Basically, while converting the HISTORIC snapshot to the MVCC snapshot > in "SnapBuildInitialSnapshot()", we add all the xids between > snap->xmin to snap->xmax to the MVCC snap->xip array (xids for which > commit were not recorded). The problem is that we add both topxids as > well as the subxids into the same array and expect that the "xid" > count does not cross the "GetMaxSnapshotXidCount()". So it seems like > an issue but I am not sure what is the fix for this, some options are It seems to me that it is a compromise between the restriction of the legitimate snapshot and snapshots created by snapbuild. If the xids overflow, the resulting snapshot may lose a siginificant xid, i.e, a top-level xid. > a) Don't limit the xid count in the exported snapshot and dynamically > resize the array b) Increase the limit to GetMaxSnapshotXidCount() + > GetMaxSnapshotSubxidCount(). But in option b) there would still be a > problem that how do we handle the overflowed subtransaction? I'm afraid that we shouldn't expand the size limits. If I understand it correctly, we only need the top-level xids in the exported snapshot and reorder buffer knows whether a xid is a top-level or not after establishing a consistent snapshot. The attached diff tries to make SnapBuildInitialSnapshot exclude subtransaction from generated snapshots. It seems working fine for you repro. (Of course, I'm not confident that it is the correct thing, though..) What do you think about this? regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 46e66608cf..4e452cce7c 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -3326,6 +3326,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid) } +/* + * ReorderBufferXidIsKnownSubXact + * Returns true if the xid is a known subtransaction. + */ +bool +ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid) +{ + ReorderBufferTXN *txn; + + txn = ReorderBufferTXNByXid(rb, xid, false, + NULL, InvalidXLogRecPtr, false); + + /* a known subtxn? */ + if (txn && rbtxn_is_known_subxact(txn)) + return true; + + return false; +} + + /* * --------------------------------------- * Disk serialization support diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index a5333349a8..12d283f4de 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -591,12 +591,18 @@ SnapBuildInitialSnapshot(SnapBuild *builder) if (test == NULL) { - if (newxcnt >= GetMaxSnapshotXidCount()) - ereport(ERROR, - (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), - errmsg("initial slot snapshot too large"))); + bool issubxact = + ReorderBufferXidIsKnownSubXact(builder->reorder, xid); - newxip[newxcnt++] = xid; + if (!issubxact) + { + if (newxcnt >= GetMaxSnapshotXidCount()) + ereport(ERROR, + (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), + errmsg("initial slot snapshot too large"))); + + newxip[newxcnt++] = xid; + } } TransactionIdAdvance(xid); diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h index 5b40ff75f7..e5fa1051d7 100644 --- a/src/include/replication/reorderbuffer.h +++ b/src/include/replication/reorderbuffer.h @@ -669,6 +669,7 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid); bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid); +bool ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid); bool ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid, XLogRecPtr prepare_lsn, XLogRecPtr end_lsn, TimestampTz prepare_time,
On Mon, Oct 11, 2021 at 4:29 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Mon, 11 Oct 2021 11:49:41 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in > > While creating an "export snapshot" I don't see any protection why the > > number of xids in the snapshot can not cross the > > "GetMaxSnapshotXidCount()"?. > > > > Basically, while converting the HISTORIC snapshot to the MVCC snapshot > > in "SnapBuildInitialSnapshot()", we add all the xids between > > snap->xmin to snap->xmax to the MVCC snap->xip array (xids for which > > commit were not recorded). The problem is that we add both topxids as > > well as the subxids into the same array and expect that the "xid" > > count does not cross the "GetMaxSnapshotXidCount()". So it seems like > > an issue but I am not sure what is the fix for this, some options are > > It seems to me that it is a compromise between the restriction of the > legitimate snapshot and snapshots created by snapbuild. If the xids > overflow, the resulting snapshot may lose a siginificant xid, i.e, a > top-level xid. > > > a) Don't limit the xid count in the exported snapshot and dynamically > > resize the array b) Increase the limit to GetMaxSnapshotXidCount() + > > GetMaxSnapshotSubxidCount(). But in option b) there would still be a > > problem that how do we handle the overflowed subtransaction? > > I'm afraid that we shouldn't expand the size limits. If I understand > it correctly, we only need the top-level xids in the exported snapshot But since we are using this as an MVCC snapshot, if we don't have the subxid and if we also don't mark the "suboverflowed" flag then IMHO the sub-transaction visibility might be wrong, Am I missing something? > and reorder buffer knows whether a xid is a top-level or not after > establishing a consistent snapshot. > > The attached diff tries to make SnapBuildInitialSnapshot exclude > subtransaction from generated snapshots. It seems working fine for > you repro. (Of course, I'm not confident that it is the correct thing, > though..) > > What do you think about this? If your statement that we only need top-xids in the exported snapshot, is true then this fix looks fine to me. If not then we might need to add the sub-xids in the snapshot->subxip array and if it crosses the limit of GetMaxSnapshotSubxidCount(), then we can mark "suboverflowed" flag. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
At Mon, 11 Oct 2021 16:48:10 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in > On Mon, Oct 11, 2021 at 4:29 PM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > > > At Mon, 11 Oct 2021 11:49:41 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in > > > While creating an "export snapshot" I don't see any protection why the > > > number of xids in the snapshot can not cross the > > > "GetMaxSnapshotXidCount()"?. > > > > > > Basically, while converting the HISTORIC snapshot to the MVCC snapshot > > > in "SnapBuildInitialSnapshot()", we add all the xids between > > > snap->xmin to snap->xmax to the MVCC snap->xip array (xids for which > > > commit were not recorded). The problem is that we add both topxids as > > > well as the subxids into the same array and expect that the "xid" > > > count does not cross the "GetMaxSnapshotXidCount()". So it seems like > > > an issue but I am not sure what is the fix for this, some options are > > > > It seems to me that it is a compromise between the restriction of the > > legitimate snapshot and snapshots created by snapbuild. If the xids > > overflow, the resulting snapshot may lose a siginificant xid, i.e, a > > top-level xid. > > > > > a) Don't limit the xid count in the exported snapshot and dynamically > > > resize the array b) Increase the limit to GetMaxSnapshotXidCount() + > > > GetMaxSnapshotSubxidCount(). But in option b) there would still be a > > > problem that how do we handle the overflowed subtransaction? > > > > I'm afraid that we shouldn't expand the size limits. If I understand > > it correctly, we only need the top-level xids in the exported snapshot > > But since we are using this as an MVCC snapshot, if we don't have the > subxid and if we also don't mark the "suboverflowed" flag then IMHO > the sub-transaction visibility might be wrong, Am I missing something? Sorry I should have set suboverflowed in the generated snapshot. However, we can store subxid list as well when the snapshot (or running_xact) is not overflown. These (should) works the same way. On physical standby, snapshot is created just filling up only subxip with all top and sub xids (procrray.c:2400). It would be better we do the same thing here. > > and reorder buffer knows whether a xid is a top-level or not after > > establishing a consistent snapshot. > > > > The attached diff tries to make SnapBuildInitialSnapshot exclude > > subtransaction from generated snapshots. It seems working fine for > > you repro. (Of course, I'm not confident that it is the correct thing, > > though..) > > > > What do you think about this? > > If your statement that we only need top-xids in the exported snapshot, > is true then this fix looks fine to me. If not then we might need to > add the sub-xids in the snapshot->subxip array and if it crosses the > limit of GetMaxSnapshotSubxidCount(), then we can mark "suboverflowed" > flag. So I came up with the attached version. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 374a10aa6819224ca6af548100aa34e6c772a2c3 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Tue, 12 Oct 2021 13:53:27 +0900 Subject: [PATCH] Allow overflowed snapshot when creating logical replication slot Snapshot can hold top XIDs up to the number of server processes but SnapBuildInitialSnapshot tries to store all top-level and sub XIDs to there and fails for certain circumstances. Instead, create a "takenDuringRecovery" snapshot instead, which stores all XIDs in subxip array. Addition to that, allow to create an overflowed snapshot by adding to reorder buffer a function to tell whether an XID is a top-level or not. --- .../replication/logical/reorderbuffer.c | 20 ++++++++ src/backend/replication/logical/snapbuild.c | 50 ++++++++++++++----- src/include/replication/reorderbuffer.h | 1 + 3 files changed, 59 insertions(+), 12 deletions(-) diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 46e66608cf..4e452cce7c 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -3326,6 +3326,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid) } +/* + * ReorderBufferXidIsKnownSubXact + * Returns true if the xid is a known subtransaction. + */ +bool +ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid) +{ + ReorderBufferTXN *txn; + + txn = ReorderBufferTXNByXid(rb, xid, false, + NULL, InvalidXLogRecPtr, false); + + /* a known subtxn? */ + if (txn && rbtxn_is_known_subxact(txn)) + return true; + + return false; +} + + /* * --------------------------------------- * Disk serialization support diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index a5333349a8..d422315717 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -531,8 +531,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder) { Snapshot snap; TransactionId xid; - TransactionId *newxip; - int newxcnt = 0; + TransactionId *newsubxip; + int newsubxcnt; + bool overflowed = false; Assert(!FirstSnapshotSet); Assert(XactIsoLevel == XACT_REPEATABLE_READ); @@ -568,9 +569,12 @@ SnapBuildInitialSnapshot(SnapBuild *builder) MyProc->xmin = snap->xmin; - /* allocate in transaction context */ - newxip = (TransactionId *) - palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount()); + /* + * Allocate in transaction context. We use only subxip to store xids. See + * GetSnapshotData for details. + */ + newsubxip = (TransactionId *) + palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount()); /* * snapbuild.c builds transactions in an "inverted" manner, which means it @@ -578,6 +582,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder) * classical snapshot by marking all non-committed transactions as * in-progress. This can be expensive. */ +retry: + newsubxcnt = 0; + for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);) { void *test; @@ -591,12 +598,29 @@ SnapBuildInitialSnapshot(SnapBuild *builder) if (test == NULL) { - if (newxcnt >= GetMaxSnapshotXidCount()) - ereport(ERROR, - (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), - errmsg("initial slot snapshot too large"))); + bool add = true; - newxip[newxcnt++] = xid; + /* exlude subxids when subxip is overflowed */ + if (overflowed && + ReorderBufferXidIsKnownSubXact(builder->reorder, xid)) + add = false; + + if (add) + { + if (newsubxcnt >= GetMaxSnapshotSubxidCount()) + { + if (overflowed) + ereport(ERROR, + (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), + errmsg("initial slot snapshot too large"))); + + /* retry excluding subxids */ + overflowed = true; + goto retry; + } + + newsubxip[newsubxcnt++] = xid; + } } TransactionIdAdvance(xid); @@ -604,8 +628,10 @@ SnapBuildInitialSnapshot(SnapBuild *builder) /* adjust remaining snapshot fields as needed */ snap->snapshot_type = SNAPSHOT_MVCC; - snap->xcnt = newxcnt; - snap->xip = newxip; + snap->xcnt = 0; + snap->subxcnt = newsubxcnt; + snap->subxip = newsubxip; + snap->suboverflowed = overflowed; return snap; } diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h index 5b40ff75f7..e5fa1051d7 100644 --- a/src/include/replication/reorderbuffer.h +++ b/src/include/replication/reorderbuffer.h @@ -669,6 +669,7 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid); bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid); +bool ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid); bool ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid, XLogRecPtr prepare_lsn, XLogRecPtr end_lsn, TimestampTz prepare_time, -- 2.27.0
At Tue, 12 Oct 2021 13:59:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > So I came up with the attached version. Sorry, it was losing a piece of change. -- Kyotaro Horiguchi NTT Open Source Software Center From 424f405b4c9d41471eae1edd48cdbb6a6d47217e Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Tue, 12 Oct 2021 13:53:27 +0900 Subject: [PATCH v2] Allow overflowed snapshot when creating logical replication slot Snapshot can hold top XIDs up to the number of server processes but SnapBuildInitialSnapshot tries to store all top-level and sub XIDs to there and fails for certain circumstances. Instead, create a "takenDuringRecovery" snapshot instead, which stores all XIDs in subxip array. Addition to that, allow to create an overflowed snapshot by adding to reorder buffer a function to tell whether an XID is a top-level or not. --- .../replication/logical/reorderbuffer.c | 20 +++++++ src/backend/replication/logical/snapbuild.c | 56 +++++++++++++++---- src/include/replication/reorderbuffer.h | 1 + 3 files changed, 65 insertions(+), 12 deletions(-) diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 46e66608cf..4e452cce7c 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -3326,6 +3326,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid) } +/* + * ReorderBufferXidIsKnownSubXact + * Returns true if the xid is a known subtransaction. + */ +bool +ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid) +{ + ReorderBufferTXN *txn; + + txn = ReorderBufferTXNByXid(rb, xid, false, + NULL, InvalidXLogRecPtr, false); + + /* a known subtxn? */ + if (txn && rbtxn_is_known_subxact(txn)) + return true; + + return false; +} + + /* * --------------------------------------- * Disk serialization support diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index a5333349a8..46c691df20 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -531,8 +531,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder) { Snapshot snap; TransactionId xid; - TransactionId *newxip; - int newxcnt = 0; + TransactionId *newsubxip; + int newsubxcnt; + bool overflowed = false; Assert(!FirstSnapshotSet); Assert(XactIsoLevel == XACT_REPEATABLE_READ); @@ -568,9 +569,12 @@ SnapBuildInitialSnapshot(SnapBuild *builder) MyProc->xmin = snap->xmin; - /* allocate in transaction context */ - newxip = (TransactionId *) - palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount()); + /* + * Allocate in transaction context. We use only subxip to store xids. See + * GetSnapshotData for details. + */ + newsubxip = (TransactionId *) + palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount()); /* * snapbuild.c builds transactions in an "inverted" manner, which means it @@ -578,6 +582,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder) * classical snapshot by marking all non-committed transactions as * in-progress. This can be expensive. */ +retry: + newsubxcnt = 0; + for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);) { void *test; @@ -591,12 +598,29 @@ SnapBuildInitialSnapshot(SnapBuild *builder) if (test == NULL) { - if (newxcnt >= GetMaxSnapshotXidCount()) - ereport(ERROR, - (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), - errmsg("initial slot snapshot too large"))); + bool add = true; - newxip[newxcnt++] = xid; + /* exlude subxids when subxip is overflowed */ + if (overflowed && + ReorderBufferXidIsKnownSubXact(builder->reorder, xid)) + add = false; + + if (add) + { + if (newsubxcnt >= GetMaxSnapshotSubxidCount()) + { + if (overflowed) + ereport(ERROR, + (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), + errmsg("initial slot snapshot too large"))); + + /* retry excluding subxids */ + overflowed = true; + goto retry; + } + + newsubxip[newsubxcnt++] = xid; + } } TransactionIdAdvance(xid); @@ -604,8 +628,16 @@ SnapBuildInitialSnapshot(SnapBuild *builder) /* adjust remaining snapshot fields as needed */ snap->snapshot_type = SNAPSHOT_MVCC; - snap->xcnt = newxcnt; - snap->xip = newxip; + snap->xcnt = 0; + snap->subxcnt = newsubxcnt; + snap->subxip = newsubxip; + snap->suboverflowed = overflowed; + + /* + * Although this snapshot is taken actually not during recovery, all XIDs + * are stored in subxip even if it is not overflowed. + */ + snap->takenDuringRecovery = true; return snap; } diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h index 5b40ff75f7..e5fa1051d7 100644 --- a/src/include/replication/reorderbuffer.h +++ b/src/include/replication/reorderbuffer.h @@ -669,6 +669,7 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid); bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid); +bool ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid); bool ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid, XLogRecPtr prepare_lsn, XLogRecPtr end_lsn, TimestampTz prepare_time, -- 2.27.0
On Tue, Oct 12, 2021 at 10:35 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Tue, 12 Oct 2021 13:59:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > > So I came up with the attached version. > > Sorry, it was losing a piece of change. Yeah, at a high level this is on the idea I have in mind, I will do a detailed review in a day or two. Thanks for working on this. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Oct 12, 2021 at 11:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Oct 12, 2021 at 10:35 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > > > At Tue, 12 Oct 2021 13:59:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > > > So I came up with the attached version. > > > > Sorry, it was losing a piece of change. > > Yeah, at a high level this is on the idea I have in mind, I will do a > detailed review in a day or two. Thanks for working on this. While doing the detailed review, I think there are a couple of problems with the patch, the main problem of storing all the xid in the snap->subxip is that once we mark the snapshot overflown then the XidInMVCCSnapshot, will not search the subxip array, instead it will fetch the topXid and search in the snap->xip array. Another issue is that the total xids could be GetMaxSnapshotSubxidCount() +GetMaxSnapshotXidCount(). I think what we should be doing is that if the xid is know subxid then add in the snap->subxip array otherwise in snap->xip array. So snap->xip array size will be GetMaxSnapshotXidCount() whereas the snap->subxip array size will be GetMaxSnapshotSubxidCount(). And if the array size is full then we can stop adding the subxids to the array. What is your thought on this? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Oct 19, 2021 at 2:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Oct 12, 2021 at 11:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Oct 12, 2021 at 10:35 AM Kyotaro Horiguchi > > <horikyota.ntt@gmail.com> wrote: > > > > > > At Tue, 12 Oct 2021 13:59:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > > > > So I came up with the attached version. > > > > > > Sorry, it was losing a piece of change. > > > > Yeah, at a high level this is on the idea I have in mind, I will do a > > detailed review in a day or two. Thanks for working on this. > > While doing the detailed review, I think there are a couple of > problems with the patch, the main problem of storing all the xid in > the snap->subxip is that once we mark the snapshot overflown then the > XidInMVCCSnapshot, will not search the subxip array, instead it will > fetch the topXid and search in the snap->xip array. I missed that you are marking the snapshot as takenDuringRecovery so your fix looks fine. Another issue is > that the total xids could be GetMaxSnapshotSubxidCount() > +GetMaxSnapshotXidCount(). > I have fixed this, the updated patch is attached. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
Hi, On Tue, Nov 02, 2021 at 04:40:39PM +0530, Dilip Kumar wrote: > > I have fixed this, the updated patch is attached. The cfbot reports that this patch doesn't compile: https://cirrus-ci.com/task/5642000073490432?logs=build [03:01:24.477] snapbuild.c: In function ‘SnapBuildInitialSnapshot’: [03:01:24.477] snapbuild.c:587:2: error: ‘newsubxcnt’ undeclared (first use in this function); did you mean ‘newsubxip’? [03:01:24.477] 587 | newsubxcnt = 0; [03:01:24.477] | ^~~~~~~~~~ [03:01:24.477] | newsubxip [03:01:24.477] snapbuild.c:587:2: note: each undeclared identifier is reported only once for each function it appears in [03:01:24.477] snapbuild.c:535:8: warning: unused variable ‘maxxidcnt’ [-Wunused-variable] [03:01:24.477] 535 | int maxxidcnt; [03:01:24.477] | ^~~~~~~~~ Could you send a new version? In the meantime I will switch the patch to Waiting on Author.
On Wed, Jan 12, 2022 at 4:09 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,
On Tue, Nov 02, 2021 at 04:40:39PM +0530, Dilip Kumar wrote:
>
> I have fixed this, the updated patch is attached.
The cfbot reports that this patch doesn't compile:
https://cirrus-ci.com/task/5642000073490432?logs=build
[03:01:24.477] snapbuild.c: In function ‘SnapBuildInitialSnapshot’:
[03:01:24.477] snapbuild.c:587:2: error: ‘newsubxcnt’ undeclared (first use in this function); did you mean ‘newsubxip’?
[03:01:24.477] 587 | newsubxcnt = 0;
[03:01:24.477] | ^~~~~~~~~~
[03:01:24.477] | newsubxip
[03:01:24.477] snapbuild.c:587:2: note: each undeclared identifier is reported only once for each function it appears in
[03:01:24.477] snapbuild.c:535:8: warning: unused variable ‘maxxidcnt’ [-Wunused-variable]
[03:01:24.477] 535 | int maxxidcnt;
[03:01:24.477] | ^~~~~~~~~
Could you send a new version? In the meantime I will switch the patch to
Waiting on Author.
Thanks for notifying, I will work on this and send the update patch soon.
At Tue, 2 Nov 2021 16:40:39 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in > On Tue, Oct 19, 2021 at 2:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Oct 12, 2021 at 11:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Tue, Oct 12, 2021 at 10:35 AM Kyotaro Horiguchi > > > <horikyota.ntt@gmail.com> wrote: > > > > > > > > At Tue, 12 Oct 2021 13:59:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > > > > > So I came up with the attached version. > > > > > > > > Sorry, it was losing a piece of change. > > > > > > Yeah, at a high level this is on the idea I have in mind, I will do a > > > detailed review in a day or two. Thanks for working on this. > > > > While doing the detailed review, I think there are a couple of > > problems with the patch, the main problem of storing all the xid in > > the snap->subxip is that once we mark the snapshot overflown then the > > XidInMVCCSnapshot, will not search the subxip array, instead it will > > fetch the topXid and search in the snap->xip array. > > I missed that you are marking the snapshot as takenDuringRecovery so > your fix looks fine. > > Another issue is > > that the total xids could be GetMaxSnapshotSubxidCount() > > +GetMaxSnapshotXidCount(). > > > > I have fixed this, the updated patch is attached. Mmm. The size of the array cannot be larger than the numbers the *Connt() functions return. Thus we cannot attach the oversized array to ->subxip. (I don't recall clearly but that would lead to assertion failure somewhere..) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Mon, 17 Jan 2022 09:27:14 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in > On Wed, Jan 12, 2022 at 4:09 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > The cfbot reports that this patch doesn't compile: > > https://cirrus-ci.com/task/5642000073490432?logs=build > > > > [03:01:24.477] snapbuild.c: In function ‘SnapBuildInitialSnapshot’: > > [03:01:24.477] snapbuild.c:587:2: error: ‘newsubxcnt’ undeclared (first > > use in this function); did you mean ‘newsubxip’? > > [03:01:24.477] 587 | newsubxcnt = 0; > > [03:01:24.477] | ^~~~~~~~~~ > > [03:01:24.477] | newsubxip > > [03:01:24.477] snapbuild.c:587:2: note: each undeclared identifier is > > reported only once for each function it appears in > > [03:01:24.477] snapbuild.c:535:8: warning: unused variable ‘maxxidcnt’ > > [-Wunused-variable] > > [03:01:24.477] 535 | int maxxidcnt; > > [03:01:24.477] | ^~~~~~~~~ > > > > Could you send a new version? In the meantime I will switch the patch to > > Waiting on Author. > > > > Thanks for notifying, I will work on this and send the update patch soon. me> Mmm. The size of the array cannot be larger than the numbers the me> *Connt() functions return. Thus we cannot attach the oversized array me> to ->subxip. (I don't recall clearly but that would lead to assertion me> failure somewhere..) Then, I fixed the v3 error and post v4. To recap: SnapBUildInitialSnapshot tries to store XIDS of both top and sub transactions into snapshot->xip array but the array is easily overflowed and CREATE_REPLICATOIN_SLOT command ends with an error. To fix this, this patch is doing the following things. - Use subxip array instead of xip array to allow us have larger array for xids. So the snapshot is marked as takenDuringRecovery, which is a kind of abuse but largely reduces the chance of getting "initial slot snapshot too large" error. - Still if subxip is overflowed, retry with excluding subtransactions then set suboverflowed. This causes XidInMVCCSnapshot (finally) scans over subxip array for targetted top-level xid. We could take another way: make a !takenDuringRecovery snapshot by using xip instead of subxip. It is cleaner but it has far larger chance of needing to retry. (renamed the patch since it represented a part of the patch) regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 1e3ec40d70c2e7f8482632333001fe52527ef17a Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Mon, 31 Jan 2022 15:03:33 +0900 Subject: [PATCH v4] Avoid an error while creating logical replication slot Snapshot can hold top XIDs up to the number of server processes but SnapBuildInitialSnapshot tries to store all top-level and sub XIDs to there and easily fails with "initial slot snapshot too large" error. Instead, create a "takenDuringRecovery" snapshot, which stores all XIDs in subxip array. This alone largely reduces the chance of getting the error. But to eliminate the chance of the error to occur, allow to create an overflowed snapshot by adding to reorder buffer a function to tell whether an XID is a top-level or not. Author: Dilip Kumar and Kyotaro Horiguchi --- .../replication/logical/reorderbuffer.c | 20 +++++++ src/backend/replication/logical/snapbuild.c | 54 ++++++++++++++----- src/include/replication/reorderbuffer.h | 1 + 3 files changed, 63 insertions(+), 12 deletions(-) diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 19b2ba2000..429e6296ab 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -3326,6 +3326,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid) } +/* + * ReorderBufferXidIsKnownSubXact + * Returns true if the xid is a known subtransaction. + */ +bool +ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid) +{ + ReorderBufferTXN *txn; + + txn = ReorderBufferTXNByXid(rb, xid, false, + NULL, InvalidXLogRecPtr, false); + + /* a known subtxn? */ + if (txn && rbtxn_is_known_subxact(txn)) + return true; + + return false; +} + + /* * --------------------------------------- * Disk serialization support diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index 83fca8a77d..21f30fa1d3 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -531,8 +531,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder) { Snapshot snap; TransactionId xid; - TransactionId *newxip; - int newxcnt = 0; + TransactionId *newsubxip; + int newsubxcnt; + bool overflowed = false; Assert(!FirstSnapshotSet); Assert(XactIsoLevel == XACT_REPEATABLE_READ); @@ -568,9 +569,12 @@ SnapBuildInitialSnapshot(SnapBuild *builder) MyProc->xmin = snap->xmin; - /* allocate in transaction context */ - newxip = (TransactionId *) - palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount()); + /* + * Allocate in transaction context. We use only subxip to store xids. See + * below and GetSnapshotData for details. + */ + newsubxip = (TransactionId *) + palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount()); /* * snapbuild.c builds transactions in an "inverted" manner, which means it @@ -578,6 +582,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder) * classical snapshot by marking all non-committed transactions as * in-progress. This can be expensive. */ +retry: + newsubxcnt = 0; + for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);) { void *test; @@ -591,12 +598,28 @@ SnapBuildInitialSnapshot(SnapBuild *builder) if (test == NULL) { - if (newxcnt >= GetMaxSnapshotXidCount()) - ereport(ERROR, - (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), - errmsg("initial slot snapshot too large"))); + /* + * If subtransactions fit the subxid array, we fill the array and + * call it a day. Otherwise we store only top-level transactions + * into the same subxip array. + */ + if (!overflowed || + !ReorderBufferXidIsKnownSubXact(builder->reorder, xid)) + { + if (newsubxcnt >= GetMaxSnapshotSubxidCount()) + { + if (overflowed) + ereport(ERROR, + (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), + errmsg("initial slot snapshot too large"))); - newxip[newxcnt++] = xid; + /* retry excluding subxids */ + overflowed = true; + goto retry; + } + + newsubxip[newsubxcnt++] = xid; + } } TransactionIdAdvance(xid); @@ -604,8 +627,15 @@ SnapBuildInitialSnapshot(SnapBuild *builder) /* adjust remaining snapshot fields as needed */ snap->snapshot_type = SNAPSHOT_MVCC; - snap->xcnt = newxcnt; - snap->xip = newxip; + snap->xcnt = 0; + snap->subxcnt = newsubxcnt; + snap->subxip = newsubxip; + snap->suboverflowed = overflowed; + /* + * Although this snapshot is taken actually not during recovery, all XIDs + * are stored in subxip even if it is not overflowed. + */ + snap->takenDuringRecovery = true; return snap; } diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h index aa0a73382f..c05cf3078c 100644 --- a/src/include/replication/reorderbuffer.h +++ b/src/include/replication/reorderbuffer.h @@ -669,6 +669,7 @@ void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid); bool ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid); +bool ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid); bool ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid, XLogRecPtr prepare_lsn, XLogRecPtr end_lsn, TimestampTz prepare_time, -- 2.27.0
On Mon, Jan 31, 2022 at 11:50 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > SnapBUildInitialSnapshot tries to store XIDS of both top and sub > transactions into snapshot->xip array but the array is easily > overflowed and CREATE_REPLICATOIN_SLOT command ends with an error. > > To fix this, this patch is doing the following things. > > - Use subxip array instead of xip array to allow us have larger array > for xids. So the snapshot is marked as takenDuringRecovery, which > is a kind of abuse but largely reduces the chance of getting > "initial slot snapshot too large" error. > > - Still if subxip is overflowed, retry with excluding subtransactions > then set suboverflowed. This causes XidInMVCCSnapshot (finally) > scans over subxip array for targetted top-level xid. > > We could take another way: make a !takenDuringRecovery snapshot by > using xip instead of subxip. It is cleaner but it has far larger > chance of needing to retry. > > (renamed the patch since it represented a part of the patch) > Thanks for the updated version. I will look into it this week. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Jan 31, 2022 at 11:50 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Mon, 17 Jan 2022 09:27:14 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in > > me> Mmm. The size of the array cannot be larger than the numbers the > me> *Connt() functions return. Thus we cannot attach the oversized array > me> to ->subxip. (I don't recall clearly but that would lead to assertion > me> failure somewhere..) > > Then, I fixed the v3 error and post v4. Yeah you are right, SetTransactionSnapshot() has that assertion. Anyway after looking again it appears that GetMaxSnapshotSubxidCount is the correct size because this is PGPROC_MAX_CACHED_SUBXIDS +1, i.e. it considers top transactions as well so we don't need to add them separately. > > SnapBUildInitialSnapshot tries to store XIDS of both top and sub > transactions into snapshot->xip array but the array is easily > overflowed and CREATE_REPLICATOIN_SLOT command ends with an error. > > To fix this, this patch is doing the following things. > > - Use subxip array instead of xip array to allow us have larger array > for xids. So the snapshot is marked as takenDuringRecovery, which > is a kind of abuse but largely reduces the chance of getting > "initial slot snapshot too large" error. Right. I think the patch looks fine to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
В Пн, 07/02/2022 в 13:52 +0530, Dilip Kumar пишет: > On Mon, Jan 31, 2022 at 11:50 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > At Mon, 17 Jan 2022 09:27:14 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in > > > > me> Mmm. The size of the array cannot be larger than the numbers the > > me> *Connt() functions return. Thus we cannot attach the oversized array > > me> to ->subxip. (I don't recall clearly but that would lead to assertion > > me> failure somewhere..) > > > > Then, I fixed the v3 error and post v4. > > Yeah you are right, SetTransactionSnapshot() has that assertion. > Anyway after looking again it appears that > GetMaxSnapshotSubxidCount is the correct size because this is > PGPROC_MAX_CACHED_SUBXIDS +1, i.e. it considers top transactions as > well so we don't need to add them separately. > > > SnapBUildInitialSnapshot tries to store XIDS of both top and sub > > transactions into snapshot->xip array but the array is easily > > overflowed and CREATE_REPLICATOIN_SLOT command ends with an error. > > > > To fix this, this patch is doing the following things. > > > > - Use subxip array instead of xip array to allow us have larger array > > for xids. So the snapshot is marked as takenDuringRecovery, which > > is a kind of abuse but largely reduces the chance of getting > > "initial slot snapshot too large" error. > > Right. I think the patch looks fine to me. > Good day. I've looked to the patch. Personally I'd prefer dynamically resize xip array. But I think there is issue with upgrade if replica source is upgraded before destination, right? Concerning patch, I think more comments should be written about new usage case for `takenDuringRecovery`. May be this field should be renamed at all? And there are checks for `takenDuringRecovery` in `heapgetpage` and `heapam_scan_sample_next_tuple`. Are this checks affected by the change? Neither the preceding discussion nor commit message answer me. ------- regards Yura Sokolov Postgres Professional y.sokolov@postgrespro.ru funny.falcon@gmail.com
At Sun, 13 Feb 2022 17:35:38 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in > В Пн, 07/02/2022 в 13:52 +0530, Dilip Kumar пишет: > > Right. I think the patch looks fine to me. > > > > Good day. > > I've looked to the patch. Personally I'd prefer dynamically resize > xip array. But I think there is issue with upgrade if replica source > is upgraded before destination, right? I don't think it is relevant. I think we don't convey snapshot via streaming replication. But I think that expanding xip or subxip is wrong, since it is tied with ProcArray structure. (Even though we abuse the arrays in some situations, like this). > Concerning patch, I think more comments should be written about new > usage case for `takenDuringRecovery`. May be this field should be renamed > at all? I don't feel the need to rename it so much. It just signals that "this snapshot is in the form for recovery". The more significant reason is that I don't come up better name:p And the comment is slightly modified and gets a pointer to detailed comment. + * Although this snapshot is not acutally taken during recovery, all XIDs + * are stored in subxip. See GetSnapshotData for the details of that form + * of snapshot. > And there are checks for `takenDuringRecovery` in `heapgetpage` and > `heapam_scan_sample_next_tuple`. Are this checks affected by the change? > Neither the preceding discussion nor commit message answer me. The snapshot works correctly, but for the heapgetpage case, it foces all_visible to be false. That unnecessarily prevents visibility check from skipping. An annoying thing in SnapBuildInitialSnapshot is that we don't know the number of xids before looping over the xid range, and we don't want to bother sorting top-level xids and subxids unless we have to do so. Is it better that we hassle in SnapBuildInitialSnapshot to create a !takenDuringRecovery snapshot? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Fri, 01 Apr 2022 14:44:30 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > At Sun, 13 Feb 2022 17:35:38 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in > > And there are checks for `takenDuringRecovery` in `heapgetpage` and > > `heapam_scan_sample_next_tuple`. Are this checks affected by the change? > > Neither the preceding discussion nor commit message answer me. > > The snapshot works correctly, but for the heapgetpage case, it foces > all_visible to be false. That unnecessarily prevents visibility check > from skipping. > > An annoying thing in SnapBuildInitialSnapshot is that we don't know > the number of xids before looping over the xid range, and we don't > want to bother sorting top-level xids and subxids unless we have to do > so. > > Is it better that we hassle in SnapBuildInitialSnapshot to create a > !takenDuringRecovery snapshot? So this is that. v5 creates a regular snapshot. By the way, is there any chance this could be committed to 15? Otherwise I'll immediately move this to the next CF. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
On Thu, Mar 31, 2022 at 11:53 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > So this is that. v5 creates a regular snapshot. This patch will need a quick rebase over 905c020bef9, which added `extern` to several missing locations. Thanks, --Jacob
At Tue, 5 Jul 2022 11:32:42 -0700, Jacob Champion <jchampion@timescale.com> wrote in > On Thu, Mar 31, 2022 at 11:53 PM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > So this is that. v5 creates a regular snapshot. > > This patch will need a quick rebase over 905c020bef9, which added > `extern` to several missing locations. Thanks! Just rebased. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
On Mon, Jul 18, 2022 at 10:55 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > Thanks! Just rebased. Hi, I mentioned this patch to Andres in conversation, and he expressed a concern that there might be no guarantee that we retain enough CLOG to look up XIDs. Presumably this wouldn't be an issue when the snapshot doesn't get marked suboverflowed, but what if it does? Adding Andres in the hopes that he may comment further. -- Robert Haas EDB: http://www.enterprisedb.com
Hi, Thanks for working on this! I think this should include a test that fails without this change and succeeds with it... On 2022-07-19 11:55:06 +0900, Kyotaro Horiguchi wrote: > From abcf0a0e0b3e2de9927d8943a3e3c145ab189508 Mon Sep 17 00:00:00 2001 > From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> > Date: Tue, 19 Jul 2022 11:50:29 +0900 > Subject: [PATCH v6] Create correct snapshot during CREATE_REPLICATION_SLOT This sees a tad misleading - the previous snapshot wasn't borken, right? > +/* > + * ReorderBufferXidIsKnownSubXact > + * Returns true if the xid is a known subtransaction. > + */ > +bool > +ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid) > +{ > + ReorderBufferTXN *txn; > + > + txn = ReorderBufferTXNByXid(rb, xid, false, > + NULL, InvalidXLogRecPtr, false); > + > + /* a known subtxn? */ > + if (txn && rbtxn_is_known_subxact(txn)) > + return true; > + > + return false; > +} The comments here just seem to restate the code.... It's not obvious to me that it's the right design (or even correct) to ask reorderbuffer about an xid being a subxid. Maybe I'm missing something, but why would reorderbuffer even be guaranteed to know about all these subxids? > @@ -568,9 +571,17 @@ SnapBuildInitialSnapshot(SnapBuild *builder) > > MyProc->xmin = snap->xmin; > > - /* allocate in transaction context */ > + /* > + * Allocate in transaction context. > + * > + * We could use only subxip to store all xids (takenduringrecovery > + * snapshot) but that causes useless visibility checks later so we hasle to > + * create a normal snapshot. > + */ I can't really parse this comment at this point, and I seriously doubt I could later on. > @@ -591,12 +605,24 @@ SnapBuildInitialSnapshot(SnapBuild *builder) > > if (test == NULL) > { > - if (newxcnt >= GetMaxSnapshotXidCount()) > - ereport(ERROR, > - (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), > - errmsg("initial slot snapshot too large"))); > - > - newxip[newxcnt++] = xid; > + /* Store the xid to the appropriate xid array */ > + if (ReorderBufferXidIsKnownSubXact(builder->reorder, xid)) > + { > + if (!overflowed) > + { > + if (newsubxcnt >= GetMaxSnapshotSubxidCount()) > + overflowed = true; > + else > + newsubxip[newsubxcnt++] = xid; > + } > + } > + else > + { > + if (newxcnt >= GetMaxSnapshotXidCount()) > + elog(ERROR, > + "too many transactions while creating snapshot"); > + newxip[newxcnt++] = xid; > + } > } Hm, this is starting to be pretty deeply nested... I wonder if a better fix here wouldn't be to allow importing a snapshot with a larger ->xid array. Yes, we can't do that in CurrentSnapshotData, but IIRC we need to be in a transactional snapshot anyway, which is copied anyway? Greetings, Andres Freund
Hi, On 2022-09-09 13:19:14 -0400, Robert Haas wrote: > I mentioned this patch to Andres in conversation, and he expressed a > concern that there might be no guarantee that we retain enough CLOG to > look up XIDs. I was concerned we wouldn't keep enough subtrans, rather than clog. But I think we're ok, because we need to have an appropriate ->xmin for exporting / importing the snapshot. Greetings, Andres Freund
On Tue, Sep 13, 2022 at 3:22 AM Andres Freund <andres@anarazel.de> wrote: > > It's not obvious to me that it's the right design (or even correct) to ask > reorderbuffer about an xid being a subxid. Maybe I'm missing something, but > why would reorderbuffer even be guaranteed to know about all these subxids? Yeah, you are right, the reorderbuffer will only know about the transaction for which changes got added to the reorder buffer. So this seems not to be the right design idea. > > I wonder if a better fix here wouldn't be to allow importing a snapshot with a > larger ->xid array. Yes, we can't do that in CurrentSnapshotData, but IIRC we > need to be in a transactional snapshot anyway, which is copied anyway? Yeah when I first found this issue, I thought that should be the solution. But later it went in a different direction. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Thanks for raizing this up, Robert and the comment, Andres. At Tue, 13 Sep 2022 07:00:42 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in > On Tue, Sep 13, 2022 at 3:22 AM Andres Freund <andres@anarazel.de> wrote: > > > > > It's not obvious to me that it's the right design (or even correct) to ask > > reorderbuffer about an xid being a subxid. Maybe I'm missing something, but > > why would reorderbuffer even be guaranteed to know about all these subxids? > > Yeah, you are right, the reorderbuffer will only know about the > transaction for which changes got added to the reorder buffer. So > this seems not to be the right design idea. That function is called after the SnapBuild reaches SNAPBUILD_CONSISTENT state ,or SnapBuildInitialSnapshot() rejects other than that state. That is, IIUC the top-sub relationship of all the currently running transactions is fully known to reorder buffer. We need a comment about that. > > I wonder if a better fix here wouldn't be to allow importing a snapshot with a > > larger ->xid array. Yes, we can't do that in CurrentSnapshotData, but IIRC we > > need to be in a transactional snapshot anyway, which is copied anyway? > > Yeah when I first found this issue, I thought that should be the > solution. But later it went in a different direction. I think that that is the best solution if rbtxn_is_known_subxzact() is known to be unreliable at the time. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Sep 13, 2022 at 11:52 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > Thanks for raizing this up, Robert and the comment, Andres. > > At Tue, 13 Sep 2022 07:00:42 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in > > On Tue, Sep 13, 2022 at 3:22 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > > > It's not obvious to me that it's the right design (or even correct) to ask > > > reorderbuffer about an xid being a subxid. Maybe I'm missing something, but > > > why would reorderbuffer even be guaranteed to know about all these subxids? > > > > Yeah, you are right, the reorderbuffer will only know about the > > transaction for which changes got added to the reorder buffer. So > > this seems not to be the right design idea. > > That function is called after the SnapBuild reaches > SNAPBUILD_CONSISTENT state ,or SnapBuildInitialSnapshot() rejects > other than that state. That is, IIUC the top-sub relationship of all > the currently running transactions is fully known to reorder buffer. > We need a comment about that. I don't think this assumption is true, any xid started after switching to the SNAPBUILD_FULL_SNAPSHOT and before switching to the SNAPBUILD_CONSISTENT, might still be in progress so we can not identify whether they are subxact or not from reorder buffer. refer to this comment: /* * c) transition from FULL_SNAPSHOT to CONSISTENT. * * In FULL_SNAPSHOT state (see d) ), and this xl_running_xacts' * oldestRunningXid is >= than nextXid from when we switched to * FULL_SNAPSHOT. This means all transactions that are currently in * progress have a catalog snapshot, and all their changes have been * collected. Switch to CONSISTENT. */ -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
At Mon, 12 Sep 2022 14:51:56 -0700, Andres Freund <andres@anarazel.de> wrote in > Hi, > > Thanks for working on this! > > > I think this should include a test that fails without this change and succeeds > with it... > > > On 2022-07-19 11:55:06 +0900, Kyotaro Horiguchi wrote: > > From abcf0a0e0b3e2de9927d8943a3e3c145ab189508 Mon Sep 17 00:00:00 2001 > > From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> > > Date: Tue, 19 Jul 2022 11:50:29 +0900 > > Subject: [PATCH v6] Create correct snapshot during CREATE_REPLICATION_SLOT > > This sees a tad misleading - the previous snapshot wasn't borken, right? I saw it kind of broken that ->xip contains sub transactions. But I didn't meant it's broken by "correct". Is "proper" suitable there? > > +/* > > + * ReorderBufferXidIsKnownSubXact > > + * Returns true if the xid is a known subtransaction. > > + */ > > +bool > > +ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid) > > +{ > > + ReorderBufferTXN *txn; > > + > > + txn = ReorderBufferTXNByXid(rb, xid, false, > > + NULL, InvalidXLogRecPtr, false); > > + > > + /* a known subtxn? */ > > + if (txn && rbtxn_is_known_subxact(txn)) > > + return true; > > + > > + return false; > > +} > > The comments here just seem to restate the code.... Yeah, it is pulled from the existing code but result looks like so.. > It's not obvious to me that it's the right design (or even correct) to ask > reorderbuffer about an xid being a subxid. Maybe I'm missing something, but > why would reorderbuffer even be guaranteed to know about all these subxids? I think you're missing that the code is visited only after the reorder buffer's state becomes SNAPBUILD_CONSISTENT. I think rbtxn_is_known_subxact() is reliable at that stage. > > @@ -568,9 +571,17 @@ SnapBuildInitialSnapshot(SnapBuild *builder) > > > > MyProc->xmin = snap->xmin; > > > > - /* allocate in transaction context */ > > + /* > > + * Allocate in transaction context. > > + * > > + * We could use only subxip to store all xids (takenduringrecovery > > + * snapshot) but that causes useless visibility checks later so we hasle to > > + * create a normal snapshot. > > + */ > > I can't really parse this comment at this point, and I seriously doubt I could > later on. Mmm. The "takenduringrecovery" is relly perplexing (it has been somehow lower-cased..), but after removing the parenthesized part, it looks like this. And it had a misspelling but I removed that word. Is this still unreadable? We could use only subxip to store all xids but that causes useless visibility checks later so we create a normal snapshot. > > @@ -591,12 +605,24 @@ SnapBuildInitialSnapshot(SnapBuild *builder) > > > > if (test == NULL) > > { > > - if (newxcnt >= GetMaxSnapshotXidCount()) > > - ereport(ERROR, > > - (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), > > - errmsg("initial slot snapshot too large"))); > > - > > - newxip[newxcnt++] = xid; > > + /* Store the xid to the appropriate xid array */ > > + if (ReorderBufferXidIsKnownSubXact(builder->reorder, xid)) > > + { > > + if (!overflowed) > > + { > > + if (newsubxcnt >= GetMaxSnapshotSubxidCount()) > > + overflowed = true; > > + else > > + newsubxip[newsubxcnt++] = xid; > > + } > > + } > > + else > > + { > > + if (newxcnt >= GetMaxSnapshotXidCount()) > > + elog(ERROR, > > + "too many transactions while creating snapshot"); > > + newxip[newxcnt++] = xid; > > + } > > } > > Hm, this is starting to be pretty deeply nested... Yeah, at least one if() is removable. > I wonder if a better fix here wouldn't be to allow importing a snapshot with a > larger ->xid array. Yes, we can't do that in CurrentSnapshotData, but IIRC we > need to be in a transactional snapshot anyway, which is copied anyway? The other reason for oversized xip array is it causes visibility check when it is used. AFAICS XidInMVCCSnapshot has additional path for takenDuringRecovery snapshots that contains a linear search (currently it is replaced by pg_lfind32()). This saves us from doing this for that snapshot. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Tue, 13 Sep 2022 12:08:18 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in > On Tue, Sep 13, 2022 at 11:52 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > That function is called after the SnapBuild reaches > > SNAPBUILD_CONSISTENT state ,or SnapBuildInitialSnapshot() rejects > > other than that state. That is, IIUC the top-sub relationship of all > > the currently running transactions is fully known to reorder buffer. > > We need a comment about that. > > I don't think this assumption is true, any xid started after switching > to the SNAPBUILD_FULL_SNAPSHOT and before switching to the > SNAPBUILD_CONSISTENT, might still be in progress so we can not > identify whether they are subxact or not from reorder buffer. Yeah, I misunderstood that the relationship is recorded earlier (how?). Thus it is not reliable in the first place. I agree that the best way is oversized xip. By the way, I feel that "is >= than" is redundant or plain wrong.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Tue, 13 Sep 2022 15:45:07 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > At Mon, 12 Sep 2022 14:51:56 -0700, Andres Freund <andres@anarazel.de> wrote in > > This sees a tad misleading - the previous snapshot wasn't borken, right? > > I saw it kind of broken that ->xip contains sub transactions. But I > didn't meant it's broken by "correct". Is "proper" suitable there? No. It's not broken if it is takenDuringRecovery. So this flag can be used to notify that xip can be oversized. I realized that rbtxn_is_known_subxact is not reliable. I'm redirecting to oversized xip. Pleas wait for a while. regareds. -- Kyotaro Horiguchi NTT Open Source Software Center
At Tue, 13 Sep 2022 16:10:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > At Tue, 13 Sep 2022 12:08:18 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in > > On Tue, Sep 13, 2022 at 11:52 AM Kyotaro Horiguchi > > <horikyota.ntt@gmail.com> wrote: > > > That function is called after the SnapBuild reaches > > > SNAPBUILD_CONSISTENT state ,or SnapBuildInitialSnapshot() rejects > > > other than that state. That is, IIUC the top-sub relationship of all > > > the currently running transactions is fully known to reorder buffer. > > > We need a comment about that. > > > > I don't think this assumption is true, any xid started after switching > > to the SNAPBUILD_FULL_SNAPSHOT and before switching to the > > SNAPBUILD_CONSISTENT, might still be in progress so we can not > > identify whether they are subxact or not from reorder buffer. > > Yeah, I misunderstood that the relationship is recorded earlier > (how?). Thus it is not reliable in the first place. > > I agree that the best way is oversized xip. > > > By the way, I feel that "is >= than" is redundant or plain wrong.. By the way GetSnapshotData() does this: > snapshot->subxip = (TransactionId *) > malloc(GetMaxSnapshotSubxidCount() * sizeof(TransactionId)); ... > if (!snapshot->takenDuringRecovery) ... > else > { > subcount = KnownAssignedXidsGetAndSetXmin(snapshot->subxip, &xmin, > xmax); It is possible that the subxip is overrun. We need to expand the array somehow. Or assign the array of the size (GetMaxSnapshotXidCount() + GetMaxSnapshotSubxidCount()) for takenDuringRecovery snapshots. (I feel deja vu..) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Sigh.. At Tue, 13 Sep 2022 16:30:06 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > It is possible that the subxip is overrun. We need to expand the array > somehow. Or assign the array of the size (GetMaxSnapshotXidCount() + > GetMaxSnapshotSubxidCount()) for takenDuringRecovery snapshots. And I found that this is already done. What we should is doing the same thing in snapbuild. Sorry for the noise.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Tue, 13 Sep 2022 16:15:34 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > At Tue, 13 Sep 2022 15:45:07 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > > At Mon, 12 Sep 2022 14:51:56 -0700, Andres Freund <andres@anarazel.de> wrote in > > > This sees a tad misleading - the previous snapshot wasn't borken, right? > > > > I saw it kind of broken that ->xip contains sub transactions. But I > > didn't meant it's broken by "correct". Is "proper" suitable there? > > No. It's not broken if it is takenDuringRecovery. So this flag can be > used to notify that xip can be oversized. > > I realized that rbtxn_is_known_subxact is not reliable. I'm > redirecting to oversized xip. Pleas wait for a while. However, the reader of saved snapshots (ImportSnapshot) has the restriction that > if (xcnt < 0 || xcnt > GetMaxSnapshotXidCount()) > ereport(ERROR, and > if (xcnt < 0 || xcnt > GetMaxSnapshotSubxidCount()) > ereport(ERROR, (this xid is subxcnt) And ExportSnapshot repalces oversized subxip with overflowed. So even when GetSnapshotData() returns a snapshot with oversized subxip, it is saved as just "overflowed" on exporting. I don't think this is the expected behavior since such (no xip and overflowed) snapshot no longer works. On the other hand, it seems to me that snapbuild doesn't like takenDuringRecovery snapshots. So snapshot needs additional flag signals that xip is oversized and all xid are holded there. And also need to let takenDuringRecovery suggest subxip is oversized. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Sep 13, 2022 at 05:31:05PM +0900, Kyotaro Horiguchi wrote: > And ExportSnapshot repalces oversized subxip with overflowed. > > So even when GetSnapshotData() returns a snapshot with oversized > subxip, it is saved as just "overflowed" on exporting. I don't think > this is the expected behavior since such (no xip and overflowed) > snapshot no longer works. > > On the other hand, it seems to me that snapbuild doesn't like > takenDuringRecovery snapshots. > > So snapshot needs additional flag signals that xip is oversized and > all xid are holded there. And also need to let takenDuringRecovery > suggest subxip is oversized. The discussion seems to have stopped here. As this is classified as a bug fix, I have moved this patch to next CF, waiting on author for the moment. -- Michael
Attachment
Hi, On 2022-10-12 14:10:15 +0900, Michael Paquier wrote: > The discussion seems to have stopped here. As this is classified as a > bug fix, I have moved this patch to next CF, waiting on author for the > moment. FWIW, I view this more as lifting a limitation. I wouldn't want to backpatch the change. Greetings, Andres Freund
Re: Error "initial slot snapshot too large" in create replication slot
From
"Gregory Stark (as CFM)"
Date:
On Wed, 12 Oct 2022 at 01:10, Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Sep 13, 2022 at 05:31:05PM +0900, Kyotaro Horiguchi wrote: > > And ExportSnapshot repalces oversized subxip with overflowed. > > > > So even when GetSnapshotData() returns a snapshot with oversized > > subxip, it is saved as just "overflowed" on exporting. I don't think > > this is the expected behavior since such (no xip and overflowed) > > snapshot no longer works. > > > > On the other hand, it seems to me that snapbuild doesn't like > > takenDuringRecovery snapshots. > > > > So snapshot needs additional flag signals that xip is oversized and > > all xid are holded there. And also need to let takenDuringRecovery > > suggest subxip is oversized. > > The discussion seems to have stopped here. As this is classified as a > bug fix, I have moved this patch to next CF, waiting on author for the > moment. Kyotoro Horiguchi, any chance you'll be able to work on this for this commitfest? If so shout (or anyone else is planning to push it over the line.... Andres?) otherwise I'll move it on to the next release. -- Gregory Stark As Commitfest Manager
At Mon, 20 Mar 2023 13:46:51 -0400, "Gregory Stark (as CFM)" <stark.cfm@gmail.com> wrote in > On Wed, 12 Oct 2022 at 01:10, Michael Paquier <michael@paquier.xyz> wrote: > > The discussion seems to have stopped here. As this is classified as a > > bug fix, I have moved this patch to next CF, waiting on author for the > > moment. > > Kyotoro Horiguchi, any chance you'll be able to work on this for this > commitfest? If so shout (or anyone else is planning to push it over > the line.... Andres?) otherwise I'll move it on to the next release. Ugg. sorry for being lazy. I have lost track of the conversation. I'm currently working on this and will come back soon with a new version. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Wed, 22 Mar 2023 14:27:40 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > At Mon, 20 Mar 2023 13:46:51 -0400, "Gregory Stark (as CFM)" <stark.cfm@gmail.com> wrote in > > Kyotoro Horiguchi, any chance you'll be able to work on this for this > > commitfest? If so shout (or anyone else is planning to push it over > > the line.... Andres?) otherwise I'll move it on to the next release. > > Ugg. sorry for being lazy. I have lost track of the conversation. I'm > currently working on this and will come back soon with a new version. I relized that attempting to make SnapshotData.xip expansible was making procarray.c and snapmgr.c too complicated. The main reason is that SnapShotData is allocated in various ways, like on the stack, using palloc including xip/subxip arrays, with palloc then allocating xip/subxip arrays separately, or statically allocated and then having xip/subxip arrays malloc'ed later. This variety was making the expansion logic a mess. So I went back to square one and decided to use subxip as an extension for the xip array instead. Like the comment added in the function SnapBuildInitialSnapshot mentions, I don't think we can reliably identify top-level XIDs. So, this patch just increases the allowed number of XIDs by using the subxip array. (The title of the patch was changed accordingly.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
On Thu, Mar 23, 2023 at 10:53 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Wed, 22 Mar 2023 14:27:40 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > > At Mon, 20 Mar 2023 13:46:51 -0400, "Gregory Stark (as CFM)" <stark.cfm@gmail.com> wrote in > > > Kyotoro Horiguchi, any chance you'll be able to work on this for this > > > commitfest? If so shout (or anyone else is planning to push it over > > > the line.... Andres?) otherwise I'll move it on to the next release. > > > > Ugg. sorry for being lazy. I have lost track of the conversation. I'm > > currently working on this and will come back soon with a new version. > > I relized that attempting to make SnapshotData.xip expansible was > making procarray.c and snapmgr.c too complicated. The main reason is > that SnapShotData is allocated in various ways, like on the stack, > using palloc including xip/subxip arrays, with palloc then allocating > xip/subxip arrays separately, or statically allocated and then having > xip/subxip arrays malloc'ed later. This variety was making the > expansion logic a mess. > > So I went back to square one and decided to use subxip as an extension > for the xip array instead. > > Like the comment added in the function SnapBuildInitialSnapshot > mentions, I don't think we can reliably identify top-level XIDs. So, > this patch just increases the allowed number of XIDs by using the > subxip array. Thanks for working on this, your idea looks fine but my only worry is that in the future if someone tries to change the logic of XidInMVCCSnapshot() then they must be aware that the snap->xip array and snap->subxip array no long distinguishes the top xids and subxids. I agree with the current logic if we are not marking sub-overflow then there is no issue, so can we document this in the SnapshotData structure? Also, there are some typos in the patch /idetify/identify /carete/create /Aallocate/Allocate -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Thanks for looking this! At Thu, 23 Mar 2023 14:15:12 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in > On Thu, Mar 23, 2023 at 10:53 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > Thanks for working on this, your idea looks fine but my only worry is > that in the future if someone tries to change the logic of > XidInMVCCSnapshot() then they must be aware that the snap->xip array > and snap->subxip array no long distinguishes the top xids and subxids. Yeah, I had the same thought when I was working on the posted version. > I agree with the current logic if we are not marking sub-overflow then > there is no issue, so can we document this in the SnapshotData > structure? (I found that it was alrady mentioned...) In a unpublished version (what I referred to as "a mess"), I added a flag called "topsub_mixed" to SnapshotData, indicating that XIDs of top and sub transactions are stored in xip and subxip arrays in a mixed manner. However, I eventually removed it since it could only be used for sanity checks related to suboverflowed. I inserted the following sentense in the middle of the comments for xip and subxip. > In the case of !suboverflowed, there's a situation where this > contains both top and sub-transaction IDs in a mixed manner. And added similar a similar sentense to a comment of XidInMVCCSnapshot. While doning this, I realized that we could simplify and optimize XID search code by combining the two XID arrays. If !suboverflowed, the array stored all active XIDs of both top and sub-transactions. Otherwise it only stores active top XIDs and maybe XIDs of some sub-transactions. If many subXIDs are stored when overflowed, there might lead to some degradation but I think the win we gain from searching just one XID array in most cases outweighs that. (I didn't do this (of course) in this version.) > Also, there are some typos in the patch > /idetify/identify > /carete/create > /Aallocate/Allocate Oops! Thanks for pointing out them. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
On Thu, Mar 23, 2023 at 11:02 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > [ new patch ] Well, I guess nobody is too excited about fixing this, because it's been another 10 months with no discussion. Andres doesn't even seem to think this is as much a bug as it is a limitation, for all that it's filed in the CF application under bug fixes. I kind of wonder if we should just close this entry in the CF, but I'll hold off on that for now. /* * For normal MVCC snapshot this contains the all xact IDs that are in * progress, unless the snapshot was taken during recovery in which case - * it's empty. For historic MVCC snapshots, the meaning is inverted, i.e. - * it contains *committed* transactions between xmin and xmax. + * it's empty. In the case of !suboverflowed, there's a situation where + * this contains both top and sub-transaction IDs in a mixed manner. For + * historic MVCC snapshots, the meaning is inverted, i.e. it contains + * *committed* transactions between xmin and xmax. * * note: all ids in xip[] satisfy xmin <= xip[i] < xmax */ I have to say that I don't like this at all. It's bad enough that we already use the xip/subxip arrays in two different ways depending on the situation. Increasing that to three different ways seems painful. How is anyone supposed to keep track of how the array is being used at which point in the code? If we are going to do that, I suspect it needs comment updates in more places than what the patch does currently. For instance: + if (newxcnt < xiplen) + newxip[newxcnt++] = xid; + else + newsubxip[newsubxcnt++] = xid; Just imagine coming across this code in 5 or 10 years and finding that it had no comment explaining anything. Yikes! Aside from the details of the patch, and perhaps more seriously, I'm not really clear that we have consensus on an approach. A few different proposals seem to have been floated, and it doesn't seem like anybody hates anybody else's idea completely, but it doesn't really seem like everyone agrees on what to do, either. -- Robert Haas EDB: http://www.enterprisedb.com
On Sat, 6 Jan 2024 at 01:47, Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 23, 2023 at 11:02 PM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > [ new patch ] > > Well, I guess nobody is too excited about fixing this, because it's > been another 10 months with no discussion. Andres doesn't even seem to > think this is as much a bug as it is a limitation, for all that it's > filed in the CF application under bug fixes. I kind of wonder if we > should just close this entry in the CF, but I'll hold off on that for > now. I have changed the status of the patch to "Waiting on Author" as we don't have a concrete patch with an accepted design which is in a reviewable shape. We can think if we want to pursue this patch further or probably close this in the current commitfest and start it again when someone wants to work on this more actively. Regards, Vignesh
Thank you for the comments. This will help move the discussion forward. At Fri, 5 Jan 2024 15:17:11 -0500, Robert Haas <robertmhaas@gmail.com> wrote in > On Thu, Mar 23, 2023 at 11:02 PM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > [ new patch ] > > Well, I guess nobody is too excited about fixing this, because it's > been another 10 months with no discussion. Andres doesn't even seem to > think this is as much a bug as it is a limitation, for all that it's > filed in the CF application under bug fixes. I kind of wonder if we > should just close this entry in the CF, but I'll hold off on that for > now. Perhaps you are correct. Ultimately, this issue is largely theoretical, and I don't believe anyone would be inconvenienced by imposing this contraint. > * note: all ids in xip[] satisfy xmin <= xip[i] < xmax > */ > > I have to say that I don't like this at all. It's bad enough that we > already use the xip/subxip arrays in two different ways depending on > the situation. Increasing that to three different ways seems painful. > How is anyone supposed to keep track of how the array is being used at > which point in the code? I understand. So, summirizing the current state briefly, I believe it as follows: a. snapbuild lacks a means to differentiate between top and sub xids during snapshot building. b. Abusing takenDuringRecovery could lead to potential issues, so it has been rejected. c. Dynamic sizing of xip is likely to have a significant impact on performance (as mentioned in the comments of GetSnapshotData). d. (new!) Adding a third storing method is not favored. As a method to satisfy these prerequisites, I think one direction could be to split takenDuringRecovery into flags indicating the storage method and creation timing. I present this as a last-ditch effort. It's a rough proposal, and further validation should be necessary. If this direction is also not acceptable, I'll proceed with removing the CF entry. > If we are going to do that, I suspect it needs comment updates in more > places than what the patch does currently. For instance: > > + if (newxcnt < xiplen) > + newxip[newxcnt++] = xid; > + else > + newsubxip[newsubxcnt++] = xid; > > Just imagine coming across this code in 5 or 10 years and finding that > it had no comment explaining anything. Yikes! ^^; > Aside from the details of the patch, and perhaps more seriously, I'm > not really clear that we have consensus on an approach. A few > different proposals seem to have been floated, and it doesn't seem > like anybody hates anybody else's idea completely, but it doesn't > really seem like everyone agrees on what to do, either. I don't fully agree with that.It's not so much that I dislike other proposals, but rather that we haven't been able to find a definitive solution that stands out. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
On Thu, Jan 11, 2024 at 9:47 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > I understand. So, summirizing the current state briefly, I believe it > as follows: > > a. snapbuild lacks a means to differentiate between top and sub xids > during snapshot building. > > b. Abusing takenDuringRecovery could lead to potential issues, so it > has been rejected. > > c. Dynamic sizing of xip is likely to have a significant impact on > performance (as mentioned in the comments of GetSnapshotData). > > d. (new!) Adding a third storing method is not favored. > > As a method to satisfy these prerequisites, I think one direction > could be to split takenDuringRecovery into flags indicating the > storage method and creation timing. I present this as a last-ditch > effort. It's a rough proposal, and further validation should be > necessary. If this direction is also not acceptable, I'll proceed with > removing the CF entry. I think that the idea of evolving takenDuringRecovery could potentially work for this problem and maybe for some other things as well. I remember studying that flag before and coming to the conclusion that it had some pretty specific and surprising semantics that weren't obvious from the name -- I don't remember the details -- and so I think improving it could be useful. Also, I think that multiple storage methods could be more palatable if there were a clear way to distinguish which one was in use and good comments explaining the distinction in relevant places. However, I wonder whether this whole area is in need of a bigger rethink. There seem to be a number of situations in which the split into xip and subxip arrays is not very convenient, and also some situations where it's quite important. Sometimes we want to record what's committed, and sometimes what isn't. It's all a bit messy and inconsistent. The necessity of limiting snapshot size is annoying, too. I have no real idea what can be done about all of this, but what strikes me is that the current system has grown up incrementally: we started with a data structure designed for the original use case, and now by gradually adding new use cases things have gotten complicated. If we were designing things over from scratch, maybe we'd do it differently and end up with something less messy. And maybe someone can imagine a redesign that is likewise less messy. But on the other hand, maybe not. Perhaps we can't really do any better than what we have. Then the question becomes whether this case is important enough to justify additional code complexity. I don't think I've personally seen users run into this problem so I have no special reason to think that it's important, but if it's causing issues for other people then maybe it is. -- Robert Haas EDB: http://www.enterprisedb.com
At Fri, 12 Jan 2024 11:28:09 -0500, Robert Haas <robertmhaas@gmail.com> wrote in > However, I wonder whether this whole area is in need of a bigger > rethink. There seem to be a number of situations in which the split > into xip and subxip arrays is not very convenient, and also some > situations where it's quite important. Sometimes we want to record > what's committed, and sometimes what isn't. It's all a bit messy and > inconsistent. The necessity of limiting snapshot size is annoying, > too. I have no real idea what can be done about all of this, but what > strikes me is that the current system has grown up incrementally: we > started with a data structure designed for the original use case, and > now by gradually adding new use cases things have gotten complicated. > If we were designing things over from scratch, maybe we'd do it > differently and end up with something less messy. And maybe someone > can imagine a redesign that is likewise less messy. > > But on the other hand, maybe not. Perhaps we can't really do any > better than what we have. Then the question becomes whether this case > is important enough to justify additional code complexity. I don't > think I've personally seen users run into this problem so I have no > special reason to think that it's important, but if it's causing > issues for other people then maybe it is. Thank you for the deep insights. I have understood your points. As I can't think of any further simple modifications on this line, I will withdraw this CF entry. At the moment, I also lack a fundamental, comprehensive solution, but should if I or anyone else come up with such a solution in the future, I believe it would worth a separate discussion. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Jan 15, 2024 at 9:18 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > Thank you for the deep insights. I have understood your points. As I > can't think of any further simple modifications on this line, I will > withdraw this CF entry. At the moment, I also lack a fundamental, > comprehensive solution, but should if I or anyone else come up with > such a solution in the future, I believe it would worth a separate > discussion. I completely agree. -- Robert Haas EDB: http://www.enterprisedb.com