Thread: lastOverflowedXid does not handle transaction ID wraparound
In a blog post (https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/), I described how PostgreSQL can enter into a suboverflow condition on the replica under a number of conditions: 1. A long transaction starts. 2. A single SAVEPOINT is issued. 3. Many rows are updated on the primary, and the same rows are read from the replica. This can cause a significant performance degradation with a replica due to SubtransSLRU wait events since the replica needs to perform a parent lookup on an ever-growing range of XIDs. Full details on how to replicate this: https://gitlab.com/-/snippets/2187338. The main two lines of code that cause the replica to enter in the suboverflowed state are here (https://github.com/postgres/postgres/blob/317632f3073fc06047a42075eb5e28a9577a4f96/src/backend/storage/ipc/procarray.c#L2431-L2432): if (TransactionIdPrecedesOrEquals(xmin, procArray->lastOverflowedXid)) suboverflowed = true; I noticed that lastOverflowedXid doesn't get cleared even after all subtransactions have been completed. On a replica, it only seems to be updated via a XLOG_XACT_ASSIGNMENT, but no such message will be sent if subtransactions halt. If the XID wraps around again and a long transaction starts before lastOverflowedXid, the replica might unnecessarily enter in the suboverflow condition again. I've validated this by issuing a SAVEPOINT, running the read/write test, logging lastOverflowedXid to stderr, and then using pg_bench to advance XID with SELECT txid_current(). After many hours, I validated that lastOverflowedXid remained the same, and I could induce a high degree of SubtransSLRU wait events without issuing a new SAVEPOINT. I'm wondering a few things: 1. Should lastOverflowedXid be reset to 0 at some point? I'm not sure if there's a good way at the moment for the replica to know that all subtransactions have completed. 2. Alternatively, should the epoch number be used to compare xmin and lastOverflowedXid? To mitigate this issue, we've considered: 1. Restarting the replicas. This isn't great, and if another SAVEPOINT comes along, we'd have to do this again. It would be nice to be able to monitor the exact value of lastOverflowedXid. 2. Raise the NUM_SUBTRANS_BUFFERS as a workaround until the scalable SLRU patches are available (https://commitfest.postgresql.org/34/2627/). 3. Issue SAVEPOINTs periodically to "run away" from this wraparound issue.
> On Tue, Oct 12, 2021 at 09:53:22PM -0700, Stan Hu wrote: > > I described how PostgreSQL can enter into a suboverflow condition on > the replica under a number of conditions: > > 1. A long transaction starts. > 2. A single SAVEPOINT is issued. > 3. Many rows are updated on the primary, and the same rows are read > from the replica. > > I noticed that lastOverflowedXid doesn't get cleared even after all > subtransactions have been completed. On a replica, it only seems to be > updated via a XLOG_XACT_ASSIGNMENT, but no such message will be sent > if subtransactions halt. If the XID wraps around again and a long > transaction starts before lastOverflowedXid, the replica might > unnecessarily enter in the suboverflow condition again. Hi, that's an interesting finding, thanks for the investigation. I didn't reproduce it fully (haven't checked the wraparound part), but indeed lastOverflowedXid is not changing that often, only every PGPROC_MAX_CACHED_SUBXIDS subtransactions. I wonder what would be side effects of clearing it when the snapshot is not suboverfloved anymore?
> 17 окт. 2021 г., в 21:55, Dmitry Dolgov <9erthalion6@gmail.com> написал(а): > I wonder what would be side > effects of clearing it when the snapshot is not suboverfloved anymore? I think we should just invalidate lastOverflowedXid on every XLOG_RUNNING_XACTS if subxid_overflow == false. I can't finda reason not to do so. Best regards, Andrey Borodin.
> On Wed, Oct 20, 2021 at 04:00:35PM +0500, Andrey Borodin wrote: > > 17 окт. 2021 г., в 21:55, Dmitry Dolgov <9erthalion6@gmail.com> написал(а): > > I wonder what would be side > > effects of clearing it when the snapshot is not suboverfloved anymore? > > I think we should just invalidate lastOverflowedXid on every XLOG_RUNNING_XACTS if subxid_overflow == false. I can't finda reason not to do so. From what I understand that was actually the case, lastOverflowedXid was set to InvalidTransactionId in ProcArrayApplyRecoveryInfo if subxid_overflow wasn't set. Looks like 10b7c686e52a6d1bb has changed it, to what I didn't pay attention originally.
On Wed, Oct 20, 2021 at 4:00 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> 17 окт. 2021 г., в 21:55, Dmitry Dolgov <9erthalion6@gmail.com> написал(а):
> I wonder what would be side
> effects of clearing it when the snapshot is not suboverfloved anymore?
I think we should just invalidate lastOverflowedXid on every XLOG_RUNNING_XACTS if subxid_overflow == false. I can't find a reason not to do so.
On a replica, I think it's possible for lastOverflowedXid to be set even if subxid_overflow is false on the primary and secondary (https://github.com/postgres/postgres/blob/dc899146dbf0e1d23fb24155a5155826ddce34c9/src/backend/storage/ipc/procarray.c#L1327). I thought subxid_overflow only gets set if there are more than PGPROC_MAX_CACHED_SUBXIDS (64) used in a given transaction.
Should the replica be invalidating lastOverflowedXid if subxcnt goes to zero in XLOG_RUNNING_XACTS? But if there's an outstanding snapshot with an xmin that precedes lastOverflowedXid we might violate MVCC if we invalidate this, so I wonder if we also need to check the snapshot with the lowest xmin?
At Wed, 20 Oct 2021 13:48:33 +0200, Dmitry Dolgov <9erthalion6@gmail.com> wrote in > > On Wed, Oct 20, 2021 at 04:00:35PM +0500, Andrey Borodin wrote: > > > 17 окт. 2021 г., в 21:55, Dmitry Dolgov <9erthalion6@gmail.com> написал(а): > > > I wonder what would be side > > > effects of clearing it when the snapshot is not suboverfloved anymore? > > > > I think we should just invalidate lastOverflowedXid on every XLOG_RUNNING_XACTS if subxid_overflow == false. I can'tfind a reason not to do so. > > From what I understand that was actually the case, lastOverflowedXid was > set to InvalidTransactionId in ProcArrayApplyRecoveryInfo if > subxid_overflow wasn't set. Looks like 10b7c686e52a6d1bb has changed it, > to what I didn't pay attention originally. Unfortunately(?), that doesn't happen once standbyState reaches STANDBY_SNAPSHOT_READY. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Wed, 20 Oct 2021 08:55:12 -0700, Stan Hu <stanhu@gmail.com> wrote in > On Wed, Oct 20, 2021 at 4:00 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > > > > > > > 17 окт. 2021 г., в 21:55, Dmitry Dolgov <9erthalion6@gmail.com> > > написал(а): > > > I wonder what would be side > > > effects of clearing it when the snapshot is not suboverfloved anymore? > > > > I think we should just invalidate lastOverflowedXid on every > > XLOG_RUNNING_XACTS if subxid_overflow == false. I can't find a reason not > > to do so. > > > > > On a replica, I think it's possible for lastOverflowedXid to be set even if > subxid_overflow is false on the primary and secondary ( > https://github.com/postgres/postgres/blob/dc899146dbf0e1d23fb24155a5155826ddce34c9/src/backend/storage/ipc/procarray.c#L1327). > I thought subxid_overflow only gets set if there are more than > PGPROC_MAX_CACHED_SUBXIDS (64) used in a given transaction. > > Should the replica be invalidating lastOverflowedXid if subxcnt goes to > zero in XLOG_RUNNING_XACTS? But if there's an outstanding snapshot with an > xmin that precedes lastOverflowedXid we might violate MVCC if we invalidate > this, so I wonder if we also need to check the snapshot with the lowest > xmin? lastOverflowedXid is the smallest subxid that possibly exists but possiblly not known to the standby. So if all top-level transactions older than lastOverflowedXid end, that means that all the subtransactions in doubt are known to have been ended. XLOG_RUNNING_XACTS reports oldestRunningXid, which is the oldest running top-transaction. Standby expires xids in KnownAssignedXids array that precede to the oldestRunningXid. We are sure that all possiblly-overflown subtransactions are gone as well if the oldest xid is newer than the first overflowed subtransaction. As a cross check, the following existing code in GetSnapshotData means that no overflow is not happening if the smallest xid in the known assigned list is larger than lastOverflowedXid, which agrees to the consideration above. procaray.c:2428 > subcount = KnownAssignedXidsGetAndSetXmin(snapshot->subxip, &xmin, > xmax); > > if (TransactionIdPrecedesOrEquals(xmin, procArray->lastOverflowedXid)) > suboverflowed = true; If the discussion so far is correct, the following diff will fix the issue. diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index bd3c7a47fe..19682b73ec 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -4463,6 +4463,12 @@ ExpireOldKnownAssignedTransactionIds(TransactionId xid) { LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); KnownAssignedXidsRemovePreceding(xid); + /* + * reset lastOverflowedXid if we know transactions that have been possiblly + * running are being gone. + */ + if (TransactionIdPrecedes(procArray->lastOverflowedXid, xid)) + procArray->lastOverflowedXid = InvalidTransactionId; LWLockRelease(ProcArrayLock); } regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, Oct 20, 2021 at 9:01 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > lastOverflowedXid is the smallest subxid that possibly exists but > possiblly not known to the standby. So if all top-level transactions > older than lastOverflowedXid end, that means that all the > subtransactions in doubt are known to have been ended. Thanks for the patch! I verified that it appears to reset lastOverflowedXid properly. I may not be understanding https://github.com/postgres/postgres/blob/dc899146dbf0e1d23fb24155a5155826ddce34c9/src/backend/storage/ipc/procarray.c#L1326-L1327 correctly, but isn't lastOverflowedXid the last subxid for a given top-level XID, so isn't it actually the largest subxid that possibly exists?
On Thu, Oct 21, 2021 at 07:21 Stan Hu <stanhu@gmail.com> wrote:
On Wed, Oct 20, 2021 at 9:01 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> lastOverflowedXid is the smallest subxid that possibly exists but
> possiblly not known to the standby. So if all top-level transactions
> older than lastOverflowedXid end, that means that all the
> subtransactions in doubt are known to have been ended.
Thanks for the patch! I verified that it appears to reset
lastOverflowedXid properly.
Is it right time to register the patch in the current commit fest, right? (How to do that?)
On a separate note, I think it would be really good to improve observability for SLRUs -- particularly for Subtrans SLRU and this overflow-related aspects. pg_stat_slru added in PG13 is really helpful, but not enough to troubleshoot, analyze and tune issues like this, and the patches related to SLRU. Basic ideas:
- expose to the user how many pages are currently used (especially useful if SLRU sizes will be configurable, see
- Andrew Borodin also expressed the idea to extend pageinspect to allow seeing the content of SLRUs
- a more specific thing: allow seeing lastOverflowedXid somehow (via SQL or in logs) - we see how important it for standbys health, but we cannot see it now.
Any ideas in the direction of observability?
Nik
On Mon, Oct 25, 2021 at 11:41 AM Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
On Thu, Oct 21, 2021 at 07:21 Stan Hu <stanhu@gmail.com> wrote:On Wed, Oct 20, 2021 at 9:01 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> lastOverflowedXid is the smallest subxid that possibly exists but
> possiblly not known to the standby. So if all top-level transactions
> older than lastOverflowedXid end, that means that all the
> subtransactions in doubt are known to have been ended.
Thanks for the patch! I verified that it appears to reset
lastOverflowedXid properly.
...
Any ideas in the direction of observability?
Perhaps, anything additional should be considered separately.
The behavior discussed here looks like a bug.
I also have tested the patch. It works fully as expected, details of testing – below.
I think this is a serious bug hitting heavily loaded Postgres setups with hot standbys
and propose fixing it in all supported major versions ASAP since the fix looks simple.
Any standby in heavily loaded systems (10k+ TPS) where subtransactions are used
may experience huge performance degradation on standbys [1]. This is what happened
recently with GitLab [2]. While a full solution to this problem is something more complex, probably
requiring changes in SLRU [3], the problem discussed here definitely feels like a serious bug
– if we fully get rid of subtransactions, since 32-bit lastOverflowedXid is not reset, in new
XID epoch standbys start experience SubtransControlLock/SubtransSLRU again –
without any subtransactions. This problem is extremely difficult to diagnose on one hand,
and it may fully make standbys irresponsible while a long-lasting transaction last on the primary
("long" here may be a matter of minutes or even dozens of seconds – it depends on the
TPS level). It is especially hard to diagnose in PG 12 or older – because it doesn't have
pg_stat_slru yet, so one cannot easily notice Subtrans reads.)
The only current solution to this problem is to restart standby Postgres.
How I tested the patch. First, I reproduced the problem:
- current 15devel Postgres, installed on 2 x c5ad.2xlarge on AWS (8 vCPUs, 16 GiB), working as
primary + standby
- follow the steps described in [3] to initiate SubtransSLRU on the standby
- at some point, stop using SAVEPOINTs on the primary - use regular UPDATEs instead, wait.
Using the following, observe procArray->lastOverflowedXid:
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index bd3c7a47fe21949ba63da26f0d692b2ee618f885..ccf3274344d7ba52a6f28a10b08dbfc310cf97e9 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2428,6 +2428,9 @@ GetSnapshotData(Snapshot snapshot)
subcount = KnownAssignedXidsGetAndSetXmin(snapshot->subxip, &xmin,
xmax);
+ if (random() % 100000 == 0)
+ elog(WARNING, "procArray->lastOverflowedXid: %u", procArray->lastOverflowedXid);
+
if (TransactionIdPrecedesOrEquals(xmin, procArray->lastOverflowedXid))
suboverflowed = true;
}
index bd3c7a47fe21949ba63da26f0d692b2ee618f885..ccf3274344d7ba52a6f28a10b08dbfc310cf97e9 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2428,6 +2428,9 @@ GetSnapshotData(Snapshot snapshot)
subcount = KnownAssignedXidsGetAndSetXmin(snapshot->subxip, &xmin,
xmax);
+ if (random() % 100000 == 0)
+ elog(WARNING, "procArray->lastOverflowedXid: %u", procArray->lastOverflowedXid);
+
if (TransactionIdPrecedesOrEquals(xmin, procArray->lastOverflowedXid))
suboverflowed = true;
}
Once we stop using SAVEPOINTs on the primary, the value procArray->lastOverflowedXid stop
changing, as expected.
Without the patch applied, lastOverflowedXid remains constant forever – till the server restart.
And as I mentioned, we start experiencing SubtransSLRU and pg_subtrans reads.
With the patch, lastOverflowedXid is reset to 0, as expected, shortly after an ongoing "long"
the transaction ends on the primary.
This solves the bug – we don't have SubtransSLRU on standby without actual use of subtransactions
on the primary.
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, failed Spec compliant: not tested Documentation: not tested The fix is trivial and works as expected, solving the problem Tested, described details of the testing in the email thread.
On Mon, Nov 1, 2021 at 11:55 PM Nikolay Samokhvalov <nikolay@samokhvalov.com> wrote:
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Please ignore this – I didn't understand the UI.
> 21 окт. 2021 г., в 09:01, Kyotaro Horiguchi <horikyota.ntt@gmail.com> написал(а): > > If the discussion so far is correct, the following diff will fix the > issue. > > diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c > index bd3c7a47fe..19682b73ec 100644 > --- a/src/backend/storage/ipc/procarray.c > +++ b/src/backend/storage/ipc/procarray.c > @@ -4463,6 +4463,12 @@ ExpireOldKnownAssignedTransactionIds(TransactionId xid) > { > LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); > KnownAssignedXidsRemovePreceding(xid); > + /* > + * reset lastOverflowedXid if we know transactions that have been possiblly > + * running are being gone. > + */ > + if (TransactionIdPrecedes(procArray->lastOverflowedXid, xid)) > + procArray->lastOverflowedXid = InvalidTransactionId; > LWLockRelease(ProcArrayLock); > } The patch seems correct bugfix to me. The only question I have: is it right place from modularity standpoint? procArray->lastOverflowedXidis not a part of KnownAssignedTransactionIds? Best regards, Andrey Borodin.
( a.On Wed, Nov 3, 2021 at 11:44 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > 21 окт. 2021 г., в 09:01, Kyotaro Horiguchi <horikyota.ntt@gmail.com> написал(а): > > > > If the discussion so far is correct, the following diff will fix the > > issue. > > > > diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c > > index bd3c7a47fe..19682b73ec 100644 > > --- a/src/backend/storage/ipc/procarray.c > > +++ b/src/backend/storage/ipc/procarray.c > > @@ -4463,6 +4463,12 @@ ExpireOldKnownAssignedTransactionIds(TransactionId xid) > > { > > LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); > > KnownAssignedXidsRemovePreceding(xid); > > + /* > > + * reset lastOverflowedXid if we know transactions that have been possiblly > > + * running are being gone. > > + */ > > + if (TransactionIdPrecedes(procArray->lastOverflowedXid, xid)) > > + procArray->lastOverflowedXid = InvalidTransactionId; > > LWLockRelease(ProcArrayLock); > > } > > The patch seems correct bugfix to me. The only question I have: is it right place from modularity standpoint? procArray->lastOverflowedXidis not a part of KnownAssignedTransactionIds? It seems the right place because we take ProcArrayLock here. It would be undesirable to take it twice. We could give a better name for ExpireOldKnownAssignedTransactionIds() indicating that it could modify lastOverflowedXid as well. Any ideas? Should ExpireAllKnownAssignedTransactionIds() be also involved here? ------ Regards, Alexander Korotkov
> 3 нояб. 2021 г., в 14:08, Alexander Korotkov <aekorotkov@gmail.com> написал(а): > > ( a.On Wed, Nov 3, 2021 at 11:44 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote: >>> 21 окт. 2021 г., в 09:01, Kyotaro Horiguchi <horikyota.ntt@gmail.com> написал(а): >>> >>> If the discussion so far is correct, the following diff will fix the >>> issue. >>> >>> diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c >>> index bd3c7a47fe..19682b73ec 100644 >>> --- a/src/backend/storage/ipc/procarray.c >>> +++ b/src/backend/storage/ipc/procarray.c >>> @@ -4463,6 +4463,12 @@ ExpireOldKnownAssignedTransactionIds(TransactionId xid) >>> { >>> LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); >>> KnownAssignedXidsRemovePreceding(xid); >>> + /* >>> + * reset lastOverflowedXid if we know transactions that have been possiblly >>> + * running are being gone. >>> + */ >>> + if (TransactionIdPrecedes(procArray->lastOverflowedXid, xid)) >>> + procArray->lastOverflowedXid = InvalidTransactionId; >>> LWLockRelease(ProcArrayLock); >>> } >> >> The patch seems correct bugfix to me. The only question I have: is it right place from modularity standpoint? procArray->lastOverflowedXidis not a part of KnownAssignedTransactionIds? > > It seems the right place because we take ProcArrayLock here. Oh.. I see. ProcArrayApplyRecoveryInfo() is taking ProcArrayLock in so many places. > It would > be undesirable to take it twice. We could give a better name for > ExpireOldKnownAssignedTransactionIds() indicating that it could modify > lastOverflowedXid as well. Any ideas? Looking more I think the name is OK. KnownAssignedXidsReset() and KnownAssignedXidsRemovePreceding() interferes with procArraya lot. > Should ExpireAllKnownAssignedTransactionIds() be also involved here? I think it's good for unification, but I do not see how procArray->lastOverflowedXid can be used after ExpireAllKnownAssignedTransactionIds(). Best regards, Andrey Borodin.
On Thu, 21 Oct 2021 at 05:01, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Wed, 20 Oct 2021 08:55:12 -0700, Stan Hu <stanhu@gmail.com> wrote in > > On Wed, Oct 20, 2021 at 4:00 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > > > > > > > > > > > 17 окт. 2021 г., в 21:55, Dmitry Dolgov <9erthalion6@gmail.com> > > > написал(а): > > > > I wonder what would be side > > > > effects of clearing it when the snapshot is not suboverfloved anymore? > > > > > > I think we should just invalidate lastOverflowedXid on every > > > XLOG_RUNNING_XACTS if subxid_overflow == false. I can't find a reason not > > > to do so. I believe that to be an incorrect fix, but so very nearly correct. There is a documented race condition in the generation of a XLOG_RUNNING_XACTS that means there could be a new overflow event after the snapshot was taken but before it was logged. > > On a replica, I think it's possible for lastOverflowedXid to be set even if > > subxid_overflow is false on the primary and secondary ( > > https://github.com/postgres/postgres/blob/dc899146dbf0e1d23fb24155a5155826ddce34c9/src/backend/storage/ipc/procarray.c#L1327). > > I thought subxid_overflow only gets set if there are more than > > PGPROC_MAX_CACHED_SUBXIDS (64) used in a given transaction. > > > > Should the replica be invalidating lastOverflowedXid if subxcnt goes to > > zero in XLOG_RUNNING_XACTS? But if there's an outstanding snapshot with an > > xmin that precedes lastOverflowedXid we might violate MVCC if we invalidate > > this, so I wonder if we also need to check the snapshot with the lowest > > xmin? > > lastOverflowedXid is the smallest subxid that possibly exists but > possiblly not known to the standby. So if all top-level transactions > older than lastOverflowedXid end, that means that all the > subtransactions in doubt are known to have been ended. Agreed > XLOG_RUNNING_XACTS reports oldestRunningXid, which is the oldest > running top-transaction. Standby expires xids in KnownAssignedXids > array that precede to the oldestRunningXid. We are sure that all > possiblly-overflown subtransactions are gone as well if the oldest xid > is newer than the first overflowed subtransaction. Agreed > As a cross check, the following existing code in GetSnapshotData means > that no overflow is not happening if the smallest xid in the known > assigned list is larger than lastOverflowedXid, which agrees to the > consideration above. > > procaray.c:2428 > > subcount = KnownAssignedXidsGetAndSetXmin(snapshot->subxip, &xmin, > > xmax); > > > > if (TransactionIdPrecedesOrEquals(xmin, procArray->lastOverflowedXid)) > > suboverflowed = true; > > > If the discussion so far is correct, the following diff will fix the > issue. > > diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c > index bd3c7a47fe..19682b73ec 100644 > --- a/src/backend/storage/ipc/procarray.c > +++ b/src/backend/storage/ipc/procarray.c > @@ -4463,6 +4463,12 @@ ExpireOldKnownAssignedTransactionIds(TransactionId xid) > { > LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); > KnownAssignedXidsRemovePreceding(xid); > + /* > + * reset lastOverflowedXid if we know transactions that have been possiblly > + * running are being gone. > + */ > + if (TransactionIdPrecedes(procArray->lastOverflowedXid, xid)) > + procArray->lastOverflowedXid = InvalidTransactionId; > LWLockRelease(ProcArrayLock); > } So I agree with this fix. It is however, an undocumented modularity violation. I think that is acceptable because of the ProcArrayLock traffic, but needs to have a comment to explain this at the call to ExpireOldKnownAssignedTransactionIds() i.e. " and potentially reset lastOverflowedXid", as well as a comment on the ExpireOldKnownAssignedTransactionIds() function. -- Simon Riggs http://www.EnterpriseDB.com/
Hi! On Wed, Nov 3, 2021 at 8:51 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > It is however, an undocumented modularity violation. I think that is > acceptable because of the ProcArrayLock traffic, but needs to have a > comment to explain this at the call to > ExpireOldKnownAssignedTransactionIds() i.e. " and potentially reset > lastOverflowedXid", as well as a comment on the > ExpireOldKnownAssignedTransactionIds() function. Thank you for your feedback. Please find the revised patch attached. It incorporates this function comment changes altogether with minor editings and commit message. Let me know if you have further suggestions. I'm going to push this if no objections. ------ Regards, Alexander Korotkov
Attachment
Good catch on doing this in ExpireAllKnownAssignedTransactionIds() as well. Thanks. Looks good to me!
As Nikolay mentioned, I think this is an important bug that we are seeing in production and would appreciate a backport to v12 if possible.
On Wed, Nov 3, 2021 at 3:07 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hi!
On Wed, Nov 3, 2021 at 8:51 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> It is however, an undocumented modularity violation. I think that is
> acceptable because of the ProcArrayLock traffic, but needs to have a
> comment to explain this at the call to
> ExpireOldKnownAssignedTransactionIds() i.e. " and potentially reset
> lastOverflowedXid", as well as a comment on the
> ExpireOldKnownAssignedTransactionIds() function.
Thank you for your feedback. Please find the revised patch attached.
It incorporates this function comment changes altogether with minor
editings and commit message. Let me know if you have further
suggestions.
I'm going to push this if no objections.
------
Regards,
Alexander Korotkov
On Wed, 3 Nov 2021 at 22:07, Alexander Korotkov <aekorotkov@gmail.com> wrote: > > Hi! > > On Wed, Nov 3, 2021 at 8:51 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > It is however, an undocumented modularity violation. I think that is > > acceptable because of the ProcArrayLock traffic, but needs to have a > > comment to explain this at the call to > > ExpireOldKnownAssignedTransactionIds() i.e. " and potentially reset > > lastOverflowedXid", as well as a comment on the > > ExpireOldKnownAssignedTransactionIds() function. > > Thank you for your feedback. Please find the revised patch attached. > It incorporates this function comment changes altogether with minor > editings and commit message. Let me know if you have further > suggestions. > > I'm going to push this if no objections. Looks good, go for it. -- Simon Riggs http://www.EnterpriseDB.com/
At Thu, 4 Nov 2021 01:07:05 +0300, Alexander Korotkov <aekorotkov@gmail.com> wrote in > Hi! > > On Wed, Nov 3, 2021 at 8:51 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > It is however, an undocumented modularity violation. I think that is > > acceptable because of the ProcArrayLock traffic, but needs to have a > > comment to explain this at the call to > > ExpireOldKnownAssignedTransactionIds() i.e. " and potentially reset > > lastOverflowedXid", as well as a comment on the > > ExpireOldKnownAssignedTransactionIds() function. > > Thank you for your feedback. Please find the revised patch attached. > It incorporates this function comment changes altogether with minor > editings and commit message. Let me know if you have further > suggestions. > > I'm going to push this if no objections. Thanks for taking a look on and refining this, Simon and Alex! (while I was sick in bed X:) It looks good to me except the commit Author doesn't contain the name of Alexander Korotkov? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Hi! On Fri, Nov 5, 2021 at 10:31 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > At Thu, 4 Nov 2021 01:07:05 +0300, Alexander Korotkov <aekorotkov@gmail.com> wrote in > > On Wed, Nov 3, 2021 at 8:51 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > > It is however, an undocumented modularity violation. I think that is > > > acceptable because of the ProcArrayLock traffic, but needs to have a > > > comment to explain this at the call to > > > ExpireOldKnownAssignedTransactionIds() i.e. " and potentially reset > > > lastOverflowedXid", as well as a comment on the > > > ExpireOldKnownAssignedTransactionIds() function. > > > > Thank you for your feedback. Please find the revised patch attached. > > It incorporates this function comment changes altogether with minor > > editings and commit message. Let me know if you have further > > suggestions. > > > > I'm going to push this if no objections. > > Thanks for taking a look on and refining this, Simon and Alex! (while > I was sick in bed X:) > > It looks good to me except the commit Author doesn't contain the name > of Alexander Korotkov? Thank you for the suggestion. And thanks to everybody for the feedback. Pushed! ------ Regards, Alexander Korotkov
At Sat, 6 Nov 2021 19:16:09 +0300, Alexander Korotkov <aekorotkov@gmail.com> wrote in > Pushed! Thanks! -- Kyotaro Horiguchi NTT Open Source Software Center