Thread: Race condition in TransactionIdIsInProgress
Hi hackers, Postgres first records state transaction in CLOG, then removes transaction from procarray and finally releases locks. But it can happen that transaction is marked as committed in CLOG, XMAX_COMMITTED bit is set in modified tuple but TransactionIdIsInProgress still returns true. As a result "t_xmin %u is uncommitted in tuple (%u,%u) to be updated in table" error is reported. TransactionIdIsInProgress first checks for cached XID status: /* * We may have just checked the status of this transaction, so if it is * already known to be completed, we can fall out without any access to * shared memory. */ if (TransactionIdIsKnownCompleted(xid)) { xc_by_known_xact_inc(); return false; } This status cache is updated by TransactionLogFetch. So once transaction status is fetched from CLOG, subsequent invocation of TransactionIdIsKnownCompleted will return true even if transaction is not removed from procarray. The following stack illustrates how it can happen: #6 0x000055ab7758bbaf in TransactionLogFetch (transactionId=4482) at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:88 #7 TransactionLogFetch (transactionId=4482) at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:52 #8 0x000055ab7758bc7d in TransactionIdDidAbort (transactionId=transactionId@entry=4482) at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:186 #9 0x000055ab77812a43 in TransactionIdIsInProgress (xid=4482) at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/storage/ipc/procarray.c:1587 #10 0x000055ab7753de79 in HeapTupleSatisfiesVacuumHorizon (htup=htup@entry=0x7ffe562cec50, buffer=buffer@entry=1817, dead_after=dead_after@entry=0x7ffe562ceb14) at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1281 #11 0x000055ab7753e2c5 in HeapTupleSatisfiesNonVacuumable (buffer=1817, snapshot=0x7ffe562cec70, htup=0x7ffe562cec50) at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1449 #12 HeapTupleSatisfiesVisibility (tup=tup@entry=0x7ffe562cec50, snapshot=snapshot@entry=0x7ffe562cec70, buffer=buffer@entry=1817) at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1804 #13 0x000055ab7752f3b5 in heap_hot_search_buffer (tid=tid@entry=0x7ffe562cec2a, relation=relation@entry=0x7fa57fbf17c8, buffer=buffer@entry=1817, snapshot=snapshot@entry=0x7ffe562cec70, heapTuple=heapTuple@entry=0x7ffe562cec50, all_dead=all_dead@entry=0x0, first_call=true) at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam.c:1802 #14 0x000055ab775370d8 in heap_index_delete_tuples (rel=0x7fa57fbf17c8, delstate=<optimized out>) at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam.c:7480 #15 0x000055ab7755767a in table_index_delete_tuples (delstate=0x7ffe562d0160, rel=0x2) at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/include/access/tableam.h:1327 #16 _bt_delitems_delete_check (rel=rel@entry=0x7fa57fbf4518, buf=buf@entry=1916, heapRel=heapRel@entry=0x7fa57fbf17c8, delstate=delstate@entry=0x7ffe562d0160) at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtpage.c:1541 #17 0x000055ab7754e475 in _bt_bottomupdel_pass (rel=rel@entry=0x7fa57fbf4518, buf=buf@entry=1916, heapRel=heapRel@entry=0x7fa57fbf17c8, newitemsz=20) at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtdedup.c:406 #18 0x000055ab7755010a in _bt_delete_or_dedup_one_page (rel=0x7fa57fbf4518, heapRel=0x7fa57fbf17c8, insertstate=0x7ffe562d0640, --Type <RET> for more, q to quit, c to continue without paging-- simpleonly=<optimized out>, checkingunique=<optimized out>, uniquedup=<optimized out>, indexUnchanged=true) at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:2763 #19 0x000055ab775548c3 in _bt_findinsertloc (heapRel=0x7fa57fbf17c8, stack=0x55ab7a1cde08, indexUnchanged=true, checkingunique=true, insertstate=0x7ffe562d0640, rel=0x7fa57fbf4518) at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:904 #20 _bt_doinsert (rel=rel@entry=0x7fa57fbf4518, itup=itup@entry=0x55ab7a1cde30, checkUnique=checkUnique@entry=UNIQUE_CHECK_YES, indexUnchanged=indexUnchanged@entry=true, heapRel=heapRel@entry=0x7fa57fbf17c8) at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:255 #21 0x000055ab7755a931 in btinsert (rel=0x7fa57fbf4518, values=<optimized out>, isnull=<optimized out>, ht_ctid=0x55ab7a1cd068, heapRel=0x7fa57fbf17c8, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true, indexInfo=0x55ab7a1cdbd0) at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtree.c:199 #22 0x000055ab7769ae59 in ExecInsertIndexTuples (resultRelInfo=resultRelInfo@entry=0x55ab7a1c88b8, slot=slot@entry=0x55ab7a1cd038, estate=estate@entry=0x55ab7a1c8430, update=update@entry=true, noDupErr=noDupErr@entry=false, specConflict=specConflict@entry=0x0, arbiterIndexes=0x0) at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/executor/execIndexing.c:415 heap_lock_tuple will not wait transaction completion if HEAP_XMAX_COMMITTED is set in tuple and returns TM_Updated: /* * Time to sleep on the other transaction/multixact, if necessary. * * If the other transaction is an update/delete that's already * committed, then sleeping cannot possibly do any good: if we're * required to sleep, get out to raise an error instead. ... The problem is not so easy to reproduce. I got it when performe commits of transaction with large number of subtransactions with several synchronous replicas. But it is possible to artificially introduce delay between RecordTransactionCommit() and ProcArrayEndTransaction(): diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index ca6f6d57d3..d095fd13b0 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -2187,6 +2187,7 @@ CommitTransaction(void) * durably commit. */ latestXid = RecordTransactionCommit(); + pg_usleep(random() % 100000); } else { Then the problem can be reproduced with the following pgbebch script: update.sql: \set aid random(1, 1000) select do_update(:aid,100) update function: create or replace function do_update(id integer, level integer) returns void as $$ begin begin if level > 0 then perform do_update(id, level-1); else update pgbench_accounts SET abalance = abalance + 1 WHERE aid = id; end if; exception WHEN OTHERS THEN raise notice '% %', SQLERRM, SQLSTATE; end; end; $$ language plpgsql; pgbench -i postgres pgbench -f update.sql -c 100 -T 100 -P 1 postgres
> On 10 Feb 2022, at 21:46, Konstantin Knizhnik <knizhnik@garret.ru> wrote: > As a result "t_xmin %u is uncommitted in tuple (%u,%u) to be updated in table" > error is reported. Wow, cool, that seem to be a solution to one more mysterious corruption thread - “reporting TID/table with corruption error"[0]. Thank you! Now it’s obvious that this is not a real data corruption. Maybe let’s remove corruption error code from the error? I hadbeen literally woken at night by this code few times in January. And do you have a plan how to fix the actual issue? Best regards, Andrey Borodin. [0] https://www.postgresql.org/message-id/flat/202108191637.oqyzrdtnheir%40alvherre.pgsql
On 2022-Feb-10, Andrey Borodin wrote: > > On 10 Feb 2022, at 21:46, Konstantin Knizhnik <knizhnik@garret.ru> wrote: > > As a result "t_xmin %u is uncommitted in tuple (%u,%u) to be updated in table" > > error is reported. > > Wow, cool, that seem to be a solution to one more mysterious > corruption thread - “reporting TID/table with corruption error" [0]. > Thank you! Ooh, great find. I'll have a look at this soon. Thanks for CCing me. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "I love the Postgres community. It's all about doing things _properly_. :-)" (David Garamond)
On Thu, Feb 10, 2022 at 11:46 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote: > Postgres first records state transaction in CLOG, then removes > transaction from procarray and finally releases locks. > But it can happen that transaction is marked as committed in CLOG, > XMAX_COMMITTED bit is set in modified tuple but > TransactionIdIsInProgress still returns true. As a result "t_xmin %u is > uncommitted in tuple (%u,%u) to be updated in table" > error is reported. This is exactly why the HeapTupleSatisfiesFOO() functions all test TransactionIdIsInProgress() first and TransactionIdDidCommit() only if it returns false. I don't think it's really legal to test TransactionIdDidCommit() without checking TransactionIdIsInProgress() first. And I bet that's exactly what this code is doing... -- Robert Haas EDB: http://www.enterprisedb.com
Hi, On 2022-02-10 15:21:27 -0500, Robert Haas wrote: > On Thu, Feb 10, 2022 at 11:46 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote: > > Postgres first records state transaction in CLOG, then removes > > transaction from procarray and finally releases locks. > > But it can happen that transaction is marked as committed in CLOG, > > XMAX_COMMITTED bit is set in modified tuple but > > TransactionIdIsInProgress still returns true. As a result "t_xmin %u is > > uncommitted in tuple (%u,%u) to be updated in table" > > error is reported. > > This is exactly why the HeapTupleSatisfiesFOO() functions all test > TransactionIdIsInProgress() first and TransactionIdDidCommit() only if > it returns false. I don't think it's really legal to test > TransactionIdDidCommit() without checking TransactionIdIsInProgress() > first. And I bet that's exactly what this code is doing... The problem is that the TransactionIdDidAbort() call is coming from within TransactionIdIsInProgress(). If I understand correctly, the problem is that the xid cache stuff doesn't correctly work for subtransactions and breaks TransactionIdIsInProgress(). For the problem to occur you need an xid that is for a subtransaction that suboverflowed and that is 'clog committed' but not 'procarray committed'. If that xid is tested twice with TransactionIdIsInProgress(), you will get a bogus result on the second call within the same session. The first TransactionIdIsInProgress() will go to /* * Step 4: have to check pg_subtrans. * * At this point, we know it's either a subtransaction of one of the Xids * in xids[], or it's not running. If it's an already-failed * subtransaction, we want to say "not running" even though its parent may * still be running. So first, check pg_xact to see if it's been aborted. */ xc_slow_answer_inc(); if (TransactionIdDidAbort(xid)) return false; and fetch that xid. The TransactionLogFetch() will set cachedFetchXid, because the xid's status is TRANSACTION_STATUS_COMMITTED. Because the transaction didn't abort TransactionIdIsInProgress() will go onto the the following loop, see the toplevel xid is in progress in the procarray and return true. The second call to TransactionIdIsInProgress() will see that the fetched transaction matches the cached transactionid and return false. /* * We may have just checked the status of this transaction, so if it is * already known to be completed, we can fall out without any access to * shared memory. */ if (TransactionIdIsKnownCompleted(xid)) { xc_by_known_xact_inc(); return false; } Without any improper uses of TransactionIdDidAbort()/TransactionIdDidCommit() before TransactionIdIsInProgress(). And other sessions will still return true. And even *this* session can return true again, if another transaction is checked that ends up caching another transaction status in cachedFetchXid. It looks to me like the TransactionIdIsKnownCompleted() path in TransactionIdIsInProgress() is just entirely broken. An prior TransactionLogFetch() can't be allowed to make subsequent TransactionIdIsInProgress() calls return wrong values. If I understand the problem correctly, a SELECT pg_xact_status(clog-committed-pgproc-in-progres) can make TransactionIdIsInProgress() return wrong values too. It's not just TransactionIdIsInProgress() itself. Looks like this problem goes back all the way back to commit 611b4393f22f2bb43135501cd6b7591299b6b453 Author: Tom Lane <tgl@sss.pgh.pa.us> Date: 2008-03-11 20:20:35 +0000 Make TransactionIdIsInProgress check transam.c's single-item XID status cache before it goes groveling through the ProcArray. In situations where the same recently-committed transaction ID is checked repeatedly by tqual.c, this saves a lot of shared-memory searches. And it's cheap enough that it shouldn't hurt noticeably when it doesn't help. Concept and patch by Simon, some minor tweaking and comment-cleanup by Tom. I think this may actually mean that the hot corruption problem fixed in commit 18b87b201f7 Author: Andres Freund <andres@anarazel.de> Date: 2021-12-10 20:12:26 -0800 Fix possible HOT corruption when RECENTLY_DEAD changes to DEAD while pruning. for 14/master is present in older branches too :(. Need to trace through the HTSV and pruning logic with a bit more braincells than currently available to be sure. Greetings, Andres Freund
Hi, On 2022-02-10 21:56:09 -0800, Andres Freund wrote: > I think this may actually mean that the hot corruption problem fixed in > > commit 18b87b201f7 > Author: Andres Freund <andres@anarazel.de> > Date: 2021-12-10 20:12:26 -0800 > > Fix possible HOT corruption when RECENTLY_DEAD changes to DEAD while pruning. > > for 14/master is present in older branches too :(. Need to trace through the > HTSV and pruning logic with a bit more braincells than currently available to > be sure. On second thought, there's probably sufficiently more direct corruption this can cause than corruption via hot pruning. Not that that's not a problem, but ... Inconsistent TransactionIdIsInProgress() result can wreak havoc quite broadly. Looks lik syncrep will make this a lot worse, because it can drastically increase the window between the TransactionIdCommitTree() and ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at least it might make it easier to write tests exercising this scenario... Greetings, Andres Freund
On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres@anarazel.de> wrote: > Looks lik syncrep will make this a lot worse, because it can drastically > increase the window between the TransactionIdCommitTree() and > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at > least it might make it easier to write tests exercising this scenario... Agreed TransactionIdIsKnownCompleted(xid) is only broken because the single item cache is set too early in some cases. The single item cache is important for performance, so we just need to be more careful about setting the cache. -- Simon Riggs http://www.EnterpriseDB.com/
On Fri, 11 Feb 2022 at 08:48, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres@anarazel.de> wrote: > > > Looks lik syncrep will make this a lot worse, because it can drastically > > increase the window between the TransactionIdCommitTree() and > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at > > least it might make it easier to write tests exercising this scenario... > > Agreed > > TransactionIdIsKnownCompleted(xid) is only broken because the single > item cache is set too early in some cases. The single item cache is > important for performance, so we just need to be more careful about > setting the cache. Something like this... fix_cachedFetchXid.v1.patch prevents the cache being set, but this fails! Not worked out why, yet. just_remove_TransactionIdIsKnownCompleted_call.v1.patch just removes the known offending call, passes make check, but IMHO leaves the same error just as likely by other callers. -- Simon Riggs http://www.EnterpriseDB.com/
Attachment
Hi, On 2022-02-11 13:48:59 +0000, Simon Riggs wrote: > On Fri, 11 Feb 2022 at 08:48, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > > > On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres@anarazel.de> wrote: > > > > > Looks lik syncrep will make this a lot worse, because it can drastically > > > increase the window between the TransactionIdCommitTree() and > > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at > > > least it might make it easier to write tests exercising this scenario... > > > > Agreed > > > > TransactionIdIsKnownCompleted(xid) is only broken because the single > > item cache is set too early in some cases. The single item cache is > > important for performance, so we just need to be more careful about > > setting the cache. > > Something like this... fix_cachedFetchXid.v1.patch prevents the cache > being set, but this fails! Not worked out why, yet. I don't think it's safe to check !TransactionIdKnownNotInProgress() in TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to do TransactionLogFetch() internally. > just_remove_TransactionIdIsKnownCompleted_call.v1.patch I think this might contain a different diff than what the name suggests? > just removes the known offending call, passes make check, but IMHO > leaves the same error just as likely by other callers. Which other callers are you referring to? To me it seems that the "real" reason behind this specific issue being nontrivial to fix and behind the frequent error of calling TransactionIdDidCommit() before TransactionIdIsInProgress() is that we've really made a mess out of the transaction status determination code / API. IMO the original sin is requiring the complicated if (TransactionIdIsCurrentTransactionId(xid) ... else if (TransactionIdIsInProgress(xid) ... else if (TransactionIdDidCommit(xid) ... dance at pretty much any place checking transaction status. The multiple calls for the same xid is, I think, what pushed the caching down to the TransactionLogFetch level. ISTM that we should introduce something like TransactionIdStatus(xid) that returns XACT_STATUS_CURRENT XACT_STATUS_IN_PROGRESS XACT_STATUS_COMMITTED XACT_STATUS_ABORTED accompanied by a TransactionIdStatusMVCC(xid, snapshot) that checks XidInMVCCSnapshot() instead of TransactionIdIsInProgress(). I think that'd also make visibility checks faster - we spend a good chunk of cycles doing unnecessary function calls and repeatedly gathering information. It's also kind of weird that TransactionIdIsCurrentTransactionId() isn't more optimized - TransactionIdIsInProgress() knows it doesn't need to check anything before RecentXmin, but TransactionIdIsCurrentTransactionId() doesn't. Nor does TransactionIdIsCurrentTransactionId() check if the xid is smaller than GetTopTransactionIdIfAny() before bsearching through subxids. Of course anything along these lines would never be backpatchable... Greetings, Andres Freund
Hi, On 2022-02-10 22:11:38 -0800, Andres Freund wrote: > Looks lik syncrep will make this a lot worse, because it can drastically > increase the window between the TransactionIdCommitTree() and > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at > least it might make it easier to write tests exercising this scenario... FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench r/w workload that's been modified to start 70 savepoints at the start shows pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR: t_xmin 3853739 is uncommitted in tuple (2,159)to be updated in table "pgbench_branches" pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR: t_xmin 3954305 is uncommitted in tuple (2,58) tobe updated in table "pgbench_branches" pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR: t_xmin 4017908 is uncommitted in tuple (3,44) tobe updated in table "pgbench_branches" after a few minutes of running with a local, not slowed down, syncrep. Without any other artifical slowdowns or such. Greetings, Andres Freund
Hi, On 2022-02-11 16:41:24 -0800, Andres Freund wrote: > FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench > r/w workload that's been modified to start 70 savepoints at the start shows > > pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR: t_xmin 3853739 is uncommitted in tuple (2,159)to be updated in table "pgbench_branches" > pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR: t_xmin 3954305 is uncommitted in tuple (2,58)to be updated in table "pgbench_branches" > pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR: t_xmin 4017908 is uncommitted in tuple (3,44)to be updated in table "pgbench_branches" > > after a few minutes of running with a local, not slowed down, syncrep. Without > any other artifical slowdowns or such. And this can easily be triggered even without subtransactions, in a completely reliable way. The only reason I'm so far not succeeding in turning it into an isolationtester spec is that a transaction waiting for SyncRep doesn't count as waiting for isolationtester. Basically S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; <commit wait for syncrep> S2: SELECT pg_xact_status($xid); S2: UPDATE; suffices, because the pg_xact_status() causes an xlog fetch, priming the xid cache, which then causes the TransactionIdIsInProgress() to take the early return path, despite the transaction still being in progress. Which then allows the update to proceed, despite the S1 not having "properly committed" yet. Greetings, Andres Freund
On Fri, 11 Feb 2022 at 19:08, Andres Freund <andres@anarazel.de> wrote: > On 2022-02-11 13:48:59 +0000, Simon Riggs wrote: > > On Fri, 11 Feb 2022 at 08:48, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > > > > > On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres@anarazel.de> wrote: > > > > > > > Looks lik syncrep will make this a lot worse, because it can drastically > > > > increase the window between the TransactionIdCommitTree() and > > > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at > > > > least it might make it easier to write tests exercising this scenario... > > > > > > Agreed > > > > > > TransactionIdIsKnownCompleted(xid) is only broken because the single > > > item cache is set too early in some cases. The single item cache is > > > important for performance, so we just need to be more careful about > > > setting the cache. > > > > Something like this... fix_cachedFetchXid.v1.patch prevents the cache > > being set, but this fails! Not worked out why, yet. > > I don't think it's safe to check !TransactionIdKnownNotInProgress() in > TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to > do TransactionLogFetch() internally. That's not correct because you're confusing TransactionIdKnownNotInProgress(), which is a new function introduced by the patch, with the existing function TransactionIdIsKnownCompleted(). > > just_remove_TransactionIdIsKnownCompleted_call.v1.patch > > I think this might contain a different diff than what the name suggests? Not at all, please don't be confused by the name. The patch removes the call to TransactionIdIsKnownCompleted() from TransactionIdIsInProgress(). I'm not sure it is possible to remove TransactionIdIsKnownCompleted() in backbranches. > > just removes the known offending call, passes make check, but IMHO > > leaves the same error just as likely by other callers. > > Which other callers are you referring to? I don't know of any, but we can't just remove a function in a backbranch, can we? > To me it seems that the "real" reason behind this specific issue being > nontrivial to fix and behind the frequent error of calling > TransactionIdDidCommit() before TransactionIdIsInProgress() is > that we've really made a mess out of the transaction status determination code > / API. IMO the original sin is requiring the complicated > > if (TransactionIdIsCurrentTransactionId(xid) > ... > else if (TransactionIdIsInProgress(xid) > ... > else if (TransactionIdDidCommit(xid) > ... > > dance at pretty much any place checking transaction status. Agreed, it's pretty weird to have to call the functions in the right order or you get the wrong answer. Especially since we have no cross-check to ensure the correct sequence was followed. > The multiple calls > for the same xid is, I think, what pushed the caching down to the > TransactionLogFetch level. > > ISTM that we should introduce something like TransactionIdStatus(xid) that > returns > XACT_STATUS_CURRENT > XACT_STATUS_IN_PROGRESS > XACT_STATUS_COMMITTED > XACT_STATUS_ABORTED > accompanied by a TransactionIdStatusMVCC(xid, snapshot) that checks > XidInMVCCSnapshot() instead of TransactionIdIsInProgress(). > > I think that'd also make visibility checks faster - we spend a good chunk of > cycles doing unnecessary function calls and repeatedly gathering information. > > > It's also kind of weird that TransactionIdIsCurrentTransactionId() isn't more > optimized - TransactionIdIsInProgress() knows it doesn't need to check > anything before RecentXmin, but TransactionIdIsCurrentTransactionId() > doesn't. Nor does TransactionIdIsCurrentTransactionId() check if the xid is > smaller than GetTopTransactionIdIfAny() before bsearching through subxids. > > Of course anything along these lines would never be backpatchable... Agreed I've presented a simple patch intended for backpatching. You may not like the name, but please have another look at "just_remove_TransactionIdIsKnownCompleted_call.v1.patch". I believe it is backpatchable with minimal impact and without loss of performance. -- Simon Riggs http://www.EnterpriseDB.com/
Hi, On 2022-02-12 13:25:58 +0000, Simon Riggs wrote: > On Fri, 11 Feb 2022 at 19:08, Andres Freund <andres@anarazel.de> wrote: > > > On 2022-02-11 13:48:59 +0000, Simon Riggs wrote: > > > On Fri, 11 Feb 2022 at 08:48, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > > > > > > > On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres@anarazel.de> wrote: > > > > > > > > > Looks lik syncrep will make this a lot worse, because it can drastically > > > > > increase the window between the TransactionIdCommitTree() and > > > > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at > > > > > least it might make it easier to write tests exercising this scenario... > > > > > > > > Agreed > > > > > > > > TransactionIdIsKnownCompleted(xid) is only broken because the single > > > > item cache is set too early in some cases. The single item cache is > > > > important for performance, so we just need to be more careful about > > > > setting the cache. > > > > > > Something like this... fix_cachedFetchXid.v1.patch prevents the cache > > > being set, but this fails! Not worked out why, yet. > > > > I don't think it's safe to check !TransactionIdKnownNotInProgress() in > > TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to > > do TransactionLogFetch() internally. > > That's not correct because you're confusing > TransactionIdKnownNotInProgress(), which is a new function introduced > by the patch, with the existing function > TransactionIdIsKnownCompleted(). I don't think so. This call of the new TransactionIdKnownNotInProgress() > --- a/src/backend/access/transam/transam.c > +++ b/src/backend/access/transam/transam.c > @@ -73,6 +73,14 @@ TransactionLogFetch(TransactionId transactionId) > return TRANSACTION_STATUS_ABORTED; > } > > + /* > + * Safeguard that we have called TransactionIsIdInProgress() before > + * checking commit log manager, to ensure that we do not cache the > + * result until the xid is no longer in the procarray at eoxact. > + */ > + if (!TransactionIdKnownNotInProgress(transactionId)) > + return TRANSACTION_STATUS_IN_PROGRESS; > + > /* > * Get the transaction status. > */ isn't right. Consider what happens to TransactionIdIsInProgress(x)'s call to TransactionIdDidAbort(x) at "step 4". TransactionIdDidAbort(x) will call TransactionLogFetch(x). cachedXidIsNotInProgress isn't yet set to x, so TransactionLogFetch() returns TRANSACTION_STATUS_IN_PROGRESS. Even if the (sub)transaction aborted. Which explains why your first patch doesn't work. > > > just_remove_TransactionIdIsKnownCompleted_call.v1.patch > > > > I think this might contain a different diff than what the name suggests? > > Not at all, please don't be confused by the name. The patch removes > the call to TransactionIdIsKnownCompleted() from > TransactionIdIsInProgress(). I guess for me "just remove" doesn't include adding a new cache... > I'm not sure it is possible to remove TransactionIdIsKnownCompleted() > in backbranches. I think it'd be fine if we needed to. Or we could just make it always return false or such. > > > just removes the known offending call, passes make check, but IMHO > > > leaves the same error just as likely by other callers. > > > > Which other callers are you referring to? > > I don't know of any, but we can't just remove a function in a > backbranch, can we? We have in the past, if it's a "sufficiently internal" function. Greetings, Andres Freund
On 12/02/2022 05:42, Andres Freund wrote: > On 2022-02-11 16:41:24 -0800, Andres Freund wrote: >> FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench >> r/w workload that's been modified to start 70 savepoints at the start shows >> >> pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR: t_xmin 3853739 is uncommitted in tuple (2,159)to be updated in table "pgbench_branches" >> pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR: t_xmin 3954305 is uncommitted in tuple (2,58)to be updated in table "pgbench_branches" >> pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR: t_xmin 4017908 is uncommitted in tuple (3,44)to be updated in table "pgbench_branches" >> >> after a few minutes of running with a local, not slowed down, syncrep. Without >> any other artifical slowdowns or such. > > And this can easily be triggered even without subtransactions, in a completely > reliable way. > > The only reason I'm so far not succeeding in turning it into an > isolationtester spec is that a transaction waiting for SyncRep doesn't count > as waiting for isolationtester. > > Basically > > S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; <commit wait for syncrep> > S2: SELECT pg_xact_status($xid); > S2: UPDATE; > > suffices, because the pg_xact_status() causes an xlog fetch, priming the xid > cache, which then causes the TransactionIdIsInProgress() to take the early > return path, despite the transaction still being in progress. Which then > allows the update to proceed, despite the S1 not having "properly committed" > yet. I started to improve isolationtester, to allow the spec file to specify a custom query to check for whether the backend is blocked. But then I realized that it's not quite enough: there is currently no way write the "$xid = txid_current()" step either. Isolationtester doesn't have variables like that. Also, you need to set synchronous_standby_names to make it block, which seems a bit weird in an isolation test. I wish we had an explicit way to inject delays or failures. We discussed it before (https://www.postgresql.org/message-id/flat/CANXE4TdxdESX1jKw48xet-5GvBFVSq%3D4cgNeioTQff372KO45A%40mail.gmail.com), but I don't want to pick up that task right now. Anyway, I wrote a TAP test for this instead. See attached. I'm not sure if this is worth committing, the pg_xact_status() trick is quite special for this particular bug. Simon's just_remove_TransactionIdIsKnownCompleted_call.v1.patch looks good to me. Replying to the discussion on that: On 12/02/2022 23:50, Andres Freund wrote: > On 2022-02-12 13:25:58 +0000, Simon Riggs wrote: >> I'm not sure it is possible to remove TransactionIdIsKnownCompleted() >> in backbranches. > > I think it'd be fine if we needed to. Or we could just make it always return > false or such. > > >>>> just removes the known offending call, passes make check, but IMHO >>>> leaves the same error just as likely by other callers. >>> >>> Which other callers are you referring to? >> >> I don't know of any, but we can't just remove a function in a >> backbranch, can we? > > We have in the past, if it's a "sufficiently internal" function. I think we should remove it in v15, and leave it as it is in backbranches. Just add a comment to point out that the name is a bit misleading, because it checks the clog rather than the proc array. It's not inherently dangerous, and someone might have a legit use case for it. True, someone might also be doing this incorrect thing with it, but still seems like the right balance to me. I think we also need to change pg_xact_status(), to also call TransactionIdIsInProgress() before TransactionIdDidCommit/Abort(). Currently, if it returns "committed", and you start a new transaction, the new transaction might not yet see the effects of that "committed" transaction. With that, you cannot reproduce the original bug with pg_xact_status() though, so there's no point in committing that test then. In summary, I think we should: - commit and backpatch Simon's just_remove_TransactionIdIsKnownCompleted_call.v1.patch - fix pg_xact_status() to check TransactionIdIsInProgress() - in v15, remove TransationIdIsKnownCompleted function altogether I'll try to get around to that in the next few days, unless someone beats me to it. - Heikki
Attachment
Hi, On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote: > On 12/02/2022 05:42, Andres Freund wrote: > > The only reason I'm so far not succeeding in turning it into an > > isolationtester spec is that a transaction waiting for SyncRep doesn't count > > as waiting for isolationtester. > > > > Basically > > > > S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; <commit wait for syncrep> > > S2: SELECT pg_xact_status($xid); > > S2: UPDATE; > > > > suffices, because the pg_xact_status() causes an xlog fetch, priming the xid > > cache, which then causes the TransactionIdIsInProgress() to take the early > > return path, despite the transaction still being in progress. Which then > > allows the update to proceed, despite the S1 not having "properly committed" > > yet. > > I started to improve isolationtester, to allow the spec file to specify a > custom query to check for whether the backend is blocked. But then I > realized that it's not quite enough: there is currently no way write the > "$xid = txid_current()" step either. Isolationtester doesn't have variables > like that. Also, you need to set synchronous_standby_names to make it block, > which seems a bit weird in an isolation test. I don't think we strictly need $xid - it likely can be implemented via something like SELECT pg_xact_status(backend_xid) FROM pg_stat_activity WHERE application_name = 'testname/s1'; > I wish we had an explicit way to inject delays or failures. We discussed it > before (https://www.postgresql.org/message-id/flat/CANXE4TdxdESX1jKw48xet-5GvBFVSq%3D4cgNeioTQff372KO45A%40mail.gmail.com), > but I don't want to pick up that task right now. Understandable :( > Anyway, I wrote a TAP test for this instead. See attached. I'm not sure if > this is worth committing, the pg_xact_status() trick is quite special for > this particular bug. Yea, not sure either. > Simon's just_remove_TransactionIdIsKnownCompleted_call.v1.patch looks good > to me. Replying to the discussion on that: > > On 12/02/2022 23:50, Andres Freund wrote: > > On 2022-02-12 13:25:58 +0000, Simon Riggs wrote: > > > I'm not sure it is possible to remove TransactionIdIsKnownCompleted() > > > in backbranches. > > > > I think it'd be fine if we needed to. Or we could just make it always return > > false or such. > > > > > > > > > just removes the known offending call, passes make check, but IMHO > > > > > leaves the same error just as likely by other callers. > > > > > > > > Which other callers are you referring to? > > > > > > I don't know of any, but we can't just remove a function in a > > > backbranch, can we? > > > > We have in the past, if it's a "sufficiently internal" function. > > I think we should remove it in v15, and leave it as it is in backbranches. > Just add a comment to point out that the name is a bit misleading, because > it checks the clog rather than the proc array. It's not inherently > dangerous, and someone might have a legit use case for it. True, someone > might also be doing this incorrect thing with it, but still seems like the > right balance to me. > > I think we also need to change pg_xact_status(), to also call > TransactionIdIsInProgress() before TransactionIdDidCommit/Abort(). > Currently, if it returns "committed", and you start a new transaction, the > new transaction might not yet see the effects of that "committed" > transaction. With that, you cannot reproduce the original bug with > pg_xact_status() though, so there's no point in committing that test then. > > In summary, I think we should: > - commit and backpatch Simon's > just_remove_TransactionIdIsKnownCompleted_call.v1.patch > - fix pg_xact_status() to check TransactionIdIsInProgress() > - in v15, remove TransationIdIsKnownCompleted function altogether > > I'll try to get around to that in the next few days, unless someone beats me > to it. Makes sense. Greetings, Andres Freund
On 24/06/2022 04:43, Andres Freund wrote: > On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote: >> In summary, I think we should: >> - commit and backpatch Simon's >> just_remove_TransactionIdIsKnownCompleted_call.v1.patch >> - fix pg_xact_status() to check TransactionIdIsInProgress() >> - in v15, remove TransationIdIsKnownCompleted function altogether >> >> I'll try to get around to that in the next few days, unless someone beats me >> to it. > > Makes sense. This is what I came up with for master. One difference from Simon's original patch is that I decided to not expose the TransactionIdIsKnownNotInProgress(), as there are no other callers of it in core, and it doesn't seem useful to extensions. I inlined it into the caller instead. BTW, should we worry about XID wraparound with the cache? Could you have a backend sit idle for 2^32 transactions, without making any TransactionIdIsKnownNotInProgress() calls? That's not new with this patch, though, it could happen with the single-item cache in transam.c, too. - Heikki
Attachment
On Sat, 25 Jun 2022 at 10:18, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 24/06/2022 04:43, Andres Freund wrote: > > On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote: > >> In summary, I think we should: > >> - commit and backpatch Simon's > >> just_remove_TransactionIdIsKnownCompleted_call.v1.patch > >> - fix pg_xact_status() to check TransactionIdIsInProgress() > >> - in v15, remove TransationIdIsKnownCompleted function altogether > >> > >> I'll try to get around to that in the next few days, unless someone beats me > >> to it. > > > > Makes sense. > > This is what I came up with for master. One difference from Simon's > original patch is that I decided to not expose the > TransactionIdIsKnownNotInProgress(), as there are no other callers of it > in core, and it doesn't seem useful to extensions. I inlined it into the > caller instead. Looks good, thanks. > BTW, should we worry about XID wraparound with the cache? Could you have > a backend sit idle for 2^32 transactions, without making any > TransactionIdIsKnownNotInProgress() calls? That's not new with this > patch, though, it could happen with the single-item cache in transam.c, too. Yes, as a separate patch only in PG15+, imho. -- Simon Riggs http://www.EnterpriseDB.com/
On 25/06/2022 13:10, Simon Riggs wrote: > On Sat, 25 Jun 2022 at 10:18, Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> >> On 24/06/2022 04:43, Andres Freund wrote: >>> On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote: >>>> In summary, I think we should: >>>> - commit and backpatch Simon's >>>> just_remove_TransactionIdIsKnownCompleted_call.v1.patch >>>> - fix pg_xact_status() to check TransactionIdIsInProgress() >>>> - in v15, remove TransationIdIsKnownCompleted function altogether >>>> >>>> I'll try to get around to that in the next few days, unless someone beats me >>>> to it. >>> >>> Makes sense. >> >> This is what I came up with for master. One difference from Simon's >> original patch is that I decided to not expose the >> TransactionIdIsKnownNotInProgress(), as there are no other callers of it >> in core, and it doesn't seem useful to extensions. I inlined it into the >> caller instead. > > Looks good, thanks. Committed and backpatched. Thanks! - Heikki