Thread: BUG #18354: Aborted transaction aborted during cleanup when temp_file_limit exceeded
BUG #18354: Aborted transaction aborted during cleanup when temp_file_limit exceeded
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 18354 Logged by: Alexander Lakhin Email address: exclusion@gmail.com PostgreSQL version: 16.2 Operating system: Ubuntu 22.04 Description: The following script: SET temp_file_limit = 100; CREATE TABLE t (a text); INSERT INTO t VALUES ('test'); SELECT 'INSERT INTO t SELECT * FROM t RETURNING *' FROM generate_series(1, 18) \gexec triggers two errors, a warning, and an assertion failure: ERROR: temporary file size exceeds temp_file_limit (100kB) WARNING: AbortTransaction while in ABORT state ERROR: temporary file size exceeds temp_file_limit (100kB) server closed the connection unexpectedly The backtrace for the first error is: 2024-02-21 11:40:06.947 UTC|law|regression|65d5e116.13cfcb|LOG: statement: INSERT INTO t SELECT * FROM t RETURNING * 2024-02-21 11:40:06.999 UTC|law|regression|65d5e116.13cfcb|ERROR: temporary file size exceeds temp_file_limit (100kB) 2024-02-21 11:40:06.999 UTC|law|regression|65d5e116.13cfcb|BACKTRACE: FileWrite at fd.c:2183:5 BufFileDumpBuffer at buffile.c:537:18 BufFileWrite at buffile.c:698:28 writetup_heap at tuplestore.c:1513:5 dumptuples at tuplestore.c:1210:35 tuplestore_puttuple_common at tuplestore.c:833:4 MemoryContextSwitchTo at palloc.h:142:23 (inlined by) tuplestore_puttupleslot at tuplestore.c:722:2 tstoreReceiveSlot_notoast at tstoreReceiver.c:129:1 ExecutePlan at execMain.c:1701:7 standard_ExecutorRun at execMain.c:365:3 ExecutorRun at execMain.c:310:1 ProcessQuery at pquery.c:165:5 PortalRunMulti at pquery.c:1293:7 FillPortalStore at pquery.c:1042:8 PortalRun at pquery.c:763:6 exec_simple_query at postgres.c:1282:3 PostgresMain at postgres.c:4641:27 report_fork_failure_to_client at postmaster.c:4242:1 BackendStartup at postmaster.c:4199:22 ServerLoop at postmaster.c:1788:6 main at main.c:185:3 For the second error: 2024-02-21 11:40:07.001 UTC|law|regression|65d5e116.13cfcb|ERROR: temporary file size exceeds temp_file_limit (100kB) 2024-02-21 11:40:07.001 UTC|law|regression|65d5e116.13cfcb|BACKTRACE: FileWrite at fd.c:2183:5 BufFileDumpBuffer at buffile.c:537:18 BufFileFlush at buffile.c:723:3 BufFileClose at buffile.c:419:9 tuplestore_end at tuplestore.c:459:5 MemoryContextSwitchTo at palloc.h:142:23 (inlined by) PortalDrop at portalmem.c:587:3 AtCleanup_Portals at portalmem.c:907:3 CleanupTransaction at xact.c:2903:2 AbortCurrentTransaction at xact.c:3341:18 PostgresMain at postgres.c:4356:6 report_fork_failure_to_client at postmaster.c:4242:1 BackendStartup at postmaster.c:4199:22 ServerLoop at postmaster.c:1788:6 main at main.c:185:3 For the failed assertion: 2024-02-21 11:40:07.001 UTC|law|regression|65d5e116.13cfcb|WARNING: AbortTransaction while in ABORT state TRAP: failed Assert("TransactionIdIsValid(proc->xid)"), File: "procarray.c", Line: 680, PID: 1298379 ExceptionalCondition at assert.c:52:13 ProcArrayEndTransactionInternal at procarray.c:742:2 (inlined by) ProcArrayEndTransaction at procarray.c:689:4 AbortTransaction at xact.c:2841:5 AbortCurrentTransaction at xact.c:3340:4 PostgresMain at postgres.c:4356:6 report_fork_failure_to_client at postmaster.c:4242:1 BackendStartup at postmaster.c:4199:22 ServerLoop at postmaster.c:1788:6 BackgroundWorkerInitializeConnection at postmaster.c:5604:1 main at main.c:185:3 Reproduced on REL_12_STABLE .. master.
Re: BUG #18354: Aborted transaction aborted during cleanup when temp_file_limit exceeded
From
Kyotaro Horiguchi
Date:
At Wed, 21 Feb 2024 12:00:01 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in > triggers two errors, a warning, and an assertion failure: > ERROR: temporary file size exceeds temp_file_limit (100kB) > WARNING: AbortTransaction while in ABORT state > ERROR: temporary file size exceeds temp_file_limit (100kB) > server closed the connection unexpectedly ... > For the second error: > 2024-02-21 11:40:07.001 UTC|law|regression|65d5e116.13cfcb|ERROR: temporary > file size exceeds temp_file_limit (100kB) > 2024-02-21 11:40:07.001 UTC|law|regression|65d5e116.13cfcb|BACKTRACE: > FileWrite at fd.c:2183:5 > BufFileDumpBuffer at buffile.c:537:18 > BufFileFlush at buffile.c:723:3 > BufFileClose at buffile.c:419:9 > tuplestore_end at tuplestore.c:459:5 > MemoryContextSwitchTo at palloc.h:142:23 > (inlined by) PortalDrop at portalmem.c:587:3 > AtCleanup_Portals at portalmem.c:907:3 Therefore, BufFileClose should not flush the content during error handling. In the first place, tuplestore doesn't need to flush the underlying files in _end and _clear. In this case, I would choose to change the general behavior of tuplestore. The attached PoC patch fixes the issue for me. It introduces a new "extended" function to control flushing, avoiding the addition of an unnatural parameter to BufFileClose. I suspect that it is usable in some other places, but I haven't checked that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
Re: BUG #18354: Aborted transaction aborted during cleanup when temp_file_limit exceeded
From
Tender Wang
Date:
When the first error was reported, we entered AbortTransaction(), proc->xid was changed to InvalidTransactionId in ProcArrayEndTransactionInternal().
After AbortTransaction() is done, we enter into CleanupTransaction(), and error would be reported again. So we enter AbortTransaction() again because
the transaction block state is still TBLOCK_STARTED. MyProc->xid now is InvalidTransactionId, so the assert failed.
Changing the behavior of tuplestore can work for this issue, but I'm not sure if this change will affect other components which depend on BufFile(like nodeMaterial)
Kyotaro Horiguchi <horikyota.ntt@gmail.com> 于2024年2月22日周四 10:46写道:
At Wed, 21 Feb 2024 12:00:01 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in
> triggers two errors, a warning, and an assertion failure:
> ERROR: temporary file size exceeds temp_file_limit (100kB)
> WARNING: AbortTransaction while in ABORT state
> ERROR: temporary file size exceeds temp_file_limit (100kB)
> server closed the connection unexpectedly
...
> For the second error:
> 2024-02-21 11:40:07.001 UTC|law|regression|65d5e116.13cfcb|ERROR: temporary
> file size exceeds temp_file_limit (100kB)
> 2024-02-21 11:40:07.001 UTC|law|regression|65d5e116.13cfcb|BACKTRACE:
> FileWrite at fd.c:2183:5
> BufFileDumpBuffer at buffile.c:537:18
> BufFileFlush at buffile.c:723:3
> BufFileClose at buffile.c:419:9
> tuplestore_end at tuplestore.c:459:5
> MemoryContextSwitchTo at palloc.h:142:23
> (inlined by) PortalDrop at portalmem.c:587:3
> AtCleanup_Portals at portalmem.c:907:3
Therefore, BufFileClose should not flush the content during error
handling. In the first place, tuplestore doesn't need to flush the
underlying files in _end and _clear. In this case, I would choose to
change the general behavior of tuplestore. The attached PoC patch
fixes the issue for me. It introduces a new "extended" function to
control flushing, avoiding the addition of an unnatural parameter to
BufFileClose. I suspect that it is usable in some other places, but I
haven't checked that.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Tender Wang
OpenPie: https://en.openpie.com/Re: BUG #18354: Aborted transaction aborted during cleanup when temp_file_limit exceeded
From
Kyotaro Horiguchi
Date:
At Thu, 22 Feb 2024 11:49:29 +0800, Tender Wang <tndrwang@gmail.com> wrote in > Changing the behavior of tuplestore can work for this issue, but I'm not > sure if this change will affect other components which depend on > BufFile(like nodeMaterial) Right. Regarding this type of error, the best we can do is to be careful not to cause exceptions (elog/ereport(>=ERROR)) within the abort code path. This is somewhat similar to the case of the interaction between malloc() and signal handler. First I tried to skip flushing in BufFileClose based on transaction state and isInterXact state. However, that didn't seem that simple. We could provide some flag to signal we're in the state where we cannot error out, which BufFileClose refers to. That in turn can cause other issues. I mentioned that I haven't checked for the similar potential issues BufFileClose can cause, but if nodeMaterial is that, the new function could be usable for that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: BUG #18354: Aborted transaction aborted during cleanup when temp_file_limit exceeded
From
Kyotaro Horiguchi
Date:
At Thu, 22 Feb 2024 14:38:15 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > At Thu, 22 Feb 2024 11:49:29 +0800, Tender Wang <tndrwang@gmail.com> wrote in > > Changing the behavior of tuplestore can work for this issue, but I'm not > > sure if this change will affect other components which depend on > > BufFile(like nodeMaterial) In the case of nodeMaterial, tuplestore is simply discarded and underlying files are removed without flushing by the resource owner mechanism. On the other hand, in the case of this bug report, the files in the problem tuplestore are not managed by a resource owner. As a result, PortalDrop have to explicitly call tuplestore_end to release resources used by the tuplestore. > /* > * Delete tuplestore if present. We should do this even under error > * conditions; since the tuplestore would have been using cross- > * transaction storage, its temp files need to be explicitly deleted. > */ > if (portal->holdStore) > { > MemoryContext oldcontext; > > oldcontext = MemoryContextSwitchTo(portal->holdContext); > tuplestore_end(portal->holdStore); As far as I can see, there are no caller sites to tuplestore_end in the abort path or any error handling path. regards. -- Kyotaro Horiguchi NTT Open Source Software Center