Thread: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 17828 Logged by: Alexander Lakhin Email address: exclusion@gmail.com PostgreSQL version: 15.2 Operating system: Ubuntu 22.04 Description: The following script: CREATE EXTENSION postgres_fdw; DO $d$ BEGIN EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname '$$||current_database()||$$', port '$$||current_setting('port')||$$', async_capable 'true' )$$; END; $d$; CREATE USER MAPPING FOR CURRENT_USER SERVER loopback; CREATE TABLE t(a int) PARTITION BY RANGE (a); CREATE TABLE pt1 (a int); INSERT INTO pt1 SELECT 1; CREATE FOREIGN TABLE fpt1 PARTITION OF t FOR VALUES FROM ( 0) TO (10) SERVER loopback OPTIONS (table_name 'pt1'); CREATE FOREIGN TABLE fpt2 PARTITION OF t FOR VALUES FROM (10) TO (20) SERVER loopback OPTIONS (table_name 'non_existent_table'); CREATE TABLE nt(a int); SELECT 'INSERT INTO nt SELECT * FROM t' FROM generate_series(1, 400) \gexec (assuming "ulimit -n" = 1024 and max_safe_fds / 3 = 329) invokes errors and an assertion failure: ... CONTEXT: remote SQL command: SELECT a FROM public.non_existent_table ERROR: relation "public.non_existent_table" does not exist CONTEXT: remote SQL command: SELECT a FROM public.non_existent_table ERROR: relation "public.non_existent_table" does not exist CONTEXT: remote SQL command: SELECT a FROM public.non_existent_table ERROR: epoll_create1 failed: Too many open files ERROR: epoll_create1 failed: Too many open files WARNING: AbortTransaction while in ABORT state server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. connection to server was lost ... Core was generated by `postgres: law regression [local] INSERT '. Program terminated with signal SIGABRT, Aborted. warning: Section `.reg-xstate/3229796' in core file too small. #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=139748276750272) at ./nptl/pthread_kill.c:44 44 ./nptl/pthread_kill.c: No such file or directory. (gdb) bt #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=139748276750272) at ./nptl/pthread_kill.c:44 #1 __pthread_kill_internal (signo=6, threadid=139748276750272) at ./nptl/pthread_kill.c:78 #2 __GI___pthread_kill (threadid=139748276750272, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 #3 0x00007f19adb0e476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #4 0x00007f19adaf47f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x000055f6436330b5 in ExceptionalCondition (conditionName=0x55f643c7b780 <str> "TransactionIdIsValid(proc->xid)", fileName=0x55f643c7e48c "procarray.c", lineNumber=683) at assert.c:66 #6 0x000055f642ae083c in ProcArrayEndTransaction (proc=0x7f19a9c1e160, latestXid=1060) at procarray.c:683 #7 0x000055f641180763 in AbortTransaction () at xact.c:2841 #8 0x000055f64118710f in AbortCurrentTransaction () at xact.c:3346 #9 0x000055f642c0f27a in PostgresMain (dbname=0x62900001b358 "regression", username=0x6250000020f8 "law") at postgres.c:4299 #10 0x000055f642765b37 in BackendRun (port=0x614000000240) at postmaster.c:4461 #11 0x000055f64275cd90 in BackendStartup (port=0x614000000240) at postmaster.c:4189 #12 0x000055f642756d01 in ServerLoop () at postmaster.c:1779 #13 0x000055f642753da3 in PostmasterMain (argc=3, argv=0x603000000670) at postmaster.c:1463 #14 0x000055f641f5a0fa in main (argc=3, argv=0x603000000670) at main.c:200 Here pgfdw_get_result() calls WaitLatchOrSocket() -> CreateWaitEventSet(), that fails due to a lack of file descriptors; as a result, AbortCurrentTransaction() called, that calls CallXactCallbacks() -> pgfdw_xact_callback() -> pgfdw_abort_cleanup() -> pgfdw_cancel_query() -> pgfdw_get_cleanup_result() -> WaitLatchOrSocket() -> CreateWaitEventSet(), that fails and AbortCurrentTransaction() called again...
Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes: > The following script: > [ leaks a file descriptor per error ] Yeah, at least on platforms where WaitEventSets own kernel file descriptors. I don't think it's postgres_fdw's fault though, but that of ExecAppendAsyncEventWait, which is ignoring the possibility of failing partway through. It looks like it'd be sufficient to add a PG_CATCH or PG_FINALLY block there to make sure the WaitEventSet is disposed of properly --- fortunately, it doesn't need to have any longer lifespan than that one function. regards, tom lane
Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
From
Alexander Lakhin
Date:
Hello, 09.03.2023 19:57, Tom Lane wrote: > PG Bug reporting form <noreply@postgresql.org> writes: >> The following script: >> [ leaks a file descriptor per error ] > Yeah, at least on platforms where WaitEventSets own kernel file > descriptors. I don't think it's postgres_fdw's fault though, > but that of ExecAppendAsyncEventWait, which is ignoring the > possibility of failing partway through. It looks like it'd be > sufficient to add a PG_CATCH or PG_FINALLY block there to make > sure the WaitEventSet is disposed of properly --- fortunately, > it doesn't need to have any longer lifespan than that one > function. Now that the leakage eliminated by 50c67c201/481d7d1c0 we still can observe the assert-triggering half of the bug with something like that: echo " CREATE EXTENSION postgres_fdw; SELECT id INTO t FROM generate_series(1,100000) id; " | psql ns=322 # assuming "ulimit -n" = 1024 and max_safe_fds / 3 = 329 for ((i=1;i<=$ns;i++)); do echo " DO \$d\$ BEGIN EXECUTE \$\$CREATE SERVER loopback_$i FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( dbname '\$\$||current_database()||\$\$', port '\$\$||current_setting('port')||\$\$' )\$\$; END; \$d\$; CREATE USER MAPPING FOR CURRENT_USER SERVER loopback_$i; CREATE FOREIGN TABLE ft_$i (id int) SERVER loopback_$i OPTIONS (table_name 't'); " done | psql >psql-1.log (echo "BEGIN;"; for ((i=1;i<=$ns;i++)); do echo "DECLARE cursor_$i CURSOR FOR SELECT * FROM ft_$i;"; done echo "SELECT * INTO nt FROM ft_1 UNION SELECT * FROM ft_2;" ) | psql >psql-2.log That leads to: 2023-11-24 03:03:50.281 UTC [1053419] ERROR: epoll_create1 failed: Too many open files 2023-11-24 03:03:50.281 UTC [1053419] STATEMENT: SELECT * INTO nt FROM ft_1 UNION SELECT * FROM ft_2; 2023-11-24 03:03:50.282 UTC [1053419] ERROR: epoll_create1 failed: Too many open files 2023-11-24 03:03:50.282 UTC [1053419] WARNING: AbortTransaction while in ABORT state TRAP: failed Assert("TransactionIdIsValid(proc->xid)"), File: "procarray.c", Line: 677, PID: 1053419 Call chains for the errors are the same as in the initial report [1]. [1] https://www.postgresql.org/message-id/17828-122da8cba23236be%40postgresql.org Best regards, Alexander
Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
From
Etsuro Fujita
Date:
On Fri, Nov 24, 2023 at 1:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > 09.03.2023 19:57, Tom Lane wrote: > > PG Bug reporting form <noreply@postgresql.org> writes: > >> The following script: > >> [ leaks a file descriptor per error ] > > Yeah, at least on platforms where WaitEventSets own kernel file > > descriptors. I don't think it's postgres_fdw's fault though, > > but that of ExecAppendAsyncEventWait, which is ignoring the > > possibility of failing partway through. It looks like it'd be > > sufficient to add a PG_CATCH or PG_FINALLY block there to make > > sure the WaitEventSet is disposed of properly --- fortunately, > > it doesn't need to have any longer lifespan than that one > > function. > > Now that the leakage eliminated by 50c67c201/481d7d1c0 we still can observe > the assert-triggering half of the bug with something like that: Will look into this. Thanks for the report! My apologies for my very late response to this thread. I overlooked it completely. :( Best regards, Etsuro Fujita
Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
From
Etsuro Fujita
Date:
On Mon, Nov 27, 2023 at 12:05 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > On Fri, Nov 24, 2023 at 1:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > > Now that the leakage eliminated by 50c67c201/481d7d1c0 we still can observe > > the assert-triggering half of the bug with something like that: > > Will look into this. I finally had time to look into this. IIUC I think the assertion failure was caused by an error-during-error-recovery loop caused by the "epoll_create1 failed: Too many open files" error raised in WaitLatchOrSocket called from pgfdw_get_cleanup_result, which is called during abort cleanup. I think a simple fix to avoid such a loop is to modify the PG_CATCH block in pgfdw_get_cleanup_result so that it just ignores the passed error, not re-throwing it, and restores InterruptHoldoffCount and the memory context, like the attached. In the patch I also modified callers of pgfdw_get_cleanup_result to issue a warning when ignoring the error. I might be missing something, though. Best regards, Etsuro Fujita
Attachment
Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
From
Tom Lane
Date:
Etsuro Fujita <etsuro.fujita@gmail.com> writes: > IIUC I think the assertion failure was caused by an > error-during-error-recovery loop caused by the "epoll_create1 failed: > Too many open files" error raised in WaitLatchOrSocket called from > pgfdw_get_cleanup_result, which is called during abort cleanup. I > think a simple fix to avoid such a loop is to modify the PG_CATCH > block in pgfdw_get_cleanup_result so that it just ignores the passed > error, not re-throwing it, and restores InterruptHoldoffCount and the > memory context, like the attached. In the patch I also modified > callers of pgfdw_get_cleanup_result to issue a warning when ignoring > the error. I might be missing something, though. I do not think ignoring the passed error is *ever* acceptable. You have no idea what the error condition is or whether your hack is sufficient to recover from it. regards, tom lane
Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
From
Andres Freund
Date:
Hi, On 2024-02-08 12:04:24 -0500, Tom Lane wrote: > Etsuro Fujita <etsuro.fujita@gmail.com> writes: > > IIUC I think the assertion failure was caused by an > > error-during-error-recovery loop caused by the "epoll_create1 failed: > > Too many open files" error raised in WaitLatchOrSocket called from > > pgfdw_get_cleanup_result, which is called during abort cleanup. I > > think a simple fix to avoid such a loop is to modify the PG_CATCH > > block in pgfdw_get_cleanup_result so that it just ignores the passed > > error, not re-throwing it, and restores InterruptHoldoffCount and the > > memory context, like the attached. In the patch I also modified > > callers of pgfdw_get_cleanup_result to issue a warning when ignoring > > the error. I might be missing something, though. > > I do not think ignoring the passed error is *ever* acceptable. > You have no idea what the error condition is or whether your > hack is sufficient to recover from it. +1 I think we ought to understand *why* we are getting the "Too many open files". The AcquireExternalFD() in CreateWaitEventSet() should prevent that. One annoying bit is that AcquireExternalFD() failing emits the same error as if epoll_create1() itself failing, including the same errno. So we don't know if the problem is that there are too many connections and because of that we are running out of "file descriptor slots", or whether something was using up file descriptors outside of our system, or ... I also wonder if postgres_fdw should strive to use a longer lived wait event set. For efficiency, if nothing else? That'd avoid the need to create one during error handling. Greetings, Andres Freund
Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes: > I also wonder if postgres_fdw should strive to use a longer lived wait event > set. For efficiency, if nothing else? That'd avoid the need to create one > during error handling. Seems like a plausible idea. Having to acquire resources to clean up an error is an anti-pattern for sure. regards, tom lane
Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes: > I think we ought to understand *why* we are getting the "Too many open > files". The AcquireExternalFD() in CreateWaitEventSet() should prevent > that. Actually, I think the AcquireExternalFD() in CreateWaitEventSet() is *causing* that and needs to be removed. What is happening in Alexander's new example is that we are doing AcquireExternalFD() for each postgres_fdw connection (cf. libpqsrv_connect in libpq-be-fe-helpers.h), and the example is tuned to bring that exactly up to the limit of what AcquireExternalFD() allows. Then the next WaitLatchOrSocket call fails, because it does WaitEventSet *set = CreateWaitEventSet(CurrentResourceOwner, 3); Then when pgfdw_abort_cleanup tries to clean up the connections' state, it needs to do WaitLatchOrSocket again, and that fails again, and we PANIC because we're already in abort state. Since WaitLatchOrSocket is going to free this WaitEventSet before it returns, it's not apparent to me why we need to count it as a long-lived FD: we could just as well assume that it can slide in under the NUM_RESERVED_FDS limit. Or perhaps use ReserveExternalFD instead of AcquireExternalFD. We'd need some API extension to tell latch.c to do that, but that doesn't seem hard. (Unless we could consider that all WaitEventSets should use ReserveExternalFD? Not sure I want to argue for that though.) I guess a third possibility is that WaitLatchOrSocket could just permanently hang onto the WaitEventSet once it's got one. > One annoying bit is that AcquireExternalFD() failing emits the same error as > if epoll_create1() itself failing, including the same errno. It's the former. I tend to agree now that maybe using the same error text wasn't too smart. regards, tom lane
Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
From
Andres Freund
Date:
Hi, On 2024-02-08 18:50:32 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > I think we ought to understand *why* we are getting the "Too many open > > files". The AcquireExternalFD() in CreateWaitEventSet() should prevent > > that. > > Actually, I think the AcquireExternalFD() in CreateWaitEventSet() is > *causing* that and needs to be removed. > > What is happening in Alexander's new example is that we are doing > AcquireExternalFD() for each postgres_fdw connection > (cf. libpqsrv_connect in libpq-be-fe-helpers.h), and the example > is tuned to bring that exactly up to the limit of what > AcquireExternalFD() allows. Then the next WaitLatchOrSocket call > fails, because it does > > WaitEventSet *set = CreateWaitEventSet(CurrentResourceOwner, 3); > > Then when pgfdw_abort_cleanup tries to clean up the connections' > state, it needs to do WaitLatchOrSocket again, and that fails again, > and we PANIC because we're already in abort state. > > Since WaitLatchOrSocket is going to free this WaitEventSet before it > returns, it's not apparent to me why we need to count it as a > long-lived FD: we could just as well assume that it can slide in under > the NUM_RESERVED_FDS limit. Well, the AcquireExternalFD() is in more general code, that's also used for long-lived WaitEventSets - for those it's the right thing to count it as a long lived FD. > Or perhaps use ReserveExternalFD instead of AcquireExternalFD. We'd need > some API extension to tell latch.c to do that, but that doesn't seem hard. > (Unless we could consider that all WaitEventSets should use > ReserveExternalFD? Not sure I want to argue for that though.) Yea, I don't think we want that. > I guess a third possibility is that WaitLatchOrSocket could just > permanently hang onto the WaitEventSet once it's got one. Right now we don't support changing the socket FD associated with the WES, so that'd not easily work. We've been talking about adding support for that for a while though. I might be missing something here, but leaving the concrete crash aside, why is it ok for pgfdw_get_cleanup_result() etc to block during abort processing? If I read the code right, we can end up waiting for up to 2x30s for each connection, and there can be many connections. The code also has a bunch of memory allocations, so if we are in abort processing after an out-of-memory error, we can easily cause failures again as well. Greetings, Andres Freund
Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
From
Thomas Munro
Date:
On Fri, Feb 9, 2024 at 11:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andres Freund <andres@anarazel.de> writes: > > I also wonder if postgres_fdw should strive to use a longer lived wait event > > set. For efficiency, if nothing else? That'd avoid the need to create one > > during error handling. > > Seems like a plausible idea. Having to acquire resources to clean up > an error is an anti-pattern for sure. FWIW, WRT to the long lived WES idea, I have had patches to explore several different ways to do that (as I already did successfully for other areas), and I will write about that again later, but I concluded that we needed to fix this Windows nonsense first: https://commitfest.postgresql.org/46/3523/ (That got withdrawn for being too slow, but I'll resubmit it soon enough...)
Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes: > I might be missing something here, but leaving the concrete crash aside, why > is it ok for pgfdw_get_cleanup_result() etc to block during abort processing? It's not pretty, for sure. I thought briefly about postponing the cleanup until we next try to use the connection, but I fear the semantic side-effects of that would be catastrophic. We can't leave the remote's query sitting open long after the local transaction has been canceled --- that risks undetected deadlocks, at the least. I think all we can do is try to reduce the risk of failure during transaction cleanup. regards, tom lane
Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
From
Andres Freund
Date:
Hi, On 2024-02-08 19:20:35 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > I might be missing something here, but leaving the concrete crash aside, why > > is it ok for pgfdw_get_cleanup_result() etc to block during abort processing? > > It's not pretty, for sure. I thought briefly about postponing the > cleanup until we next try to use the connection, but I fear the > semantic side-effects of that would be catastrophic. We can't leave > the remote's query sitting open long after the local transaction has > been canceled --- that risks undetected deadlocks, at the least. > I think all we can do is try to reduce the risk of failure during > transaction cleanup. I agree that we can't just delay cleanup till, potentially, much later , but I don't think that means that we have to wait 30s for each connection, one-by-one. One way we could fix the issue at hand would be to make postgres fdw reserve one FD, for all connections, and release it before the WaitLatchOrSocket() and reacquire it after. That way we can make sure that there's an FD available. OTOH, as waiting for connections one-by-one isn't good, perhaps we should just rewrite the code to create one WES for all connections and wait in parallel, processing cancel/aborts completing as they complete. That'd make the abort less slow and it'd make the reserve-one-fd-for-postgres-fdw approach a bit less ugly. But unfortunately that's a bit big for a bugfix... Greetings, Andres Freund
Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes: > I agree that we can't just delay cleanup till, potentially, much later , but I > don't think that means that we have to wait 30s for each connection, > one-by-one. While that behavior is theoretically possible, I've heard no field complaints about it actually happening. I don't think we should move heaven and earth to prevent it, barring some evidence of it being a real-world issue. ISTM we do need to do something about this AcquireExternalFD failure though, because that's just self-inflicted damage. (And I think it's likely new since we invented WaitEventSets, though perhaps I'm wrong about that.) regards, tom lane
Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
From
Etsuro Fujita
Date:
Hi, On Fri, Feb 9, 2024 at 11:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > ISTM we do need to do something about this AcquireExternalFD failure > though, because that's just self-inflicted damage. (And I think it's > likely new since we invented WaitEventSets, though perhaps I'm wrong > about that.) +1; but my concern is that even if we fix that failure, it is still possible that WaitLatchOrSocket in pgfdw_get_cleanup_result throws an error, causing the same error-during-error-recovery loop. (For example, epoll_create1() can fail in CreateWaitEventSet in the WaitLatchOrSocket after reserving an external FD, leading to an error.) Best regards, Etsuro Fujita