Home > mailing lists

Re:Re: Re: FDW connection drops with "Connection timed out" during async append query due to TCP receive buffer filling up - Mailing list pgsql-bugs

From	jiye
Subject	Re:Re: Re: FDW connection drops with "Connection timed out" during async append query due to TCP receive buffer filling up
Date	April 3 06:13:19
Msg-id	705358bc.2ead.19d5154b82f.Coremail.jiye_sw@126.com Whole thread Raw
In response to	Re: Re: FDW connection drops with "Connection timed out" during async append query due to TCP receive buffer filling up (Etsuro Fujita <etsuro.fujita@gmail.com>)
List	pgsql-bugs

Tree view

We have successfully reproduced this issue and gained a clearer understanding of its root cause. The application uses a cursor to fetch partial results in batches, with a delay between consecutive fetch operations. When the interval between two batches exceeds the tcp_user_timeout threshold, the connection is terminated unexpectedly.

In my analysis, during cursor-based queries, applications typically retrieve results in partial batches. If the number of rows fetched in a single batch is smaller than the number of rows scanned from the local table, the executor is unable to proceed with fetching rows from the foreign table. While we have attempted workarounds such as adjusting the fetch size, tuning TCP buffer parameters, and modifying the tcp_user_timeout value, these measures only mitigate the symptoms without addressing the underlying problem.

To achieve a fundamental resolution, I propose two potential solutions:

‌Alternate Row Fetching‌: Modify the executor to alternately retrieve rows from the local table and the foreign table, ensuring balanced data flow between the two data sources.
‌Asynchronous Tuple Storage‌: Implement a tuple storage mechanism to asynchronously cache results from the foreign table. This would allow the executor to fetch foreign table results into the storage buffer independently, preventing TCP window exhaustion and decoupling the dependency between local and foreign data retrieval.


At 2026-03-11 16:01:02, "Etsuro Fujita" <etsuro.fujita@gmail.com> wrote:
>On Wed, Mar 11, 2026 at 3:25 PM jiye <jiye_sw@126.com> wrote:
>> Sorry, I made a mistake about the tcp_user_timeout configuration. Our app sets it to 9000 (9 seconds), but it still errors out even with 9000 - it just takes a little longer to error.
>> And about this point :
>>   => I don’t actually know whether or if “buffer filling up” is accurate or relevant here.  It doesn’t seem that way.  You haven’t demonstrated that scenario here, just a timeout being reached.
>>   Actually i have caputured tcp dump firstly, and "tcp buffer filling up" seem to be demonstrated by "TCP windows full" packet."
>>   Secondly if data of fetch rows are not sufficiently wide, it does not reproduce this issue.
>>
>> So i suspect that the reason for this connection timeout is that the tcp buffer is full.
>
>I think this problem is not with async execution, but with your
>environment; if the root cause of it is “TCP windows full”, I think it
>might fix it to 1) retrieve only needed columns from the remote server
>and 2) decrease the fetch_size option for postgres_fdw.
>
>Best regards,
>Etsuro Fujita

pgsql-bugs by date:

From: surya poondla
Date: 03 April, 02:14:00
Subject: Re: BUG #19382: Server crash at __nss_database_lookup

From: Andrei Lepikhov
Date: 03 April, 10:17:08
Subject: pg_plan_advice fails when NestLoop outer side is Sort over FunctionScan

Re:Re: Re: FDW connection drops with "Connection timed out" during async append query due to TCP receive buffer filling up - Mailing list pgsql-bugs

Previous

Next