Sorry, I made a mistake about the tcp_user_timeout configuration. Our app sets it to 9000 (9 seconds), but it still errors out even with 9000 - it just takes a little longer to error.
And about this point:
=> I don’t actually know whether or if “buffer filling up” is accurate or relevant here. It doesn’t seem that way. You haven’t demonstrated that scenario here, just a timeout being reached.
First, I have captured a tcpdump trace, and the "TCP Window Full" packets in it seem to demonstrate that the TCP buffer is filling up.
Second, if the fetched rows are not sufficiently wide, the issue does not reproduce.
So I suspect that this connection timeout occurs because the TCP buffer is full.
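
A minimal sketch of the mechanism suspected above, assuming plain loopback TCP sockets rather than postgres_fdw: one side accepts a connection but never reads, and the other side sends until the kernel refuses more data. The function name and parameters are hypothetical and only illustrate how an unread receive buffer eventually stalls the sender. It does not reproduce the FDW case itself.

```python
import socket


def bytes_accepted_before_stall(chunk_size=64 * 1024, max_bytes=256 * 1024 * 1024):
    """Send to a peer that never reads.

    Returns (bytes_accepted, stalled). `stalled` is True once a non-blocking
    send() is refused because the kernel buffers are full.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free loopback port
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()  # accepted, but recv() is never called
    sender.setblocking(False)

    chunk = b"x" * chunk_size
    sent = 0
    stalled = False
    try:
        while sent < max_bytes:  # safety cap so the loop always ends
            try:
                sent += sender.send(chunk)
            except BlockingIOError:
                # Receive buffer and send buffer are both full: the
                # receiver's advertised window has closed.
                stalled = True
                break
    finally:
        sender.close()
        receiver.close()
        listener.close()
    return sent, stalled


if __name__ == "__main__":
    accepted, stalled = bytes_accepted_before_stall()
    print(f"accepted {accepted} bytes, stalled={stalled}")
```

The number of bytes accepted depends on the kernel's buffer sizing, so it differs between systems. Wider rows simply reach that limit sooner, which would be consistent with the issue not reproducing for narrow rows.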
At 2026-03-11 13:30:03, "David G. Johnston" <david.g.johnston@gmail.com> wrote:
On Tuesday, March 10, 2026, jiye <jiye_sw@126.com> wrote:
This is a minimal working example. In practice, if the local table scan takes too long and the foreign table has sufficiently wide rows, this issue may reproduce.
In my understanding, when performing a local sequential scan, the PostgreSQL backend fetches data from the local plan without fetching any data from the FDW. As a result, the TCP receive buffer may become full, causing the FDW connection to be disconnected.
I believe this is a minor issue. How can I resolve this problem?
Do not establish a timeout that the execution of the query cannot beat. Or, I think, at least ensure the non-async portion of the query can produce a row within the allotted time so the async node is polled within the timeout. IIUC, the general loop flow is: begin append, begin async, poll async, poll non-async, poll async, poll non-async, etc. There will usually be some lag between async polls. The TCP timeout has to be large enough to accommodate your reality. No different than if you used a statement timeout.
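
The loop flow described above can be sketched as a toy model. This is not PostgreSQL's executor code. It assumes that the async node is polled once at the start and then once after every non-async poll, and that the timeout counts from the previous async poll. The function name and inputs are hypothetical, and times are in milliseconds to match tcp_user_timeout.

```python
from typing import Optional, Sequence, Tuple


def first_gap_over_timeout(
    non_async_poll_ms: Sequence[int], timeout_ms: int
) -> Optional[Tuple[int, int]]:
    """Return (iteration, gap_ms) for the first async-poll gap that exceeds
    timeout_ms, or None if every gap stays within the timeout."""
    clock = 0
    last_async_poll = 0  # begin append, begin async, first poll async at t=0
    for i, cost in enumerate(non_async_poll_ms):
        clock += cost  # poll non-async: local work before the next async poll
        gap = clock - last_async_poll
        if gap > timeout_ms:
            return i, gap
        last_async_poll = clock  # poll async
    return None


if __name__ == "__main__":
    # The third local poll takes 12 s, longer than a 9 s timeout.
    print(first_gap_over_timeout([200, 500, 12000, 100], 9000))  # (2, 12000)
    print(first_gap_over_timeout([200, 500, 100], 9000))         # None
```

In this model the only thing that matters is the longest stretch of non-async work between two async polls, which is why the timeout has to be sized against the slowest local step rather than the average one.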
I don’t actually know whether or if “buffer filling up” is accurate or relevant here. It doesn’t seem that way. You haven’t demonstrated that scenario here, just a timeout being reached.
And since the main design point of async is that any of the async scans may be polled at any time, it is necessary for all such scans to be initialized before any polling begins, which starts the clock on all of them.
If you don’t want a connection timeout to happen do not set one. That’s the resolution here so far as I can tell.
David J.