> On Mar 21, 2026, at 19:40, Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
>
> Hi Chao,
>
> On Fri, Mar 20, 2026 at 5:27 PM Chao Li <li.evan.chao@gmail.com> wrote:
>> I can reproduce the server crash following your procedure, and I traced the problem.
>>
>> The issue is that, during select * from ft1, pgfdw_reject_incomplete_xact_state_change() calls
disconnect_pg_server(),which destroys conn and sets ConnCacheEntry->conn = NULL, but does not update
PgFdwScanState->conn.As a result, when "close c1" is executed later, PgFdwScanState->conn points to stale memory with
randomcontents.
>
> That is right.
>
>> I am not sure we should still allow further commands to run after select * from ft1, given that it has already
raised:"ERROR: connection to server "loopback" was lost”. Maybe we should not keep going as if the connection were
stillthere.
>
> The ERROR is thrown within a subtransaction, not within the main
> transaction, so we *should* allow commands (that don't use the
> connection) to run *after* rolling back to the subtransaction.
>
> (A transaction, like the example one, whose subtransactions failed
> abort cleanup due to a connection issue needs to abort in the end, as
> it can't commit the remote transaction anymore. Considering this
> limitation, the above behavior might be surprising to users. We might
> need to do something about this, but I think that that is another
> issue and thus should be discussed separately.)
Thanks for the explanation. Makes sense to me.
>
>> I am not very familiar with the FDW code, so I am not ready to suggest a concrete fix. But it seems wrong to let
laterpaths keep using PgFdwScanState->conn after select * from ft1 has already failed with connection loss.
>
> Why? While we can't use the connection any further, with the patch we
> can still use the cashed PGconn, as pointed out by Matheus downthread.
>
>> My guess is that we either need to invalidate all dependent state when disconnect_pg_server() runs, or otherwise
preventlater cleanup paths from touching the cached PGconn *.
>
> I don't think that that is a good idea, because that 1) would require
> invasive changes to the existing code, which isn't good especially for
> back-patching, as also mentioned by Matheus downthread, and 2) would
> result in something that is essentially the same as what I proposed;
> in other words, that would be a lot of effort with little result.
>
Makes sense. Then the patch looks good to me.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/