Re: Use-after-free issue in postgres_fdw - Mailing list pgsql-hackers

From Chao Li
Subject Re: Use-after-free issue in postgres_fdw
Date
Msg-id A6EA4C4C-37ED-42F2-9FF5-01E815D8083F@gmail.com
Whole thread Raw
In response to Use-after-free issue in postgres_fdw  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Responses Re: Use-after-free issue in postgres_fdw
List pgsql-hackers

> On Mar 19, 2026, at 22:56, Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
>
> Hi,
>
> I got an offline report from my colleague Zhibai Song that
> close_cursor() is called for a freed PGconn, leading to a server
> crash.  Here is a reproducer (the original reproducer he provided is a
> bit complex, so I simplified it):
>
> create server loopback
>    foreign data wrapper postgres_fdw
>    options (dbname 'postgres');
> create user mapping for current_user server loopback;
> create table t1 (id int, data text);
> create foreign table ft1 (id int, data text)
>    server loopback options (table_name 't1');
> insert into ft1 values (1, 'foo');
> start transaction;
> -- This caches the remote connection's PGconn in PgFdwScanState
> declare c1 cursor for select * from ft1;
> fetch c1;
> id | data
> ----+------
>  1 | foo
> (1 row)
>
> savepoint s1;
> select * from ft1;
> id | data
> ----+------
>  1 | foo
> (1 row)
>
> select pid from pg_stat_activity
>    where datname = 'postgres'
>      and application_name = 'postgres_fdw';
>  pid
> -------
> 91853
> (1 row)
>
> -- This terminates the remote session
> select pg_terminate_backend(91853);
> pg_terminate_backend
> ----------------------
> t
> (1 row)
>
> -- This leaves the remote connection's changing_xact_state as true
> rollback to s1;
> savepoint s1;
> -- This calls pgfdw_reject_incomplete_xact_state_change(), freeing
> -- the remote connection's PGconn as changing_xact_state is true
> select * from ft1;
> ERROR:  connection to server "loopback" was lost
> rollback to s1;
> -- This calls close_cursor() on the PGconn cached in PgFdwScanState,
> -- which was freed above, leading to a server crash
> close c1;
>
> I think the root cause is that it is too early to free the PGconn in
> pgfdw_reject_incomplete_xact_state_change() even if the connection is
> in a state where we cannot use it any further; I think we should delay
> that until abort cleanup (ie, pgfdw_xact_callback()).  Attached is a
> patch for that.
>
> Best regards,
> Etsuro Fujita
> <fix-connection-handling-in-postgres-fdw.patch>

Hi Etsuro-san,

I can reproduce the server crash following your procedure, and I traced the problem.

The issue is that, during select * from ft1, pgfdw_reject_incomplete_xact_state_change() calls disconnect_pg_server(),
whichdestroys conn and sets ConnCacheEntry->conn = NULL, but does not update PgFdwScanState->conn. As a result, when
"closec1" is executed later, PgFdwScanState->conn points to stale memory with random contents. 

I am not sure we should still allow further commands to run after select * from ft1, given that it has already raised:
"ERROR: connection to server "loopback" was lost”. Maybe we should not keep going as if the connection were still
there.

I am not very familiar with the FDW code, so I am not ready to suggest a concrete fix. But it seems wrong to let later
pathskeep using PgFdwScanState->conn after select * from ft1 has already failed with connection loss. My guess is that
weeither need to invalidate all dependent state when disconnect_pg_server() runs, or otherwise prevent later cleanup
pathsfrom touching the cached PGconn *. 

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Eliminating SPI / SQL from some RI triggers - take 3
Next
From: Ashutosh Sharma
Date:
Subject: Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication