Re: Use-after-free issue in postgres_fdw - Mailing list pgsql-hackers
| From | Chao Li |
|---|---|
| Subject | Re: Use-after-free issue in postgres_fdw |
| Date | |
| Msg-id | A6EA4C4C-37ED-42F2-9FF5-01E815D8083F@gmail.com Whole thread Raw |
| In response to | Use-after-free issue in postgres_fdw (Etsuro Fujita <etsuro.fujita@gmail.com>) |
| Responses |
Re: Use-after-free issue in postgres_fdw
|
| List | pgsql-hackers |
> On Mar 19, 2026, at 22:56, Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > Hi, > > I got an offline report from my colleague Zhibai Song that > close_cursor() is called for a freed PGconn, leading to a server > crash. Here is a reproducer (the original reproducer he provided is a > bit complex, so I simplified it): > > create server loopback > foreign data wrapper postgres_fdw > options (dbname 'postgres'); > create user mapping for current_user server loopback; > create table t1 (id int, data text); > create foreign table ft1 (id int, data text) > server loopback options (table_name 't1'); > insert into ft1 values (1, 'foo'); > start transaction; > -- This caches the remote connection's PGconn in PgFdwScanState > declare c1 cursor for select * from ft1; > fetch c1; > id | data > ----+------ > 1 | foo > (1 row) > > savepoint s1; > select * from ft1; > id | data > ----+------ > 1 | foo > (1 row) > > select pid from pg_stat_activity > where datname = 'postgres' > and application_name = 'postgres_fdw'; > pid > ------- > 91853 > (1 row) > > -- This terminates the remote session > select pg_terminate_backend(91853); > pg_terminate_backend > ---------------------- > t > (1 row) > > -- This leaves the remote connection's changing_xact_state as true > rollback to s1; > savepoint s1; > -- This calls pgfdw_reject_incomplete_xact_state_change(), freeing > -- the remote connection's PGconn as changing_xact_state is true > select * from ft1; > ERROR: connection to server "loopback" was lost > rollback to s1; > -- This calls close_cursor() on the PGconn cached in PgFdwScanState, > -- which was freed above, leading to a server crash > close c1; > > I think the root cause is that it is too early to free the PGconn in > pgfdw_reject_incomplete_xact_state_change() even if the connection is > in a state where we cannot use it any further; I think we should delay > that until abort cleanup (ie, pgfdw_xact_callback()). Attached is a > patch for that. > > Best regards, > Etsuro Fujita > <fix-connection-handling-in-postgres-fdw.patch> Hi Etsuro-san, I can reproduce the server crash following your procedure, and I traced the problem. The issue is that, during select * from ft1, pgfdw_reject_incomplete_xact_state_change() calls disconnect_pg_server(), whichdestroys conn and sets ConnCacheEntry->conn = NULL, but does not update PgFdwScanState->conn. As a result, when "closec1" is executed later, PgFdwScanState->conn points to stale memory with random contents. I am not sure we should still allow further commands to run after select * from ft1, given that it has already raised: "ERROR: connection to server "loopback" was lost”. Maybe we should not keep going as if the connection were still there. I am not very familiar with the FDW code, so I am not ready to suggest a concrete fix. But it seems wrong to let later pathskeep using PgFdwScanState->conn after select * from ft1 has already failed with connection loss. My guess is that weeither need to invalidate all dependent state when disconnect_pg_server() runs, or otherwise prevent later cleanup pathsfrom touching the cached PGconn *. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
pgsql-hackers by date: