Re: Use-after-free issue in postgres_fdw - Mailing list pgsql-hackers

From Matheus Alcantara
Subject Re: Use-after-free issue in postgres_fdw
Date
Msg-id 65f59349-2ef6-4ddc-82a0-691ca30743e6@gmail.com
Whole thread Raw
In response to Re: Use-after-free issue in postgres_fdw  (Chao Li <li.evan.chao@gmail.com>)
List pgsql-hackers
On 20/03/26 05:27, Chao Li wrote:
> I can reproduce the server crash following your procedure, and I traced the problem.
> 
> The issue is that, during select * from ft1, pgfdw_reject_incomplete_xact_state_change() calls
disconnect_pg_server(),which destroys conn and sets ConnCacheEntry->conn = NULL, but does not update
PgFdwScanState->conn.As a result, when "close c1" is executed later, PgFdwScanState->conn points to stale memory with
randomcontents.
 
> 
> I am not sure we should still allow further commands to run after select * from ft1, given that it has already
raised:"ERROR:  connection to server "loopback" was lost”. Maybe we should not keep going as if the connection were
stillthere.
 
> 
> I am not very familiar with the FDW code, so I am not ready to suggest a concrete fix. But it seems wrong to let
laterpaths keep using PgFdwScanState->conn after select * from ft1 has already failed with connection loss. My guess is
thatwe either need to invalidate all dependent state when disconnect_pg_server() runs, or otherwise prevent later
cleanuppaths from touching the cached PGconn *.
 
> 

Although I agree with this I think that it will be a quite invasive 
change to fix this issue, considering that it should be back-ported.

I see two ways to implement this:

1. ForeignScanState->conn points to a ConnCacheEntry and it access the 
PGConn via ConnCacheEntry->conn, so it can check if != NULL before using.

2. Make pgfdw_reject_incomplete_xact_state_change() or 
disconnect_pg_server() receive a PgFdwConnState and add a new field on 
this state to represent that the connection is closed and them check 
this field on the proper code paths.

With the patch proposed on the previous email the server don't crash 
and any use of PgFdwScanState->conn will make the command to fail with 
something like this:

    ERROR:  08006: no connection to the server
    CONTEXT:  remote SQL command: CLOSE c1
    LOCATION:  pgfdw_report_internal, connection.c:1059


So I don't think that this change would be worth, or I'm missing 
something? What do you think?

--
Matheus Alcantara
EDB: https://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: Introduce XID age based replication slot invalidation
Next
From: Tom Lane
Date:
Subject: Re: TupleDescAttr bounds checks