Re: BUG #19366: heap-use-after-free in pgaio_io_reclaim() detected with RELCACHE_FORCE_RELEASE - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #19366: heap-use-after-free in pgaio_io_reclaim() detected with RELCACHE_FORCE_RELEASE
Msg-id an3xpqvvga47xpazihhdijpsuor4offvt2shctqdfwkwh7liye@k2cqhszxqwva
In response to BUG #19366: heap-use-after-free in pgaio_io_reclaim() detected with RELCACHE_FORCE_RELEASE  (PG Bug reporting form <noreply@postgresql.org>)
List pgsql-bugs
Hi,

On 2025-12-29 06:00:01 +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      19366
> Logged by:          Alexander Lakhin
> Email address:      exclusion@gmail.com
> PostgreSQL version: 18.1
> Operating system:   Ubuntu 24.04
> Description:

Alexander pinged me about this - thanks, I had missed this thread!


> =================================================================
> ==1414701==ERROR: AddressSanitizer: heap-use-after-free on address
> 0x52d000160a10 at pc 0x6315765530f4 bp 0x7fff3a67b6d0 sp 0x7fff3a67b6c0
> WRITE of size 8 at 0x52d000160a10 thread T0
>     #0 0x6315765530f3 in pgaio_io_reclaim
> .../src/backend/storage/aio/aio.c:698
>     #1 0x6315765523dd in pgaio_io_process_completion
> [...]
>     #5 0x6315765568ad in pgaio_closing_fd
> .../src/backend/storage/aio/aio.c:1279
>     #6 0x6315765bf4dc in FileClose .../src/backend/storage/file/fd.c:1975
>     #7 0x6315766d8285 in mdclose .../src/backend/storage/smgr/md.c:726
>     #8 0x6315766e3264 in smgrrelease .../src/backend/storage/smgr/smgr.c:356
>     #9 0x6315766e34af in smgrclose .../src/backend/storage/smgr/smgr.c:376
>     #10 0x631576ee2edb in RelationCloseSmgr
> ../../../../src/include/utils/rel.h:597
>     #11 0x631576efae6e in RelationInvalidateRelation
> .../src/backend/utils/cache/relcache.c:2527
>     #12 0x631576efb3f8 in RelationClearRelation
> .../src/backend/utils/cache/relcache.c:2560
>     #13 0x631576ef7582 in RelationCloseCleanup
> .../src/backend/utils/cache/relcache.c:2251
>     #14 0x631576f247bf in ResOwnerReleaseRelation
> [...]
>     #18 0x63157709ace5 in ResourceOwnerRelease
> .../src/backend/utils/resowner/resowner.c:661
>     #19 0x631574fd4ac1 in AbortTransaction
> (.../tmp_install/usr/local/pgsql/bin/postgres+0x3437cf4) (BuildId:
> fb9da6221fd034ea4004b34de480b536444e54b6)

The problem is that for reasons I can't quite fathom, relcache cleanup happens
way earlier in resowner cleanup than I had realized. The resowner cleanup then
can trigger waiting for the IO as part of closing file descriptors, which in
turn will reference memory that was freed below AtAbort_Portals().

Importantly, at that point we haven't yet done this bit from
ResourceOwnerReleaseInternal():

        while (!dlist_is_empty(&owner->aio_handles))
        {
            dlist_node *node = dlist_head_node(&owner->aio_handles);

            pgaio_io_release_resowner(node, !isCommit);
        }

which would have removed the reference to the local memory.
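
To make the ordering issue concrete, here is a toy model of it. This is not
PostgreSQL code: every name in it (ToyChunk, ToyHandle, toy_abort, etc.) is
hypothetical, and "freeing" is modeled with a flag so that the sketch itself
has no undefined behavior. It only illustrates that completion processing is
safe if the handle's reference is dropped before the completion runs, and
would touch freed memory otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for memory living in a portal's subsidiary context. */
typedef struct ToyChunk
{
    bool        freed;      /* set instead of calling free() */
    int         value;
} ToyChunk;

/* Toy stand-in for an AIO handle that still points at local memory. */
typedef struct ToyHandle
{
    ToyChunk   *local_ref;  /* NULL once the owner released the handle */
} ToyHandle;

/* Models subsidiary memory being released early during abort. */
static void
toy_delete_children(ToyChunk *chunk)
{
    chunk->freed = true;
}

/* Models the resowner step that removes the handle's reference. */
static void
toy_release_handle(ToyHandle *h)
{
    h->local_ref = NULL;
}

/* Models completion processing; returns true if it would hit freed memory. */
static bool
toy_reclaim(ToyHandle *h)
{
    if (h->local_ref == NULL)
        return false;       /* reference already dropped, nothing to touch */
    if (h->local_ref->freed)
        return true;        /* this is where a real use-after-free would be */
    h->local_ref->value = 0;
    return false;
}

/* Runs the abort steps in one of two orders and reports the outcome. */
static bool
toy_abort(bool release_handle_first)
{
    ToyChunk    chunk = {false, 42};
    ToyHandle   h = {&chunk};

    toy_delete_children(&chunk);    /* memory goes away early */
    if (release_handle_first)
        toy_release_handle(&h);     /* reference dropped before the wait */
    return toy_reclaim(&h);         /* closing the fd waits for the IO */
}
```

In this model toy_abort(false) corresponds to the order seen in the report
(the wait for the IO happens while the reference is still set), and
toy_abort(true) to the order in which the reference is removed first.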


Besides relcache cleanup happening early, I'm also somewhat surprised that
AtAbort_Portals() runs so early and that AtAbort_Portals() frees memory.
Note that

/*
 * Abort processing for portals.
 *
 * At this point we run the cleanup hook if present, but we can't release the
 * portal's memory until the cleanup call.
 */
void
AtAbort_Portals(void)

says that memory won't be released. Unfortunately, while that's kinda true, we
*do* already clean up some of the memory:
        /*
         * Although we can't delete the portal data structure proper, we can
         * release any memory in subsidiary contexts, such as executor state.
         * The cleanup hook was the last thing that might have needed data
         * there.  But leave active portals alone.
         */
        if (portal->status != PORTAL_ACTIVE)
            MemoryContextDeleteChildren(portal->portalContext);

Not yet quite sure how to best fix this.

Greetings,

Andres Freund


