Re: Parallel query: Use TopTransactionContext for ReinitializeParallelDSM() - Mailing list pgsql-hackers

From Jeevan Chalke
Subject Re: Parallel query: Use TopTransactionContext for ReinitializeParallelDSM()
Date
Msg-id CAM2+6=UTQfDEJy+5ckRbUDxovERbDGW8Q6vxwoh7Dz_wNPaqZA@mail.gmail.com
In response to Parallel query: Use TopTransactionContext for ReinitializeParallelDSM()  (Jakub Wartak <jakub.wartak@enterprisedb.com>)
List pgsql-hackers

Hi Jakub,

Thanks for the patch.

Although I was unable to reproduce the reported issue, a review of the code flow confirms that, to stay consistent with all other Parallel Query memory allocations, the context should be switched to TopTransactionContext. The change looks correct and serves as a necessary safeguard against potential future race conditions.

Thanks


On Mon, Dec 8, 2025 at 2:37 PM Jakub Wartak <jakub.wartak@enterprisedb.com> wrote:
Hi -hackers, we've got a report of a rare crash in EDB's PostgreSQL fork
which revealed that PostgreSQL's ReinitializeParallelDSM() allocates
memory from the wrong memory context (ExecutorState) instead of
TopTransactionContext.

We currently do not have a public reproducer for this in the PG
community version. When using Parallel Query, both pcxt->worker and
pcxt->worker[i].error_mqh are initially allocated from
TopTransactionContext (which appears to be correct), but only
initially: later on, once reuse of DSM/workers comes into play and
more data is involved, i.e. where
ExecParallelReinitialize() -> ReinitializeParallelDSM() is involved,
pcxt->worker[i].error_mqh ends up being reallocated from the
ExecutorState memory context, and that's the bug. In layman's terms,
ReinitializeParallelDSM() is typically reached via Nested Loop with
Gather (useful side note: this can be triggered more easily with
enable_material = off, as that reaches ReinitializeParallelDSM() much
faster).

Normally this does not cause problems, but it becomes problematic when
the query is cancelled, as ExecutorState might be pfreed depending on
what is inside PG_CATCH(): if SPI_finish() is there, ExecutorState
will be pfreed, and then pcxt->worker[i].error_mqh might later be
accessed during the SIGINT handling itself, via AbortTransaction ->
DestroyParallelContext(), which leads to a use-after-free.

The stack would be similar to: longjmp -> AbortCurrentTransaction()
-> AbortTransaction() -> AtEOXact_Parallel() ->
DestroyParallelContext() -> shm_mq_detach(). SPI_finish() is just the
example where we caught it (it releases ExecutorState).

To sum up it occurs in the following conditions:
1. parallel query involving Nested Loop with Gather
2. ReinitializeParallelDSM() being used
3. query cancellation
4. PG_CATCH() pfreeing ExecutorState

We have attached the patch. The issue is within shm_mq_attach(), but
we want to protect the whole function just in case, just like in
InitializeParallelDSM(). Thanks to Jeevan Chalke and Robert Haas for
help while debugging this.
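The attached patch is not reproduced here, but based on the description, the shape of the fix would mirror the pattern InitializeParallelDSM() already uses: switch to TopTransactionContext around the function body so that the reattached pcxt->worker[i].error_mqh survives ExecutorState being freed. An illustrative, non-runnable sketch of that pattern (not the actual patch):

```
void
ReinitializeParallelDSM(ParallelContext *pcxt)
{
    MemoryContext oldcontext;

    /* Allocations below must survive ExecutorState going away,
     * just as in InitializeParallelDSM(). */
    oldcontext = MemoryContextSwitchTo(TopTransactionContext);

    ... /* existing body: reset queues, shm_mq_attach() per worker,
           storing handles in pcxt->worker[i].error_mqh */

    MemoryContextSwitchTo(oldcontext);
}
```

With this, the SIGINT abort path (AbortTransaction -> DestroyParallelContext -> shm_mq_detach) dereferences memory that is still alive.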

-J.


--
Jeevan Chalke
Principal Engineer, Engineering Manager
Product Development


enterprisedb.com
