Handle interrupts while waiting on Append's async subplans - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Handle interrupts while waiting on Append's async subplans
Date
Msg-id 37a40570-f558-40d3-b5ea-5c2079b3b30b@iki.fi
Whole thread Raw
List pgsql-hackers
While working on the big interrupts refactoring, I noticed that 
ExecAppendAsyncEventWait() doesn't include WL_LATCH_SET in the wait set. 
As a result, it will not wake up on an interrupt like a query cancel 
request or sinval catchup.

I'm surprised this has gone unnoticed for so long. It's pretty easy to 
reproduce the issue with a foreign table over a view on "SELECT 
pg_sleep(100)", for example. Did I miss a bug report?

At first glance the fix is very straightforward; we just need to call 
AddWaitEventToSet() to wait for WL_LATCH_SET too. However, when I did 
that the straightforward way, the 'postgres_fdw' test got stuck in a 
busy-loop in ExecAppendAsyncGetNext(). 
postgresForeignAsyncConfigureWait() does this:

>     else if (pendingAreq->requestor != areq->requestor)
>     {
>         /*
>          * This is the case when the in-process request was made by another
>          * Append.  Note that it might be useless to process the request made
>          * by that Append, because the query might not need tuples from that
>          * Append anymore; so we avoid processing it to begin a fetch for the
>          * given request if possible.  If there are any child subplans of the
>          * same parent that are ready for new requests, skip the given
>          * request.  Likewise, if there are any configured events other than
>          * the postmaster death event, skip it.  Otherwise, process the
>          * in-process request, then begin a fetch to configure the event
>          * below, because we might otherwise end up with no configured events
>          * other than the postmaster death event.
>          */
>         if (!bms_is_empty(requestor->as_needrequest))
>             return;
>         if (GetNumRegisteredWaitEvents(set) > 1)
>             return;
>         process_pending_request(pendingAreq);
>         fetch_more_data_begin(areq);
>     }

When I added the WL_LATCH_SET event to the set before calling this, the 
check for "GetNumRegisteredWaitEvents(set) > 1" was always true.

I don't understand what that comment is trying to say. What is an 
"in-process request"? And why does postgres_fdw care what other async 
subplans are registered in the Append node? If it is legitimate for it 
to care, we probably should've abstracted that somehow, rather than 
expose that "GetNumRegisteredWaitEvents(set) == 1" means that there are 
no other events registered yet.

In any case, the attached patch works and seems like an easy fix for 
stable branches at least.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment

pgsql-hackers by date:

Previous
From: Sami Imseih
Date:
Subject: Re: Doc: clarify possibility of ephemeral discrepancies between state and wait_event in pg_stat_activity
Next
From: Robert Haas
Date:
Subject: Re: Add -k/--link option to pg_combinebackup