Re: [EXTERNAL] Re: Add non-blocking version of PQcancel - Mailing list pgsql-hackers

From Jelte Fennema-Nio
Subject Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Date
Msg-id CAGECzQRvmUK5-d68A+cm+fgmfht9Dv2uZ28-qq3QiaF6EAZqPQ@mail.gmail.com
Whole thread Raw
In response to Re: [EXTERNAL] Re: Add non-blocking version of PQcancel  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
List pgsql-hackers
On Wed, 13 Mar 2024 at 04:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I suspect it's basically just a
> timing dependency.  Have you thought about the fact that a cancel
> request is a no-op if it arrives after the query's done?

I agree it's probably a timing issue. The cancel being received after
the query is done seems very unlikely, since the query takes 180
seconds (assuming PG_TEST_TIMEOUT_DEFAULT is not lowered for these
animals). I think it's more likely that the cancel request arrives too
early, and thus being ignored because no query is running yet. The
test already had logic to wait until the query backend was in the
"active" state, before sending a cancel to solve that issue. But my
guess is that that somehow isn't enough.

Sadly I'm having a hard time reliably reproducing this race condition
locally. So it's hard to be sure what is happening here. Attached is a
patch with a wild guess as to what the issue might be (i.e. seeing an
outdated "active" state and thus passing the check even though the
query is not running yet)

Attachment

pgsql-hackers by date:

Previous
From: shveta malik
Date:
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Next
From: Thomas Munro
Date:
Subject: Re: Vectored I/O in bulk_write.c