Re: Don't use the deprecated and insecure PQcancel in our frontend tools anymore - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Don't use the deprecated and insecure PQcancel in our frontend tools anymore
Date
Msg-id b0e6a640-5aed-469f-867a-cc820b046798@iki.fi
Whole thread Raw
In response to Re: Don't use the deprecated and insecure PQcancel in our frontend tools anymore  ("Jelte Fennema-Nio" <postgres@jeltef.nl>)
List pgsql-hackers
On 17/03/2026 12:31, Jelte Fennema-Nio wrote:
> On Mon, 16 Mar 2026 at 10:57, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> +1. With a little extra effort, the cancellation can be made abortable
>> too, so that you don't need to wait for the TCP timeout. I.e when
>> ResetCancelConn() is called, the cancellation thread can immediately
>> call PQcancelReset().
> 
> I agree we could do that, but I don't think we should. Then we'd be
> getting into the exact situation where psql doesn't wait for an already
> in-flight cancel request to be processed by the server before sending
> the next query. i.e. while this would be fine if there's a network
> issue, it would be bad if the server is just slow to respond to the
> cancel request (e.g. because there's a PgBouncer in the middle that
> hasn't forwarded the request yet).

True, it can get messy fast. Hmm, perhaps you could check 
PQcancelStatus() and only cancel the cancellation if it hasn't sent the 
CancelRequest packet yet.

>> One a different topic, is there any guarantee on which thread will
>> receive the SIGINT? It matters because psql's cancel callback sometimes
>> calls longjmp(), which assumes that the signal handler is executed in
>> the main thread.
> 
> Good point, I had thought about whether this mattered, but hadn't
> considered the callbacks. Attached is v7 that makes sure the signal is
> always handled by the main thread by blocking SIGINT before creating the
> cancel thread.

The more I look into the cancel handling in psql and other client 
programs, the more I want to gouge my eyes out. It was pretty messy 
before this patch, too, and now also we have an extra thread to worry about.

For example, let's look at what happens at COPY IN, starting from 
HandleCopyResult(). It sets SetCancelConn(pset.db), and then calls 
handleCopyIn(). handleCopyIn() calls sigsetjmp(), enables the longjmp in 
the signal handler with "sigint_interrupt_enabled = true". and calls 
fgets().

So we rely on longjmp() to get out of the fgets(). Is that safe? I 
suppose, it's worked all these years. It's scary though, I would never 
do that in new code. And then you add threads to the mix... is it still 
safe in the presence of threads? glibc uses internal locks on FILE 
objects to make them thread-safe; if you longjmp() in the middle of 
fgets(), is the lock released? I suppose it doesn't matter if only one 
thread ever accesses it, but ugh.

Because the signal handler longjmp()'d out, it never calls PQcancel() 
(or wakes up the thread, with the patch) in that case, even though the 
cancel connection is currently armed with SetCancelConn(). Is that 
intentional? If you then immediately SIGINT again, then the signal 
handler will PQcancel(). I guess there's some logic to that; if the 
signal arrives while you're blocked on the input, sending the CopyEnd 
probably still works. Perhaps is should be documented that the callback 
is called first.

The cancel handling in wait_on_slots() in parallel_slot.c is surprising 
in a different way. It already uses async libpq calls and has a select() 
loop, but it still relies on the signal handler to do the cancellation. 
And it arbitrarily PQcancel()s only one of the connections it waits on.

The comments you added in your patches help a lot, explaining the three 
different ways that cancellation can be handled, thanks for that.

psql has a global variable, 'cancel_pressed', which is set in the signal 
handler callback. Is it redundant with 'CancelRequested' that's also set 
in the signal handler?


Some ideas on how to make this less confusing:

- Change pg_dump to use threads on Unix too.

- Extract the cancel_pipe stuff into a helper function, and reuse it in 
pg_dump and in wait_on_slots()

- Make the callback and SetCancelConn() mutually exclusive, so that when 
you call SetCancelConn(), it *replaces* the callback if any was set. To 
re-install it, call SetCancelCallback(). This would make them more 
symmetrical, and would remove the confusion on what happens with the 
active cancel connection if the callback longjmp()s. The 
SetCancelCallback()/ResetCancelCallback() calls would replace the 
sigint_interrupt_enabled variable.

- Instead of SetCancelConn() and ResetCancelConn(), have functions like 
"StartCancelInBackground(conn)" and "WaitBackgroundCancel()". The cancel 
callback would then call StartCancelInBackground()".

Not sure how much these really help, but maybe something to try out.

Then there's the idea of switching to async libpq calls and select() 
everywhere...

Another more radical idea: Could we move this functionality to libpq 
itself? Imagine that we had a function like PQrequestCancel(PGconn 
*conn) that just set a flag on the PGconn object to indicate that 
cancellation has been requested. When that flag is set, all blocking 
calls like PQexec() on the original connection drive the cancellation 
connection and sending the cancel request, instead of (or in addition 
to) polling the main connection. The new PQrequestCancel() would be safe 
to call from the signal handler, but then main thread would do all the 
work. That'd be very easy to use when applicable, but there's one pretty 
serious problem: it wouldn't work with async libpq calls and select() 
without more API changes. The internal cancel connection that's opened 
would have a different fd from the one returned by PQsocket(), so the 
application wouldn't know to poll it.

- Heikki




pgsql-hackers by date:

Previous
From: Lukas Fittl
Date:
Subject: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Next
From: Melanie Plageman
Date:
Subject: Re: EXPLAIN: showing ReadStream / prefetch stats