Re: Don't use the deprecated and insecure PQcancel in our frontend tools anymore - Mailing list pgsql-hackers
| From | Heikki Linnakangas |
|---|---|
| Subject | Re: Don't use the deprecated and insecure PQcancel in our frontend tools anymore |
| Date | |
| Msg-id | b0e6a640-5aed-469f-867a-cc820b046798@iki.fi Whole thread Raw |
| In response to | Re: Don't use the deprecated and insecure PQcancel in our frontend tools anymore ("Jelte Fennema-Nio" <postgres@jeltef.nl>) |
| List | pgsql-hackers |
On 17/03/2026 12:31, Jelte Fennema-Nio wrote: > On Mon, 16 Mar 2026 at 10:57, Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> +1. With a little extra effort, the cancellation can be made abortable >> too, so that you don't need to wait for the TCP timeout. I.e when >> ResetCancelConn() is called, the cancellation thread can immediately >> call PQcancelReset(). > > I agree we could do that, but I don't think we should. Then we'd be > getting into the exact situation where psql doesn't wait for an already > in-flight cancel request to be processed by the server before sending > the next query. i.e. while this would be fine if there's a network > issue, it would be bad if the server is just slow to respond to the > cancel request (e.g. because there's a PgBouncer in the middle that > hasn't forwarded the request yet). True, it can get messy fast. Hmm, perhaps you could check PQcancelStatus() and only cancel the cancellation if it hasn't sent the CancelRequest packet yet. >> One a different topic, is there any guarantee on which thread will >> receive the SIGINT? It matters because psql's cancel callback sometimes >> calls longjmp(), which assumes that the signal handler is executed in >> the main thread. > > Good point, I had thought about whether this mattered, but hadn't > considered the callbacks. Attached is v7 that makes sure the signal is > always handled by the main thread by blocking SIGINT before creating the > cancel thread. The more I look into the cancel handling in psql and other client programs, the more I want to gouge my eyes out. It was pretty messy before this patch, too, and now also we have an extra thread to worry about. For example, let's look at what happens at COPY IN, starting from HandleCopyResult(). It sets SetCancelConn(pset.db), and then calls handleCopyIn(). handleCopyIn() calls sigsetjmp(), enables the longjmp in the signal handler with "sigint_interrupt_enabled = true". and calls fgets(). So we rely on longjmp() to get out of the fgets(). Is that safe? I suppose, it's worked all these years. It's scary though, I would never do that in new code. And then you add threads to the mix... is it still safe in the presence of threads? glibc uses internal locks on FILE objects to make them thread-safe; if you longjmp() in the middle of fgets(), is the lock released? I suppose it doesn't matter if only one thread ever accesses it, but ugh. Because the signal handler longjmp()'d out, it never calls PQcancel() (or wakes up the thread, with the patch) in that case, even though the cancel connection is currently armed with SetCancelConn(). Is that intentional? If you then immediately SIGINT again, then the signal handler will PQcancel(). I guess there's some logic to that; if the signal arrives while you're blocked on the input, sending the CopyEnd probably still works. Perhaps is should be documented that the callback is called first. The cancel handling in wait_on_slots() in parallel_slot.c is surprising in a different way. It already uses async libpq calls and has a select() loop, but it still relies on the signal handler to do the cancellation. And it arbitrarily PQcancel()s only one of the connections it waits on. The comments you added in your patches help a lot, explaining the three different ways that cancellation can be handled, thanks for that. psql has a global variable, 'cancel_pressed', which is set in the signal handler callback. Is it redundant with 'CancelRequested' that's also set in the signal handler? Some ideas on how to make this less confusing: - Change pg_dump to use threads on Unix too. - Extract the cancel_pipe stuff into a helper function, and reuse it in pg_dump and in wait_on_slots() - Make the callback and SetCancelConn() mutually exclusive, so that when you call SetCancelConn(), it *replaces* the callback if any was set. To re-install it, call SetCancelCallback(). This would make them more symmetrical, and would remove the confusion on what happens with the active cancel connection if the callback longjmp()s. The SetCancelCallback()/ResetCancelCallback() calls would replace the sigint_interrupt_enabled variable. - Instead of SetCancelConn() and ResetCancelConn(), have functions like "StartCancelInBackground(conn)" and "WaitBackgroundCancel()". The cancel callback would then call StartCancelInBackground()". Not sure how much these really help, but maybe something to try out. Then there's the idea of switching to async libpq calls and select() everywhere... Another more radical idea: Could we move this functionality to libpq itself? Imagine that we had a function like PQrequestCancel(PGconn *conn) that just set a flag on the PGconn object to indicate that cancellation has been requested. When that flag is set, all blocking calls like PQexec() on the original connection drive the cancellation connection and sending the cancel request, instead of (or in addition to) polling the main connection. The new PQrequestCancel() would be safe to call from the signal handler, but then main thread would do all the work. That'd be very easy to use when applicable, but there's one pretty serious problem: it wouldn't work with async libpq calls and select() without more API changes. The internal cancel connection that's opened would have a different fd from the one returned by PQsocket(), so the application wouldn't know to poll it. - Heikki
pgsql-hackers by date: