Well, I've got this new code running, and it works. Sort of.
The postmaster and backend seem to be fine ... but psql has a tendency
to coredump right after sending a cancel request.
After digging into it, I found the problem: psql.c is
coded to invoke PQrequestCancel() directly from its SIGINT signal
handler. That was cool when the only thing PQrequestCancel() did
was to invoke send().
But now, PQrequestCancel requires allocating memory, opening a new
connection, sending some data, closing the connection, and freeing
memory.
On my machine, the C library is not reentrant, and if you try to do
this sort of stuff from a signal handler that has interrupted a call
to malloc() or printf() or some such, you can expect to crash.
I can see several alternatives, none very attractive:
1. Try to code the new PQrequestCancel so that it doesn't invoke
any likely-non-reentrant part of the C library. Difficult at best,
maybe impossible (is gethostbyname reentrant? I doubt it if malloc
isn't).
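A minimal sketch of what #1 might look like follows. It assumes the postmaster's address is resolved (with gethostbyname or whatever) and the request packet is built ahead of time in the main line, so the handler itself touches only socket(), connect(), send() and close(), which POSIX lists as async-signal-safe. The struct cancel_info, its placeholder packet contents and all function names are hypothetical, not libpq's.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Everything the handler needs, filled in beforehand from the main line,
 * where malloc() and gethostbyname() are fine.  Hypothetical layout.
 */
struct cancel_info {
    struct sockaddr_in addr;     /* postmaster address, already resolved */
    unsigned char packet[16];    /* prebuilt request (placeholder bytes) */
    size_t packet_len;
};

static struct cancel_info cancel_target;   /* static storage: no malloc */

/*
 * Send the prebuilt request using only async-signal-safe calls:
 * no malloc(), no stdio, no name lookup.  Returns 0 on success, -1 on failure.
 */
static int send_cancel_signal_safe(const struct cancel_info *ci)
{
    int saved_errno = errno;     /* a handler must not clobber errno */
    int result = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd >= 0) {
        if (connect(fd, (const struct sockaddr *) &ci->addr, sizeof ci->addr) == 0 &&
            send(fd, ci->packet, ci->packet_len, 0) == (ssize_t) ci->packet_len)
            result = 0;
        close(fd);
    }
    errno = saved_errno;
    return result;
}

static void on_sigint(int signo)
{
    (void) signo;
    (void) send_cancel_signal_safe(&cancel_target);
}

static int install_cancel_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}
```

One caveat this does not fix: connect() can still block for a long time if the postmaster is unreachable, and the handler would sit there until it gives up.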
2. Live with PQrequestCancel not being reentrant: code apps to call it
from the main line of control, not from a signal handler. The trouble is that
this makes it *substantially* harder to use. In psql.c, for example,
we could no longer use plain PQexec; we'd have to write some kind of
loop around the more primitive libpq functions, so that control would
block in psql.c while waiting for a backend response, rather than down
in the guts of libpq.
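For #2, the usual pattern is for the handler to do nothing but set a volatile sig_atomic_t flag, with the application's own wait loop noticing it and making the non-reentrant cancel call from the main line. This sketch assumes POSIX pselect() is available, which closes the race between testing the flag and going to sleep. request_cancel() is a hypothetical stand-in for PQrequestCancel() (here it only counts calls), and wait_for_reply() is a hypothetical stand-in for the loop psql.c would need around the lower-level libpq calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>

static volatile sig_atomic_t cancel_pending = 0;

static void on_sigint(int signo)
{
    (void) signo;
    cancel_pending = 1;          /* the handler only sets a flag */
}

/* Hypothetical stand-in for PQrequestCancel(); here it only counts calls. */
static int cancels_sent = 0;
static void request_cancel(void)
{
    cancels_sent++;
}

static int install_flag_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/*
 * Main-line wait: block until the backend socket is readable, then recv()
 * the reply.  SIGINT stays blocked except inside pselect(), so a signal
 * cannot slip in between testing the flag and going to sleep.
 * Returns recv()'s byte count, or -1 on error.
 */
static ssize_t wait_for_reply(int sock, char *buf, size_t len)
{
    sigset_t block, orig;
    ssize_t result = -1;

    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    if (sigprocmask(SIG_BLOCK, &block, &orig) < 0)
        return -1;

    for (;;) {
        fd_set rfds;
        int rc;

        if (cancel_pending) {
            cancel_pending = 0;
            request_cancel();    /* fine here: we are not in a handler */
        }
        FD_ZERO(&rfds);
        FD_SET(sock, &rfds);
        rc = pselect(sock + 1, &rfds, NULL, NULL, NULL, &orig);
        if (rc > 0) {
            result = recv(sock, buf, len, 0);
            break;
        }
        if (rc < 0 && errno != EINTR)
            break;
    }
    sigprocmask(SIG_SETMASK, &orig, NULL);
    return result;
}
```

This is exactly the extra machinery the option costs: the application, not libpq, has to own the blocking wait.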
3. Keep a connection to the postmaster open at all times so that
PQrequestCancel only needs to do a send() and not any of the hard
stuff. This is not good because it risks overflowing the number of
open files the postmaster process can have at one time. It also means
establishing two IPC connections rather than one during backend startup, which
is clearly a performance loss.
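With #3 the handler shrinks back to a single send() on a descriptor opened at connection time, roughly as sketched here. cancel_fd, cancel_msg and install_cancel_handler() are hypothetical names, and the message body is a placeholder rather than any real wire format. Note that the sketch does nothing about the extra open descriptor per client, which is the main objection to this option.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>

/* Connection opened at startup and kept for the whole session (hypothetical); -1 = none. */
static int cancel_fd = -1;
/* Prebuilt request; the contents are a placeholder, not the real protocol. */
static const char cancel_msg[] = "CANCEL";

static void on_sigint(int signo)
{
    int saved_errno = errno;     /* a handler must not clobber errno */

    (void) signo;
    if (cancel_fd >= 0)
        (void) send(cancel_fd, cancel_msg, sizeof cancel_msg - 1, 0);
    errno = saved_errno;
}

static int install_cancel_handler(int fd)
{
    struct sigaction sa;

    cancel_fd = fd;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}
```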
4. Stick with OOB-based cancels and live with the portability
limitations thereof.
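For comparison, an OOB-based cancel in the style of #4 is also just one send(), this time with MSG_OOB on the existing backend connection, which is why it was safe to call from a handler. The sketch assumes a TCP (AF_INET) connection, since urgent data is a TCP notion. backend_fd, the '!' byte and both function names are hypothetical, and the backend-side check uses poll() with POLLPRI as one possible way to notice urgent data.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <netinet/in.h>   /* backend_fd is assumed to be an AF_INET (TCP) socket */
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>

/* Existing client-to-backend connection (hypothetical global, set after connecting). */
static int backend_fd = -1;

/* Client side: a single send() of one urgent byte, which is async-signal-safe. */
static void on_sigint(int signo)
{
    int saved_errno = errno;

    (void) signo;
    if (backend_fd >= 0)
        (void) send(backend_fd, "!", 1, MSG_OOB);
    errno = saved_errno;
}

static int install_oob_cancel_handler(int fd)
{
    struct sigaction sa;

    backend_fd = fd;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/*
 * Backend side: wait up to timeout_ms for urgent data and consume the
 * out-of-band byte.  Returns 1 if the cancel byte arrived, 0 otherwise.
 */
static int oob_cancel_arrived(int fd, int timeout_ms)
{
    struct pollfd p;
    char c;

    p.fd = fd;
    p.events = POLLPRI;
    p.revents = 0;
    if (poll(&p, 1, timeout_ms) != 1 || !(p.revents & POLLPRI))
        return 0;
    return recv(fd, &c, 1, MSG_OOB) == 1 && c == '!';
}
```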
I will work on #1 but I am not very hopeful of success. Has anyone
got a better idea?
regards, tom lane