Michael Paquier <michael@paquier.xyz> writes:
> Based on the way the code is written on HEAD, this would be the
> correct assumption. Now, calling PQgetCancel() would return NULL for
> a connection that we actually ignore in the code a couple of lines
> above when it has PGINVALID_SOCKET. So it seems to me that the
> suggestion of using "cancelconn", which would be the first valid
> connection, rather than always the first connection, which may be
> using an invalid socket, is correct, so as we always have our hands
> on a way to cancel a command.
I came across this commit (52144b6fc) while writing release notes,
and I have to seriously question whether it's right yet. The thing
that needs to be asked is, if we get a SIGINT in a program using this
logic, why would we propagate a cancel to just one of the controlled
sessions and not all of them?
It looks to me like the original concept was that slot zero would be
a "master" connection, such that canceling just that one would have a
useful effect. Maybe the current users of parallel_slot.c still use
it like that, but I bet it's more likely that the connections are
all doing independent things and you really gotta cancel them all
if you want out.
I suppose maybe this commit improved matters: if you are running N jobs
then typing control-C N times (not too quickly) might eventually get
you out, by successively canceling the lowest-numbered surviving
connection. Previously you could have pounded the key all day and
not gotten rid of any but the zero'th task. OTOH, if the connections
don't exit but just go back to idle, which seems pretty possible,
then it's not clear we've moved the needle at all.
Anyway I think this needs rewritten, not just tweaked. The cancel.c
infrastructure is really next to useless here since it is only designed
with one connection in mind. I'm inclined to think we should only
expect the signal handler to set CancelRequested, and then manually
issue cancels to all live connections when we see that become set.
I'm not proposing reverting 52144b6fc, because I doubt it made
anything worse; but I'm thinking of leaving it out of the release
notes, because I'm unsure it had any user-visible effect at all.
It doesn't look to me like we'd ever get to wait_on_slots unless
all the connections are known busy.
regards, tom lane