the real question is: why on earth aren't the wait loops responding to SIGINT and SIGTERM? I wonder if there might be something funky about parallel query + statement timeouts.
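For readers unfamiliar with why a wait loop can ignore a signal: as far as I understand it, PostgreSQL's signal handlers only set a pending flag, and the backend acts on it the next time it reaches CHECK_FOR_INTERRUPTS(). The sketch below is a toy Python illustration of that pattern, not PostgreSQL code. The names `interrupt_pending`, `handle_sigint` and `wait_loop` are made up for the example, and it assumes it runs in the main thread of a Python 3.8+ process.

```python
import signal

# Hypothetical flag standing in for a "cancel pending" marker.
interrupt_pending = False


def handle_sigint(signum, frame):
    """Signal handler: only record that an interrupt arrived."""
    global interrupt_pending
    interrupt_pending = True


def wait_loop(check_interrupts, max_iterations=5):
    """Run a bounded wait loop that receives SIGINT partway through.

    Returns the iteration at which the loop noticed the interrupt,
    or None if the loop never looked at the flag.
    """
    global interrupt_pending
    interrupt_pending = False
    previous = signal.signal(signal.SIGINT, handle_sigint)
    try:
        for i in range(max_iterations):
            if i == 1:
                # Deliver SIGINT to this process, like a cancel request.
                signal.raise_signal(signal.SIGINT)
            # The loop only reacts if it explicitly checks the flag.
            if check_interrupts and interrupt_pending:
                return i
        return None
    finally:
        # Restore whatever handler was installed before the demo.
        signal.signal(signal.SIGINT, previous)
```

Calling `wait_loop(True)` returns once the flag is seen, while `wait_loop(False)` runs to completion even though the signal was delivered, which is the kind of behaviour being asked about here.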
Agreed. It seems like a backtrace wouldn't help much. I saw the other thread with similar cancellation issues, so here are a couple of notes that might help:
1) I also have a lateral select inside a view. It seems doubtful that the lateral has anything to do with it, but in case it does, I thought I'd pass that along.
2) Are there any settings that could potentially help with this? For instance, this isn't on a replica, so max_standby_archive_delay wouldn't apply as a way to cancel the query more forcefully. Is there anything similar that could work here? As you noted, we've already set a statement timeout and the query isn't responding to it, but it does get cancelled when another (hung) process is SIGKILL-ed. When that happens the db goes into recovery mode, so is the query being sent SIGKILL at that point as well, or is it some other signal that is a little less invasive? Probably not, but I thought I'd ask.
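For reference, the less invasive options I'm aware of are the built-in functions pg_cancel_backend(pid), which sends SIGINT, and pg_terminate_backend(pid), which sends SIGTERM, i.e. the same two signals the wait loop appears to ignore. A minimal sketch of the escalation order follows. The helper name `escalation_statements` is hypothetical and it only builds the SQL text, it does not connect to a database.

```python
def escalation_statements(pid):
    """Return SQL statements to try, in order, against a stuck backend.

    `pid` is the backend process ID as shown in pg_stat_activity.
    """
    # Reject booleans explicitly, since bool is a subclass of int.
    if isinstance(pid, bool) or not isinstance(pid, int) or pid <= 0:
        raise ValueError("pid must be a positive integer")
    return [
        # Asks the backend to cancel its current query (SIGINT).
        f"SELECT pg_cancel_backend({pid});",
        # Asks the backend to exit, closing the session (SIGTERM).
        f"SELECT pg_terminate_backend({pid});",
    ]
```

Neither statement helps if the backend never checks for pending interrupts, which matches what we are seeing.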