On Mon, Jan 26, 2026 at 4:08 PM Andres Freund <andres@anarazel.de> wrote:
...
> For measuring particularly stuck things, I've been wondering about having a
> regular timer that starts to collect more information if stuck in a place for
> a while. That would probably end up being lower overhead than constantly
> measuring... But it would also be a lot more work.
Well, if something is really stuck, I think the wait events are covering us on that, aren't they? One can argue whether they carry enough information (for me they mostly do, but I'm trying to squeeze some more stuff into them in a nearby thread [1]. BTW, that thread is kind of "blocked" on the 56-bit relfilenode idea/question; any thoughts on that?)
One scenario where wait events won't help at all is if you have a backend stuck somewhere that's not calling CHECK_FOR_INTERRUPTS(). Or at least that was the case as of a few years ago; it wasn't an uncommon thing to see in a very large fleet. My guess is that such a backend also wouldn't be responding to internal timers though...
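To make the CHECK_FOR_INTERRUPTS() point concrete, here is a minimal standalone C sketch. It is not PostgreSQL code; it only assumes the usual pattern where a signal handler does nothing but set a flag, and the flag is acted on at explicit check points. The names interrupt_pending, handle_timer and run_work are hypothetical, and raise() stands in for a real timer so that the behavior is deterministic.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical stand-in for a "pending interrupt" flag set from a handler. */
static volatile sig_atomic_t interrupt_pending = 0;

/* The handler only records that the timer fired; it does no real work. */
static void
handle_timer(int signo)
{
    (void) signo;
    interrupt_pending = 1;
}

/*
 * Run `iterations` steps of simulated work. The simulated timer fires at
 * step `fire_at`. If check_every > 0, the loop polls the flag every
 * check_every steps; if check_every == 0, it never polls.
 *
 * Returns the step at which the loop noticed the timer, or -1 if it
 * finished without ever noticing.
 */
static long
run_work(long iterations, long fire_at, long check_every)
{
    assert(iterations >= 0 && check_every >= 0);

    interrupt_pending = 0;
    signal(SIGALRM, handle_timer);

    for (long i = 0; i < iterations; i++)
    {
        if (i == fire_at)
            raise(SIGALRM);     /* simulate the timer going off */

        /* The only place the flag is ever looked at. */
        if (check_every > 0 && i % check_every == 0 && interrupt_pending)
            return i;
    }

    return -1;
}
```

With these definitions, run_work(1000, 250, 100) returns 300, the first check point after the timer fired, while run_work(1000, 250, 0) returns -1: the handler ran and set the flag, but nothing ever looked at it. That is the sense in which a flag-based internal timer would not help with a backend stuck in code that never reaches a check point.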