On Fri, May 13, 2022 at 6:16 AM Japin Li <japinli@hotmail.com> wrote:
> The process cannot be terminated by pg_terminate_backend(), although
> it returns true.

pg_terminate_backend() just sends SIGTERM. What I'm wondering is what
happens when the stuck process receives SIGTERM. It would be useful, I
think, to check the value of the global variable InterruptHoldoffCount
in the stuck process by attaching to it with gdb. I would also try
running "strace -p $PID" on the stuck process and then try terminating
it again with pg_terminate_backend(). Either the system call in which
it's currently stuck returns and then it makes the same system call
again and hangs again ... or the signal doesn't dislodge it from the
system call in which it's stuck in the first place. It would be useful
to know which of those two things is happening.

One thing I find a bit curious is that the top of the stack in your
case is ioctl(). And there are no calls to ioctl() anywhere in
latch.c, nor have there ever been. What operating system is this? We
have 4 different versions of WaitEventSetWaitBlock() that call
epoll_wait(), kevent(), poll(), and WaitForMultipleObjects()
respectively. I wonder which of those we're using, and whether one of
those calls is showing up as ioctl() in the stack trace, or whether
there's some other function being called in here that is somehow
resulting in ioctl() getting called.
--
Robert Haas
EDB: http://www.enterprisedb.com