Thread: Backends stuck in wait event IPC/MessageQueueInternal

Backends stuck in wait event IPC/MessageQueueInternal

From
Japin Li
Date:
Hi, hackers

I had an incident on my Postgres 14 where queries hung in wait event
IPC / MessageQueueInternal, MessageQueueReceive.  It looks like [1];
however, that thread doesn't have any discussion.

The process cannot be terminated by pg_terminate_backend(), although
it returns true.

Here is the call stack from pstack:

485073: /opt/local/pgsql/14/bin/postgres
 fffffb7fef216f4a ioctl     (d, d001, fffffb7fffdfa0e0)
 00000000008b8ec2 WaitEventSetWait  () + 112
 00000000008b920f WaitLatch () + 6f
 00000000008bf434 shm_mq_wait_internal () + 64
 00000000008bff74 shm_mq_receive () + 2b4
 000000000079fdc8 TupleQueueReaderNext () + 28
 000000000077d8ca gather_merge_readnext () + 13a
 000000000077db25 ExecGatherMerge () + 215
 0000000000790675 ExecNextLoop () + 175
 0000000000790675 ExecNextLoop () + 175
 000000000076267d standard_ExecutorRun () + fd
 fffffb7fe3965fbd pgss_executorRun () + fd
 00000000008df99b PortalRunSelect () + 1cb
 00000000008e0dcf PortalRun () + 17f
 00000000008ddacd PostgresMain () + 100d
 0000000000857f62 ServerLoop () + cd2
 0000000000858cee main () + 453
 00000000005ab777 _start_crt () + 87
 00000000005ab6d8 _start () + 18


Any suggestions?  Thanks in advance!

[1] https://www.postgresql.org/message-id/flat/E9FA92C2921F31408041863B74EE4C2001A479E590%40CCPMAILDAG03.cantab.local

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: Backends stuck in wait event IPC/MessageQueueInternal

From
Justin Pryzby
Date:
On Fri, May 13, 2022 at 06:16:23PM +0800, Japin Li wrote:
> I had an incident on my Postgres 14 where queries hung in wait event
> IPC / MessageQueueInternal, MessageQueueReceive.  It looks like [1];
> however, that thread doesn't have any discussion.

If the process is still running, or if the problem recurs, I suggest creating
a core file with gcore, aka gdb's generate-core-file.  Then we can look at the
backtrace at our leisure, even if the cluster needed to be restarted right
away.

What minor version of Postgres is this, and what OS?

-- 
Justin



Re: Backends stuck in wait event IPC/MessageQueueInternal

From
Japin Li
Date:
On Fri, 13 May 2022 at 19:41, Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Fri, May 13, 2022 at 06:16:23PM +0800, Japin Li wrote:
>> I had an incident on my Postgres 14 where queries hung in wait event
>> IPC / MessageQueueInternal, MessageQueueReceive.  It looks like [1];
>> however, that thread doesn't have any discussion.
>
> If the process is still running, or if the problem recurs, I suggest creating
> a core file with gcore, aka gdb's generate-core-file.  Then we can look at the
> backtrace at our leisure, even if the cluster needed to be restarted right
> away.
>

Thanks for your advice, I will try it later.

> What minor version of postgres is this, and what OS ?

PostgreSQL 14.2 and Solaris.

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: Backends stuck in wait event IPC/MessageQueueInternal

From
Robert Haas
Date:
On Fri, May 13, 2022 at 6:16 AM Japin Li <japinli@hotmail.com> wrote:
> The process cannot be terminated by pg_terminate_backend(), although
> it returns true.

pg_terminate_backend() just sends SIGTERM. What I'm wondering is what
happens when the stuck process receives SIGTERM. It would be useful, I
think, to check the value of the global variable InterruptHoldoffCount
in the stuck process by attaching to it with gdb. I would also try
running "strace -p $PID" on the stuck process and then try terminating
it again with pg_terminate_backend(). Either the system call in which
it's currently stuck returns and then it makes the same system call
again and hangs again ... or the signal doesn't dislodge it from the
system call in which it's stuck in the first place. It would be useful
to know which of those two things is happening.

One thing I find a bit curious is that the top of the stack in your
case is ioctl(). And there are no calls to ioctl() anywhere in
latch.c, nor have there ever been. What operating system is this? We
have 4 different versions of WaitEventSetWaitBlock() that call
epoll_wait(), kevent(), poll(), and WaitForMultipleObjects()
respectively. I wonder which of those we're using, and whether one of
those calls is showing up as ioctl() in the stacktrace, or whether
there's some other function being called in here that is somehow
resulting in ioctl() getting called.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Backends stuck in wait event IPC/MessageQueueInternal

From
Thomas Munro
Date:
On Sat, May 14, 2022 at 2:09 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, May 13, 2022 at 6:16 AM Japin Li <japinli@hotmail.com> wrote:
> > The process cannot be terminated by pg_terminate_backend(), although
> > it returns true.

> One thing I find a bit curious is that the top of the stack in your
> case is ioctl(). And there are no calls to ioctl() anywhere in
> latch.c, nor have there ever been. What operating system is this? We
> have 4 different versions of WaitEventSetWaitBlock() that call
> epoll_wait(), kevent(), poll(), and WaitForMultipleObjects()
> respectively. I wonder which of those we're using, and whether one of
> those calls is showing up as ioctl() in the stacktrace, or whether
> there's some other function being called in here that is somehow
> resulting in ioctl() getting called.

I guess this is really illumos (née OpenSolaris), not Solaris, using
our epoll build mode, with illumos's emulation of epoll, which maps
epoll onto Sun's /dev/poll driver:

https://github.com/illumos/illumos-gate/blob/master/usr/src/lib/libc/port/sys/epoll.c#L230

That'd explain:

  fffffb7fef216f4a ioctl     (d, d001, fffffb7fffdfa0e0)

That matches the value DP_POLL from:

https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/sys/devpoll.h#L44
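
(For anyone not familiar with Sun's /dev/poll interface, the emulation
boils down to roughly the following.  This is an illustrative sketch
only, not the actual illumos code; registration and error handling are
simplified:

    #include <sys/devpoll.h>   /* struct dvpoll, DP_POLL */
    #include <fcntl.h>
    #include <poll.h>
    #include <stdio.h>
    #include <stropts.h>       /* ioctl() */
    #include <unistd.h>

    int
    main(void)
    {
        struct pollfd pfd;
        struct dvpoll dvp;
        int dpfd;

        /* one /dev/poll fd plays the role of an epoll instance */
        dpfd = open("/dev/poll", O_RDWR);

        /* register interest by writing pollfd structs to that fd */
        pfd.fd = 0;
        pfd.events = POLLIN;
        pfd.revents = 0;
        write(dpfd, &pfd, sizeof(pfd));

        /* wait: this is the ioctl(fd, DP_POLL, ...) seen in the stack */
        dvp.dp_fds = &pfd;
        dvp.dp_nfds = 1;
        dvp.dp_timeout = -1;            /* block indefinitely */
        if (ioctl(dpfd, DP_POLL, &dvp) > 0)
            printf("fd %d ready, revents = %d\n", pfd.fd, pfd.revents);

        close(dpfd);
        return 0;
    }

So what is an epoll_wait() in the source shows up as an ioctl() at the
top of the stack.)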

Or if it's really Solaris, huh, are people moving illumos code back
into closed Solaris these days?

As for why it's hanging, I don't know, but one thing that we changed
in 14 was that we started using signalfd() to receive latch signals on
systems that have it, and illumos also has an emulation of signalfd()
that our configure script finds:

https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/io/signalfd.c
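
(The technique in question is roughly the following; an illustrative
sketch of the approach, not the actual latch.c code:

    #include <signal.h>
    #include <sys/epoll.h>
    #include <sys/signalfd.h>
    #include <unistd.h>

    /*
     * Block the latch signal and receive it through a signalfd that is
     * watched by the same epoll set as the other file descriptors.
     */
    static int
    setup_signal_wakeup(int epoll_fd)
    {
        sigset_t    mask;
        struct epoll_event ev;
        int         sig_fd;

        sigemptyset(&mask);
        sigaddset(&mask, SIGURG);
        sigprocmask(SIG_BLOCK, &mask, NULL);

        sig_fd = signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);

        ev.events = EPOLLIN;
        ev.data.fd = sig_fd;
        epoll_ctl(epoll_fd, EPOLL_CTL_ADD, sig_fd, &ev);

        return sig_fd;
    }

    /* when epoll_wait() reports sig_fd readable, drain the signal(s) */
    static void
    drain_signalfd(int sig_fd)
    {
        struct signalfd_siginfo info;

        while (read(sig_fd, &info, sizeof(info)) == sizeof(info))
            ;   /* wakeup consumed; the caller rechecks its latch flag */
    }

A bug in either the signalfd or the epoll emulation could therefore
leave a backend waiting forever even though a wakeup was sent.)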

There were in fact a couple of unexplained hangs on the illumos build
farm animals, and then they were changed to use -DWAIT_USE_POLL so
that they wouldn't automatically choose epoll()/signalfd().   That is
not very satisfactory, but as far as I know there is a bug in either
epoll() or signalfd(), or at least some difference compared to the
Linux implementation they are emulating.  I spent quite a bit of time
ping-ponging emails back and forth with the owner of a hanging BF
animal trying to get a minimal repro for a bug report, without
success.  I mean, it's possible that the bug is in PostgreSQL (though
no complaint has ever reached me about this stuff on Linux), but while
trying to investigate it a kernel panic happened[1], which I think
counts as a point against that theory...

(For what it's worth, WSL1 also emulates these two Linux interfaces
and also apparently doesn't do so well enough for our purposes, also
for reasons not understood by us.)

In short, I'd recommend -DWAIT_USE_POLL for now.  It's possible that
we could do something to prevent the selection of WAIT_USE_EPOLL on
that platform, or that we should have a halfway option: epoll() but not
signalfd() (= go back to using the self-pipe trick), patches welcome,
but that feels kinda strange and would be a very niche combination that
isn't fun to maintain... the real solution is to fix the bug.
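
(For readers who haven't met it, the "self-pipe trick" mentioned above
works roughly as follows; again an illustrative sketch, not the actual
latch.c code:

    #include <fcntl.h>
    #include <signal.h>
    #include <unistd.h>

    static int self_pipe[2];

    /* the signal handler turns the signal into an ordinary fd event */
    static void
    wakeup_handler(int signo)
    {
        char        dummy = 0;

        /* real code would also save and restore errno here */
        (void) write(self_pipe[1], &dummy, 1);  /* async-signal-safe */
    }

    static void
    init_self_pipe(void)
    {
        pipe(self_pipe);
        fcntl(self_pipe[0], F_SETFL, O_NONBLOCK);
        fcntl(self_pipe[1], F_SETFL, O_NONBLOCK);
        signal(SIGURG, wakeup_handler);
    }

    /* after poll()/epoll_wait() reports self_pipe[0] readable, drain it */
    static void
    drain_self_pipe(void)
    {
        char        buf[16];

        while (read(self_pipe[0], buf, sizeof(buf)) > 0)
            ;       /* wakeup consumed; the caller rechecks the latch */
    }

The read end of the pipe is simply included in the fd set being waited
on, so no signalfd support is needed from the kernel.)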

[1] https://www.illumos.org/issues/13700



Re: Backends stuck in wait event IPC/MessageQueueInternal

From
Thomas Munro
Date:
On Sat, May 14, 2022 at 9:25 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> In short, I'd recommend -DWAIT_USE_POLL for now.  It's possible that
> we could do something to prevent the selection of WAIT_USE_EPOLL on
> that platform, or that we should have a halfway option: epoll() but not
> signalfd() (= go back to using the self-pipe trick), patches welcome,
> but that feels kinda strange and would be a very niche combination that
> isn't fun to maintain... the real solution is to fix the bug.

I felt a bit sad about writing that, so I took a crack at trying to
write a patch that separates the signalfd/self-pipe choice from the
epoll/poll choice.  Maybe it's not too bad.

Japin, are you able to reproduce the problem reliably?  Did I guess
right, that you're on illumos?  Does this help?  I used
defined(__sun__) to select the option, but I don't remember if that's
the right way to detect that OS family, could you confirm that, or
adjust as required?

Attachment

Re: Backends stuck in wait event IPC/MessageQueueInternal

From
Thomas Munro
Date:
On Sat, May 14, 2022 at 10:25 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Japin, are you able to reproduce the problem reliably?  Did I guess
> right, that you're on illumos?  Does this help?  I used
> defined(__sun__) to select the option, but I don't remember if that's
> the right way to detect that OS family, could you confirm that, or
> adjust as required?

Better version.  Now you can independently set -DWAIT_USE_{POLL,EPOLL}
and -DWAIT_USE_{SELF_PIPE,SIGNALFD} for testing, and it picks a
sensible default.
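
For anyone reading along without the patch, the shape of the idea is
roughly the following (a sketch only, not the attached patch itself):

    /* 1. how we multiplex file descriptors */
    #if !defined(WAIT_USE_EPOLL) && !defined(WAIT_USE_KQUEUE) && \
        !defined(WAIT_USE_POLL) && !defined(WAIT_USE_WIN32)
    #if defined(HAVE_SYS_EPOLL_H)
    #define WAIT_USE_EPOLL
    #elif defined(HAVE_SYS_EVENT_H)
    #define WAIT_USE_KQUEUE
    #elif defined(WIN32)
    #define WAIT_USE_WIN32
    #else
    #define WAIT_USE_POLL
    #endif
    #endif

    /*
     * 2. how we receive latch wakeups: signalfd where available,
     * otherwise the traditional self-pipe trick
     */
    #if !defined(WAIT_USE_SELF_PIPE) && !defined(WAIT_USE_SIGNALFD)
    #if defined(WAIT_USE_EPOLL) && defined(HAVE_SYS_SIGNALFD_H)
    #define WAIT_USE_SIGNALFD
    #else
    #define WAIT_USE_SELF_PIPE
    #endif
    #endif

Either knob can still be forced from CPPFLAGS for testing.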

Attachment

Re: Backends stuck in wait event IPC/MessageQueueInternal

From
Japin Li
Date:
On Sat, 14 May 2022 at 11:01, Thomas Munro <thomas.munro@gmail.com> wrote:
> On Sat, May 14, 2022 at 10:25 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Japin, are you able to reproduce the problem reliably?  Did I guess
>> right, that you're on illumos?  Does this help?  I used
>> defined(__sun__) to select the option, but I don't remember if that's
>> the right way to detect that OS family, could you confirm that, or
>> adjust as required?
>
> Better version.  Now you can independently set -DWAIT_USE_{POLL,EPOLL}
> and -DWAIT_USE_{SELF_PIPE,SIGNALFD} for testing, and it picks a
> sensible default.

Sorry for the late reply. My bad! It is actually SmartOS, which is based on illumos.

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: Backends stuck in wait event IPC/MessageQueueInternal

From
Japin Li
Date:
On Fri, 13 May 2022 at 22:08, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, May 13, 2022 at 6:16 AM Japin Li <japinli@hotmail.com> wrote:
>> The process cannot be terminated by pg_terminate_backend(), although
>> it returns true.
>
> pg_terminate_backend() just sends SIGTERM. What I'm wondering is what
> happens when the stuck process receives SIGTERM. It would be useful, I
> think, to check the value of the global variable InterruptHoldoffCount
> in the stuck process by attaching to it with gdb. I would also try
> running "strace -p $PID" on the stuck process and then try terminating
> it again with pg_terminate_backend(). Either the system call in which
> it's currently stuck returns and then it makes the same system call
> again and hangs again ... or the signal doesn't dislodge it from the
> system call in which it's stuck in the first place. It would be useful
> to know which of those two things is happening.
>
> One thing I find a bit curious is that the top of the stack in your
> case is ioctl(). And there are no calls to ioctl() anywhere in
> latch.c, nor have there ever been. What operating system is this? We
> have 4 different versions of WaitEventSetWaitBlock() that call
> epoll_wait(), kevent(), poll(), and WaitForMultipleObjects()
> respectively. I wonder which of those we're using, and whether one of
> those calls is showing up as ioctl() in the stacktrace, or whether
> there's some other function being called in here that is somehow
> resulting in ioctl() getting called.

Thanks for your advice.  I will try this on Monday.

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: Backends stuck in wait event IPC/MessageQueueInternal

From
Japin Li
Date:
On Sat, 14 May 2022 at 11:01, Thomas Munro <thomas.munro@gmail.com> wrote:
> On Sat, May 14, 2022 at 10:25 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Japin, are you able to reproduce the problem reliably?  Did I guess
>> right, that you're on illumos?  Does this help?  I used
>> defined(__sun__) to select the option, but I don't remember if that's
>> the right way to detect that OS family, could you confirm that, or
>> adjust as required?
>
> Better version.  Now you can independently set -DWAIT_USE_{POLL,EPOLL}
> and -DWAIT_USE_{SELF_PIPE,SIGNALFD} for testing, and it picks a
> sensible default.

Thanks for your patch!  illumos already defines the following macros.

$ gcc -dM -E - </dev/null | grep -e 'illumos' -e 'sun'
#define __sun 1
#define __illumos__ 1
#define sun 1
#define __sun__ 1

Maybe using the __illumos__ macro would be more accurate.

+#elif defined(WAIT_USE_EPOLL) && defined(HAVE_SYS_SIGNALFD_H) && \
+       !defined(__sun__)
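
i.e. something along these lines (untested):

+#elif defined(WAIT_USE_EPOLL) && defined(HAVE_SYS_SIGNALFD_H) && \
+       !defined(__illumos__)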


-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: Backends stuck in wait event IPC/MessageQueueInternal

From
Thomas Munro
Date:
On Mon, May 16, 2022 at 3:45 PM Japin Li <japinli@hotmail.com> wrote:
> Maybe using the __illumos__ macro would be more accurate.
>
> +#elif defined(WAIT_USE_EPOLL) && defined(HAVE_SYS_SIGNALFD_H) && \
> +       !defined(__sun__)

Thanks, updated, and with a new commit message.

I don't know much about these OSes (though I used lots of Sun machines
during the Jurassic period).  I know that there are three
distributions of illumos: OmniOS, SmartOS and OpenIndiana, and they
share the same kernel and base system.  The off-list reports I
received about hangs and kernel panics were from OpenIndiana animals
hake and haddock, which are not currently reporting (I'll ask why),
and then their owner defined -DWAIT_USE_POLL to clear that up while we
waited for progress on his kernel panic bug report.  I see that OmniOS
animal pollock is currently reporting and also uses -DWAIT_USE_POLL,
but I couldn't find any discussion about that.

Of course, you might be hitting some completely different problem,
given the lack of information.  I'd be interested in the output of "p
*MyLatch" (= to see if the latch has already been set), and whether
"kill -URG PID" dislodges the stuck process.  But given the open
kernel bug report that I've now been reminded of, I'm thinking about
pushing this anyway.  Then we could ask the animal owners to remove
-DWAIT_USE_POLL so that they'd effectively be running with
-DWAIT_USE_EPOLL and -DWAIT_USE_SELF_PIPE, which would be more like
PostgreSQL 13, but people who want to reproduce the problem on the
illumos side could build with -DWAIT_USE_SIGNALFD.

Attachment

Re: Backends stuck in wait event IPC/MessageQueueInternal

From
Thomas Munro
Date:
On Tue, May 17, 2022 at 3:31 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Mon, May 16, 2022 at 3:45 PM Japin Li <japinli@hotmail.com> wrote:
> > Maybe using the __illumos__ macro would be more accurate.
> >
> > +#elif defined(WAIT_USE_EPOLL) && defined(HAVE_SYS_SIGNALFD_H) && \
> > +       !defined(__sun__)
>
> Thanks, updated, and with a new commit message.

Pushed to master and REL_14_STABLE.

I'll email the illumos build farm animal owners to say that they
should be able to remove -DWAIT_USE_POLL.

Theoretically, it might be useful that we've separated the
WAIT_USE_SELF_PIPE code from WAIT_USE_POLL if someone eventually wants
to complete the set of possible WaitEventSet implementations by adding
/dev/poll (Solaris, HPUX) and pollset (AIX) support.  I don't think
those have a nicer way to receive race-free signal wakeups.
Realistically, no one's likely to show up with a patch for those old
proprietary Unixen at this point on the timeline; I just think it's
interesting that every OS had something better than poll(), and we only
keep that fallback for lack of patches, not lack of kernel features.
Ironically the typical monster AIX systems I've run into in the wild
are probably much more capable of suffering from poll() contention
than all these puny x86 systems, with oodles of CPUs and NUMA nodes.
If someone *is* still interested in scalability on AIX, I'd recommend
looking at pollset for latch.c, and also the stalled huge pages
thing[1].

[1] https://www.postgresql.org/message-id/CA%2BhUKGJE4dq%2BNZHrm%3DpNSNCYwDCH%2BT6HtaWm5Lm8vZzygknPpA%40mail.gmail.com



Re: Backends stuck in wait event IPC/MessageQueueInternal

From
Thomas Munro
Date:
On Sun, Jun 26, 2022 at 11:18 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, May 17, 2022 at 3:31 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > On Mon, May 16, 2022 at 3:45 PM Japin Li <japinli@hotmail.com> wrote:
> > > Maybe using the __illumos__ macro would be more accurate.
> > >
> > > +#elif defined(WAIT_USE_EPOLL) && defined(HAVE_SYS_SIGNALFD_H) && \
> > > +       !defined(__sun__)
> >
> > Thanks, updated, and with a new commit message.
>
> Pushed to master and REL_14_STABLE.

FTR: I noticed that https://www.illumos.org/issues/13700 had been
marked fixed, so I asked if we should remove our check[1].  Nope,
another issue was opened at https://www.illumos.org/issues/14892,
which I'll keep an eye on.  It seems we're pretty good at hitting
poll/event-related kernel bugs in various OSes.

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3ab4fc5dcf30ebc90a23ad878342dc528e2d25ce



Re: Backends stuck in wait event IPC/MessageQueueInternal

From
Thomas Munro
Date:
On Sun, Aug 28, 2022 at 11:03 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Sun, Jun 26, 2022 at 11:18 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > On Tue, May 17, 2022 at 3:31 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > On Mon, May 16, 2022 at 3:45 PM Japin Li <japinli@hotmail.com> wrote:
> > > > Maybe using the __illumos__ macro would be more accurate.
> > > >
> > > > +#elif defined(WAIT_USE_EPOLL) && defined(HAVE_SYS_SIGNALFD_H) && \
> > > > +       !defined(__sun__)
> > >
> > > Thanks, updated, and with a new commit message.
> >
> > Pushed to master and REL_14_STABLE.
>
> FTR: I noticed that https://www.illumos.org/issues/13700 had been
> marked fixed, so I asked if we should remove our check[1].  Nope,
> another issue was opened at https://www.illumos.org/issues/14892,
> which I'll keep an eye on.  It seems we're pretty good at hitting
> poll/event-related kernel bugs in various OSes.

I happened to notice, in the OmniOS release notes that Stephen posted
in the nearby GSSAPI thread, that this has now been fixed.  I
think there's no point in changing the back branches (hard to
synchronise with kernel upgrades), but I also don't want to leave this
weird wart in the code forever.  Shall we remove it in 16?  I don't
personally care if it's 16 or 17, but I wanted to make a note about
the cleanup opportunity either way, and will add this to the open
commitfest.

Attachment