Re: stress test for parallel workers - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: stress test for parallel workers
Date
Msg-id CA+hUKG+C6uPF1cNZkW8xg+NAgorW9Q5DQusGGCAj+K8sb8m_aQ@mail.gmail.com
Whole thread Raw
In response to Re: stress test for parallel workers  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, Oct 12, 2019 at 9:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-10-11 14:56:41 -0400, Tom Lane wrote:
> >> ... So it's really hard to explain
> >> that as anything except a kernel bug: sometimes, the kernel
> >> doesn't give us as much stack as it promised it would.  And the
> >> machine is not loaded enough for there to be any rational
> >> resource-exhaustion excuse for that.
>
> > Linux expands stack space only on demand, thus it's possible to run out
> > of stack space while there ought to be stack space. Unfortunately that
> > during a stack expansion, which means there's no easy place to report
> > that.  I've seen this be hit in production on busy machines.
>
> As I said, this machine doesn't seem busy enough for that to be a
> tenable excuse; there's nobody but me logged in, and the buildfarm
> critter isn't running.

Yeah.  As I speculated in the other thread[1], the straightforward
can't-allocate-any-more-space-but-no-other-way-to-tell-you-that case,
ie, the explanation that doesn't involve a bug in Linux or PostgreSQL,
seems unlikely unless we also see other more obvious signs of
occasional overcommit problems (ie not during stack expansion) on
those hosts, doesn't it?  How likely is it that this 1-2MB of stack
space is the straw that breaks the camels back, every time?

[1] https://www.postgresql.org/message-id/CA%2BhUKGJ_MkqdEH-LmmebhNLSFeyWwvYVXfPaz3A2_p27EQfZwA%40mail.gmail.com



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: v12.0 ERROR: trying to store a heap tuple into wrong type of slot
Next
From: Tom Lane
Date:
Subject: Re: stress test for parallel workers