Re: stress test for parallel workers - Mailing list pgsql-hackers

From Tom Lane
Subject Re: stress test for parallel workers
Msg-id 19525.1570826441@sss.pgh.pa.us
In response to Re: stress test for parallel workers  (Andres Freund <andres@anarazel.de>)
Responses Re: stress test for parallel workers
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> On 2019-10-11 14:56:41 -0400, Tom Lane wrote:
>> ... So it's really hard to explain
>> that as anything except a kernel bug: sometimes, the kernel
>> doesn't give us as much stack as it promised it would.  And the
>> machine is not loaded enough for there to be any rational
>> resource-exhaustion excuse for that.

> Linux expands stack space only on demand, thus it's possible to run out
> of stack space while there ought to be stack space. Unfortunately that
> happens during a stack expansion, which means there's no easy place to
> report that.  I've seen this be hit in production on busy machines.

As I said, this machine doesn't seem busy enough for that to be a
tenable excuse; there's nobody but me logged in, and the buildfarm
critter isn't running.

> I wonder if the machine is configured with overcommit_memory=2,
> i.e. don't overcommit.  cat /proc/sys/vm/overcommit_memory would tell.

$ cat /proc/sys/vm/overcommit_memory
0

> What does grep -E '^(Mem|Commit)' /proc/meminfo show while it's
> happening?

idle:

$ grep -E '^(Mem|Commit)' /proc/meminfo
MemTotal:        2074816 kB
MemFree:           36864 kB
MemAvailable:    1779584 kB
CommitLimit:     1037376 kB
Committed_AS:     412480 kB

a few captures while regression tests are running:

$ grep -E '^(Mem|Commit)' /proc/meminfo
MemTotal:        2074816 kB
MemFree:            8512 kB
MemAvailable:    1819264 kB
CommitLimit:     1037376 kB
Committed_AS:     371904 kB
$ grep -E '^(Mem|Commit)' /proc/meminfo
MemTotal:        2074816 kB
MemFree:           32640 kB
MemAvailable:    1753792 kB
CommitLimit:     1037376 kB
Committed_AS:     585984 kB
$ grep -E '^(Mem|Commit)' /proc/meminfo
MemTotal:        2074816 kB
MemFree:           56640 kB
MemAvailable:    1695744 kB
CommitLimit:     1037376 kB
Committed_AS:     568768 kB
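
As a reading aid for the captures above, here is a minimal Python sketch that
parses /proc/meminfo-style text and reports how far Committed_AS sits below
CommitLimit.  The helper names (parse_meminfo, commit_headroom_kb) are
hypothetical and only for illustration; the sample text is the idle capture
quoted above.  Note that with overcommit_memory=0 the kernel does not enforce
CommitLimit as a hard cap, so the number is only a rough indicator.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(fields):
    """Return CommitLimit minus Committed_AS, in kB (hypothetical helper)."""
    return fields["CommitLimit"] - fields["Committed_AS"]


# The idle capture from the message, copied verbatim.
IDLE = """\
MemTotal:        2074816 kB
MemFree:           36864 kB
MemAvailable:    1779584 kB
CommitLimit:     1037376 kB
Committed_AS:     412480 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(IDLE)
    print("commit headroom: %d kB" % commit_headroom_kb(info))
```

On a live system the same function could be fed open("/proc/meminfo").read()
instead of the pasted sample.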


> What does the signal information say? You can see it with
> p $_siginfo
> after receiving the signal. A SIGSEGV here, I assume.

(gdb) p $_siginfo
$1 = {si_signo = 11, si_errno = 0, si_code = 128, _sifields = {_pad = {0 <repeats 28 times>}, _kill = {si_pid = 0, si_uid = 0},
    _timer = {si_tid = 0, si_overrun = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 0, si_uid = 0, si_sigval = {
        sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 0, si_uid = 0, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {
      si_addr = 0x0}, _sigpoll = {si_band = 0, si_fd = 0}}}
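
For reading that dump: si_signo = 11 is SIGSEGV on Linux, and si_code = 128 is
0x80, which the Linux headers define as SI_KERNEL ("sent by the kernel")
rather than one of the usual SEGV_MAPERR/SEGV_ACCERR fault codes.  The sketch
below is only a small lookup table with constants copied from the Linux
siginfo definitions; the function name describe_si_code is hypothetical, and
the table is deliberately incomplete.

```python
# Generic si_code values from the Linux siginfo definitions (partial list).
SI_CODES = {
    0: "SI_USER",       # sent by kill() or raise()
    0x80: "SI_KERNEL",  # sent by the kernel
    -1: "SI_QUEUE",     # sent by sigqueue()
    -2: "SI_TIMER",     # POSIX timer expired
    -6: "SI_TKILL",     # sent by tkill() or tgkill()
}

# SIGSEGV-specific si_code values (partial list).
SEGV_CODES = {
    1: "SEGV_MAPERR",   # address not mapped to an object
    2: "SEGV_ACCERR",   # invalid permissions for the mapped object
}

SIGSEGV = 11  # signal number as shown in the gdb dump


def describe_si_code(si_signo, si_code):
    """Return a symbolic name for si_code (hypothetical helper)."""
    if si_signo == SIGSEGV and si_code in SEGV_CODES:
        return SEGV_CODES[si_code]
    return SI_CODES.get(si_code, "unknown (%d)" % si_code)


if __name__ == "__main__":
    print(describe_si_code(11, 128))
```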

> Yea, that seems like it might be good. But we have to be careful too, as
> there are some things where we do want to be interruptible from within a
> signal handler. We start some processes from within one after all...

The proposed patch has zero effect on what the signal mask will be inside
a signal handler, only on the transient state during handler entry/exit.

            regards, tom lane


