Andres Freund <andres@anarazel.de> writes:
> On 2019-10-11 14:56:41 -0400, Tom Lane wrote:
>> ... So it's really hard to explain
>> that as anything except a kernel bug: sometimes, the kernel
>> doesn't give us as much stack as it promised it would. And the
>> machine is not loaded enough for there to be any rational
>> resource-exhaustion excuse for that.
> Linux expands stack space only on demand, thus it's possible to run out
> of stack space while there ought to be stack space. Unfortunately that
> happens during a stack expansion, which means there's no easy place to
> report that. I've seen this be hit in production on busy machines.
As I said, this machine doesn't seem busy enough for that to be a
tenable excuse; there's nobody but me logged in, and the buildfarm
critter isn't running.
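For reference, the amount of stack the kernel "promises" a process is, as far as userland can see, the RLIMIT_STACK soft limit (what `ulimit -s` reports). On Linux the pages under that limit are only mapped when first touched, so the limit is a ceiling, not a reservation. A minimal C sketch that just reads the limit; the helper name stack_rlimit_bytes is hypothetical:

```c
#include <sys/resource.h>

/* Soft RLIMIT_STACK in bytes: the most stack the kernel should let this
 * process grow to. Returns -1 if the limit is unlimited or getrlimit fails.
 * Nothing is reserved up front; pages are mapped as the stack grows. */
long long stack_rlimit_bytes(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long long) rl.rlim_cur;
}
```

`ulimit -s` in the same shell should show the same soft limit, in kB.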
> I wonder if the machine is configured with overcommit_memory=2,
> i.e. don't overcommit. cat /proc/sys/vm/overcommit_memory would tell.
$ cat /proc/sys/vm/overcommit_memory
0
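Mode 0 is the kernel's heuristic overcommit (the default), 1 means always overcommit, and 2 is strict accounting, where allocations fail once Committed_AS would exceed CommitLimit. A minimal reader for that setting, with a hypothetical helper name, that returns -1 where the file doesn't exist (e.g. on non-Linux systems):

```c
#include <stdio.h>

/* Reads the vm.overcommit_memory setting from procfs.
 *   0 = heuristic overcommit (the default)
 *   1 = always overcommit
 *   2 = strict accounting: CommitLimit is enforced
 * Returns -1 if the file can't be opened or parsed. */
int read_overcommit_mode(const char *path)
{
    FILE *f = fopen(path, "r");
    int mode = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```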
> What does grep -E '^(Mem|Commit)' /proc/meminfo show while it's
> happening?
idle:
$ grep -E '^(Mem|Commit)' /proc/meminfo
MemTotal: 2074816 kB
MemFree: 36864 kB
MemAvailable: 1779584 kB
CommitLimit: 1037376 kB
Committed_AS: 412480 kB
a few captures while regression tests are running:
$ grep -E '^(Mem|Commit)' /proc/meminfo
MemTotal: 2074816 kB
MemFree: 8512 kB
MemAvailable: 1819264 kB
CommitLimit: 1037376 kB
Committed_AS: 371904 kB
$ grep -E '^(Mem|Commit)' /proc/meminfo
MemTotal: 2074816 kB
MemFree: 32640 kB
MemAvailable: 1753792 kB
CommitLimit: 1037376 kB
Committed_AS: 585984 kB
$ grep -E '^(Mem|Commit)' /proc/meminfo
MemTotal: 2074816 kB
MemFree: 56640 kB
MemAvailable: 1695744 kB
CommitLimit: 1037376 kB
Committed_AS: 568768 kB
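Those numbers fit the "not busy" reading: Committed_AS peaks at 585984 kB, about 56% of CommitLimit (1037376 kB), and with overcommit_memory=0 CommitLimit isn't enforced anyway, since the kernel only applies it in mode 2. A small C sketch for pulling fields out of /proc/meminfo-format text and computing that headroom; meminfo_kb and commit_headroom_kb are hypothetical names, and the test input is the third capture above:

```c
#include <stdio.h>
#include <string.h>

/* Returns the kB value of `key` in /proc/meminfo-style text
 * ("Key:   12345 kB" per line), or -1 if the key is absent. */
long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* Require the ':' so that "Mem" does not match "MemTotal". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;
            return sscanf(line + klen + 1, "%ld", &kb) == 1 ? kb : -1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;             /* advance to the start of the next line */
    }
    return -1;
}

/* CommitLimit minus Committed_AS: how much more could be committed before
 * strict accounting (overcommit_memory=2) would start refusing allocations. */
long commit_headroom_kb(const char *text)
{
    long limit = meminfo_kb(text, "CommitLimit");
    long used = meminfo_kb(text, "Committed_AS");

    if (limit < 0 || used < 0)
        return -1;
    return limit - used;
}
```

For the third capture the headroom comes out to 1037376 - 585984 = 451392 kB.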
> What does the signal information say? You can see it with
> p $_siginfo
> after receiving the signal. A SIGSEGV here, I assume.
(gdb) p $_siginfo
$1 = {si_signo = 11, si_errno = 0, si_code = 128, _sifields = {_pad = {0 <repeats 28 times>},
  _kill = {si_pid = 0, si_uid = 0},
  _timer = {si_tid = 0, si_overrun = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}},
  _rt = {si_pid = 0, si_uid = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}},
  _sigchld = {si_pid = 0, si_uid = 0, si_status = 0, si_utime = 0, si_stime = 0},
  _sigfault = {si_addr = 0x0}, _sigpoll = {si_band = 0, si_fd = 0}}}
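Decoding that: si_signo = 11 is SIGSEGV, and si_code = 128 is 0x80, which Linux defines as SI_KERNEL ("sent by the kernel"), not SEGV_MAPERR (1) or SEGV_ACCERR (2), the codes an ordinary bad memory access produces; si_addr = 0x0 is consistent with that. One path I know of that yields an SI_KERNEL SIGSEGV is the kernel failing to write a signal frame onto the user stack, which would fit the stack-expansion theory, though the capture alone doesn't prove it. A sketch that classifies si_code the same way and shows how an SA_SIGINFO handler gets at the field gdb prints; all names here are hypothetical:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

#ifndef SI_KERNEL
#define SI_KERNEL 0x80      /* Linux's value; only used if libc lacks the macro */
#endif

/* Rough origin of a SIGSEGV, judged from siginfo_t.si_code. */
enum segv_origin {
    SEGV_ORIGIN_UNMAPPED,   /* SEGV_MAPERR: address not mapped */
    SEGV_ORIGIN_PERMISSION, /* SEGV_ACCERR: mapped, but access not permitted */
    SEGV_ORIGIN_KERNEL,     /* SI_KERNEL: generated by the kernel itself */
    SEGV_ORIGIN_USER,       /* si_code <= 0: sent from user space (kill, tgkill, ...) */
    SEGV_ORIGIN_OTHER
};

enum segv_origin classify_segv(int si_code)
{
    if (si_code == SEGV_MAPERR)
        return SEGV_ORIGIN_UNMAPPED;
    if (si_code == SEGV_ACCERR)
        return SEGV_ORIGIN_PERMISSION;
    if (si_code == SI_KERNEL)
        return SEGV_ORIGIN_KERNEL;
    if (si_code <= 0)
        return SEGV_ORIGIN_USER;
    return SEGV_ORIGIN_OTHER;
}

/* Filled in by the handler: the same si_code gdb shows via $_siginfo. */
static volatile sig_atomic_t got_segv = 0;
static volatile sig_atomic_t last_si_code = 0;

static void record_segv(int signo, siginfo_t *info, void *uctx)
{
    (void) signo;
    (void) uctx;
    last_si_code = info->si_code;
    got_segv = 1;
}

/* Installs record_segv as the SIGSEGV handler; returns 0 on success.
 * Returning from it is only safe for signals sent with kill()/raise();
 * after a real bad memory access the faulting instruction would rerun. */
int install_segv_recorder(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

A SIGSEGV sent with raise() shows up with a user-space si_code (<= 0), which is one way to tell it apart from the kernel-generated one in the capture.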
> Yeah, that seems like it might be good. But we have to be careful too, as
> there are some things we do want to be interruptible from within a signal
> handler. We start some processes from within one, after all...
The proposed patch has zero effect on what the signal mask will be inside
a signal handler, only on the transient state during handler entry/exit.
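To make that distinction concrete, assume (this message doesn't say) that the patch moves signal blocking out of explicit sigprocmask() calls inside each handler and into the handler's sa_mask. With sa_mask, the kernel adds those signals to the mask atomically as it enters the handler and restores the previous mask when the handler returns, so the mask seen inside the handler is the same either way; only the short windows before the handler's first and after its last sigprocmask() call go away. A minimal sketch checking that behaviour with SIGUSR1/SIGUSR2; the function and variable names are hypothetical:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

/* Set by the handler: 1 if SIGUSR2 was blocked while it ran, 0 if not. */
static volatile sig_atomic_t usr2_blocked_inside = -1;

static void on_usr1(int signo)
{
    sigset_t cur;

    (void) signo;
    /* A NULL new set only reads the current mask; sigprocmask and
     * sigismember are both async-signal-safe. */
    if (sigprocmask(SIG_BLOCK, NULL, &cur) == 0)
        usr2_blocked_inside = sigismember(&cur, SIGUSR2);
}

/* Installs on_usr1 with SIGUSR2 in sa_mask, raises SIGUSR1, and checks that
 * SIGUSR2 was blocked inside the handler and unblocked again afterwards.
 * Returns 1 if both hold, 0 if not, -1 on setup error. */
int sa_mask_is_transient(void)
{
    struct sigaction sa, old_sa;
    sigset_t empty, saved, after;
    int ok;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGUSR2);    /* kernel blocks this on handler entry */
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
        return -1;

    sigemptyset(&empty);
    sigprocmask(SIG_SETMASK, &empty, &saved);   /* nothing blocked going in */

    raise(SIGUSR1);     /* raise() doesn't return until the handler has */

    sigprocmask(SIG_BLOCK, NULL, &after);       /* mask once the handler is done */
    ok = (usr2_blocked_inside == 1 && sigismember(&after, SIGUSR2) == 0);

    sigprocmask(SIG_SETMASK, &saved, NULL);     /* restore the caller's mask */
    sigaction(SIGUSR1, &old_sa, NULL);          /* and the old disposition */
    return ok;
}
```

Without SA_NODEFER, SIGUSR1 itself is also blocked while on_usr1 runs; sa_mask only adds to that.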
regards, tom lane