Re: Why is infinite_recurse test suddenly failing? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Why is infinite_recurse test suddenly failing?
Msg-id 14080.1557516917@sss.pgh.pa.us
In response to Re: Why is infinite_recurse test suddenly failing?  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-10 11:38:57 -0400, Tom Lane wrote:
>> I am wondering if, somehow, the stack depth limit seen by the postmaster
>> sometimes doesn't apply to its children.  That would be pretty wacko
>> kernel behavior, especially if it's only intermittently true.
>> But we're running out of other explanations.

> I wonder if this is a SIGSEGV that actually signals an OOM
> situation. Linux, if it can't actually extend the stack on-demand due to
> OOM, sends a SIGSEGV.  The signal has that information, but
> unfortunately the buildfarm code doesn't print it.  p $_siginfo would
> show us some of that...

> Mark, how tight is the memory on that machine? Does dmesg have any other
> information (often segfaults are logged by the kernel with the code
> IIRC).

It does sort of smell like a resource exhaustion problem, especially
if all these buildfarm animals are VMs running on the same underlying
platform.  But why would that manifest as "you can't have a measly two
megabytes of stack" and not as any other sort of OOM symptom?
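
For reference, the earlier theory (that the stack limit seen by the postmaster somehow doesn't apply to its children) could be checked with something along these lines. This is only a minimal sketch, not PostgreSQL code: the function name child_stack_limit_matches is made up for illustration, and it compares only the soft RLIMIT_STACK value that getrlimit() reports in a parent and in a forked child. It says nothing about whether the kernel can actually grow the stack up to that limit, so a match would not rule out the OOM explanation.

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL): fork a child and check
 * whether it reports the same soft RLIMIT_STACK as the parent.
 *
 * Returns 1 if the limits match, 0 if they differ, -1 on error.
 */
int
child_stack_limit_matches(void)
{
    struct rlimit parent_lim;
    pid_t       pid;
    int         status;

    if (getrlimit(RLIMIT_STACK, &parent_lim) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
    {
        struct rlimit child_lim;

        /* Child: exit 0 on match, 1 on mismatch, 2 if getrlimit fails. */
        if (getrlimit(RLIMIT_STACK, &child_lim) != 0)
            _exit(2);
        _exit(child_lim.rlim_cur == parent_lim.rlim_cur ? 0 : 1);
    }

    /* Parent: translate the child's exit status into the return value. */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Calling this from a small test program on one of the affected machines would show whether the reported limit is inherited as expected; a return of 0 would be the "wacko kernel behavior" case.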

Mark, if you don't mind modding your local copies of the buildfarm
script, I think what Andres is asking for is a pretty trivial addition
in PGBuild/Utils.pm's sub get_stack_trace:

    my $cmdfile = "./gdbcmd";
    my $handle;
    open($handle, '>', $cmdfile) || die "opening $cmdfile: $!";
    print $handle "bt\n";
+    # escape the $ so Perl passes $_siginfo through to gdb literally
+    print $handle "p \$_siginfo\n";
    close($handle);

            regards, tom lane


