Thread: make check crashes on POWER8 machine

make check crashes on POWER8 machine

From

Victor Wagner

Date:

13 March 2020, 07:29:13

Hi,

I've encountered a problem with Postgres on a PowerPC machine. Sometimes
make check on the REL_12_STABLE branch crashes with a segmentation fault.

The problem seems to be in errors.sql, when the statement

select infinite_recurse();

is executed, so the stack trace produced by gdb is too long to post here.

The problem is rare and doesn't occur on every run of make check.
When I run make check repeatedly, it occurs about once in several hundred runs.

The problem seems to be architecture-dependent, because I cannot
reproduce it on an x86_64 CPU with more than a thousand runs of make check.
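
The repeated-run procedure described above can be sketched roughly as
follows. This is only an illustration, not the script actually used for
these runs: the function name run_until_failure, the helper main, and the
limit of 2000 runs are all assumptions, and the command is expected to be
started from the top of an already configured source tree.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs):
    """Run cmd up to max_runs times.

    Returns the 1-based number of the first run that exits with a
    non-zero status, or None if every run succeeds.
    """
    for run in range(1, max_runs + 1):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            return run
    return None


def main():
    # Hypothetical invocation: the run limit is an arbitrary assumption.
    failed_at = run_until_failure(["make", "check"], max_runs=2000)
    if failed_at is None:
        print("no failure observed")
        return 0
    print(f"make check failed on run {failed_at}; keep the logs and core file")
    return 1
```

Calling sys.exit(main()) from the source directory stops at the first
failing run, so the regression output and any core file from that run are
not overwritten by later runs.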

The machine is a KVM virtual server on a POWER8 system with the following CPU:

$ lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    8
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Model:                 2.0 (pvr 004d 0200)
Model name:            POWER8 (architected), altivec supported
Hypervisor vendor:     KVM
Virtualization type:   para
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0-31

Running RedHat 7.6.



I've collected all the relevant information I can think of (including a
210 MB core file, the git commit id, configure and backend logs, and the
list of installed RPMs) and put it on Google Drive:
https://drive.google.com/file/d/1Xs7DixBhMPEmViGUt5wAMewB6_xbZirY/view

Hope that somebody more experienced with POWER CPUs can suggest
something about this problem.

--

Re: make check crashes on POWER8 machine

From

Justin Pryzby

Date:

13 March 2020, 12:43:59

On Fri, Mar 13, 2020 at 10:29:13AM +0300, Victor Wagner wrote:
> Hi,
> 
> I've encountered a problem with Postgres on PowerPC machine. Sometimes

Is it related to
https://www.postgresql.org/message-id/20032.1570808731%40sss.pgh.pa.us
https://bugzilla.kernel.org/show_bug.cgi?id=205183

(My initial report on that thread was unrelated user-error on my part)

> The problem seems to be in errors.sql, when the statement
> 
> select infinite_recurse();
> 
> is executed, so the stack trace produced by gdb is too long to post here.
> 
> The problem is rare and doesn't occur on every run of make check.
> When I run make check repeatedly, it occurs about once in several hundred runs.
> 
> It seems that problem is architecture-dependent, because I cannot
> reproduce it on x86_64 CPU with more than thousand runs of make check.

That's all consistent with the above problem.

> Running RedHat 7.6.

-- 
Justin

Re: make check crashes on POWER8 machine

From

Victor Wagner

Date:

13 March 2020, 13:16:10

On Fri, 13 Mar 2020 07:43:59 -0500
Justin Pryzby <pryzby@telsasoft.com> wrote:

> On Fri, Mar 13, 2020 at 10:29:13AM +0300, Victor Wagner wrote:
> > Hi,
> > 
> > I've encountered a problem with Postgres on PowerPC machine.
> > Sometimes  
> 
> Is it related to
> https://www.postgresql.org/message-id/20032.1570808731%40sss.pgh.pa.us
> https://bugzilla.kernel.org/show_bug.cgi?id=205183

I don't think so. At least I cannot see any signal handler-related stuff
in the trace; instead I see lots of calls to the stored procedure
executor.

However, several different stack traces show completely different parts
of the code at the moment SIGSEGV arrives, which may point to an
asynchronous nature of the problem.

Unfortunately I've not kept all the cores I've seen.

Rather, it looks as if in some rare circumstances Postgres is unable
to properly detect the end-of-stack condition.
--

Re: make check crashes on POWER8 machine

From

Tom Lane

Date:

13 March 2020, 14:56:15

Victor Wagner <vitus@wagner.pp.ru> writes:
> Justin Pryzby <pryzby@telsasoft.com> wrote:
>> On Fri, Mar 13, 2020 at 10:29:13AM +0300, Victor Wagner wrote:
>>> I've encountered a problem with Postgres on PowerPC machine.

>> Is it related to
>> https://www.postgresql.org/message-id/20032.1570808731%40sss.pgh.pa.us
>> https://bugzilla.kernel.org/show_bug.cgi?id=205183

> I don't think so. At least I cannot see any signal handler-related stuff
> in the trace, but see lots of calls to stored procedure executor
> instead.

Read the whole thread.  We fixed the issue with recursion in the
postmaster (9abb2bfc0); but the intermittent failure in infinite_recurse
is exactly the same as what we've been seeing for a long time in the
buildfarm, and there is zero doubt that it's that kernel bug.

In the other thread I'd suggested that we could quit running
errors.sql in parallel with other tests, but that would slow down
parallel regression testing for everybody.  I'm disinclined to do
that now, since the buildfarm problem is intermittent and easily
recognized.

            regards, tom lane

Re: make check crashes on POWER8 machine

From

Victor Wagner

Date:

14 March 2020, 09:49:28

On Fri, 13 Mar 2020 10:56:15 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Victor Wagner <vitus@wagner.pp.ru> writes:
> > Justin Pryzby <pryzby@telsasoft.com> wrote:
> >> On Fri, Mar 13, 2020 at 10:29:13AM +0300, Victor Wagner wrote:
> >>> I've encountered a problem with Postgres on PowerPC machine.
>
> >> Is it related to
> >> https://www.postgresql.org/message-id/20032.1570808731%40sss.pgh.pa.us
> >> https://bugzilla.kernel.org/show_bug.cgi?id=205183
>
> > I don't think so. At least I cannot see any signal handler-related
> > stuff in the trace, but see lots of calls to stored procedure
> > executor instead.
>
> Read the whole thread.  We fixed the issue with recursion in the
> postmaster (9abb2bfc0); but the intermittent failure in
> infinite_recurse is exactly the same as what we've been seeing for a
> long time in the buildfarm, and there is zero doubt that it's that
> kernel bug.

I've tried cherry-picking commit 9abb2bfc0 into REL_12_STABLE and rerunning
make check in a loop. Oops, on run 543 it segfaults with the same symptoms
as before.

Here is a link to the new core and logs:

https://drive.google.com/file/d/1oF-0fKHKvFn6FaJ3u-v36p9W0EBAY9nb/view?usp=sharing

I'll try this simple test (running make check repeatedly) with
master. There is some time until the end of the weekend when this machine
is not needed by anyone else, so I have time for a couple of thousand runs.





--
                                   Victor Wagner <vitus@wagner.pp.ru>

Re: make check crashes on POWER8 machine

From

Tom Lane

Date:

14 March 2020, 13:19:41

Victor Wagner <vitus@wagner.pp.ru> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Read the whole thread.  We fixed the issue with recursion in the
>> postmaster (9abb2bfc0); but the intermittent failure in
>> infinite_recurse is exactly the same as what we've been seeing for a
>> long time in the buildfarm, and there is zero doubt that it's that
>> kernel bug.

> I've tried cherry-picking commit 9abb2bfc0 into REL_12_STABLE and rerunning
> make check in a loop. Oops, on run 543 it segfaults with the same symptoms
> as before.

Unsurprising, because it's a kernel bug.  Maybe you could try
cherry-picking the patch proposed at kernel.org (see other thread).

            regards, tom lane