Re: stress test for parallel workers - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: stress test for parallel workers
Date
Msg-id 84813a30-8c34-9c32-7ad5-90d9eefba468@2ndQuadrant.com
In response to Re: stress test for parallel workers  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: stress test for parallel workers
List pgsql-hackers
On 10/10/19 5:34 PM, Tom Lane wrote:
> I wrote:
>>>> Yeah, I've been wondering whether pg_ctl could fork off a subprocess
>>>> that would fork the postmaster, wait for the postmaster to exit, and then
>>>> report the exit status.
>> [ pushed at 6a5084eed ]
>> Given wobbegong's recent failure rate, I don't think we'll have to wait
>> long.
> Indeed, we didn't:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wobbegong&dt=2019-10-10%2020%3A54%3A46
>
> The tail end of the system log looks like
>
> 2019-10-10 21:00:33.717 UTC [15127:306] pg_regress/date FATAL:  postmaster exited during a parallel transaction
> 2019-10-10 21:00:33.717 UTC [15127:307] pg_regress/date LOG:  disconnection: session time: 0:00:02.896 user=fedora database=regression host=[local]
> /bin/sh: line 1: 14168 Segmentation fault      (core dumped) "/home/fedora/build-farm-10-clang/buildroot/HEAD/pgsql.build/tmp_install/home/fedora/build-farm-clang/buildroot/HEAD/inst/bin/postgres" -F -c listen_addresses="" -k "/tmp/pg_upgrade_check-ZrhQ4h"
> postmaster exit status is 139
>
> So that's definitive proof that the postmaster is suffering a SIGSEGV.
> Unfortunately, we weren't blessed with a stack trace, even though
> wobbegong is running a buildfarm client version that is new enough
> to try to collect one.  However, seeing that wobbegong is running
> a pretty-recent Fedora release, the odds are that systemd-coredump
> has commandeered the core dump and squirreled it someplace where
> we can't find it.
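
For anyone reading along: the 139 quoted above follows the usual POSIX shell convention of reporting 128 plus the signal number when a child is killed by a signal, and 139 - 128 = 11, which is SIGSEGV on Linux. A minimal sketch of that arithmetic follows; the function name decode_shell_status is made up for illustration and is not part of pg_ctl or the buildfarm client.

```python
import signal


def decode_shell_status(status: int) -> str:
    """Describe an exit status as a POSIX shell would report it in $?.

    Hypothetical helper for illustration only. Shells report
    128 + signal number when the child was killed by a signal.
    """
    if status > 128:
        signum = status - 128
        try:
            # Map the number to a symbolic name where the platform knows it.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


if __name__ == "__main__":
    # 139 is the status from the log excerpt quoted above.
    print(decode_shell_status(139))
```

This only interprets the number; it says nothing about where the core file went.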



At least on F29 I have set /proc/sys/kernel/core_pattern and it works.
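
As a rough way to check what a given machine is doing with core files, something like the sketch below could be used. It relies on the documented kernel behaviour that a core_pattern beginning with "|" pipes the dump to a helper program instead of writing a file. The function name and the sample patterns in the comments are illustrative assumptions, not what wobbegong actually has configured.

```python
from pathlib import Path

CORE_PATTERN_PATH = Path("/proc/sys/kernel/core_pattern")


def describe_core_pattern(pattern: str) -> str:
    """Say whether a core_pattern value names a file template or a pipe helper.

    Hypothetical helper for illustration only.
    """
    pattern = pattern.strip()
    if pattern.startswith("|"):
        # The kernel runs the program after the "|" and feeds it the core.
        parts = pattern[1:].split()
        helper = parts[0] if parts else ""
        return f"piped to helper: {helper}"
    return f"written to file pattern: {pattern}"


if __name__ == "__main__":
    # Only meaningful on Linux; elsewhere the file does not exist.
    if CORE_PATTERN_PATH.exists():
        print(describe_core_pattern(CORE_PATTERN_PATH.read_text()))
    else:
        print("no core_pattern file on this platform")
```

If the result shows a pipe to systemd-coredump, the core is being handed to that helper rather than left in the working directory, which would match the behaviour described in the quoted message.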



>
> Much as one could wish otherwise, systemd doesn't seem likely to
> either go away or scale back its invasiveness; so I'm afraid we
> are probably going to need to teach the buildfarm client how to
> negotiate with systemd-coredump at some point.  I don't much want
> to do that right this minute, though.
>
> A nearer-term solution would be to reproduce this manually and
> dig into the core.  Mark, are you in a position to give somebody
> ssh access to wobbegong's host, or another similarly-configured VM?



I have given Mark my SSH key. That doesn't mean others who are interested shouldn't do the same.


>
> (While at it, it'd be nice to investigate the infinite_recurse
> failures we've been seeing on all those ppc64 critters ...)
>
>             



Yeah.


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



