Re: stress test for parallel workers - Mailing list pgsql-hackers

From Tom Lane
Subject Re: stress test for parallel workers
Date
Msg-id 24429.1565186225@sss.pgh.pa.us
In response to Re: stress test for parallel workers  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: stress test for parallel workers  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 07/08/2019 02:57, Thomas Munro wrote:
>> On Wed, Jul 24, 2019 at 5:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> So I think I've got to take back the assertion that we've got
>>> some lurking generic problem.  This pattern looks way more
>>> like a platform-specific issue.  Overaggressive OOM killer
>>> would fit the facts on vulpes/wobbegong, perhaps, though
>>> it's odd that it only happens on HEAD runs.

>> chipmunk also:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2019-08-06%2014:16:16

> FWIW, I looked at the logs in /var/log/* on chipmunk, and found no
> evidence of OOM killings. I can see nothing unusual in the OS logs
> around the time of that failure.

Oh, that is very useful info, thanks.  That seems to mean that we
should be suspecting a segfault, assertion failure, etc. inside
the postmaster.  I don't see any TRAP message in chipmunk's log,
so assertion failure seems to be ruled out, but other sorts of
process-crashing errors would fit the facts.
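
For anyone wanting to repeat that check on another animal's log, here is a
minimal sketch of it.  It assumes that assertion failures appear in the log
as lines containing "TRAP: "; the helper name find_trap_lines and the idea
of passing in the log text as a string are illustrative, not part of the
buildfarm tooling.

```python
import re

# Assumption: an assertion failure leaves a line containing "TRAP: " in the log.
TRAP_RE = re.compile(r"\bTRAP: ")


def find_trap_lines(log_text):
    """Return (line_number, line) pairs for log lines that look like TRAP reports."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if TRAP_RE.search(line)
    ]
```

An empty result is consistent with "no assertion failure", but it says
nothing about segfaults or other crashes that die without printing a message.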

A stack trace from the crash would be mighty useful info along
about here.  I wonder whether chipmunk has the infrastructure
needed to create such a thing.  From memory, the buildfarm requires
gdb for that, but I'm not sure if there are additional requirements.
Also, if you're using systemd or something else that thinks it
ought to interfere with where cores get dropped, that could be
a problem.
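
As a rough way to check where cores would end up on a Linux host, something
like the following sketch could be run as the buildfarm user.  It assumes
Linux (it reads /proc/sys/kernel/core_pattern) and reports the core size
limit of the Python process itself, which is only a hint about what the
postmaster inherits.  The function names are made up for illustration.

```python
import resource
from pathlib import Path


def describe_core_pattern(pattern):
    """Classify a Linux core_pattern string by where the kernel sends cores."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        # A leading '|' means the kernel pipes the core to a helper program
        # (systemd-coredump is hooked in this way) instead of writing a file.
        return "piped to handler: " + pattern[1:].strip()
    if pattern.startswith("/"):
        return "written to absolute path template: " + pattern
    return "written relative to the crashing process's working directory: " + pattern


def core_dump_report(pattern_file="/proc/sys/kernel/core_pattern"):
    """Summarize this process's core size limit and the kernel core_pattern."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft == resource.RLIM_INFINITY:
        limit = "unlimited"
    elif soft == 0:
        limit = "0 (core dumps disabled)"
    else:
        limit = f"{soft} bytes"
    path = Path(pattern_file)
    pattern = path.read_text() if path.exists() else ""
    if pattern.strip():
        where = describe_core_pattern(pattern)
    else:
        where = "unknown (no core_pattern file found)"
    return f"core size limit: {limit}; cores are {where}"
```

Calling print(core_dump_report()) shows both settings at once.  A "piped to
handler" result means core files will not appear in the data directory,
which is the kind of interference mentioned above.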

            regards, tom lane


