Re: stress test for parallel workers - Mailing list pgsql-hackers

From Mark Wong
Subject Re: stress test for parallel workers
Msg-id 20191011202853.GA23809@2ndQuadrant.com
In response to Re: stress test for parallel workers  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Sat, Oct 12, 2019 at 08:41:12AM +1300, Thomas Munro wrote:
> On Sat, Oct 12, 2019 at 7:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > This matches up with the intermittent infinite_recurse failures
> > we've been seeing in the buildfarm.  Those are happening across
> > a range of systems, but they're (almost) all Linux-based ppc64,
> > suggesting that there's a longstanding arch-specific kernel bug
> > involved.  For reference, I scraped the attached list of such
> > failures in the last three months.  I wonder whether we can get
> > the attention of any kernel hackers about that.
> 
> Yeah, I don't know anything about this stuff, but I was also beginning
> to wonder if something is busted in the arch-specific fault.c code
> that checks if stack expansion is valid[1], in a way that fails with a
> rapidly growing stack, well timed incoming signals, and perhaps
> Docker/LXC (that's on Mark's systems IIUC, not sure about the ARM
> boxes that failed or if it could be relevant here).  Perhaps the
> arbitrary tolerances mentioned in that comment are relevant.
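The hypothesis above combines a rapidly growing stack with well-timed incoming signals. The C sketch below shows one way to generate that combination on a test box. It is not the buildfarm's infinite_recurse test and is not a known reproducer of the ppc64 failures. The names recurse and run_stress, and the values FRAME_BYTES, max_depth and interval_us, are hypothetical and chosen only for illustration.

```c
#define _XOPEN_SOURCE 700 /* expose sigaction, SA_RESTART, setitimer */
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

/* Count of SIGALRM deliveries; written only by the signal handler. */
static volatile sig_atomic_t alarms_seen = 0;

static void on_alarm(int signo)
{
    (void) signo;
    alarms_seen++;
}

/* Hypothetical per-frame buffer size, picked only to grow the stack fast. */
#define FRAME_BYTES 1024

/*
 * Recurse to max_depth, writing every byte of a local buffer so each
 * frame's stack pages are actually touched.  Writing the buffer again
 * after the recursive call prevents the compiler from turning the
 * recursion into a loop (tail call), which would stop the stack growing.
 */
static int recurse(int depth, int max_depth)
{
    volatile char frame[FRAME_BYTES];

    for (size_t i = 0; i < sizeof frame; i++)
        frame[i] = (char) depth;
    if (depth >= max_depth)
        return depth;

    int reached = recurse(depth + 1, max_depth);
    frame[0] = (char) reached;
    return reached;
}

/*
 * Run `rounds` deep recursions while an interval timer raises SIGALRM
 * every `interval_us` microseconds (must be below 1000000).  Returns the
 * number of rounds that reached max_depth, or -1 if setup failed.
 */
static int run_stress(int rounds, int max_depth, long interval_us)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    struct itimerval timer;
    memset(&timer, 0, sizeof timer);
    timer.it_interval.tv_usec = interval_us;
    timer.it_value.tv_usec = interval_us;
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
        return -1;

    int completed = 0;
    for (int r = 0; r < rounds; r++)
        if (recurse(0, max_depth) == max_depth)
            completed++;

    /* Disarm the timer before returning. */
    struct itimerval off;
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_REAL, &off, NULL);
    return completed;
}
```

For example, run_stress(200, 1000, 200) performs 200 recursions to a depth of 1000 frames (roughly 1 MB of stack) while SIGALRM arrives about every 200 microseconds. On a kernel that handles stack expansion correctly, every round is expected to complete. A crash or SIGSEGV would point toward the kind of fault.c issue suspected here. Whether this particular workload actually triggers it on ppc64 has not been tested.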

This specific one (wobbegon) is OpenStack/KVM[2], for what it's worth...

"... cluster is an OpenStack based cluster offering POWER8 & POWER9 LE
instances running on KVM ..."

But to keep you on your toes, some of my ppc animals are Docker containers
running within other OpenStack/KVM instances...

Regards,
Mark

[1] https://github.com/torvalds/linux/blob/master/arch/powerpc/mm/fault.c#L244
[2] https://osuosl.org/services/powerdev/

-- 
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
