On Sat, Oct 12, 2019 at 08:41:12AM +1300, Thomas Munro wrote:
> On Sat, Oct 12, 2019 at 7:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > This matches up with the intermittent infinite_recurse failures
> > we've been seeing in the buildfarm. Those are happening across
> > a range of systems, but they're (almost) all Linux-based ppc64,
> > suggesting that there's a longstanding arch-specific kernel bug
> > involved. For reference, I scraped the attached list of such
> > failures in the last three months. I wonder whether we can get
> > the attention of any kernel hackers about that.
>
> Yeah, I don't know anything about this stuff, but I was also beginning
> to wonder if something is busted in the arch-specific fault.c code
> that checks if stack expansion is valid[1], in a way that fails with a
> rapidly growing stack, well timed incoming signals, and perhaps
> Docker/LXC (that's on Mark's systems IIUC, not sure about the ARM
> boxes that failed or if it could be relevant here). Perhaps the
> arbitrary tolerances mentioned in that comment are relevant.
This specific one (wobbegong) is OpenStack/KVM[2], for what it's worth...
"... cluster is an OpenStack based cluster offering POWER8 & POWER9 LE
instances running on KVM ..."
But to keep you on your toes, some of my ppc animals are Docker within
other OpenStack/KVM instances...
Regards,
Mark
[1] https://github.com/torvalds/linux/blob/master/arch/powerpc/mm/fault.c#L244
[2] https://osuosl.org/services/powerdev/
--
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/