Home > mailing lists

Reinitialize stack base after fork (for the benefit of rr)? - Mailing list pgsql-hackers

From	Andres Freund
Subject	Reinitialize stack base after fork (for the benefit of rr)?
Date	March 27, 2020 18:22:17
Msg-id	20200327182217.ubrrl32lyfhxfwk5@alap3.anarazel.de Whole thread Raw
Responses	Re: Reinitialize stack base after fork (for the benefit of rr)? Re: Reinitialize stack base after fork (for the benefit of rr)?
List	pgsql-hackers

Tree view

Hi,

I've found rr [1] very useful to debug issues in postgres. The ability
to hit a bug, and then e.g. identify a pointer with problematic
contents, set a watchpoint on its contents, and reverse-continue is
extremely powerful.

Unfortunately, when running postgres, it currently occasionally triggers
spurious stack-too-deep errors. That turns out to be because it has to
use an alternative stack in some corner cases (IIUC when a signal
arrives while already in a signal handler). That corner case can
unfortunately be hit from within postmaster, and at least can lead to
sigusr1_handler() being called with an alternative stack set.

Unfortunately that means that processes that postmaster fork()s while
using that alternative stack will continue their live using that
alternative stack. Which then subsequently means that our stack depth
checks always trigger.

I've not seen this trigger for normal backends (which makes sense,
they're not started from a signal handler), but for bgworkers. In
particular parallel workers are prone to hit the issue.


I've locally fixed the issue by computing the stack base address anew
for postmaster children. Currently in InitPostmasterChild().

I'd like to get that change upstream. The rr hackers have fixed a number
of other issues that could be hit with postgres, but they couldn't see a
good way to address the potential for a different signal stack in this
edge case. And it doesn't seem crazy to me to compute the stack base
again in postmaster children: It's cheap enough and it's extremely
unlikely that postmaster uses up a crazy amount of stack.

I also don't find it too crazy to guard against forks in signal handlers
leading to a different stack base address. It's a pretty odd thing to
do.


Tom, while imo not a fix of the right magnitude here: Are you planning /
hoping to work again on your postmaster latch patch? I think it'd be
really good if we could restructure the postmaster code to do far far
less in signal handlers. And the postmaster latch patch seems like a big
step in that direction.  I think we mostly dropped it due to the release
schedule last time round?

Greetings,

Andres Freund

[1] https://github.com/mozilla/rr/

pgsql-hackers by date:

From: Robert Haas
Date: 27 March 2020, 18:13:17
Subject: Re: backup manifests

From: Hamid Akhtar
Date: 27 March 2020, 18:22:18
Subject: Re: BUG #16040: PL/PGSQL RETURN QUERY statement never uses a parallel plan

Reinitialize stack base after fork (for the benefit of rr)? - Mailing list pgsql-hackers

Previous

Next