Thread: Re: Fix early elog(FATAL)

Re: Fix early elog(FATAL)

From
Nathan Bossart
Date:
On Sat, Dec 07, 2024 at 07:46:14PM -0800, Noah Misch wrote:
> main() says:
> 
>     /*
>      * Fire up essential subsystems: error and memory management
>      *
>      * Code after this point is allowed to use elog/ereport, though
>      * localization of messages may not work right away, and messages won't go
>      * anywhere but stderr until GUC settings get loaded.
>      */
>     MemoryContextInit();
> 
> However, appending elog(ERROR, "whoops") breaks like:
> 
> $ initdb -D discard_me
> FATAL:  whoops
> PANIC:  proc_exit() called in child process
> no data was returned by command ""/home/nm/sw/nopath/pghead/bin/postgres" -V"
> child process was terminated by signal 6: Aborted
> 
> So does the ereport(FATAL) in ClosePostmasterPorts().  The "called in child
> process" check (added in commit 97550c0 of 2023-10) reads MyProcPid, which we
> set later.  Three ways to fix this:

I noticed that you committed a fix for this.  Sorry for not responding
earlier.

> 1. Call InitProcessGlobals() earlier.  This could also reduce the total call
>    sites from 3 to 2 (main() and post-fork).
> 
> 2. Move MyProcPid init out of InitProcessGlobals(), to main() and post-fork.
>    This has less to go wrong in back branches.  While probably irrelevant,
>    this avoids calling pg_prng_strong_seed() in processes that will exit after
>    help() or GucInfoMain().
> 
> 3. Revert 97550c0, as commit 3b00fdb anticipated.

I did partially revert 97550c0 in commit 8fd0498, but we decided to leave
some of the checks [0].

> I don't think the choice matters much, so here is (2).

FWIW I'd probably vote for option 1.  That keeps the initialization of the
globals together, reduces the call sites, and fixes the bug.  I'd worry a
little about moving the MyProcPid assignments out of that function without
adding a bunch of commentary to explain why.

[0] https://postgr.es/m/20231122225945.3kgclsgz5lqmtnan%40awork3.anarazel.de

-- 
nathan



Re: Fix early elog(FATAL)

From
Noah Misch
Date:
On Tue, Dec 10, 2024 at 04:18:19PM -0600, Nathan Bossart wrote:
> On Sat, Dec 07, 2024 at 07:46:14PM -0800, Noah Misch wrote:
> > Three ways to fix this:
> 
> I noticed that you committed a fix for this.  Sorry for not responding
> earlier.
> 
> > 1. Call InitProcessGlobals() earlier.  This could also reduce the total call
> >    sites from 3 to 2 (main() and post-fork).
> > 
> > 2. Move MyProcPid init out of InitProcessGlobals(), to main() and post-fork.
> >    This has less to go wrong in back branches.  While probably irrelevant,
> >    this avoids calling pg_prng_strong_seed() in processes that will exit after
> >    help() or GucInfoMain().
> > 
> > 3. Revert 97550c0, as commit 3b00fdb anticipated.
> 
> I did partially revert 97550c0 in commit 8fd0498, but we decided to leave
> some of the checks

Got it.  Old branches couldn't merely do (3), anyway, since 3b00fdb is v17+.
I missed that while writing the list.

> > I don't think the choice matters much, so here is (2).
> 
> FWIW I'd probably vote for option 1.  That keeps the initialization of the
> globals together, reduces the call sites, and fixes the bug.  I'd worry a
> little about moving the MyProcPid assignments out of that function without
> adding a bunch of commentary to explain why.

Can you say more about that?  A comment about MyProcPid could say "fork() is
the one thing that changes the getpid() return value".  To me, the things
InitProcessGlobals() sets are all different.  MyProcPid can be set without
elog(ERROR) and gets invalidated at fork().  The others reasonably could
elog(ERROR).  (They currently don't.)  The random state could have a different
lifecycle.  If we had a builtin pooler that reused processes, we'd
reinitialize random state at each process reuse, not at each fork().  So I see
the grouping of (MyProcPid, MyStartTimestamp, random seed) as mostly an
accident of history.

Thanks,
nm



Re: Fix early elog(FATAL)

From
Nathan Bossart
Date:
On Wed, Dec 11, 2024 at 07:34:14PM -0800, Noah Misch wrote:
> On Tue, Dec 10, 2024 at 04:18:19PM -0600, Nathan Bossart wrote:
>> FWIW I'd probably vote for option 1.  That keeps the initialization of the
>> globals together, reduces the call sites, and fixes the bug.  I'd worry a
>> little about moving the MyProcPid assignments out of that function without
>> adding a bunch of commentary to explain why.
> 
> Can you say more about that?  A comment about MyProcPid could say "fork() is
> the one thing that changes the getpid() return value".  To me, the things
> InitProcessGlobals() sets are all different.  MyProcPid can be set without
> elog(ERROR) and gets invalidated at fork().  The others reasonably could
> elog(ERROR).  (They currently don't.)  The random state could have a different
> lifecycle.  If we had a builtin pooler that reused processes, we'd
> reinitialize random state at each process reuse, not at each fork().  So I see
> the grouping of (MyProcPid, MyStartTimestamp, random seed) as mostly an
> accident of history.

Fair enough.  I suppose part of my hesitation stems from expecting hackers
to be more likely to remember to call InitProcessGlobals() than to
initialize MyProcPid.  But given your change requires initializing
MyProcPid in exactly 2 places, and there are unlikely to be more in the
near future, I might be overthinking it.

-- 
nathan



Re: Fix early elog(FATAL)

From
Noah Misch
Date:
On Thu, Dec 12, 2024 at 10:07:00AM -0600, Nathan Bossart wrote:
> On Wed, Dec 11, 2024 at 07:34:14PM -0800, Noah Misch wrote:
> > On Tue, Dec 10, 2024 at 04:18:19PM -0600, Nathan Bossart wrote:
> >> FWIW I'd probably vote for option 1.  That keeps the initialization of the
> >> globals together, reduces the call sites, and fixes the bug.  I'd worry a
> >> little about moving the MyProcPid assignments out of that function without
> >> adding a bunch of commentary to explain why.
> > 
> > Can you say more about that?  A comment about MyProcPid could say "fork() is
> > the one thing that changes the getpid() return value".  To me, the things
> > InitProcessGlobals() sets are all different.  MyProcPid can be set without
> > elog(ERROR) and gets invalidated at fork().  The others reasonably could
> > elog(ERROR).  (They currently don't.)  The random state could have a different
> > lifecycle.  If we had a builtin pooler that reused processes, we'd
> > reinitialize random state at each process reuse, not at each fork().  So I see
> > the grouping of (MyProcPid, MyStartTimestamp, random seed) as mostly an
> > accident of history.
> 
> Fair enough.  I suppose part of my hesitation stems from expecting hackers
> to be more likely to remember to call InitProcessGlobals() than to
> initialize MyProcPid.  But given your change requires initializing
> MyProcPid in exactly 2 places, and there are unlikely to be more in the
> near future, I might be overthinking it.

I don't feel strongly either way.  I did write it the option-1 way originally.
Then I started thinking about changes at a distance causing the other
InitProcessGlobals() tasks to palloc or elog.  We could do option-1 in master
and keep the back branches in their current state.



Re: Fix early elog(FATAL)

From
Nathan Bossart
Date:
On Fri, Dec 13, 2024 at 07:15:05PM -0800, Noah Misch wrote:
> On Thu, Dec 12, 2024 at 10:07:00AM -0600, Nathan Bossart wrote:
>> On Wed, Dec 11, 2024 at 07:34:14PM -0800, Noah Misch wrote:
>> > On Tue, Dec 10, 2024 at 04:18:19PM -0600, Nathan Bossart wrote:
>> >> FWIW I'd probably vote for option 1.  That keeps the initialization of the
>> >> globals together, reduces the call sites, and fixes the bug.  I'd worry a
>> >> little about moving the MyProcPid assignments out of that function without
>> >> adding a bunch of commentary to explain why.
>> > 
>> > Can you say more about that?  A comment about MyProcPid could say "fork() is
>> > the one thing that changes the getpid() return value".  To me, the things
>> > InitProcessGlobals() sets are all different.  MyProcPid can be set without
>> > elog(ERROR) and gets invalidated at fork().  The others reasonably could
>> > elog(ERROR).  (They currently don't.)  The random state could have a different
>> > lifecycle.  If we had a builtin pooler that reused processes, we'd
>> > reinitialize random state at each process reuse, not at each fork().  So I see
>> > the grouping of (MyProcPid, MyStartTimestamp, random seed) as mostly an
>> > accident of history.

I just noticed that InitProcessGlobals() is relatively new.  It was added
in v12 by commit 197e4af.

>> Fair enough.  I suppose part of my hesitation stems from expecting hackers
>> to be more likely to remember to call InitProcessGlobals() than to
>> initialize MyProcPid.  But given your change requires initializing
>> MyProcPid in exactly 2 places, and there are unlikely to be more in the
>> near future, I might be overthinking it.
> 
> I don't feel strongly either way.  I did write it the option-1 way originally.
> Then I started thinking about changes at a distance causing the other
> InitProcessGlobals() tasks to palloc or elog.  We could do option-1 in master
> and keep the back branches in their current state.

I don't feel strongly either way, either.  I don't think it's important
enough to diverge from the back-branches.

-- 
nathan



Re: Fix early elog(FATAL)

From
Nathan Bossart
Date:
On Fri, Dec 13, 2024 at 10:14:43PM -0600, Nathan Bossart wrote:
> On Fri, Dec 13, 2024 at 07:15:05PM -0800, Noah Misch wrote:
>> I don't feel strongly either way.  I did write it the option-1 way originally.
>> Then I started thinking about changes at a distance causing the other
>> InitProcessGlobals() tasks to palloc or elog.  We could do option-1 in master
>> and keep the back branches in their current state.
> 
> I don't feel strongly either way, either.  I don't think it's important
> enough to diverge from the back-branches.

Also, thank you for fixing this.

-- 
nathan