Re: Fix early elog(FATAL) - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Fix early elog(FATAL)
Date
Msg-id Z1i-K95hTcggVOgn@nathan
Whole thread Raw
Responses Re: Fix early elog(FATAL)
List pgsql-hackers
On Sat, Dec 07, 2024 at 07:46:14PM -0800, Noah Misch wrote:
> main() says:
> 
>     /*
>      * Fire up essential subsystems: error and memory management
>      *
>      * Code after this point is allowed to use elog/ereport, though
>      * localization of messages may not work right away, and messages won't go
>      * anywhere but stderr until GUC settings get loaded.
>      */
>     MemoryContextInit();
> 
> However, appending elog(ERROR, "whoops") breaks like:
> 
> $ initdb -D discard_me
> FATAL:  whoops
> PANIC:  proc_exit() called in child process
> no data was returned by command ""/home/nm/sw/nopath/pghead/bin/postgres" -V"
> child process was terminated by signal 6: Aborted
> 
> So does the ereport(FATAL) in ClosePostmasterPorts().  The "called in child
> process" check (added in commit 97550c0 of 2023-10) reads MyProcPid, which we
> set later.  Three ways to fix this:

I noticed that you committed a fix for this.  Sorry for not responding
earlier.

> 1. Call InitProcessGlobals() earlier.  This could also reduce the total call
>    sites from 3 to 2 (main() and post-fork).
> 
> 2. Move MyProcPid init out of InitProcessGlobals(), to main() and post-fork.
>    This has less to go wrong in back branches.  While probably irrelevant,
>    this avoids calling pg_prng_strong_seed() in processes that will exit after
>    help() or GucInfoMain().
> 
> 3. Revert 97550c0, as commit 3b00fdb anticipated.

I did partially revert 97550c0 in commit 8fd0498, but we decided to leave
some of the checks [0].

> I don't think the choice matters much, so here is (2).

FWIW I'd probably vote for option 1.  That keeps the initialization of the
globals together, reduces the call sites, and fixes the bug.  I'd worry a
little about moving the MyProcPid assignments out of that function without
adding a bunch of commentary to explain why.

[0] https://postgr.es/m/20231122225945.3kgclsgz5lqmtnan%40awork3.anarazel.de

-- 
nathan



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Unmark gen_random_uuid() function leakproof
Next
From: Nathan Bossart
Date:
Subject: Re: Add a warning message when using unencrypted passwords