Re: Crash during backend start when low on memory - Mailing list pgsql-bugs

From Mats Kindahl
Subject Re: Crash during backend start when low on memory
Date
Msg-id CA+14424g_23hypp=jcTs8P5iFyDuzSU9BJzqon+9+YgW7az1zw@mail.gmail.com
Whole thread Raw
In response to Re: Crash during backend start when low on memory  (Andres Freund <andres@anarazel.de>)
Responses Re: Crash during backend start when low on memory
Re: Crash during backend start when low on memory
List pgsql-bugs
On Mon, Feb 6, 2023 at 12:18 AM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2023-02-05 22:00:20 +0100, Mats Kindahl wrote:
> Tom had some concerns about using PG_TRY() inside the postmaster
> ServerLoop

I don't think we gain anything in most of the places by using PG_TRY
instead of a non-throwing allocations. The relevant is cooperating
closely enough that just returning returning a failure indicator is
sufficient.

Note that that doesn't mean that palloc can't be used, see
MCXT_ALLOC_NO_OOM.

Ah, yes, then both patches can be rewritten to not use PG_TRY().
 

>, but I cannot see how that can cause problems since it is "just"
> a setjmp() and longjmp(). It allocates a structure on the stack that
> contains the values of the registers and the signal set; it is big, but not
> exceedingly so. If somebody could enlighten me about if there are any
> problems with this, I would be very grateful.

If you aren't careful you end up with PG_exception_stack in a backend
still pointing to that PG_TRY. Which means that an error during backend
startup would jump into postmaster code.  Which would not be good.

I did run into this case when "failing" some of the memory allocations inside the backend with a debugger (see the PATCH.md).
 
There are, however, already cases in the backend startup that can raise an error (for example, the code that creates the MessageContext in PostgresMain). So, either if one of these error cases is triggered, or if you make a mistake and add an ereport(ERROR,...) at the wrong place during startup, it could cause this problem. In other words, wouldn't adding a PG_TRY() in the code spawning the backend *prevent* such a problem and hence reduce the risk of a mistake in code causing the backend jumping into postmaster code (of course assuming that the code is written correctly).



Particularly because this is a bugfix we need to make the fix more
minimal than the patches proposed so far.

I can remove the PG_TRY() and use ereport(LOG, ...) + ExitPostmaster if it feels like a safer path for incorporating a bug fix, but note that there is still a risk that the backend will tear down the postmaster. For example, allocating memory for a memory context can fail, or throwing another error inside the backend startup can also fail, so I think that there will be more work to do after this.

Best wishes,
Mats Kindahl
 


Greetings,

Andres Freund

pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #17774: Assert triggered on brin_minmax_multi.c
Next
From: Masahiko Sawada
Date:
Subject: Re: WAL segments removed from primary despite the fact that logical replication slot needs it.