On Mon, Feb 6, 2023 at 12:18 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2023-02-05 22:00:20 +0100, Mats Kindahl wrote: > Tom had some concerns about using PG_TRY() inside the postmaster > ServerLoop
I don't think we gain anything in most of the places by using PG_TRY instead of a non-throwing allocations. The relevant is cooperating closely enough that just returning returning a failure indicator is sufficient.
Note that that doesn't mean that palloc can't be used, see MCXT_ALLOC_NO_OOM.
Ah, yes, then both patches can be rewritten to not use PG_TRY().
>, but I cannot see how that can cause problems since it is "just" > a setjmp() and longjmp(). It allocates a structure on the stack that > contains the values of the registers and the signal set; it is big, but not > exceedingly so. If somebody could enlighten me about if there are any > problems with this, I would be very grateful.
If you aren't careful you end up with PG_exception_stack in a backend still pointing to that PG_TRY. Which means that an error during backend startup would jump into postmaster code. Which would not be good.
I did run into this case when "failing" some of the memory allocations inside the backend with a debugger (see the PATCH.md).
There are, however, already cases in the backend startup that can raise an error (for example, the code that creates the MessageContext in PostgresMain). So, either if one of these error cases is triggered, or if you make a mistake and add an ereport(ERROR,...) at the wrong place during startup, it could cause this problem. In other words, wouldn't adding a PG_TRY() in the code spawning the backend *prevent* such a problem and hence reduce the risk of a mistake in code causing the backend jumping into postmaster code (of course assuming that the code is written correctly).
Particularly because this is a bugfix we need to make the fix more minimal than the patches proposed so far.
I can remove the PG_TRY() and use ereport(LOG, ...) + ExitPostmaster if it feels like a safer path for incorporating a bug fix, but note that there is still a risk that the backend will tear down the postmaster. For example, allocating memory for a memory context can fail, or throwing another error inside the backend startup can also fail, so I think that there will be more work to do after this.