Thread: Crash during backend start when low on memory

Crash during backend start when low on memory

From
Mats Kindahl
Date:
Hi all,

We have a crash when starting up a backend that can happen when the system is low on memory. Even though the system is probably not in good shape, it seems unnecessary to trigger a crash and it would be preferable to just deny creating the backend and throw an error.

Inside BackendInitialize, there is this code:

/*
* Save remote_host and remote_port in port structure (after this, they
* will appear in log_line_prefix data for log messages).
*/
port->remote_host = strdup(remote_host);
port->remote_port = strdup(remote_port);
   .
   .
   .
appendStringInfoString(&ps_data, port->remote_host);
if (port->remote_port[0] != '\0')
    appendStringInfo(&ps_data, "(%s)", port->remote_port);

There is no assignment to remote_port or remote_host in the lines between. If strdup() fails to allocate memory for the strings, it will return NULL and the dereference at the later lines will cause a segmentation fault, which will bring down the server. There seems to be code that reads the start packet between these two lines of code, but I can trigger a segmentation fault without having a correct user. This seems unnecessary.

I can trigger the crash using a debugger:
  1. Connect a debugger to the postmaster
  2. Set a breakpoint on BackendStartup and continue.
  3. Try to connect to the server using psql and use a non-existing user name.
  4. Set follow-fork-mode to child and set a breakpoint on BackendInitialize and continue executing.
  5. Step to the first lines above and set port->remote_port and port->remote_host to NULL simulating a failed strdup.
  6. Continue executing, and the backend will get a segfault and crash the server.
This is probably not a big problem in normal installations. It requires you to be out of memory to trigger this and you will run out of backends long before you run out of memory, but if you're running in constrained environments or in a container that is pushing the limits of the amount of memory to allocate, this can trigger the crash.

A tentative patch to fix this just to check the allocation and exit the process if there is not enough memory, taking care to not try to allocate more memory. I used this patch (attached as well).

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 1da5752047..e4b3d1bd59 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4327,6 +4327,14 @@ BackendInitialize(Port *port)
       port->remote_host = strdup(remote_host);
       port->remote_port = strdup(remote_port);
 
+       if (port->remote_host == NULL || port->remote_port == NULL) {
+               /* Since we are out of memory, we use strerror_r and write_stderr
+                  here. They do not allocate memory and just use the stack. */
+               char sebuf[PG_STRERROR_R_BUFLEN];
+               write_stderr("out of memory: %s", strerror_r(errno, sebuf, sizeof(sebuf)));
+               proc_exit(1);
+       }
+
       /* And now we can issue the Log_connections message, if wanted */
       if (Log_connections)
       {

However, an alternative patch is to not use strdup() or any other functions that allocate memory on the heap and instead use pstrdup(), pmalloc() and friends. Not sure if there are any reasons to not use those in this function, but it seems like the memory context is correctly set up when BackendInitialize() is called. I have attached a patch for that as well. In this case, it is not necessary to check the return value since MemoryContextAlloc() will report an error if it fails to allocate memory and not return NULL.

Best wishes,
Mats Kindahl, Timescale
Attachment

Re: Crash during backend start when low on memory

From
Daniel Gustafsson
Date:
> On 14 Dec 2022, at 13:58, Mats Kindahl <mats@timescale.com> wrote:

> If strdup() fails to allocate memory for the strings, it will return NULL and the dereference at the later lines will
causea segmentation fault, which will bring down the server. There seems to be code that reads the start packet between
thesetwo lines of code, but I can trigger a segmentation fault without having a correct user. This seems unnecessary. 

Ugh.

> A tentative patch to fix this just to check the allocation and exit the process if there is not enough memory, taking
careto not try to allocate more memory. I used this patch (attached as well). 
>
> diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
> index 1da5752047..e4b3d1bd59 100644
> --- a/src/backend/postmaster/postmaster.c
> +++ b/src/backend/postmaster/postmaster.c
> @@ -4327,6 +4327,14 @@ BackendInitialize(Port *port)
>        port->remote_host = strdup(remote_host);
>        port->remote_port = strdup(remote_port);
>
> +       if (port->remote_host == NULL || port->remote_port == NULL) {
> +               /* Since we are out of memory, we use strerror_r and write_stderr
> +                  here. They do not allocate memory and just use the stack. */
> +               char sebuf[PG_STRERROR_R_BUFLEN];
> +               write_stderr("out of memory: %s", strerror_r(errno, sebuf, sizeof(sebuf)));
> +               proc_exit(1);
> +       }
> +

Something like this (with the comment in the right syntax) seems like an
appropriate fix.

> However, an alternative patch is to not use strdup() or any other functions that allocate memory on the heap and
insteaduse pstrdup(), pmalloc() and friends. Not sure if there are any reasons to not use those in this function, but
itseems like the memory context is correctly set up when BackendInitialize() is called. I have attached a patch for
thatas well. In this case, it is not necessary to check the return value since MemoryContextAlloc() will report an
errorif it fails to allocate memory and not return NULL. 

I think using pstrdup would require changing over to PostmasterContext (or
TopMemoryContext even?) to ensure these allocations are around for the lifetime
of the backend.

A related issue is that we in PostmasterMain strdup into output_config_variable
during option parsing, but we don't check for success.  When we then go about
checking if -C was set, we can't tell if it wasn't at all set or if the
strdup() call failed.  Another one is the -D option where a failure to strdup
will silently fall back to $PGDATA.  Both of these should IMO check for strdup
returning NULL and exit in case it does.

--
Daniel Gustafsson        https://vmware.com/




Re: Crash during backend start when low on memory

From
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
>> On 14 Dec 2022, at 13:58, Mats Kindahl <mats@timescale.com> wrote:
>> A tentative patch to fix this just to check the allocation and exit the process if there is not enough memory,
takingcare to not try to allocate more memory. I used this patch (attached as well). 

> Something like this (with the comment in the right syntax) seems like an
> appropriate fix.

I don't like that approach one bit.  Extending it to all the places where
this could theoretically happen would require a large amount of thought,
because you'd need a custom solution for each place.  That is completely
unwarranted for what's basically (IMO) an academic exercise.  The right
thing is to s/strdup/something/g, which we can do pretty mindlessly once
we settle on the "something".  I'd prefer that the something be pstrdup,
because otherwise future hackers will have to stop and think whether
they should be using pstrdup or something else when hacking in session
startup code.  As you say, we do have to think through which context
needs to be active for that to work.  I'd also be okay with deciding that
we need an explicit MemoryContextStrdup in some places, as long as
it's pretty clear which places need that.

BTW, I think it's also pretty tenable to decide "this is a non-problem,
let's ignore it".  Yeah, the backend will crash and provoke a DB reset
cycle, but if you're that hard up for memory (a) that is likely to happen
somewhere else soon anyway, and (b) a reset cycle might not be such a
bad thing, as it could relieve the memory pressure.  I'm especially not
pleased with the prospect of writing a bunch of nigh-untestable code
that makes extremely dubious claims of being able to cope.  (Hint:
proc_exit() almost certainly results in memory allocations.  And
I think the claim that neither strerror_r nor write_stderr does
is based on little beyond wishful thinking.  Especially so if
write_stderr is forwarding to the log collector.)

            regards, tom lane



Re: Crash during backend start when low on memory

From
Mats Kindahl
Date:


On Wed, Dec 14, 2022 at 2:23 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> On 14 Dec 2022, at 13:58, Mats Kindahl <mats@timescale.com> wrote:

> If strdup() fails to allocate memory for the strings, it will return NULL and the dereference at the later lines will cause a segmentation fault, which will bring down the server. There seems to be code that reads the start packet between these two lines of code, but I can trigger a segmentation fault without having a correct user. This seems unnecessary.

Ugh.

> A tentative patch to fix this just to check the allocation and exit the process if there is not enough memory, taking care to not try to allocate more memory. I used this patch (attached as well).
>
> diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
> index 1da5752047..e4b3d1bd59 100644
> --- a/src/backend/postmaster/postmaster.c
> +++ b/src/backend/postmaster/postmaster.c
> @@ -4327,6 +4327,14 @@ BackendInitialize(Port *port)
>        port->remote_host = strdup(remote_host);
>        port->remote_port = strdup(remote_port);

> +       if (port->remote_host == NULL || port->remote_port == NULL) {
> +               /* Since we are out of memory, we use strerror_r and write_stderr
> +                  here. They do not allocate memory and just use the stack. */
> +               char sebuf[PG_STRERROR_R_BUFLEN];
> +               write_stderr("out of memory: %s", strerror_r(errno, sebuf, sizeof(sebuf)));
> +               proc_exit(1);
> +       }
> +

Something like this (with the comment in the right syntax) seems like an
appropriate fix.

Yeah, I fixed the comments. Patch attached. I am actually sceptical of this approach and think using pstrdup() and friends is better. With this approach, there are two different mechanisms for allocating memory, so there is an inconsistency in the code that is just confusing and raises questions rather than solves problems. Better to have a consistent approach and use pstrdup() everywhere.
 

> However, an alternative patch is to not use strdup() or any other functions that allocate memory on the heap and instead use pstrdup(), pmalloc() and friends. Not sure if there are any reasons to not use those in this function, but it seems like the memory context is correctly set up when BackendInitialize() is called. I have attached a patch for that as well. In this case, it is not necessary to check the return value since MemoryContextAlloc() will report an error if it fails to allocate memory and not return NULL.

I think using pstrdup would require changing over to PostmasterContext (or
TopMemoryContext even?) to ensure these allocations are around for the lifetime
of the backend.

PostmasterContext is set quite early in the procedure so it should not be a problem to use it. I think we can use PostmasterContext rather than TopMemoryContext. The memory for the allocations are passed down to PostmasterMain() and used inside InitPostgres() but as far as I can tell, they are not used after the InitPostgres() call. The PostmasterContex is destroyed right after the call to InitPostgres() inside PostmasterMain() and then the rest is allocated in other contexts.

 

A related issue is that we in PostmasterMain strdup into output_config_variable
during option parsing, but we don't check for success.  When we then go about
checking if -C was set, we can't tell if it wasn't at all set or if the
strdup() call failed.  Another one is the -D option where a failure to strdup
will silently fall back to $PGDATA.  Both of these should IMO check for strdup
returning NULL and exit in case it does.

For the strdup() in PostmasterMain() we have this code quite early in the function, so it should be safe to use pstrdup() consistently.

/*
* By default, palloc() requests in the postmaster will be allocated in
* the PostmasterContext, which is space that can be recycled by backends.
* Allocated data that needs to be available to backends should be
* allocated in TopMemoryContext.
*/
PostmasterContext = AllocSetContextCreate(TopMemoryContext,
 "Postmaster",
 ALLOCSET_DEFAULT_SIZES);
MemoryContextSwitchTo(PostmasterContext);

 I have attached a patch for that as well, which is the approach I think is better.

Best wishes,
Mats Kindahl, Timescale

--
Daniel Gustafsson               https://vmware.com/

Attachment

Re: Crash during backend start when low on memory

From
Mats Kindahl
Date:


On Wed, Dec 14, 2022 at 4:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
>> On 14 Dec 2022, at 13:58, Mats Kindahl <mats@timescale.com> wrote:
>> A tentative patch to fix this just to check the allocation and exit the process if there is not enough memory, taking care to not try to allocate more memory. I used this patch (attached as well).

> Something like this (with the comment in the right syntax) seems like an
> appropriate fix.

I don't like that approach one bit.  Extending it to all the places where
this could theoretically happen would require a large amount of thought,
because you'd need a custom solution for each place.  That is completely
unwarranted for what's basically (IMO) an academic exercise.  The right
thing is to s/strdup/something/g, which we can do pretty mindlessly once
we settle on the "something".  I'd prefer that the something be pstrdup,
because otherwise future hackers will have to stop and think whether
they should be using pstrdup or something else when hacking in session
startup code.  As you say, we do have to think through which context
needs to be active for that to work.  I'd also be okay with deciding that
we need an explicit MemoryContextStrdup in some places, as long as
it's pretty clear which places need that.

I think we can use the PostmasterContext and I agree that it is a lot clearer and consistent to use pstrdup() and friends everywhere.
 

BTW, I think it's also pretty tenable to decide "this is a non-problem,
let's ignore it".  Yeah, the backend will crash and provoke a DB reset
cycle, but if you're that hard up for memory (a) that is likely to happen
somewhere else soon anyway, and (b) a reset cycle might not be such a
bad thing, as it could relieve the memory pressure.  I'm especially not
pleased with the prospect of writing a bunch of nigh-untestable code
that makes extremely dubious claims of being able to cope.  (Hint:
proc_exit() almost certainly results in memory allocations.  And
I think the claim that neither strerror_r nor write_stderr does
is based on little beyond wishful thinking.  Especially so if
write_stderr is forwarding to the log collector.)

I don't think the first patch should be used, but added it to the previous mail so that you can look at it. Using pstrdup() makes the code consistent, is a trivial patch, and avoids a lot of unnecessary discussions and questions, so I think there is value in adding this patch, even if this is not a common occurrence and even if this is an edge case.
 
Best wishes,
Mats Kindahl, Timescale


                        regards, tom lane

Re: Crash during backend start when low on memory

From
Alvaro Herrera
Date:
On 2022-Dec-14, Mats Kindahl wrote:

> PostmasterContext is set quite early in the procedure so it should not be a
> problem to use it. I think we can use PostmasterContext rather than
> TopMemoryContext. The memory for the allocations are passed down to
> PostmasterMain() and used inside InitPostgres() but as far as I can tell,
> they are not used after the InitPostgres() call. The PostmasterContex is
> destroyed right after the call to InitPostgres() inside PostmasterMain()
> and then the rest is allocated in other contexts.

Well, this is what PostgresMain has to say about it:

    /*
     * If the PostmasterContext is still around, recycle the space; we don't
     * need it anymore after InitPostgres completes.  Note this does not trash
     * *MyProcPort, because ConnCreate() allocated that space with malloc()
     * ... else we'd need to copy the Port data first.  Also, subsidiary data
     * such as the username isn't lost either; see ProcessStartupPacket().
     */
    if (PostmasterContext)
    {
        MemoryContextDelete(PostmasterContext);

So I think blowing away part of struct Port independently of the
containing struct is a bad idea.  Why isn't more appropriate to do what
ConnCreate does?

    if (!(port = (Port *) calloc(1, sizeof(Port))))
    {
        ereport(LOG,
                (errcode(ERRCODE_OUT_OF_MEMORY),
                 errmsg("out of memory")));
        ExitPostmaster(1);
    }

... ugh.  I have to admit that killing postmaster due to a transient
out-of-memory makes me a bit nervous -- do note that this
ExitPostmaster() happens before we've forked the backend, so this code
does bring the server down (!!).  If we want to make postmaster more
robust in the face of OOM, we should also change this code somehow, yes?

I agree with Mats' premise when starting the thread -- even if the
server is not in great shape, it doesn't seem like taking the whole
service down is a great response.  It would be better to just refuse the
connection with some error and let existing connections chug along,
poorly though they can.  Maybe they'll also die soon, freeing memory;
but that'll be no good if Postmaster is gone altogether.

Now, it would be better if we could test this more thoroughly.  Perhaps
it's possible to use something like the mallocfail.c lib that Heikki
wrote a few years ago, teaching it to handle calloc
https://postgr.es/m/547480DE.4040408@vmware.com

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)



Re: Crash during backend start when low on memory

From
Mats Kindahl
Date:
On Thu, Dec 15, 2022 at 6:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Dec-14, Mats Kindahl wrote:

> PostmasterContext is set quite early in the procedure so it should not be a
> problem to use it. I think we can use PostmasterContext rather than
> TopMemoryContext. The memory for the allocations are passed down to
> PostmasterMain() and used inside InitPostgres() but as far as I can tell,
> they are not used after the InitPostgres() call. The PostmasterContex is
> destroyed right after the call to InitPostgres() inside PostmasterMain()
> and then the rest is allocated in other contexts.

Well, this is what PostgresMain has to say about it:

        /*
         * If the PostmasterContext is still around, recycle the space; we don't
         * need it anymore after InitPostgres completes.  Note this does not trash
         * *MyProcPort, because ConnCreate() allocated that space with malloc()
         * ... else we'd need to copy the Port data first.  Also, subsidiary data
         * such as the username isn't lost either; see ProcessStartupPacket().
         */
        if (PostmasterContext)
        {
                MemoryContextDelete(PostmasterContext);


Ugh. Ok. Missed that comment.
 
So I think blowing away part of struct Port independently of the
containing struct is a bad idea.  Why isn't more appropriate to do what
ConnCreate does?

        if (!(port = (Port *) calloc(1, sizeof(Port))))
        {
                ereport(LOG,
                                (errcode(ERRCODE_OUT_OF_MEMORY),
                                 errmsg("out of memory")));
                ExitPostmaster(1);
        }

... ugh.  I have to admit that killing postmaster due to a transient
out-of-memory makes me a bit nervous -- do note that this
ExitPostmaster() happens before we've forked the backend, so this code
does bring the server down (!!).  If we want to make postmaster more
robust in the face of OOM, we should also change this code somehow, yes?

Seems reasonable, yes. I'll see what I can throw together.
 

I agree with Mats' premise when starting the thread -- even if the
server is not in great shape, it doesn't seem like taking the whole
service down is a great response.  It would be better to just refuse the
connection with some error and let existing connections chug along,
poorly though they can.  Maybe they'll also die soon, freeing memory;
but that'll be no good if Postmaster is gone altogether.

Now, it would be better if we could test this more thoroughly.  Perhaps
it's possible to use something like the mallocfail.c lib that Heikki
wrote a few years ago, teaching it to handle calloc
https://postgr.es/m/547480DE.4040408@vmware.com

 Thanks. Will take a look at that.



--
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)

Re: Crash during backend start when low on memory

From
Mats Kindahl
Date:


On Fri, Dec 16, 2022 at 9:46 AM Mats Kindahl <mats@timescale.com> wrote:
On Thu, Dec 15, 2022 at 6:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Dec-14, Mats Kindahl wrote:

> PostmasterContext is set quite early in the procedure so it should not be a
> problem to use it. I think we can use PostmasterContext rather than
> TopMemoryContext. The memory for the allocations are passed down to
> PostmasterMain() and used inside InitPostgres() but as far as I can tell,
> they are not used after the InitPostgres() call. The PostmasterContex is
> destroyed right after the call to InitPostgres() inside PostmasterMain()
> and then the rest is allocated in other contexts.

Well, this is what PostgresMain has to say about it:

        /*
         * If the PostmasterContext is still around, recycle the space; we don't
         * need it anymore after InitPostgres completes.  Note this does not trash
         * *MyProcPort, because ConnCreate() allocated that space with malloc()
         * ... else we'd need to copy the Port data first.  Also, subsidiary data
         * such as the username isn't lost either; see ProcessStartupPacket().
         */
        if (PostmasterContext)
        {
                MemoryContextDelete(PostmasterContext);


Ugh. Ok. Missed that comment.
 
So I think blowing away part of struct Port independently of the
containing struct is a bad idea.  Why isn't more appropriate to do what
ConnCreate does?

        if (!(port = (Port *) calloc(1, sizeof(Port))))
        {
                ereport(LOG,
                                (errcode(ERRCODE_OUT_OF_MEMORY),
                                 errmsg("out of memory")));
                ExitPostmaster(1);
        }

... ugh.  I have to admit that killing postmaster due to a transient
out-of-memory makes me a bit nervous -- do note that this
ExitPostmaster() happens before we've forked the backend, so this code
does bring the server down (!!).  If we want to make postmaster more
robust in the face of OOM, we should also change this code somehow, yes?

Seems reasonable, yes. I'll see what I can throw together.
 

I agree with Mats' premise when starting the thread -- even if the
server is not in great shape, it doesn't seem like taking the whole
service down is a great response.  It would be better to just refuse the
connection with some error and let existing connections chug along,
poorly though they can.  Maybe they'll also die soon, freeing memory;
but that'll be no good if Postmaster is gone altogether.

Now, it would be better if we could test this more thoroughly.  Perhaps
it's possible to use something like the mallocfail.c lib that Heikki
wrote a few years ago, teaching it to handle calloc
https://postgr.es/m/547480DE.4040408@vmware.com

 Thanks. Will take a look at that.

I have an improved patch based on raising an error when malloc fails. I used the version of mallocfail that is available at https://github.com/ralight/mallocfail which allows you to run the program with incrementally failing memory allocations but since the count is around 6886 or so when the postmaster starts to allow backend starts and other processes are starting in between, this count is not always accurate and something "smarter" is needed to get a reliable failure test.

It is quite straightforward to test it by attaching a debugger and setting the return value of malloc, but to be able to have some "smarter" failures only when, for example, starting a backend it would be necessary to have something better integrated with the postgres code. Being able to attach a uprobe using bpftrace and override the return value would be the most elegant solution, but that is currently not possible, so considering whether to add something like the mallocfail above to the regular testing.

Note: there are a bunch of other core dumps that occur during startup before the postmaster is ready to accept connections, but I think that is less of a problem since it fails during startup before the system is ready to accept connections.

Best wishes,
Mats Kindahl, Timescale




--
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)
Attachment

Re: Crash during backend start when low on memory

From
Alvaro Herrera
Date:
On 2023-Jan-13, Mats Kindahl wrote:

> I have an improved patch based on raising an error when malloc fails. I
> used the version of mallocfail that is available at
> https://github.com/ralight/mallocfail which allows you to run the program
> with incrementally failing memory allocations but since the count is around
> 6886 or so when the postmaster starts to allow backend starts and other
> processes are starting in between, this count is not always accurate and
> something "smarter" is needed to get a reliable failure test.

Hmm, but I'm not sure I like what you did to BackendStartup:

@@ -4054,14 +4073,7 @@ BackendStartup(Port *port)
      * Create backend data structure.  Better before the fork() so we can
      * handle failure cleanly.
      */
-    bn = (Backend *) malloc(sizeof(Backend));
-    if (!bn)
-    {
-        ereport(LOG,
-                (errcode(ERRCODE_OUT_OF_MEMORY),
-                 errmsg("out of memory")));
-        return STATUS_ERROR;
-    }
+    CHECKED_CALL(bn = malloc(sizeof(Backend)));

With this code, now we will have postmaster raise ERROR if this malloc
fails, rather than returning STATUS_ERROR as previously.  How does
postmaster react to this?  I think it won't go very well.

(Stylistically, I'd rather have the test and corresponding reaction at
each call site rather than using a macro to hide it.  I especially don't
like the fact that the errdetail you propose would report a line of C
code to the user, rather than an intelligible explanation of what that
code is trying to do.)

> It is quite straightforward to test it by attaching a debugger and setting
> the return value of malloc, but to be able to have some "smarter" failures
> only when, for example, starting a backend it would be necessary to have
> something better integrated with the postgres code. Being able to attach a
> uprobe using bpftrace and override the return value would be the most
> elegant solution, but that is currently not possible, so considering
> whether to add something like the mallocfail above to the regular testing.

Hmm, I'm not sure that this type of testing is something to do on an
ongoing basis.

> Note: there are a bunch of other core dumps that occur during startup
> before the postmaster is ready to accept connections, but I think that is
> less of a problem since it fails during startup before the system is ready
> to accept connections.

That sounds reasonable.  The service watcher should restart postmaster
appropriately in that case (be it old SysV init, systemd, your container
management layer, or whatever.)

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Ellos andaban todos desnudos como su madre los parió, y también las mujeres,
aunque no vi más que una, harto moza, y todos los que yo vi eran todos
mancebos, que ninguno vi de edad de más de XXX años" (Cristóbal Colón)



Re: Crash during backend start when low on memory

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2023-Jan-13, Mats Kindahl wrote:
>> I have an improved patch based on raising an error when malloc fails.

> Hmm, but I'm not sure I like what you did to BackendStartup:

Yeah, the BackendStartup change is 100% wrong; it is replacing
perfectly good code that recovers correctly with bad code that
will take down the postmaster (not a backend child!) on OOM.

By and large I don't like anything about this patch.  You could
get the same results (i.e. elog(ERROR) on OOM) by replacing the
malloc calls with pallocs.

            regards, tom lane



Re: Crash during backend start when low on memory

From
Mats Kindahl
Date:
On Fri, Jan 13, 2023 at 12:58 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2023-Jan-13, Mats Kindahl wrote:

> I have an improved patch based on raising an error when malloc fails. I
> used the version of mallocfail that is available at
> https://github.com/ralight/mallocfail which allows you to run the program
> with incrementally failing memory allocations but since the count is around
> 6886 or so when the postmaster starts to allow backend starts and other
> processes are starting in between, this count is not always accurate and
> something "smarter" is needed to get a reliable failure test.

Hmm, but I'm not sure I like what you did to BackendStartup:

@@ -4054,14 +4073,7 @@ BackendStartup(Port *port)
         * Create backend data structure.  Better before the fork() so we can
         * handle failure cleanly.
         */
-       bn = (Backend *) malloc(sizeof(Backend));
-       if (!bn)
-       {
-               ereport(LOG,
-                               (errcode(ERRCODE_OUT_OF_MEMORY),
-                                errmsg("out of memory")));
-               return STATUS_ERROR;
-       }
+       CHECKED_CALL(bn = malloc(sizeof(Backend)));

With this code, now we will have postmaster raise ERROR if this malloc
fails, rather than returning STATUS_ERROR as previously.  How does
postmaster react to this?  I think it won't go very well.

Well.. this error is caught in the caller using this code:

PG_TRY();
{
  port = ConnCreate(ListenSocket[i]);
  BackendStartup(port);
}
PG_FINALLY();
{
  /*
   * We no longer need the open socket or port structure
   * in this process, and definitely not if we failed
   * spawning a backend.
   */
  if (port)
  {
     if (port->sock != PGINVALID_SOCKET)
     StreamClose(port->sock);
     ConnFree(port);
  }
}
PG_END_TRY();

Unless I am mistaken, this should mean that the error message is printed, the streams closed and memory free'ed, and PostmasterMain continues as before. It could just be logged using LOG level, but an ERROR stands out better and is searchable so it is easy to surface in systems monitoring the server. (Digging for specific log messages is cumbersome, just printing out all ERROR messages in the log is easy.)


(Stylistically, I'd rather have the test and corresponding reaction at
each call site rather than using a macro to hide it.  I especially don't
like the fact that the errdetail you propose would report a line of C
code to the user, rather than an intelligible explanation of what that
code is trying to do.)

I agree this is ugly, but since this was rare and the error message is mostly to understand what went wrong to be able to debug it, I thought this would be sufficient for that. It is also easier to maintain. Trivial to replace.


> It is quite straightforward to test it by attaching a debugger and setting
> the return value of malloc, but to be able to have some "smarter" failures
> only when, for example, starting a backend it would be necessary to have
> something better integrated with the postgres code. Being able to attach a
> uprobe using bpftrace and override the return value would be the most
> elegant solution, but that is currently not possible, so considering
> whether to add something like the mallocfail above to the regular testing.

Hmm, I'm not sure that this type of testing is something to do on an
ongoing basis.

Me neither, which is why I didn't spend time on it right now. The argument for doing it is that it would be good to have some test that can at least be easily used to check that it behaves correctly on startup even if memory failures occur.
 

> Note: there are a bunch of other core dumps that occur during startup
> before the postmaster is ready to accept connections, but I think that is
> less of a problem since it fails during startup before the system is ready
> to accept connections.

That sounds reasonable.  The service watcher should restart postmaster
appropriately in that case (be it old SysV init, systemd, your container
management layer, or whatever.)

Yes, that was my thinking as well. Core dumps are ugly, but provide useful information to figure out where things went wrong.
 

--
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Ellos andaban todos desnudos como su madre los parió, y también las mujeres,
aunque no vi más que una, harto moza, y todos los que yo vi eran todos
mancebos, que ninguno vi de edad de más de XXX años" (Cristóbal Colón)

Re: Crash during backend start when low on memory

From
Mats Kindahl
Date:
On Fri, Jan 13, 2023 at 4:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2023-Jan-13, Mats Kindahl wrote:
>> I have an improved patch based on raising an error when malloc fails.

> Hmm, but I'm not sure I like what you did to BackendStartup:

Yeah, the BackendStartup change is 100% wrong; it is replacing
perfectly good code that recovers correctly with bad code that
will take down the postmaster (not a backend child!) on OOM.

AFAICT, the error is caught by the caller (using PG_TRY), executes some cleanup code, and then continues executing, so it shouldn't take down the postmaster.
 

By and large I don't like anything about this patch.  You could
get the same results (i.e. elog(ERROR) on OOM) by replacing the
malloc calls with pallocs.

Well... as Alvaro pointed out, just replacing it with palloc will blow out part of the structure being allocated for the port information when the backend starts running.

However, continuing on an approach using memory contexts instead of directly allocating memory, something like this could be done:
  1. There is already a memory context PostmasterContext for the postmaster startup code, so only allocate stuff here that is not needed after a backend has started.
  2. Add a new memory context for "backend startup" (say, BackendStartupContext) and use that for allocating stuff that should exist after the backend starts. For example, the port information. (It might be possible to move more information here, such as the HBA structures, which would make it clearer that this is startup memory for the backend. Not sure if there are some potential issues with this.)
  3. Create the BackendStartupContext when starting the backend.
  4. Replace previous malloc and friends with palloc in the PostmasterContext (where suitable).
  5. Replace allocation of port structure using malloc with allocation in BackendStartupContext
  6. When the backend starts, it would toss the memory context from the postmaster, but that would leave the port structure intact. The backend already tosses the PostmasterContext, but not until after it has done authentication.
  7. When the postmaster has spawned the backend, it should toss the BackendStartupContext since it is not relevant any more.
  8. Since creating the BackendStartupContext can fail with an error, we need to wrap this in a PG_TRY block. Otherwise the postmaster will exit on error.
This provides a few advantages compared to allocating memory using malloc and friends:
  1. We allocate memory for the postmaster context in one place, when the actual startup is running. If we're low on memory at this point, we can abort early.
  2. We allocate memory for the backend startup in one place, when starting the backend. If we're low on memory at this point, we can abort early.
  3. Code is significantly easier to read and will not require a lot of checks during startup.
I have a tentative patch, but it contains some problems so I will solve these first.

Best wishes,
Mats Kindahl


                        regards, tom lane

Re: Crash during backend start when low on memory

From
Tom Lane
Date:
Mats Kindahl <mats@timescale.com> writes:
> On Fri, Jan 13, 2023 at 4:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, the BackendStartup change is 100% wrong; it is replacing
>> perfectly good code that recovers correctly with bad code that
>> will take down the postmaster (not a backend child!) on OOM.

> AFAICT, the error is caught by the caller (using PG_TRY), executes some
> cleanup code, and then continues executing, so it shouldn't take down the
> postmaster.

There are no PG_TRY blocks in the postmaster, and certainly no recovery.

            regards, tom lane



Re: Crash during backend start when low on memory

From
Mats Kindahl
Date:
On Mon, Jan 16, 2023 at 9:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mats Kindahl <mats@timescale.com> writes:
> On Fri, Jan 13, 2023 at 4:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, the BackendStartup change is 100% wrong; it is replacing
>> perfectly good code that recovers correctly with bad code that
>> will take down the postmaster (not a backend child!) on OOM.

> AFAICT, the error is caught by the caller (using PG_TRY), executes some
> cleanup code, and then continues executing, so it shouldn't take down the
> postmaster.

There are no PG_TRY blocks in the postmaster, and certainly no recovery.

I added one in the patch. Doesn't this work? It seemed to work when I tried it.

Best wishes,
Mats Kindahl


                        regards, tom lane

Re: Crash during backend start when low on memory

From
Tom Lane
Date:
Mats Kindahl <mats@timescale.com> writes:
> On Mon, Jan 16, 2023 at 9:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There are no PG_TRY blocks in the postmaster, and certainly no recovery.

> I added one in the patch.

That's VERY unlikely to get accepted.

            regards, tom lane



Re: Crash during backend start when low on memory

From
Mats Kindahl
Date:
On Fri, Jan 13, 2023 at 12:58 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2023-Jan-13, Mats Kindahl wrote:

> I have an improved patch based on raising an error when malloc fails. I
> used the version of mallocfail that is available at
> https://github.com/ralight/mallocfail which allows you to run the program
> with incrementally failing memory allocations but since the count is around
> 6886 or so when the postmaster starts to allow backend starts and other
> processes are starting in between, this count is not always accurate and
> something "smarter" is needed to get a reliable failure test.

Hmm, but I'm not sure I like what you did to BackendStartup:

@@ -4054,14 +4073,7 @@ BackendStartup(Port *port)
         * Create backend data structure.  Better before the fork() so we can
         * handle failure cleanly.
         */
-       bn = (Backend *) malloc(sizeof(Backend));
-       if (!bn)
-       {
-               ereport(LOG,
-                               (errcode(ERRCODE_OUT_OF_MEMORY),
-                                errmsg("out of memory")));
-               return STATUS_ERROR;
-       }
+       CHECKED_CALL(bn = malloc(sizeof(Backend)));

With this code, now we will have postmaster raise ERROR if this malloc
fails, rather than returning STATUS_ERROR as previously.  How does
postmaster react to this?  I think it won't go very well.

(Stylistically, I'd rather have the test and corresponding reaction at
each call site rather than using a macro to hide it.  I especially don't
like the fact that the errdetail you propose would report a line of C
code to the user, rather than an intelligible explanation of what that
code is trying to do.)

> It is quite straightforward to test it by attaching a debugger and setting
> the return value of malloc, but to be able to have some "smarter" failures
> only when, for example, starting a backend it would be necessary to have
> something better integrated with the postgres code. Being able to attach a
> uprobe using bpftrace and override the return value would be the most
> elegant solution, but that is currently not possible, so considering
> whether to add something like the mallocfail above to the regular testing.

Hmm, I'm not sure that this type of testing is something to do on an
ongoing basis.

> Note: there are a bunch of other core dumps that occur during startup
> before the postmaster is ready to accept connections, but I think that is
> less of a problem since it fails during startup before the system is ready
> to accept connections.

That sounds reasonable.  The service watcher should restart postmaster
appropriately in that case (be it old SysV init, systemd, your container
management layer, or whatever.)

Hi all,

I have gone over the two different approaches and have patches for both attached so that we can compare. The second approach, using palloc() and friends, is in my opinion safer so this is where I have spent most of the time.
  • The first approach allocates memory in the postmaster using malloc and raises an error if that fails. If this happens in the backend after it is started, a FATAL error is raised, which will abort the backend and send a useful message to the connected client, but not bring down the postmaster. The errors in the postmaster are caught using a PG_TRY(), which will prevent the postmaster from aborting, and continue executing.
  • The second approach creates a new memory context BackendStartupContext where objects that need to live after the backend is forked are allocated. Replace all uses of malloc() and friends with palloc() and friends. Since palloc() can throw an error, we need to catch this using a PG_TRY() inside the postmaster. In the event that the backend process throws an error, we convert this to a FATAL error and re-throw it, which will cause the backend process to terminate and send a useful message to the connected client.
The second approach, using palloc() and friends, is in my opinion less error prone since it will capture any raised errors that prevent the backend from starting, not just memory allocation errors. I have included a file PATCH.md containing information about the manual tests I've done (using a debugger) for both patches. There are a few pieces of memory that are allocated in the PostmasterContext and then copied out from it before destroying it. It would be possible to allocate this in the BackendStartupContext instead (for the palloc() approach) and destroy the PostmasterContext directly after the fork(), which provides a small advantage in being slightly less error prone and slightly easier to maintain.

Using malloc() and friends can be changed to not use ERROR-level reports and in that sense do not require using PG_TRY() and can be re-written to not use that. For palloc() and friends it seems they always run the risk of throwing an ERROR, so without making it behave differently in the postmaster (which seems to add unnecessary complications to the function), it will require PG_TRY() to capture the errors.

Tom had some concerns about using PG_TRY() inside the postmaster ServerLoop, but I cannot see how that can cause problems since it is "just" a setjmp() and longjmp(). It allocates a structure on the stack that contains the values of the registers and the signal set; it is big, but not exceedingly so. If somebody could enlighten me about if there are any problems with this, I would be very grateful.

Best wishes,
Mats Kindahl, Timescale

--
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Ellos andaban todos desnudos como su madre los parió, y también las mujeres,
aunque no vi más que una, harto moza, y todos los que yo vi eran todos
mancebos, que ninguno vi de edad de más de XXX años" (Cristóbal Colón)
Attachment

Re: Crash during backend start when low on memory

From
Andres Freund
Date:
Hi,

On 2023-02-05 22:00:20 +0100, Mats Kindahl wrote:
> Tom had some concerns about using PG_TRY() inside the postmaster
> ServerLoop

I don't think we gain anything in most of the places by using PG_TRY
instead of a non-throwing allocations. The relevant is cooperating
closely enough that just returning returning a failure indicator is
sufficient.

Note that that doesn't mean that palloc can't be used, see
MCXT_ALLOC_NO_OOM.


>, but I cannot see how that can cause problems since it is "just"
> a setjmp() and longjmp(). It allocates a structure on the stack that
> contains the values of the registers and the signal set; it is big, but not
> exceedingly so. If somebody could enlighten me about if there are any
> problems with this, I would be very grateful.

If you aren't careful you end up with PG_exception_stack in a backend
still pointing to that PG_TRY. Which means that an error during backend
startup would jump into postmaster code.  Which would not be good.



Particularly because this is a bugfix we need to make the fix more
minimal than the patches proposed so far.


Greetings,

Andres Freund



Re: Crash during backend start when low on memory

From
Mats Kindahl
Date:
On Mon, Feb 6, 2023 at 12:18 AM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2023-02-05 22:00:20 +0100, Mats Kindahl wrote:
> Tom had some concerns about using PG_TRY() inside the postmaster
> ServerLoop

I don't think we gain anything in most of the places by using PG_TRY
instead of a non-throwing allocations. The relevant is cooperating
closely enough that just returning returning a failure indicator is
sufficient.

Note that that doesn't mean that palloc can't be used, see
MCXT_ALLOC_NO_OOM.

Ah, yes, then both patches can be rewritten to not use PG_TRY().
 

>, but I cannot see how that can cause problems since it is "just"
> a setjmp() and longjmp(). It allocates a structure on the stack that
> contains the values of the registers and the signal set; it is big, but not
> exceedingly so. If somebody could enlighten me about if there are any
> problems with this, I would be very grateful.

If you aren't careful you end up with PG_exception_stack in a backend
still pointing to that PG_TRY. Which means that an error during backend
startup would jump into postmaster code.  Which would not be good.

I did run into this case when "failing" some of the memory allocations inside the backend with a debugger (see the PATCH.md).
 
There are, however, already cases in the backend startup that can raise an error (for example, the code that creates the MessageContext in PostgresMain). So, either if one of these error cases is triggered, or if you make a mistake and add an ereport(ERROR,...) at the wrong place during startup, it could cause this problem. In other words, wouldn't adding a PG_TRY() in the code spawning the backend *prevent* such a problem and hence reduce the risk of a mistake in code causing the backend jumping into postmaster code (of course assuming that the code is written correctly).



Particularly because this is a bugfix we need to make the fix more
minimal than the patches proposed so far.

I can remove the PG_TRY() and use ereport(LOG, ...) + ExitPostmaster if it feels like a safer path for incorporating a bug fix, but note that there is still a risk that the backend will tear down the postmaster. For example, allocating memory for a memory context can fail, or throwing another error inside the backend startup can also fail, so I think that there will be more work to do after this.

Best wishes,
Mats Kindahl
 


Greetings,

Andres Freund

Re: Crash during backend start when low on memory

From
Mats Kindahl
Date:


On Mon, Feb 6, 2023 at 8:37 AM Mats Kindahl <mats@timescale.com> wrote:

I can remove the PG_TRY() and use ereport(LOG, ...) + ExitPostmaster if it feels like a safer path for incorporating a bug fix, but note that there is still a risk that the backend will tear down the postmaster. For example, allocating memory for a memory context can fail, or throwing another error inside the backend startup can also fail, so I think that there will be more work to do after this.

Not ExitPostmaster, of course, but some proc_exit call for the backend and returning status code for the postmaster.

Re: Crash during backend start when low on memory

From
Andres Freund
Date:
Hi,

On 2023-02-06 08:37:17 +0100, Mats Kindahl wrote:
> On Mon, Feb 6, 2023 at 12:18 AM Andres Freund <andres@anarazel.de> wrote:
> > >, but I cannot see how that can cause problems since it is "just"
> > > a setjmp() and longjmp(). It allocates a structure on the stack that
> > > contains the values of the registers and the signal set; it is big, but
> > not
> > > exceedingly so. If somebody could enlighten me about if there are any
> > > problems with this, I would be very grateful.
> >
> > If you aren't careful you end up with PG_exception_stack in a backend
> > still pointing to that PG_TRY. Which means that an error during backend
> > startup would jump into postmaster code.  Which would not be good.
> >
> 
> I did run into this case when "failing" some of the memory allocations
> inside the backend with a debugger (see the PATCH.md).
> 
> There are, however, already cases in the backend startup that can raise an
> error (for example, the code that creates the MessageContext in
> PostgresMain). So, either if one of these error cases is triggered, or if
> you make a mistake and add an ereport(ERROR,...) at the wrong place during
> startup, it could cause this problem. In other words, wouldn't adding a
> PG_TRY() in the code spawning the backend *prevent* such a problem and
> hence reduce the risk of a mistake in code causing the backend jumping into
> postmaster code (of course assuming that the code is written correctly).

It's fine to add ereport(ERROR)s inside the backend startup - elog.c knows how
to deal with no error handlers being set up. We just promote the ERROR to a
FATAL. That's not at all the same as having a PG_exception_stack set up at a
point we don't want to return to.

Remember that backends are forked, so they inherit the stack that postmaster
has set up.


> > Particularly because this is a bugfix we need to make the fix more
> > minimal than the patches proposed so far.
> >
> 
> I can remove the PG_TRY() and use ereport(LOG, ...) + ExitPostmaster if it
> feels like a safer path for incorporating a bug fix, but note that there is
> still a risk that the backend will tear down the postmaster. For example,
> allocating memory for a memory context can fail, or throwing another error
> inside the backend startup can also fail, so I think that there will be
> more work to do after this.

I can't really follow.

Greetings,

Andres Freund



Re: Crash during backend start when low on memory

From
Mats Kindahl
Date:
On Mon, Feb 6, 2023 at 7:41 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2023-02-06 08:37:17 +0100, Mats Kindahl wrote:
> On Mon, Feb 6, 2023 at 12:18 AM Andres Freund <andres@anarazel.de> wrote:
> > >, but I cannot see how that can cause problems since it is "just"
> > > a setjmp() and longjmp(). It allocates a structure on the stack that
> > > contains the values of the registers and the signal set; it is big, but
> > not
> > > exceedingly so. If somebody could enlighten me about if there are any
> > > problems with this, I would be very grateful.
> >
> > If you aren't careful you end up with PG_exception_stack in a backend
> > still pointing to that PG_TRY. Which means that an error during backend
> > startup would jump into postmaster code.  Which would not be good.
> >
>
> I did run into this case when "failing" some of the memory allocations
> inside the backend with a debugger (see the PATCH.md).
>
> There are, however, already cases in the backend startup that can raise an
> error (for example, the code that creates the MessageContext in
> PostgresMain). So, either if one of these error cases is triggered, or if
> you make a mistake and add an ereport(ERROR,...) at the wrong place during
> startup, it could cause this problem. In other words, wouldn't adding a
> PG_TRY() in the code spawning the backend *prevent* such a problem and
> hence reduce the risk of a mistake in code causing the backend jumping into
> postmaster code (of course assuming that the code is written correctly).

It's fine to add ereport(ERROR)s inside the backend startup - elog.c knows how
to deal with no error handlers being set up. We just promote the ERROR to a
FATAL. That's not at all the same as having a PG_exception_stack set up at a
point we don't want to return to.

Remember that backends are forked, so they inherit the stack that postmaster
has set up.


> > Particularly because this is a bugfix we need to make the fix more
> > minimal than the patches proposed so far.
> >
>
> I can remove the PG_TRY() and use ereport(LOG, ...) + ExitPostmaster if it
> feels like a safer path for incorporating a bug fix, but note that there is
> still a risk that the backend will tear down the postmaster. For example,
> allocating memory for a memory context can fail, or throwing another error
> inside the backend startup can also fail, so I think that there will be
> more work to do after this.

I can't really follow.

My apologies, I'll add my reasoning below.

However, since you wanted a patch not using PG_TRY for the bugfix I have attached a patch that does not use it and instead uses memory context functions and the MCXT_ALLOC_NO_OOM flag where applicable. I've tested it manually similarly to the other patches and it passes "make check" and "make installcheck". I have added comments to explain my reasoning for the important allocations: what memory context it is allocated in and why.

My reasoning was that using a PG_TRY around the call to BackendStartup() would prevent any errors generated now and in future code from killing the postmaster, so it would be a more safe alternative. The difference in size is not that big between the patches, but this patch does not use exception handling.

Best wishes,
Mats Kindahl


Greetings,

Andres Freund
Attachment