Re: Crash during backend start when low on memory - Mailing list pgsql-bugs

From Alvaro Herrera
Subject Re: Crash during backend start when low on memory
Date
Msg-id 20221215172008.islvqfyysom3ufxy@alvherre.pgsql
Whole thread Raw
In response to Re: Crash during backend start when low on memory  (Mats Kindahl <mats@timescale.com>)
Responses Re: Crash during backend start when low on memory
List pgsql-bugs
On 2022-Dec-14, Mats Kindahl wrote:

> PostmasterContext is set quite early in the procedure so it should not be a
> problem to use it. I think we can use PostmasterContext rather than
> TopMemoryContext. The memory for the allocations are passed down to
> PostmasterMain() and used inside InitPostgres() but as far as I can tell,
> they are not used after the InitPostgres() call. The PostmasterContex is
> destroyed right after the call to InitPostgres() inside PostmasterMain()
> and then the rest is allocated in other contexts.

Well, this is what PostgresMain has to say about it:

    /*
     * If the PostmasterContext is still around, recycle the space; we don't
     * need it anymore after InitPostgres completes.  Note this does not trash
     * *MyProcPort, because ConnCreate() allocated that space with malloc()
     * ... else we'd need to copy the Port data first.  Also, subsidiary data
     * such as the username isn't lost either; see ProcessStartupPacket().
     */
    if (PostmasterContext)
    {
        MemoryContextDelete(PostmasterContext);

So I think blowing away part of struct Port independently of the
containing struct is a bad idea.  Why isn't more appropriate to do what
ConnCreate does?

    if (!(port = (Port *) calloc(1, sizeof(Port))))
    {
        ereport(LOG,
                (errcode(ERRCODE_OUT_OF_MEMORY),
                 errmsg("out of memory")));
        ExitPostmaster(1);
    }

... ugh.  I have to admit that killing postmaster due to a transient
out-of-memory makes me a bit nervous -- do note that this
ExitPostmaster() happens before we've forked the backend, so this code
does bring the server down (!!).  If we want to make postmaster more
robust in the face of OOM, we should also change this code somehow, yes?

I agree with Mats' premise when starting the thread -- even if the
server is not in great shape, it doesn't seem like taking the whole
service down is a great response.  It would be better to just refuse the
connection with some error and let existing connections chug along,
poorly though they can.  Maybe they'll also die soon, freeing memory;
but that'll be no good if Postmaster is gone altogether.

Now, it would be better if we could test this more thoroughly.  Perhaps
it's possible to use something like the mallocfail.c lib that Heikki
wrote a few years ago, teaching it to handle calloc
https://postgr.es/m/547480DE.4040408@vmware.com

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)



pgsql-bugs by date:

Previous
From: reiner peterke
Date:
Subject: Re: BUG #17720: pg_dump creates a dump with primary key that cannot be restored, when specifying 'using index ...'
Next
From: Tom Lane
Date:
Subject: Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)