Re: Missing checks when malloc returns NULL... - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Missing checks when malloc returns NULL...
Date
Msg-id CAB7nPqTfSDR7T4aPxELJbw1W9qR5U3L4=i+KgR11vH+smxZ0Yg@mail.gmail.com
Whole thread Raw
In response to Re: Missing checks when malloc returns NULL...  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Missing checks when malloc returns NULL...  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Aug 31, 2016 at 7:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> [ malloc-nulls-v5.patch ]
>
> I've committed some form of all of these changes

Thanks!

> except the one in
> adt/pg_locale.c.  I'm not entirely sure whether we need to do anything
> about that at all, but if we do, this doesn't cut it:
>
>      thousands_sep = db_encoding_strdup(encoding, extlconv->thousands_sep);
>      grouping = strdup(extlconv->grouping);
>
> +    if (!grouping)
> +        elog(ERROR, "out of memory");
> +
>  #ifdef WIN32
>      /* use monetary to set the ctype */
>      setlocale(LC_CTYPE, locale_monetary);
>
> There are multiple things wrong with that:
>
> 1. The db_encoding_strdup calls can also return NULL on OOM, and aren't
> being checked likewise.  (And there's another plain strdup further down,
> too.)
>
> 2. You can't safely throw an elog right there, because you need to restore
> the backend's prevailing setlocale state first.

Arg I haven't though of this one.

> 3. Also, if you do it like that, you'll leak whatever strings were already
> strdup'd.  (This is a relatively minor problem, but still, if we're
> trying to be 100% correct then we're not there yet.)
>
> Also, now that I'm looking at it, I see there's another pre-existing bug:
>
> 4. An elog exit is possible, due to either OOM or encoding conversion
> failure, inside db_encoding_strdup().  This means we have problems #2 and
> #3 even in the existing code.
>
> Now, I believe that the coding intention here was that assigning NULL
> to the respective fields of CurrentLocaleConv is okay and better than
> failing the operation completely.  One argument against that is that
> it's unclear whether everyplace that uses any of those fields is checking
> for NULL first; and in any case, silently falling back to nonlocalized
> behavior might not be better than reporting OOM.  Still, it's certainly
> better than risking problem #2, which could cause all sorts of subsequent
> malfunctions.
>
> I think that a complete fix for this might go along the lines of
>
> 1. While we are setlocale'd to the nonstandard locales, collect all the
> values by strdup'ing into a local variable of type "struct lconv".
> (We must strdup for the reason noted in the comment, that localeconv's
> output is no longer valid once we do another setlocale.)  Then restore
> the standard locale settings.

The one at the top of the file... That's really platform-dependent.

> 2. Use db_encoding_strdup to replace any strings that need to be
> converted.  (If it throws an elog, we have no damage worse than
> leaking the already strdup'd strings.)
>
> 3. Check for any nulls in the struct; if so, use free_struct_lconv
> to release whatever we did get, and then throw elog("out of memory").
>
> 4. Otherwise, copy the struct to CurrentLocaleConv.
>
> If we were really feeling picky, we could probably put in a PG_TRY
> block around step 2 to release the strdup'd storage after a conversion
> failure.  Not sure if it's worth the trouble.

It doesn't sound that much complicated to do that, I'll see about it,
but I guess that we could just do it in another thread..

> BTW, I marked the commitfest entry closed, but that may have been
> premature --- feel free to reopen it if you submit additional patches
> in this thread.

There are still two things that could be worked I guess:
- plperl and pltcl cleanup, and their abusive use of malloc. I'll
raise a new thread about that after brushing up a bit what I have and
add a new entry in the CF as that's not directly related to this
thread.
- ShmemAlloc and its missing checks. And we can do something here.

I have slept on it, and looked at the numbers. There are 11 calls to
ShmemAlloc in the code, and 4 of them are performing checks. And in
one of them there is this pattern in ShmemInitStruct(), which is also
something ShmemInitHash relies on:

        /* It isn't in the table yet. allocate and initialize it */
        structPtr = ShmemAlloc(size);
        if (structPtr == NULL)
        {
            /* out of memory; remove the failed ShmemIndex entry */
            hash_search(ShmemIndex, name, HASH_REMOVE, NULL);
            LWLockRelease(ShmemIndexLock);
            ereport(ERROR,
                    (errcode(ERRCODE_OUT_OF_MEMORY),
                     errmsg("not enough shared memory for data structure"
                            " \"%s\" (%zu bytes requested)",
                            name, size)));
        }
Which means that processes have an escape window when initializing
shared memory by cleaning up the index if an entry cannot be found and
then cannot be created properly. I don't think that it is a good idea
to change that by forcing ShmemAlloc to fail. So I would tend to just
have the patch attached and add those missing NULL-checks on all the
existing ShmemAlloc() calls.

Opinions?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Declarative partitioning - another take
Next
From: Amit Langote
Date:
Subject: Re: Declarative partitioning - another take