Thread: Missing checks when malloc returns NULL...
Hi all, While auditing the code, I got surprised that there are a couple of code paths that do nothing for this error handling: - pg_regress and isolationtester use malloc extensively, in case of failure those would just crash crash. I think that it matters for buildfarm members that are under memory pressure to not do so, so those should use pg_malloc instead. - refint.c makes use of malloc to store plans in top memory context. That's a buggy concept clearly... This code would need to be reworked more largely than in the patch I attach. - pg_dlsym for darwin uses malloc, but would crash on failure - ps_status.c does nothing when it uses malloc(). - sprompt.c uses malloc once, and would crash on failure - mcxt.c uses that, which is surprising: @@ -704,7 +704,8 @@ MemoryContextCreate(NodeTag tag, Size size, { /* Special case for startup: use good ol' malloc */ node = (MemoryContext) malloc(needed); - Assert(node != NULL); + if (node == NULL) + elog(PANIC, "out of memory"); } I think that a PANIC is cleaner here instead of a simple crash. So attached is a patch aimed at improving things. Thoughts? -- Michael
Attachment
Michael Paquier <michael.paquier@gmail.com> writes: > - mcxt.c uses that, which is surprising: > @@ -704,7 +704,8 @@ MemoryContextCreate(NodeTag tag, Size size, > { > /* Special case for startup: use good ol' malloc */ > node = (MemoryContext) malloc(needed); > - Assert(node != NULL); > + if (node == NULL) > + elog(PANIC, "out of memory"); > } > I think that a PANIC is cleaner here instead of a simple crash. But the elog mechanism assumes that memory contexts are working. If this ever actually did fire, no good would come of it. regards, tom lane
On Tue, Jun 21, 2016 at 10:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael.paquier@gmail.com> writes: >> - mcxt.c uses that, which is surprising: >> @@ -704,7 +704,8 @@ MemoryContextCreate(NodeTag tag, Size size, >> { >> /* Special case for startup: use good ol' malloc */ >> node = (MemoryContext) malloc(needed); >> - Assert(node != NULL); >> + if (node == NULL) >> + elog(PANIC, "out of memory"); >> } >> I think that a PANIC is cleaner here instead of a simple crash. > > But the elog mechanism assumes that memory contexts are working. > If this ever actually did fire, no good would come of it. OK, there is not much that we can do here then. What about the rest? Those seem like legit concerns to me. -- Michael
Attachment
On Wed, Jun 22, 2016 at 10:41 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > OK, there is not much that we can do here then. What about the rest? > Those seem like legit concerns to me. Registered this one to the next CF as a bug fix: https://commitfest.postgresql.org/10/653/ It would be better not to crash if we can avoid it. -- Michael
On 06/22/2016 04:41 AM, Michael Paquier wrote: > On Tue, Jun 21, 2016 at 10:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Michael Paquier <michael.paquier@gmail.com> writes: >>> - mcxt.c uses that, which is surprising: >>> @@ -704,7 +704,8 @@ MemoryContextCreate(NodeTag tag, Size size, >>> { >>> /* Special case for startup: use good ol' malloc */ >>> node = (MemoryContext) malloc(needed); >>> - Assert(node != NULL); >>> + if (node == NULL) >>> + elog(PANIC, "out of memory"); >>> } >>> I think that a PANIC is cleaner here instead of a simple crash. >> >> But the elog mechanism assumes that memory contexts are working. >> If this ever actually did fire, no good would come of it. > make s > OK, there is not much that we can do here then. What about the rest? > Those seem like legit concerns to me. There's also a realloc() and an strdup() call in refint.c. But looking at refint.c, I wonder why it's using malloc()/free() in the first place, rather than e.g. TopMemoryContext. The string construction code with sprintf() and strlen() looks quite antiqued, too, StringInfo would make it more readable. Does refint.c actually have any value anymore? What if we just removed it altogether? It's good to have some C triggers in contrib, to serve as examples, and to have some regression test coverage for all that. But ISTM that the 'autoinc' module covers the trigger API just as well. The code in ps_status() runs before the elog machinery has been initialized, so you get a rather unhelpful error: > error occurred at ps_status.c:167 before error message processing is available I guess that's still better than outright crashing, though. There are also a few strdup() calls there that can run out of memory.. Not many of the callers of simple_prompt() check for a NULL result, so in all probability, returning NULL from there will just crash in the caller. There's one existing "return NULL" in there already, so this isn't a new problem, but can we find a better way? There are more malloc(), realloc() and strdup() calls in isolationtester and pg_regress, that we ought to change too while we're at it. - Heikki
On Thu, Aug 18, 2016 at 6:12 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 06/22/2016 04:41 AM, Michael Paquier wrote: >> make s >> OK, there is not much that we can do here then. What about the rest? >> Those seem like legit concerns to me. > > > There's also a realloc() and an strdup() call in refint.c. But looking at > refint.c, I wonder why it's using malloc()/free() in the first place, rather > than e.g. TopMemoryContext. The string construction code with sprintf() and > strlen() looks quite antiqued, too, StringInfo would make it more readable. > > Does refint.c actually have any value anymore? What if we just removed it > altogether? It's good to have some C triggers in contrib, to serve as > examples, and to have some regression test coverage for all that. But ISTM > that the 'autoinc' module covers the trigger API just as well. I'd be in favor for nuking it instead of keeping this code as it overlaps with autoinc, I did not notice that. Even if that's an example, it is actually showing some bad code patters, which kills its own purpose. > The code in ps_status() runs before the elog machinery has been initialized, > so you get a rather unhelpful error: > >> error occurred at ps_status.c:167 before error message processing is >> available > > I guess that's still better than outright crashing, though. There are also a > few strdup() calls there that can run out of memory.. Right. Another possibility is to directly call write_stderr to be sure that the user gets the right message, then do exit(1). Thinking more about it, as save_ps_display_args is called really at the beginning this is perhaps what makes the most sense, so I changed the patch this way. > Not many of the callers of simple_prompt() check for a NULL result, so in > all probability, returning NULL from there will just crash in the caller. > There's one existing "return NULL" in there already, so this isn't a new > problem, but can we find a better way? I got inspired by the return value in the case of WIN32. Letting sprompt.c in charge of printing an error does not seem like a good idea to me, and there are already callers of simple_prompt() that check for NULL and report an error appropriately, like pg_backup_db.c. So I think that we had better adapt all the existing calls of simple_prompt checking for NULL and make things more consistent by having the callers report errors. Hence I updated the patch with those changes. Another possibility would be to keep a static buffer that has a fixed size, basically 4096 as this is the maximum expected by psql, but that's a waste of bytes in all other cases: one caller uses 4096, two 128 and the rest use 100 or less. By the way, while looking at that, I also noticed that in exec_command of psql's command.c we don't check for gets_fromFile that could return NULL. > There are more malloc(), realloc() and strdup() calls in isolationtester and > pg_regress, that we ought to change too while we're at it. Right. I added some handling for those callers as well with the pg_ equivalents. -- Michael
Attachment
Hello, Michael. Your patch [1] was marked as "Needs review" so I decided to take a look. It looks good to me. However I found dozens of places in PostgreSQL code that seem to have similar problem you are trying to fix [2]. As far as I understand these are only places left that don't check malloc/realloc/ strdup return values properly. I thought maybe you will be willing to fix they too so we could forget about this problem forever. If not I will be happy to do it. However in this case we need someone familiar with affected parts of the code who will be willing to re-check a new patch since I'm not filling particularly confident about how exactly errors should be handled in all these cases. By the way maybe someone knows other procedures besides malloc, realloc and strdup that require special attention? [1] https://commitfest.postgresql.org/10/653/ [2] http://afiskon.ru/s/15/83287ef7d2_malloc.txt -- Best regards, Aleksander Alekseev
On Sat, Aug 27, 2016 at 12:33 AM, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote: > Your patch [1] was marked as "Needs review" so I decided to take a look. Thanks for the input! > It looks good to me. However I found dozens of places in PostgreSQL code > that seem to have similar problem you are trying to fix [2]. As far as I > understand these are only places left that don't check malloc/realloc/ > strdup return values properly. I thought maybe you will be willing to > fix they too so we could forget about this problem forever. So my lookup was still incomplete. > If not I will be happy to do it. However in this case we need someone > familiar with affected parts of the code who will be willing to re-check > a new patch since I'm not filling particularly confident about how > exactly errors should be handled in all these cases. I'll do it and compress that in my patch. > By the way maybe someone knows other procedures besides malloc, realloc > and strdup that require special attention? I don't know how you did it, but this email has visibly broken the original thread. Did you change the topic name? ./src/backend/postmaster/postmaster.c: userDoption = strdup(optarg); [...] ./src/backend/bootstrap/bootstrap.c: userDoption = strdup(optarg); [...] ./src/backend/tcop/postgres.c: userDoption = strdup(optarg); We cannot use pstrdup here because memory contexts are not set up here, still it would be better than crashing, but I am not sure if that's worth complicating this code.. Other opinions are welcome. ./contrib/vacuumlo/vacuumlo.c: param.pg_user = strdup(optarg); [...] ./contrib/pg_standby/pg_standby.c: triggerPath = strdup(optarg); [...] ./src/bin/pg_archivecleanup/pg_archivecleanup.c: additional_ext = strdup(optarg); /* Extension to remove Right we can do something here with pstrdup(). ./contrib/spi/refint.c: plan->splan = (SPIPlanPtr *) malloc(sizeof(SPIPlanPtr)); Regarding refint.c, you can see upthread. Instead of reworking the code it would be better to just drop it from the tree. ./src/backend/utils/adt/pg_locale.c: grouping = strdup(extlconv->grouping); Here that would be a failure with an elog() as this is getting used by things like NUM_TOCHAR_finish and PGLC_localeconv. ./src/backend/utils/mmgr/mcxt.c: node = (MemoryContext) malloc(needed); You cannot do much here... ./src/timezone/zic.c: lcltime = strdup(optarg); This is upstream code, not worth worrying. ./src/pl/tcl/pltcl.c: prodesc->user_proname = strdup(NameStr(procStruct->proname)); ./src/pl/tcl/pltcl.c: prodesc->internal_proname = strdup(internal_proname); ./src/pl/tcl/pltcl.c- if (prodesc->user_proname == NULL || prodesc->internal_proname == NULL) ./src/pl/tcl/pltcl.c- ereport(ERROR, ./src/pl/tcl/pltcl.c- (errcode(ERRCODE_OUT_OF_MEMORY), ./src/pl/tcl/pltcl.c- errmsg("out of memory"))); Ah, yes. Here we need to do a free(prodesc) first. ./src/common/exec.c: putenv(strdup(env_path)); set_pglocale_pgservice() is used at the beginning of each process run, meaning that a failure here would be printf(stderr), followed by exit() for frontend, even ECPG as this compiles with -DFRONTEND. Backend can use elog(ERROR) btw. -- Michael
On Sat, Aug 27, 2016 at 10:24 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > ./src/backend/postmaster/postmaster.c: userDoption = > strdup(optarg); > [...] > ./src/backend/bootstrap/bootstrap.c: userDoption = > strdup(optarg); > [...] > ./src/backend/tcop/postgres.c: userDoption = strdup(optarg); > We cannot use pstrdup here because memory contexts are not set up > here, still it would be better than crashing, but I am not sure if > that's worth complicating this code.. Other opinions are welcome. Those are not changed. We could have some NULL-checks but I am not sure that's worth the complication here. > ./contrib/vacuumlo/vacuumlo.c: param.pg_user = strdup(optarg); > [...] > ./contrib/pg_standby/pg_standby.c: triggerPath = strdup(optarg); > [...] > ./src/bin/pg_archivecleanup/pg_archivecleanup.c: > additional_ext = strdup(optarg); /* Extension to remove > Right we can do something here with pstrdup(). Those are updated with pg_strdup(). > ./contrib/spi/refint.c: plan->splan = (SPIPlanPtr *) > malloc(sizeof(SPIPlanPtr)); > Regarding refint.c, you can see upthread. Instead of reworking the > code it would be better to just drop it from the tree. I'd rather see this nuked out of the surface of the code tree. > ./src/backend/utils/adt/pg_locale.c: grouping = strdup(extlconv->grouping); > Here that would be a failure with an elog() as this is getting used by > things like NUM_TOCHAR_finish and PGLC_localeconv. Done. > ./src/pl/tcl/pltcl.c: prodesc->user_proname = > strdup(NameStr(procStruct->proname)); > ./src/pl/tcl/pltcl.c: prodesc->internal_proname = > strdup(internal_proname); > ./src/pl/tcl/pltcl.c- if (prodesc->user_proname == NULL || > prodesc->internal_proname == NULL) > ./src/pl/tcl/pltcl.c- ereport(ERROR, > ./src/pl/tcl/pltcl.c- (errcode(ERRCODE_OUT_OF_MEMORY), > ./src/pl/tcl/pltcl.c- errmsg("out of memory"))); > Ah, yes. Here we need to do a free(prodesc) first. Done. > ./src/common/exec.c: putenv(strdup(env_path)); > set_pglocale_pgservice() is used at the beginning of each process run, > meaning that a failure here would be printf(stderr), followed by > exit() for frontend, even ECPG as this compiles with -DFRONTEND. > Backend can use elog(ERROR) btw. Done. Regarding the handling of strdup in libpq the code is careful in its handling after a review, so we had better do nothing. After that, there are two calls to realloc and one call to malloc that deserve some attention, though those happen in pgc.l and I am not exactly sure how to handle them. As Alexander's email (https://www.postgresql.org/message-id/20160826153358.GA29981%40e733) has broken this thread, I am attaching to this thread the analysis report that has been generated by Alexander previously. It was referenced in only an URL. Attached is an updated patch addressing those extra points. -- Michael
Attachment
Hello, Michael > I don't know how you did it, but this email has visibly broken the > original thread. Did you change the topic name? I'm very sorry for this. I had no local copy of this thread. So I wrote a new email with the same subject hoping it will be OK. Apparently right In-Reply-To header is also required. > if (prodesc->user_proname == NULL || prodesc->internal_proname == NULL) > + { > + free(prodesc); I think that prodesc->user_proname and prodesc->internal_proname should also be freed if they are not NULL's. > By the way maybe someone knows other procedures besides malloc, realloc > and strdup that require special attention? I recalled that there is also calloc(). I've found four places that use calloc() and look suspicious to me (see attachment). What do you think - are these bugs or not? -- Best regards, Aleksander Alekseev
Attachment
> Hello, Michael > > > I don't know how you did it, but this email has visibly broken the > > original thread. Did you change the topic name? > > I'm very sorry for this. I had no local copy of this thread. So I wrote a > new email with the same subject hoping it will be OK. Apparently right > In-Reply-To header is also required. > > > if (prodesc->user_proname == NULL || prodesc->internal_proname == NULL) > > + { > > + free(prodesc); > > I think that prodesc->user_proname and prodesc->internal_proname should > also be freed if they are not NULL's. > > > By the way maybe someone knows other procedures besides malloc, realloc > > and strdup that require special attention? > > I recalled that there is also calloc(). I've found four places that use > calloc() and look suspicious to me (see attachment). What do you think - > are these bugs or not? I've just realized that there is also malloc-compatible ShmemAlloc(). Apparently it's return value sometimes is not properly checked too. See attachment. -- Best regards, Aleksander Alekseev
Attachment
Aleksander Alekseev <a.alekseev@postgrespro.ru> writes: >> if (prodesc->user_proname == NULL || prodesc->internal_proname == NULL) >> + { >> + free(prodesc); > I think that prodesc->user_proname and prodesc->internal_proname should > also be freed if they are not NULL's. Hmm, this is kind of putting lipstick on a pig, isn't it? That code is still prone to leakage further down, because it calls stuff like SearchSysCache which is entirely capable of throwing elog(ERROR). If we're going to touch compile_pltcl_function at all, I'd vote for (1) changing these malloc calls to MemoryContextAlloc(TopMemoryContext,... (2) putting the cleanup into a PG_CATCH block, and removing all the retail free() calls that are there now. regards, tom lane
On Mon, Aug 29, 2016 at 11:13 PM, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote: >> Hello, Michael >> >> > I don't know how you did it, but this email has visibly broken the >> > original thread. Did you change the topic name? >> >> I'm very sorry for this. I had no local copy of this thread. So I wrote a >> new email with the same subject hoping it will be OK. Apparently right >> In-Reply-To header is also required. >> >> > if (prodesc->user_proname == NULL || prodesc->internal_proname == NULL) >> > + { >> > + free(prodesc); >> >> I think that prodesc->user_proname and prodesc->internal_proname should >> also be freed if they are not NULL's. >> >> > By the way maybe someone knows other procedures besides malloc, realloc >> > and strdup that require special attention? >> >> I recalled that there is also calloc(). I've found four places that use >> calloc() and look suspicious to me (see attachment). What do you think - >> are these bugs or not? ./src/backend/storage/buffer/localbuf.c: LocalBufferBlockPointers = (Block *) calloc(nbufs, sizeof(Block)); ./src/interfaces/libpq/fe-print.c- fprintf(stderr, libpq_gettext("out of memory\n")); Here it does not matter the process is taken down with FATAL or abort() immediately. ./src/backend/bootstrap/bootstrap.c: app = Typ = ALLOC(struct typmap *, i + 1); But here it does actually matter. > I've just realized that there is also malloc-compatible ShmemAlloc(). > Apparently it's return value sometimes is not properly checked too. See > attachment. ./src/backend/storage/lmgr/proc.c: pgxacts = (PGXACT *) ShmemAlloc(TotalProcs * sizeof(PGXACT)); ./src/backend/storage/lmgr/proc.c: ProcStructLock = (slock_t *) ShmemAlloc(sizeof(slock_t)); ./src/backend/storage/lmgr/lwlock.c: ptr = (char *) ShmemAlloc(spaceLocks); ./src/backend/storage/ipc/shmem.c: ShmemAlloc(sizeof(*ShmemVariableCache)); ./src/backend/access/transam/slru.c: shared->buffer_locks = (LWLockPadded *) ShmemAlloc(sizeof(LWLockPadded) * nslots); ./src/backend/postmaster/postmaster.c: ShmemBackendArray = (Backend *) ShmemAlloc(size); The funny part here is that ProcGlobal->allProcs is actually handled, but not the two others. Well yes, you are right, we really need to fail on FATAL for all of them if ShmemAlloc returns NULL as they involve the shmem initialization at postmaster startup. -- Michael
On Mon, Aug 29, 2016 at 11:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Aleksander Alekseev <a.alekseev@postgrespro.ru> writes: >>> if (prodesc->user_proname == NULL || prodesc->internal_proname == NULL) >>> + { >>> + free(prodesc); > >> I think that prodesc->user_proname and prodesc->internal_proname should >> also be freed if they are not NULL's. > > Hmm, this is kind of putting lipstick on a pig, isn't it? That code > is still prone to leakage further down, because it calls stuff like > SearchSysCache which is entirely capable of throwing elog(ERROR). > > If we're going to touch compile_pltcl_function at all, I'd vote for > > (1) changing these malloc calls to MemoryContextAlloc(TopMemoryContext,... > (2) putting the cleanup into a PG_CATCH block, and removing all the > retail free() calls that are there now. We've done similar handling lately with for example 8c75ad43 for plpython, and this has finished by using TopMemoryContext, so that's the way to do. By the way plperl needs the same cleanup, and by looking at the archives guess who did exactly that with a set of patches not long ago: https://www.postgresql.org/message-id/CAB7nPqRxvq+Q66UFzD9wa5UAftYN4WAUADbjXKFrync96kf-VQ@mail.gmail.com But I did not get much feedback about those patches :) So for now I have removed this part from the patch of this thread. -- Michael
On Tue, Aug 30, 2016 at 2:57 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > The funny part here is that ProcGlobal->allProcs is actually handled, > but not the two others. Well yes, you are right, we really need to > fail on FATAL for all of them if ShmemAlloc returns NULL as they > involve the shmem initialization at postmaster startup. And with an actual patch things are better. -- Michael
Attachment
> > The funny part here is that ProcGlobal->allProcs is actually handled, > > but not the two others. Well yes, you are right, we really need to > > fail on FATAL for all of them if ShmemAlloc returns NULL as they > > involve the shmem initialization at postmaster startup. > > And with an actual patch things are better. Currently I can't think of any further improvements. I even would dare to say that patch is Ready for Committer. -- Best regards, Aleksander Alekseev
On Tue, Aug 30, 2016 at 5:08 PM, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote: >> > The funny part here is that ProcGlobal->allProcs is actually handled, >> > but not the two others. Well yes, you are right, we really need to >> > fail on FATAL for all of them if ShmemAlloc returns NULL as they >> > involve the shmem initialization at postmaster startup. >> >> And with an actual patch things are better. > > Currently I can't think of any further improvements. I even would dare > to say that patch is Ready for Committer. Thanks for the fruitful input by the way! You spotted many things. -- Michael
> >> And with an actual patch things are better. > > > > Currently I can't think of any further improvements. I even would dare > > to say that patch is Ready for Committer. > > Thanks for the fruitful input by the way! You spotted many things. Thank _you_ for paying attention for such issues in PostgreSQL code! -- Best regards, Aleksander Alekseev
Michael Paquier <michael.paquier@gmail.com> writes: >> I've just realized that there is also malloc-compatible ShmemAlloc(). >> Apparently it's return value sometimes is not properly checked too. See >> attachment. > The funny part here is that ProcGlobal->allProcs is actually handled, > but not the two others. Well yes, you are right, we really need to > fail on FATAL for all of them if ShmemAlloc returns NULL as they > involve the shmem initialization at postmaster startup. A quick scan says that most callers are failing to check at all, several more just have duplicate check-and-immediately-elog code, and only one has got anything useful to do besides throwing error. I think what we ought to do is make ShmemAlloc act like palloc (ie throw error not return NULL), and remove the duplicated error checks. For the one caller that that would be bad for, we could invent something like ShmemAllocNoError, or ShmemAllocExtended with a no_error flag, or whatever other new API suits your fancy. But as-is, it's just inviting more errors-of-omission like the large number that already exist. I think people are conditioned by palloc to expect such functions to throw error. regards, tom lane
On Tue, Aug 30, 2016 at 10:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think what we ought to do is make ShmemAlloc act like palloc > (ie throw error not return NULL), and remove the duplicated error > checks. For the one caller that that would be bad for, we could > invent something like ShmemAllocNoError, or ShmemAllocExtended with > a no_error flag, or whatever other new API suits your fancy. But > as-is, it's just inviting more errors-of-omission like the large > number that already exist. I think people are conditioned by palloc > to expect such functions to throw error. The only reason why I did not propose that for ShmemAlloc is because of extensions potentially using this routine and having some special handling when it returns NULL. And changing it to behave like palloc would break such extensions. -- Michael
> > I think what we ought to do is make ShmemAlloc act like palloc > > (ie throw error not return NULL), and remove the duplicated error > > checks. For the one caller that that would be bad for, we could > > invent something like ShmemAllocNoError, or ShmemAllocExtended with > > a no_error flag, or whatever other new API suits your fancy. But > > as-is, it's just inviting more errors-of-omission like the large > > number that already exist. I think people are conditioned by palloc > > to expect such functions to throw error. > > The only reason why I did not propose that for ShmemAlloc is because > of extensions potentially using this routine and having some special > handling when it returns NULL. And changing it to behave like palloc > would break such extensions. I suggest to keep ShmemAlloc as is for backward compatibility and introduce a new procedure ShmemAllocSafe. Also we could add a scary comment (and also a warning log message?) that ShmemAlloc is deprecated and possibly will be removed in later releases. -- Best regards, Aleksander Alekseev
Michael Paquier <michael.paquier@gmail.com> writes: > On Tue, Aug 30, 2016 at 10:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think what we ought to do is make ShmemAlloc act like palloc >> (ie throw error not return NULL), and remove the duplicated error >> checks. > The only reason why I did not propose that for ShmemAlloc is because > of extensions potentially using this routine and having some special > handling when it returns NULL. And changing it to behave like palloc > would break such extensions. The evidence from the callers in core suggests that this change would be much more likely to fix extensions than break them, ie it's more likely that they are missing error checks than that they have something useful to do if the alloc fails. An extension that actually does need that could do something like #if CATALOG_VERSION_NO < whatever-v10-is #define ShmemAllocNoError(x) ShmemAlloc(x) #endif ... ptr = ShmemAllocNoError(size);if (ptr == NULL) // same as before from here on regards, tom lane
Aleksander Alekseev <a.alekseev@postgrespro.ru> writes: > I suggest to keep ShmemAlloc as is for backward compatibility and > introduce a new procedure ShmemAllocSafe. I think that's about the worst of all possible worlds, as it guarantees having to touch most call sites. If there were more than one known caller that really wanted the return-NULL behavior, exact backwards compatibility might carry the day; but as things stand I think the odds are that most call sites need an error check and probably have not got one. I'd rather err in favor of "safe by default" than "backwards compatible". regards, tom lane
Michael Paquier <michael.paquier@gmail.com> writes: > And with an actual patch things are better. Working through this patch, it suddenly strikes me that we are going about fixing the callers of simple_prompt the wrong way. The existing definition with returning a malloc'd string creates a hazard of malloc failure, and it *also* creates a hazard of not remembering to free the result. Moreover, there are almost no callers that want a max result longer than ~100 bytes. Seems like it would be a whole lot easier all around to make the caller supply the buffer, ie typical call would be roughly char buf[100]; simple_prompt("Password: ", buf, sizeof(buf), false); Callers that want to deal with a malloc'd buffer (all one of them, looks like) can do it themselves, for basically only one more line than is needed now. regards, tom lane
Michael Paquier <michael.paquier@gmail.com> writes: > [ malloc-nulls-v5.patch ] I've committed some form of all of these changes except the one in adt/pg_locale.c. I'm not entirely sure whether we need to do anything about that at all, but if we do, this doesn't cut it: thousands_sep = db_encoding_strdup(encoding, extlconv->thousands_sep); grouping = strdup(extlconv->grouping); + if (!grouping) + elog(ERROR, "out of memory"); +#ifdef WIN32 /* use monetary to set the ctype */ setlocale(LC_CTYPE, locale_monetary); There are multiple things wrong with that: 1. The db_encoding_strdup calls can also return NULL on OOM, and aren't being checked likewise. (And there's another plain strdup further down, too.) 2. You can't safely throw an elog right there, because you need to restore the backend's prevailing setlocale state first. 3. Also, if you do it like that, you'll leak whatever strings were already strdup'd. (This is a relatively minor problem, but still, if we're trying to be 100% correct then we're not there yet.) Also, now that I'm looking at it, I see there's another pre-existing bug: 4. An elog exit is possible, due to either OOM or encoding conversion failure, inside db_encoding_strdup(). This means we have problems #2 and #3 even in the existing code. Now, I believe that the coding intention here was that assigning NULL to the respective fields of CurrentLocaleConv is okay and better than failing the operation completely. One argument against that is that it's unclear whether everyplace that uses any of those fields is checking for NULL first; and in any case, silently falling back to nonlocalized behavior might not be better than reporting OOM. Still, it's certainly better than risking problem #2, which could cause all sorts of subsequent malfunctions. I think that a complete fix for this might go along the lines of 1. While we are setlocale'd to the nonstandard locales, collect all the values by strdup'ing into a local variable of type "struct lconv". (We must strdup for the reason noted in the comment, that localeconv's output is no longer valid once we do another setlocale.) Then restore the standard locale settings. 2. Use db_encoding_strdup to replace any strings that need to be converted. (If it throws an elog, we have no damage worse than leaking the already strdup'd strings.) 3. Check for any nulls in the struct; if so, use free_struct_lconv to release whatever we did get, and then throw elog("out of memory"). 4. Otherwise, copy the struct to CurrentLocaleConv. If we were really feeling picky, we could probably put in a PG_TRY block around step 2 to release the strdup'd storage after a conversion failure. Not sure if it's worth the trouble. BTW, I marked the commitfest entry closed, but that may have been premature --- feel free to reopen it if you submit additional patches in this thread. regards, tom lane
On Wed, Aug 31, 2016 at 2:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael.paquier@gmail.com> writes: >> And with an actual patch things are better. > > Working through this patch, it suddenly strikes me that we are going > about fixing the callers of simple_prompt the wrong way. The existing > definition with returning a malloc'd string creates a hazard of malloc > failure, and it *also* creates a hazard of not remembering to free the > result. Yes, this cleanup was part of the candidate patch of this thread as well. > Moreover, there are almost no callers that want a max result > longer than ~100 bytes. True, there is basically one such caller, psql, with 4096 bytes. > Seems like it would be a whole lot easier all > around to make the caller supply the buffer, ie typical call would be > roughly > > char buf[100]; > > simple_prompt("Password: ", buf, sizeof(buf), false); > > Callers that want to deal with a malloc'd buffer (all one of them, looks > like) can do it themselves, for basically only one more line than is > needed now. Yes, that's possible as well and I thought about doing so, but I found the buffer allocated from within simple_prompt clearer when hacking this part. By the way, I just reviewed 9daec77e that you pushed instead and that looks fine to me. Thanks! -- Michael
On Wed, Aug 31, 2016 at 7:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael.paquier@gmail.com> writes: >> [ malloc-nulls-v5.patch ] > > I've committed some form of all of these changes Thanks! > except the one in > adt/pg_locale.c. I'm not entirely sure whether we need to do anything > about that at all, but if we do, this doesn't cut it: > > thousands_sep = db_encoding_strdup(encoding, extlconv->thousands_sep); > grouping = strdup(extlconv->grouping); > > + if (!grouping) > + elog(ERROR, "out of memory"); > + > #ifdef WIN32 > /* use monetary to set the ctype */ > setlocale(LC_CTYPE, locale_monetary); > > There are multiple things wrong with that: > > 1. The db_encoding_strdup calls can also return NULL on OOM, and aren't > being checked likewise. (And there's another plain strdup further down, > too.) > > 2. You can't safely throw an elog right there, because you need to restore > the backend's prevailing setlocale state first. Arg I haven't though of this one. > 3. Also, if you do it like that, you'll leak whatever strings were already > strdup'd. (This is a relatively minor problem, but still, if we're > trying to be 100% correct then we're not there yet.) > > Also, now that I'm looking at it, I see there's another pre-existing bug: > > 4. An elog exit is possible, due to either OOM or encoding conversion > failure, inside db_encoding_strdup(). This means we have problems #2 and > #3 even in the existing code. > > Now, I believe that the coding intention here was that assigning NULL > to the respective fields of CurrentLocaleConv is okay and better than > failing the operation completely. One argument against that is that > it's unclear whether everyplace that uses any of those fields is checking > for NULL first; and in any case, silently falling back to nonlocalized > behavior might not be better than reporting OOM. Still, it's certainly > better than risking problem #2, which could cause all sorts of subsequent > malfunctions. > > I think that a complete fix for this might go along the lines of > > 1. While we are setlocale'd to the nonstandard locales, collect all the > values by strdup'ing into a local variable of type "struct lconv". > (We must strdup for the reason noted in the comment, that localeconv's > output is no longer valid once we do another setlocale.) Then restore > the standard locale settings. The one at the top of the file... That's really platform-dependent. > 2. Use db_encoding_strdup to replace any strings that need to be > converted. (If it throws an elog, we have no damage worse than > leaking the already strdup'd strings.) > > 3. Check for any nulls in the struct; if so, use free_struct_lconv > to release whatever we did get, and then throw elog("out of memory"). > > 4. Otherwise, copy the struct to CurrentLocaleConv. > > If we were really feeling picky, we could probably put in a PG_TRY > block around step 2 to release the strdup'd storage after a conversion > failure. Not sure if it's worth the trouble. It doesn't sound that much complicated to do that, I'll see about it, but I guess that we could just do it in another thread.. > BTW, I marked the commitfest entry closed, but that may have been > premature --- feel free to reopen it if you submit additional patches > in this thread. There are still two things that could be worked I guess: - plperl and pltcl cleanup, and their abusive use of malloc. I'll raise a new thread about that after brushing up a bit what I have and add a new entry in the CF as that's not directly related to this thread. - ShmemAlloc and its missing checks. And we can do something here. I have slept on it, and looked at the numbers. There are 11 calls to ShmemAlloc in the code, and 4 of them are performing checks. And in one of them there is this pattern in ShmemInitStruct(), which is also something ShmemInitHash relies on: /* It isn't in the table yet. allocate and initialize it */ structPtr = ShmemAlloc(size); if (structPtr == NULL) { /* out of memory; remove the failed ShmemIndex entry */ hash_search(ShmemIndex, name, HASH_REMOVE, NULL); LWLockRelease(ShmemIndexLock); ereport(ERROR, (errcode(ERRCODE_OUT_OF_MEMORY), errmsg("not enough shared memory for data structure" " \"%s\" (%zu bytes requested)", name, size))); } Which means that processes have an escape window when initializing shared memory by cleaning up the index if an entry cannot be found and then cannot be created properly. I don't think that it is a good idea to change that by forcing ShmemAlloc to fail. So I would tend to just have the patch attached and add those missing NULL-checks on all the existing ShmemAlloc() calls. Opinions? -- Michael
Attachment
Michael Paquier <michael.paquier@gmail.com> writes: > Which means that processes have an escape window when initializing > shared memory by cleaning up the index if an entry cannot be found and > then cannot be created properly. I don't think that it is a good idea > to change that by forcing ShmemAlloc to fail. So I would tend to just > have the patch attached and add those missing NULL-checks on all the > existing ShmemAlloc() calls. > Opinions? I still think it'd be better to fix that as attached, because it represents a net reduction not net addition of code, and it provides a defense against future repetitions of the same omission. If only 4 out of 11 existing calls were properly checked --- some of them adjacent to calls with checks --- that should tell us that we *will* have more instances of the same bug if we don't fix it centrally. I also note that your patch missed checks for two ShmemAlloc calls in InitShmemAllocation and ShmemInitStruct. Admittedly, since those are the very first such calls, it's highly unlikely they'd fail; but I think this exercise is not about dismissing failures as improbable. Almost all of these failures are improbable, given that we precalculate the shmem space requirement. regards, tom lane diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c index 1efe020..cc3af2d 100644 *** a/src/backend/storage/ipc/shmem.c --- b/src/backend/storage/ipc/shmem.c *************** InitShmemAllocation(void) *** 163,177 **** /* * ShmemAlloc -- allocate max-aligned chunk from shared memory * ! * Assumes ShmemLock and ShmemSegHdr are initialized. * ! * Returns: real pointer to memory or NULL if we are out ! * of space. Has to return a real pointer in order ! * to be compatible with malloc(). */ void * ShmemAlloc(Size size) { Size newStart; Size newFree; void *newSpace; --- 163,194 ---- /* * ShmemAlloc -- allocate max-aligned chunk from shared memory * ! * Throws error if request cannot be satisfied. * ! * Assumes ShmemLock and ShmemSegHdr are initialized. */ void * ShmemAlloc(Size size) { + void *newSpace; + + newSpace = ShmemAllocNoError(size); + if (!newSpace) + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), + errmsg("out of shared memory (%zu bytes requested)", + size))); + return newSpace; + } + + /* + * ShmemAllocNoError -- allocate max-aligned chunk from shared memory + * + * As ShmemAlloc, but returns NULL if out of space, rather than erroring. + */ + void * + ShmemAllocNoError(Size size) + { Size newStart; Size newFree; void *newSpace; *************** ShmemAlloc(Size size) *** 206,216 **** SpinLockRelease(ShmemLock); ! if (!newSpace) ! ereport(WARNING, ! (errcode(ERRCODE_OUT_OF_MEMORY), ! errmsg("out of shared memory"))); ! Assert(newSpace == (void *) CACHELINEALIGN(newSpace)); return newSpace; --- 223,229 ---- SpinLockRelease(ShmemLock); ! /* note this assert is okay with newSpace == NULL */ Assert(newSpace == (void *) CACHELINEALIGN(newSpace)); return newSpace; *************** ShmemInitHash(const char *name, /* table *** 293,299 **** * The shared memory allocator must be specified too. */ infoP->dsize = infoP->max_dsize = hash_select_dirsize(max_size); ! infoP->alloc = ShmemAlloc; hash_flags |= HASH_SHARED_MEM | HASH_ALLOC | HASH_DIRSIZE; /* look it up in the shmem index */ --- 306,312 ---- * The shared memory allocator must be specified too. */ infoP->dsize = infoP->max_dsize = hash_select_dirsize(max_size); ! infoP->alloc = ShmemAllocNoError; hash_flags |= HASH_SHARED_MEM | HASH_ALLOC | HASH_DIRSIZE; /* look it up in the shmem index */ *************** ShmemInitStruct(const char *name, Size s *** 364,375 **** */ Assert(shmemseghdr->index == NULL); structPtr = ShmemAlloc(size); - if (structPtr == NULL) - ereport(ERROR, - (errcode(ERRCODE_OUT_OF_MEMORY), - errmsg("not enough shared memory for data structure" - " \"%s\" (%zu bytes requested)", - name, size))); shmemseghdr->index = structPtr; *foundPtr = FALSE; } --- 377,382 ---- *************** ShmemInitStruct(const char *name, Size s *** 410,416 **** else { /* It isn't in the table yet. allocate and initialize it */ ! structPtr = ShmemAlloc(size); if (structPtr == NULL) { /* out of memory; remove the failed ShmemIndex entry */ --- 417,423 ---- else { /* It isn't in the table yet. allocate and initialize it */ ! structPtr = ShmemAllocNoError(size); if (structPtr == NULL) { /* out of memory; remove the failed ShmemIndex entry */ diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c index 7cdb355..4064b20 100644 *** a/src/backend/storage/lmgr/predicate.c --- b/src/backend/storage/lmgr/predicate.c *************** InitPredicateLocks(void) *** 1184,1195 **** requestSize = mul_size((Size) max_table_size, PredXactListElementDataSize); PredXact->element = ShmemAlloc(requestSize); - if (PredXact->element == NULL) - ereport(ERROR, - (errcode(ERRCODE_OUT_OF_MEMORY), - errmsg("not enough shared memory for elements of data structure" - " \"%s\" (%zu bytes requested)", - "PredXactList", requestSize))); /* Add all elements to available list, clean. */ memset(PredXact->element, 0, requestSize); for (i = 0; i < max_table_size; i++) --- 1184,1189 ---- *************** InitPredicateLocks(void) *** 1255,1266 **** requestSize = mul_size((Size) max_table_size, RWConflictDataSize); RWConflictPool->element = ShmemAlloc(requestSize); - if (RWConflictPool->element == NULL) - ereport(ERROR, - (errcode(ERRCODE_OUT_OF_MEMORY), - errmsg("not enough shared memory for elements of data structure" - " \"%s\" (%zu bytes requested)", - "RWConflictPool", requestSize))); /* Add all elements to available list, clean. */ memset(RWConflictPool->element, 0, requestSize); for (i = 0; i < max_table_size; i++) --- 1249,1254 ---- diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index 9a758bd..33e7023 100644 *** a/src/backend/storage/lmgr/proc.c --- b/src/backend/storage/lmgr/proc.c *************** InitProcGlobal(void) *** 194,207 **** * between groups. */ procs = (PGPROC *) ShmemAlloc(TotalProcs * sizeof(PGPROC)); ProcGlobal->allProcs = procs; /* XXX allProcCount isn't really all of them; it excludes prepared xacts */ ProcGlobal->allProcCount = MaxBackends + NUM_AUXILIARY_PROCS; - if (!procs) - ereport(FATAL, - (errcode(ERRCODE_OUT_OF_MEMORY), - errmsg("out of shared memory"))); - MemSet(procs, 0, TotalProcs * sizeof(PGPROC)); /* * Also allocate a separate array of PGXACT structures. This is separate --- 194,203 ---- * between groups. */ procs = (PGPROC *) ShmemAlloc(TotalProcs * sizeof(PGPROC)); + MemSet(procs, 0, TotalProcs * sizeof(PGPROC)); ProcGlobal->allProcs = procs; /* XXX allProcCount isn't really all of them; it excludes prepared xacts */ ProcGlobal->allProcCount = MaxBackends + NUM_AUXILIARY_PROCS; /* * Also allocate a separate array of PGXACT structures. This is separate diff --git a/src/include/storage/shmem.h b/src/include/storage/shmem.h index 6468e66..2560e6c 100644 *** a/src/include/storage/shmem.h --- b/src/include/storage/shmem.h *************** typedef struct SHM_QUEUE *** 35,40 **** --- 35,41 ---- extern void InitShmemAccess(void *seghdr); extern void InitShmemAllocation(void); extern void *ShmemAlloc(Size size); + extern void *ShmemAllocNoError(Size size); extern bool ShmemAddrIsValid(const void *addr); extern void InitShmemIndex(void); extern HTAB *ShmemInitHash(const char *name, long init_size, long max_size,
On Thu, Sep 1, 2016 at 12:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I still think it'd be better to fix that as attached, because it > represents a net reduction not net addition of code, and it provides > a defense against future repetitions of the same omission. If only > 4 out of 11 existing calls were properly checked --- some of them > adjacent to calls with checks --- that should tell us that we *will* > have more instances of the same bug if we don't fix it centrally. > > I also note that your patch missed checks for two ShmemAlloc calls in > InitShmemAllocation and ShmemInitStruct. Admittedly, since those are > the very first such calls, it's highly unlikely they'd fail; but I think > this exercise is not about dismissing failures as improbable. Almost > all of these failures are improbable, given that we precalculate the > shmem space requirement. OK, that looks fine to me after review. Also, we could take one extra step forward then, and just introduce ShmemAllocExtended that holds two flags as per the attached: - SHMEM_ALLOC_ZERO that zeros all the fields - SHMEM_ALLOC_NO_OOM that does not fail Or we could just put a call to MemSet directly in ShmemAlloc(), but I'd rather keep the base routines extensible. What do you think about the attached? One other possibility would be to take your patch, but use MemSet unconditionally on it as this should not cause any overhead. -- Michael
Attachment
Michael Paquier <michael.paquier@gmail.com> writes: > Also, we could take one extra step forward then, and just introduce > ShmemAllocExtended that holds two flags as per the attached: > - SHMEM_ALLOC_ZERO that zeros all the fields > - SHMEM_ALLOC_NO_OOM that does not fail Don't see the point really ... it's just more API churn without any very compelling reason. (It doesn't look like you correctly implemented the case of both flags being set, either.) regards, tom lane
On Thu, Sep 1, 2016 at 10:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael.paquier@gmail.com> writes: >> Also, we could take one extra step forward then, and just introduce >> ShmemAllocExtended that holds two flags as per the attached: >> - SHMEM_ALLOC_ZERO that zeros all the fields >> - SHMEM_ALLOC_NO_OOM that does not fail > > Don't see the point really ... it's just more API churn without any > very compelling reason. OK, then no objection to your approach. At least I tried. -- Michael
Michael Paquier <michael.paquier@gmail.com> writes: > On Thu, Sep 1, 2016 at 10:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Don't see the point really ... it's just more API churn without any >> very compelling reason. > OK, then no objection to your approach. At least I tried. OK, pushed my version then. I think we have now dealt with everything mentioned in this thread except for the issues in pg_locale.c. Are you planning to tackle that? regards, tom lane
On Thu, Sep 1, 2016 at 11:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael.paquier@gmail.com> writes: >> On Thu, Sep 1, 2016 at 10:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Don't see the point really ... it's just more API churn without any >>> very compelling reason. > >> OK, then no objection to your approach. At least I tried. > > OK, pushed my version then. I think we have now dealt with everything > mentioned in this thread except for the issues in pg_locale.c. Are > you planning to tackle that? Yes, not for this CF though. So I have switched the CF entry as committed. -- Michael