Thread: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

[ moved from -bugs list for more visibility ]

I wrote:
> However, that root issue is converted from a relatively minor bug into
> a server crash because snprintf.c treats a NULL pointer passed to %s
> as a crash-worthy error.  I have advocated for that behavior in the
> past, but I'm starting to wonder if it wouldn't be wiser to change
> over to the glibc-ish behavior of printing "(null)" or the like.
> It seems like we've long since found all the interesting bugs that
> the assert-or-crash behavior could reveal, and now we're down to
> weird corner cases where its main effect is to weaken our robustness.

I did a little more thinking about this.  I believe the strongest
argument for having snprintf.c crash on NULL is that it keeps us
from relying on having more-forgiving behavior in case we're using
platform-supplied *printf functions (cf commit 0c62356cc).  However,
that is only relevant for code that's meant to go into pre-v12 branches,
since we stopped using libc's versions of these functions in v12.

So one plausible way to approach this is to say that we should wait
until v11 is EOL and then change it.

However, that feels overly conservative to me.  I doubt that anyone
is *intentionally* relying on *printf not crashing on a NULL pointer.
For example, in the case that started this thread:

        if (OidIsValid(oldExtension))
            ereport(ERROR,
                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                     errmsg("%s is already a member of extension \"%s\"",
                            getObjectDescription(&object, false),
                            get_extension_name(oldExtension))));

the problem is failure to consider the possibility that
get_extension_name could return NULL due to a just-committed
concurrent DROP EXTENSION.  I'm afraid there are a lot of
corner cases like that still lurking.

So my feeling about this is that switching snprintf.c's behavior
would produce some net gain in robustness for v12 and up, while
not making things any worse for the older branches.  I still hold
to the opinion that we've already flushed out all the cases of
passing NULL that we're likely to find via ordinary testing.

Thoughts?

            regards, tom lane



On Mon, 2021-07-12 at 13:20 -0400, Tom Lane wrote:
> > However, that root issue is converted from a relatively minor bug into
> > a server crash because snprintf.c treats a NULL pointer passed to %s
> > as a crash-worthy error.  I have advocated for that behavior in the
> > past, but I'm starting to wonder if it wouldn't be wiser to change
> > over to the glibc-ish behavior of printing "(null)" or the like.
> 
> So my feeling about this is that switching snprintf.c's behavior
> would produce some net gain in robustness for v12 and up, while
> not making things any worse for the older branches.  I still hold
> to the opinion that we've already flushed out all the cases of
> passing NULL that we're likely to find via ordinary testing.

New cases could be introduced in the future and might remain undetected.

What about adding an Assert that gags on NULLs, but still printing them
as "(null)"?  That would help find such problems in a debug build.

Yours,
Laurenz Albe




Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Mon, 2021-07-12 at 13:20 -0400, Tom Lane wrote:
>> So my feeling about this is that switching snprintf.c's behavior
>> would produce some net gain in robustness for v12 and up, while
>> not making things any worse for the older branches.  I still hold
>> to the opinion that we've already flushed out all the cases of
>> passing NULL that we're likely to find via ordinary testing.

> New cases could be introduced in the future and might remain undetected.
> What about adding an Assert that gags on NULLs, but still printing them
> as "(null)"?  That would help find such problems in a debug build.

I think you're missing my main point, which is that it seems certain that
there are corner cases that do this *now*.  I'm proposing that we redefine
this as not being a crash case, full stop.

Now, what we don't have control of is what will happen in pre-v12
branches on platforms where we use the system's *printf.  However,
note what I wrote in the log for 0c62356cc:

    Per commit e748e902d, we appear to have little or no coverage in the
    buildfarm of machines that will dump core when asked to printf a
    null string pointer.

Thus it appears that a large fraction of the world is already either
using glibc or following glibc's lead on this point.  If we do likewise,
it will remove some crash cases and not introduce any new ones.

In hindsight I feel like 0c62356cc was an overreaction to the unusual
property e748e902d's bug had, namely that "(null)" was getting printed
in a place where it would not show up in any visible output.  Since
we certainly wouldn't consider that behavior OK if we saw it, you'd
really have to assume that there are more lurking bugs with that same
property in order to conclude that the Assert is worth its keep.

            regards, tom lane



I wrote:
> Now, what we don't have control of is what will happen in pre-v12
> branches on platforms where we use the system's *printf.  However,
> note what I wrote in the log for 0c62356cc:
>     Per commit e748e902d, we appear to have little or no coverage in the
>     buildfarm of machines that will dump core when asked to printf a
>     null string pointer.
> Thus it appears that a large fraction of the world is already either
> using glibc or following glibc's lead on this point.

Further to that point: I just ran around and verified that the system
printf prints "(null)" rather than crashing on FreeBSD 12.2, NetBSD 8.99,
OpenBSD 6.8, macOS 11.4, and Solaris 11.3.  AIX 7.2 and HPUX 10.20 print
"", but still don't crash.  If we change snprintf.c then we will also be
okay on Windows, because we've always used our own snprintf on that
platform.  In short, the only place I can find where there is actually
any hazard is Solaris 10 [1].  I do not think we should let the risk of
obscure bugs in pre-v12 versions on one obsolete OS drive our
decision-making about this.

            regards, tom lane

[1] Per experimentation locally and on the GCC compile farm, using
the attached.
#include <stdio.h>

int
main(int argc, char **argv)
{
    printf("NULL prints as \"%s\"\n", (char *) NULL);
    return 0;
}

Em ter., 13 de jul. de 2021 às 11:29, Tom Lane <tgl@sss.pgh.pa.us> escreveu:
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Mon, 2021-07-12 at 13:20 -0400, Tom Lane wrote:
>> So my feeling about this is that switching snprintf.c's behavior
>> would produce some net gain in robustness for v12 and up, while
>> not making things any worse for the older branches.  I still hold
>> to the opinion that we've already flushed out all the cases of
>> passing NULL that we're likely to find via ordinary testing.

> New cases could be introduced in the future and might remain undetected.
> What about adding an Assert that gags on NULLs, but still printing them
> as "(null)"?  That would help find such problems in a debug build.

I think you're missing my main point, which is that it seems certain that
there are corner cases that do this *now*.  I'm proposing that we redefine
this as not being a crash case, full stop.
I agree with Laurenz Albe, that on Debug builds, *printf with NULL, must crash.
On production builds, fine, printing (null).
This will put a little more pressure on support, "Hey what mean's (null) in my logs?",
but it's ok.

regards,
Ranier Vilela
Ranier Vilela <ranier.vf@gmail.com> writes:
> Em ter., 13 de jul. de 2021 às 11:29, Tom Lane <tgl@sss.pgh.pa.us> escreveu:
>> I think you're missing my main point, which is that it seems certain that
>> there are corner cases that do this *now*.  I'm proposing that we redefine
>> this as not being a crash case, full stop.

> I agree with Laurenz Albe, that on Debug builds, *printf with NULL, must
> crash.

Did you see my followup?  The vast majority of live systems do not do
that, so we are accomplishing nothing of value by insisting it's a
crash-worthy bug.

I flat out don't agree that "crash on debug builds but it's okay on
production" is a useful way to define this.  I spend way too much
time already on bug reports that only manifest with asserts enabled.

            regards, tom lane



Em ter., 13 de jul. de 2021 às 15:26, Tom Lane <tgl@sss.pgh.pa.us> escreveu:
Ranier Vilela <ranier.vf@gmail.com> writes:
> Em ter., 13 de jul. de 2021 às 11:29, Tom Lane <tgl@sss.pgh.pa.us> escreveu:
>> I think you're missing my main point, which is that it seems certain that
>> there are corner cases that do this *now*.  I'm proposing that we redefine
>> this as not being a crash case, full stop.

> I agree with Laurenz Albe, that on Debug builds, *printf with NULL, must
> crash.

Did you see my followup?
I am trying.
 
  The vast majority of live systems do not do
that, so we are accomplishing nothing of value by insisting it's a
crash-worthy bug.
I agreed.


I flat out don't agree that "crash on debug builds but it's okay on
production" is a useful way to define this.  I spend way too much
time already on bug reports that only manifest with asserts enabled.
I understand.
Bug reports will decrease, because of that, people will lose interest and motivation to report,
because (null), it doesn't seem like a serious error and my server didn't crash.

It's a tricky tradeoff.

regards,
Ranier Vilela
On Tue, 2021-07-13 at 14:26 -0400, Tom Lane wrote:
> Did you see my followup?  The vast majority of live systems do not do
> that, so we are accomplishing nothing of value by insisting it's a
> crash-worthy bug.
> 
> I flat out don't agree that "crash on debug builds but it's okay on
> production" is a useful way to define this.  I spend way too much
> time already on bug reports that only manifest with asserts enabled.

You convinced my that printing "(null)" is better than crashing.
Having a "(null)" show up in a weird place is certainly a minor inconvenience.

But I don't buy your second point: if it is like that, why do we have
Asserts at all?

Yours,
Laurenz Albe




On 13.07.21 20:26, Tom Lane wrote:
> Did you see my followup?  The vast majority of live systems do not do
> that, so we are accomplishing nothing of value by insisting it's a
> crash-worthy bug.

But there are no guarantees that that will be maintained in the future. 
In the past, it has often come back to bite us when we relied on 
implementation-dependent behavior in the C library or the compiler, 
because no optimization might invalidate old assumptions.

In this particular case, I would for example be quite curious how those 
alternative minimal C libraries such as musl-libc handle this.




Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> In this particular case, I would for example be quite curious how those 
> alternative minimal C libraries such as musl-libc handle this.

Interesting question, so I took a look:

https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n593

                    case 's':
            a = arg.p ? arg.p : "(null)";
            ...

Any others you'd like to consider?

            regards, tom lane



I wrote:
> Interesting question, so I took a look:
> https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n593
>                     case 's':
>             a = arg.p ? arg.p : "(null)";

BTW, the adjacent code shows that musl is also supporting glibc's
"%m" extension, so I imagine that they are endeavoring to be
compatible with glibc, and this goes along with that.  But that
just supports my larger point: printing "(null)" is clearly the
de facto standard now, whether or not POSIX has caught up with it.

            regards, tom lane



On 14.07.21 18:26, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>> In this particular case, I would for example be quite curious how those
>> alternative minimal C libraries such as musl-libc handle this.
> 
> Interesting question, so I took a look:
> 
> https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n593
> 
>                      case 's':
>             a = arg.p ? arg.p : "(null)";
>             ...
> 
> Any others you'd like to consider?

Similar here: 
https://github.com/ensc/dietlibc/blob/master/lib/__v_printf.c#L188

I think unless we get counterexamples, this is all good.



Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 14.07.21 18:26, Tom Lane wrote:
>> https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n593
>> case 's':
>> a = arg.p ? arg.p : "(null)";

> Similar here: 
> https://github.com/ensc/dietlibc/blob/master/lib/__v_printf.c#L188

I also took a look at μClibc, as well as glibc itself, and learned some
additional facts.  glibc's behavior is not just 'print "(null)" instead'.
It is 'print "(null)" if the field width allows at least six characters,
otherwise print nothing'.  μClibc is bug-compatible with this, but other
implementations seem to generally just substitute "(null)" for the input
string and run with that.  I'm inclined to side with the latter camp.
I'd rather see something like "(nu" than empty because the latter looks
too much like it might be correct output; so I think glibc is expending
extra code to produce a less-good result.

> I think unless we get counterexamples, this is all good.

Barring objections, I'll press ahead with changing snprintf.c to do this.

            regards, tom lane