Thread: printf format selection vs. reality

printf format selection vs. reality

From
Tom Lane
Date:
Our configure script goes to considerable lengths to try to identify
what printf width modifier to use for int64 values.  In particular
see PGAC_FUNC_SNPRINTF_LONG_LONG_INT_MODIFIER, which claims

# MinGW uses '%I64d', though gcc throws a warning with -Wall,
# while '%lld' doesn't generate a warning, but doesn't work.

However, if we decide that we ought to use our own snprintf replacement,
we throw that info away and set INT64_FORMAT to "%lld" which we know
is what that code uses.

This was all right when that code was first written, when we had only
a very small number of uses of INT64_FORMAT and they all were in
snprintf() calls.  It's been a long time since that was true: pgbench,
in particular, has been passing INT64_FORMAT to the native printf
with increasing enthusiasm.  In reality, we do not anymore work with
situations where our snprintf has a different idea about this format
modifier than libc does.

What's more, as far as I can find, we do not have any buildfarm members
that set INT64_MODIFIER to anything other than "l" or "ll", which no
doubt explains why we've not noticed a problem.  The comment quoted
above doesn't seem to apply to any current buildfarm members.

I think we should abandon the pretense that we can work with libc
printfs that accept anything but "l"/"ll", and rip out the excess
complexity in configure, just setting INT64_MODIFIER to "l" or "ll"
depending on whether "int64" is "long" or "long long".

Comments?

            regards, tom lane


Re: printf format selection vs. reality

From
Tom Lane
Date:
I wrote:
> I think we should abandon the pretense that we can work with libc
> printfs that accept anything but "l"/"ll", and rip out the excess
> complexity in configure, just setting INT64_MODIFIER to "l" or "ll"
> depending on whether "int64" is "long" or "long long".

I pushed that, but while working on the patch I noticed two other
overly-optimistic assumptions of the same kind:

* If the system's printf doesn't support the "z" length modifier,
we assume we can deal with that by using our own snprintf.  Just
as in the "ll" case, that only works to the extent that "z" is
used only with elog/ereport, not directly in printf or fprintf.
I've not dug for counterexamples, but it's hard to believe there
aren't some now, and even harder to believe we'd not introduce
them in future.

* If the system's printf doesn't support argument reordering (%$n),
we assume we can deal with that by using our own snprintf.  Again,
that's bunk because we apply translation to all sorts of strings
that are given directly to printf or fprintf.

I think that a reasonably painless solution to the second point
is just to reject --enable-nls if printf doesn't support argument
reordering.  This would not even break any buildfarm animals,
AFAICT; all the ones that are testing for the feature are finding
it present.  And, though the feature seems to postdate C99, it's
hard to believe that anybody who'd care about NLS would be using
a platform that hasn't got it.

The %z issue is a good deal stickier, as (a) we do have a surviving
Unix buildfarm animal (gaur/pademelon) that doesn't support %z,
and (b) so far as I can tell from available documentation, Windows
doesn't support it either, until very recently (like VS2017).
If it were just (a), that would probably mean it's time to put
gaur/pademelon out to pasture, but (b) means we have to deal with
this somehow.

The practical alternatives seem to be to avoid %z in frontend code,
or to invent a macro SIZE_T_MODIFIER and use it like INT64_MODIFIER.
Either one will be extremely error-prone, I'm afraid, unless we can
find a way to get compiler warnings for violations.

Also, I'm suspcious that we're going to have to back-patch something
for this.

Thoughts?

            regards, tom lane


Re: printf format selection vs. reality

From
Alvaro Herrera
Date:
On 2018-May-23, Tom Lane wrote:

> The practical alternatives seem to be to avoid %z in frontend code,
> or to invent a macro SIZE_T_MODIFIER and use it like INT64_MODIFIER.
> Either one will be extremely error-prone, I'm afraid, unless we can
> find a way to get compiler warnings for violations.

Usage of %z outside safe-known seems really limited.  It would be sad to
force SIZE_T_MODIFIER for elog calls (where it is prevalent) just for
the benefit of usage outside of elog on fringe platforms -- you're right
that we do have a few cases of %z under fprintf() already.  The good
news is that AFAICS those strings are not translatable today, so
changing only those to SIZE_T_MODIFIER (and leaving alone those using
elog) is maybe not such a big deal.  I think those are dshash.c, dsa.c,
slab.c and aset.c only.

(I assume without checking that with the stringinfo API it's OK to use %z).

Can't we raise warnings on such usages with an archetype change?  (Hm,
is it possible to change archetype for fprintf?)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: printf format selection vs. reality

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2018-May-23, Tom Lane wrote:
>> The practical alternatives seem to be to avoid %z in frontend code,
>> or to invent a macro SIZE_T_MODIFIER and use it like INT64_MODIFIER.
>> Either one will be extremely error-prone, I'm afraid, unless we can
>> find a way to get compiler warnings for violations.

> Usage of %z outside safe-known seems really limited.  It would be sad to
> force SIZE_T_MODIFIER for elog calls (where it is prevalent) just for
> the benefit of usage outside of elog on fringe platforms -- you're right
> that we do have a few cases of %z under fprintf() already.  The good
> news is that AFAICS those strings are not translatable today, so
> changing only those to SIZE_T_MODIFIER (and leaving alone those using
> elog) is maybe not such a big deal.  I think those are dshash.c, dsa.c,
> slab.c and aset.c only.

Yeah, I just went through things myself, and concluded that right now
the only hazards are in debug code such as dsa_dump().  So I think that
(a) we don't have a problem we have to fix right now, and (b) going
over to SIZE_T_MODIFIER seems like more trouble than it'd be worth.
Still, this seems like something that will certainly bite us eventually
if we don't install some kind of check.

> (I assume without checking that with the stringinfo API it's OK to use %z).

It is, that goes to snprintf.

> Can't we raise warnings on such usages with an archetype change?  (Hm,
> is it possible to change archetype for fprintf?)

The problem is to get a compiler that thinks that %z is a violation
of *any* archetype.  gaur's compiler does think that, but it has no
archetype that does accept %z, so that's little help (I've had it
building with -Wno-format since we added %z).

It might be possible for me to install a fractionally-newer compiler
on gaur's host and get usable warnings that way.  I think however
that a more practical approach is likely to be to depend on the
Windows/gcc buildfarm members, where (if gcc is correctly installed
and doing what it's supposed to) we should find that %z is accepted
by the gnu_printf archetype but not the plain printf one.  So I wish
somebody would try out the patch in <2975.1526862605@sss.pgh.pa.us>
on MinGW.  It would also be good to find out whether MSVC can be
persuaded to check printf strings.

            regards, tom lane

PS: per the above, the patch in <2975.1526862605@sss.pgh.pa.us>
would need to be adjusted to use gnu_printf on the stringinfo
functions, if we don't want complaints about %z.  This opens
the question of whether we want to allow %m there ...


Re: printf format selection vs. reality

From
Tom Lane
Date:
Sigh, I'm an idiot.  I forgot that USE_REPL_SNPRINTF doesn't just
replace snprintf, it replaces the entire *printf family; see
port.h lines 137ff.  So actually we're OK as far as these %z and
argument-reordering concerns go.  Maybe the comments in configure
could use a bit of work though.

There's maybe also an argument for reverting b929614f5, because
actually that code did do something useful, ie allow us to work on
platforms without %ll.  But I'm inclined to leave that alone;
it's an extra configure test to detect a case that probably no longer
occurs in the wild.  Moreover, since %ll and %z are both C99-isms,
and the former had considerable currency even before C99 (evidence:
gaur/pademelon) it's pretty hard to credit that a platform's *printf
would fail the %ll test yet pass the %z test.  So I think we're
likely OK without it, even on dinosaur platforms.

            regards, tom lane


Re: printf format selection vs. reality

From
Thomas Munro
Date:
On Thu, May 24, 2018 at 8:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The problem is to get a compiler that thinks that %z is a violation
> of *any* archetype.  gaur's compiler does think that, but it has no
> archetype that does accept %z, so that's little help (I've had it
> building with -Wno-format since we added %z).

From the pie-in-the-sky department:  I think it would be cool to
develop a Clang semantic checker plugin[1] that performs arbitrary
checks to our taste, as an external project.  Custom format string
checkers might not be terribly interesting but would make an easy
starting point (maybe start by stealing some code from
lib/Sema/SemaChecking.cpp?), but there are plenty of less localised
programming rules in our project that are easy to break (stuff about
node types etc).  I've seen this sort of thing done to impose house
rules on mountains of C++ code with good effect.  You can either
invent your own attributes or (to avoid having to change the target
source tree at all) just hard code your checker to recognise stuff.
It's considerably easier to do this with the full AST etc than with
(say) checker scripts operating on the source.  I'm not working on
this myself but I thought I'd mention it in case it interests someone
out there...

[1] https://clang.llvm.org/docs/ClangPlugins.html

-- 
Thomas Munro
http://www.enterprisedb.com