Thread: printf format selection vs. reality
Our configure script goes to considerable lengths to try to identify what printf width modifier to use for int64 values. In particular see PGAC_FUNC_SNPRINTF_LONG_LONG_INT_MODIFIER, which claims # MinGW uses '%I64d', though gcc throws a warning with -Wall, # while '%lld' doesn't generate a warning, but doesn't work. However, if we decide that we ought to use our own snprintf replacement, we throw that info away and set INT64_FORMAT to "%lld" which we know is what that code uses. This was all right when that code was first written, when we had only a very small number of uses of INT64_FORMAT and they all were in snprintf() calls. It's been a long time since that was true: pgbench, in particular, has been passing INT64_FORMAT to the native printf with increasing enthusiasm. In reality, we do not anymore work with situations where our snprintf has a different idea about this format modifier than libc does. What's more, as far as I can find, we do not have any buildfarm members that set INT64_MODIFIER to anything other than "l" or "ll", which no doubt explains why we've not noticed a problem. The comment quoted above doesn't seem to apply to any current buildfarm members. I think we should abandon the pretense that we can work with libc printfs that accept anything but "l"/"ll", and rip out the excess complexity in configure, just setting INT64_MODIFIER to "l" or "ll" depending on whether "int64" is "long" or "long long". Comments? regards, tom lane
I wrote: > I think we should abandon the pretense that we can work with libc > printfs that accept anything but "l"/"ll", and rip out the excess > complexity in configure, just setting INT64_MODIFIER to "l" or "ll" > depending on whether "int64" is "long" or "long long". I pushed that, but while working on the patch I noticed two other overly-optimistic assumptions of the same kind: * If the system's printf doesn't support the "z" length modifier, we assume we can deal with that by using our own snprintf. Just as in the "ll" case, that only works to the extent that "z" is used only with elog/ereport, not directly in printf or fprintf. I've not dug for counterexamples, but it's hard to believe there aren't some now, and even harder to believe we'd not introduce them in future. * If the system's printf doesn't support argument reordering (%$n), we assume we can deal with that by using our own snprintf. Again, that's bunk because we apply translation to all sorts of strings that are given directly to printf or fprintf. I think that a reasonably painless solution to the second point is just to reject --enable-nls if printf doesn't support argument reordering. This would not even break any buildfarm animals, AFAICT; all the ones that are testing for the feature are finding it present. And, though the feature seems to postdate C99, it's hard to believe that anybody who'd care about NLS would be using a platform that hasn't got it. The %z issue is a good deal stickier, as (a) we do have a surviving Unix buildfarm animal (gaur/pademelon) that doesn't support %z, and (b) so far as I can tell from available documentation, Windows doesn't support it either, until very recently (like VS2017). If it were just (a), that would probably mean it's time to put gaur/pademelon out to pasture, but (b) means we have to deal with this somehow. The practical alternatives seem to be to avoid %z in frontend code, or to invent a macro SIZE_T_MODIFIER and use it like INT64_MODIFIER. Either one will be extremely error-prone, I'm afraid, unless we can find a way to get compiler warnings for violations. Also, I'm suspcious that we're going to have to back-patch something for this. Thoughts? regards, tom lane
On 2018-May-23, Tom Lane wrote: > The practical alternatives seem to be to avoid %z in frontend code, > or to invent a macro SIZE_T_MODIFIER and use it like INT64_MODIFIER. > Either one will be extremely error-prone, I'm afraid, unless we can > find a way to get compiler warnings for violations. Usage of %z outside safe-known seems really limited. It would be sad to force SIZE_T_MODIFIER for elog calls (where it is prevalent) just for the benefit of usage outside of elog on fringe platforms -- you're right that we do have a few cases of %z under fprintf() already. The good news is that AFAICS those strings are not translatable today, so changing only those to SIZE_T_MODIFIER (and leaving alone those using elog) is maybe not such a big deal. I think those are dshash.c, dsa.c, slab.c and aset.c only. (I assume without checking that with the stringinfo API it's OK to use %z). Can't we raise warnings on such usages with an archetype change? (Hm, is it possible to change archetype for fprintf?) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2018-May-23, Tom Lane wrote: >> The practical alternatives seem to be to avoid %z in frontend code, >> or to invent a macro SIZE_T_MODIFIER and use it like INT64_MODIFIER. >> Either one will be extremely error-prone, I'm afraid, unless we can >> find a way to get compiler warnings for violations. > Usage of %z outside safe-known seems really limited. It would be sad to > force SIZE_T_MODIFIER for elog calls (where it is prevalent) just for > the benefit of usage outside of elog on fringe platforms -- you're right > that we do have a few cases of %z under fprintf() already. The good > news is that AFAICS those strings are not translatable today, so > changing only those to SIZE_T_MODIFIER (and leaving alone those using > elog) is maybe not such a big deal. I think those are dshash.c, dsa.c, > slab.c and aset.c only. Yeah, I just went through things myself, and concluded that right now the only hazards are in debug code such as dsa_dump(). So I think that (a) we don't have a problem we have to fix right now, and (b) going over to SIZE_T_MODIFIER seems like more trouble than it'd be worth. Still, this seems like something that will certainly bite us eventually if we don't install some kind of check. > (I assume without checking that with the stringinfo API it's OK to use %z). It is, that goes to snprintf. > Can't we raise warnings on such usages with an archetype change? (Hm, > is it possible to change archetype for fprintf?) The problem is to get a compiler that thinks that %z is a violation of *any* archetype. gaur's compiler does think that, but it has no archetype that does accept %z, so that's little help (I've had it building with -Wno-format since we added %z). It might be possible for me to install a fractionally-newer compiler on gaur's host and get usable warnings that way. I think however that a more practical approach is likely to be to depend on the Windows/gcc buildfarm members, where (if gcc is correctly installed and doing what it's supposed to) we should find that %z is accepted by the gnu_printf archetype but not the plain printf one. So I wish somebody would try out the patch in <2975.1526862605@sss.pgh.pa.us> on MinGW. It would also be good to find out whether MSVC can be persuaded to check printf strings. regards, tom lane PS: per the above, the patch in <2975.1526862605@sss.pgh.pa.us> would need to be adjusted to use gnu_printf on the stringinfo functions, if we don't want complaints about %z. This opens the question of whether we want to allow %m there ...
Sigh, I'm an idiot. I forgot that USE_REPL_SNPRINTF doesn't just replace snprintf, it replaces the entire *printf family; see port.h lines 137ff. So actually we're OK as far as these %z and argument-reordering concerns go. Maybe the comments in configure could use a bit of work though. There's maybe also an argument for reverting b929614f5, because actually that code did do something useful, ie allow us to work on platforms without %ll. But I'm inclined to leave that alone; it's an extra configure test to detect a case that probably no longer occurs in the wild. Moreover, since %ll and %z are both C99-isms, and the former had considerable currency even before C99 (evidence: gaur/pademelon) it's pretty hard to credit that a platform's *printf would fail the %ll test yet pass the %z test. So I think we're likely OK without it, even on dinosaur platforms. regards, tom lane
On Thu, May 24, 2018 at 8:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The problem is to get a compiler that thinks that %z is a violation > of *any* archetype. gaur's compiler does think that, but it has no > archetype that does accept %z, so that's little help (I've had it > building with -Wno-format since we added %z). From the pie-in-the-sky department: I think it would be cool to develop a Clang semantic checker plugin[1] that performs arbitrary checks to our taste, as an external project. Custom format string checkers might not be terribly interesting but would make an easy starting point (maybe start by stealing some code from lib/Sema/SemaChecking.cpp?), but there are plenty of less localised programming rules in our project that are easy to break (stuff about node types etc). I've seen this sort of thing done to impose house rules on mountains of C++ code with good effect. You can either invent your own attributes or (to avoid having to change the target source tree at all) just hard code your checker to recognise stuff. It's considerably easier to do this with the full AST etc than with (say) checker scripts operating on the source. I'm not working on this myself but I thought I'd mention it in case it interests someone out there... [1] https://clang.llvm.org/docs/ClangPlugins.html -- Thomas Munro http://www.enterprisedb.com