Thread: Portable check for unportable macro usage
A portability hazard that we fight constantly is that the <ctype.h> macros such as isalpha() can't safely be called on "char" values. Per POSIX, you have to make sure the input is cast to "unsigned char". That's far too easy to forget, but on most machines you won't get any warning about it. We have a couple of buildfarm machines that will generate "char value used as array subscript" warnings for this, because their implementations of these macros are indeed just array lookups. But I think gaur is the only one that's still active, and it's not going to last forever. Ralph Corderoy, one of the nmh hackers, was looking at this problem and came up with this cute solution [1]: #ifndef NDEBUG #if EOF != -1 #error "Please report this to nmh's authors." #endif extern int ctype_identity[257]; /* [n] = n-1 */ #define isupper(c) ((isupper)((ctype_identity + 1)[c])) ... #endif This essentially converts the macros to always have an array lookup as the first step, and thus you'll get the char-as-array-subscript warning if your compiler has one. (AFAIK, all versions of gcc will emit that with -Wall.) You wouldn't want this in production, of course, since it's pure overhead and the only value is as a compile-time check. For our purposes, rather than "#ifndef NDEBUG" we could either use "#ifdef USE_ASSERT_CHECKING" or invent a separate macro to control this replacement. I'd be a bit inclined to invent a separate macro and set up a buildfarm member to #define that symbol and throw an error on char-as-array-subscript. Individual developers could #define it for test purposes but I wouldn't expect most of us to bother, as long as there's a backstop in the build farm. Thoughts, objections? regards, tom lane [1] http://lists.nongnu.org/archive/html/nmh-workers/2016-10/msg00313.html
On 2016-10-19 12:14:50 -0400, Tom Lane wrote: > A portability hazard that we fight constantly is that the <ctype.h> > macros such as isalpha() can't safely be called on "char" values. > Per POSIX, you have to make sure the input is cast to "unsigned char". > That's far too easy to forget, but on most machines you won't get any > warning about it. > We have a couple of buildfarm machines that will generate "char value > used as array subscript" warnings for this, because their implementations > of these macros are indeed just array lookups. But I think gaur is > the only one that's still active, and it's not going to last forever. > > Ralph Corderoy, one of the nmh hackers, was looking at this problem > and came up with this cute solution [1]: > > #ifndef NDEBUG > #if EOF != -1 > #error "Please report this to nmh's authors." > #endif > > extern int ctype_identity[257]; /* [n] = n-1 */ > #define isupper(c) ((isupper)((ctype_identity + 1)[c])) > ... > #endif Hm. I'd be kind of inclined to instead do something akin to #include <ctype.h> #define system_isupper(c) isupper(c) #undef isupper #define isupper(c) (AssertVariableIsOfTypeMacro(c, unsigned char), isupper(c)) => /home/andres/src/postgresql/src/include/c.h:745:7: error: static assertion failed: "c does not have type unsigned char" that would probably result in better error messages,and we can leave it enabled regardless of debug mode. That requires one usage adoption in wparser_def.c's p_iswhat() Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > Hm. I'd be kind of inclined to instead do something akin to > #include <ctype.h> > #define system_isupper(c) isupper(c) > #undef isupper Note that that doesn't do what you are probably thinking it does. What is actually happening there, I believe, is that you're forcing a fallback to the plain-function definition of isupper(). Ralph's proposal accomplishes the same thing less messily by parenthesizing the new macro's reference to isupper. But in either case, we're disabling any possible macro optimization in <ctype.h>, which makes me not want to say it's something we'd enable unconditionally. > #define isupper(c) (AssertVariableIsOfTypeMacro(c, unsigned char), isupper(c)) > => > /home/andres/src/postgresql/src/include/c.h:745:7: error: static assertion failed: "c does not have type unsigned char" Not sure we really want that; it breaks the standard-intended usage of applying these functions to the result of getc(). It might be all right for the core code, but I could see third-party authors getting pretty upset with us for unilaterally imposing non-POSIX semantics on these functions. regards, tom lane