Thread: LLVM miscompiles numeric.c access to short numeric var headers
I've been using LLVM's sanitizers and asan turned up a new bit of compiler behaviour that I hadn't had before. I don't see it in 3.7 or before, only in their HEAD so I don't know if it's a bug or intentional. In numeric.c we have the short numeric headers that have one uint16 (in addition to the varlena header) followed by digits. When compiling with -O2 on x86-64 LLVM now seems to use a 4-byte access. When there are no digits (because the number is 0) that overruns the palloced block. init_var_from_num: NUMERIC (short) w=0 d=0 POS 6 bytes: 18 00 00 00 00 80 ================================================================= ==30523==ERROR: AddressSanitizer: unknown-crash on address 0x6250002c7edc at pc 0x00000144a6d6 bp 0x7ffcfbaadeb0 sp 0x7ffcfbaadea8 READ of size 4 at 0x6250002c7edc thread T0 #0 0x144a6d5 in init_var_from_num /home/stark/src/pg/postgresql-master/src/backend/utils/adt/numeric.c:4764:17 Shadow bytes around the buggy address: =>0x0c4a80050fd0: 00 00 00 00 00 00 f7 f7 00 00 f7[06]00 00 f7 00 0x000000000144a6d6 <+1078>: mov $0x2e02680,%edi 0x000000000144a6db <+1083>: mov %r14,0x10(%rbx) 0x000000000144a6df<+1087>: mov %rsi,%r14 Now, in Postgres currently this is safe for the same reason that the Form_Data struct accesses are safe. Numerics will always be 4-byte aligned when allocated in memory. They won't be aligned on disk if they have a short varlena header but we don't use the GETARG_PP macros in numeric.c so those always get reallocated in the GETARG macro. This does represent a hazard though in case we ever do add support for unaligned packed varlena support to numeric.c. In fact it's extremely tempting to do so since we never do much work on the varlena form anyways, we just convert it to NumericVar anyways. It's awfully annoying that we have to palloc and memcpy to this other intermediate form just to do it again to the usable internal form. If we did that we would have to deal with unaligned accesses anyways though so we would probably have to change this code to just use a char* buffer and skip the union and struct anyways. Doing so while maintaining binary on-disk compatibility could be tricky. -- greg
Greg Stark <stark@mit.edu> writes: > In numeric.c we have the short numeric headers that have one uint16 > (in addition to the varlena header) followed by digits. When compiling > with -O2 on x86-64 LLVM now seems to use a 4-byte access. Either that's a reportable compiler bug, or someplace nearby we've casted the pointer to something that would require a 4-byte struct. I'm not sure which code you're looking at exactly, but maybe we're using "union NumericChoice" prematurely? regards, tom lane
On Thu, Nov 12, 2015 at 2:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Either that's a reportable compiler bug, or someplace nearby we've > casted the pointer to something that would require a 4-byte struct. > I'm not sure which code you're looking at exactly, but maybe we're > using "union NumericChoice" prematurely? I don't see how these structs could ever be used without casting to NumericChoice to get the scale and weight even when there are no digits. I think it wouldn't be too hard to just give up on the structs and unions and use a char * as the underlying type. We could access the meta information directly using byte accesses and memcpy the digits to an aligned array of digits when setting up the var. I think the code would be simpler in the end and make it easy to support packed varlenas. It triggers on the line with the NUMERIC_WEIGHT() macro call in init_var_from_num(): #define NUMERIC_WEIGHT(n) (NUMERIC_HEADER_IS_SHORT((n)) ? \ (((n)->choice.n_short.n_header & NUMERIC_SHORT_WEIGHT_SIGN_MASK? \ ~NUMERIC_SHORT_WEIGHT_MASK : 0) \ | ((n)->choice.n_short.n_header & NUMERIC_SHORT_WEIGHT_MASK))\ : ((n)->choice.n_long.n_weight)) static void init_var_from_num(Numeric num, NumericVar *dest) { dump_numeric("init_var_from_num", num); dest->ndigits = NUMERIC_NDIGITS(num); dest->weight = NUMERIC_WEIGHT(num); dest->sign = NUMERIC_SIGN(num); dest->dscale =NUMERIC_DSCALE(num); dest->digits = NUMERIC_DIGITS(num); dest->buf = NULL; /* digits array is not palloc'd */ } I think the -- greg
Greg Stark <stark@mit.edu> writes: > On Thu, Nov 12, 2015 at 2:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Either that's a reportable compiler bug, or someplace nearby we've >> casted the pointer to something that would require a 4-byte struct. >> I'm not sure which code you're looking at exactly, but maybe we're >> using "union NumericChoice" prematurely? > It triggers on the line with the NUMERIC_WEIGHT() macro call in > init_var_from_num(): Hmm. I'd argue that that is a compiler bug, but I dunno if the LLVM guys would see it that way. The existence of other union members in the declaration shouldn't license them to assume that the particular instance we're accessing is large enough for any possible member. > I think it wouldn't be too hard to just give up on the structs > and unions and use a char * as the underlying type. We could access > the meta information directly using byte accesses and memcpy the > digits to an aligned array of digits when setting up the var. Meh. The palloc to create an aligned array of digits would eat up any possible performance win --- it'd be just about as expensive as the existing unpack operation. I think we could fix the immediate issue by redeclaring numeric headers as arrays of (u)int16 rather than structs. I'm not very excited about the packed-header case. regards, tom lane
On Thu, Nov 12, 2015 at 3:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Meh. The palloc to create an aligned array of digits would eat up > any possible performance win --- it'd be just about as expensive > as the existing unpack operation. I suppose we would only need to palloc the digits if we found they were unaligned which would only happen if the varlena header was packed. So if the varlena header wasn't packed (or if we were just lucky with the packed alignment which would happen half the time) we could share the bytes from the packed varlena in the buffer directly in the var. > I think we could fix the immediate issue by redeclaring numeric > headers as arrays of (u)int16 rather than structs. I'm not > very excited about the packed-header case. That would require giving up the pretense that the code supports base 10 and base 100 I suppose. And would still be doing a palloc/memcpy for data smaller than 128 bytes. -- greg
Greg Stark <stark@mit.edu> writes: > On Thu, Nov 12, 2015 at 3:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think we could fix the immediate issue by redeclaring numeric >> headers as arrays of (u)int16 rather than structs. I'm not >> very excited about the packed-header case. > That would require giving up the pretense that the code supports base > 10 and base 100 I suppose. No, not really. If we redefine NumericVar as a uint16 array, then we'd have n_header or n_sign_dscale as array[0], n_weight as (int16) array[1], and n_data as (NumericDigit *) &array[1] or (NumericDigit *) &array[2] depending. Doesn't matter which way NumericDigit is declared. regards, tom lane
Fwiw it looks like the LLVM folk think this is an asan bug. https://llvm.org/bugs/show_bug.cgi?id=25550 It looks like the fix is that the compiler should avoid this optimization if the code is being compiled with instrumentation. This worries me a bit but I think our code is safe as the Datum will always be either palloced which will be fully aligned and therefore can't overrun a page or on a stack in which case the whole struct will be allocated regardless of how many digits we need.