Thread: LLVM miscompiles numeric.c access to short numeric var headers

LLVM miscompiles numeric.c access to short numeric var headers

From

Greg Stark

Date:

12 November 2015, 15:38:11

I've been using LLVM's sanitizers and asan turned up a new bit of
compiler behaviour that I hadn't had before. I don't see it in 3.7 or
before, only in their HEAD so I don't know if it's a bug or
intentional.

In numeric.c we have the short numeric headers that have one uint16
(in addition to the varlena header) followed by digits. When compiling
with -O2 on x86-64 LLVM now seems to use a 4-byte access. When there
are no digits (because the number is 0) that overruns the palloced
block.

init_var_from_num: NUMERIC (short) w=0 d=0 POS 6 bytes: 18 00 00 00 00 80
=================================================================
==30523==ERROR: AddressSanitizer: unknown-crash on address
0x6250002c7edc at pc 0x00000144a6d6 bp 0x7ffcfbaadeb0 sp
0x7ffcfbaadea8
READ of size 4 at 0x6250002c7edc thread T0   #0 0x144a6d5 in init_var_from_num
/home/stark/src/pg/postgresql-master/src/backend/utils/adt/numeric.c:4764:17

Shadow bytes around the buggy address:
=>0x0c4a80050fd0: 00 00 00 00 00 00 f7 f7 00 00 f7[06]00 00 f7 00
  0x000000000144a6d6 <+1078>: mov    $0x2e02680,%edi  0x000000000144a6db <+1083>: mov    %r14,0x10(%rbx)
0x000000000144a6df<+1087>: mov    %rsi,%r14
 

Now, in Postgres currently this is safe for the same reason that the
Form_Data struct accesses are safe. Numerics will always be 4-byte
aligned when allocated in memory. They won't be aligned on disk if
they have a short varlena header but we don't use the GETARG_PP macros
in numeric.c so those always get reallocated in the GETARG macro.

This does represent a hazard though in case we ever do add support for
unaligned packed varlena support to numeric.c. In fact it's extremely
tempting to do so since we never do much work on the varlena form
anyways, we just convert it to NumericVar anyways. It's awfully
annoying that we have to palloc and memcpy to this other intermediate
form just to do it again to the usable internal form. If we did that
we would have to deal with unaligned accesses anyways though so we
would probably have to change this code to just use a char* buffer and
skip the union and struct anyways. Doing so while maintaining binary
on-disk compatibility could be tricky.


-- 
greg

Re: LLVM miscompiles numeric.c access to short numeric var headers

From

Tom Lane

Date:

12 November 2015, 17:39:10

Greg Stark <stark@mit.edu> writes:
> In numeric.c we have the short numeric headers that have one uint16
> (in addition to the varlena header) followed by digits. When compiling
> with -O2 on x86-64 LLVM now seems to use a 4-byte access.

Either that's a reportable compiler bug, or someplace nearby we've
casted the pointer to something that would require a 4-byte struct.
I'm not sure which code you're looking at exactly, but maybe we're
using "union NumericChoice" prematurely?
        regards, tom lane

Re: LLVM miscompiles numeric.c access to short numeric var headers

From

Greg Stark

Date:

12 November 2015, 18:30:16

On Thu, Nov 12, 2015 at 2:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Either that's a reportable compiler bug, or someplace nearby we've
> casted the pointer to something that would require a 4-byte struct.
> I'm not sure which code you're looking at exactly, but maybe we're
> using "union NumericChoice" prematurely?

I don't see how these structs could ever be used without casting to
NumericChoice to get the scale and weight even when there are no
digits. I think it wouldn't be too hard to just give up on the structs
and unions and use a char * as the underlying type. We could access
the meta information directly using byte accesses and memcpy the
digits to an aligned array of digits when setting up the var. I think
the code would be simpler in the end and make it easy to support
packed varlenas.

It triggers on the line with the NUMERIC_WEIGHT() macro call in
init_var_from_num():

#define NUMERIC_WEIGHT(n) (NUMERIC_HEADER_IS_SHORT((n)) ? \   (((n)->choice.n_short.n_header &
NUMERIC_SHORT_WEIGHT_SIGN_MASK? \     ~NUMERIC_SHORT_WEIGHT_MASK : 0) \    | ((n)->choice.n_short.n_header &
NUMERIC_SHORT_WEIGHT_MASK))\   : ((n)->choice.n_long.n_weight))

static void
init_var_from_num(Numeric num, NumericVar *dest)
{ dump_numeric("init_var_from_num", num);
 dest->ndigits = NUMERIC_NDIGITS(num); dest->weight = NUMERIC_WEIGHT(num); dest->sign = NUMERIC_SIGN(num); dest->dscale
=NUMERIC_DSCALE(num); dest->digits = NUMERIC_DIGITS(num); dest->buf = NULL; /* digits array is not palloc'd */

}

I think the

-- 
greg

Re: LLVM miscompiles numeric.c access to short numeric var headers

From

Tom Lane

Date:

12 November 2015, 18:51:30

Greg Stark <stark@mit.edu> writes:
> On Thu, Nov 12, 2015 at 2:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Either that's a reportable compiler bug, or someplace nearby we've
>> casted the pointer to something that would require a 4-byte struct.
>> I'm not sure which code you're looking at exactly, but maybe we're
>> using "union NumericChoice" prematurely?

> It triggers on the line with the NUMERIC_WEIGHT() macro call in
> init_var_from_num():

Hmm.  I'd argue that that is a compiler bug, but I dunno if the LLVM
guys would see it that way.  The existence of other union members in
the declaration shouldn't license them to assume that the particular
instance we're accessing is large enough for any possible member.

> I think it wouldn't be too hard to just give up on the structs
> and unions and use a char * as the underlying type. We could access
> the meta information directly using byte accesses and memcpy the
> digits to an aligned array of digits when setting up the var.

Meh.  The palloc to create an aligned array of digits would eat up
any possible performance win --- it'd be just about as expensive
as the existing unpack operation.

I think we could fix the immediate issue by redeclaring numeric
headers as arrays of (u)int16 rather than structs.  I'm not
very excited about the packed-header case.
        regards, tom lane

Re: LLVM miscompiles numeric.c access to short numeric var headers

From

Greg Stark

Date:

12 November 2015, 19:07:24

On Thu, Nov 12, 2015 at 3:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Meh.  The palloc to create an aligned array of digits would eat up
> any possible performance win --- it'd be just about as expensive
> as the existing unpack operation.

I suppose we would only need to palloc the digits if we found they
were unaligned which would only happen if the varlena header was
packed. So if the varlena header wasn't packed (or if we were just
lucky with the packed alignment which would happen half the time) we
could share the bytes from the packed varlena in the buffer directly
in the var.

> I think we could fix the immediate issue by redeclaring numeric
> headers as arrays of (u)int16 rather than structs.  I'm not
> very excited about the packed-header case.

That would require giving up the pretense that the code supports base
10 and base 100 I suppose. And would still be doing a palloc/memcpy
for data smaller than 128 bytes.

-- 
greg

Re: LLVM miscompiles numeric.c access to short numeric var headers

From

Tom Lane

Date:

12 November 2015, 19:43:21

Greg Stark <stark@mit.edu> writes:
> On Thu, Nov 12, 2015 at 3:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think we could fix the immediate issue by redeclaring numeric
>> headers as arrays of (u)int16 rather than structs.  I'm not
>> very excited about the packed-header case.

> That would require giving up the pretense that the code supports base
> 10 and base 100 I suppose.

No, not really.  If we redefine NumericVar as a uint16 array,
then we'd have n_header or n_sign_dscale as array[0],
n_weight as (int16) array[1], and n_data as (NumericDigit *) &array[1]
or (NumericDigit *) &array[2] depending.  Doesn't matter which
way NumericDigit is declared.
        regards, tom lane

Re: LLVM miscompiles numeric.c access to short numeric var headers

From

Greg Stark

Date:

18 November 2015, 22:04:10

Fwiw it looks like the LLVM folk think this is an asan bug.

https://llvm.org/bugs/show_bug.cgi?id=25550

It looks like the fix is that the compiler should avoid this
optimization if the code is being compiled with instrumentation. This
worries me a bit but I think our code is safe as the Datum will always
be either palloced which will be fully aligned and therefore can't
overrun a page or on a stack in which case the whole struct will be
allocated regardless of how many digits we need.