Re: pg11.1 jit segv - Mailing list pgsql-hackers

From Andres Freund
Subject Re: pg11.1 jit segv
Date
Msg-id 20181127030035.n6avagjgmolbrlw7@alap3.anarazel.de
In response to Re: pg11.1 jit segv  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: pg11.1 jit segv  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On 2018-11-17 17:37:15 -0600, Justin Pryzby wrote:
> On Fri, Nov 16, 2018 at 10:24:46AM -0600, Justin Pryzby wrote:
> > On Fri, Nov 16, 2018 at 08:38:26AM -0600, Justin Pryzby wrote:
> > > The table is not too special, but was probably ALTERed to add columns a good
> > > number of times by one of our processes.  It has ~1100 columns, including
> > > arrays, and some with null_frac=1.  I'm trying to come up with a test case
> > > involving column types and order.
> 
> Try this ?
> 
> SELECT 'DROP TABLE t; CREATE TABLE t (a3 text, a1 int[], '||array_to_string(array_agg('c'||i||' bigint default 0'),',')||');INSERT INTO t VALUES(0)' FROM generate_series(1,999) i;
> \gexec
> SET jit=on; SET jit_above_cost=0; SELECT a3 FROM t LIMIT 9;
> 
> That's given all sorts of nice errors:
> 
> ERROR:  invalid memory alloc request size 18446744073709551613
> ERROR:  compressed data is corrupted
> 
> And occasionally crashes and/or returns unrelated data:
> 
>  = '0', $21 = '0', $22 = '0', $23 = '0', $24 = '0', $25 = '2741'\x03
>  n 21782 :constvalue 4 [ 0 0 0 0 0 0 0 0 ]}) :location 

Ah, hah. The issue is that t_hoff is larger than 127 here (due to the
size of the NULL bitmap), and apparently getelementptr interprets an
i8 index as a signed integer. A value of 128 or more thus yields a
negative offset from the start of the tuple, which predictably doesn't
work great.

    v_hoff =
        l_load_struct_gep(b, v_tuplep,
                          FIELDNO_HEAPTUPLEHEADERDATA_HOFF,
                          "t_hoff");
    v_tupdata_base =
        LLVMBuildGEP(b,
                     LLVMBuildBitCast(b,
                                      v_tuplep,
                                      l_ptr(LLVMInt8Type()),
                                      ""),
                     &v_hoff, 1,
                     "v_tupdata_base");

I'd missed the "These integers are treated as signed values where
relevant." bit in the getelementptr docs
http://llvm.org/docs/LangRef.html#getelementptr-instruction

The fix is easy enough, just adding a
    v_hoff = LLVMBuildZExt(b, v_hoff, LLVMInt32Type(), "");
fixes the issue for me.

Could you check that the attached patch also fixes your original
issue? I'm going through the code to see if there are other
occurrences of this.

Greetings,

Andres Freund
