Thread: Do we work with LLVM 12 on s390x?

Do we work with LLVM 12 on s390x?

From: Tom Lane

The Red Hat folk are seeing a problem with that combination:

https://bugzilla.redhat.com/show_bug.cgi?id=1940964

which boils down to

> Build fails with this error:
> ERROR:  failed to JIT module: Added modules have incompatible data layouts:
> E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64(module) vs
> E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64(jit)

(By "build", I imagine the reporter means "regression tests")

So I was wondering if we'd tested it yet.

            regards, tom lane



Re: Do we work with LLVM 12 on s390x?

From: Andres Freund

Hi,

On 2021-03-19 14:03:21 -0400, Tom Lane wrote:
> The Red Hat folk are seeing a problem with that combination:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1940964
> 
> which boils down to
> 
> > Build fails with this error:
> > ERROR:  failed to JIT module: Added modules have incompatible data layouts:
> > E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64(module) vs
> > E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64(jit)
> 
> (By "build", I imagine the reporter means "regression tests")
> 
> So I was wondering if we'd tested it yet.

Yes, I did test it not too long ago, after Christoph Berg reported
Debian s390x failing with JIT. That made me learn a bunch of s390x
assembler and discover a bug in our code that only rarely happened (IIRC
something about booleans that are not exactly 0 or 1 not testing as
true)...

https://www.postgresql.org/message-id/20201015222924.yyms42qjloydfvar%40alap3.anarazel.de

I think the error above comes from a "mismatch" between the clang used
to compile bitcode, and the LLVM version linked to. Normally we're
somewhat tolerant of differences between the two, but there was an ABI
change at some point, leading to that error.  IIRC I hit that, but it
vanished as soon as I used a matching libllvm and clang.
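The two layout strings in the reported error differ in a single component: the JIT's layout carries `v128:64` (128-bit vectors with 64-bit ABI alignment) and the module's does not. A minimal C sketch that splits each string on `-` and prints the components present in one but not the other can make such mismatches easy to spot. The helper names are hypothetical, and nothing here calls LLVM; it is plain string handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Longest single layout component this sketch copies, e.g. "f128:64". */
#define MAX_COMPONENT 64

/* True if the '-'-separated `layout` contains exactly the component `comp`. */
static bool
layout_has_component(const char *layout, const char *comp)
{
    size_t      len = strlen(comp);
    const char *p = layout;

    assert(layout != NULL && comp != NULL);
    while (*p)
    {
        const char *end = strchr(p, '-');
        size_t      clen = end ? (size_t) (end - p) : strlen(p);

        if (clen == len && strncmp(p, comp, len) == 0)
            return true;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return false;
}

/*
 * Print every component of layout `a` that does not appear in layout `b`,
 * prefixed with `label`.  Components longer than MAX_COMPONENT - 1 are
 * skipped.  Returns the number of components printed.
 */
static int
print_missing_components(const char *a, const char *b, const char *label)
{
    char        comp[MAX_COMPONENT];
    const char *p = a;
    int         missing = 0;

    while (*p)
    {
        const char *end = strchr(p, '-');
        size_t      clen = end ? (size_t) (end - p) : strlen(p);

        if (clen < sizeof(comp))
        {
            memcpy(comp, p, clen);
            comp[clen] = '\0';
            if (!layout_has_component(b, comp))
            {
                printf("%s: %s\n", label, comp);
                missing++;
            }
        }
        if (end == NULL)
            break;
        p = end + 1;
    }
    return missing;
}
```

Called as `print_missing_components(jit, module, "only in jit")` with the two strings from the error, it prints the single line `only in jit: v128:64`; the reverse call prints nothing.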

Greetings,

Andres Freund



Re: Do we work with LLVM 12 on s390x?

From: Tom Lane

Andres Freund <andres@anarazel.de> writes:
> I think the error above comes from a "mismatch" between the clang used
> to compile bitcode, and the LLVM version linked to. Normally we're
> somewhat tolerant of differences between the two, but there was an ABI
> change at some point, leading to that error.  IIRC I hit that, but it
> vanished as soon as I used a matching libllvm and clang.

Thanks, I passed that advice on.

            regards, tom lane



Re: Do we work with LLVM 12 on s390x?

From: Honza Horak

On 3/19/21 8:15 PM, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> I think the error above comes from a "mismatch" between the clang used
>> to compile bitcode, and the LLVM version linked to. Normally we're
>> somewhat tolerant of differences between the two, but there was an ABI
>> change at some point, leading to that error.  IIRC I hit that, but it
>> vanished as soon as I used a matching libllvm and clang.
> 
> Thanks, I passed that advice on.
> 
>             regards, tom lane

Tom Stellard was kind enough to look into this issue more deeply with his
LLVM skills, and found that PostgreSQL is not actually handling LLVM
perfectly. He's working on improving the patch, but sharing even the first
attempt with upstream seems like a good idea:

https://src.fedoraproject.org/rpms/postgresql/pull-request/29

Regards,
Honza




Re: Do we work with LLVM 12 on s390x?

From: Tom Stellard

On 4/21/21 6:40 AM, Honza Horak wrote:
> On 3/19/21 8:15 PM, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> I think the error above comes from a "mismatch" between the clang used
>>> to compile bitcode, and the LLVM version linked to. Normally we're
>>> somewhat tolerant of differences between the two, but there was an ABI
>>> change at some point, leading to that error.  IIRC I hit that, but it
>>> vanished as soon as I used a matching libllvm and clang.
>>
>> Thanks, I passed that advice on.
>>
>>             regards, tom lane
> 
> Tom Stellard was kind enough to look into this issue more deeply with his
> LLVM skills, and found that PostgreSQL is not actually handling LLVM
> perfectly. He's working on improving the patch, but sharing even the first
> attempt with upstream seems like a good idea:
> 
> https://src.fedoraproject.org/rpms/postgresql/pull-request/29
> 

I wrote a new patch based on the bug discussion[1].  It works around
the issue specifically on s390x rather than disabling specific
CPUs and features for all targets.  The patch is attached.
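The attached patch (quoted in full further down the thread) decides whether the workaround is needed by asking LLVM, via `LLVMABIAlignmentOfType()`, whether a `<4 x i32>` vector gets 16-byte ABI alignment under the types module's layout. The sketch below reproduces that decision on plain layout strings. It assumes LLVM's default 128-bit ABI alignment for `v128` when the layout has no `v128:` component, which is consistent with the patch treating 16 bytes as the pre-z13 case. `vector128_abi_align_bytes()` and `needs_systemz_workaround_sketch()` are hypothetical stand-ins, not PostgreSQL or LLVM functions, and only the `v128:<abi>` form is parsed; this is not a general DataLayout parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * ABI alignment, in bytes, of a 128-bit vector under `layout`.  A component
 * such as "v128:64" overrides the (assumed) LLVM default of 128 bits.
 */
static unsigned
vector128_abi_align_bytes(const char *layout)
{
    const char *p = layout;

    assert(layout != NULL);
    while (*p)
    {
        const char *end = strchr(p, '-');

        if (strncmp(p, "v128:", 5) == 0)
        {
            /* strtoul stops at the next ':' or '-' */
            unsigned long bits = strtoul(p + 5, NULL, 10);

            return (unsigned) (bits / 8);
        }
        if (end == NULL)
            break;
        p = end + 1;
    }
    return 128 / 8;             /* assumed default when v128 is unspecified */
}

/*
 * Mirror of the patch's decision: only on systemz, and only when the types
 * module's layout gives 128-bit vectors 16-byte alignment (pre-z13 layout).
 */
static bool
needs_systemz_workaround_sketch(const char *target_name,
                                const char *module_layout)
{
    if (strncmp(target_name, "systemz", strlen("systemz")) != 0)
        return false;
    return vector128_abi_align_bytes(module_layout) == 16;
}
```

With the module layout from the original error it returns 16 bytes, so the check fires for `systemz`; with the JIT's layout (`v128:64`) it returns 8, and the check does not fire.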


[1] https://www.postgresql.org/message-id/flat/16971-5d004d34742a3d35%40postgresql.org


> Regards,
> Honza
> 


Attachment

Re: Do we work with LLVM 12 on s390x?

From: Honza Horak

On 4/22/21 6:35 PM, Tom Stellard wrote:
> On 4/21/21 6:40 AM, Honza Horak wrote:
>> On 3/19/21 8:15 PM, Tom Lane wrote:
>>> Andres Freund <andres@anarazel.de> writes:
>>>> I think the error above comes from a "mismatch" between the clang used
>>>> to compile bitcode, and the LLVM version linked to. Normally we're
>>>> somewhat tolerant of differences between the two, but there was an ABI
>>>> change at some point, leading to that error.  IIRC I hit that, but it
>>>> vanished as soon as I used a matching libllvm and clang.
>>>
>>> Thanks, I passed that advice on.
>>>
>>>             regards, tom lane
>>
>> Tom Stellard was kind enough to look into this issue more deeply with his
>> LLVM skills, and found that PostgreSQL is not actually handling LLVM
>> perfectly. He's working on improving the patch, but sharing even the first
>> attempt with upstream seems like a good idea:
>>
>> https://src.fedoraproject.org/rpms/postgresql/pull-request/29
>>
> 
> I wrote a new patch based on the bug discussion[1].  It works around
> the issue specifically on s390x rather than disabling specific
> CPUs and features for all targets.  The patch is attached.
> 
> 
> [1] https://www.postgresql.org/message-id/flat/16971-5d004d34742a3d35%40postgresql.org

Thanks, Tom. It looks good in the Koji build, so I'm merging it for now.
We very much appreciate your help here.

Cheers,
Honza

> 
>> Regards,
>> Honza
>>
> 




Re: Do we work with LLVM 12 on s390x?

From: Tom Stellard

On 4/22/21 3:25 PM, Andres Freund wrote:
> Hi,
> 
> On 2021-04-22 09:35:48 -0700, Tom Stellard wrote:
>> On 4/21/21 6:40 AM, Honza Horak wrote:
>> I wrote a new patch based on the bug discussion[1].  It works around
>> the issue specifically on s390x rather than disabling specific
>> CPUs and features for all targets.  The patch is attached.
> 
> Cool, this is a pretty clear improvement. There are a few minor things I'd
> change to fit it into PG - do you mind if I send that to the thread at
> [1] for you to test before I push it?
> 

Sure, no problem.
> 
>> +/*
>> + * For the systemz target, LLVM uses a different datalayout for z13 and newer
>> + * CPUs than it does for older CPUs.  This can cause a mismatch in datalayouts
>> + * in the case where the llvm_types_module is compiled with a pre-z13 CPU
>> + * and the JIT is running on z13 or newer.
>> + * See computeDataLayout() function in
>> + * llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp for information on the
>> + * datalayout differences.
>> + */
>> +static bool
>> +needs_systemz_workaround(void)
>> +{
>> +    bool ret = false;
>> +    LLVMContextRef llvm_context;
>> +    LLVMTypeRef vec_type;
>> +    LLVMTargetDataRef llvm_layoutref;
>> +    if (strncmp(LLVMGetTargetName(llvm_targetref), "systemz", strlen("systemz")))
>> +    {
>> +        return false;
>> +    }
>> +
>> +    llvm_context = LLVMGetModuleContext(llvm_types_module);
>> +    vec_type = LLVMVectorType(LLVMIntTypeInContext(llvm_context, 32), 4);
>> +    llvm_layoutref = LLVMCreateTargetData(llvm_layout);
>> +    ret = (LLVMABIAlignmentOfType(llvm_layoutref, vec_type) == 16);
>> +    LLVMDisposeTargetData(llvm_layoutref);
>> +    return ret;
>> +}
> 
> I wonder if it'd be better to compare LLVMCopyStringRepOfTargetData() of
> the llvm_types_module with the one of the JIT target machine, and only
> specify -vector in that case? We currently support older LLVM versions
> than the one that introduced the vector specific handling for systemz,
> and I don't know what'd happen if we unnecessarily specified -vector.
> 

The problem is that you have to pass the features to LLVMCreateTargetMachine
in order to know what the data layout of the JIT target is going to be,
so the only way to make this work would be to create the TargetMachine
with the default features, check its data layout, and then re-create the
TargetMachine in order to apply the workaround.  Maybe that's not so bad?
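The create/check/re-create approach described above might look roughly like this. To keep it self-contained, `stub_target_machine_layout()` is a hypothetical stub playing the role of `LLVMCreateTargetMachine()` followed by `LLVMCreateTargetDataLayout()` and `LLVMCopyStringRepOfTargetData()` on a z13-class host: it reports the z13 layout unless the feature string disables `vector`. `choose_jit_features()` is likewise a hypothetical name, and the two layout strings are the ones from the original error report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define PRE_Z13_LAYOUT "E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64"
#define Z13_LAYOUT     "E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64"

/*
 * HYPOTHETICAL stub for "create a target machine with these features and
 * return its data layout string" on a z13-class host.
 */
static const char *
stub_target_machine_layout(const char *features)
{
    return strstr(features, "-vector") ? PRE_Z13_LAYOUT : Z13_LAYOUT;
}

/*
 * Pass 1: default features; keep them if the layouts already agree.
 * Pass 2: append "-vector", "re-create" the machine and check again.
 * The chosen feature string is written to `out`; returns true on a match.
 */
static bool
choose_jit_features(const char *module_layout, const char *base_features,
                    char *out, size_t outlen)
{
    int         n;

    assert(out != NULL && outlen > 0);

    if (strcmp(stub_target_machine_layout(base_features), module_layout) == 0)
    {
        n = snprintf(out, outlen, "%s", base_features);
        return n >= 0 && (size_t) n < outlen;
    }

    /* LLVM feature strings are comma-separated, e.g. "+a,-b". */
    n = snprintf(out, outlen, "%s%s-vector", base_features,
                 base_features[0] != '\0' ? "," : "");
    if (n < 0 || (size_t) n >= outlen)
        return false;
    return strcmp(stub_target_machine_layout(out), module_layout) == 0;
}
```

Only the mismatch case pays for a second target machine, and `-vector` is never passed when the layouts already agree, which sidesteps the concern about specifying it unnecessarily on older LLVM versions. Whether every supported LLVM release accepts that feature name is not settled in this thread.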

The other question I had is whether we should #ifdef ARCH_S390x in
needs_systemz_workaround(), so we don't need to check the target
name.
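The two options can be contrasted in a small sketch. `ARCH_S390x` is the name floated above; whether such a symbol is defined in the PostgreSQL build is not established in the thread, so this version keys off `__s390x__`, which GCC and Clang predefine when targeting 64-bit s390. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Runtime check, as in the patch: does the LLVM target name start with "systemz"? */
static bool
target_is_systemz(const char *target_name)
{
    assert(target_name != NULL);
    return strncmp(target_name, "systemz", strlen("systemz")) == 0;
}

/*
 * Compile-time variant: on non-s390x builds this folds to false, so the
 * workaround can never be taken there.  On s390x builds the runtime check
 * still runs.
 */
static bool
maybe_needs_systemz_workaround(const char *target_name)
{
#if defined(__s390x__)
    return target_is_systemz(target_name);
#else
    (void) target_name;         /* unused on other architectures */
    return false;
#endif
}
```

The compile-time guard keeps the check out of non-s390x builds entirely, but it assumes the JIT only ever generates code for the architecture the server was built for; the runtime target-name check makes no such assumption.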

-Tom


> Greetings,
> 
> Andres Freund
>