Re: BUG #16971: Incompatible datalayout errors with llvmjit - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #16971: Incompatible datalayout errors with llvmjit
Msg-id 20210420225228.qr4x6zv3hqjorh5t@alap3.anarazel.de
In response to BUG #16971: Incompatible datalayout errors with llvmjit  (PG Bug reporting form <noreply@postgresql.org>)
List pgsql-bugs
Hi,

On 2021-04-20 14:42:28 -0700, Tom Stellard wrote:
> On 4/20/21 12:29 PM, Andres Freund wrote:
> > On 2021-04-19 18:29:52 +0000, PG Bug reporting form wrote:
> > > In our Fedora builds, we are getting errors[1] in the postgresql tests due
> > > to incompatible datalayouts between the JIT engine and the LLVM modules
> > > being compiled.  The problem is that the JIT engine is being created with
> > > host specific CPU and features, while the datalayout for the compiled module
> > > is being taken from llvmjit_types.bc which is compiled without any specified
> > > CPU type or features.
> >
> > It's very odd that features would change the data layout - analogizing
> > with plain C code that'd mean that you cannot link a binary compiled
> > with something like -mavx2 against a library compiled without. To me
> > this smells like a bug somewhere lower level.

> You are correct that is odd, and to be honest, I didn't think that LLVM
> targets were allowed to change the datalayout based on the CPU type.

That was my impression...


> > Reformatting the error yields:
> > ERROR:  failed to JIT module: Added modules have incompatible data layouts:
> > E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-        a:8:16-n32:64 (module) vs
> > E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64 (jit)
> >
> > The -v128:64 is about how to align vectors. Skimming the relevant LLVM
> > code I don't see why it'd be included in JIted code but not native code.

> The relevant code in LLVM is here:
> https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp#L88

Thanks for the pointer!


> I'm checking with upstream LLVM to see if this is allowed or not.  However,
> this behavior is present in at least LLVM 11 and LLVM 12 (I haven't
> checked earlier versions), so postgresql will have to deal with this
> somehow.

Yea, seems we need to add a workaround for the issue, given how much
longer LLVM releases tend to be used than they are maintained. One
simple hack would be to add "-vector" to the list of features on s390x,
which afaict should avoid the issue for now?

In LLVM's main branch the code is this:

// Determine whether we use the vector ABI.
static bool UsesVectorABI(StringRef CPU, StringRef FS) {
  // We use the vector ABI whenever the vector facility is available.
  // This is the case by default if CPU is z13 or later, and can be
  // overridden via "[+-]vector" feature string elements.
  bool VectorABI = true;
  bool SoftFloat = false;
  if (CPU.empty() || CPU == "generic" ||
      CPU == "z10" || CPU == "z196" || CPU == "zEC12" ||
      CPU == "arch8" || CPU == "arch9" || CPU == "arch10")
    VectorABI = false;

  SmallVector<StringRef, 3> Features;
  FS.split(Features, ',', -1, false /* KeepEmpty */);
  for (auto &Feature : Features) {
    if (Feature == "vector" || Feature == "+vector")
      VectorABI = true;
    if (Feature == "-vector")
      VectorABI = false;
    if (Feature == "soft-float" || Feature == "+soft-float")
      SoftFloat = true;
    if (Feature == "-soft-float")
      SoftFloat = false;
  }

  return VectorABI && !SoftFloat;
}

So appending -vector should be sufficient?
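
To make the "appending" part concrete, here is a minimal sketch that mirrors the logic above using only the standard library instead of LLVM's StringRef and SmallVector. The names uses_vector_abi and demo_append_vector_override are hypothetical and not part of LLVM or PostgreSQL. It illustrates that feature entries are applied left to right, so a trailing "-vector" wins over both the CPU default and any earlier "+vector".

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for LLVM's UsesVectorABI(), standard library only.
// Features are comma-separated and applied in order, so later entries
// override earlier ones.
bool uses_vector_abi(std::string_view cpu, std::string_view features) {
    // Vector ABI is on by default unless the CPU predates z13.
    bool vector_abi = !(cpu.empty() || cpu == "generic" || cpu == "z10" ||
                        cpu == "z196" || cpu == "zEC12" || cpu == "arch8" ||
                        cpu == "arch9" || cpu == "arch10");
    bool soft_float = false;

    while (!features.empty()) {
        // Peel off the next comma-separated entry.
        const auto comma = features.find(',');
        const std::string_view feature = features.substr(0, comma);
        features = (comma == std::string_view::npos)
                       ? std::string_view{}
                       : features.substr(comma + 1);

        if (feature == "vector" || feature == "+vector")
            vector_abi = true;
        else if (feature == "-vector")
            vector_abi = false;
        else if (feature == "soft-float" || feature == "+soft-float")
            soft_float = true;
        else if (feature == "-soft-float")
            soft_float = false;
    }
    return vector_abi && !soft_float;
}

// Usage: a trailing "-vector" disables the vector ABI even on z13.
void demo_append_vector_override() {
    assert(uses_vector_abi("z13", ""));
    assert(!uses_vector_abi("z13", "+vector,-vector"));
}
```

This only models the ABI decision; whether the resulting data layout then matches the module's is something the real target machine would have to confirm.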


But we'd have to do so only after checking that there's a data layout
mismatch, because otherwise we'd just create a new problem if somebody
compiles with -march=native or such.
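
A rough sketch of that conditional workaround, assuming the two data layout strings and the host feature string have already been obtained by the caller. The helper name jit_features_with_workaround is hypothetical, nothing here calls into LLVM, and the s390x-only check is left out for brevity.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the feature string to use when creating the
// JIT target machine. "-vector" is appended only if the module's data layout
// and the JIT's data layout disagree; otherwise the features are unchanged.
std::string jit_features_with_workaround(const std::string &module_layout,
                                         const std::string &jit_layout,
                                         const std::string &host_features) {
    if (module_layout == jit_layout)
        return host_features;       // no mismatch, leave features alone
    if (host_features.empty())
        return "-vector";           // avoid a leading comma
    return host_features + ",-vector";
}

// Usage with the two layouts from the error message above.
void demo_layout_workaround() {
    const std::string module_layout =
        "E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64";
    const std::string jit_layout =
        "E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64";

    assert(jit_features_with_workaround(module_layout, jit_layout, "+vector") ==
           "+vector,-vector");
    assert(jit_features_with_workaround(module_layout, module_layout, "+vector") ==
           "+vector");
}
```

In real code the target machine would presumably have to be re-created with the adjusted features and the layouts compared again, since the mismatch could have some other cause.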


Greetings,

Andres Freund


