Re: JIT compiling with LLVM v9.0 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: JIT compiling with LLVM v9.0
Date
Msg-id CA+Tgmobx7UkmsXFC-mb7Az7_50aVdQC9a=fpi20CrJzmZE1RxA@mail.gmail.com
In response to JIT compiling with LLVM v9.0  (Andres Freund <andres@anarazel.de>)
Responses Re: JIT compiling with LLVM v9.0
List pgsql-hackers
On Wed, Jan 24, 2018 at 2:20 AM, Andres Freund <andres@anarazel.de> wrote:
> == Error handling ==
>
> There are two aspects to error handling.
>
> Firstly, generated (LLVM IR) and emitted functions (mmap()ed segments)
> need to be cleaned up both after a successful query execution and after
> an error.  I've settled on a fairly boring resowner based mechanism. On
> errors all expressions owned by a resowner are released; upon success
> expressions are reassigned to the parent / released on commit (unless
> executor shutdown has cleaned them up of course).

Cool.

> A second, less pretty and newly developed, aspect of error handling is
> OOM handling inside LLVM itself. The above resowner based mechanism
> takes care of cleaning up emitted code upon ERROR, but there's also the
> chance that LLVM itself runs out of memory. LLVM by default does *not*
> use any C++ exceptions. Its allocations are primarily funneled through
> the standard "new" handlers, and some direct use of malloc() and
> mmap(). For the former a 'new handler' exists
> http://en.cppreference.com/w/cpp/memory/new/set_new_handler; for the
> latter LLVM provides callbacks that get called upon failure
> (unfortunately mmap() failures are treated as fatal rather than OOM
> errors).
> What I've chosen to do, and I'd be interested to get some input about
> that, is to have two functions that LLVM using code must use:
>   extern void llvm_enter_fatal_on_oom(void);
>   extern void llvm_leave_fatal_on_oom(void);
> Before interacting with LLVM code (i.e. emitting IR, or using the above
> functions) llvm_enter_fatal_on_oom() needs to be called.
>
> When a libstdc++ new or LLVM error occurs, the handlers set up by the
> above functions trigger a FATAL error. We have to use FATAL rather than
> ERROR, as we *cannot* reliably throw ERROR inside a foreign library
> without risking corrupting its internal state.
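
For illustration, here is a minimal sketch of how an enter/leave pair like the one described above could be structured. It is plain C and only models the nesting and the "handler is installed while inside" idea. All sketch_* names are hypothetical; the handler just prints and exits where the real thing would raise FATAL, and the actual patch would have to hook the C++ new handler and LLVM's own error callbacks instead.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical handler type; stands in for a C++ new handler / LLVM callback. */
typedef void (*sketch_oom_handler)(void);

static int sketch_fatal_depth = 0;                 /* enter/leave nesting depth */
static sketch_oom_handler sketch_handler = NULL;   /* installed while depth > 0 */

/* Stand-in for raising FATAL: report and terminate the process. */
static void sketch_fatal_on_oom(void)
{
    fprintf(stderr, "FATAL: out of memory inside JIT library\n");
    exit(1);
}

/* Install the handler on the outermost enter only. */
void sketch_enter_fatal_on_oom(void)
{
    if (sketch_fatal_depth++ == 0)
        sketch_handler = sketch_fatal_on_oom;
}

/* Remove the handler once the outermost section is left. */
void sketch_leave_fatal_on_oom(void)
{
    if (--sketch_fatal_depth == 0)
        sketch_handler = NULL;
}

/* Returns 1 while allocation failures would be escalated to FATAL. */
int sketch_fatal_on_oom_active(void)
{
    return sketch_handler != NULL;
}

/* Allocation wrapper: on failure, call the handler if one is installed. */
void *sketch_alloc(size_t size)
{
    void *p = malloc(size);

    if (p == NULL && sketch_handler != NULL)
        sketch_handler();       /* does not return */
    return p;                   /* may be NULL outside a protected section */
}
```

Callers would bracket every stretch of library interaction with sketch_enter_fatal_on_oom() / sketch_leave_fatal_on_oom(); the depth counter lets such sections nest without the inner leave removing the handler too early.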

That bites, although it's probably tolerable if we expect such errors
only in exceptional situations such as a needed shared library failing
to load or something. Killing the session when we run out of memory
during JIT compilation is not very nice at all.  Does the LLVM library
have any useful hooks that we can leverage here, like a hypothetical
function LLVMProvokeFailureAsSoonAsConvenient()?  The equivalent
function for PostgreSQL would do { InterruptPending = true;
QueryCancelPending = true; }.  And maybe LLVMSetProgressCallback()
that would get called periodically and let us set a handler that could
check for interrupts on the PostgreSQL side and then call
LLVMProvokeFailureAsSoonAsConvenient() as applicable?  This problem
can't be completely unique to PostgreSQL; anybody who is using LLVM
for JIT from a long-running process needs a solution, so you might
think that the library would provide one.
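
To make the idea concrete, here is a rough sketch of the control flow I have in mind. Nothing here is real LLVM API: the lib_* functions are hypothetical stand-ins for LLVMSetProgressCallback() and LLVMProvokeFailureAsSoonAsConvenient(), and the two flags are local stand-ins for the PostgreSQL variables of the same name.

```c
#include <signal.h>
#include <stddef.h>

/* Local stand-ins for PostgreSQL's interrupt flags. */
static volatile sig_atomic_t InterruptPending = 0;
static volatile sig_atomic_t QueryCancelPending = 0;

/* ---- hypothetical library side (not LLVM's actual API) ---- */
typedef void (*lib_progress_cb)(void *arg);

static lib_progress_cb lib_cb = NULL;
static void *lib_cb_arg = NULL;
static int lib_failure_requested = 0;

/* Register a callback the library invokes periodically while working. */
void lib_set_progress_callback(lib_progress_cb cb, void *arg)
{
    lib_cb = cb;
    lib_cb_arg = arg;
}

/* Ask the library to stop at the next point where it can do so safely. */
void lib_provoke_failure(void)
{
    lib_failure_requested = 1;
}

/*
 * Pretend compilation of `steps` units of work.  Returns the number of
 * units completed; fewer than `steps` means the caller asked it to stop.
 */
int lib_compile(int steps)
{
    int i;

    lib_failure_requested = 0;
    for (i = 0; i < steps; i++)
    {
        if (lib_cb != NULL)
            lib_cb(lib_cb_arg);
        if (lib_failure_requested)
            return i;           /* library unwinds itself, state stays sane */
        /* ... one unit of real work would go here ... */
    }
    return steps;
}

/* ---- host side ---- */

/* Progress callback: translate a pending cancel into a library-level stop. */
void host_check_interrupts(void *arg)
{
    (void) arg;
    if (InterruptPending && QueryCancelPending)
        lib_provoke_failure();
}
```

The point of the design is that the library, not the caller, picks the moment to bail out, so the caller can report an ordinary ERROR after lib_compile() returns early instead of jumping out from underneath foreign code.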

> This facility allows us to get the bitcode for all operators
> (e.g. int8eq, float8pl etc), without maintaining two copies. The way
> I've currently set it up is that, if --with-llvm is passed to configure,
> all backend files are also compiled to bitcode files.  These bitcode
> files get installed into the server's
>   $pkglibdir/bitcode/postgres/
> under their original subfolder, eg.
>   ~/build/postgres/dev-assert/install/lib/bitcode/postgres/utils/adt/float.bc
> Using existing LLVM functionality (for parallel LTO compilation),
> an index over these is additionally stored to
>   $pkglibdir/bitcode/postgres.index.bc

That sounds pretty sweet.

> When deciding to JIT for the first time, $pkglibdir/bitcode/ is scanned
> for all .index.bc files and a *combined* index over all these files is
> built in memory.  The reason for doing so is that it allows "easy"
> access to inlining for extensions - they can install code into
>   $pkglibdir/bitcode/[extension]/
> accompanied by
>   $pkglibdir/bitcode/[extension].index.bc
> just alongside the actual library.

But that means that if an extension is installed after the initial
scan has been done, concurrent sessions won't notice the new files.
Maybe that's OK, but I wonder if we can do better.
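
One cheap way to "do better", sketched below under assumptions: remember the directory's modification time from the last scan and rescan only when it changes. This is an illustration, not what the patch does. The function names are made up, it relies on POSIX opendir()/stat(), and it assumes the file system updates a directory's mtime when entries are added or removed (with one-second granularity, a change made in the same second as the previous scan could be missed).

```c
#include <dirent.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

#define INDEX_SUFFIX ".index.bc"

/* Returns 1 if name looks like "<something>.index.bc", else 0. */
int is_index_file(const char *name)
{
    size_t nlen = strlen(name);
    size_t slen = strlen(INDEX_SUFFIX);

    /* require a non-empty stem in front of the suffix */
    return nlen > slen && strcmp(name + nlen - slen, INDEX_SUFFIX) == 0;
}

/* Counts index files directly inside dirpath; returns -1 if it can't be opened. */
int count_index_files(const char *dirpath)
{
    DIR *dir = opendir(dirpath);
    struct dirent *de;
    int n = 0;

    if (dir == NULL)
        return -1;
    while ((de = readdir(dir)) != NULL)
    {
        if (is_index_file(de->d_name))
            n++;
    }
    closedir(dir);
    return n;
}

/*
 * Returns 1 if the directory's mtime differs from *last_mtime (and updates
 * it), 0 if unchanged, -1 if the directory cannot be stat()ed.
 */
int index_dir_changed(const char *dirpath, time_t *last_mtime)
{
    struct stat st;

    if (stat(dirpath, &st) != 0)
        return -1;
    if (st.st_mtime == *last_mtime)
        return 0;
    *last_mtime = st.st_mtime;
    return 1;
}
```

A session would call index_dir_changed() on the bitcode directory before deciding to JIT and rebuild its combined index only when it returns 1, which costs one stat() per check rather than a full directory scan.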

> Do people feel these should be hidden behind #ifdefs, always present but
> prevented from being set to a meaningful value, or unrestricted?

We shouldn't allow non-superusers to set any GUC that dumps files to
the data directory or provides an easy way to crash the server, run
the machine out of memory, or similar.  GUCs that just print stuff, or
make queries faster/slower, can be set by anyone, I think.  I favor
having the debugging stuff available in the default build.  This
feature has a chance of containing bugs, and those bugs will be hard
to troubleshoot if the first step in getting information on what went
wrong is "recompile".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

