Re: Lazy JIT IR code generation to increase JIT speed with partitions - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Lazy JIT IR code generation to increase JIT speed with partitions
Date
Msg-id 20220704203222.5my22fcsdqjr7ejj@awork3.anarazel.de
In response to Re: Lazy JIT IR code generation to increase JIT speed with partitions  (David Geier <geidav.pg@gmail.com>)
Responses Re: Lazy JIT IR code generation to increase JIT speed with partitions
List pgsql-hackers
Hi,

On 2022-06-27 16:55:55 +0200, David Geier wrote:
> Indeed, the total JIT time increases the more modules are used. The reason
> for this to happen is that the inlining pass loads and deserializes all to
> be inlined modules (.bc files) from disk prior to inlining them via
> llvm::IRMover. There's already a cache for such modules in the code, but it
> is currently unused. This is because llvm::IRMover takes the module to be
> inlined as std::unique_ptr<llvm::Module>. The by-value argument requires
> the source module to be moved, which means it cannot be reused afterwards.
> The code is accounting for that by erasing the module from the cache after
> inlining it, which in turn requires reloading the module next time a
> reference to it is encountered.
>
> Instead of each time loading and deserializing all to be inlined modules
> from disk, they can reside in the cache and instead be cloned via
> llvm::CloneModule() before they get inlined. Key to calling
> llvm::CloneModule() is fully deserializing the module upfront, instead of
> loading the module lazily. That is why I changed the call from
> LLVMGetBitcodeModuleInContext2() (which lazily loads the module via
> llvm::getOwningLazyBitcodeModule()) to LLVMParseBitCodeInContext2() (which
> fully loads the module via llvm::parseBitcodeFile()). Beyond that, it seems
> that prior to LLVM 13, cloning modules could fail with an assertion
> (not sure though if that would cause problems in a release build without
> assertions). Andres reported this problem a while ago here [1]. In the
> meantime the issue got discussed in [2] and finally fixed for LLVM 13, see
> [3].

Unfortunately that doesn't work right now - that's where I had started. The
problem is that IRMover renames types. In the case of cloned modules that
unfortunately means that types used in the cloned modules are also renamed in
the "origin" module. That then causes problems down the line, because parts of
the LLVM code match types by type name.

That can then have the effect of drastically decreasing code generation
quality over time, because e.g. inlining never manages to find compatible
signatures.
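
To make the failure mode concrete, here is a toy model of it. This is not LLVM
code: ToyContext, ToyModule, clone_module() and move_with_rename() are
hypothetical stand-ins, and the sketch assumes that a module and its clone
refer to the same named type objects rather than each owning a copy. Under
that assumption, a rename done while consuming the clone is visible through
the cached original, and a later lookup by the old name fails.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins, NOT LLVM classes. The context owns the named types;
// modules only borrow pointers to them.
struct ToyType {
    std::string name;
};

struct ToyContext {
    std::vector<std::unique_ptr<ToyType>> types;

    ToyType *create(const std::string &name) {
        types.emplace_back(new ToyType{name});
        return types.back().get();
    }
};

struct ToyModule {
    std::vector<ToyType *> used_types;  // borrowed from the context
};

// Cloning copies the module, but the type objects stay shared
// (assumption of this sketch).
ToyModule clone_module(const ToyModule &src) {
    return src;
}

// Hypothetical mover step: renames the types of the module it consumes.
void move_with_rename(ToyModule &consumed, const std::string &suffix) {
    for (ToyType *t : consumed.used_types)
        t->name += suffix;
}

// Name-based lookup, standing in for code that matches types by name.
bool has_type_named(const ToyModule &m, const std::string &name) {
    for (const ToyType *t : m.used_types)
        if (t->name == name)
            return true;
    return false;
}
```

With a cached module that uses a type named "struct.TupleTableSlot" (an
illustrative name), cloning it and calling move_with_rename() on the clone
leaves the cached module without any type of that name, which is the kind of
mismatch that would make name-based matching miss.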


> However, curiously the time spent on optimizing is also reduced (95ms
> instead of 164ms). Could this be because some of the applied optimizations
> are ending up in the cached module?

I suspect it's more that optimization stops being able to do a lot, due to the
type renaming issue.


> @Andres: could you provide me with the queries that caused the assertion
> failure in LLVM?

I don't think I have the concrete query. What I tend to do is to run the whole
regression tests with forced JITing. I'm fairly certain this triggered the bug
at the time.


> Have you ever observed a segfault with a non-assert-enabled build?

I think I observed bogus code generation that then could lead to segfaults or
such.


> I just want to make sure this is truly fixed in LLVM 13. Running 'make
> check-world' all tests passed.

With jit-ing forced for everything?

One more thing to try is to jit-compile twice and ensure the code is the
same. It certainly wasn't in the past due to the above issue.
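
A minimal sketch of such a comparison, assuming the IR of both compilations
has already been captured as text (for example with LLVMPrintModuleToString();
how the dumps are obtained is left open here). The helper name
first_differing_line() is made up for this example. It reports the first line
at which the two dumps diverge, which makes renamed types easy to spot.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Returns the 1-based number of the first line where the two dumps differ,
// or 0 if they are line-for-line identical.
std::size_t first_differing_line(const std::string &a, const std::string &b) {
    std::istringstream sa(a), sb(b);
    std::string la, lb;
    std::size_t lineno = 0;

    while (true) {
        const bool got_a = static_cast<bool>(std::getline(sa, la));
        const bool got_b = static_cast<bool>(std::getline(sb, lb));

        if (!got_a && !got_b)
            return 0;           // both dumps ended without a difference
        ++lineno;
        if (got_a != got_b || la != lb)
            return lineno;      // one dump is shorter, or the lines differ
    }
}
```

A nonzero result for two compilations of the same query would suggest that
state from the first compilation leaked into the second.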

Greetings,

Andres Freund


