Re: Lazy JIT IR code generation to increase JIT speed with partitions - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Lazy JIT IR code generation to increase JIT speed with partitions |
Date | |
Msg-id | 20220704203222.5my22fcsdqjr7ejj@awork3.anarazel.de |
In response to | Re: Lazy JIT IR code generation to increase JIT speed with partitions (David Geier <geidav.pg@gmail.com>) |
Responses | Re: Lazy JIT IR code generation to increase JIT speed with partitions |
List | pgsql-hackers |
Hi,

On 2022-06-27 16:55:55 +0200, David Geier wrote:
> Indeed, the total JIT time increases the more modules are used. The reason
> for this to happen is that the inlining pass loads and deserializes all to
> be inlined modules (.bc files) from disk prior to inlining them via
> llvm::IRMover. There's already a cache for such modules in the code, but it
> is currently unused. This is because llvm::IRMover takes the module to be
> inlined as std::unique_ptr<llvm::Module>. The by-value argument requires
> the source module to be moved, which means it cannot be reused afterwards.
> The code is accounting for that by erasing the module from the cache after
> inlining it, which in turns requires reloading the module next time a
> reference to it is encountered.
>
> Instead of each time loading and deserializing all to be inlined modules
> from disk, they can reside in the cache and instead be cloned via
> llvm::CloneModule() before they get inlined. Key to calling
> llvm::CloneModule() is fully deserializing the module upfront, instead of
> loading the module lazily. That is why I changed the call from
> LLVMGetBitcodeModuleInContext2() (which lazily loads the module via
> llvm::getOwningLazyBitcodeModule()) to LLVMParseBitcodeInContext2() (which
> fully loads the module via llvm::parseBitcodeFile()). Beyond that it seems
> like that prior to LLVM 13, cloning modules could fail with an assertion
> (not sure though if that would cause problems in a release build without
> assertions). Andres reported this problem back in the days here [1]. In the
> meanwhile the issue got discussed in [2] and finally fixed for LLVM 13, see
> [3].

Unfortunately that doesn't work right now - that's where I had started. The
problem is that IRMover renames types, which, in the case of cloned modules,
unfortunately means that types used in cloned modules are also renamed in the
"origin" module. That then causes problems down the line, because parts of
the LLVM code match types by type name.

That can then have the effect of drastically decreasing code generation
quality over time, because e.g. inlining never manages to find compatible
signatures.

> However, curiously the time spent on optimizing is also reduced (95ms
> instead of 164ms). Could this be because some of the applied optimizations
> are ending up in the cached module?

I suspect it's more that optimization stops being able to do a lot, due to
the type renaming issue.

> @Andres: could you provide me with the queries that caused the assertion
> failure in LLVM?

I don't think I have the concrete query. What I tend to do is to run the
whole regression tests with forced JITing. I'm fairly certain this triggered
the bug at the time.

> Have you ever observed a segfault with a non-assert-enabled build?

I think I observed bogus code generation that then could lead to segfaults or
such.

> I just want to make sure this is truly fixed in LLVM 13. Running 'make
> check-world' all tests passed.

With jit-ing forced for everything?

One more thing to try is to jit-compile twice and ensure the code is the
same. It certainly wasn't in the past due to the above issue.

Greetings,

Andres Freund
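
The type renaming problem described above can be pictured with a small toy
model. This is only a sketch and does not use the LLVM API: ToyType,
ToyModule, clone_module(), move_with_rename() and the name "struct.Foo" are
all made up for illustration, and the suffix scheme is an assumption rather
than what IRMover really does. The one thing it is meant to show is that when
a clone shares type objects with its origin, a rename done through the clone
is visible in the cached origin too, and accumulates across inlining rounds.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a named struct type. Assumption: types live outside any
// single module, so several modules can reference the same object.
struct ToyType {
    std::string name;
};

// Toy stand-in for a module: it references types but does not own them.
struct ToyModule {
    std::vector<std::shared_ptr<ToyType>> types;
};

// Hypothetical "clone": copies the references only, so the clone and the
// origin point at the very same ToyType objects.
ToyModule clone_module(const ToyModule &origin) {
    return ToyModule{origin.types};
}

// Hypothetical "move into destination": renames every type it touches.
void move_with_rename(ToyModule &src, int suffix) {
    for (auto &t : src.types)
        t->name += "." + std::to_string(suffix);
}

// Runs `rounds` clone-and-inline cycles against one cached module and
// returns the name of the cached module's first type afterwards.
std::string name_after_rounds(int rounds) {
    assert(rounds >= 0);
    ToyModule cached;
    cached.types.push_back(std::make_shared<ToyType>(ToyType{"struct.Foo"}));
    for (int i = 1; i <= rounds; ++i) {
        ToyModule copy = clone_module(cached);  // cache entry is kept
        move_with_rename(copy, i);              // but its types get renamed
    }
    return cached.types.front()->name;
}
```

In this model name_after_rounds(0) leaves the cached type name untouched,
while every further round changes it, so any later lookup that matches on the
original name stops finding it.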