Re: Lazy JIT IR code generation to increase JIT speed with partitions - Mailing list pgsql-hackers
From: David Geier
Subject: Re: Lazy JIT IR code generation to increase JIT speed with partitions
Msg-id: 201ee9fe-25f4-acd8-bc0f-1984df3c2d07@gmail.com
In response to: Re: Lazy JIT IR code generation to increase JIT speed with partitions (Dmitry Dolgov <9erthalion6@gmail.com>)
List: pgsql-hackers
Hi,

Thanks for taking a look!

On 12/1/22 22:12, Dmitry Dolgov wrote:
>
> First to summarize things a bit: from what I understand there are two
> suggestions on the table, one is about caching modules when doing
> inlining, the second is about actual lazy jitting. Are those too tightly
> coupled together, or could they be split into two separate patches? It
> would make it a bit easier to review and test.

Yes.

> I was playing with the caching part of the patch (still have to think
> about the lazy jitting), which generally seems to be a good idea. From
> what I see the main concern here is a chance that IRMover will rename
> types, degrading performance of the generated code in the long run. I have
> to admit, I'm not fully sure the mentioned LLVM 13 fix [1] completely
> eliminates those concerns; somehow its description is formulated in a not
> very committal way ("don't know if this patch fixes ..., but it does
> fix a few soundness issues that have crept in."). But I haven't found
> any crashes or false asserts coming from the LLVM side (using v13),
> running several rounds of regression tests with forced jitting, so a
> point to the fix.
>
> I would be curious to learn how Andres was troubleshooting type renaming
> issues. Using LLVM 13 from packages, jitting the same query twice and
> dumping the bitcode out showed some differences in types a-la
> "%struct.ExprState.142" vs "%struct.ExprState.430" (from what I
> understood, the whole thing is an identifier, including the number) with
> the patch, but the very same changes are happening on the main branch as
> well. Of course, I was inspecting bitcode only for certain queries; that
> doesn't exclude that some other examples actually feature type renaming.
>
> In general, it would be interesting to know how to do some sort of "smoke
> test" for the generated code, e.g. in case LLVM has fixed this
> particular issue but it reappears in the future.

+1, especially with my findings described below.
> I did a few performance tests and got numbers similar to those posted in
> the thread, inlining time being impressively reduced (~10 times) as well
> as (suspiciously) optimization time (~1.5 times). The approach was a bit
> different though: I ran the sequence of example queries from the thread
> using pgbench and checked the JIT counters from pgss.
>
> A few small notes:
>
> If caching of modules is safe from LLVM >= 13, I guess it should be
> wrapped into a corresponding condition on LLVM_VERSION_MAJOR, right?
>
> Why was the assert about hasExternalLinkage removed from
> llvmjit_inline.cpp?
>
> For so much discussion about such a small change there are definitely not
> enough comments in the code about the dangers of cloning modules.
>
> Also, maybe a stupid question, but how big should this cache be? From
> what I understand it will be cleared on llvm_shutdown. Does that mean it
> can grow unbounded if e.g. a single session keeps firing different
> queries at the db?

The cache doesn't contain the modules generated for a query, but the
bitcode files with the to-be-inlined functions stored on disk. So the
cache stops growing as soon as a connection process has loaded all of
them. Nevertheless, if a workload uses many connections and truly all,
or at least most, of the bitcode files, that could be quite some memory.

Beyond that I made an unfortunate discovery: the inlining time goes down
by so much because no inlining is happening anymore :-/. This is because
I cloned the module only when passing it to the IRMover, not right after
taking it from the cache. However, after the module is taken from the
cache it is still modified.
llvm_execute_inline_plan() executes

    if (valueToImport->hasExternalLinkage())
    {
        valueToImport->setLinkage(LinkageTypes::AvailableExternallyLinkage);
    }

But when checking in function_inlinable() whether a function is
inlinable, we check

    if (F.hasAvailableExternallyLinkage())
        return false;

which makes the code bail out for every function that is to be inlined.
It's very curious that we didn't see that when dumping the bitcode. It
seems the bitcode is always different enough to not spot it. So +1 on
your point about having a smoke test in place to verify things still
work.

If I change the code to clone the module right after taking it from the
cache and before making any changes to it, the JIT time remains high and
even seems higher than when re-reading the file from disk. Profiling
showed that llvm::CloneModule() is just super slow, especially for big
bitcode files like numeric.bc. I haven't investigated why that is or
whether we can do something about it, and I don't plan to do so for the
time being.

For reference, I attached the patch which only contains the caching
code. It's the variant that clones early.

--
David Geier
(ServiceNow)