Re: terminate called after throwing an instance of 'std::bad_alloc' - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: terminate called after throwing an instance of 'std::bad_alloc' |
Date | |
Msg-id | 20201007230242.rnwsbnxywvthef3w@alap3.anarazel.de Whole thread Raw |
In response to | Re: terminate called after throwing an instance of 'std::bad_alloc' (Andres Freund <andres@anarazel.de>) |
Responses |
Re: terminate called after throwing an instance of 'std::bad_alloc'
|
List | pgsql-hackers |
Hi, Sorry for the second send of this email, but somehow I managed to mangle the headers in the last version I sent. On 2020-10-03 19:01:27 -0700, Andres Freund wrote: > On 2020-10-03 18:30:46 -0500, Justin Pryzby wrote: > > On Sat, Oct 03, 2020 at 04:01:49PM -0700, Andres Freund wrote: > > > > I was able to make a small, apparent leak like so. This applies to > > > > postgis3.0/postgres12/LLVM5/centos7, and > > > > postgis3.1alpha/postgres13/LLVM9/centos8. > > > > > > > > Inlining is essential to see the leak, optimization is not. Showing every 9th > > > > query: > > > > > > > > $ for a in `seq 1 99`; do echo "SET jit_inline_above_cost=1; SET jit_optimize_above_cost=-1; SET jit_above_cost=0;SET jit_expressions=on; SET log_executor_stats=on; SET client_min_messages=debug; SET jit=on; explain analyzeSELECT st_x(site_geo) FROM sites WHERE site_geo IS NOT NULL;"; done |psql ts 2>&1 |grep 'resident' |sed -n '1~9p' > > > > ! 46024 kB max resident size > > > > ! 46492 kB max resident size > > > > > > Huh, that's certainly not good. Looking. > > > > Reproduce like: > > postgres=# CREATE EXTENSION postgis; > > postgres=# CREATE TABLE geo(g geometry(point)); > > postgres=# INSERT INTO geo SELECT st_GeometryFromText('POINT(11 12)',0) FROM generate_series(1,99999); > > postgres=# SET jit_above_cost=0; SET jit_inline_above_cost=0; SET client_min_messages=debug; SET log_executor_stats=on;explain analyze SELECT st_x(g) FROM geo; > > It's an issue inside LLVM. I'm not yet sure how to avoid it. Basically > whenever we load an IR module to inline from, it renames its type names > to make them not conflict with already known type names in the current > LLVMContext. That, over time, leads to a large number of type names in > memory. We can avoid the re-loading of the module by instead cloning it, > but that still leads to new types being created for some internal > reasons. It's possible to create/drop separate LLVMContext instances, > but we'd have to figure out when to do so - it's too expensive to always > do that. I spent a lot of time (probably too much time) trying to find a good way out of that. I think I mostly figured it out, but it's gonna take me a bit longer to nail down the loose ends. The nice part is that it actually speeds up inlining a lot. The bad part is that it's not a small change, so I don't yet know how to deal with existing releases. Details: The issue with the leaks is basically that when LLVM loads a bitcode module it doesn't "unify" named types with types in other bitcode module - they could after all be entirely independent. That means that we end up with a lot of types after a while of running. That ought to be addressable by not re-loading modules we inline from from disk every time we inline, but unfortunately that doesn't work well either: The "cross module import" logic unfortunately modifies the types used in the "source module", leading to subsequent imports not working well. That last issue is why I had moved to re-loading modules from "disk" during import. The partial patch I've written doesn't use the existing cross module import code anymore, but moves to the lower level functionality it uses. For traditional compilers, which is what that linker is mainly written for, it makes sense to reduce memory usage by "destructively" moving code from the source into the target - it'll only be used once. But for the JIT case it's better to do so non-destructively. That requires us to implement some "type resolution" between the source and target modules, but then gains not needing to reload / copy the source module anymore, saving a lot of the costs. It's possible that for existing releases we ought to go for a simpler approach, even if it's not as good. The only really feasible way I see is to basically count the number of times inlining has been done, and then just re-initialize after a certain time. But that's pretty darn yucky. Greetings, Andres Freund
pgsql-hackers by date: