Re: terminate called after throwing an instance of 'std::bad_alloc' - Mailing list pgsql-hackers
From: Justin Pryzby
Subject: Re: terminate called after throwing an instance of 'std::bad_alloc'
Date:
Msg-id: 20210418001324.GP3315@telsasoft.com
In response to: Re: terminate called after throwing an instance of 'std::bad_alloc' (Justin Pryzby <pryzby@telsasoft.com>)
Responses: Re: terminate called after throwing an instance of 'std::bad_alloc'
List: pgsql-hackers
On Fri, Apr 16, 2021 at 10:18:37PM -0500, Justin Pryzby wrote:
> On Fri, Apr 16, 2021 at 09:48:54PM -0500, Justin Pryzby wrote:
> > On Fri, Apr 16, 2021 at 07:17:55PM -0700, Andres Freund wrote:
> > > Hi,
> > >
> > > On 2020-12-18 17:56:07 -0600, Justin Pryzby wrote:
> > > > I'd be happy to run with a prototype fix for the leak to see if the other issue
> > > > does (not) recur.
> > >
> > > I just posted a prototype fix to https://www.postgresql.org/message-id/20210417021602.7dilihkdc7oblrf7%40alap3.anarazel.de
> > > (just because that was the first thread I re-found). It'd be cool if you
> > > could have a look!
> >
> > This doesn't seem to address the problem triggered by the reproducer at
> > https://www.postgresql.org/message-id/20210331040751.GU4431@telsasoft.com
> > (sorry I didn't CC you)
>
> I take that back - I forgot that this doesn't release RAM until hitting a
> threshold.

I tried this on the query that was causing the original C++ exception.
It still grows to 2GB within 5min.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
23084 postgres  20   0 2514364   1.6g  29484 R  99.7 18.2   3:40.87 postgres: telsasoft ts 192.168.122.11(50892) SELECT

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
23084 postgres  20   0 3046960   2.1g  29484 R 100.0 24.1   4:30.64 postgres: telsasoft ts 192.168.122.11(50892) SELECT

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
23084 postgres  20   0 4323500   3.3g  29488 R  99.7 38.4   8:20.63 postgres: telsasoft ts 192.168.122.11(50892) SELECT

When I first reported this issue, the affected process was a long-running,
single-threaded python tool. We since updated it (partially to avoid issues
like this) to use multiprocessing, therefore separate postgres backends. I'm
now realizing that that's RAM use for a single query, not from continuous
leaks across multiple queries.

This is still true with the patch even if I
#define LLVMJIT_LLVM_CONTEXT_REUSE_MAX 1

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28438 postgres  20   0 3854264   2.8g  29428 R  98.7 33.2   8:56.79 postgres: telsasoft ts 192.168.122.11(53614) BIND

python3 ./jitleak.py # runs telsasoft reports
INFO: recreating LLVM context after 2 uses
INFO: recreating LLVM context after 2 uses
INFO: recreating LLVM context after 2 uses
INFO: recreating LLVM context after 2 uses
INFO: recreating LLVM context after 2 uses
PID 27742 finished running report; est=None rows=40745; cols=34; ... duration:538
INFO: recreating LLVM context after 81492 uses

I did:

-       llvm_llvm_context_reuse_count = 0;
        Assert(llvm_context != NULL);
+       elog(INFO, "recreating LLVM context after %zu uses", llvm_llvm_context_reuse_count);
+       llvm_llvm_context_reuse_count = 0;

Maybe we're missing this condition somehow?

        if (llvm_jit_context_in_use_count == 0 &&

Also, I just hit this assertion by cancelling the query with ^C / sigint.
But I don't have a reproducer for it.

< 2021-04-17 19:14:23.509 ADT telsasoft >PANIC:  LLVMJitContext in use count not 0 at exit (is 1)

--
Justin
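[For context: a minimal sketch of the reuse-threshold logic under discussion.
Only the identifiers quoted in the mail (llvm_jit_context_in_use_count,
llvm_llvm_context_reuse_count, LLVMJIT_LLVM_CONTEXT_REUSE_MAX, llvm_context)
come from the prototype patch; the function name and surrounding structure are
assumptions for illustration, not the actual patch.]

    /*
     * Hypothetical sketch: recreate the shared LLVMContext only once no
     * LLVMJitContext is using it anymore, and only after it has been reused
     * more than LLVMJIT_LLVM_CONTEXT_REUSE_MAX times.  A single long-running
     * query keeps its LLVMJitContext alive, so in_use_count never reaches 0
     * while it runs -- consistent with seeing "after 81492 uses" only once
     * the query finishes, and with RES growing unbounded during the query.
     */
    static void
    llvm_maybe_recreate_llvm_context(void)
    {
        if (llvm_jit_context_in_use_count == 0 &&
            llvm_llvm_context_reuse_count > LLVMJIT_LLVM_CONTEXT_REUSE_MAX)
        {
            elog(INFO, "recreating LLVM context after %zu uses",
                 llvm_llvm_context_reuse_count);
            llvm_llvm_context_reuse_count = 0;

            /* drop all memory owned by the old context, start fresh */
            LLVMContextDispose(llvm_context);
            llvm_context = LLVMContextCreate();
        }
    }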