Thread: unconstrained memory growth in long running procedure stored procedure after upgrading 11-12
unconstrained memory growth in long running procedure stored procedure after upgrading 11-12
From
Merlin Moncure
Date:
Hello all, We just upgraded from postgres 11 to 12.6 and our server is running out of memory and rebooting about 1-2 times a day. Application architecture is a single threaded stored procedure, executed with CALL that loops and never terminates. With postgres 11 we had no memory issues. Ultimately the crash looks like this: terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc 2021-03-29 04:34:31.262 CDT [1413] LOG: server process (PID 9792) was terminated by signal 6: Aborted 2021-03-29 04:34:31.262 CDT [1413] DETAIL: Failed process was running: CALL Main() 2021-03-29 04:34:31.262 CDT [1413] LOG: terminating any other active server processes 2021-03-29 04:34:31.264 CDT [9741] WARNING: terminating connection because of crash of another server process 2021-03-29 04:34:31.264 CDT [9741] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2021-03-29 04:34:31.264 CDT [9741] HINT: In a moment you should be able to reconnect to the database and repeat your command. 2021-03-29 04:34:31.267 CDT [1413] LOG: archiver process (PID 9742) exited with exit code 1 2021-03-29 04:34:31.267 CDT [1413] LOG: all server processes terminated; reinitializing Attached is a self contained test case which reproduces the problem. Instructions: 1. run the attached script in psql, pgtask_test.sql. It will create a database, initialize it, and run the main procedure. dblink must be available 2. in another window, run SELECT CreateTaskChain('test', 'DEV'); In the console that ran main(), you should see output that the procedure began to do work. Once it does, a 'top' should show resident memory growth immediately. It's about a gigabyte an hour in my test. Sorry for the large-ish attachment. merlin
Attachment
Re: unconstrained memory growth in long running procedure stored procedure after upgrading 11-12
From
Justin Pryzby
Date:
On Tue, Mar 30, 2021 at 04:17:03PM -0500, Merlin Moncure wrote: > Hello all, > > We just upgraded from postgres 11 to 12.6 and our server is running > out of memory and rebooting about 1-2 times a day. Application > architecture is a single threaded stored procedure, executed with CALL > that loops and never terminates. With postgres 11 we had no memory > issues. Ultimately the crash looks like this: > > terminate called after throwing an instance of 'std::bad_alloc' > what(): std::bad_alloc > 2021-03-29 04:34:31.262 CDT [1413] LOG: server process (PID 9792) was > terminated by signal 6: Aborted I haven't tried your test, but this sounds a lot like the issue I reported with JIT, which is enabled by default in v12. https://www.postgresql.org/docs/12/release-12.html Enable Just-in-Time (JIT) compilation by default, if the server has been built with support for it (Andres Freund) Note that this support is not built by default, but has to be selected explicitly while configuring the build. https://www.postgresql.org/message-id/20201001021609.GC8476%40telsasoft.com terminate called after throwing an instance of 'std::bad_alloc' I suggest to try ALTER SYSTEM SET jit_inline_above_cost=-1; SELECT pg_reload_conf(); > memory growth immediately. It's about a gigabyte an hour in my test. > Sorry for the large-ish attachment. Your reproducer is probably much better than mine was. -- Justin
Re: unconstrained memory growth in long running procedure stored procedure after upgrading 11-12
From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes: > On Tue, Mar 30, 2021 at 04:17:03PM -0500, Merlin Moncure wrote: >> We just upgraded from postgres 11 to 12.6 and our server is running >> out of memory and rebooting about 1-2 times a day. > I haven't tried your test, but this sounds a lot like the issue I reported with > JIT, which is enabled by default in v12. FWIW, I just finished failing to reproduce any problem with that test case ... but I was using a non-JIT-enabled build. regards, tom lane
Re: unconstrained memory growth in long running procedure stored procedure after upgrading 11-12
From
Justin Pryzby
Date:
On Tue, Mar 30, 2021 at 04:17:03PM -0500, Merlin Moncure wrote: > Instructions: > 1. run the attached script in psql, pgtask_test.sql. It will create a > database, initialize it, and run the main procedure. dblink must be > available > 2. in another window, run SELECT CreateTaskChain('test', 'DEV'); For your reproducer, I needed to: 1.1) comment this: |INSERT INTO Task SELECT | -- 'test', 1.2) then run: CALL MAIN(); Anyway I reproduced this without an extension this time: CREATE OR REPLACE FUNCTION cfn() RETURNS void LANGUAGE PLPGSQL AS $$ declare a record; begin FOR a IN SELECT generate_series(1,99)LOOP PERFORM format('select 1'); END LOOP; END $$; $ yes 'SET jit_above_cost=0; SET jit_inline_above_cost=0; SET jit=on; SET client_min_messages=debug; SET log_executor_stats=on;SELECT cfn();' |head -11 |psql 2>&1 |grep 'max resident' ! 33708 kB max resident size ! 35956 kB max resident size ! 37800 kB max resident size ! 40300 kB max resident size ! 41928 kB max resident size ! 43928 kB max resident size ! 48496 kB max resident size ! 48964 kB max resident size ! 50460 kB max resident size ! 52272 kB max resident size ! 53740 kB max resident size There's also a relatively microscopic leak even if inline is off. It may be that this is what I reproduced last time - I couldn't see how a few hundred kB leak was causing a our process to be GB sized. It may or may not be a separate issue, though. -- Justin
Re: unconstrained memory growth in long running procedure stored procedure after upgrading 11-12
From
Merlin Moncure
Date:
On Tue, Mar 30, 2021 at 7:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Justin Pryzby <pryzby@telsasoft.com> writes: > > On Tue, Mar 30, 2021 at 04:17:03PM -0500, Merlin Moncure wrote: > >> We just upgraded from postgres 11 to 12.6 and our server is running > >> out of memory and rebooting about 1-2 times a day. > > > I haven't tried your test, but this sounds a lot like the issue I reported with > > JIT, which is enabled by default in v12. > > FWIW, I just finished failing to reproduce any problem with that > test case ... but I was using a non-JIT-enabled build. Yep. Disabling jit (fully, fia jit=off, not what was suggested upthread) eliminated the issue, or at least highly mitigated the leak. I was using pgdg rpm packaging, which enables jit by default. Thanks everyone for looking at this, and the workaround is quick and easy. merlin