Thread: Oom on temp (un-analyzed table caused by JIT) V16.1
Daniel,
You have a commit [1] that MIGHT fix this.
I have a script that recreates the problem, using random data in pg_temp.
And a nested cursor.
It took me a few days to reduce this from the actual code that was experiencing it. If I turn off JIT, the problem goes away. (If I don't FETCH the first row, the memory loss does not happen; maybe because opening a cursor is more decoration/prepare.)
I don't have an easy way to test this script right now against the commit.
I am hopeful that your fix fixes this.
This was my first OOM issue in PG in 3 years of working with it.
The problem goes away if the TABLE is analyzed, or JIT is disabled.
The current script, if run, will consume about 25% of my system memory (10GB).
Just call the function below until it dies if that's what you need. The only way to get the memory back down is to close the connection.
SELECT pg_temp.fx(497);
Surprisingly, to me, the report from pg_get_backend_memory_contexts() doesn't really show "missing memory", which I thought it would. (FWIW, we caught this with multiple rounds of testing our code, slowing down, then crashing... Is there ANY way to interrogate that we are above X% of system memory so we know to let this backend go?)
It takes about 18 minutes to run on my 4 CPU VM.
For now, we are going to add some ANALYZE statements to our code.
We will consider disabling JIT.
Thanks,
Kirk
Attachment
Hi
On Mon, 15 Jan 2024 at 07:24, Kirk Wolak <wolakk@gmail.com> wrote:
> Daniel,
> You have a commit [1] that MIGHT fix this.
> I have a script that recreates the problem, using random data in pg_temp.
> And a nested cursor.
> It took me a few days to reduce this from actual code that was experiencing this. If I turn off JIT, the problem goes away. (if I don't FETCH the first row, the memory loss does not happen. Maybe because opening a cursor is more decoration/prepare)
> I don't have an easy way to test this script right now against the commit.
> I am hopeful that your fix fixes this.
> This was my first OOM issue in PG in 3yrs of working with it.
> The problem goes away if the TABLE is analyzed, or JIT is disabled.
> The current script, if run, will consume about 25% of my system memory (10GB).
> Just call the function below until it dies if that's what you need. The only way to get the memory back down is to close the connection.
> SELECT pg_temp.fx(497);
> Surprisingly, to me, the report from pg_get_backend_memory_contexts() doesn't really show "missing memory", which I thought it would. (FWIW, we caught this with multiple rounds of testing our code, slowing down, then crashing... Is there ANY way to interrogate that we are above X% of system memory so we know to let this backend go?)
I wrote a simple extension that can show memory allocation from the system's perspective.

> It takes about 18 minutes to run on my 4 CPU VM.
> For now, we are going to add some ANALYZE statements to our code.

Remember: don't run anything without VACUUM ANALYZE. Without it, the queries can be slow. ANALYZE sets the statistics, and VACUUM prepares the visibility maps; without visibility maps, index-only scans cannot be used.
Autovacuum doesn't see into open transactions, and it is executed in one-minute cycles. Autovacuum doesn't see temporary tables either; temporary table data is visible only to the owning process.

> We will consider disabling JIT.

That makes sense only for bigger analytical queries.
Regards
Pavel
> On 15 Jan 2024, at 07:24, Kirk Wolak <wolakk@gmail.com> wrote:
> You have a commit [1] that MIGHT fix this.
> I have a script that recreates the problem, using random data in pg_temp.
> And a nested cursor.

Running your reproducer script in a tight loop for a fair bit of time on the
v16 HEAD I cannot see any memory growth, so it's plausible that the upcoming
16.2 will work better in your environment.

> It took me a few days to reduce this from actual code that was experiencing this. If I turn off JIT, the problem goes away. (if I don't FETCH the first row, the memory loss does not happen. Maybe because opening a cursor is more decoration/prepare)
>
> I don't have an easy way to test this script right now against the commit.

There are up to date snapshots of the upcoming v16 minor release which might
make testing easier than building postgres from source?

https://www.postgresql.org/download/snapshots/

Admittedly I don't know whether those are built with LLVM support but I think
they might be.

> I am hopeful that your fix fixes this.

It seems likely, so it would be very valuable if you could try running the
pre-release version of v16 before 16.2 ships to verify this.

--
Daniel Gustafsson
On Mon, Jan 15, 2024 at 9:03 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> On 15 Jan 2024, at 07:24, Kirk Wolak <wolakk@gmail.com> wrote:
> You have a commit [1] that MIGHT fix this.
> I have a script that recreates the problem, using random data in pg_temp.
> And a nested cursor.
> Running your reproducer script in a tight loop for a fair bit of time on the
> v16 HEAD I cannot see any memory growth, so it's plausible that the upcoming
> 16.2 will work better in your environment.
The script starts by creating 90 million rows... In my world, that part of the script, plus the indexes, etc., takes about 8-9 minutes.
And has no memory loss.
I used the memory reported by htop to track the problem. I forgot to mention this.
I am curious what you used? (Because it doesn't display symptoms [running dog slow] until the backend has about 85% of the machine's memory.)
> There are up to date snapshots of the upcoming v16 minor release which might
> make testing easier than building postgres from source?
> https://www.postgresql.org/download/snapshots/
Thank you. I have assigned that task to the guy who maintains our VMs/Environments.
I will report back to you.
Kirk
On Mon, 15 Jan 2024 at 15:03, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 15 Jan 2024, at 07:24, Kirk Wolak <wolakk@gmail.com> wrote:
> You have a commit [1] that MIGHT fix this.
> I have a script that recreates the problem, using random data in pg_temp.
> And a nested cursor.
> Running your reproducer script in a tight loop for a fair bit of time on the
> v16 HEAD I cannot see any memory growth, so it's plausible that the upcoming
> 16.2 will work better in your environment.
yes
> It took me a few days to reduce this from actual code that was experiencing this. If I turn off JIT, the problem goes away. (if I don't FETCH the first row, the memory loss does not happen. Maybe because opening a cursor is more decoration/prepare)
>
> I don't have an easy way to test this script right now against the commit.
> There are up to date snapshots of the upcoming v16 minor release which might
> make testing easier than building postgres from source?
> https://www.postgresql.org/download/snapshots/
> Admittedly I don't know whether those are built with LLVM support but I think
> they might be.
> I am hopeful that your fix fixes this.
> It seems likely, so it would be very valuable if you could try running the
> pre-release version of v16 before 16.2 ships to verify this.
> --
> Daniel Gustafsson
> On 15 Jan 2024, at 16:49, Kirk Wolak <wolakk@gmail.com> wrote:
> On Mon, Jan 15, 2024 at 9:03 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> The script starts by creating 90 million rows... In my world that part of the script, plus the indexes, etc. takes about 8-9 minutes.
> And has no memory loss.

That's expected, the memory leak did not affect those operations.

> I used the memory reported by htop to track the problem. I forgot to mention this.
> I am curious what you used? (Because it doesn't display symptoms [running dog slow] until the backend has about 85% of the machine's memory)

I use a combination of tools, in this case I analyzed a build with Instruments on macOS.

> There are up to date snapshots of the upcoming v16 minor release which might
> make testing easier than building postgres from source?
>
> https://www.postgresql.org/download/snapshots/
>
> Thank you. I have assigned that task to the guy who maintains our VMs/Environments.
> I will report back to you.

Great!

--
Daniel Gustafsson
On Mon, Jan 15, 2024 at 9:03 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> On 15 Jan 2024, at 07:24, Kirk Wolak <wolakk@gmail.com> wrote:
> You have a commit [1] that MIGHT fix this.
> I have a script that recreates the problem, using random data in pg_temp.
> And a nested cursor.
> Running your reproducer script in a tight loop for a fair bit of time on the
> v16 HEAD I cannot see any memory growth, so it's plausible that the upcoming
> 16.2 will work better in your environment.
Okay, I took the latest source off of git (17devel) and got it to work there in a VM.
It appears this issue is fixed. It must have been related to the issue originally tagged.
Thanks!
Re: Oom on temp (un-analyzed table caused by JIT) V16.1 [Fixed Already]
From: Daniel Gustafsson
> On 16 Jan 2024, at 02:53, Kirk Wolak <wolakk@gmail.com> wrote:
>
> On Mon, Jan 15, 2024 at 9:03 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> > On 15 Jan 2024, at 07:24, Kirk Wolak <wolakk@gmail.com> wrote:
> >
> > You have a commit [1] that MIGHT fix this.
> > I have a script that recreates the problem, using random data in pg_temp.
> > And a nested cursor.
>
> Running your reproducer script in a tight loop for a fair bit of time on the
> v16 HEAD I cannot see any memory growth, so it's plausible that the upcoming
> 16.2 will work better in your environment.
>
> Okay, I took the latest source off of git (17devel) and got it to work there in a VM.
>
> It appears this issue is fixed. It must have been related to the issue originally tagged.

Thanks for testing and confirming! Testing pre-release builds on real life workloads is invaluable for the development of Postgres so thank you taking the time.

--
Daniel Gustafsson
On Tue, Jan 16, 2024 at 3:43 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> On 16 Jan 2024, at 02:53, Kirk Wolak <wolakk@gmail.com> wrote:
>
> On Mon, Jan 15, 2024 at 9:03 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> > On 15 Jan 2024, at 07:24, Kirk Wolak <wolakk@gmail.com> wrote:
>...
> Okay, I took the latest source off of git (17devel) and got it to work there in a VM.
>
> It appears this issue is fixed. It must have been related to the issue originally tagged.
> Thanks for testing and confirming! Testing pre-release builds on real life
> workloads is invaluable for the development of Postgres so thank you taking the
> time.
Daniel,
I did a little more checking, and the reason I did not see the leak MIGHT be because EXPLAIN did not show a JIT attempt.
I tried to use settings that FORCE a JIT... But to no avail.
I am now concerned that the problem is more hidden in my use case, meaning I CANNOT conclude it is fixed.
But I know of NO WAY to force a JIT (I lowered costs to 1, etc.).
You don't know a way to force at least the JIT analysis to happen? (Because I already knew that if JIT was off, the leak wouldn't happen.)
Thanks,
Kirk Out!
PS: I assume there is no pg_jit(1) function I can call. LOL
On Thu, 2024-01-18 at 19:50 -0500, Kirk Wolak wrote:
> I did a little more checking and the reason I did not see the link MIGHT be because EXPLAIN did not show a JIT attempt.
> I tried to use settings that FORCE a JIT... But to no avail.
>
> I am now concerned that the problem is more hidden in my use case. Meaning I CANNOT conclude it is fixed.
> But I know of NO WAY to force a JIT (I lowered costs to 1, etc. ).
>
> You don't know a way to force at least the JIT analysis to happen? (because I already knew if JIT was off, the leak wouldn't happen).

If you set the limits to 0, you can trigger it easily:

  SET jit = on;
  SET jit_above_cost = 0;
  SET jit_inline_above_cost = 0;
  SET jit_optimize_above_cost = 0;

  EXPLAIN (ANALYZE) SELECT count(*) FROM foo;

                                                              QUERY PLAN
  ══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
   Finalize Aggregate  (cost=58889.84..58889.85 rows=1 width=8) (actual time=400.462..418.214 rows=1 loops=1)
     ->  Gather  (cost=58889.62..58889.83 rows=2 width=8) (actual time=400.300..418.190 rows=3 loops=1)
           Workers Planned: 2
           Workers Launched: 2
           ->  Partial Aggregate  (cost=57889.62..57889.64 rows=1 width=8) (actual time=384.876..384.878 rows=1 loops=3)
                 ->  Parallel Seq Scan on foo  (cost=0.00..52681.30 rows=2083330 width=0) (actual time=0.028..168.510 rows=1666667 loops=3)
   Planning Time: 0.133 ms
   JIT:
     Functions: 8
     Options: Inlining true, Optimization true, Expressions true, Deforming true
     Timing: Generation 1.038 ms, Inlining 279.779 ms, Optimization 38.395 ms, Emission 73.105 ms, Total 392.316 ms
   Execution Time: 478.257 ms

Yours,
Laurenz Albe
> On 19 Jan 2024, at 01:50, Kirk Wolak <wolakk@gmail.com> wrote:
> I did a little more checking and the reason I did not see the link MIGHT be because EXPLAIN did not show a JIT attempt.
> I tried to use settings that FORCE a JIT... But to no avail.

Are you sure you are running a JIT enabled server? Did you compile it yourself or install a snapshot?

> You don't know a way to force at least the JIT analysis to happen? (because I already knew if JIT was off, the leak wouldn't happen).

If you set jit_above_cost=0 then postgres will compile a JIT enabled execution tree.

This does bring up an interesting point, I don't think there is a way for a user to know whether the server is jit enabled or not (apart from explaining a query with costs adjusted but that's not all that user-friendly). Maybe we need a way to reliably tell if JIT is active or not.

--
Daniel Gustafsson
Hi,

On Fri, Jan 19, 2024 at 10:48:12AM +0100, Daniel Gustafsson wrote:
> This does bring up an interesting point, I don't think there is a way
> for a user to know whether the server is jit enabled or not (apart
> from explaining a query with costs adjusted but that's not all that
> userfriendly). Maybe we need a way to reliably tell if JIT is active
> or not.

I thought this is what pg_jit_available() is for?

postgres=> SHOW jit;
 jit
-----
 on
(1 row)

postgres=> SELECT pg_jit_available();
 pg_jit_available
------------------
 f
(1 row)

Michael
> On 19 Jan 2024, at 11:04, Michael Banck <mbanck@gmx.net> wrote:
>
> On Fri, Jan 19, 2024 at 10:48:12AM +0100, Daniel Gustafsson wrote:
> > This does bring up an interesting point, I don't think there is a way
> > for a user to know whether the server is jit enabled or not (apart
> > from explaining a query with costs adjusted but that's not all that
> > userfriendly). Maybe we need a way to reliably tell if JIT is active
> > or not.
>
> I thought this is what pg_jit_available() is for?

Ah, it is, I completely forgot we had that one. Thanks!

--
Daniel Gustafsson
On Fri, Jan 19, 2024 at 4:20 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2024-01-18 at 19:50 -0500, Kirk Wolak wrote:
> I did a little more checking and the reason I did not see the link MIGHT be because EXPLAIN did not show a JIT attempt.
> I tried to use settings that FORCE a JIT... But to no avail.
>
> I am now concerned that the problem is more hidden in my use case. Meaning I CANNOT conclude it is fixed.
> But I know of NO WAY to force a JIT (I lowered costs to 1, etc. ).
>
> You don't know a way to force at least the JIT analysis to happen? (because I already knew if JIT was off, the leak wouldn't happen).
> If you set the limits to 0, you can trigger it easily:
> SET jit = on;
> SET jit_above_cost = 0;
> SET jit_inline_above_cost = 0;
> SET jit_optimize_above_cost = 0;
Okay,
I did exactly this (just now). But the EXPLAIN does not contain the JIT?
-------------------------------------------------------------------------------
Sort (cost=5458075.88..5477861.00 rows=7914047 width=12)
Sort Key: seid
-> HashAggregate (cost=3910139.62..4280784.00 rows=7914047 width=12)
Group Key: seid, fr_field_name, st_field_name
Planned Partitions: 128
-> Seq Scan on parts (cost=0.00..1923249.00 rows=29850000 width=12)
Filter: ((seid <> 497) AND ((partnum)::text >= '1'::text))
(7 rows)
From a FUTURE email, I noticed pg_jit_available()
and it's set to f??
Okay, so does this require a special BUILD command?
postgres=# select pg_jit_available();
pg_jit_available
------------------
f
(1 row)
postgres=# \dconfig *jit*
List of configuration parameters
Parameter | Value
-------------------------+---------
jit | on
jit_above_cost | 100000
jit_debugging_support | off
jit_dump_bitcode | off
jit_expressions | on
jit_inline_above_cost | 500000
jit_optimize_above_cost | 500000
jit_profiling_support | off
jit_provider | llvmjit
jit_tuple_deforming | on
(10 rows)
> On 19 Jan 2024, at 23:09, Kirk Wolak <wolakk@gmail.com> wrote:
> From a FUTURE email, I noticed pg_jit_available() and it's set to f??

Right, then this installation does not contain the necessary library to JIT compile the query.

> Okay, so does this require a special BUILD command?

Yes, it requires that you compile with --with-llvm. If you don't have llvm in the PATH you might need to set LLVM_CONFIG to point to your llvm-config binary. With autotools that would be something like:

  ./configure <other params> --with-llvm LLVM_CONFIG=/path/to/llvm-config

--
Daniel Gustafsson
On Fri, Jan 19, 2024 at 7:03 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> On 19 Jan 2024, at 23:09, Kirk Wolak <wolakk@gmail.com> wrote:
> From a FUTURE email, I noticed pg_jit_available() and it's set to f??
> Right, then this installation does not contain the necessary library to JIT
> compile the query.
> Okay, so does this require a special BUILD command?
> Yes, it requires that you compile with --with-llvm. If you don't have llvm in
> the PATH you might need to set LLVM_CONFIG to point to your llvm-config binary.
> With autotools that would be something like:
> ./configure <other params> --with-llvm LLVM_CONFIG=/path/to/llvm-config
> --
> Daniel Gustafsson
Thank you, that made it possible to build and run...
UNFORTUNATELY, this has a CLEAR memory leak (visible in htop).
I am watching it already consuming 6% of my system memory.
I am re-attaching my script. WHICH includes the settings to FORCE JIT.
It also does an EXPLAIN so you can verify that JIT is on (this is what I added/noticed!)
And it takes over 20 minutes to get this far. It's still running.
I am re-attaching the script. (as I tweaked it).
This creates 90 million rows of data, so it takes a while.
I BELIEVE that it consumes far less memory if you do not fetch any rows (I had problems reproducing it if no rows were fetched).
So, this may be beyond the planning stages.
Thanks,
Kirk Out!
Attachment
On Mon, Jan 22, 2024 at 1:30 AM Kirk Wolak <wolakk@gmail.com> wrote:
> On Fri, Jan 19, 2024 at 7:03 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > On 19 Jan 2024, at 23:09, Kirk Wolak <wolakk@gmail.com> wrote:
> ...
> > ./configure <other params> --with-llvm LLVM_CONFIG=/path/to/llvm-config
> > --
> > Daniel Gustafsson
>
> Thank you, that made it possible to build and run...
> UNFORTUNATELY this has a CLEAR memory leak (visible in htop)
> I am watching it already consuming 6% of my system memory.
Daniel,
In the previous email, I made note that once the JIT was enabled, the problem exists in 17Devel.
I re-included my script, which forced the JIT to be used...
I attached an updated script that forced the settings.
But this is still leaking memory (outside of the pg_backend_memory_contexts() calls).
Probably because it's at the LLVM level? And it does NOT happen from planning/opening the query. It appears I have to fetch the rows to see the problem.
Thanks in advance. Let me know if I should be doing something different?
Kirk Out!
PS: I was wondering if we had a function that showed the total memory of the backend, to help determine whether we might have a 3rd-party leak [an increase in total memory consumed that is not noticed by pg_backend_memory_contexts].
#include "postgres.h"
#include "fmgr.h"
#include <sys/resource.h>

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(pg_backend_memory_footprint);

Datum
pg_backend_memory_footprint(PG_FUNCTION_ARGS)
{
    long memory_usage_bytes = 0;
    struct rusage usage;

    /* ru_maxrss is the peak resident set size; on Linux it is in kB. */
    getrusage(RUSAGE_SELF, &usage);
    memory_usage_bytes = usage.ru_maxrss * 1024;
    PG_RETURN_INT64(memory_usage_bytes);
}
On Thu, Jan 25, 2024 at 8:51 AM Kirk Wolak <wolakk@gmail.com> wrote:
> getrusage(RUSAGE_SELF, &usage);
> memory_usage_bytes = usage.ru_maxrss * 1024;

FWIW log_statement_stats = on shows that in the logs. See ShowUsage() in postgres.c.
On Wed, Jan 24, 2024 at 4:16 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Thu, Jan 25, 2024 at 8:51 AM Kirk Wolak <wolakk@gmail.com> wrote:
> getrusage(RUSAGE_SELF, &usage);
> memory_usage_bytes = usage.ru_maxrss * 1024;
> FWIW log_statement_stats = on shows that in the logs. See ShowUsage()
> in postgres.c.
Thank you for this. Here is the terminal output (below is the tail of the log). Notice that pg_backend_memory_contexts does NOT show the memory consumed.
But your logging sure did! (I wonder about enabling logging during planning, but there are something like 82,000 cursors being opened... This run removed the FETCH and still leaks.)
7:01:08 kwolak@postgres= # select pg_temp.fx(497);
NOTICE: ("9848 kB","10 MB","638 kB")
NOTICE: -----------after close, Count a: 82636, count b: 82636
NOTICE: ("9997 kB","10 MB","648 kB")
fx
----
(1 row)
Time: 525870.117 ms (08:45.870)
Tail:
! 0.001792 s user, 0.000000 s system, 0.005349 s elapsed
! [560.535969 s user, 31.441656 s system total]
! 185300 kB max resident size
! 232/0 [29219648/54937864] filesystem blocks in/out
! 0/25 [0/1016519] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 10/1 [62671/9660] voluntary/involuntary context switches
2024-01-24 17:01:08.752 EST [28804] STATEMENT: explain SELECT DISTINCT seid, fr_field_name, st_field_name
FROM pg_temp.parts
WHERE seid <> 497 AND partnum >= '1'
ORDER BY seid;
2024-01-24 17:01:08.759 EST [28804] LOG: QUERY STATISTICS
2024-01-24 17:01:08.759 EST [28804] DETAIL: ! system usage stats:
! 0.006207 s user, 0.000092 s system, 0.006306 s elapsed
! [560.542262 s user, 31.441748 s system total]
! 185300 kB max resident size
! 0/0 [29219648/54937864] filesystem blocks in/out
! 0/4 [0/1016523] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 0/1 [62672/9661] voluntary/involuntary context switches
2024-01-24 17:01:08.759 EST [28804] STATEMENT: SELECT 'pg_temp.fx(497); -- Not run, do \dt+ parts';
2024-01-24 17:04:30.844 EST [28746] LOG: checkpoint starting: time
2024-01-24 17:04:32.931 EST [28746] LOG: checkpoint complete: wrote 21 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=2.008 s, sync=0.006 s, total=2.087 s; sync files=15, longest=0.001 s, average=0.001 s; distance=98 kB, estimate=134 kB; lsn=0/16304D8, redo lsn=0/1630480
2024-01-24 17:11:06.350 EST [28804] LOG: QUERY STATISTICS
2024-01-24 17:11:06.350 EST [28804] DETAIL: ! system usage stats:
! 515.952870 s user, 6.688389 s system, 525.869933 s elapsed
! [1076.495280 s user, 38.130145 s system total]
! 708104 kB max resident size
! 370000/3840 [29589648/54941712] filesystem blocks in/out
! 0/327338 [0/1343861] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 22001/5216 [84675/14878] voluntary/involuntary context switches
2024-01-24 17:11:06.350 EST [28804] STATEMENT: select pg_temp.fx(497);
2024-01-24 17:12:16.162 EST [28804] LOG: QUERY STATISTICS
2024-01-24 17:12:16.162 EST [28804] DETAIL: ! system usage stats:
! 1.130029 s user, 0.007727 s system, 1.157486 s elapsed
! [1077.625396 s user, 38.137921 s system total]
! 708104 kB max resident size
! 992/0 [29590640/54941720] filesystem blocks in/out
! 3/41 [3/1343902] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 9/68 [84685/14946] voluntary/involuntary context switches
2024-01-24 17:12:16.162 EST [28804] STATEMENT: select now();
2024-01-24 17:12:30.944 EST [28804] LOG: QUERY STATISTICS
2024-01-24 17:12:30.944 EST [28804] DETAIL: ! system usage stats:
! 0.004561 s user, 0.000019 s system, 0.004580 s elapsed
! [1077.630064 s user, 38.137944 s system total]
! 708104 kB max resident size
! 0/0 [29590640/54941728] filesystem blocks in/out
! 0/4 [3/1343906] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 0/0 [84686/14947] voluntary/involuntary context switches
2024-01-24 17:12:30.944 EST [28804] STATEMENT: select now();
Hi,

On Wed, Jan 24, 2024 at 02:50:52PM -0500, Kirk Wolak wrote:
> On Mon, Jan 22, 2024 at 1:30 AM Kirk Wolak <wolakk@gmail.com> wrote:
> > On Fri, Jan 19, 2024 at 7:03 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > > On 19 Jan 2024, at 23:09, Kirk Wolak <wolakk@gmail.com> wrote:
> > Thank you, that made it possible to build and run...
> > UNFORTUNATELY this has a CLEAR memory leak (visible in htop)
> > I am watching it already consuming 6% of my system memory.
>
> Daniel,
> In the previous email, I made note that once the JIT was enabled, the
> problem exists in 17devel.
> I re-included my script, which forced the JIT to be used...
>
> I attached an updated script that forced the settings.
> But this is still leaking memory (outside of the
> pg_backend_memory_context() calls).
> Probably because it's at the LLVM level? And it does NOT happen from
> planning/opening the query. It appears I have to fetch the rows to
> see the problem.

I had a look at this (and blogged about it here[1]) and was also wondering what was going on with 17devel and the recent back-branch releases, cause I could also reproduce those continuing memory leaks.

Adding some debug logging when llvm_inline_reset_caches() is called solves the mystery: as you are calling a function, the fix that is in 17devel and the back-branch releases is not applicable, and only after the function returns is llvm_inline_reset_caches() being called (llvm_jit_context_in_use_count is greater than zero, presumably, so it never reaches the call-site of llvm_inline_reset_caches()).

If you instead run your SQL in a DO-loop (as in the blog post) and not as a PL/pgSQL function, you should see that it no longer leaks. This might be obvious to some (and Andres mentioned it in https://www.postgresql.org/message-id/20210421002056.gjd6rpe6toumiqd6%40alap3.anarazel.de) but it took me a while to figure out/find.
Michael

[1] https://www.credativ.de/en/blog/postgresql-en/quick-benchmark-postgresql-2024q1-release-performance-improvements/
On Thu, Feb 22, 2024 at 4:49 PM Michael Banck <mbanck@gmx.net> wrote:
Hi,
On Wed, Jan 24, 2024 at 02:50:52PM -0500, Kirk Wolak wrote:
> On Mon, Jan 22, 2024 at 1:30 AM Kirk Wolak <wolakk@gmail.com> wrote:
> > On Fri, Jan 19, 2024 at 7:03 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> >> > On 19 Jan 2024, at 23:09, Kirk Wolak <wolakk@gmail.com> wrote:
> > Thank you, that made it possible to build and run...
> > UNFORTUNATELY this has a CLEAR memory leak (visible in htop)
> > I am watching it already consuming 6% of my system memory.
...
> I had a look at this (and blogged about it here[1]) and was also
> wondering what was going on with 17devel and the recent back-branch
> releases, cause I could also reproduce those continuing memory leaks.
> Adding some debug logging when llvm_inline_reset_caches() is called
> solves the mystery: as you are calling a function, the fix that is in
> 17devel and the back-branch releases is not applicable and only after
> the function returns llvm_inline_reset_caches() is being called (as
> llvm_jit_context_in_use_count is greater than zero, presumably, so it
> never reaches the call-site of llvm_inline_reset_caches()).
> If you instead run your SQL in a DO-loop (as in the blog post) and not
> as a PL/PgSQL function, you should see that it no longer leaks. This
> might be obvious to some (and Andres mentioned it in
> https://www.postgresql.org/message-id/20210421002056.gjd6rpe6toumiqd6%40alap3.anarazel.de)
> but it took me a while to figure out/find.
Thanks for confirming. Being inside a routine is required, but disabling JIT was good enough for us.
Curious whether there is another way to end up calling the cleanup? It had that "feel" that the context for the cleanup was being lost between iterations of the loop.