From: Önder Kalacı
Subject: A potential memory leak on Merge Join when Sort node is not below Materialize node
Msg-id: CACawEhVxe0ufR26UcqtU7GYGRuubq3p6ZWPGXL4cxy_uexpAAQ@mail.gmail.com
List: pgsql-hackers
Hi hackers,
With PG 15 (rc1 or beta4), I'm observing an interesting memory pattern. I have not seen a similar discussion on the mailing list; if I missed one, please point me to it. The problem I'm going to describe does not happen on PG 13/14.
It seems like there is a memory leak(?) with $title. I'm still not sure what exactly is going on, but I thought it would be useful to share at least my initial investigation.
After running the query and waiting a few minutes (see the steps to reproduce below), use pg_log_backend_memory_contexts() to dump the memory contexts of the backend executing the command. The ExecutorState context grows beyond 100GB, and depending on vm.overcommit_memory you eventually get an OOM error or an OOM crash.
```
2022-09-28 17:33:38.155 CEST [32224] LOG: level: 2; PortalContext: 1024 total in 1 blocks; 592 free (0 chunks); 432 used: <unnamed>
2022-09-28 17:33:38.159 CEST [32224] LOG: level: 3; ExecutorState: 114923929600 total in 13710 blocks; 7783264 free (3 chunks); 114916146336 used
2022-09-28 17:33:38.159 CEST [32224] LOG: level: 4; TupleSort main: 8192 total in 1 blocks; 3928 free (0 chunks); 4264 used
2022-09-28 17:33:38.159 CEST [32224] LOG: level: 5; TupleSort sort: 295096 total in 8 blocks; 256952 free (67 chunks); 38144 used
2022-09-28 17:33:38.159 CEST [32224] LOG: level: 6; Caller tuples: 8192 total in 1 blocks (0 chunks); 7992 free (0 chunks); 200 used
2022-09-28 17:33:38.159 CEST [32224] LOG: level: 4; TupleSort main: 8192 total in 1 blocks; 3928 free (0 chunks); 4264 used
2022-09-28 17:33:38.159 CEST [32224] LOG: level: 5; TupleSort sort: 4309736 total in 18 blocks; 263864 free (59 chunks); 4045872 used
2022-09-28 17:33:38.159 CEST [32224] LOG: level: 6; Caller tuples: 8192 total in 1 blocks (0 chunks); 7992 free (0 chunks); 200 used
...
2022-09-28 17:33:38.160 CEST [32224] LOG: Grand total: 114930446784 bytes in 13972 blocks; 8802248 free (275 chunks); 114921644536 used
```
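For reference, this is roughly how the dump above can be triggered from a second session (a minimal sketch; the pg_stat_activity filter is just one convenient, illustrative way to find the backend's pid and is my own addition, not part of the original steps):
```
-- From a separate session: ask the backend running the join to write
-- its memory context stats to the server log (output as shown above).
-- The WHERE clause is only an illustrative way to find the right pid.
SELECT pg_log_backend_memory_contexts(pid)
FROM pg_stat_activity
WHERE query LIKE 'SELECT count(*) FROM t1 JOIN t2%';
```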
I originally observed this with a merge join involving a table and a set-returning function. To reproduce the problem with two tables, I use the following steps:
```
CREATE TABLE t1 (a text);
CREATE TABLE t2 (a text);
-- make the text a little large by adding 100000000000
INSERT INTO t1 SELECT (100000000000+i%1000)::text FROM generate_series(0,10000000) i;
-- make the text a little large by adding 100000000000
INSERT INTO t2 SELECT (100000000000+i%10000)::text FROM generate_series(0,10000000) i;
-- to simplify the explain plan, not strictly necessary
SET max_parallel_workers_per_gather TO 0;
-- these two settings are necessary to trigger the problem:
-- they force a Merge Join and prevent materialization of the inner side
SET enable_hashjoin TO false;
SET enable_material TO false;
-- the join is on a TEXT column; with an INT join column and an otherwise
-- similar setup, I do not observe this problem
SELECT count(*) FROM t1 JOIN t2 USING (a);
```
The EXPLAIN output for the query looks like the following:
```
explain SELECT count(*) FROM t1 JOIN t2 USING (a);
┌─────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate (cost=177735283.36..177735283.37 rows=1 width=8) │
│ -> Merge Join (cost=2556923.81..152703372.24 rows=10012764448 width=0) │
│ Merge Cond: (t1.a = t2.a) │
│ -> Sort (cost=1658556.19..1683556.63 rows=10000175 width=13) │
│ Sort Key: t1.a │
│ -> Seq Scan on t1 (cost=0.00..154056.75 rows=10000175 width=13) │
│ -> Sort (cost=1658507.28..1683506.93 rows=9999861 width=13) │
│ Sort Key: t2.a │
│ -> Seq Scan on t2 (cost=0.00..154053.61 rows=9999861 width=13) │
└─────────────────────────────────────────────────────────────────────────────────┘
(9 rows)
```
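As a point of comparison (this is my addition, not part of the original repro): the title's condition is that the inner Sort is not placed below a Materialize node, so one can re-enable materialization and check whether the plan changes and whether memory then stays bounded. Whether the planner actually chooses a Materialize node here is an assumption on my part:
```
-- Hypothetical comparison: re-enable materialization and re-check the
-- plan shape (a Materialize node above the inner Sort) and the
-- backend's memory usage while the query runs.
RESET enable_material;
EXPLAIN SELECT count(*) FROM t1 JOIN t2 USING (a);
```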
In the end, my investigation mostly led me to the following palloc(), where we seem to allocate memory over and over again while overall memory use keeps growing:
```
(gdb) bt
#0 __GI___libc_malloc (bytes=bytes@entry=8388608) at malloc.c:3038
#1 0x00005589f3c55444 in AllocSetAlloc (context=0x5589f4896300, size=14) at aset.c:920
#2 0x00005589f3c5d763 in palloc (size=size@entry=14) at mcxt.c:1082
#3 0x00005589f3b1f553 in datumCopy (value=94051002161216, typByVal=typByVal@entry=false,
typLen=<optimized out>) at datum.c:162
#4 0x00005589f3c6ed0b in tuplesort_getdatum (state=state@entry=0x5589f49274e0,
forward=forward@entry=true, val=0x5589f48d7860, isNull=0x5589f48d7868, abbrev=abbrev@entry=0x0)
at tuplesort.c:2675
#5 0x00005589f3947925 in ExecSort (pstate=0x5589f48d0a38) at nodeSort.c:200
#6 0x00005589f393d74c in ExecProcNode (node=0x5589f48d0a38)
at ../../../src/include/executor/executor.h:259
#7 ExecMergeJoin (pstate=0x5589f4896cc8) at nodeMergejoin.c:871
#8 0x00005589f391fbc8 in ExecProcNode (node=0x5589f4896cc8)
at ../../../src/include/executor/executor.h:259
#9 fetch_input_tuple (aggstate=aggstate@entry=0x5589f4896670) at nodeAgg.c:563
#10 0x00005589f3923742 in agg_retrieve_direct (aggstate=aggstate@entry=0x5589f4896670)
at nodeAgg.c:2441
....
```
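This also seems consistent with the note in the repro that an INT join column does not show the problem: datumCopy() only palloc()s a copy for pass-by-reference types such as text, while pass-by-value Datums are returned directly. A minimal sketch of that cross-check (my own addition; the *_int table names are hypothetical, and the same enable_* settings as above are assumed to still be in effect):
```
-- Same shape as the repro, but with a pass-by-value (bigint) join key.
-- With this setup I do not observe the growing ExecutorState context.
CREATE TABLE t1_int (a bigint);
CREATE TABLE t2_int (a bigint);
INSERT INTO t1_int SELECT (100000000000+i%1000) FROM generate_series(0,10000000) i;
INSERT INTO t2_int SELECT (100000000000+i%10000) FROM generate_series(0,10000000) i;
SELECT count(*) FROM t1_int JOIN t2_int USING (a);
```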
Could this be a bug, or am I missing anything?
Thanks,
Onder KALACI