回复:BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset - Mailing list pgsql-bugs

From 李海洋(陌痕)
Subject 回复:BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset
Date
Msg-id b748477c-6778-4339-ba3d-ec815a7f49af.mohen.lhy@alibaba-inc.com
Whole thread Raw
In response to BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset  (PG Bug reporting form <noreply@postgresql.org>)
List pgsql-bugs
I've attached 'v01_fix_memory_leak_in_hashed_subplan_node.patch' to address this.

--
Thanks,
Haiyang Li


------------------------------------------------------------------
发件人:PG Bug reporting form <noreply@postgresql.org>
发送时间:2025年9月3日(周三) 01:27
收件人:"pgsql-bugs"<pgsql-bugs@lists.postgresql.org>
抄 送:"李海洋(陌痕)"<mohen.lhy@alibaba-inc.com>
主 题:BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset

The following bug has been logged on the website:

Bug reference:      19040
Logged by:          haiyang li
Email address:      mohen.lhy@alibaba-inc.com
PostgreSQL version: 18beta3
Operating system:   centos7 5.10.84 x86_64
Description:       

Hello, all!

I found a query which consumes a lot of memory and triggers OOM killer.
Memory leak occurs in hashed subplan node.

I was able to create reproducible test case on machine with default config
and postgresql 18beta3:

CREATE TABLE test1(
    a numeric,
    b int);
INSERT INTO
    test1
SELECT
    i,
    i
FROM
    generate_series(1, 30000000) i; -- Make the running time longer
EXPLAIN ANALYZE SELECT
    *
FROM
    test1
WHERE
    a NOT IN(
        SELECT
            i
        FROM
            generate_series(1, 10000) i
);

plan:
                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on test1  (cost=125.00..612432.24 rows=15000108 width=10) (actual
time=135.191..25832.808 rows=29990000 loops=1)
  Filter: (NOT (hashed SubPlan 1))
  Rows Removed by Filter: 10000
  SubPlan 1
    ->  Function Scan on generate_series i  (cost=0.00..100.00 rows=10000
width=4) (actual time=36.999..38.296 rows=10000 loops=1)
Planning Time: 0.280 ms
JIT:
  Functions: 15
  Options: Inlining true, Optimization true, Expressions true, Deforming
true
  Timing: Generation 1.155 ms, Inlining 25.929 ms, Optimization 60.700 ms,
Emission 23.018 ms, Total 110.802 ms
Execution Time: 28217.026 ms
(11 rows)


I observed that the process's RES (resident memory) was increasing rapidly
during SQL execution by using 'top -p <pid>' command.

Furthermore, during SQL execution, I ran 'select
pg_log_backend_memory_contexts(<pid>)'
to print memory context statistics. The context with abnormally high memory
usage was
"Subplan HashTable Temp Context." The key part of the log is as follows:

...
LOG:  level: 5; Subplan HashTable Temp Context: 514834432 total in 62849
blocks; 973712 free (60695 chunks); 513860720 used
LOG:  level: 5; Subplan HashTable Context: 524288 total in 7 blocks; 123808
free (5 chunks); 400480 used
...
Grand total: 518275344 bytes in 63116 blocks; 2025560 free (60976 chunks);
516249784 used
...

If I change the SQL from "a NOT IN" to "b NOT IN" and do the same action, I
can not
observe abnormally high memory usage. Likewise, the key part of the log is
as follows:
...
LOG:  level: 5; Subplan HashTable Temp Context: 1024 total in 1 blocks; 784
free (0 chunks); 240 used
LOG:  level: 5; Subplan HashTable Context: 524288 total in 7 blocks; 123808
free (5 chunks); 400480 used
...
Grand total: 3441936 bytes in 268 blocks; 1050520 free (281 chunks); 2391416
used
...

While analyzing the source code, I found that the hashed subplan node fails
to reset
the 'hashtempcxt' context after probing the hash table for each slot.

When variable-length datatypes (e.g., numeric) are processed, this can
trigger calls
to 'detoast_attr', which allocate memory in hashtempcxt. Without a reset,
this memory
is not reclaimed until the context itself is destroyed, resulting in a
memory leak
when processing large numbers of slots.

A patch implementing this fix will be included in the follow-up email.

--
Thanks,
Haiyang Li


Attachment

pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset
Next
From: Richard Guo
Date:
Subject: Re: BUG #19037: Planner fails on estimating array length with "no relation entry" error