Re: Use BumpContext contexts for TupleHashTables' tablecxt - Mailing list pgsql-hackers

From Chao Li
Subject Re: Use BumpContext contexts for TupleHashTables' tablecxt
Date
Msg-id 9046053E-6FDA-4180-AF65-BDB5B5BE5DD7@gmail.com
Whole thread Raw
In response to Use BumpContext contexts for TupleHashTables' tablecxt  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

> On Oct 27, 2025, at 04:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>
> I looked at the callers of BuildTupleHashTable, and realized that
> (a) every one of them can use a BumpContext for the "tablecxt",
> and (b) Jeff Davis already noticed that for nodeAgg.c, in commit
> cc721c459.  So we have precedent for the idea working.  Here's
> a fleshed-out patch to fix the remaining callers.
>
> regards, tom lane
>
> From 15ef2e50085ac83728cb9189e482964ff02e5aae Mon Sep 17 00:00:00 2001
> From: Tom Lane <tgl@sss.pgh.pa.us>
> Date: Sun, 26 Oct 2025 16:46:57 -0400
> Subject: [PATCH v1] Use BumpContext contexts for TupleHashTables' tablecxt.
>
> execGrouping.c itself does nothing with this context except to
> allocate new hash entries in it, and the callers do nothing with it
> except to reset the whole context.  So this is an ideal use-case for
> a BumpContext, and the hash tables are frequently big enough for the
> savings to be significant.
>

I just did a test. My test procedure is like:

1. Set the following options in postgrs.conf:
```
work_mem = '512MB’
max_parallel_workers = 0
shared_buffers = '1GB’
log_statement_stats = on
```

2. Prepare for some data
```
CREATE TABLE t1 AS
  SELECT g AS id, md5(g::text) AS txt
  FROM generate_series(1, 5e6) g;

CREATE TABLE t2 AS
  SELECT g AS id, md5(g::text) AS txt
  FROM generate_series(1, 5e6) g
  WHERE g % 2 = 0;

VACUUM FREEZE t1;
VACUUM FREEZE t2;
```

3. Run
```
\timing on
EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM t1
  INTERSECT
  SELECT * FROM t2;
```

Here is the test result:

Without the patch:

- stats:
```
2025-10-27 11:04:43.145 CST [88599] LOG:  QUERY STATISTICS
2025-10-27 11:04:43.145 CST [88599] DETAIL:  ! system usage stats:
    !    1.609178 s user, 0.051652 s system, 1.661308 s elapsed
    !    [1.612991 s user, 0.056877 s system total]
    !    601392 kB max resident size
    !    0/0 [0/0] filesystem blocks in/out
    !    1/37016 [1/37621] page faults/reclaims, 0 [0] swaps
    !    0 [0] signals rcvd, 0/0 [2/1] messages rcvd/sent
    !    0/18 [0/44] voluntary/involuntary context switches
```

- SQL execution:
```
evantest=# EXPLAIN (ANALYZE, BUFFERS)
                                                SELECT * FROM t1
                                                                                                INTERSECT

                       SELECT * FROM t2; 
                                                           QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
 HashSetOp Intersect  (cost=174993.73..181243.73 rows=2500000 width=39) (actual time=1500.129..1565.293 rows=2500000.00
loops=1)
   Buffers: shared read=62501
   ->  Seq Scan on t2  (cost=0.00..45834.00 rows=2500000 width=39) (actual time=0.323..68.214 rows=2500000.00 loops=1)
         Buffers: shared read=20834
   ->  Seq Scan on t1  (cost=0.00..91662.15 rows=4999515 width=39) (actual time=0.077..113.602 rows=5000000.00 loops=1)
         Buffers: shared read=41667
 Planning:
   Buffers: shared hit=75 read=25
 Planning Time: 1.800 ms
 Execution Time: 1655.722 ms
(10 rows)

Time: 1661.561 ms (00:01.662)
```

With the patch:

- stats:
```
2025-10-27 11:02:14.656 CST [87387] LOG:  QUERY STATISTICS
2025-10-27 11:02:14.656 CST [87387] DETAIL:  ! system usage stats:
    !    1.625830 s user, 0.046199 s system, 1.672351 s elapsed
    !    [1.630466 s user, 0.052191 s system total]
    !    487264 kB max resident size
    !    0/0 [0/0] filesystem blocks in/out
    !    1/29891 [1/30489] page faults/reclaims, 0 [0] swaps
    !    0 [0] signals rcvd, 0/0 [2/1] messages rcvd/sent
    !    0/11 [0/36] voluntary/involuntary context switches
```

- SQL execution:
```
                                                           QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
 HashSetOp Intersect  (cost=174993.73..181243.73 rows=2500000 width=39) (actual time=1471.164..1602.767 rows=2500000.00
loops=1)
   Buffers: shared read=62501
   ->  Seq Scan on t2  (cost=0.00..45834.00 rows=2500000 width=39) (actual time=0.296..67.955 rows=2500000.00 loops=1)
         Buffers: shared read=20834
   ->  Seq Scan on t1  (cost=0.00..91662.15 rows=4999515 width=39) (actual time=0.088..114.052 rows=5000000.00 loops=1)
         Buffers: shared read=41667
 Planning:
   Buffers: shared hit=75 read=25
 Planning Time: 1.875 ms
 Execution Time: 1666.881 ms
(10 rows)

Time: 1672.574 ms (00:01.673)
```

Looks like:

* No obvious execution time improvement
* Max resident size reduced from 600MB to 487MB, ~19% reduction
* Page reclaims dropped from 37k -> 30k, ~19% reduction

I ran the test several rounds, and the results are consistent: roughly 19% memory usage reduction.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: display hot standby state in psql prompt
Next
From: Shinya Kato
Date:
Subject: Re: Enhance statistics reset functions to return reset timestamp