Re: Fix overflow of nbatch - Mailing list pgsql-hackers

From: Chao Li
Subject: Re: Fix overflow of nbatch
Msg-id: B9E29213-1A66-4969-B637-83F18AE1CB76@gmail.com
In response to: Re: Fix overflow of nbatch (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: Fix overflow of nbatch
List: pgsql-hackers


On Sep 23, 2025, at 07:35, David Rowley <dgrowleyml@gmail.com> wrote:

> On Tue, 23 Sept 2025 at 11:21, Chao Li <li.evan.chao@gmail.com> wrote:
>> I guess that because earlier in the function, nbatch is always clamped with:
>>
>> nbatch = pg_nextpower2_32(Max(2, minbatch));
>
> I don't follow which part of that line could be constituted as
> clamping. Maybe you've confused Max with Min?
>
> David

Sorry for the confusion. I actually meant “minbatch”.

I remember tracing through the function several times. First, with a normal query (not much data involved), the condition

if (inner_rel_bytes + bucket_bytes > hash_table_bytes)
{

is hard to meet, so nbatch stays at just 1.

With a lot of data involved, execution does enter the “if” branch, but even then minbatch rarely gets very high.
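
To make that concrete, here is a minimal sketch of how I read that part of ExecChooseHashTableSize in nodeHash.c. The real function also handles bucket sizing, the skew optimization and pointer-array limits, which I have left out; choose_nbatch_sketch is only an illustrative wrapper, not a real function:

```
#include "postgres.h"
#include "port/pg_bitutils.h"	/* pg_nextpower2_32() */
#include <math.h>				/* ceil() */

/*
 * Illustrative sketch only: roughly how nbatch is derived from the
 * estimated inner relation size versus the allowed hash table size.
 */
static int
choose_nbatch_sketch(double inner_rel_bytes, double bucket_bytes,
					 double hash_table_bytes)
{
	int			nbatch = 1;		/* the common case: everything fits */

	if (inner_rel_bytes + bucket_bytes > hash_table_bytes)
	{
		double		dbatch;
		int			minbatch;

		/* roughly how many batches it takes to fit within hash_table_bytes */
		dbatch = ceil(inner_rel_bytes / hash_table_bytes);
		minbatch = (int) dbatch;

		/* round up to a power of two, never less than 2 */
		nbatch = pg_nextpower2_32(Max(2, minbatch));
	}

	return nbatch;
}
```

So unless the estimated inner side is many times larger than the allowed hash table size, nbatch stays small.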

To clarify, I just created a test:

```
evantest=# SET enable_nestloop = off;
SET
evantest=# SET enable_mergejoin = off;
SET
evantest=# SET enable_hashjoin = on;
SET
evantest=# CREATE TEMP TABLE inner_tbl AS
evantest-# SELECT g AS id, repeat('x', 2000) AS filler
evantest-# FROM generate_series(1, 200000) g;
SELECT 200000
evantest=# CREATE TEMP TABLE outer_tbl AS
evantest-# SELECT g AS id FROM generate_series(1, 1000000) g;
SELECT 1000000
evantest=#
evantest=#
evantest=# EXPLAIN ANALYZE
evantest-# SELECT *
evantest-# FROM outer_tbl o
evantest-# JOIN inner_tbl i
evantest-#   ON o.id = i.id;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=34647.00..1270978635.52 rows=36306020352 width=40) (actual time=353.908..1355.735 rows=200000.00 loops=1)
   Hash Cond: (i.id = o.id)
   Buffers: local read=54528 dirtied=54425 written=54418, temp read=45853 written=45853
   ->  Seq Scan on inner_tbl i  (cost=0.00..113608.96 rows=6356096 width=36) (actual time=1.132..460.711 rows=200000.00 loops=1)
         Buffers: local read=50048 dirtied=50000 written=49993
   ->  Hash  (cost=15904.00..15904.00 rows=1142400 width=4) (actual time=351.280..351.282 rows=1000000.00 loops=1)
         Buckets: 262144  Batches: 8  Memory Usage: 6446kB
         Buffers: local read=4480 dirtied=4425 written=4425, temp written=2560
         ->  Seq Scan on outer_tbl o  (cost=0.00..15904.00 rows=1142400 width=4) (actual time=0.760..162.229 rows=1000000.00 loops=1)
               Buffers: local read=4480 dirtied=4425 written=4425
 Planning:
   Buffers: shared hit=14
 Planning Time: 389649.420 ms
 Execution Time: 1362.392 ms
(14 rows)
```

In this test, minbatch is just 64.

But I agree, I have never tested with a large amount of data. I don’t actually know how much data it takes for nbatch to reach ~130K (the value that leads to overflow if nbatch is of int type).
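
As a rough back-of-envelope (my own arithmetic, not something I measured; it assumes minbatch is roughly inner_rel_bytes / hash_table_bytes and that the hash table is limited to, say, 4MB):

```
minbatch        ≈ inner_rel_bytes / hash_table_bytes
131072          ≈ inner_rel_bytes / 4 MB
inner_rel_bytes ≈ 131072 * 4 MB = 512 GB
```

So it looks like the estimated inner relation has to be hundreds of gigabytes (relative to the hash memory limit) before nbatch gets anywhere near that range, which is why a small test like mine never hits it.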

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/



