Thread: pgsql: Improve accounting for memory used by shared hash tables

pgsql: Improve accounting for memory used by shared hash tables

From
Tomas Vondra
Date:
Improve accounting for memory used by shared hash tables

pg_shmem_allocations tracks the memory allocated by ShmemInitStruct(),
but for shared hash tables that covered only the header and hash
directory.  The remaining parts (segments and buckets) were allocated
later using ShmemAlloc(), which does not update the shmem accounting.
Thus, these allocations were not shown in pg_shmem_allocations.

This commit improves the situation by allocating all the hash table
parts at once, using a single ShmemInitStruct() call. This way the
ShmemIndex entries (and thus pg_shmem_allocations) better reflect the
proper size of the hash table.

This affects allocations for private (non-shared) hash tables too, as
the hash_create() code is shared. For non-shared tables this however
makes no practical difference.

This changes the alignment a bit. ShmemAlloc() aligns the chunks using
CACHELINEALIGN(), which means some parts (header, directory, segments)
were aligned this way. Allocating all parts as a single chunk removes
this (implicit) alignment. We've considered adding explicit alignment,
but we've decided not to - it seems to be merely a coincidence due to
using the ShmemAlloc() API, not due to necessity.

Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAH2L28vHzRankszhqz7deXURxKncxfirnuW68zD7+hVAqaS5GQ@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/f5930f9a98ea65d659d41600a138e608988ad122

Modified Files
--------------
src/backend/storage/ipc/shmem.c   |   4 +-
src/backend/utils/hash/dynahash.c | 281 +++++++++++++++++++++++++++++---------
src/include/utils/hsearch.h       |   3 +-
3 files changed, 222 insertions(+), 66 deletions(-)


Re: pgsql: Improve accounting for memory used by shared hash tables

From
David Rowley
Date:
On Thu, 3 Apr 2025 at 04:16, Tomas Vondra <tomas.vondra@postgresql.org> wrote:
> Improve accounting for memory used by shared hash tables

I've not looked into why, but this is causing an issue in the
join_rel_hash during add_join_rel(). See the attached script.

ERROR:  pfree called with invalid pointer 0x60a15edc44e0 (header
0x0000002000000008)

bisecting found this commit.

#0  BogusFree (pointer=0x5b36cd85a5e8) at ../src/backend/utils/mmgr/mcxt.c:288
        __func__ = "BogusFree"
#1  0x00005b36c3d70e74 in dir_realloc (hashp=0x5b36cd4c0aa0) at
../src/backend/utils/hash/dynahash.c:1787
        new_dirsize = <optimized out>
        p = 0x5b36cd4c0b30
        old_p = 0x5b36cd85a5e8
        new_dsize = <optimized out>
        old_dirsize = <optimized out>
        p = <optimized out>
        old_p = <optimized out>
        new_dsize = <optimized out>
        old_dirsize = <optimized out>
        new_dirsize = <optimized out>
        _vstart = <optimized out>
        _val = <optimized out>
        _len = <optimized out>
        _start = <optimized out>
        _stop = <optimized out>
#2  expand_table (hashp=0x5b36cd4c0aa0) at
../src/backend/utils/hash/dynahash.c:1691
        hctl = 0x5b36cd85a298
        old_bucket = <optimized out>
        newlink = <optimized out>
        nextElement = <optimized out>
        old_seg = <optimized out>
        new_bucket = 65536
        new_segnum = 256
        new_segndx = 0
        currElement = <optimized out>
        new_seg = <optimized out>
        old_segnum = <optimized out>
        old_segndx = <optimized out>
        oldlink = <optimized out>
        hctl = <optimized out>
        old_seg = <optimized out>
        new_seg = <optimized out>
        old_bucket = <optimized out>
        new_bucket = <optimized out>
        new_segnum = <optimized out>
        new_segndx = <optimized out>
        old_segnum = <optimized out>
        old_segndx = <optimized out>
        oldlink = <optimized out>
        newlink = <optimized out>
        currElement = <optimized out>
        nextElement = <optimized out>
#3  hash_search_with_hash_value (hashp=0x5b36cd4c0aa0,
keyPtr=0x5b3716ccc6f0, hashvalue=2952019273, action=<optimized out>,
foundPtr=<optimized out>) at ../src/backend/utils/hash/dynahash.c:1112
        hctl = 0x5b36cd85a298
        freelist_idx = 0
        keysize = <optimized out>
        currBucket = <optimized out>
        prevBucketPtr = <optimized out>
        match = <optimized out>
        __func__ = "hash_search_with_hash_value"
#4  0x00005b36c3d71031 in hash_search (hashp=<optimized out>,
keyPtr=<optimized out>, action=<optimized out>, foundPtr=<optimized
out>) at ../src/backend/utils/hash/dynahash.c:1069
No locals.
#5  0x00005b36c3b195b3 in add_join_rel (root=0x5b36cd420d28,
joinrel=0x5b3716ccc6e8) at ../src/backend/optimizer/util/relnode.c:638
        hentry = <optimized out>
        found = false

David

Attachment

Re: pgsql: Improve accounting for memory used by shared hash tables

From
Tomas Vondra
Date:
On 4/4/25 00:57, David Rowley wrote:
> On Thu, 3 Apr 2025 at 04:16, Tomas Vondra <tomas.vondra@postgresql.org> wrote:
>> Improve accounting for memory used by shared hash tables
> 
> I've not looked into why, but this is causing an issue in the
> join_rel_hash during add_join_rel(). See the attached script.
> 
> ERROR:  pfree called with invalid pointer 0x60a15edc44e0 (header
> 0x0000002000000008)
> 

Thanks for the report and reproducer. I'll take a look tomorrow.

-- 
Tomas Vondra




Re: pgsql: Improve accounting for memory used by shared hash tables

From
Tomas Vondra
Date:
On 4/4/25 01:43, Tomas Vondra wrote:
> On 4/4/25 00:57, David Rowley wrote:
>> On Thu, 3 Apr 2025 at 04:16, Tomas Vondra <tomas.vondra@postgresql.org> wrote:
>>> Improve accounting for memory used by shared hash tables
>>
>> I've not looked into why, but this is causing an issue in the
>> join_rel_hash during add_join_rel(). See the attached script.
>>
>> ERROR:  pfree called with invalid pointer 0x60a15edc44e0 (header
>> 0x0000002000000008)
>>
> 
> Thanks for the report and reproducer. I'll take a look tomorrow.
> 

I took a quick look, and I think the reason is fairly simple - the
commit allocates the header and the directory as a single chunk. And for
shared hash tables that's fine, because those have non-expandable
directory. But the patch does the same thing for non-shared hash tables
(not intentionally), which means that if we end up expanding the hash,
it fails in dir_realloc(). Because hashp->dir is not a separately
allocated chunk.

This is clearly a bug in the patch, I should have caught this during a
review. But I'm also quite surprised none of the regression tests seems
to expand the hash table ...

I'll think about a way to fix this tomorrow.

-- 
Tomas Vondra




Re: pgsql: Improve accounting for memory used by shared hash tables

From
Tomas Vondra
Date:
On 4/4/25 02:41, Tomas Vondra wrote:
> On 4/4/25 01:43, Tomas Vondra wrote:
>> On 4/4/25 00:57, David Rowley wrote:
>>> On Thu, 3 Apr 2025 at 04:16, Tomas Vondra <tomas.vondra@postgresql.org> wrote:
>>>> Improve accounting for memory used by shared hash tables
>>>
>>> I've not looked into why, but this is causing an issue in the
>>> join_rel_hash during add_join_rel(). See the attached script.
>>>
>>> ERROR:  pfree called with invalid pointer 0x60a15edc44e0 (header
>>> 0x0000002000000008)
>>>
>>
>> Thanks for the report and reproducer. I'll take a look tomorrow.
>>
> 
> I took a quick look, and I think the reason is fairly simple - the
> commit allocates the header and the directory as a single chunk. And for
> shared hash tables that's fine, because those have non-expandable
> directory. But the patch does the same thing for non-shared hash tables
> (not intentionally), which means that if we end up expanding the hash,
> it fails in dir_realloc(). Because hashp->dir is not a separately
> allocated chunk.
> 
> This is clearly a bug in the patch, I should have caught this during a
> review. But I'm also quite surprised none of the regression tests seems
> to expand the hash table ...
> 
> I'll think about a way to fix this tomorrow.
> 

I ended up reverting this. Unfortunately, the patch assumed the
directory is pre-allocated and not expanding in more places. I wasn't
sure how long would it take me to fix this, or how invasive the fix
would be. It seems more appropriate to revert and then maybe apply a
reworked patch (not going to happen for PG18).

Thanks for the report, sorry for missing the issue in the first place.

-- 
Tomas Vondra