Re: Better shared data structure management and resizable shared data structures - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Better shared data structure management and resizable shared data structures
Msg-id 5a37c2e3-619d-4816-84d7-0b27e3e6797f@iki.fi
In response to Better shared data structure management and resizable shared data structures  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: Better shared data structure management and resizable shared data structures
List pgsql-hackers
On 13/02/2026 13:47, Ashutosh Bapat wrote:
> `man madvise` has this
>         MADV_REMOVE (since Linux 2.6.16)
>                Free up a given range of pages and its associated backing
>                store.  This is equivalent to punching a hole in the
>                corresponding byte range of the backing store (see
>                fallocate(2)).  Subsequent accesses in the specified
>                address range will see bytes containing zero.
> 
>                The specified address range must be mapped shared and
>                writable.  This flag cannot be applied to locked pages,
>                Huge TLB pages, or VM_PFNMAP pages.
> 
>                In the initial implementation, only tmpfs(5) was supported
>                MADV_REMOVE; but since Linux 3.5, any filesystem which
>                supports the fallocate(2) FALLOC_FL_PUNCH_HOLE mode also
>                supports MADV_REMOVE.  Hugetlbfs fails with the error
>                EINVAL and other filesystems fail with the error EOPNOTSUPP.
> 
> It says the flag can not be applied to Huge TLB pages. We won't be
> able to make resizable shared memory structures allocated with huge
> pages. That seems like a serious restriction.

Per https://man7.org/linux/man-pages/man2/madvise.2.html:

MADV_REMOVE (since Linux 2.6.16)
               ...

               Support for the Huge TLB filesystem was added in Linux
               v4.3.

> I may be misunderstanding something, but it seems like this is useful
> to free already allocated memory, not necessarily allocate more
> memory. I don't understand how a user would start with a larger
> reserved address space with only small portions of that space being
> backed by memory.

Hmm, I guess you'll need to use MAP_NORESERVE in the first mmap() call
to reserve address space for the maximum size, and then
madvise(MADV_POPULATE_WRITE) using the initial size. Later,
madvise(MADV_REMOVE) to shrink, and madvise(MADV_POPULATE_WRITE) to grow
again.

- Heikki
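
The sequence described above can be sketched in C roughly as follows. This
is a minimal illustration, not code from the patch under discussion: the
Region struct and the region_reserve() and region_resize() helpers are
hypothetical names, the mapping is assumed to be anonymous and shared, and
sizes are assumed to be multiples of the page size. MADV_POPULATE_WRITE
needs Linux 5.14 or later, so the fallback #define is there only for older
libc headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* for MAP_ANONYMOUS, MAP_NORESERVE, MADV_REMOVE */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Older libc headers may predate Linux 5.14; 23 is the value used in the
 * kernel's generic uapi headers.
 */
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/* Hypothetical bookkeeping for one resizable region. */
typedef struct
{
    char   *base;       /* start of the reserved address range */
    size_t  max_size;   /* bytes of address space reserved */
    size_t  cur_size;   /* bytes currently populated */
} Region;

/* Reserve address space for max_size bytes; nothing is populated yet. */
int
region_reserve(Region *r, size_t max_size)
{
    void   *p = mmap(NULL, max_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    r->base = p;
    r->max_size = max_size;
    r->cur_size = 0;
    return 0;
}

/*
 * Grow or shrink the populated prefix to new_size bytes.  Returns 0 on
 * success, or -1 with errno set by madvise(), e.g. EINVAL on kernels
 * that do not know MADV_POPULATE_WRITE.
 */
int
region_resize(Region *r, size_t new_size)
{
    int     rc = 0;

    assert(new_size <= r->max_size);
    if (new_size > r->cur_size)
        rc = madvise(r->base + r->cur_size, new_size - r->cur_size,
                     MADV_POPULATE_WRITE);
    else if (new_size < r->cur_size)
        rc = madvise(r->base + new_size, r->cur_size - new_size,
                     MADV_REMOVE);
    if (rc == 0)
        r->cur_size = new_size;
    return rc;
}
```

A caller would reserve the maximum size once with region_reserve(), call
region_resize() with the initial size, and call it again later to shrink
or grow. The sketch uses ordinary pages only; a huge-page variant would
need MAP_HUGETLB in the mmap() call and is not covered here.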


