Re: munmap() failure due to sloppy handling of hugepage size - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: munmap() failure due to sloppy handling of hugepage size
Date
Msg-id CAHyXU0wB7oT58jSzYniy7df7bwQauyt=TVNKvsHvu1eRPSaMDQ@mail.gmail.com
In response to Re: munmap() failure due to sloppy handling of hugepage size  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: munmap() failure due to sloppy handling of hugepage size  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Oct 12, 2016 at 5:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Tom Lane wrote:
>>> According to
>>> https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
>>> looking into /proc/meminfo is the longer-standing API and thus is
>>> likely to work on more kernel versions.  Also, if you look into
>>> /sys then you are going to see multiple possible values and it's
>>> not clear how to choose the right one.
>
>> I'm not sure that this is the best rationale.  In my system there are
>> 2MB and 1GB huge page sizes; in systems with lots of memory (let's say 8
>> GB of shared memory is requested) it seems a clear winner to allocate 8
>> 1GB hugepages than 4096 2MB hugepages because the page table is so much
>> smaller.  The /proc interface only shows the 2MB page size, so if we go
>> that route we'd not be getting the full benefit of the feature.
>
> And you'll tell mmap() which one to do how exactly?  I haven't found
> anything explaining how applications get to choose which page size applies
> to their request.  The kernel document says that /proc/meminfo reflects
> the "default" size, and I'd assume that that's what we'll get from mmap.

Hm.  For (recent) Linux, I see:
       MAP_HUGE_2MB, MAP_HUGE_1GB (since Linux 3.8)
              Used in conjunction with MAP_HUGETLB to select alternative
              hugetlb page sizes (respectively, 2 MB and 1 GB) on systems
              that support multiple hugetlb page sizes.

              More generally, the desired huge page size can be configured
              by encoding the base-2 logarithm of the desired page size in
              the six bits at the offset MAP_HUGE_SHIFT.  (A value of zero
              in this bitfield provides the default huge page size; the
              default huge page size can be discovered via the Hugepagesize
              field exposed by /proc/meminfo.)  Thus, the above two
              constants are defined as:

                  #define MAP_HUGE_2MB    (21 << MAP_HUGE_SHIFT)
                  #define MAP_HUGE_1GB    (30 << MAP_HUGE_SHIFT)

              The range of huge page sizes that are supported by the system
              can be discovered by listing the subdirectories in
              /sys/kernel/mm/hugepages.


via: http://man7.org/linux/man-pages/man2/mmap.2.html#NOTES
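
To make the encoding in that excerpt concrete, here is a minimal sketch of turning a page size into the flag bits.  The helper name huge_page_size_flag is made up for illustration, and the fallback value 26 for MAP_HUGE_SHIFT is an assumption taken from the Linux UAPI headers, used only when the system headers do not define it.  The result is unsigned because larger sizes (16 GB would be 34) overflow a signed int once shifted.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc: expose MAP_HUGE_SHIFT if available */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26       /* assumption: value from the Linux UAPI headers */
#endif

/*
 * Hypothetical helper: encode a power-of-two huge page size (in bytes) as
 * the bits to OR into mmap() flags alongside MAP_HUGETLB.
 */
static unsigned int
huge_page_size_flag(size_t bytes)
{
    unsigned int shift = 0;

    /* The encoding stores a base-2 logarithm, so only powers of two fit. */
    assert(bytes != 0 && (bytes & (bytes - 1)) == 0);

    while ((bytes >> shift) != 1)
        shift++;

    return shift << MAP_HUGE_SHIFT;
}
```

A caller would presumably pass MAP_HUGETLB | huge_page_size_flag(size) in the mmap() flags; whether the kernel honors a given size still depends on what the system has configured.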

ISTM all this silliness is pretty much unique to Linux anyway.
Instead of reading the filesystem, what about doing a test map and a test
unmap?  I think we could zero in on the default page size by probing
known possible values.
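
A rough sketch of what such probing might look like, not a tested proposal.  It relies on the behavior this thread is about: munmap() on a hugetlb mapping fails unless the length is a multiple of the huge page size.  The function name and the candidate list are invented for illustration.  One caveat: if the real default size is missing from the list, a larger candidate could unmap more address space than was mapped, so the list would need to be exhaustive before anything like this were used for real.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc: expose MAP_HUGETLB, MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, not PostgreSQL code.  Returns the probed default
 * huge page size in bytes, or 0 if nothing could be determined (no huge
 * page support, empty pool, or a size missing from the candidate list).
 */
static size_t
probe_default_huge_page_size(void)
{
#if defined(MAP_HUGETLB) && defined(MAP_ANONYMOUS)
    /* Assumption: illustrative ascending list; real sizes vary by platform. */
    static const size_t candidates[] = {
        (size_t) 64 << 10, (size_t) 512 << 10, (size_t) 1 << 20,
        (size_t) 2 << 20, (size_t) 4 << 20, (size_t) 8 << 20,
        (size_t) 16 << 20, (size_t) 32 << 20, (size_t) 256 << 20,
        (size_t) 512 << 20, (size_t) 1 << 30
    };
    void       *ptr = MAP_FAILED;
    size_t      i;

    for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++)
    {
        if (ptr == MAP_FAILED)
        {
            /* Some kernels reject lengths that are not a huge page multiple. */
            ptr = mmap(NULL, candidates[i], PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
            if (ptr == MAP_FAILED)
                continue;
        }

        /* munmap() keeps failing until the length is a huge page multiple. */
        if (munmap(ptr, candidates[i]) == 0)
            return candidates[i];
    }
    /* If ptr is still mapped here it is leaked, since its size is unknown. */
#endif
    return 0;
}
```

A result of 0 does not distinguish "no huge pages configured" from "no huge page support at all", so a caller would still need a fallback such as the Hugepagesize field in /proc/meminfo.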

merlin


