Re: patch: add MAP_HUGETLB to mmap() where supported (WIP) - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: patch: add MAP_HUGETLB to mmap() where supported (WIP) |
Date | |
Msg-id | 20130916131850.GB5249@awork2.anarazel.de |
In response to | Re: patch: add MAP_HUGETLB to mmap() where supported (WIP) (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses |
Re: patch: add MAP_HUGETLB to mmap() where supported (WIP)
|
List | pgsql-hackers |
On 2013-09-16 16:13:57 +0300, Heikki Linnakangas wrote:
> On 16.09.2013 13:15, Andres Freund wrote:
> >On 2013-09-16 11:15:28 +0300, Heikki Linnakangas wrote:
> >>On 14.09.2013 02:41, Richard Poole wrote:
> >>>The attached patch adds the MAP_HUGETLB flag to mmap() for shared memory
> >>>on systems that support it. It's based on Christian Kruse's patch from
> >>>last year, incorporating suggestions from Andres Freund.
> >>
> >>I don't understand the logic in figuring out the pagesize, and the smallest
> >>supported hugepage size. First of all, even without the patch, why do we
> >>round up the size passed to mmap() to the _SC_PAGE_SIZE? Surely the kernel
> >>will round up the request all by itself. The mmap() man page doesn't say
> >>anything about length having to be a multiple of pages size.
> >
> >I think it does:
> >    EINVAL We don't like addr, length, or offset (e.g., they are too
> >           large, or not aligned on a page boundary).
>
> That doesn't mean that they *all* have to be aligned on a page boundary.
> It's understandable that 'addr' and 'offset' have to be, but it doesn't make
> much sense for 'length'.
>
> >and
> >    A file is mapped in multiples of the page size. For a file that is not a multiple
> >    of the page size, the remaining memory is zeroed when mapped, and writes to that
> >    region are not written out to the file. The effect of changing the size of the
> >    underlying file of a mapping on the pages that correspond to added or removed
> >    regions of the file is unspecified.
> >
> >And no, according to my past experience, the kernel does *not* do any
> >such rounding up. It will just fail.
>
> I wrote a little test program to play with different values (attached). I
> tried this on my laptop with a 3.2 kernel (uname -r: 3.10-2-amd6), and on a
> VM with a fresh Centos 6.4 install with 2.6.32 kernel
> (2.6.32-358.18.1.el6.x86_64), and they both work the same:
>
> $ ./mmaptest 100 # mmap 100 bytes
>
> in a different terminal:
> $ cat /proc/meminfo | grep HugePages_Rsvd
> HugePages_Rsvd: 1
>
> So even a tiny allocation, much smaller than any page size, succeeds, and it
> reserves a huge page. I tried the same with larger values; the kernel always
> uses huge pages, and rounds up the allocation to a multiple of the huge page
> size.

When developing the prototype I am pretty sure I had to add the rounding
up - but I am not sure why now, because after chatting with Heikki about
it, I've looked around and the initial MAP_HUGETLB support in the kernel
(commit 4e52780d41a741fb4861ae1df2413dd816ec11b1) has support for
rounding up.

> So, let's just get rid of the /sys scanning code.

Alternatively we could round up NBuffers to actually use the
additionally allocated space. Not sure if that's worth the amount of
code, but wasting several megabytes - or even gigabytes - of memory
isn't nice either.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
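
To put rough numbers on the "round up NBuffers" idea, here is a minimal sketch of the arithmetic. It is not code from the patch. The helper names, the 2 MB huge page size and the 8 kB block size are assumptions for illustration only; a real implementation would have to find out the huge page size at run time, account for the per-buffer overhead outside the buffer blocks themselves, and guard against overflow for requests near SIZE_MAX.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration; neither is read from the running system. */
#define EXAMPLE_HUGE_PAGE_SIZE ((size_t) 2 * 1024 * 1024)   /* 2 MB */
#define EXAMPLE_BLCKSZ         ((size_t) 8192)              /* 8 kB */

/*
 * Hypothetical helper: round size up to the next multiple of page_size.
 * Overflow for sizes close to SIZE_MAX is not handled in this sketch.
 */
static size_t
round_up_to_page(size_t size, size_t page_size)
{
    assert(page_size > 0);

    size_t remainder = size % page_size;

    return remainder == 0 ? size : size + (page_size - remainder);
}

/*
 * Hypothetical helper: number of additional whole blocks that fit into the
 * slack between the requested size and the rounded-up mapping size.
 * This ignores any per-buffer bookkeeping stored outside the blocks.
 */
static size_t
extra_buffers_in_slack(size_t request, size_t page_size, size_t block_size)
{
    assert(block_size > 0);

    return (round_up_to_page(request, page_size) - request) / block_size;
}
```

With these assumed sizes, the 100 byte request from the test above occupies one full 2 MB huge page, and extra_buffers_in_slack(100, EXAMPLE_HUGE_PAGE_SIZE, EXAMPLE_BLCKSZ) reports room for 255 more 8 kB blocks in the otherwise unused tail.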