Re: patch: add MAP_HUGETLB to mmap() where supported (WIP) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: patch: add MAP_HUGETLB to mmap() where supported (WIP)
Date
Msg-id 52370415.6060108@vmware.com
In response to Re: patch: add MAP_HUGETLB to mmap() where supported (WIP)  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On 16.09.2013 13:15, Andres Freund wrote:
> On 2013-09-16 11:15:28 +0300, Heikki Linnakangas wrote:
>> On 14.09.2013 02:41, Richard Poole wrote:
>>> The attached patch adds the MAP_HUGETLB flag to mmap() for shared memory
>>> on systems that support it. It's based on Christian Kruse's patch from
>>> last year, incorporating suggestions from Andres Freund.
>>
>> I don't understand the logic in figuring out the pagesize, and the smallest
>> supported hugepage size. First of all, even without the patch, why do we
>> round up the size passed to mmap() to the _SC_PAGE_SIZE? Surely the kernel
>> will round up the request all by itself. The mmap() man page doesn't say
>> anything about length having to be a multiple of the page size.
>
> I think it does:
>         EINVAL We don't like addr, length, or offset (e.g., they are  too
>                large,  or  not aligned on a page boundary).

That doesn't mean that they *all* have to be aligned on a page boundary.
It's understandable that 'addr' and 'offset' have to be, but it doesn't
make much sense for 'length'.

> and
>         A file is mapped in multiples of the page size.  For a file that is not a multiple
>         of  the  page size, the remaining memory is zeroed when mapped, and writes to that
>         region are not written out to the file.  The effect of changing the  size  of  the
>         underlying  file  of  a  mapping  on the pages that correspond to added or removed
>         regions of the file is unspecified.
>
> And no, according to my past experience, the kernel does *not* do any
> such rounding up. It will just fail.

I wrote a little test program to play with different values (attached).
I tried this on my laptop with a 3.2 kernel (uname -r: 3.10-2-amd64), and
on a VM with a fresh CentOS 6.4 install with a 2.6.32 kernel
(2.6.32-358.18.1.el6.x86_64), and they both work the same:

$ ./mmaptest 100 # mmap 100 bytes

in a different terminal:
$ cat /proc/meminfo  | grep HugePages_Rsvd
HugePages_Rsvd:        1

So even a tiny allocation, much smaller than any page size, succeeds,
and it reserves a huge page. I tried the same with larger values; the
kernel always uses huge pages, and rounds up the allocation to a
multiple of the huge page size.
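
The attached test program is not reproduced here. As a rough sketch of
the kind of call being tested (not the actual attachment), something
like the following could be used. The helper name map_shared_anon is
hypothetical, and the sketch assumes Linux, where MAP_ANONYMOUS and
MAP_HUGETLB are available:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS / MAP_HUGETLB on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, not the attached program: map `length` bytes of
 * anonymous shared memory, optionally requesting huge pages.
 * Returns NULL on failure, with errno left as set by mmap().
 */
void *
map_shared_anon(size_t length, int use_huge)
{
    int         flags = MAP_SHARED | MAP_ANONYMOUS;
    void       *p;

#ifdef MAP_HUGETLB
    if (use_huge)
        flags |= MAP_HUGETLB;
#endif

    /* length is passed through as-is, deliberately without any rounding */
    p = mmap(NULL, length, PROT_READ | PROT_WRITE, flags, -1, 0);

    return (p == MAP_FAILED) ? NULL : p;
}
```

To repeat the experiment, a small main() would call
map_shared_anon(100, 1) and then sleep or pause so that HugePages_Rsvd
can be inspected from another terminal. The huge page request can only
succeed if huge pages have been reserved on the system beforehand.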

So, let's just get rid of the /sys scanning code.

Robert, do you remember why you put the "pagesize =
sysconf(_SC_PAGE_SIZE);" call in the new mmap() shared memory allocator?
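
For reference, the kind of rounding being asked about can be sketched
as below. This is an illustration only, not a copy of the allocator
code; the function name round_up_to_page is hypothetical:

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Illustrative sketch: round `size` up to the next multiple of the
 * system page size as reported by sysconf(_SC_PAGE_SIZE).
 * If the page size cannot be determined, `size` is returned unchanged.
 */
size_t
round_up_to_page(size_t size)
{
    long        pagesize = sysconf(_SC_PAGE_SIZE);

    if (pagesize <= 0)
        return size;

    if (size % (size_t) pagesize != 0)
        size += (size_t) pagesize - (size % (size_t) pagesize);

    return size;
}
```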

- Heikki
