Re: [PATCH] Use MAP_HUGETLB where supported (v3) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: [PATCH] Use MAP_HUGETLB where supported (v3)
Date
Msg-id 52861EEC.2090702@vmware.com
In response to Re: [PATCH] Use MAP_HUGETLB where supported (v3)  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: [PATCH] Use MAP_HUGETLB where supported (v3)  (Sameer Kumar <sameer.kumar@ashnik.com>)
Re: [PATCH] Use MAP_HUGETLB where supported (v3)  (Abhijit Menon-Sen <ams@2ndQuadrant.com>)
Re: [PATCH] Use MAP_HUGETLB where supported (v3)  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: [PATCH] Use MAP_HUGETLB where supported (v3)  (Christian Kruse <christian@2ndQuadrant.com>)
List pgsql-hackers
On 30.10.2013 19:11, Andres Freund wrote:
> On 2013-10-30 22:39:20 +0530, Abhijit Menon-Sen wrote:
>> At 2013-10-30 11:04:36 -0400, tgl@sss.pgh.pa.us wrote:
>>>
>>>> As a compromise, perhaps we can unconditionally round the size up to be
>>>> a multiple of 2MB? […]
>>>
>>> That sounds reasonably painless to me.
>>
>> Here's a patch that does that and adds a DEBUG1 log message when we try
>> with MAP_HUGETLB and fail and fallback to ordinary mmap.
>
> But it's in no way guaranteed that the smallest hugepage size is
> 2MB. It'll be on current x86 hardware, but not on any other platform...

Sure, but there's no big harm done. We're just trying to avoid hitting a
kernel bug, and as a bonus, we avoid wasting some memory that would
otherwise be lost due to the kernel rounding the allocation. If the
smallest hugepage size is smaller than 2MB, we round up the allocation
unnecessarily, but that doesn't seem serious.
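As a rough illustration of the rounding discussed above, here is a minimal sketch. It assumes a 2MB huge page size (true on current x86, not guaranteed elsewhere); the macro and function names are made up for this example and are not taken from the patch.

```c
#include <stddef.h>

/* Assumption: 2MB huge pages, as on current x86 hardware. */
#define ASSUMED_HUGE_PAGE_SIZE ((size_t) 2 * 1024 * 1024)

/*
 * Round size up to the next multiple of ASSUMED_HUGE_PAGE_SIZE.
 * The mask trick is valid only because the page size is a power of two.
 * Hypothetical helper, not the function used in the patch.
 */
static size_t
round_up_to_huge_page(size_t size)
{
    return (size + ASSUMED_HUGE_PAGE_SIZE - 1) & ~(ASSUMED_HUGE_PAGE_SIZE - 1);
}
```

If the real huge page size is smaller than 2MB, the result is still a valid multiple of it whenever that size is a power of two that divides 2MB; the allocation is just larger than strictly needed.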


I spent some time whacking this around; a new patch version is attached. I
moved the mmap() code into a new function, which leaves
PGSharedMemoryCreate more readable.

I modified the patch so that it throws an error if you set
huge_tlb_pages=on and the platform doesn't support MAP_HUGETLB (i.e.
non-Linux, or EXEC_BACKEND). 'try' is the default, so this only affects
you if you explicitly set it to 'on'. I think that's the right behavior;
if you explicitly ask for it, and you don't get it, that should be an
error. But I'm not wedded to the idea if someone objects; a log message
might also be reasonable: "LOG: huge TLB pages are not supported on this
platform, but huge_tlb_pages was 'on'"

The error message on failed allocation, if huge_tlb_pages=on, needs
updating:

$ bin/postmaster -D data
FATAL:  could not map anonymous shared memory: Cannot allocate memory
HINT:  This error usually means that PostgreSQL's request for a shared
memory segment exceeded available memory or swap space. To reduce the
request size (currently 189390848 bytes), reduce PostgreSQL's shared
memory usage, perhaps by reducing shared_buffers or max_connections.

The allocation failed in this case because I had set
huge_tlb_pages=on but had not configured the kernel for huge pages. The
hint is quite misleading in that case; it should advise configuring the
kernel or turning off huge_tlb_pages.
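One way to pick a more fitting hint is to branch on whether the failed mmap() was a huge page request. The sketch below only shows the idea; the function name and the wording of both hints are hypothetical and would need proper review before going anywhere near the real error message.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Choose a hint for a failed anonymous mmap().
 * Hypothetical helper with placeholder wording, not the patch's text.
 */
static const char *
mmap_failure_hint(int saved_errno, bool huge_pages_requested)
{
    /* Only out-of-memory failures get a hint in this sketch. */
    if (saved_errno != ENOMEM)
        return "";

    if (huge_pages_requested)
        return "Configure enough huge pages in the kernel, "
               "or turn off huge_tlb_pages.";

    return "Reduce PostgreSQL's shared memory usage, perhaps by "
           "reducing shared_buffers or max_connections.";
}
```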

The documentation needs some work. I think it's pretty user-unfriendly
to link to https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt.
It gives a lot of details, and although it explains stuff that is
relevant, like setting the nr_hugepages sysctl, it also contains a lot
of stuff that is not relevant to us, like how to mount hugetlbfs. Can we
do better than that? Is there a better guide somewhere on how to set the
kernel settings? If not, we should include step-by-step instructions in
our manual.
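A central step in any such instructions would be working out how many huge pages to reserve with the nr_hugepages sysctl. A minimal sketch of that arithmetic follows, assuming a 2MB huge page size (the real value should be checked on the target system, for example via Hugepagesize in /proc/meminfo); the function name is hypothetical.

```c
#include <stddef.h>

/*
 * Number of huge pages needed to cover request_bytes, rounding up.
 * huge_page_bytes must be nonzero. Hypothetical helper for illustration.
 */
static size_t
huge_pages_needed(size_t request_bytes, size_t huge_page_bytes)
{
    return (request_bytes + huge_page_bytes - 1) / huge_page_bytes;
}
```

For the 189390848-byte request shown in the error message above and 2MB pages, this gives 91, so nr_hugepages would have to be at least that much, plus whatever other huge page users on the machine need.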

The "Managing Kernel Resources" section in the user manual should also
be updated to mention how to enable huge pages.

Also, now that I changed huge_tlb_pages='on' to fail on platforms where
it's not supported at all, the docs need to be updated to reflect it.

- Heikki
