Re: munmap() failure due to sloppy handling of hugepage size - Mailing list pgsql-hackers

From Andres Freund
Subject Re: munmap() failure due to sloppy handling of hugepage size
Date
Msg-id 25C01331-B1AE-45A6-BD1F-D8AE2DE40F86@anarazel.de
In response to munmap() failure due to sloppy handling of hugepage size  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: munmap() failure due to sloppy handling of hugepage size
List pgsql-hackers

On October 12, 2016 1:25:54 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>If any of you were following the thread at
>https://www.postgresql.org/message-id/flat/CAOan6TnQeSGcu_627NXQ2Z%2BWyhUzBjhERBm5RN9D0QFWmk7PoQ%40mail.gmail.com
>I spent quite a bit of time following a bogus theory, but the problem
>turns out to be very simple: on Linux, munmap() is pickier than mmap()
>about the length of a hugepage allocation.  The comments in sysv_shmem.c
>mention that on older kernels mmap() with MAP_HUGETLB will fail if given
>a length request that's not a multiple of the hugepage size.  Well, the
>behavior they replaced that with is little better: mmap() succeeds, but
>it gives you back a region that's been silently enlarged to the next
>hugepage boundary, and then munmap() will fail if you specify the region
>size you asked for rather than the region size you were given.
>
>Since AFAICS there is no way to inquire what region size you were given,
>this API is astonishingly brain-dead IMO.  But that seems to be what
>we've got.  Chris Richards reported it against a 3.16.7 kernel, and
>I can replicate the behavior on RHEL6 (2.6.32) by asking for an odd-size
>huge page region.
>
>We've mostly masked this by rounding up to a 2MB boundary, which is what
>the hugepage size typically is.  But that assumption is wrong on some
>hardware, and it's not likely to get less wrong as time passes.
>
>A little bit of research suggests that on Linux the thing to do would be
>to get the actual default hugepage size by reading /proc/meminfo and
>looking for a line like "Hugepagesize:       2048 kB".  I don't know
>of any more-portable API, so this does nothing for non-Linux kernels.
>But we have not heard of similar misbehavior on other platforms, even
>though IA64 and PPC64 can both have hugepages larger than 2MB, so it's
>reasonable to hope that other implementations of munmap() don't have
>the same gotcha.
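
For illustration only, the approach described in the quoted text could look roughly like the sketch below: parse the "Hugepagesize:" line out of /proc/meminfo, then round the request up to a multiple of that size so that the same length can be passed to both mmap() and munmap(). The function names (parse_hugepagesize, read_default_hugepagesize, round_up_to_hugepage) are hypothetical and are not taken from sysv_shmem.c; the sketch also assumes the value is always reported in kB, as in the example line above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the default huge page size, in bytes, from
 * text in /proc/meminfo format.  Returns 0 if no "Hugepagesize:" line with
 * a kB unit is found.
 */
size_t
parse_hugepagesize(const char *meminfo_text)
{
	const char *line = meminfo_text;

	while (line != NULL && *line != '\0')
	{
		unsigned long kb;
		char		unit[3];

		/* Matches e.g. "Hugepagesize:       2048 kB" at start of line */
		if (sscanf(line, "Hugepagesize: %lu %2s", &kb, unit) == 2 &&
			strcmp(unit, "kB") == 0)
			return (size_t) kb * 1024;

		/* Advance to the start of the next line, if any */
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return 0;
}

/*
 * Hypothetical helper: read /proc/meminfo line by line.  Returns 0 if the
 * file cannot be opened or has no usable Hugepagesize line (for instance on
 * a non-Linux kernel).
 */
size_t
read_default_hugepagesize(void)
{
	char		buf[128];
	size_t		result = 0;
	FILE	   *fp = fopen("/proc/meminfo", "r");

	if (fp == NULL)
		return 0;
	while (result == 0 && fgets(buf, sizeof(buf), fp) != NULL)
		result = parse_hugepagesize(buf);
	fclose(fp);
	return result;
}

/*
 * Round a request up to the next multiple of the huge page size.  A size
 * of 0 means "unknown", in which case the request is returned unchanged.
 */
size_t
round_up_to_hugepage(size_t request, size_t hugepagesize)
{
	size_t		rem;

	if (hugepagesize == 0)
		return request;
	rem = request % hugepagesize;
	return (rem == 0) ? request : request + (hugepagesize - rem);
}
```

A caller would compute the rounded length once, pass it to mmap(), and remember it for the later munmap() call, instead of assuming a 2MB boundary.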

We had that, but Heikki ripped it out when merging... I think you're supposed to use /sys to get the available size.
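
As a rough sketch of the /sys route: on Linux, the kernel exposes one directory per supported huge page size under /sys/kernel/mm/hugepages, named like "hugepages-2048kB". Whether that is the exact interface meant here is an assumption, and the function names below (hugepage_dirname_to_bytes, list_hugepage_sizes) are made up for illustration.

```c
#include <dirent.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: convert a directory name such as "hugepages-2048kB"
 * into a size in bytes.  Returns 0 if the name does not have that form.
 */
size_t
hugepage_dirname_to_bytes(const char *name)
{
	unsigned long kb;
	int			consumed = 0;

	/* %n is only stored if the trailing "kB" literal matched as well */
	if (sscanf(name, "hugepages-%lukB%n", &kb, &consumed) == 1 &&
		consumed > 0 && name[consumed] == '\0')
		return (size_t) kb * 1024;
	return 0;
}

/*
 * Hypothetical helper: collect up to max_sizes supported huge page sizes
 * (in bytes) into sizes[].  Returns the number found, or 0 if the directory
 * does not exist (for instance on a non-Linux kernel).
 */
int
list_hugepage_sizes(size_t *sizes, int max_sizes)
{
	DIR		   *dir = opendir("/sys/kernel/mm/hugepages");
	struct dirent *entry;
	int			n = 0;

	if (dir == NULL)
		return 0;
	while (n < max_sizes && (entry = readdir(dir)) != NULL)
	{
		size_t		bytes = hugepage_dirname_to_bytes(entry->d_name);

		if (bytes != 0)
			sizes[n++] = bytes;
	}
	closedir(dir);
	return n;
}
```

Note that this lists every size the kernel supports, not which one a plain MAP_HUGETLB request will use, so it complements rather than replaces the default size reported in /proc/meminfo.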

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


