Thread: patch: add MAP_HUGETLB to mmap() where supported (WIP)

patch: add MAP_HUGETLB to mmap() where supported (WIP)

From

Richard Poole

Date:

13 September 2013, 23:41:31

The attached patch adds the MAP_HUGETLB flag to mmap() for shared memory
on systems that support it. It's based on Christian Kruse's patch from
last year, incorporating suggestions from Andres Freund.

On a system with 4GB shared_buffers, doing pgbench runs long enough for
each backend to touch most of the buffers, this patch saves nearly 8MB of
memory per backend and improves performance by just over 2% on average.
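
The 8MB figure is consistent with simple page-table arithmetic. A rough
sketch, assuming x86-64 with 8-byte page table entries, 4kB base pages and
2MB huge pages (none of which is stated in the message), and counting only
last-level entries; the function name is made up for illustration:

```c
#include <stdint.h>

/* Assumption: 8 bytes per last-level page table entry, as on x86-64. */
#define PTE_BYTES 8ULL

/*
 * Bytes of last-level page table entries one process needs in order to map
 * `region_bytes` of memory using pages of `page_bytes` each.  Upper levels
 * of the page table tree are ignored.
 */
static uint64_t
pte_bytes_needed(uint64_t region_bytes, uint64_t page_bytes)
{
    uint64_t npages = (region_bytes + page_bytes - 1) / page_bytes;

    return npages * PTE_BYTES;
}
```

Under those assumptions, 4GB mapped with 4kB pages needs 1,048,576 entries
(8MB), while the same region with 2MB pages needs 2048 entries (16kB), so a
backend that has touched most of the buffers saves a little under 8MB.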

It is still WIP as there are a couple of points that Andres has pointed
out to me that haven't been addressed yet; also, the documentation is
incomplete.

Richard

--
Richard Poole                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

hugepages-v1.patch

Re: patch: add MAP_HUGETLB to mmap() where supported (WIP)

From

Peter Eisentraut

Date:

15 September 2013, 02:03:56

On Sat, 2013-09-14 at 00:41 +0100, Richard Poole wrote:
> The attached patch adds the MAP_HUGETLB flag to mmap() for shared
> memory on systems that support it. 

Please fix the tabs in the SGML files.

Re: patch: add MAP_HUGETLB to mmap() where supported (WIP)

From

Heikki Linnakangas

Date:

16 September 2013, 08:15:40

On 14.09.2013 02:41, Richard Poole wrote:
> The attached patch adds the MAP_HUGETLB flag to mmap() for shared memory
> on systems that support it. It's based on Christian Kruse's patch from
> last year, incorporating suggestions from Andres Freund.

I don't understand the logic in figuring out the pagesize, and the 
smallest supported hugepage size. First of all, even without the patch, 
why do we round up the size passed to mmap() to the _SC_PAGE_SIZE? 
Surely the kernel will round up the request all by itself. The mmap() 
man page doesn't say anything about length having to be a multiple of 
pages size.

And with the patch, why do you bother detecting the minimum supported 
hugepage size? Surely the kernel will choose the appropriate hugepage 
size just fine on its own, no?

> It is still WIP as there are a couple of points that Andres has pointed
> out to me that haven't been addressed yet;

Which points are those?

I wonder if it would be better to allow setting huge_tlb_pages=try even 
on platforms that don't have hugepages. It would simply mean the same as 
'off' on such platforms.

- Heikki

Re: patch: add MAP_HUGETLB to mmap() where supported (WIP)

From

Andres Freund

Date:

16 September 2013, 10:15:47

On 2013-09-16 11:15:28 +0300, Heikki Linnakangas wrote:
> On 14.09.2013 02:41, Richard Poole wrote:
> >The attached patch adds the MAP_HUGETLB flag to mmap() for shared memory
> >on systems that support it. It's based on Christian Kruse's patch from
> >last year, incorporating suggestions from Andres Freund.
> 
> I don't understand the logic in figuring out the pagesize, and the smallest
> supported hugepage size. First of all, even without the patch, why do we
> round up the size passed to mmap() to the _SC_PAGE_SIZE? Surely the kernel
> will round up the request all by itself. The mmap() man page doesn't say
> anything about length having to be a multiple of pages size.

I think it does:
        EINVAL We don't like addr, length, or offset (e.g., they are too
               large, or not aligned on a page boundary).

and
        A file is mapped in multiples of the page size.  For a file that is not a multiple
        of the page size, the remaining memory is zeroed when mapped, and writes to that
        region are not written out to the file.  The effect of changing the size of the
        underlying file of a mapping on the pages that correspond to added or removed
        regions of the file is unspecified.

And no, according to my past experience, the kernel does *not* do any
such rounding up. It will just fail.

> And with the patch, why do you bother detecting the minimum supported
> hugepage size? Surely the kernel will choose the appropriate hugepage size
> just fine on its own, no?

It will fail if it's not a multiple.

> >It is still WIP as there are a couple of points that Andres has pointed
> >out to me that haven't been addressed yet;
> 
> Which points are those?

I don't know which point Richard already has fixed, so I'll let him
comment on that.

> I wonder if it would be better to allow setting huge_tlb_pages=try even on
> platforms that don't have hugepages. It would simply mean the same as 'off'
> on such platforms.

I wouldn't argue against that.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: patch: add MAP_HUGETLB to mmap() where supported (WIP)

From

Heikki Linnakangas

Date:

16 September 2013, 13:14:18

On 16.09.2013 13:15, Andres Freund wrote:
> On 2013-09-16 11:15:28 +0300, Heikki Linnakangas wrote:
>> On 14.09.2013 02:41, Richard Poole wrote:
>>> The attached patch adds the MAP_HUGETLB flag to mmap() for shared memory
>>> on systems that support it. It's based on Christian Kruse's patch from
>>> last year, incorporating suggestions from Andres Freund.
>>
>> I don't understand the logic in figuring out the pagesize, and the smallest
>> supported hugepage size. First of all, even without the patch, why do we
>> round up the size passed to mmap() to the _SC_PAGE_SIZE? Surely the kernel
>> will round up the request all by itself. The mmap() man page doesn't say
>> anything about length having to be a multiple of pages size.
>
> I think it does:
>         EINVAL We don't like addr, length, or offset (e.g., they are  too
>                large,  or  not aligned on a page boundary).

That doesn't mean that they *all* have to be aligned on a page boundary.
It's understandable that 'addr' and 'offset' have to be, but it doesn't
make much sense for 'length'.

> and
>         A file is mapped in multiples of the page size.  For a file that is not a multiple
>         of  the  page size, the remaining memory is zeroed when mapped, and writes to that
>         region are not written out to the file.  The effect of changing the  size  of  the
>         underlying  file  of  a  mapping  on the pages that correspond to added or removed
>         regions of the file is unspecified.
>
> And no, according to my past experience, the kernel does *not* do any
> such rounding up. It will just fail.

I wrote a little test program to play with different values (attached).
I tried this on my laptop with a 3.2 kernel (uname -r: 3.10-2-amd6), and
on a VM with a fresh Centos 6.4 install with 2.6.32 kernel
(2.6.32-358.18.1.el6.x86_64), and they both work the same:

$ ./mmaptest 100 # mmap 100 bytes

in a different terminal:
$ cat /proc/meminfo  | grep HugePages_Rsvd
HugePages_Rsvd:        1

So even a tiny allocation, much smaller than any page size, succeeds,
and it reserves a huge page. I tried the same with larger values; the
kernel always uses huge pages, and rounds up the allocation to a
multiple of the huge page size.
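
The attached mmaptest.c is not reproduced in this archive. A minimal sketch
of the kind of call such a test would make, assuming an anonymous shared
mapping (the flags and the function name are guesses, not taken from the
attachment):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Try to create an anonymous shared mapping of `size` bytes backed by huge
 * pages.  Returns the mapping, or MAP_FAILED with errno set.  On platforms
 * whose headers lack MAP_HUGETLB the attempt fails with ENOSYS.
 */
static void *
try_hugetlb_mmap(size_t size)
{
#ifdef MAP_HUGETLB
    return mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#else
    (void) size;
    errno = ENOSYS;
    return MAP_FAILED;
#endif
}
```

A caller would pass the byte count from the command line (100 in the run
above) and then keep the process alive, for example with pause(), while
HugePages_Rsvd is inspected from another terminal. Whether the call succeeds
depends on huge pages having been configured on the machine.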

So, let's just get rid of the /sys scanning code.

Robert, do you remember why you put the "pagesize =
sysconf(_SC_PAGE_SIZE);" call in the new mmap() shared memory allocator?

- Heikki

Attachment

mmaptest.c

Re: patch: add MAP_HUGETLB to mmap() where supported (WIP)

From

Andres Freund

Date:

16 September 2013, 13:19:00

On 2013-09-16 16:13:57 +0300, Heikki Linnakangas wrote:
> On 16.09.2013 13:15, Andres Freund wrote:
> >On 2013-09-16 11:15:28 +0300, Heikki Linnakangas wrote:
> >>On 14.09.2013 02:41, Richard Poole wrote:
> >>>The attached patch adds the MAP_HUGETLB flag to mmap() for shared memory
> >>>on systems that support it. It's based on Christian Kruse's patch from
> >>>last year, incorporating suggestions from Andres Freund.
> >>
> >>I don't understand the logic in figuring out the pagesize, and the smallest
> >>supported hugepage size. First of all, even without the patch, why do we
> >>round up the size passed to mmap() to the _SC_PAGE_SIZE? Surely the kernel
> >>will round up the request all by itself. The mmap() man page doesn't say
> >>anything about length having to be a multiple of pages size.
> >
> >I think it does:
> >        EINVAL We don't like addr, length, or offset (e.g., they are  too
> >               large,  or  not aligned on a page boundary).
> 
> That doesn't mean that they *all* have to be aligned on a page boundary.
> It's understandable that 'addr' and 'offset' have to be, but it doesn't make
> much sense for 'length'.
> 
> >and
> >        A file is mapped in multiples of the page size.  For a file that is not a multiple
> >        of  the  page size, the remaining memory is zeroed when mapped, and writes to that
> >        region are not written out to the file.  The effect of changing the  size  of  the
> >        underlying  file  of  a  mapping  on the pages that correspond to added or removed
> >        regions of the file is unspecified.
> >
> >And no, according to my past experience, the kernel does *not* do any
> >such rounding up. It will just fail.
> 
> I wrote a little test program to play with different values (attached). I
> tried this on my laptop with a 3.2 kernel (uname -r: 3.10-2-amd6), and on a
> VM with a fresh Centos 6.4 install with 2.6.32 kernel
> (2.6.32-358.18.1.el6.x86_64), and they both work the same:
> 
> $ ./mmaptest 100 # mmap 100 bytes
> 
> in a different terminal:
> $ cat /proc/meminfo  | grep HugePages_Rsvd
> HugePages_Rsvd:        1
> 
> So even a tiny allocation, much smaller than any page size, succeeds, and it
> reserves a huge page. I tried the same with larger values; the kernel always
> uses huge pages, and rounds up the allocation to a multiple of the huge page
> size.

When developing the prototype I am pretty sure I had to add the rounding
up - but I am not sure why now, because after chatting with Heikki about
it, I've looked around and the initial MAP_HUGETLB support in the kernel
(commit 4e52780d41a741fb4861ae1df2413dd816ec11b1) has support for
rounding up.

> So, let's just get rid of the /sys scanning code.

Alternatively we could round up NBuffers to actually use the
additionally allocated space. Not sure if that's worth the amount of
code, but wasting several megabytes - or even gigabytes - of memory
isn't nice either.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: patch: add MAP_HUGETLB to mmap() where supported (WIP)

From

Andres Freund

Date:

16 September 2013, 13:23:19

On 2013-09-16 15:18:50 +0200, Andres Freund wrote:
> > So even a tiny allocation, much smaller than any page size, succeeds, and it
> > reserves a huge page. I tried the same with larger values; the kernel always
> > uses huge pages, and rounds up the allocation to a multiple of the huge page
> > size.
> 
> When developing the prototype I am pretty sure I had to add the rounding
> up - but I am not sure why now, because after chatting with Heikki about
> it, I've looked around and the initial MAP_HUGETLB support in the kernel
> (commit 4e52780d41a741fb4861ae1df2413dd816ec11b1) has support for
> rounding up.

Ok, the reason for that seems to have been the following bug
https://bugzilla.kernel.org/show_bug.cgi?id=56881

Greetings,

Andres Freund

Re: patch: add MAP_HUGETLB to mmap() where supported (WIP)

From

Robert Haas

Date:

17 September 2013, 20:09:44

On Mon, Sep 16, 2013 at 9:13 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> Robert, do you remember why you put the "pagesize = sysconf(_SC_PAGE_SIZE);"
> call in the new mmap() shared memory allocator?

Hmm, no.  Unfortunately, I don't.  We could try ripping it out and see
if the buildfarm breaks. If it is needed, then the dynamic shared
memory patch I posted probably needs it as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[PATCH] Use MAP_HUGETLB where supported (v3)

From

Abhijit Menon-Sen

Date:

24 October 2013, 06:03:24

Hi.

This is a slightly reworked version of the patch submitted by Richard
Poole last month, which was based on Christian Kruse's earlier patch.

Apart from doing various minor cleanups and documentation fixes, I also
tested this patch against HEAD on a machine with 256GB of RAM. Here's an
overview of the results.

I set nr_hugepages to 32768 (== 64GB), which (took a very long time and)
allowed me to set shared_buffers to 60GB. I then ran pgbench -s 1000 -i,
and did some runs of "pgbench -c 100 -j 10 -t 1000" with huge_tlb_pages
set to off and on respectively.

With huge_tlb_pages=off, this is the best result I got:

    tps = 8680.771068 (including connections establishing)
    tps = 8721.504838 (excluding connections establishing)

With huge_tlb_pages=on, this is the best result I got:

    tps = 9932.245203 (including connections establishing)
    tps = 9983.190304 (excluding connections establishing)

(Even the worst result I got in the latter case was a smidgen faster
than the best with huge_tlb_pages=off: 8796.344078 vs. 8721.504838.)
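
For reference, the relative gain implied by these figures can be computed
directly. A small sketch (the helper name is made up for illustration),
meant to be fed the "excluding connections establishing" numbers:

```c
/* Relative throughput gain of `after` over `before`, in percent. */
static double
tps_gain_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With 8721.504838 and 9983.190304 this gives roughly 14.5%; comparing the
worst huge_tlb_pages=on run (8796.344078) against the best
huge_tlb_pages=off run gives a little under 1%.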

From /proc/$pid/status, VmPTE was 2880kB with huge_tlb_pages=off, and
56kB with it turned on.

One open question is what to do about rounding up the size. It should
not be necessary, but for the fairly recent bug described at the link
in the comment (https://bugzilla.kernel.org/show_bug.cgi?id=56881). I
tried it without the rounding-up, and it fails on Ubuntu's 3.5.0-28
kernel (mmap returns EINVAL).

Any thoughts?

-- Abhijit

Attachment

hugepages-v3.patch

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Abhijit Menon-Sen

Date:

24 October 2013, 06:06:56

At 2013-10-24 11:33:13 +0530, ams@2ndquadrant.com wrote:
>
> From /proc/$pid/status, VmPTE was 2880kB with huge_tlb_pages=off, and
> 56kB with it turned on.

(VmPTE is the size of the process's page tables.)
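
A sketch of how that value could be read programmatically, assuming the
usual "VmPTE:  <n> kB" line format in /proc/<pid>/status; both function
names are invented for this example:

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of /proc/<pid>/status.  Returns the VmPTE value in kB,
 * or -1 if the line is not a VmPTE line.
 */
static long
parse_vmpte_kb(const char *line)
{
    long kb;

    if (strncmp(line, "VmPTE:", 6) != 0)
        return -1;
    if (sscanf(line + 6, "%ld", &kb) != 1)
        return -1;
    return kb;
}

/* Read VmPTE (in kB) for the calling process; -1 if unavailable. */
static long
read_own_vmpte_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char  line[256];
    long  kb = -1;

    if (f == NULL)
        return -1;
    while (kb < 0 && fgets(line, sizeof(line), f) != NULL)
        kb = parse_vmpte_kb(line);
    fclose(f);
    return kb;
}
```

To compare backends, the same parser could be pointed at
/proc/<pid>/status for a backend's pid instead of /proc/self/status.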

-- Abhijit

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Heikki Linnakangas

Date:

24 October 2013, 13:07:08

On 24.10.2013 09:03, Abhijit Menon-Sen wrote:
> This is a slightly reworked version of the patch submitted by Richard
> Poole last month, which was based on Christian Kruse's earlier patch.

Thanks.

> With huge_tlb_pages=off, this is the best result I got:
>
>      tps = 8680.771068 (including connections establishing)
>      tps = 8721.504838 (excluding connections establishing)
>
> With huge_tlb_pages=on, this is the best result I got:
>
>      tps = 9932.245203 (including connections establishing)
>      tps = 9983.190304 (excluding connections establishing)
>
> (Even the worst result I got in the latter case was a smidgen faster
> than the best with huge_tlb_pages=off: 8796.344078 vs. 8721.504838.)

That's really impressive.

> One open question is what to do about rounding up the size. It should
> not be necessary, but for the fairly recent bug described at the link
> in the comment (https://bugzilla.kernel.org/show_bug.cgi?id=56881). I
> tried it without the rounding-up, and it fails on Ubuntu's 3.5.0-28
> kernel (mmap returns EINVAL).

Let's get rid of the rounding. It's clearly a kernel bug, and it 
shouldn't be our business to add workarounds for any kernel bug out 
there. And the worst that will happen if you're running a buggy kernel 
version is that you fall back to not using huge pages (assuming 
huge_tlb_pages=try).

Other comments:

* guc.c doesn't actually need sys/mman.h for anything. Getting rid of 
the #include also lets you remove the configure test.

* the documentation should perhaps mention that the setting only has an 
effect if POSIX shared memory is used. That's the default on Linux, but 
we will try to fall back to SystemV shared memory if it fails.

- Heikki

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Robert Haas

Date:

24 October 2013, 15:04:11

On Thu, Oct 24, 2013 at 9:06 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> * the documentation should perhaps mention that the setting only has an
> effect if POSIX shared memory is used. That's the default on Linux, but we
> will try to fall back to SystemV shared memory if it fails.

This is true for dynamic shared memory, but not for the main shared
memory segment.  The main shared memory segment is always the
combination of a small, fixed-size System V shared memory chunk and an
anonymous shared memory region created by mmap(NULL, ..., MAP_SHARED).
POSIX shared memory is not used.

(Exceptions: Anonymous shared memory isn't used on Windows, which has
its own mechanism, or when compiling with EXEC_BACKEND, when the whole
chunk is allocated as System V shared memory.)
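
The anonymous part of that arrangement relies on a MAP_SHARED |
MAP_ANONYMOUS mapping created before fork() being visible to every child.
A minimal sketch of that property only: the function name is invented, the
System V chunk is omitted, and this is not the code in sysv_shmem.c.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Map an anonymous shared region, fork a child that writes `value` into it,
 * and return what the parent reads back after the child exits.
 * Returns -1 on any failure.
 */
static int
share_int_with_child(int value)
{
    int    *shared;
    int     result;
    int     status;
    pid_t   pid;

    shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid = fork();
    if (pid == 0)
    {
        *shared = value;        /* child: store into the shared page */
        _exit(0);
    }
    if (pid < 0 || waitpid(pid, &status, 0) != pid)
    {
        munmap(shared, sizeof(int));
        return -1;
    }

    result = *shared;           /* parent: sees the child's write */
    munmap(shared, sizeof(int));
    return result;
}
```

Because the mapping is anonymous there is no name to look up afterwards,
which is why it only works for processes forked from the one that created it.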

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Andres Freund

Date:

24 October 2013, 17:00:35

On 2013-10-24 16:06:19 +0300, Heikki Linnakangas wrote:
> On 24.10.2013 09:03, Abhijit Menon-Sen wrote:
> >One open question is what to do about rounding up the size. It should
> >not be necessary, but for the fairly recent bug described at the link
> >in the comment (https://bugzilla.kernel.org/show_bug.cgi?id=56881). I
> >tried it without the rounding-up, and it fails on Ubuntu's 3.5.0-28
> >kernel (mmap returns EINVAL).
> 
> Let's get rid of the rounding. It's clearly a kernel bug, and it shouldn't
> be our business to add workarounds for any kernel bug out there. And the
> worst that will happen if you're running a buggy kernel version is that you
> fall back to not using huge pages (assuming huge_tlb_pages=try).

But it's a range of relatively popular kernels, that will stay around
for a good while. So I am hesitant to just not do anything about it. The
directory scanning code isn't that bad imo.
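
The directory scanning under discussion looks for the smallest huge page
size the kernel offers. A hedged sketch of how such a scan might look,
assuming the sizes are exposed as directories named hugepages-<N>kB under
/sys/kernel/mm/hugepages; the function names are invented here and this is
not the code from the patch:

```c
#include <dirent.h>
#include <stdio.h>

/* Assumed location of the kernel's per-size huge page directories. */
#define HUGEPAGES_SYSFS_DIR "/sys/kernel/mm/hugepages"

/*
 * Parse a directory name such as "hugepages-2048kB" and return the page
 * size in bytes, or 0 if the name does not have that form.
 */
static unsigned long
parse_hugepage_dirname(const char *name)
{
    unsigned long kb;
    int           n = 0;

    /* %n is only reached (and n set) if the literal "kB" matched. */
    if (sscanf(name, "hugepages-%lukB%n", &kb, &n) != 1)
        return 0;
    if (n == 0 || name[n] != '\0')
        return 0;
    return kb * 1024UL;
}

/* Smallest huge page size in bytes, or 0 if none could be found. */
static unsigned long
smallest_hugepage_size(void)
{
    DIR           *dir = opendir(HUGEPAGES_SYSFS_DIR);
    struct dirent *ent;
    unsigned long  smallest = 0;

    if (dir == NULL)
        return 0;
    while ((ent = readdir(dir)) != NULL)
    {
        unsigned long sz = parse_hugepage_dirname(ent->d_name);

        if (sz != 0 && (smallest == 0 || sz < smallest))
            smallest = sz;
    }
    closedir(dir);
    return smallest;
}
```

A return value of 0 would be the cue to skip the MAP_HUGETLB attempt, or to
fall back to a fixed guess for the alignment.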

Either way:
I think we should log when we tried to use hugepages but fell back to
plain mmap, currently it's hard to see whether they are used.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Robert Haas

Date:

24 October 2013, 17:14:06

On Thu, Oct 24, 2013 at 1:00 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-10-24 16:06:19 +0300, Heikki Linnakangas wrote:
>> On 24.10.2013 09:03, Abhijit Menon-Sen wrote:
>> >One open question is what to do about rounding up the size. It should
>> >not be necessary, but for the fairly recent bug described at the link
>> >in the comment (https://bugzilla.kernel.org/show_bug.cgi?id=56881). I
>> >tried it without the rounding-up, and it fails on Ubuntu's 3.5.0-28
>> >kernel (mmap returns EINVAL).
>>
>> Let's get rid of the rounding. It's clearly a kernel bug, and it shouldn't
>> be our business to add workarounds for any kernel bug out there. And the
>> worst that will happen if you're running a buggy kernel version is that you
>> fall back to not using huge pages (assuming huge_tlb_pages=try).
>
> But it's a range of relatively popular kernels, that will stay around
> for a good while. So I am hesitant to just not do anything about it. The
> directory scanning code isn't that bad imo.
>
> Either way:
> I think we should log when we tried to use hugepages but fell back to
> plain mmap, currently it's hard to see whether they are used.

Logging it might be a good idea, but suppose the system's been running
for 6 months and you don't have the startup logs.  It might be good
to have an easy way to discover later what happened back then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Sergey Konoplev

Date:

30 October 2013, 03:53:00

Hi,

On Wed, Oct 23, 2013 at 11:03 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
> This is a slightly reworked version of the patch submitted by Richard
> Poole last month, which was based on Christian Kruse's earlier patch.

Is it possible that this patch will be included in a minor version of
9.3? IMHO hugepages is a very important ability that postgres lost in
9.3, and it would be great to have it back ASAP.

Thank you.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Tom Lane

Date:

30 October 2013, 04:32:03

Sergey Konoplev <gray.ru@gmail.com> writes:
> On Wed, Oct 23, 2013 at 11:03 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
>> This is a slightly reworked version of the patch submitted by Richard
>> Poole last month, which was based on Christian Kruse's earlier patch.

> Is it possible that this patch will be included in a minor version of
> 9.3? IMHO hugepages is a very important ability that postgres lost in
> 9.3, and it would be great to have it back ASAP.

Say what?  There's never been any hugepages support in Postgres.
        regards, tom lane

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Abhijit Menon-Sen

Date:

30 October 2013, 04:47:03

At 2013-10-24 16:06:19 +0300, hlinnakangas@vmware.com wrote:
>
> Let's get rid of the rounding.

I share Andres's concern that the bug is present in various recent
kernels that are going to stick around for quite some time. Given
the rather significant performance gain, I think it's worth doing
something, though I'm not a big fan of the directory-scanning code
myself.

As a compromise, perhaps we can unconditionally round the size up to be
a multiple of 2MB? That way, we can use huge pages more often, but also
avoid putting in a lot of code and effort into the workaround and waste
only a little space (if any at all).
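
The unconditional rounding amounts to very little code. A sketch, with the
2MB constant hard-coded as proposed and names chosen only for this example:

```c
#include <stddef.h>

/* Assumed huge page size: 2MB, the value proposed for the compromise. */
#define ASSUMED_HUGE_PAGE_SIZE ((size_t) 2 * 1024 * 1024)

/* Round `size` up to the next multiple of `align` (align must be nonzero). */
static size_t
round_up_to_multiple(size_t size, size_t align)
{
    size_t rem = size % align;

    return rem == 0 ? size : size + (align - rem);
}
```

The result of round_up_to_multiple(size, ASSUMED_HUGE_PAGE_SIZE) would be
passed to mmap() in place of the raw size; at most one byte short of 2MB is
added.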

> Other comments:
> 
> * guc.c doesn't actually need sys/mman.h for anything. Getting rid
> of the #include also lets you remove the configure test.

You're right, guc.c doesn't use it any more; I've removed the #include.

sysv_shmem.c does use it (MAP_*, PROT_*), however, so I've left the test
in configure alone. I see that sys/mman.h is included elsewhere with an
#ifdef WIN32 or HAVE_SHM_OPEN guard, but HAVE_SYS_MMAN_H seems better.

> * the documentation should perhaps mention that the setting only has
> an effect if POSIX shared memory is used.

As Robert said, this is not correct, so I haven't changed anything.

-- Abhijit

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Abhijit Menon-Sen

Date:

30 October 2013, 04:58:32

At 2013-10-24 19:00:28 +0200, andres@2ndquadrant.com wrote:
>
> I think we should log when we tried to use hugepages but fell back to
> plain mmap, currently it's hard to see whether they are used.

Good idea, thanks. I'll do this in the next patch I post (which will be
after we reach some consensus about how to handle the rounding problem).

-- Abhijit

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Sergey Konoplev

Date:

30 October 2013, 06:08:30

On Tue, Oct 29, 2013 at 9:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sergey Konoplev <gray.ru@gmail.com> writes:
>> On Wed, Oct 23, 2013 at 11:03 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
>>> This is a slightly reworked version of the patch submitted by Richard
>>> Poole last month, which was based on Christian Kruse's earlier patch.
>
>> Is it possible that this patch will be included in a minor version of
>> 9.3? IMHO hugepages is a very important ability that postgres lost in
>> 9.3, and it would be great to have it back ASAP.
>
> Say what?  There's never been any hugepages support in Postgres.

There was an ability to back shared memory with hugepages when using
<=9.2. I have used it on ~30 servers for several years and it brings an
8-17% performance gain depending on the memory size. Here you will find
several paragraphs of the description about how to do it
https://github.com/grayhemp/pgcookbook/blob/master/database_server_configuration.md.
Just search for the 'hugepages' word on the page.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

David Fetter

Date:

30 October 2013, 07:08:31

On Tue, Oct 29, 2013 at 11:08:05PM -0700, Sergey Konoplev wrote:
> On Tue, Oct 29, 2013 at 9:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Sergey Konoplev <gray.ru@gmail.com> writes:
> >> On Wed, Oct 23, 2013 at 11:03 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
> >>> This is a slightly reworked version of the patch submitted by Richard
> >>> Poole last month, which was based on Christian Kruse's earlier patch.
> >
> >> Is it possible that this patch will be included in a minor version of
> >> 9.3? IMHO hugepages is a very important ability that postgres lost in
> >> 9.3, and it would be great to have it back ASAP.
> >
> > Say what?  There's never been any hugepages support in Postgres.
> 
> There were an ability to back shared memory with hugepages when using
> <=9.2. I use it on ~30 servers for several years and it brings 8-17%
> of performance depending on the memory size. Here you will find
> several paragraphs of the description about how to do it
> https://github.com/grayhemp/pgcookbook/blob/master/database_server_configuration.md.
> Just search for the 'hugepages' word on the page.

For better or worse, we add new features exactly and only in .0
releases.  It's what's made it possible for people to plan
deployments, given us a deserved reputation for stability, etc., etc.

I guess what I'm saying here is that awesome as any particular feature
might be to back-patch, that benefit is overwhelmed by the cost of
having unstable releases.

-infinity from me to any proposal that gets us into "are you using
PostgreSQL x.y.z or x.y.w?" when it comes to features.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

David Fetter

Date:

30 October 2013, 07:10:44

On Wed, Oct 30, 2013 at 10:16:57AM +0530, Abhijit Menon-Sen wrote:
> At 2013-10-24 16:06:19 +0300, hlinnakangas@vmware.com wrote:
> >
> > Let's get rid of the rounding.
> 
> I share Andres's concern that the bug is present in various recent
> kernels that are going to stick around for quite some time. Given
> the rather significant performance gain, I think it's worth doing
> something, though I'm not a big fan of the directory-scanning code
> myself.
> 
> As a compromise, perhaps we can unconditionally round the size up to be
> a multiple of 2MB?

How about documenting that 2MB is the quantum (OK, we'll say
"indivisible unit" or "smallest division" or something) and failing
with a message to that effect if someone tries to set it otherwise?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Abhijit Menon-Sen

Date:

30 October 2013, 07:28:39

At 2013-10-30 00:10:39 -0700, david@fetter.org wrote:
>
> How about documenting that 2MB is the quantum (OK, we'll say
> "indivisible unit" or "smallest division" or something) and failing
> with a message to that effect if someone tries to set it otherwise?

I don't think you understand the problem. We're not discussing a user
setting here. The size that is passed to PGSharedMemoryCreate is based
on shared_buffers and our estimates of how much memory we need for other
things like ProcArray (see ipci.c:CreateSharedMemoryAndSemaphores).

If this calculated size is not a multiple of a page size supported by
the hardware (usually 2/4/16MB etc.), the allocation will fail under
some commonly-used kernels. We can either ignore the problem and let
the allocation fail, or try to discover the smallest supported huge
page size (what the patch does now), or assume that 2MB pages can be
used if any huge pages can be used and align accordingly.

We could use a larger size, e.g. if we aligned to 16MB then it would
work on hardware that supported 2/4/8/16MB pages, but we'd waste the
extra memory unless we also increased NBuffers after the rounding up
(which is also something Andres suggested earlier).
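
To put numbers on the waste, the slack added by a given alignment can be
expressed in whole buffers. A sketch assuming the default 8kB block size and
ignoring per-buffer bookkeeping overhead; the function names are made up for
illustration:

```c
#include <stddef.h>

/* Assumption: PostgreSQL's default block size of 8kB. */
#define ASSUMED_BLCKSZ ((size_t) 8192)

/* Bytes added when `size` is rounded up to a multiple of `align`. */
static size_t
alignment_slack(size_t size, size_t align)
{
    size_t rem = size % align;

    return rem == 0 ? 0 : align - rem;
}

/* Whole buffers of ASSUMED_BLCKSZ bytes that would fit in that slack. */
static size_t
extra_buffers_in_slack(size_t size, size_t align)
{
    return alignment_slack(size, align) / ASSUMED_BLCKSZ;
}
```

In the worst case for 16MB alignment (a request one byte over a multiple),
the slack is one byte short of 16MB, which would hold 2047 additional 8kB
buffers if NBuffers were increased to match.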

I don't have a strong opinion on the available options, other than not
liking the "do nothing" approach.

-- Abhijit

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Tom Lane

Date:

30 October 2013, 15:04:46

Abhijit Menon-Sen <ams@2ndquadrant.com> writes:
> As a compromise, perhaps we can unconditionally round the size up to be
> a multiple of 2MB? That way, we can use huge pages more often, but also
> avoid putting in a lot of code and effort into the workaround and waste
> only a little space (if any at all).

That sounds reasonably painless to me.  Note that at least in our main
shmem segment, "extra" space is not useless, because it allows slop for
the main hash tables, notably the locks table.
        regards, tom lane

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Tom Lane

Date:

30 October 2013, 15:11:16

Sergey Konoplev <gray.ru@gmail.com> writes:
> On Tue, Oct 29, 2013 at 9:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Say what?  There's never been any hugepages support in Postgres.

> There were an ability to back shared memory with hugepages when using
> <=9.2. I use it on ~30 servers for several years and it brings 8-17%
> of performance depending on the memory size. Here you will find
> several paragraphs of the description about how to do it
> https://github.com/grayhemp/pgcookbook/blob/master/database_server_configuration.md.

What this describes is how to modify Postgres to request huge pages.
That's hardly built-in support.

In any case, as David already explained, we don't do feature additions
in minor releases.  We'd be especially unlikely to make an exception
for this, since it has uncertain portability and benefits.  Anything
that carries portability risks has got to go through a beta testing
cycle before we'll unleash it on the masses.
        regards, tom lane

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Abhijit Menon-Sen

Date:

30 October 2013, 17:09:35

At 2013-10-30 11:04:36 -0400, tgl@sss.pgh.pa.us wrote:
>
> > As a compromise, perhaps we can unconditionally round the size up to be
> > a multiple of 2MB? […]
>
> That sounds reasonably painless to me.

Here's a patch that does that and adds a DEBUG1 log message when we try
with MAP_HUGETLB, fail, and fall back to ordinary mmap.
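
The attached patch is not reproduced in this archive. As a rough
illustration of the try-then-fall-back shape, here is a standalone sketch;
the function name, the hard-coded 2MB constant and the use of fprintf() in
place of PostgreSQL's own logging are all assumptions made for this example:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Assumed huge page size used for rounding: 2MB. */
#define ASSUMED_HUGE_PAGE_SIZE ((size_t) 2 * 1024 * 1024)

/*
 * Create an anonymous shared mapping of at least `size` bytes, preferring
 * huge pages.  The size actually mapped is stored in *mapped_size so the
 * caller can pass it to munmap().  Returns MAP_FAILED if both attempts fail.
 */
static void *
map_shared_prefer_huge(size_t size, size_t *mapped_size)
{
    void   *ptr = MAP_FAILED;

#ifdef MAP_HUGETLB
    size_t  rem = size % ASSUMED_HUGE_PAGE_SIZE;
    size_t  rounded = rem == 0 ? size : size + (ASSUMED_HUGE_PAGE_SIZE - rem);

    ptr = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (ptr != MAP_FAILED)
    {
        *mapped_size = rounded;
        return ptr;
    }
    /* Stand-in for a DEBUG1 message; PostgreSQL would use elog()/ereport(). */
    fprintf(stderr, "mmap with MAP_HUGETLB failed (%s), falling back\n",
            strerror(errno));
#endif

    /* Ordinary anonymous shared mapping with the unrounded size. */
    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (ptr != MAP_FAILED)
        *mapped_size = size;
    return ptr;
}
```

Recording the mapped size matters because munmap() on a huge page mapping
expects a length that is a multiple of the huge page size.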

-- Abhijit

Attachment

hugepages-v4.patch

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Andres Freund

Date:

30 October 2013, 17:11:40

On 2013-10-30 22:39:20 +0530, Abhijit Menon-Sen wrote:
> At 2013-10-30 11:04:36 -0400, tgl@sss.pgh.pa.us wrote:
> >
> > > As a compromise, perhaps we can unconditionally round the size up to be
> > > a multiple of 2MB? […]
> >
> > That sounds reasonably painless to me.
>
> Here's a patch that does that and adds a DEBUG1 log message when we try
> with MAP_HUGETLB and fail and fallback to ordinary mmap.

But it's in no way guaranteed that the smallest hugepage size is
2MB. It'll be on current x86 hardware, but not on any other platform...

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Sergey Konoplev

Date:

30 October 2013, 17:15:30

On Wed, Oct 30, 2013 at 8:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sergey Konoplev <gray.ru@gmail.com> writes:
>> On Tue, Oct 29, 2013 at 9:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Say what?  There's never been any hugepages support in Postgres.
>
>> There were an ability to back shared memory with hugepages when using
>> <=9.2. I use it on ~30 servers for several years and it brings 8-17%
>> of performance depending on the memory size. Here you will find
>> several paragraphs of the description about how to do it
>> https://github.com/grayhemp/pgcookbook/blob/master/database_server_configuration.md.
>
> What this describes is how to modify Postgres to request huge pages.
> That's hardly built-in support.

I wasn't talking about a built-in support. It was about an ability (a
way) to back sh_buf with hugepages.

> In any case, as David already explained, we don't do feature additions
> in minor releases.  We'd be especially unlikely to make an exception
> for this, since it has uncertain portability and benefits.  Anything
> that carries portability risks has got to go through a beta testing
> cycle before we'll unleash it on the masses.

Yes, I got the idea. Thanks both of you for clarification.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Alvaro Herrera

Date:

30 October 2013, 18:50:26

Sergey Konoplev escribió:
> On Wed, Oct 30, 2013 at 8:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Sergey Konoplev <gray.ru@gmail.com> writes:

> >> There were an ability to back shared memory with hugepages when using
> >> <=9.2. I use it on ~30 servers for several years and it brings 8-17%
> >> of performance depending on the memory size. Here you will find
> >> several paragraphs of the description about how to do it
> >> https://github.com/grayhemp/pgcookbook/blob/master/database_server_configuration.md.
> >
> > What this describes is how to modify Postgres to request huge pages.
> > That's hardly built-in support.
> 
> I wasn't talking about a built-in support. It was about an ability (a
> way) to back sh_buf with hugepages.

Then what you need is to set 
dynamic_shared_memory_type = sysv
in postgresql.conf.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Sergey Konoplev

Date:

30 October 2013, 19:15:16

On Wed, Oct 30, 2013 at 11:50 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>> >> There were an ability to back shared memory with hugepages when using
>> >> <=9.2. I use it on ~30 servers for several years and it brings 8-17%
>> >> of performance depending on the memory size. Here you will find
>> >> several paragraphs of the description about how to do it
>> >> https://github.com/grayhemp/pgcookbook/blob/master/database_server_configuration.md.
>> >
>> > What this describes is how to modify Postgres to request huge pages.
>> > That's hardly built-in support.
>>
>> I wasn't talking about a built-in support. It was about an ability (a
>> way) to back sh_buf with hugepages.
>
> Then what you need is to set
> dynamic_shared_memory_type = sysv
> in postgresql.conf.

I could neither find this parameter in the docs, nor does it work when I
specify it in postgresql.conf.

LOG:  unrecognized configuration parameter
"dynamic_shared_memory_type" in file
"/etc/postgresql/9.3/main/postgresql.conf" line 114
FATAL:  configuration file "/etc/postgresql/9.3/main/postgresql.conf"
contains errors

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Alvaro Herrera

Date:

30 October 2013, 19:17:33

Alvaro Herrera escribió:
> Sergey Konoplev escribió:

> > I wasn't talking about a built-in support. It was about an ability (a
> > way) to back sh_buf with hugepages.
> 
> Then what you need is to set 
> dynamic_shared_memory_type = sysv
> in postgresql.conf.

The above is mistaken -- there's no way to disable the mmap() segment in
9.3, other than recompiling with EXEC_BACKEND which is probably
undesirable for other reasons.

I don't think I had ever heard of that recipe to use huge pages in
previous versions; since the win is probably significant in some
systems, we could have made this configurable.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Sergey Konoplev

Date:

30 October 2013, 19:51:41

On Wed, Oct 30, 2013 at 12:17 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>> > I wasn't talking about a built-in support. It was about an ability (a
>> > way) to back sh_buf with hugepages.
>>
>> Then what you need is to set
>> dynamic_shared_memory_type = sysv
>> in postgresql.conf.
>
> The above is mistaken -- there's no way to disable the mmap() segment in
> 9.3, other than recompiling with EXEC_BACKEND which is probably
> undesirable for other reasons.

Alternatively, I assume it could be linked with libhugetlbfs and you
don't need any source modifications in this case. However I am not
sure it will work with shared memory.

> I don't think I had ever heard of that recipe to use huge pages in
> previous versions; since the win is probably significant in some
> systems, we could have made this configurable.

There are several articles on the web describing how to do this,
besides mine. And the win mostly becomes significant when you
have 64GB or more on your server.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Sergey Konoplev

Date:

30 October 2013, 21:29:14

On Wed, Oct 30, 2013 at 12:51 PM, Sergey Konoplev <gray.ru@gmail.com> wrote:
> On Wed, Oct 30, 2013 at 12:17 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>>> > I wasn't talking about a built-in support. It was about an ability (a
>>> > way) to back sh_buf with hugepages.
>>>
>>> Then what you need is to set
>>> dynamic_shared_memory_type = sysv
>>> in postgresql.conf.
>>
>> The above is mistaken -- there's no way to disable the mmap() segment in
>> 9.3, other than recompiling with EXEC_BACKEND which is probably
>> undesirable for other reasons.
>
> Alternatively, I assume it could be linked with libhugetlbfs and you
> don't need any source modifications in this case. However I am not
> sure it will work with shared memory.

BTW, I managed to run 9.3 backed with hugepages after I put
HUGETLB_MORECORE (see man libhugetlbfs) into the environment yesterday,
but after running for some time it failed with the messages shown below.

syslog:

Oct 29 17:53:13 grayhemp kernel: [150579.903875] PID 7584 killed due
to inadequate hugepage pool

postgres:

libhugetlbfslibhugetlbfs2013-10-29 17:53:21 PDT LOG:  server process
(PID 7584) was terminated by signal 7: Bus error
2013-10-29 17:53:21 PDT LOG:  terminating any other active server processes
2013-10-29 17:53:21 PDT WARNING:  terminating connection because of crash of
another server process
2013-10-29 17:53:21 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.

My theory is that it has happened after the amount of huge pages
(vm.nr_overcommit_hugepages + vm.nr_hugepages) was exceeded, but I
might be wrong.

Does anybody have any thoughts on why this happened and how to work around it?

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Heikki Linnakangas

Date:

15 November 2013, 13:17:48

On 30.10.2013 19:11, Andres Freund wrote:
> On 2013-10-30 22:39:20 +0530, Abhijit Menon-Sen wrote:
>> At 2013-10-30 11:04:36 -0400, tgl@sss.pgh.pa.us wrote:
>>>
>>>> As a compromise, perhaps we can unconditionally round the size up to be
>>>> a multiple of 2MB? […]
>>>
>>> That sounds reasonably painless to me.
>>
>> Here's a patch that does that and adds a DEBUG1 log message when we try
>> with MAP_HUGETLB and fail and fallback to ordinary mmap.
>
> But it's in no way guaranteed that the smallest hugepage size is
> 2MB. It'll be on current x86 hardware, but not on any other platform...

Sure, but there's no big harm done. We're just trying to avoid hitting a
kernel bug, and as a bonus, we avoid wasting some memory that would
otherwise be lost due to the kernel rounding the allocation. If the
smallest hugepage size is smaller than 2MB, we round up the allocation
unnecessarily, but that doesn't seem serious.
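
To make the rounding concrete, here is a minimal sketch in C. It is not
code from any version of the patch: the helper name round_up_to_multiple
and the constant ASSUMED_HUGE_PAGE_SIZE are hypothetical, and the 2MB
value is only the assumption discussed above (true on current x86
hardware, not guaranteed elsewhere). It also ignores overflow for sizes
close to SIZE_MAX.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: smallest huge page size is 2MB. */
#define ASSUMED_HUGE_PAGE_SIZE ((size_t) 2 * 1024 * 1024)

/*
 * Round "size" up to the next multiple of "align".
 * "align" must be a nonzero power of two.
 */
static size_t
round_up_to_multiple(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

Under that assumption, a request of 189390848 bytes would be rounded up
to 190840832 bytes, i.e. 91 pages of 2MB. A size that is already a
multiple of 2MB is returned unchanged.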

I spent some time whacking this around, new patch version attached. I
moved the mmap() code into a new function, that leaves the
PGSharedMemoryCreate more readable.

I modified the patch so that it throws an error if you set
huge_tlb_pages=on, and the platform doesn't support MAP_HUGETLB (ie.
non-Linux, or EXEC_BACKEND). 'try' is the default, so this only affects
you if you explicitly set it to 'on'. I think that's the right behavior;
if you explicitly ask for it, and you don't get it, that should be an
error. But I'm not wedded to the idea if someone objects; a log message
might also be reasonable: "LOG: huge TLB pages are not supported on this
platform, but huge_tlb_pages was 'on'"

The error message on failed allocation, if huge_tlb_pages=on, needs
updating:

$ bin/postmaster -D data
FATAL:  could not map anonymous shared memory: Cannot allocate memory
HINT:  This error usually means that PostgreSQL's request for a shared
memory segment exceeded available memory or swap space. To reduce the
request size (currently 189390848 bytes), reduce PostgreSQL's shared
memory usage, perhaps by reducing shared_buffers or max_connections.

The reason the allocation failed in this case was that I used
huge_tlb_pages=on, but had not configured the kernel for huge pages. The
hint is quite misleading in that case, it should advise to configure the
kernel, or turn off huge_tlb_pages.

The documentation needs some work. I think it's pretty user-unfriendly
to link to https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt.
It gives a lot of details, and although it explains stuff that is
relevant, like setting the nr_hugepages sysctl, it also contains a lot
of stuff that is not relevant to us, like how to mount hugetlbfs. Can we
do better than that? Is there a better guide somewhere on how to set the
kernel settings? If not, we should include step-by-step instructions in
our manual.
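
As a rough illustration of the arithmetic such step-by-step instructions
would have to cover, the sketch below takes /proc/meminfo-style text,
extracts the Hugepagesize value, and computes how many huge pages a
request of a given size needs, which is a candidate value for the
nr_hugepages sysctl. The function names are hypothetical and not part of
the patch, and real sizing should start from the server's actual request
size (as printed in the hint above) rather than from shared_buffers
alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the huge page size in kB found in /proc/meminfo-style text,
 * or -1 if there is no parsable "Hugepagesize:" line.
 */
static long
parse_hugepagesize_kb(const char *meminfo_text)
{
    const char *line = strstr(meminfo_text, "Hugepagesize:");
    long        kb;

    if (line == NULL || sscanf(line, "Hugepagesize: %ld kB", &kb) != 1)
        return -1;
    return kb;
}

/* Number of huge pages needed to cover request_bytes (ceiling division). */
static size_t
huge_pages_needed(size_t request_bytes, size_t huge_page_bytes)
{
    assert(huge_page_bytes > 0);
    return (request_bytes + huge_page_bytes - 1) / huge_page_bytes;
}
```

For the 189390848-byte request shown above and a 2048 kB huge page size,
huge_pages_needed() gives 91. Other users of the huge page pool on the
same machine would need to be added on top of that.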

The "Managing Kernel Resources" section in the user manual should also
be updated to mention how to enable huge pages.

Also, now that I changed huge_tlb_pages='on' to fail on platforms where
it's not supported at all, the docs need to be updated to reflect it.

- Heikki

Attachment

hugepages-v5.patch

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Sameer Kumar

Date:

17 November 2013, 04:22:56

I was recently running some tests with huge page tables. I ran them on two different architectures: x86 and PPC64.

I saw some discussion going on over here so thought of sharing.

I was using 3 Cores, 8GB RAM, 2 LUN for filesystem (1 for dbfiles and 1 for logfiles) for these tests...

I had dedicated

(shared_buffers + 400bytes*max_connection + wal_buffers)/Pagesize [from /proc/meminfo] for huge pages. I kept some overcommit_hugepages to be used by work_mem (max_connection*work_mem)/Pagesize

x86_64 gave me a benefit of 2-5% for a TPC-C workload (I scaled from 1 to 100 users). PPC64, which uses 16MB and 64MB, did not give me any benefit; in fact, performance degraded as the concurrency of the system increased.

my 2 cents, hope it helps.

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Abhijit Menon-Sen

Date:

18 November 2013, 07:29:58

At 2013-11-15 15:17:32 +0200, hlinnakangas@vmware.com wrote:
>
> I spent some time whacking this around, new patch version attached.

Thanks.

> But I'm not wedded to the idea if someone objects; a log message might
> also be reasonable: "LOG: huge TLB pages are not supported on this
> platform, but huge_tlb_pages was 'on'"

Put that way, I have to wonder if the right thing to do is just to have
a "try_huge_pages=on|off" setting, and log a warning if the attempt did
not succeed. It would be easier to document, and I don't think there's
much point in making it an error if the allocation fails.

-- Abhijit

P.S. I'd be happy to do the followup work for this patch (updating
documentation, etc.), but it'll have to wait until I recover from
this !#$&@! stomach bug.

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Alvaro Herrera

Date:

21 November 2013, 21:09:47

Abhijit Menon-Sen wrote:
> At 2013-11-15 15:17:32 +0200, hlinnakangas@vmware.com wrote:

> > But I'm not wedded to the idea if someone objects; a log message might
> > also be reasonable: "LOG: huge TLB pages are not supported on this
> > platform, but huge_tlb_pages was 'on'"
> 
> Put that way, I have to wonder if the right thing to do is just to have
> a "try_huge_pages=on|off" setting, and log a warning if the attempt did
> not succeed. It would be easier to document, and I don't think there's
> much point in making it an error if the allocation fails.

What about
huge_tlb_pages={off,try}

Or maybe
huge_tlb_pages={off,try,require}

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Andres Freund

Date:

21 November 2013, 21:14:53

On 2013-11-21 18:09:38 -0300, Alvaro Herrera wrote:
> Abhijit Menon-Sen wrote:
> > At 2013-11-15 15:17:32 +0200, hlinnakangas@vmware.com wrote:
> 
> > > But I'm not wedded to the idea if someone objects; a log message might
> > > also be reasonable: "LOG: huge TLB pages are not supported on this
> > > platform, but huge_tlb_pages was 'on'"
> > 
> > Put that way, I have to wonder if the right thing to do is just to have
> > a "try_huge_pages=on|off" setting, and log a warning if the attempt did
> > not succeed. It would be easier to document, and I don't think there's
> > much point in making it an error if the allocation fails.
> 
> What about
> huge_tlb_pages={off,try}
> 
> Or maybe
> huge_tlb_pages={off,try,require}

I'd certainly want a setting that errors out if it cannot get the memory
using hugetables. If you rely on the reduction in memory (which can be
significant on large s_b, large max_connections), it's rather annoying
not to know whether it succeeded using it.

Greetings,

Andres Freund

-- 
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Robert Haas

Date:

21 November 2013, 21:25:00

On Thu, Nov 21, 2013 at 4:09 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Abhijit Menon-Sen wrote:
>> At 2013-11-15 15:17:32 +0200, hlinnakangas@vmware.com wrote:
>
>> > But I'm not wedded to the idea if someone objects; a log message might
>> > also be reasonable: "LOG: huge TLB pages are not supported on this
>> > platform, but huge_tlb_pages was 'on'"
>>
>> Put that way, I have to wonder if the right thing to do is just to have
>> a "try_huge_pages=on|off" setting, and log a warning if the attempt did
>> not succeed. It would be easier to document, and I don't think there's
>> much point in making it an error if the allocation fails.
>
> What about
> huge_tlb_pages={off,try}
>
> Or maybe
> huge_tlb_pages={off,try,require}

I'd spell "require" as "on", or at least accept that as a synonym.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Andres Freund

Date:

21 November 2013, 21:58:22

On 2013-11-21 16:24:56 -0500, Robert Haas wrote:
> > What about
> > huge_tlb_pages={off,try}
> >
> > Or maybe
> > huge_tlb_pages={off,try,require}
> 
> I'd spell "require" as "on", or at least accept that as a synonym.

off, try, on is what the patch currently implements; Abhijit was just
arguing for dropping the error-out option.

Greetings,

Andres Freund

-- 
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Abhijit Menon-Sen

Date:

25 November 2013, 03:29:29

At 2013-11-21 22:14:35 +0100, andres@2ndquadrant.com wrote:
>
> I'd certainly want a setting that errors out if it cannot get the
> memory using hugetables.

OK, then the current try/on/off settings are fine.

I'm better today, so I'll read the patch Heikki posted and see what more
needs to be done there.

-- Abhijit

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Alvaro Herrera

Date:

27 January 2014, 19:20:34

Heikki Linnakangas wrote:

> I spent some time whacking this around, new patch version attached.
> I moved the mmap() code into a new function, that leaves the
> PGSharedMemoryCreate more readable.

Did this patch go anywhere?

Someone just pinged me about a kernel scalability problem in Linux with
huge pages; if someone did performance measurements with this patch,
perhaps it'd be good to measure again with the kernel patch in place.

https://lkml.org/lkml/2014/1/26/227

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Heikki Linnakangas

Date:

28 January 2014, 11:51:15

On 01/27/2014 09:20 PM, Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>
>> I spent some time whacking this around, new patch version attached.
>> I moved the mmap() code into a new function, that leaves the
>> PGSharedMemoryCreate more readable.
>
> Did this patch go anywhere?

Oh darn, I remembered we had already committed this, but clearly not. 
I'd love to still get this into 9.4. The latest patch 
(hugepages-v5.patch) was pretty much ready for commit, except for 
documentation.

- Heikki

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

28 January 2014, 12:00:09

Hi,

On 28/01/14 13:51, Heikki Linnakangas wrote:
> Oh darn, I remembered we had already committed this, but clearly not. I'd
> love to still get this into 9.4. The latest patch (hugepages-v5.patch) was
> pretty much ready for commit, except for documentation.

I'm working on it. I ported it to HEAD and am currently doing some
benchmarks. Next will be documentation.

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

28 January 2014, 13:13:49

Hi,

On 15/11/13 15:17, Heikki Linnakangas wrote:
> I spent some time whacking this around, new patch version attached. I moved
> the mmap() code into a new function, that leaves the PGSharedMemoryCreate
> more readable.

I think there's a bug in this version of the patch. Have a look at
this:

+    if (huge_tlb_pages == HUGE_TLB_ON || huge_tlb_pages == HUGE_TLB_TRY)
+    {
[…]
+        ptr = mmap(NULL, *size, PROT_READ | PROT_WRITE,
+                   PG_MMAP_FLAGS | MAP_HUGETLB, -1, 0);
[…]
+    }
+#endif
+
+    if (huge_tlb_pages == HUGE_TLB_OFF || huge_tlb_pages == HUGE_TLB_TRY)
+    {
+        allocsize = *size;
+        ptr = mmap(NULL, *size, PROT_READ | PROT_WRITE, PG_MMAP_FLAGS, -1, 0);
+    }

This will lead to a duplicate mmap() if hugepages work and
huge_tlb_pages == HUGE_TLB_TRY, or am I missing something?
I think it should be like this:
if (huge_tlb_pages == HUGE_TLB_OFF ||
    (huge_tlb_pages == HUGE_TLB_TRY && ptr == MAP_FAILED))
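
For illustration, the whole off/try/on control flow with that condition
can be sketched as a self-contained C function. This is a sketch under
assumptions, not the committed code: HugeMode, map_shared_anon and the
hard-coded 2MB page size are hypothetical names and values, and the
flags are a plain MAP_SHARED | MAP_ANONYMOUS rather than whatever
PG_MMAP_FLAGS expands to.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes MAP_ANONYMOUS / MAP_HUGETLB on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical stand-in for the off/try/on setting. */
typedef enum { HUGE_OFF, HUGE_TRY, HUGE_ON } HugeMode;

/*
 * Map an anonymous shared segment of at least "size" bytes.
 * Returns MAP_FAILED on failure; *allocsize receives the length that was
 * actually mapped (0 on failure), which is what munmap() must be given.
 */
static void *
map_shared_anon(size_t size, HugeMode mode, size_t *allocsize)
{
    const size_t huge_page = (size_t) 2 * 1024 * 1024;  /* assumed 2MB */
    const int    flags = MAP_SHARED | MAP_ANONYMOUS;
    void        *ptr = MAP_FAILED;

    assert(allocsize != NULL);
    *allocsize = 0;

#ifdef MAP_HUGETLB
    if (mode == HUGE_ON || mode == HUGE_TRY)
    {
        /* Pass the rounded-up length to mmap(), not the original size. */
        size_t rounded = (size + huge_page - 1) & ~(huge_page - 1);

        ptr = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                   flags | MAP_HUGETLB, -1, 0);
        if (ptr != MAP_FAILED)
            *allocsize = rounded;
    }
#endif

    /* Plain mmap() only if huge pages are off, or "try" did not succeed. */
    if (mode == HUGE_OFF || (mode == HUGE_TRY && ptr == MAP_FAILED))
    {
        ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
        if (ptr != MAP_FAILED)
            *allocsize = size;
    }

    return ptr;
}
```

With HUGE_ON the function returns MAP_FAILED when the huge page mapping
cannot be made, so the caller can report an error; with HUGE_TRY the
second mmap() runs only when the first one failed, so at most one
mapping exists on return.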

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

28 January 2014, 16:12:16

Hi,

attached you will find a new version of the patch, ported to HEAD,
fixed the mentioned bug and - hopefully - dealing with the remaining
issues.

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Heikki Linnakangas

Date:

29 January 2014, 12:13:03

On 01/28/2014 06:11 PM, Christian Kruse wrote:
> Hi,
>
> attached you will find a new version of the patch, ported to HEAD,
> fixed the mentioned bug and - hopefully - dealing the the remaining
> issues.

Thanks, I have committed this now.

The documentation is still lacking. We should explain somewhere how to 
set nr_hugepages, for example. The "Managing Kernel Resources" section 
ought to mention that setting. Could I ask you to work on that, please?

- Heikki

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Vik Fearing

Date:

29 January 2014, 14:01:22

On 01/29/2014 01:12 PM, Heikki Linnakangas wrote:
> On 01/28/2014 06:11 PM, Christian Kruse wrote:
>> Hi,
>>
>> attached you will find a new version of the patch, ported to HEAD,
>> fixed the mentioned bug and - hopefully - dealing the the remaining
>> issues.
>
> Thanks, I have committed this now.
>
> The documentation is still lacking.
>

The documentation is indeed lacking since it breaks the build.

doc/src/sgml/config.sgml contains the line

    normal allocation if that fails. With <literal>on</literal, failure

which doesn't correctly terminate the closing </literal> tag.

Trivial patch attached.

--
Vik

Attachment

fix_tag.patch

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Heikki Linnakangas

Date:

29 January 2014, 14:18:52

On 01/29/2014 04:01 PM, Vik Fearing wrote:
> On 01/29/2014 01:12 PM, Heikki Linnakangas wrote:
>> The documentation is still lacking.
>
> The documentation is indeed lacking since it breaks the build.
>
> doc/src/sgml/config.sgml contains the line
>
>      normal allocation if that fails. With <literal>on</literal, failure
>
> which doesn't correctly terminate the closing </literal> tag.
>
> Trivial patch attached.

Thanks, applied!

- Heikki

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Merlin Moncure

Date:

29 January 2014, 14:40:55

On Tue, Jan 28, 2014 at 5:58 AM, Christian Kruse
<christian@2ndquadrant.com> wrote:
> Hi,
>
> On 28/01/14 13:51, Heikki Linnakangas wrote:
>> Oh darn, I remembered we had already committed this, but clearly not. I'd
>> love to still get this into 9.4. The latest patch (hugepages-v5.patch) was
>> pretty much ready for commit, except for documentation.
>
> I'm working on it. I ported it to HEAD and currently doing some
> benchmarks. Next will be documentation.

you mentioned benchmarks -- do you happen to have the results handy? (curious)

merlin

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Jeff Janes

Date:

29 January 2014, 18:11:33

On Wed, Jan 29, 2014 at 4:12 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> On 01/28/2014 06:11 PM, Christian Kruse wrote:
>> Hi,
>>
>> attached you will find a new version of the patch, ported to HEAD,
>> fixed the mentioned bug and - hopefully - dealing the the remaining
>> issues.
>
> Thanks, I have committed this now.

I'm getting this warning now with gcc (GCC) 4.4.7:

pg_shmem.c: In function 'PGSharedMemoryCreate':
pg_shmem.c:332: warning: 'allocsize' may be used uninitialized in this function
pg_shmem.c:332: note: 'allocsize' was declared here

Cheers,

Jeff

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

29 January 2014, 19:06:47

Hi,

On 29/01/14 14:12, Heikki Linnakangas wrote:
> The documentation is still lacking. We should explain somewhere how to set
> nr.hugepages, for example. The "Managing Kernel Resources" section ought to
> mention setting. Could I ask you to work on that, please?

Of course! Attached you will find a patch for better documentation.

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

29 January 2014, 19:15:42

Hi,

On 29/01/14 10:11, Jeff Janes wrote:
> I'm getting this warning now with gcc (GCC) 4.4.7:

Interesting. I don't get that warning. But the compiler is (formally)
right.

> pg_shmem.c: In function 'PGSharedMemoryCreate':
> pg_shmem.c:332: warning: 'allocsize' may be used uninitialized in this
> function
> pg_shmem.c:332: note: 'allocsize' was declared here

Attached patch should fix that.

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Heikki Linnakangas

Date:

29 January 2014, 19:37:07

On 01/29/2014 09:18 PM, Christian Kruse wrote:
> Hi,
>
> On 29/01/14 10:11, Jeff Janes wrote:
>> I'm getting this warning now with gcc (GCC) 4.4.7:
>
> Interesting. I don't get that warning. But the compiler is (formally)
> right.
>
>> pg_shmem.c: In function 'PGSharedMemoryCreate':
>> pg_shmem.c:332: warning: 'allocsize' may be used uninitialized in this
>> function
>> pg_shmem.c:332: note: 'allocsize' was declared here

Hmm, I didn't get that warning either.

> Attached patch should fix that.

That's not quite right. If the first mmap() fails, allocsize is set to 
the rounded-up size, but the second mmap() uses the original size for 
the allocation. So it returns a too high value to the caller.

Ugh, it's actually broken anyway :-(. The first allocation also passes 
*size to mmap(), so the calculated rounded-up allocsize value is not 
used for anything.

Fix pushed.

- Heikki

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

29 January 2014, 19:57:25

Hi,

On 29/01/14 21:36, Heikki Linnakangas wrote:
> […]
> Fix pushed.

You are right. Thanks. But there is another bug, see

<20140128154307.GC24091@defunct.ch>

ff. Attached you will find a patch fixing that.

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Heikki Linnakangas

Date:

29 January 2014, 20:17:32

On 01/29/2014 09:59 PM, Christian Kruse wrote:
> Hi,
>
> On 29/01/14 21:36, Heikki Linnakangas wrote:
>> […]
>> Fix pushed.
>
> You are right. Thanks. But there is another bug, see
>
> <20140128154307.GC24091@defunct.ch>
>
> ff. Attached you will find a patch fixing that.

Thanks. There are more cases of that in InternalIpcMemoryCreate, they 
ought to be fixed as well. And should also grep the rest of the codebase 
for more instances of that. And this needs to be back-patched.

- Heikki

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

29 January 2014, 20:21:26

Hi,

On 29/01/14 22:17, Heikki Linnakangas wrote:
> Thanks. There are more cases of that in InternalIpcMemoryCreate, they ought
> to be fixed as well. And should also grep the rest of the codebase for more
> instances of that. And this needs to be back-patched.

I'm way ahead of you ;-) Working on it.

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

30 January 2014, 07:30:18

Hi,

after I finally got documentation compilation working I updated the
patch to be syntactically correct. You will find it attached.

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Peter Eisentraut

Date:

25 February 2014, 15:29:41

On 1/30/14, 2:28 AM, Christian Kruse wrote:
> after I finally got documentation compilation working I updated the
> patch to be syntactically correct. You will find it attached.

I don't think we should be explaining the basics of OS memory management
in our documentation.  And if we did, we shouldn't copy it verbatim from
the Debian wiki without attribution.

I think this patch should be cut down to the paragraphs that cover the
actual configuration.

On a technical note, use <xref> instead of <link> for linking.
doc/src/sgml/README.links contains some information.

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Robert Haas

Date:

25 February 2014, 15:53:49

On Tue, Feb 25, 2014 at 10:29 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
> And if we did, we shouldn't copy it verbatim from
> the Debian wiki without attribution.

That is seriously not cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Andres Freund

Date:

25 February 2014, 16:01:39

On 2014-02-25 10:29:32 -0500, Peter Eisentraut wrote:
> On 1/30/14, 2:28 AM, Christian Kruse wrote:
> > after I finally got documentation compilation working I updated the
> > patch to be syntactically correct. You will find it attached.
> 
> I don't think we should be explaining the basics of OS memory management
> in our documentation.

Agreed.

> And if we did, we shouldn't copy it verbatim from the Debian wiki
> without attribution.

Is it actually? A quick comparison doesn't show that many similarities?
Christian?

Greetings,

Andres Freund

-- 
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

25 February 2014, 16:02:29

Hi,

On 25/02/14 10:29, Peter Eisentraut wrote:
> I don't think we should be explaining the basics of OS memory management
> in our documentation.

Well, I'm confused. I thought that's exactly what has been asked.

> And if we did, we shouldn't copy it verbatim from the Debian wiki
> without attribution.

I didn't. This is a write-up of several articles, blog posts and
documentation I read about this topic.

However, if you think the texts are too similar, then we should add a
note, yes. Didn't mean to copy w/o referring to a source.

> I think this patch should be cut down to the paragraphs that cover the
> actual configuration.

I tried to cover the issues Heikki brought up in
<52861EEC.2090702@vmware.com>.

> On a technical note, use <xref> instead of <link> for linking.
> doc/src/sgml/README.links contains some information.

OK, I will post an updated patch later this evening.

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

25 February 2014, 16:08:17

Hi,

On 25/02/14 17:01, Andres Freund wrote:
> > And if we did, we shouldn't copy it verbatim from the Debian wiki
> > without attribution.
>
> Is it actually? A quick comparison doesn't show that many similarities?
> Christian?

Not as far as I know. But of course, as I wrote the text I _also_
(that's not my only source) read the Debian article and I was
influenced by it. It may be that the texts are more similar than I
thought, although I still don't see it.

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Peter Eisentraut

Date:

25 February 2014, 17:18:07

On 2/25/14, 11:08 AM, Christian Kruse wrote:
> Hi,
> 
> On 25/02/14 17:01, Andres Freund wrote:
>>> And if we did, we shouldn't copy it verbatim from the Debian wiki
>>> without attribution.
>>
>> Is it actually? A quick comparison doesn't show that many similarities?
>> Christian?
> 
> Not as far as I know. But of course, as I wrote the text I _also_
> (that's not my only source) read the Debian article and I was
> influenced by it. It may be that the texts are more similar than I
> thought, although I still don't see it.

I suspect that it was done subconsciously.  But I did notice it right
away, so there is something to it.

As I mentioned, I would just cut those introductory parts out.

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Bruce Momjian

Date:

25 February 2014, 17:39:46

On Tue, Feb 25, 2014 at 12:18:02PM -0500, Peter Eisentraut wrote:
> On 2/25/14, 11:08 AM, Christian Kruse wrote:
> > Hi,
> > 
> > On 25/02/14 17:01, Andres Freund wrote:
> >>> And if we did, we shouldn't copy it verbatim from the Debian wiki
> >>> without attribution.
> >>
> >> Is it actually? A quick comparison doesn't show that many similarities?
> >> Christian?
> > 
> > Not as far as I know. But of course, as I wrote the text I _also_
> > (that's not my only source) read the Debian article and I was
> > influenced by it. It may be that the texts are more similar than I
> > thought, although I still don't see it.
> 
> I suspect that it was done subconsciously.  But I did notice it right
> away, so there is something to it.
> 
> As I mentioned, I would just cut those introductory parts out.

Should we link to the Debian wiki content?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Tom Lane

Date:

25 February 2014, 18:22:08

Bruce Momjian <bruce@momjian.us> writes:
> On Tue, Feb 25, 2014 at 12:18:02PM -0500, Peter Eisentraut wrote:
>> As I mentioned, I would just cut those introductory parts out.

> Should we link to the Debian wiki content?

-1.  We generally don't link to our *own* wiki in our SGML docs, let alone
things that aren't even under our project's control.  Moreover, Debian
is not going to be explaining these things in a way that accounts for
non-Linux operating systems.
        regards, tom lane

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Andres Freund

Date:

25 February 2014, 18:28:24

On 2014-02-25 13:21:46 -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Tue, Feb 25, 2014 at 12:18:02PM -0500, Peter Eisentraut wrote:
> >> As I mentioned, I would just cut those introductory parts out.
> 
> > Should we link to the Debian wiki content?
> 
> -1.  We generally don't link to our *own* wiki in our SGML docs, let alone
> things that aren't even under our project's control.

Agreed. Especially as the interesting bit is the postgres specific
logic, not the rest.

I think all that's needed is to cut the first paragraphs that generally
explain what huge pages are in some detail from the text and make sure
the later paragraphs don't refer to the earlier ones.

> Moreover, Debian
> is not going to be explaining these things in a way that accounts for
> non-Linux operating systems.

It's a linux only feature so far, so that alone wouldn't be a problem.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

26 February 2014, 08:35:38

Hi,

On 25/02/14 19:28, Andres Freund wrote:
> I think all that's needed is to cut the first paragraphs that generally
> explain what huge pages are in some detail from the text and make sure
> the later paragraphs don't refer to the earlier ones.

Attached you will find a new version of the patch.

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

26 February 2014, 10:38:45

Hi Peter,

after a night of sleep I'm still not able to swallow the pill. To be
honest I'm a little bit angry about this accusation.

I didn't mean to copy from the Debian wiki and after re-checking the
text again I'm still convinced that I didn't.

Of course the text SAYS something similar, but this is in the nature
of things. Structure, diction and focus are different. Also the
information transferred is different and gathered from various
articles, including the Debian wiki, the huge page docs of the kernel,
the Wikipedia and some old IBM and Oracle docs.

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Heikki Linnakangas

Date:

26 February 2014, 12:35:03

On 02/26/2014 10:35 AM, Christian Kruse wrote:
> On 25/02/14 19:28, Andres Freund wrote:
>> I think all that's needed is to cut the first paragraphs that generally
>> explain what huge pages are in some detail from the text and make sure
>> the later paragraphs don't refer to the earlier ones.
>
> Attached you will find a new version of the patch.

Thanks!

> huge_tlb_pages (enum)
>
>     Enables/disables the use of huge TLB pages. Valid values are try (the default), on, and off.
>
>     At present, this feature is supported only on Linux. The setting is ignored on other systems.
>
>     The use of huge TLB pages results in smaller page tables and less CPU time spent on memory management, increasing
>     performance. For more details, see Section 17.4.4.
>
>     With huge_tlb_pages set to try, the server will try to use huge pages, but fall back to using normal allocation
>     if that fails. With on, failure to use huge pages will prevent the server from starting up. With off, huge pages
>     will not be used.

That still says "The setting is ignored on other systems". That's not 
quite true: as explained later in the section, if you set 
huge_tlb_pages=on and the platform doesn't support it, the server will 
refuse to start.
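
For illustration only (this is not code from the patch): the behaviour described above can be modelled as a small decision function. The name resolve_huge_pages and its two boolean arguments are hypothetical; they stand in for "the platform supports huge pages" and "the huge page allocation succeeded".

```python
def resolve_huge_pages(setting, platform_supported, huge_alloc_ok):
    """Model of the try/on/off semantics as described in this thread.

    Returns "huge" or "normal" for the kind of allocation that ends up
    being used, or raises RuntimeError where the server would refuse to
    start. All names here are hypothetical, not taken from the patch.
    """
    if setting not in ("try", "on", "off"):
        raise ValueError("setting must be 'try', 'on' or 'off'")

    # With "off", huge pages are never attempted.
    if setting == "off":
        return "normal"

    # Huge pages are usable only if the platform supports them and the
    # allocation actually succeeds.
    if platform_supported and huge_alloc_ok:
        return "huge"

    # With "on", any failure to get huge pages is fatal, including the
    # case of a platform without support.
    if setting == "on":
        raise RuntimeError("huge pages requested but not available")

    # With "try", fall back to a normal allocation.
    return "normal"
```

Under this model the setting is only "ignored" on an unsupported platform when it is try or off; with on, startup fails.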

> 17.4.4. Linux huge TLB pages

This section looks good to me. I'm OK with the level of detail, although 
maybe just a sentence or two about what huge TLB pages are and what 
benefits they have would still be in order. How about adding something 
like this as the first sentence:

"Using huge TLB pages reduces overhead when using large contiguous 
chunks of memory, like PostgreSQL does."

> To enable this feature in PostgreSQL you need a kernel with CONFIG_HUGETLBFS=y and CONFIG_HUGETLB_PAGE=y. You also
> have to tune the system setting vm.nr_hugepages. To calculate the number of necessary huge pages start PostgreSQL
> without huge pages enabled and check the VmPeak value from the proc filesystem:
>
> $ head -1 /path/to/data/directory/postmaster.pid
> 4170
> $ grep ^VmPeak /proc/4170/status
> VmPeak:  6490428 kB
>
> 6490428 / 2048 (PAGE_SIZE is 2MB in this case) are roughly 3169.154 huge pages, so you will need at least 3170 huge
> pages:
>
> $ sysctl -w vm.nr_hugepages=3170

That's good advice, but perhaps s/calculate/estimate/. It's just an 
approximation, after all.
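
As a worked version of the arithmetic quoted above, here is a minimal sketch that pulls VmPeak out of the text of /proc/<pid>/status and rounds up to whole huge pages. The function names are hypothetical, and the 2048 kB default is only the huge page size assumed in the quoted example; it differs between systems.

```python
import re


def parse_vmpeak_kb(status_text):
    """Extract the VmPeak value (in kB) from /proc/<pid>/status text."""
    match = re.search(r"^VmPeak:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmPeak line found")
    return int(match.group(1))


def estimate_nr_hugepages(vm_peak_kb, huge_page_kb=2048):
    """Round vm_peak_kb up to a whole number of huge pages.

    huge_page_kb defaults to 2048 (2 MB), the size assumed in the
    quoted example; pass the real size for the system in question.
    """
    if huge_page_kb <= 0:
        raise ValueError("huge_page_kb must be positive")
    # Integer ceiling division: any partial page needs a full huge page.
    return -(-vm_peak_kb // huge_page_kb)


# The sample line from the quoted documentation.
sample_status = "VmPeak:  6490428 kB\n"
print(estimate_nr_hugepages(parse_vmpeak_kb(sample_status)))  # 3170
```

The result is an estimate of the peak virtual size, not an exact requirement, which is why "estimate" fits better than "calculate".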

- Heikki

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

26 February 2014, 14:25:43

Hi,

On 26/02/14 14:34, Heikki Linnakangas wrote:
> That still says "The setting is ignored on other systems". That's not quite
> true: as explained later in the section, if you set huge_tlb_pages=on and
> the platform doesn't support it, the server will refuse to start.

I added a sentence about it.

> "Using huge TLB pages reduces overhead when using large contiguous chunks of
> memory, like PostgreSQL does."

Sentence added.

> That's good advice, but perhaps s/calculate/estimate/. It's just an
> approximation, after all.

Fixed.

New patch version is attached.

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Alvaro Herrera

Date:

26 February 2014, 16:13:12

There's one thing that rubs me the wrong way about all this
functionality, which is that we've named it "huge TLB pages".  That is
wrong -- the TLB pages are not huge.  In fact, as far as I understand,
the TLB doesn't have pages at all.  It's the pages that are huge, but
those pages are not TLB pages, they are just memory pages.

I think we have named it this way only because Linux for some reason
named the mmap() flag MAP_HUGETLB.  The TLB is not huge
either (in fact you can't alter the size of the TLB at all; it's a
hardware thing.) I think this flag means "use the TLB entries reserved
for huge pages for the memory I'm requesting".

Since we haven't released any of this, should we discuss renaming it to
just "huge pages"?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Heikki Linnakangas

Date:

26 February 2014, 17:56:58

On 02/26/2014 06:13 PM, Alvaro Herrera wrote:
>
> There's one thing that rubs me the wrong way about all this
> functionality, which is that we've named it "huge TLB pages".  That is
> wrong -- the TLB pages are not huge.  In fact, as far as I understand,
> the TLB doesn't have pages at all.  It's the pages that are huge, but
> those pages are not TLB pages, they are just memory pages.
>
> I think we have named it this way only because Linux for some reason
> named the mmap() flag MAP_HUGETLB.  The TLB is not huge
> either (in fact you can't alter the size of the TLB at all; it's a
> hardware thing.) I think this flag means "use the TLB entries reserved
> for huge pages for the memory I'm requesting".
>
> Since we haven't released any of this, should we discuss renaming it to
> just "huge pages"?

Linux calls it "huge tlb pages" in many places, not just MAP_HUGETLB. 
Like in CONFIG_HUGETLB_PAGE and hugetlbfs. I agree it's a bit weird. 
Linux also calls it just "huge pages" in many other places, like in 
/proc/meminfo output.

FreeBSD calls them superpages and Windows calls them "large pages". 
Yeah, it would seem better to call them just "huge pages", so that it's 
more reminiscent of those names, if we ever implement support for 
super/huge/large pages on other platforms.

- Heikki

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Stephen Frost

Date:

26 February 2014, 19:36:25

Christian,

Thanks for working on all of this and dealing with the requests for
updates and changes, as well as for dealing very professionally with an
inappropriate and incorrect remark.  Unfortunately, mailing lists can
make communication difficult and someone's knee-jerk reaction (not
referring to your reaction here) can end up causing much frustration.

Remind me when we're at a conference somewhere and I'll gladly buy you a
beer (or whatever your choice is).  Seriously, thanks for working on the
'huge pages' changes and documentation- it's often a thankless job and
clearly one which can be extremely frustrating.
Thanks again,
    Stephen

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

27 February 2014, 07:35:03

Hi,

On 26/02/14 13:13, Alvaro Herrera wrote:
>
> There's one thing that rubs me the wrong way about all this
> functionality, which is that we've named it "huge TLB pages".  That is
> wrong -- the TLB pages are not huge.  In fact, as far as I understand,
> the TLB doesn't have pages at all.  It's the pages that are huge, but
> those pages are not TLB pages, they are just memory pages.

I didn't think about this, yet, but you are totally right.

> Since we haven't released any of this, should we discuss renaming it to
> just "huge pages"?

Attached is a patch with the updated documentation (now uses
consistently huge pages) as well as a renamed GUC, consistent wording
(always use huge pages) as well as renamed variables.

Should I create a new commit fest entry for this and delete the old
one? Or should this be done in two patches? Locally in my repo this is
done with two commits, so it would be easy to split that.

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

27 February 2014, 07:35:49

Hi Peter,

thank you for your nice words, much appreciated. I'm sorry that I was
so whiny about this in the last post.

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

27 February 2014, 08:08:43

Hi,

On 27/02/14 08:35, Christian Kruse wrote:
> Hi Peter,

Sorry, Stephen of course – it was definitely too early.

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Heikki Linnakangas

Date:

28 February 2014, 17:44:10

On 02/27/2014 09:34 AM, Christian Kruse wrote:
> Hi,
>
> On 26/02/14 13:13, Alvaro Herrera wrote:
>>
>> There's one thing that rubs me the wrong way about all this
>> functionality, which is that we've named it "huge TLB pages".  That is
>> wrong -- the TLB pages are not huge.  In fact, as far as I understand,
>> the TLB doesn't have pages at all.  It's the pages that are huge, but
>> those pages are not TLB pages, they are just memory pages.
>
> I didn't think about this, yet, but you are totally right.
>
>> Since we haven't released any of this, should we discuss renaming it to
>> just "huge pages"?
>
> Attached is a patch with the updated documentation (now uses
> consistently huge pages) as well as a renamed GUC, consistent wording
> (always use huge pages) as well as renamed variables.

Hmm, I wonder if that could now be misunderstood to have something to do 
with the PostgreSQL page size? Maybe add the word "memory" or "operating 
system" in the first sentence in the docs, like this: "Enables/disables 
the use of huge memory pages".

>        <para>
>         At present, this feature is supported only on Linux. The setting is
>         ignored on other systems when set to <literal>try</literal>.
>         <productname>PostgreSQL</productname> will
>         refuse to start when set to <literal>on</literal>.
>        </para>

Is it clear enough that PostgreSQL will only refuse to start up when 
it's set to on, *if the feature's not supported on the platform*? 
Perhaps just leave that last sentence out. It's mentioned later that " 
With <literal>on</literal>, failure to use huge pages will prevent the 
server from starting up.", that's probably enough.

- Heikki

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Peter Geoghegan

Date:

01 March 2014, 01:58:07

On Fri, Feb 28, 2014 at 9:43 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> Hmm, I wonder if that could now be misunderstood to have something to do
> with the PostgreSQL page size? Maybe add the word "memory" or "operating
> system" in the first sentence in the docs, like this: "Enables/disables the
> use of huge memory pages".

Whenever I wish to emphasize that distinction, I tend to use the term
"MMU pages".


-- 
Peter Geoghegan

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

03 March 2014, 09:34:33

Hi,

> >Attached is a patch with the updated documentation (now uses
> >consistently huge pages) as well as a renamed GUC, consistent wording
> >(always use huge pages) as well as renamed variables.
>
> Hmm, I wonder if that could now be misunderstood to have something to do
> with the PostgreSQL page size? Maybe add the word "memory" or "operating
> system" in the first sentence in the docs, like this: "Enables/disables the
> use of huge memory pages".

Accepted, see attached patch.

> >       <para>
> >        At present, this feature is supported only on Linux. The setting is
> >        ignored on other systems when set to <literal>try</literal>.
> >        <productname>PostgreSQL</productname> will
> >        refuse to start when set to <literal>on</literal>.
> >       </para>
>
> Is it clear enough that PostgreSQL will only refuse to start up when it's
> set to on, *if the feature's not supported on the platform*? Perhaps just
> leave that last sentence out. It's mentioned later that " With
> <literal>on</literal>, failure to use huge pages will prevent the server
> from starting up.", that's probably enough.

Fixed.

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

03 March 2014, 09:37:53

Hi,

On 28/02/14 17:58, Peter Geoghegan wrote:
> On Fri, Feb 28, 2014 at 9:43 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
> > Hmm, I wonder if that could now be misunderstood to have something to do
> > with the PostgreSQL page size? Maybe add the word "memory" or "operating
> > system" in the first sentence in the docs, like this: "Enables/disables the
> > use of huge memory pages".
>
> Whenever I wish to emphasize that distinction, I tend to use the term
> "MMU pages".

I don't like to deviate that much from Linux terminology; this may
lead to confusion. And using this term in only one place doesn't seem
to make sense either – naming would then be inconsistent and thus lead
to confusion, too. Do you agree?

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Heikki Linnakangas

Date:

03 March 2014, 19:09:49

On 03/03/2014 11:34 AM, Christian Kruse wrote:
> Hi,
>
>>> Attached is a patch with the updated documentation (now uses
>>> consistently huge pages) as well as a renamed GUC, consistent wording
>>> (always use huge pages) as well as renamed variables.
>>
>> Hmm, I wonder if that could now be misunderstood to have something to do
>> with the PostgreSQL page size? Maybe add the word "memory" or "operating
>> system" in the first sentence in the docs, like this: "Enables/disables the
>> use of huge memory pages".
>
> Accepted, see attached patch.

Thanks, committed!

I spotted this in section "17.4.1 Shared Memory and Semaphores":

> Linux
>
>     The default maximum segment size is 32 MB, and the default maximum total size is 2097152 pages. A page is almost
>     always 4096 bytes except in unusual kernel configurations with "huge pages" (use getconf PAGE_SIZE to verify).

It's not any more wrong now than it's always been, but I don't think 
huge pages ever affect PAGE_SIZE... Could I cajole you into rephrasing 
that, too?

- Heikki

Re: [PATCH] Use MAP_HUGETLB where supported (v3)

From

Christian Kruse

Date:

04 March 2014, 10:53:31

Hi,

On 03/03/14 21:03, Heikki Linnakangas wrote:
> I spotted this in section "17.4.1 Shared Memory and Semaphores":
>
> >Linux
> >
> >    The default maximum segment size is 32 MB, and the default maximum total size is 2097152 pages. A page is almost
> >    always 4096 bytes except in unusual kernel configurations with "huge pages" (use getconf PAGE_SIZE to verify).
>
> It's not any more wrong now than it's always been, but I don't think huge
> pages ever affect PAGE_SIZE... Could I cajole you into rephrasing that, too?

Hm… to be honest, I'm not sure how to change that. What about this?

        The default maximum segment size is 32 MB, and the
        default maximum total size is 2097152
        pages.  A page is almost always 4096 bytes except in
        kernel configurations with <quote>huge pages</quote>
        (use <literal>cat /proc/meminfo | grep Hugepagesize</literal>
        to verify), but they have to be enabled explicitly via
        <xref linkend="guc-huge-pages">. See
        <xref linkend="linux-huge-pages"> for details.

I attached a patch doing this change.
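
To make the distinction between the two sizes concrete, here is a small sketch that reads them separately: the base page size via sysconf (what getconf PAGE_SIZE reports) and the huge page size from the Hugepagesize line of /proc/meminfo. The helper names are hypothetical, and the sketch assumes a Linux-style /proc/meminfo; where the file or the line is missing, the parser returns None.

```python
import os
import re


def base_page_size_bytes():
    """Base (normal) page size in bytes, as reported by sysconf."""
    return os.sysconf("SC_PAGE_SIZE")


def parse_hugepagesize_kb(meminfo_text):
    """Return the Hugepagesize value in kB, or None if the line is absent."""
    match = re.search(r"^Hugepagesize:\s+(\d+)\s+kB$", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_meminfo(path="/proc/meminfo"):
    """Read /proc/meminfo, returning an empty string if it is unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print("base page size (bytes):", base_page_size_bytes())
    print("huge page size (kB):", parse_hugepagesize_kb(read_meminfo()))
```

The two values come from different sources, which fits Heikki's point that huge pages are not expected to change what PAGE_SIZE reports.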

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services