Re: [PATCH] Add support for choosing huge page size - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: [PATCH] Add support for choosing huge page size
Msg-id CA+hUKG+xxHxj--5JZwv0OrfYgCcLQ+bVQ=L9wWqhdy93M6SjoA@mail.gmail.com
In response to Re: [PATCH] Add support for choosing huge page size  (Odin Ugedal <odin@ugedal.com>)
Responses Re: [PATCH] Add support for choosing huge page size
List pgsql-hackers
On Wed, Jun 10, 2020 at 2:24 AM Odin Ugedal <odin@ugedal.com> wrote:
> Attached v2 of patch, updated with the comments from Thomas (again,
> thanks). I also changed the mmap flags to only set size if the
> selected huge page size is not the default one (on Linux). The support
> for this functionality was added in Linux 3.8, and therefore it was
> not supported before then. Should we add that to the docs, or what do
> you think? The definitions of MAP_HUGE_MASK and MAP_HUGE_SHIFT were
> added in Linux 3.8 too, but since they are a part of libc/musl, and
> are "used" at compile time, that shouldn't be a problem, or?

Oh, so maybe we need a configure test for them?  And if you don't have
them, a runtime error if you try to set the page size to something other
than 0 (like we do for effective_io_concurrency if you don't have a
posix_fadvise() function).
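
To make that concrete, here is a rough standalone sketch of the kind of
compile-time guard I have in mind.  The function name
huge_page_mmap_flags() is made up for illustration and is not from the
patch; the only real names are the MAP_HUGETLB, MAP_HUGE_SHIFT and
MAP_HUGE_MASK macros.  It assumes the size is given in bytes and that 0
means "use the default size".

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, not part of the patch: work out the extra mmap()
 * flags for a requested huge page size in bytes.  Returns false if the
 * size cannot be expressed on this build.
 */
bool
huge_page_mmap_flags(size_t huge_page_size, int *flags)
{
    *flags = 0;
#ifdef MAP_HUGETLB
    *flags = MAP_HUGETLB;
#endif

    /* 0 means "default huge page size": no size bits needed. */
    if (huge_page_size == 0)
        return true;

    /* The size is encoded as log2(size), so it must be a power of two. */
    if ((huge_page_size & (huge_page_size - 1)) != 0)
        return false;

#if defined(MAP_HUGETLB) && defined(MAP_HUGE_SHIFT) && defined(MAP_HUGE_MASK)
    {
        unsigned int shift = 0;

        while (((size_t) 1 << shift) < huge_page_size)
            shift++;
        *flags |= (int) ((shift & MAP_HUGE_MASK) << MAP_HUGE_SHIFT);
        return true;
    }
#else
    /* Macros missing at compile time: only the default size is usable. */
    return false;
#endif
}
```

A caller would OR the result into its mmap() flags, and treat a false
return as "cannot honour this huge_page_size here".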

> If a huge page size that is not supported on the system is chosen via
> huge_page_size (and huge_pages = on), it will result in "FATAL:  could
> not map anonymous shared memory: Invalid argument". This is the same
> that happens today when huge pages aren't supported at all, so I guess
> it is ok for now (and then we can consider verifying that it is
> supported at a later stage).

Failing like that when you set it to an unsupported size seems
reasonable to me.  But if you set it to an unsupported size and have
huge_pages=try, do we fall back to using no huge pages?
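
Just to spell out the behaviour I'm asking about (this is a sketch of
what I'd expect, not a claim about what the patch currently does): with
"try" a failed huge page mapping should quietly fall back to a normal
one, and with "on" it should fail.  The enum, the function name and the
size_flags parameter below are invented for the example; size_flags
stands for whatever MAP_HUGE_* size bits were chosen, 0 for the default.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Hypothetical stand-in for the huge_pages setting. */
typedef enum { HP_OFF, HP_TRY, HP_ON } HugePagesMode;

/*
 * Map anonymous shared memory, trying huge pages first if asked to.
 * The caller is assumed to have rounded "size" up to a multiple of the
 * huge page size.  Returns MAP_FAILED on failure.
 */
void *
map_shared_anonymous(size_t size, HugePagesMode mode, int size_flags,
                     bool *used_huge)
{
    int   base = MAP_SHARED | MAP_ANONYMOUS;
    void *p = MAP_FAILED;

    *used_huge = false;

    if (mode != HP_OFF)
    {
#ifdef MAP_HUGETLB
        p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 base | MAP_HUGETLB | size_flags, -1, 0);
#else
        (void) size_flags;      /* no huge page support compiled in */
#endif
        if (p != MAP_FAILED)
        {
            *used_huge = true;
            return p;
        }
        /* huge_pages=on: an unsupported size is a hard failure. */
        if (mode == HP_ON)
            return MAP_FAILED;
    }

    /* huge_pages=off, or "try" after a failed attempt: plain mapping. */
    return mmap(NULL, size, PROT_READ | PROT_WRITE, base, -1, 0);
}
```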

> Also, thanks for the information about Windows. I have been
> searching for info on huge pages in Windows and "superpages" in BSD,
> without much luck. I only have experience with Linux, so I think we
> can do as you said, and let someone else look at it. :)

For what it's worth, here's what I know about this on other operating systems:

1.  AIX can do huge pages, but only if you use System V shared memory
(not for mmap() anonymous shared).  In
https://commitfest.postgresql.org/25/1960/ we got as far as adding
support for shared_memory_type=sysv, but to go further we'll need
someone willing to hack on the patch on an AIX system, preferably with
root access so they can grant the postgres user wired memory
privileges (or whatever they call that over there).  But at a glance,
they don't have a way to ask for a specific page size, just "large".

2.  FreeBSD doesn't currently have a way to ask for super pages
explicitly at all; it does something like Linux Transparent Huge
Pages, except that it's transparent.  It does seem to do a pretty good
job of putting PostgreSQL text/code, shared memory and heap memory
into super pages automatically on my systems.  One small detail is
that there is a flag MAP_ALIGNED_SUPER that might help get better
alignment; it'd be bad if the lower pages of our shared memory
happened to be the location of lock arrays, proc array, buffer mapping
or other largish and very hot stuff and also happened to be on 4kB
pages due to misalignment, but I wonder if the flag is really
needed to avoid that on current FreeBSD or not.  I should probably go
and check some time!  (I have no clue for other BSDs.)

3.  Last time I checked, Solaris and illumos seemed to have the same
philosophy as FreeBSD and not give you explicit control; my info could
be out of date, and I have no clue beyond that.

4.  What I said above about Windows; the explicit page size thing
seems to be bleeding edge and barely documented.

5.  macOS does have flags to ask for super pages with various sizes,
but apparently such mappings are not inherited by child processes.  So
that's useless for us.
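
For the FreeBSD point in item 2, using the flag would look roughly like
this.  It's an untested sketch: MAP_ALIGNED_SUPER is the real FreeBSD
flag, the function name is made up, and on systems without the flag it
just degrades to an ordinary anonymous shared mapping.  Whether it
actually changes anything on current FreeBSD is exactly the open
question.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/*
 * Hypothetical helper: anonymous shared mapping that asks for super page
 * alignment where the platform offers a flag for it.
 */
void *
map_shared_superpage_aligned(size_t size)
{
    int flags = MAP_SHARED | MAP_ANON;

#ifdef MAP_ALIGNED_SUPER
    /* FreeBSD: align the region to favour promotion to super pages. */
    flags |= MAP_ALIGNED_SUPER;
#endif

    return mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
}
```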

As for the relevance of all this to your patch, I think we just need a
check callback for the GUC, that says "ERROR: huge_page_size must be
set to 0 on this platform".
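
Something along these lines, as a simplified standalone sketch.  A real
GUC check hook has a different signature and reports through the GUC
error machinery, so the function name and the errmsg out-parameter here
are assumptions made only to keep the example self-contained.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdbool.h>
#include <stddef.h>
#ifndef _WIN32
#include <sys/mman.h>
#endif

/*
 * Hypothetical, simplified check callback for huge_page_size.  Accepts
 * any value where the MAP_HUGE_* macros exist, and only 0 elsewhere.
 * On rejection, *errmsg points at a static message; otherwise it is NULL.
 */
bool
check_huge_page_size_sketch(int newval, const char **errmsg)
{
    *errmsg = NULL;

#if !(defined(MAP_HUGE_MASK) && defined(MAP_HUGE_SHIFT))
    /* No way to request a specific size here: only the default is OK. */
    if (newval != 0)
    {
        *errmsg = "huge_page_size must be set to 0 on this platform";
        return false;
    }
#else
    (void) newval;              /* any size is accepted at this stage */
#endif

    return true;
}
```

Whether the chosen size is actually supported by the running kernel is
deliberately left to mmap() time, as discussed above.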


