Thread: huge tlb support

huge tlb support

From
Robert Haas
Date:
On Fri, Jun 29, 2012 at 3:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi All,
>
> In a *very* quick patch I tested using huge pages/MAP_HUGETLB for the mmap'ed
> memory.
> That gives around 9.5% performance benefit in a read-only pgbench run (-n -S -
> j 64 -c 64 -T 10 -M prepared, scale 200, 6GB s_b, 8 cores, 24GB mem).
>
> It also saves a bunch of memory per process due to the smaller page table
> (shared_buffers 6GB):
> cat /proc/$pid_of_pg_backend/status |grep VmPTE
> VmPTE:      6252 kB
> vs
> VmPTE:        60 kB
>
> Additionally it has the advantage that top/ps/... output under linux now looks
> like:
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
> 10603 andres    20   0 6381m 4924 1952 R    21  0.0   0:28.04 postgres
>
> i.e. RES now actually shows something usable... Which is rather nice imo.
>
> I don't have the time atm to make this into something usable, maybe somebody
> else wants to pick it up? Looks pretty worthwhile to invest some time.
>
> Because of the required setup we sure cannot make this the default but...

So, considering that there is required setup, it seems that the
obvious thing to do here is add a GUC: huge_tlb_pages (boolean).

The other alternative is to try with MAP_HUGETLB and, if it fails, try
again without MAP_HUGETLB.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: huge tlb support

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jun 29, 2012 at 3:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> In a *very* quick patch I tested using huge pages/MAP_HUGETLB for the mmap'ed
>> memory.

> So, considering that there is required setup, it seems that the
> obvious thing to do here is add a GUC: huge_tlb_pages (boolean).

> The other alternative is to try with MAP_HUGETLB and, if it fails, try
> again without MAP_HUGETLB.

+1 for not making people configure this manually.

Also, I was under the impression that recent Linux kernels use hugepages
automatically if they can, so I wonder exactly what Andres was testing
on ...
        regards, tom lane


Re: huge tlb support

From
Andres Freund
Date:
On Tuesday, July 03, 2012 05:18:04 AM Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Fri, Jun 29, 2012 at 3:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> In a *very* quick patch I tested using huge pages/MAP_HUGETLB for the
> >> mmap'ed memory.
> > 
> > So, considering that there is required setup, it seems that the
> > obvious thing to do here is add a GUC: huge_tlb_pages (boolean).
We also need some logic to figure out how big the huge page size is...
/sys/kernel/mm/hugepages/* contains a directory for each possible size, a bit
unfortunately named though ("hugepages-2048kB"), so we need to parse that.

> > The other alternative is to try with MAP_HUGETLB and, if it fails, try
> > again without MAP_HUGETLB.
> +1 for not making people configure this manually.
I don't think that's going to fly that well. You need to specifically allocate
hugepages at boot or shortly thereafter. If postgres just grabs some of the
available space without asking, it very well might cause other applications to
fail to start. We're not allocating half of the system memory without asking
either...

> Also, I was under the impression that recent Linux kernels use hugepages
> automatically if they can, so I wonder exactly what Andres was testing
> on ...
At the time I was running the test I was running a moderately new kernel:

andres@awork2:~$ uname -a
Linux awork2 3.4.3-andres #138 SMP Mon Jun 19 12:46:32 CEST 2012 x86_64 GNU/Linux
andres@awork2:~$ zcat /proc/config.gz |grep HUGE
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y

So, transparent hugepages are enabled by default.

The problem is that the kernel needs 2MB of contiguous physical memory mapping
to 2MB of contiguous virtual memory. In an on-demand, copy-on-write virtual
memory system that just doesn't happen all the time unless you're doing file
mmap while triggering massive readaheads. Especially if the system has been
running for some time, the memory just gets too fragmented to have lots of
contiguous physical memory around.
There was/is talk about moving physical memory around to make room for more
huge pages, but that's not there yet, and the patches I have seen incurred
quite some overhead.
Btw, even the patch introducing transparent hugepages acknowledged that manual
hugepage setups still have benefits.

Btw, should anybody want to test this, you can allocate huge pages at runtime
with:
echo 3000 > /proc/sys/vm/nr_hugepages
or at boot by adding a kernel parameter:
hugepages=3000
(either allocates 6GB of 2MB huge pages on x86-64)

The runtime variant might take quite a while until it has found enough pages,
or it may even fall short.

You can see the huge page status with:
andres@awork2:~$ cat /proc/meminfo |grep Huge
AnonHugePages:    591872 kB
HugePages_Total:    3000
HugePages_Free:     3000
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB



Greetings,

Andres

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: huge tlb support

From
Andres Freund
Date:
On Tuesday, July 03, 2012 04:49:10 AM Robert Haas wrote:
> So, considering that there is required setup, it seems that the
> obvious thing to do here is add a GUC: huge_tlb_pages (boolean).
> 
> The other alternative is to try with MAP_HUGETLB and, if it fails, try
> again without MAP_HUGETLB.
What about huge_tlb_pages = off|try|on with try being the default?

Greetings,

Andres
-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: huge tlb support

From
Robert Haas
Date:
On Tue, Jul 3, 2012 at 8:23 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Tuesday, July 03, 2012 04:49:10 AM Robert Haas wrote:
>> So, considering that there is required setup, it seems that the
>> obvious thing to do here is add a GUC: huge_tlb_pages (boolean).
>>
>> The other alternative is to try with MAP_HUGETLB and, if it fails, try
>> again without MAP_HUGETLB.
> What about huge_tlb_pages = off|try|on with try being the default?

That seems reasonable to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: huge tlb support

From
Tom Lane
Date:
yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
>> Also, I was under the impression that recent Linux kernels use hugepages
>> automatically if they can, so I wonder exactly what Andres was testing
>> on ...

> if you mean the "transparent hugepage" feature, iirc it doesn't affect
> MAP_SHARED mappings like this.

Oh!  That would explain some things.  It seems like a pretty nasty
restriction though ... do you know why they did that?
        regards, tom lane


Re: huge tlb support

From
Martijn van Oosterhout
Date:
On Mon, Jul 09, 2012 at 02:11:00AM -0400, Tom Lane wrote:
> yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
> >> Also, I was under the impression that recent Linux kernels use hugepages
> >> automatically if they can, so I wonder exactly what Andres was testing
> >> on ...
>
> > if you mean the "transparent hugepage" feature, iirc it doesn't affect
> > MAP_SHARED mappings like this.
>
> Oh!  That would explain some things.  It seems like a pretty nasty
> restriction though ... do you know why they did that?

It doesn't say explicitly in the documentation (found at
http://lwn.net/Articles/423592/ aka transhuge.txt) but reading between
the lines I'm guessing it's due to the fact that huge pages must be
aligned to 2 or 4MB and when dealing with a shared mapping you probably
need to require it to be aligned in all address spaces.

However, it seems it should work for SysV shared memory, see:
http://lwn.net/Articles/375096/ .  The same page suggests shared
mappings should work fine, though that page refers to the
non-transparent feature.

If you think about it, it must work since huge pages are inherited
through fork().

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.  -- Arthur Schopenhauer

Re: huge tlb support

From
Andres Freund
Date:
On Monday, July 09, 2012 08:11:00 AM Tom Lane wrote:
> yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
> >> Also, I was under the impression that recent Linux kernels use hugepages
> >> automatically if they can, so I wonder exactly what Andres was testing
> >> on ...
> > 
> > if you mean the "transparent hugepage" feature, iirc it doesn't affect
> > MAP_SHARED mappings like this.
> 
> Oh!  That would explain some things.  It seems like a pretty nasty
> restriction though ... do you know why they did that?
Looking a bit deeper, transparent hugepages explicitly only work on private
memory. The reason apparently is that it's too hard to update the page table
entries in multiple processes at once without introducing locking
problems/scalability issues.

To be sure, one can check /proc/$pid_of_pg_process/smaps and look for the
mapping to /dev/zero or the biggest mapping ;). It's not counted as Anonymous
memory and it doesn't have transparent hugepages. I was confused before
because there are quite some (400mb here) huge pages allocated for postgres
during a pgbench run, but that's just all the local memory...

Greetings,

Andres

PS: The important #define is in mm/huge_memory.c:

#define VM_NO_THP (VM_SPECIAL|VM_INSERTPAGE|VM_MIXEDMAP|VM_SAO| \
                   VM_HUGETLB|VM_SHARED|VM_MAYSHARE)

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: huge tlb support

From
yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
Date:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Jun 29, 2012 at 3:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>>> In a *very* quick patch I tested using huge pages/MAP_HUGETLB for the mmap'ed
>>> memory.
> 
>> So, considering that there is required setup, it seems that the
>> obvious thing to do here is add a GUC: huge_tlb_pages (boolean).
> 
>> The other alternative is to try with MAP_HUGETLB and, if it fails, try
>> again without MAP_HUGETLB.
> 
> +1 for not making people configure this manually.
> 
> Also, I was under the impression that recent Linux kernels use hugepages
> automatically if they can, so I wonder exactly what Andres was testing
> on ...

if you mean the "transparent hugepage" feature, iirc it doesn't affect
MAP_SHARED mappings like this.

YAMAMOTO Takashi

> 
>             regards, tom lane
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: huge tlb support

From
yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
Date:
> yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
>>> Also, I was under the impression that recent Linux kernels use hugepages
>>> automatically if they can, so I wonder exactly what Andres was testing
>>> on ...
> 
>> if you mean the "transparent hugepage" feature, iirc it doesn't affect
>> MAP_SHARED mappings like this.
> 
> Oh!  That would explain some things.  It seems like a pretty nasty
> restriction though ... do you know why they did that?

i don't know.  simply because it wasn't trivial, i guess.
the feature was implemented for kvm's guest memory, which is
non-shared anonymous memory from the POV of the host kernel.

YAMAMOTO Takashi

> 
>             regards, tom lane
> 


Re: huge tlb support

From
David Gould
Date:
On Mon, 9 Jul 2012 12:30:23 +0200
Andres Freund <andres@2ndquadrant.com> wrote:

> On Monday, July 09, 2012 08:11:00 AM Tom Lane wrote:
> > yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
> > >> Also, I was under the impression that recent Linux kernels use
> > >> hugepages automatically if they can, so I wonder exactly what
> > >> Andres was testing on ...
> > > 
> > > if you mean the "transparent hugepage" feature, iirc it doesn't
> > > affect MAP_SHARED mappings like this.
> > 
> > Oh!  That would explain some things.  It seems like a pretty nasty
> > restriction though ... do you know why they did that?
> Looking a bit deeper they explicitly only work on private memory. The
> reason apparently being that its too hard to update the page table
> entries in multiple processes at once without introducing locking
> problems/scalability issues.
> 
> To be sure one can check /proc/$pid_of_pg_proccess/smaps and look for
> the mapping to /dev/zero or the biggest mapping ;). Its not counted as
> Anonymous memory and it doesn't have transparent hugepages. I was
> confused before because there is quite some (400mb here) huge pages
> allocated for postgres during a pgbench run but thats just all the
> local memory...

A warning, on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we have had
horrible problems caused by transparent_hugepages running postgres on
largish systems (128GB to 512GB memory, 32 cores). The system sometimes
goes 99% system time and is very slow and unresponsive to the point of
not successfully completing new tcp connections. Turning off
transparent_hugepages fixes it. 

That said, explicit hugepage support for the buffer cache would be a big
win especially for high connection counts.

-dg


-- 
David Gould                                   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.



Re: huge tlb support

From
Robert Haas
Date:
On Thu, Aug 16, 2012 at 10:53 PM, David Gould <daveg@sonic.net> wrote:
> A warning, on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we have had
> horrible problems caused by transparent_hugepages running postgres on
> largish systems (128GB to 512GB memory, 32 cores). The system sometimes
> goes 99% system time and is very slow and unresponsive to the point of
> not successfully completing new tcp connections. Turning off
> transparent_hugepages fixes it.

Yikes!  Any idea WHY that happens?

I'm inclined to think this torpedos any idea we might have of enabling
hugepages automatically whenever possible.  I think we should just add
a GUC for this and call it good.  If the state of the world improves
sufficiently in the future, we can adjust, but I think for right now
we should just do this in the simplest way possible and move on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: huge tlb support

From
Andres Freund
Date:
On Tuesday, August 21, 2012 05:30:28 PM Robert Haas wrote:
> On Thu, Aug 16, 2012 at 10:53 PM, David Gould <daveg@sonic.net> wrote:
> > A warning, on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we have had
> > horrible problems caused by transparent_hugepages running postgres on
> > largish systems (128GB to 512GB memory, 32 cores). The system sometimes
> > goes 99% system time and is very slow and unresponsive to the point of
> > not successfully completing new tcp connections. Turning off
> > transparent_hugepages fixes it.
> 
> Yikes!  Any idea WHY that happens?
> 
> I'm inclined to think this torpedos any idea we might have of enabling
> hugepages automatically whenever possible.  I think we should just add
> a GUC for this and call it good.  If the state of the world improves
> sufficiently in the future, we can adjust, but I think for right now
> we should just do this in the simplest way possible and move on.
He is talking about transparent hugepages, not hugepages, afaics.

Andres
-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: huge tlb support

From
Robert Haas
Date:
On Tue, Aug 21, 2012 at 11:31 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Tuesday, August 21, 2012 05:30:28 PM Robert Haas wrote:
>> On Thu, Aug 16, 2012 at 10:53 PM, David Gould <daveg@sonic.net> wrote:
>> > A warning, on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we have had
>> > horrible problems caused by transparent_hugepages running postgres on
>> > largish systems (128GB to 512GB memory, 32 cores). The system sometimes
>> > goes 99% system time and is very slow and unresponsive to the point of
>> > not successfully completing new tcp connections. Turning off
>> > transparent_hugepages fixes it.
>>
>> Yikes!  Any idea WHY that happens?
>>
>> I'm inclined to think this torpedos any idea we might have of enabling
>> hugepages automatically whenever possible.  I think we should just add
>> a GUC for this and call it good.  If the state of the world improves
>> sufficiently in the future, we can adjust, but I think for right now
>> we should just do this in the simplest way possible and move on.
> He is talking about transparent hugepages not hugepages afaics.

Hmm.  I guess you're right.  But why would it be different?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: huge tlb support

From
Andres Freund
Date:
On Tuesday, August 21, 2012 05:56:58 PM Robert Haas wrote:
> On Tue, Aug 21, 2012 at 11:31 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On Tuesday, August 21, 2012 05:30:28 PM Robert Haas wrote:
> >> On Thu, Aug 16, 2012 at 10:53 PM, David Gould <daveg@sonic.net> wrote:
> >> > A warning, on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we have had
> >> > horrible problems caused by transparent_hugepages running postgres on
> >> > largish systems (128GB to 512GB memory, 32 cores). The system
> >> > sometimes goes 99% system time and is very slow and unresponsive to
> >> > the point of not successfully completing new tcp connections. Turning
> >> > off
> >> > transparent_hugepages fixes it.
> >> 
> >> Yikes!  Any idea WHY that happens?
Afair there were several bugs that could cause that in earlier versions of the
hugepage feature. The most prominent one was something around never really
stopping the search for mergeable pages even though the probability was small
or such.

I am not a RHEL person, so I cannot directly interpret that kernel version; is
that the latest kernel?

> >> I'm inclined to think this torpedos any idea we might have of enabling
> >> hugepages automatically whenever possible.  I think we should just add
> >> a GUC for this and call it good.  If the state of the world improves
> >> sufficiently in the future, we can adjust, but I think for right now
> >> we should just do this in the simplest way possible and move on.
> > 
> > He is talking about transparent hugepages not hugepages afaics.
> 
> Hmm.  I guess you're right.  But why would it be different?
Because in this case explicit hugepage usage reduces the pain instead of
increasing it. And we cannot do much against transparent hugepages being
enabled by default.
Unless I misremember how things work, the problem is/was independent of
anonymous mmap or sysv shmem.


Greetings,

Andres
-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: huge tlb support

From
David Gould
Date:
On Tue, 21 Aug 2012 18:06:38 +0200
Andres Freund <andres@2ndquadrant.com> wrote:

> On Tuesday, August 21, 2012 05:56:58 PM Robert Haas wrote:
> > On Tue, Aug 21, 2012 at 11:31 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > > On Tuesday, August 21, 2012 05:30:28 PM Robert Haas wrote:
> > >> On Thu, Aug 16, 2012 at 10:53 PM, David Gould <daveg@sonic.net>
> > >> wrote:
> > >> > A warning, on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we
> > >> > have had horrible problems caused by transparent_hugepages
> > >> > running postgres on largish systems (128GB to 512GB memory, 32
> > >> > cores). The system sometimes goes 99% system time and is very
> > >> > slow and unresponsive to the point of not successfully
> > >> > completing new tcp connections. Turning off
> > >> > transparent_hugepages fixes it.
> > >> 
> > >> Yikes!  Any idea WHY that happens?
> Afair there were several bugs that could cause that in earlier versions
> of the hugepage feature. The most prominent one was something around
> never really stopping the search for mergeable pages even though the
> probability was small or such.

This is what I think was going on. We did see a lot (99%) of time in some
routine in the VM (I forget exactly which), and my interpretation was
that it was trying to create hugepages from scattered fragments.

> > >> I'm inclined to think this torpedos any idea we might have of
> > >> enabling hugepages automatically whenever possible.  I think we
> > >> should just add a GUC for this and call it good.  If the state of
> > >> the world improves sufficiently in the future, we can adjust, but
> > >> I think for right now we should just do this in the simplest way
> > >> possible and move on.
> > > 
> > > He is talking about transparent hugepages not hugepages afaics.
> > 
> > Hmm.  I guess you're right.  But why would it be different?
> Because in this case explicit hugepage usage reduces the pain instead
> of increasing it. And we cannot do much against transparent hugepages
> being enabled by default.
> Unless I misremember how things work the problem is/was independent of 
> anonymous mmap or sysv shmem.

Explicit hugepages work because the pages can be created early, before all
of memory is fragmented, and you either succeed or fail. Transparent
hugepages use a daemon that looks for processes that might benefit from
hugepages and tries to create hugepages on the fly. On a system that has
been up for some time, memory may be so fragmented that this is just a
waste of time.

Real, as opposed to transparent, hugepages would be a huge win for
applications that use high connection counts. Each backend
attached to the postgresql shared memory uses its own set of page table
entries, at the rate of 2KB per MB of mapped shared memory. At 8GB of
shared buffers and 1000 connections this uses 16GB just for page tables.

-dg

-- 
David Gould              510 282 0869         daveg@sonic.net
If simplicity worked, the world would be overrun with insects.