Re: huge tlb support - Mailing list pgsql-hackers

From David Gould
Subject Re: huge tlb support
Date
Msg-id 20120821131254.1415a545@jekyl.davidgould.org
In response to Re: huge tlb support  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Tue, 21 Aug 2012 18:06:38 +0200
Andres Freund <andres@2ndquadrant.com> wrote:

> On Tuesday, August 21, 2012 05:56:58 PM Robert Haas wrote:
> > On Tue, Aug 21, 2012 at 11:31 AM, Andres Freund
> > <andres@2ndquadrant.com> wrote:
> > > On Tuesday, August 21, 2012 05:30:28 PM Robert Haas wrote:
> > >> On Thu, Aug 16, 2012 at 10:53 PM, David Gould <daveg@sonic.net>
> > >> wrote:
> > >> > A warning, on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we
> > >> > have had horrible problems caused by transparent_hugepages
> > >> > running postgres on largish systems (128GB to 512GB memory, 32
> > >> > cores). The system sometimes goes 99% system time and is very
> > >> > slow and unresponsive to the point of not successfully
> > >> > completing new tcp connections. Turning off
> > >> > transparent_hugepages fixes it.
> > >> 
> > >> Yikes!  Any idea WHY that happens?
> Afair there were several bugs that could cause that in earlier version
> of the hugepage feature. The prominent was something around never
> really stopping to search for mergeable pages even though the
> probability was small or such.

This is what I think was going on. We did see a lot (99%) of time in some
routine in the VM (I forget exactly which), and my interpretation was
that it was trying to create hugepages from scattered fragments.

> > >> I'm inclined to think this torpedos any idea we might have of
> > >> enabling hugepages automatically whenever possible.  I think we
> > >> should just add a GUC for this and call it good.  If the state of
> > >> the world improves sufficiently in the future, we can adjust, but
> > >> I think for right now we should just do this in the simplest way
> > >> possible and move on.
> > > 
> > > He is talking about transparent hugepages not hugepages afaics.
> > 
> > Hmm.  I guess you're right.  But why would it be different?
> Because in this case explicit hugepage usage reduces the pain instead
> of increasing it. And we cannot do much against transparent hugepages
> being enabled by default.
> Unless I misremember how things work the problem is/was independent of 
> anonymous mmap or sysv shmem.

Explicit hugepages work because the pages can be created early, before
all of memory is fragmented, and you either succeed or fail. Transparent
hugepages use a daemon that looks for processes that might benefit from
hugepages and tries to create hugepages on the fly. On a system that has
been up for some time, memory may be so fragmented that this is just a
waste of time.
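
For anyone who wants to check what their own box is doing: the kernel
reports the transparent hugepage mode in sysfs, with the active choice in
square brackets (for example "[always] madvise never"). Below is a rough
sketch, not a tested tool. The two paths are the ones I believe apply
(the upstream one, and the redhat_ prefixed one used by RHEL 6 kernels);
the function names are made up for illustration.

```python
import re
from pathlib import Path

# Assumed sysfs locations: upstream kernels first, then RHEL 6 kernels.
THP_PATHS = (
    "/sys/kernel/mm/transparent_hugepage/enabled",
    "/sys/kernel/mm/redhat_transparent_hugepage/enabled",
)


def active_thp_mode(text):
    """Return the bracketed (active) mode from the sysfs file contents."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def current_thp_mode(paths=THP_PATHS):
    """Read the first readable sysfs file; None if none can be read."""
    for path in paths:
        try:
            return active_thp_mode(Path(path).read_text())
        except OSError:
            continue
    return None


if __name__ == "__main__":
    print(current_thp_mode())
```

A result of "always" means the kernel may try to build hugepages for any
process, which is the setting we had to turn off.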

Real, as opposed to transparent, hugepages would be a huge win for
applications that try to use high connection counts. Each backend
attached to the postgresql shared memory uses its own set of page table
entries, at the rate of 2KB per MB of mapped shared memory. At 8GB of
shared buffers and 1000 connections this uses 16GB just for page tables.
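
To spell out the arithmetic, here is a back-of-envelope sketch. It
assumes x86_64 with 4KB base pages and 8-byte page table entries (256
entries per MB, hence 2KB per MB), assumes every backend ends up touching
all of shared buffers, and ignores the upper levels of the page table
tree. The function name and parameters are hypothetical.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def page_table_bytes(shared_bytes, backends, page_size=4096, entry_size=8):
    """Estimate mapping-entry memory summed over all backends.

    Each backend needs one entry per mapped page; the upper levels of
    the page table tree are ignored.
    """
    pages = -(-shared_bytes // page_size)  # ceiling division
    return pages * entry_size * backends


if __name__ == "__main__":
    small = page_table_bytes(8 * GB, 1000)
    huge = page_table_bytes(8 * GB, 1000, page_size=2 * MB)
    print("4KB pages: %.1f GB" % (small / GB))
    print("2MB pages: %.1f MB" % (huge / MB))
```

With 4KB pages this comes to roughly 16GB across 1000 backends; with 2MB
hugepages the same mapping needs on the order of tens of MB in total.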

-dg

-- 
David Gould              510 282 0869         daveg@sonic.net
If simplicity worked, the world would be overrun with insects.


