Re: huge tlb support - Mailing list pgsql-hackers

From Andres Freund
Subject Re: huge tlb support
Msg-id 201207031330.36372.andres@2ndquadrant.com
In response to Re: huge tlb support  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tuesday, July 03, 2012 05:18:04 AM Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Fri, Jun 29, 2012 at 3:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> In a *very* quick patch I tested using huge pages/MAP_HUGETLB for the
> >> mmap'ed memory.
> > 
> > So, considering that there is required setup, it seems that the
> > obvious thing to do here is add a GUC: huge_tlb_pages (boolean).
We also need some logic to figure out how big the huge TLB page size is.
/sys/kernel/mm/hugepages/ contains a directory for each supported size. They
are somewhat unfortunately named, though, e.g. "hugepages-2048kB", so we need
to parse that.
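
To illustrate the parsing step, here is a minimal sketch in Python (not the
patch itself). It assumes the directory names always follow the
"hugepages-<N>kB" pattern mentioned above; the function names
parse_hugepage_dir and available_hugepage_sizes are made up for this example.

```python
import os
import re

# Assumption: entries are named "hugepages-<N>kB", as in "hugepages-2048kB".
_HUGEPAGE_DIR = re.compile(r"^hugepages-(\d+)kB$")


def parse_hugepage_dir(name):
    """Return the page size in bytes encoded in a directory name, or None."""
    m = _HUGEPAGE_DIR.match(name)
    return int(m.group(1)) * 1024 if m else None


def available_hugepage_sizes(sysfs_dir="/sys/kernel/mm/hugepages"):
    """List the huge page sizes (in bytes) the kernel advertises, ascending.

    Returns an empty list if the directory is missing or unreadable,
    e.g. on a kernel built without hugetlb support.
    """
    try:
        names = os.listdir(sysfs_dir)
    except OSError:
        return []
    sizes = (parse_hugepage_dir(n) for n in names)
    return sorted(s for s in sizes if s is not None)
```

On a typical x86-64 box one would expect available_hugepage_sizes() to
include 2097152 (2MB), but that depends on the kernel and hardware.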

> > The other alternative is to try with MAP_HUGETLB and, if it fails, try
> > again without MAP_HUGETLB.
> +1 for not making people configure this manually.
I don't think that's going to fly that well. You need to specifically allocate
hugepages at boot or shortly thereafter. If postgres just grabs some of the
available space without asking, it might very well prevent other applications
from starting. We're not allocating half of the system memory without asking
either...
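
For reference, the "try with MAP_HUGETLB, then retry without it" alternative
quoted above could be sketched roughly as follows. This is only an
illustration of the mechanism in Python, not the patch and not a
recommendation; it silently consumes pre-allocated huge pages when they are
available, which is exactly the concern raised above. The MAP_HUGETLB value
is hard-coded as an assumption (it is the value used on x86-64 Linux; some
architectures differ), and map_anonymous is a hypothetical helper name.

```python
import mmap
import sys

# Assumption: numeric value of MAP_HUGETLB on x86-64 Linux. It is hard-coded
# here rather than taken from the mmap module.
MAP_HUGETLB = 0x40000


def map_anonymous(size, try_huge=True):
    """Map `size` bytes of anonymous shared memory.

    Tries a huge-page mapping first and falls back to normal pages if the
    kernel refuses (for example because no huge pages are reserved).
    `size` should be a multiple of the huge page size.
    Returns a (mapping, used_huge_pages) tuple.
    """
    base = mmap.MAP_SHARED | mmap.MAP_ANONYMOUS
    if try_huge and sys.platform.startswith("linux"):
        try:
            return mmap.mmap(-1, size, flags=base | MAP_HUGETLB), True
        except OSError:
            pass  # fall through to a regular mapping
    return mmap.mmap(-1, size, flags=base), False
```

A GUC as proposed above would simply decide whether try_huge is set.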

> Also, I was under the impression that recent Linux kernels use hugepages
> automatically if they can, so I wonder exactly what Andres was testing
> on ...
At the time I was running the test I was running a moderately new kernel:

andres@awork2:~$ uname -a
Linux awork2 3.4.3-andres #138 SMP Mon Jun 19 12:46:32 CEST 2012 x86_64 GNU/Linux
andres@awork2:~$ zcat /proc/config.gz |grep HUGE
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y

So, transparent hugepages are enabled by default.

The problem is that the kernel needs 2MB of contiguous physical memory to map
to 2MB of contiguous virtual memory. In on-demand, COW virtual memory systems
that just doesn't happen all the time if you're not doing file mmap while
triggering massive readaheads. That is especially true if the system has been
running for some time, because the memory just gets too fragmented to have
lots of contiguous physical memory around.
There was/is talk about moving physical memory around to make room for more
huge pages, but that's not there yet and the patches I have seen incurred
quite some overhead.
Btw, the introduction of transparent hugepages argued that there are still
benefits in manual hugepage setups.

Btw, should anybody want to test this, you can allocate huge pages
at runtime:
echo 3000 > /proc/sys/vm/nr_hugepages
or at boot by adding a kernel parameter:
hugepages=3000
(3000 pages of 2MB each, i.e. roughly 6GB of huge pages on x86-64)

The runtime variant might take quite some time until it has found enough
pages, or might even fall short.

You can see the huge page status with:
andres@awork2:~$ cat /proc/meminfo |grep Huge
AnonHugePages:    591872 kB
HugePages_Total:    3000
HugePages_Free:     3000
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
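
If you want to check the pool size programmatically rather than by eye, a
small sketch along these lines should do. It assumes the /proc/meminfo layout
shown above ("Key: value [kB]"); parse_hugepage_meminfo and
hugepage_pool_bytes are made-up names for this example.

```python
def parse_hugepage_meminfo(text):
    """Extract the huge-page related fields from /proc/meminfo-style text.

    Values are returned as plain integers: page counts for HugePages_*,
    kB for the fields that carry a "kB" suffix.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or "Huge" not in key:
            continue
        fields[key.strip()] = int(rest.split()[0])
    return fields


def hugepage_pool_bytes(fields):
    """Total size of the explicitly reserved huge page pool, in bytes."""
    return fields["HugePages_Total"] * fields["Hugepagesize"] * 1024


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_hugepage_meminfo(f.read())
    print(info, hugepage_pool_bytes(info))
```

For the output above this gives 3000 * 2048 kB = 6291456000 bytes, i.e. about
5.86 GiB, and HugePages_Free equal to HugePages_Total means nothing is using
the pool yet.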



Greetings,

Andres

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

