Re: huge tlb support - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: huge tlb support |
Date | |
Msg-id | 201207031330.36372.andres@2ndquadrant.com |
In response to | Re: huge tlb support (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Tuesday, July 03, 2012 05:18:04 AM Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Fri, Jun 29, 2012 at 3:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> In a *very* quick patch I tested using huge pages/MAP_HUGETLB for the
> >> mmap'ed memory.
> >
> > So, considering that there is required setup, it seems that the
> > obvious thing to do here is add a GUC: huge_tlb_pages (boolean).

We also need some logic to figure out how big the huge TLB page size is. /sys/kernel/mm/hugepages/ contains a directory for each possible size. The directories are a bit unfortunately named, though, e.g. "hugepages-2048kB", so we need to parse that.

> > The other alternative is to try with MAP_HUGETLB and, if it fails, try
> > again without MAP_HUGETLB.
>
> +1 for not making people configure this manually.

I don't think that's going to fly that well. You need to specifically allocate hugepages at boot or shortly thereafter. If postgres just grabs some of the available space without asking, it might very well keep other applications from starting. We're not allocating half of the system memory without asking either...

> Also, I was under the impression that recent Linux kernels use hugepages
> automatically if they can, so I wonder exactly what Andres was testing
> on ...

At the time I ran the test I was running a moderately new kernel:

andres@awork2:~$ uname -a
Linux awork2 3.4.3-andres #138 SMP Mon Jun 19 12:46:32 CEST 2012 x86_64 GNU/Linux
andres@awork2:~$ zcat /proc/config.gz |grep HUGE
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y

So transparent hugepages are enabled by default. The problem is that the kernel needs 2MB of adjacent physical memory to back 2MB of adjacent virtual memory. In on-demand, copy-on-write virtual memory systems that just doesn't happen all the time unless you're doing file mmap while triggering massive readaheads.
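Regarding the size detection mentioned above, here is a rough sketch of how the "hugepages-2048kB" style names could be parsed. It is not taken from any patch in this thread; the helper names parse_hugepage_dirname and list_hugepage_sizes are made up for illustration, and the caller is assumed to pass the sysfs directory path itself.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from any patch in this thread): extract the page
 * size in kB from an entry name such as "hugepages-2048kB".
 * Returns 0 on success, -1 if the name does not match that pattern.
 */
static int
parse_hugepage_dirname(const char *name, unsigned long *size_kb)
{
    static const char prefix[] = "hugepages-";
    const size_t prefix_len = sizeof(prefix) - 1;
    char       *end;
    unsigned long val;

    if (strncmp(name, prefix, prefix_len) != 0)
        return -1;

    /* Require a leading digit so strtoul cannot accept a sign or blanks. */
    if (name[prefix_len] < '0' || name[prefix_len] > '9')
        return -1;

    val = strtoul(name + prefix_len, &end, 10);

    /* The number must be followed by exactly "kB" and be non-zero. */
    if (strcmp(end, "kB") != 0 || val == 0)
        return -1;

    *size_kb = val;
    return 0;
}

/*
 * Hypothetical helper: collect all huge page sizes (in kB) found in "dir",
 * e.g. "/sys/kernel/mm/hugepages". Returns the number of sizes stored in
 * "sizes" (at most max_sizes), or -1 if the directory cannot be opened.
 */
static int
list_hugepage_sizes(const char *dir, unsigned long *sizes, int max_sizes)
{
    DIR        *d = opendir(dir);
    struct dirent *de;
    int         n = 0;

    if (d == NULL)
        return -1;

    while (n < max_sizes && (de = readdir(d)) != NULL)
    {
        unsigned long kb;

        if (parse_hugepage_dirname(de->d_name, &kb) == 0)
            sizes[n++] = kb;
    }

    closedir(d);
    return n;
}
```

A caller would pass "/sys/kernel/mm/hugepages" as the directory and treat a return value of -1 or 0 as "no huge page support visible", in which case it would not attempt a huge page mapping at all.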
Contiguous 2MB ranges of physical memory are especially hard to find once the system has been running for some time, because memory just gets too fragmented to leave lots of adjacent physical memory around. There was/is talk about moving physical memory around to make room for more huge pages, but that's not there yet, and the patches I have seen incurred quite some overhead.

Btw, when transparent hugepages were introduced, it was argued that manual hugepage setups still have benefits.

Btw, should anybody want to test this, you can allocate huge pages either at runtime:

echo 3000 > /proc/sys/vm/nr_hugepages

or at boot, by adding a kernel parameter:

hugepages=3000

(this allocates roughly 6GB of huge pages on x86-64)

The runtime variant might take quite a while until it has found enough pages, or might even fall short.

You can see the huge page status with:

andres@awork2:~$ cat /proc/meminfo |grep Huge
AnonHugePages:    591872 kB
HugePages_Total:    3000
HugePages_Free:     3000
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

Greetings,

Andres

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
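For anyone testing with huge pages reserved as described above, the "try with MAP_HUGETLB, then retry without it" alternative quoted earlier could look roughly like the sketch below. It is not the patch discussed in this thread: the function name map_shared_prefer_huge and its parameters are hypothetical, and the huge page size is assumed to be supplied by the caller (for example 2MB on x86-64). The length is rounded up because munmap() on a huge page mapping needs a length that is a multiple of the huge page size.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Round size up to the next multiple of align (align must be non-zero). */
static size_t
round_up(size_t size, size_t align)
{
    return ((size + align - 1) / align) * align;
}

/*
 * Hypothetical helper: create an anonymous shared mapping, preferring
 * explicitly reserved huge pages and falling back to normal pages.
 *
 * hugepage_size is an assumption supplied by the caller.
 * *used_huge is set to 1 if the huge page mapping succeeded, else 0.
 * *mapped_size is set to the length that must later be passed to munmap().
 * Returns MAP_FAILED if the fallback mapping fails as well.
 */
static void *
map_shared_prefer_huge(size_t size, size_t hugepage_size,
                       int *used_huge, size_t *mapped_size)
{
    void       *p;

#ifdef MAP_HUGETLB
    {
        size_t      huge_len = round_up(size, hugepage_size);

        /* First attempt: this fails if too few huge pages are reserved. */
        p = mmap(NULL, huge_len, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p != MAP_FAILED)
        {
            *used_huge = 1;
            *mapped_size = huge_len;
            return p;
        }
    }
#else
    (void) hugepage_size;       /* unused without MAP_HUGETLB */
#endif

    /* Second attempt: ordinary pages, original length. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    *used_huge = 0;
    *mapped_size = size;
    return p;
}
```

Note that a silent fallback like this is exactly what makes the behaviour hard to predict for the administrator, which is why reporting used_huge back to the caller (so it can at least be logged) seems useful in such a sketch.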