Thread: huge tlb support
On Fri, Jun 29, 2012 at 3:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi All,
>
> In a *very* quick patch I tested using huge pages/MAP_HUGETLB for the
> mmap'ed memory. That gives around a 9.5% performance benefit in a
> read-only pgbench run (-n -S -j 64 -c 64 -T 10 -M prepared, scale 200,
> 6GB s_b, 8 cores, 24GB mem).
>
> It also saves a bunch of memory per process due to the smaller page
> table (shared_buffers 6GB):
>
>     cat /proc/$pid_of_pg_backend/status | grep VmPTE
>     VmPTE:      6252 kB
> vs.
>     VmPTE:        60 kB
>
> Additionally, it has the advantage that top/ps/... output under Linux
> now looks like:
>
>     PID   USER   PR NI  VIRT  RES SHR  S %CPU %MEM   TIME+  COMMAND
>     10603 andres 20  0 6381m 4924 1952 R   21  0.0 0:28.04 postgres
>
> i.e. RES now actually shows something usable, which is rather nice imo.
>
> I don't have the time atm to make this into something usable; maybe
> somebody else wants to pick it up? It looks pretty worthwhile to invest
> some time in.
>
> Because of the required setup we surely cannot make this the default,
> but...

So, considering that there is required setup, it seems that the obvious
thing to do here is to add a GUC: huge_tlb_pages (boolean).

The other alternative is to try with MAP_HUGETLB and, if it fails, try
again without MAP_HUGETLB.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jun 29, 2012 at 3:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> In a *very* quick patch I tested using huge pages/MAP_HUGETLB for the
>> mmap'ed memory.

> So, considering that there is required setup, it seems that the
> obvious thing to do here is add a GUC: huge_tlb_pages (boolean).

> The other alternative is to try with MAP_HUGETLB and, if it fails, try
> again without MAP_HUGETLB.

+1 for not making people configure this manually.

Also, I was under the impression that recent Linux kernels use hugepages
automatically if they can, so I wonder exactly what Andres was testing
on ...

			regards, tom lane
On Tuesday, July 03, 2012 05:18:04 AM Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Jun 29, 2012 at 3:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>>> In a *very* quick patch I tested using huge pages/MAP_HUGETLB for the
>>> mmap'ed memory.
>>
>> So, considering that there is required setup, it seems that the
>> obvious thing to do here is add a GUC: huge_tlb_pages (boolean).

We also need some logic to figure out what the huge page size is...
/sys/kernel/mm/hugepages/* contains a directory for each possible size,
a bit unfortunately named, though ("hugepages-2048kB"). We need to parse
that.

>> The other alternative is to try with MAP_HUGETLB and, if it fails, try
>> again without MAP_HUGETLB.
>
> +1 for not making people configure this manually.

I don't think that's going to fly that well. You need to specifically
allocate huge pages at boot or shortly thereafter. If postgres just
grabs some of the available space without asking, it very well might
cause other applications to fail to start. We're not allocating half of
the system memory without asking either...

> Also, I was under the impression that recent Linux kernels use hugepages
> automatically if they can, so I wonder exactly what Andres was testing
> on ...

At the time I ran the test I was using a moderately new kernel:

andres@awork2:~$ uname -a
Linux awork2 3.4.3-andres #138 SMP Mon Jun 19 12:46:32 CEST 2012 x86_64 GNU/Linux

andres@awork2:~$ zcat /proc/config.gz | grep HUGE
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y

So, transparent hugepages are enabled by default. The problem is that
the kernel needs 2MB of adjacent physical memory mapping to 2MB of
adjacent virtual memory. In on-demand, copy-on-write virtual memory
systems that just doesn't happen all the time if you're not doing file
mmap while triggering massive readaheads -- especially if the system has
been running for some time, because the memory just gets too fragmented
to have lots of adjacent physical memory around. There was/is talk about
moving physical memory around to make room for more huge pages, but
that's not there yet, and the patches I have seen incurred quite some
overhead. Btw, even when transparent hugepages were introduced it was
advocated that there are still benefits in manual hugepage setups.

Btw, should anybody want to test this: you can allocate huge pages at
runtime with

    echo 3000 > /proc/sys/vm/nr_hugepages

or at boot you can add a kernel parameter:

    hugepages=3000

(which allocates 6GB of huge pages on x86-64). The runtime one might
take quite a while until it has found enough pages, or it may even fall
short. You can see the huge page status with:

andres@awork2:~$ cat /proc/meminfo | grep Huge
AnonHugePages:    591872 kB
HugePages_Total:    3000
HugePages_Free:     3000
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

Greetings,

Andres

--
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
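As a concrete illustration of the parsing described above -- a minimal
sketch by way of example, not code from any posted patch -- the supported
sizes can be enumerated in C by scanning the "hugepages-<N>kB" directory
names:

#include <dirent.h>
#include <stdio.h>

/*
 * Sketch: enumerate the supported huge page sizes by parsing the
 * "hugepages-<N>kB" directory names under /sys/kernel/mm/hugepages/.
 * Illustrative only.
 */
int
main(void)
{
    DIR            *dir = opendir("/sys/kernel/mm/hugepages");
    struct dirent  *de;

    if (dir == NULL)
    {
        perror("opendir");      /* kernel without hugetlbfs support? */
        return 1;
    }

    while ((de = readdir(dir)) != NULL)
    {
        unsigned long size_kb;

        /* entries look like "hugepages-2048kB" */
        if (sscanf(de->d_name, "hugepages-%lukB", &size_kb) == 1)
            printf("supported huge page size: %lu kB\n", size_kb);
    }
    closedir(dir);
    return 0;
}

Alternatively, the default size alone can be read from the
"Hugepagesize:" line of /proc/meminfo, as shown in the output above.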
On Tuesday, July 03, 2012 04:49:10 AM Robert Haas wrote:
> So, considering that there is required setup, it seems that the
> obvious thing to do here is add a GUC: huge_tlb_pages (boolean).
>
> The other alternative is to try with MAP_HUGETLB and, if it fails, try
> again without MAP_HUGETLB.

What about huge_tlb_pages = off|try|on, with try being the default?

Greetings,

Andres

--
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jul 3, 2012 at 8:23 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Tuesday, July 03, 2012 04:49:10 AM Robert Haas wrote:
>> So, considering that there is required setup, it seems that the
>> obvious thing to do here is add a GUC: huge_tlb_pages (boolean).
>>
>> The other alternative is to try with MAP_HUGETLB and, if it fails, try
>> again without MAP_HUGETLB.
> What about huge_tlb_pages = off|try|on, with try being the default?

That seems reasonable to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
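To make the semantics concrete, here is a minimal standalone sketch of
the off|try|on behavior being agreed on. This is not the eventual patch:
the enum, variable, and function names are all hypothetical, and the 2MB
huge page size is assumed rather than detected as discussed upthread.

#define _GNU_SOURCE             /* MAP_ANONYMOUS/MAP_HUGETLB on Linux */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical GUC: off = never use huge pages, on = fail if they are
 * unavailable, try = attempt huge pages, silently fall back otherwise. */
typedef enum { HUGE_TLB_OFF, HUGE_TLB_TRY, HUGE_TLB_ON } HugeTlbType;

static HugeTlbType huge_tlb_pages = HUGE_TLB_TRY;

static void *
map_shared_segment(size_t size)
{
    int     flags = MAP_SHARED | MAP_ANONYMOUS;
    void   *ptr = MAP_FAILED;

    if (huge_tlb_pages != HUGE_TLB_OFF)
    {
        /* MAP_HUGETLB needs the length rounded up to a multiple of the
         * huge page size; 2MB is assumed here for simplicity. */
        size_t  hugepagesize = 2 * 1024 * 1024;
        size_t  rounded = (size + hugepagesize - 1) & ~(hugepagesize - 1);

        ptr = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                   flags | MAP_HUGETLB, -1, 0);
        if (ptr == MAP_FAILED && huge_tlb_pages == HUGE_TLB_ON)
        {
            perror("mmap with MAP_HUGETLB");    /* "on": hard failure */
            exit(1);
        }
    }

    if (ptr == MAP_FAILED)      /* "off", or "try" falling back */
        ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);

    return ptr;
}

int
main(void)
{
    void   *shm = map_shared_segment(64 * 1024 * 1024);    /* 64MB demo */

    if (shm == MAP_FAILED)
    {
        perror("mmap");
        return 1;
    }
    printf("segment mapped at %p\n", shm);
    return 0;
}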
yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
>> Also, I was under the impression that recent Linux kernels use hugepages
>> automatically if they can, so I wonder exactly what Andres was testing
>> on ...

> if you mean the "transparent hugepage" feature, iirc it doesn't affect
> MAP_SHARED mappings like this.

Oh! That would explain some things. It seems like a pretty nasty
restriction though ... do you know why they did that?

			regards, tom lane
On Mon, Jul 09, 2012 at 02:11:00AM -0400, Tom Lane wrote:
> yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
>>> Also, I was under the impression that recent Linux kernels use hugepages
>>> automatically if they can, so I wonder exactly what Andres was testing
>>> on ...
>
>> if you mean the "transparent hugepage" feature, iirc it doesn't affect
>> MAP_SHARED mappings like this.
>
> Oh! That would explain some things. It seems like a pretty nasty
> restriction though ... do you know why they did that?

It doesn't say explicitly in the documentation (found at
http://lwn.net/Articles/423592/ aka transhuge.txt), but reading between
the lines I'm guessing it's due to the fact that huge pages must be
aligned to 2 or 4MB, and when dealing with a shared mapping you would
probably need to require it to be aligned in all address spaces.

However, it seems it should work for SysV shared memory, see:
http://lwn.net/Articles/375096/ . The same page suggests shared mappings
should work fine; however, it refers to the non-transparent feature. If
you think about it, it must work, since huge pages are inherited through
fork().

Have a nice day,

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
  -- Arthur Schopenhauer
On Monday, July 09, 2012 08:11:00 AM Tom Lane wrote:
> yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
>>> Also, I was under the impression that recent Linux kernels use hugepages
>>> automatically if they can, so I wonder exactly what Andres was testing
>>> on ...
>>
>> if you mean the "transparent hugepage" feature, iirc it doesn't affect
>> MAP_SHARED mappings like this.
>
> Oh! That would explain some things. It seems like a pretty nasty
> restriction though ... do you know why they did that?

Looking a bit deeper, they explicitly only work on private memory. The
reason apparently being that it's too hard to update the page table
entries in multiple processes at once without introducing locking
problems/scalability issues.

To be sure, one can check /proc/$pid_of_pg_process/smaps and look for
the mapping to /dev/zero (or just the biggest mapping ;)). It's not
counted as anonymous memory and it doesn't have transparent hugepages. I
was confused before because there are quite some huge pages (400MB here)
allocated for postgres during a pgbench run, but that's just all the
local memory...

Greetings,

Andres

PS: The important #define is in mm/huge_memory.c:

    #define VM_NO_THP (VM_SPECIAL|VM_INSERTPAGE|VM_MIXEDMAP|VM_SAO| \
                       VM_HUGETLB|VM_SHARED|VM_MAYSHARE)

--
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
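To make the smaps check concrete, a small illustrative helper (again a
sketch, not from any posted patch) can sum the per-mapping
"AnonHugePages:" fields of /proc/self/smaps, which shows how much of the
current process is actually backed by transparent huge pages:

#include <stdio.h>

/*
 * Sum the AnonHugePages: fields of every mapping in /proc/self/smaps.
 * Mappings served by transparent huge pages show non-zero values;
 * the shared-memory segment under discussion will show zero.
 */
int
main(void)
{
    FILE           *fp = fopen("/proc/self/smaps", "r");
    char            line[256];
    unsigned long   kb, total_kb = 0;

    if (fp == NULL)
    {
        perror("fopen");
        return 1;
    }

    while (fgets(line, sizeof(line), fp) != NULL)
    {
        if (sscanf(line, "AnonHugePages: %lu kB", &kb) == 1)
            total_kb += kb;
    }
    fclose(fp);

    printf("AnonHugePages total: %lu kB\n", total_kb);
    return 0;
}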
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Jun 29, 2012 at 3:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>>> In a *very* quick patch I tested using huge pages/MAP_HUGETLB for the
>>> mmap'ed memory.
>
>> So, considering that there is required setup, it seems that the
>> obvious thing to do here is add a GUC: huge_tlb_pages (boolean).
>
>> The other alternative is to try with MAP_HUGETLB and, if it fails, try
>> again without MAP_HUGETLB.
>
> +1 for not making people configure this manually.
>
> Also, I was under the impression that recent Linux kernels use hugepages
> automatically if they can, so I wonder exactly what Andres was testing
> on ...

If you mean the "transparent hugepage" feature, iirc it doesn't affect
MAP_SHARED mappings like this.

YAMAMOTO Takashi
> yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
>>> Also, I was under the impression that recent Linux kernels use hugepages
>>> automatically if they can, so I wonder exactly what Andres was testing
>>> on ...
>
>> if you mean the "transparent hugepage" feature, iirc it doesn't affect
>> MAP_SHARED mappings like this.
>
> Oh! That would explain some things. It seems like a pretty nasty
> restriction though ... do you know why they did that?

I don't know -- simply because it wasn't trivial, I guess. The feature
was implemented for kvm's guest memory, which is non-shared anonymous
memory from the POV of the host kernel.

YAMAMOTO Takashi
On Mon, 9 Jul 2012 12:30:23 +0200 Andres Freund <andres@2ndquadrant.com> wrote:
> On Monday, July 09, 2012 08:11:00 AM Tom Lane wrote:
>> yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
>>>> Also, I was under the impression that recent Linux kernels use
>>>> hugepages automatically if they can, so I wonder exactly what
>>>> Andres was testing on ...
>>>
>>> if you mean the "transparent hugepage" feature, iirc it doesn't
>>> affect MAP_SHARED mappings like this.
>>
>> Oh! That would explain some things. It seems like a pretty nasty
>> restriction though ... do you know why they did that?
> Looking a bit deeper, they explicitly only work on private memory. The
> reason apparently being that it's too hard to update the page table
> entries in multiple processes at once without introducing locking
> problems/scalability issues.
>
> To be sure, one can check /proc/$pid_of_pg_process/smaps and look for
> the mapping to /dev/zero or the biggest mapping ;). It's not counted
> as anonymous memory and it doesn't have transparent hugepages. I was
> confused before because there are quite some huge pages (400MB here)
> allocated for postgres during a pgbench run, but that's just all the
> local memory...

A warning: on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we have had
horrible problems caused by transparent_hugepages running postgres on
largish systems (128GB to 512GB memory, 32 cores). The system sometimes
goes to 99% system time and is very slow and unresponsive, to the point
of not successfully completing new tcp connections. Turning off
transparent_hugepages fixes it.

That said, explicit hugepage support for the buffer cache would be a
big win, especially for high connection counts.

-dg

--
David Gould   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
On Thu, Aug 16, 2012 at 10:53 PM, David Gould <daveg@sonic.net> wrote:
> A warning: on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we have had
> horrible problems caused by transparent_hugepages running postgres on
> largish systems (128GB to 512GB memory, 32 cores). The system sometimes
> goes to 99% system time and is very slow and unresponsive, to the point
> of not successfully completing new tcp connections. Turning off
> transparent_hugepages fixes it.

Yikes! Any idea WHY that happens?

I'm inclined to think this torpedoes any idea we might have of enabling
hugepages automatically whenever possible. I think we should just add a
GUC for this and call it good. If the state of the world improves
sufficiently in the future, we can adjust, but I think for right now we
should just do this in the simplest way possible and move on.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tuesday, August 21, 2012 05:30:28 PM Robert Haas wrote:
> On Thu, Aug 16, 2012 at 10:53 PM, David Gould <daveg@sonic.net> wrote:
>> A warning: on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we have had
>> horrible problems caused by transparent_hugepages running postgres on
>> largish systems (128GB to 512GB memory, 32 cores). The system sometimes
>> goes to 99% system time and is very slow and unresponsive, to the point
>> of not successfully completing new tcp connections. Turning off
>> transparent_hugepages fixes it.
>
> Yikes! Any idea WHY that happens?
>
> I'm inclined to think this torpedoes any idea we might have of enabling
> hugepages automatically whenever possible. I think we should just add a
> GUC for this and call it good. If the state of the world improves
> sufficiently in the future, we can adjust, but I think for right now we
> should just do this in the simplest way possible and move on.

He is talking about transparent hugepages, not hugepages, afaics.

Andres

--
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Aug 21, 2012 at 11:31 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Tuesday, August 21, 2012 05:30:28 PM Robert Haas wrote:
>> On Thu, Aug 16, 2012 at 10:53 PM, David Gould <daveg@sonic.net> wrote:
>>> A warning: on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we have had
>>> horrible problems caused by transparent_hugepages running postgres on
>>> largish systems (128GB to 512GB memory, 32 cores). The system
>>> sometimes goes to 99% system time and is very slow and unresponsive,
>>> to the point of not successfully completing new tcp connections.
>>> Turning off transparent_hugepages fixes it.
>>
>> Yikes! Any idea WHY that happens?
>>
>> I'm inclined to think this torpedoes any idea we might have of enabling
>> hugepages automatically whenever possible. I think we should just add a
>> GUC for this and call it good. If the state of the world improves
>> sufficiently in the future, we can adjust, but I think for right now we
>> should just do this in the simplest way possible and move on.
> He is talking about transparent hugepages, not hugepages, afaics.

Hmm. I guess you're right. But why would it be different?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tuesday, August 21, 2012 05:56:58 PM Robert Haas wrote:
> On Tue, Aug 21, 2012 at 11:31 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> On Tuesday, August 21, 2012 05:30:28 PM Robert Haas wrote:
>>> On Thu, Aug 16, 2012 at 10:53 PM, David Gould <daveg@sonic.net> wrote:
>>>> A warning: on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we have had
>>>> horrible problems caused by transparent_hugepages running postgres on
>>>> largish systems (128GB to 512GB memory, 32 cores). The system
>>>> sometimes goes to 99% system time and is very slow and unresponsive,
>>>> to the point of not successfully completing new tcp connections.
>>>> Turning off transparent_hugepages fixes it.
>>>
>>> Yikes! Any idea WHY that happens?

Afair there were several bugs that could cause that in earlier versions
of the hugepage feature. The most prominent was something around never
really stopping the search for mergeable pages, even though the
probability of success was small. I am not a RHEL person, so I cannot
directly interpret that kernel version -- is that the latest kernel?

>>> I'm inclined to think this torpedoes any idea we might have of enabling
>>> hugepages automatically whenever possible. I think we should just add a
>>> GUC for this and call it good. If the state of the world improves
>>> sufficiently in the future, we can adjust, but I think for right now we
>>> should just do this in the simplest way possible and move on.
>>
>> He is talking about transparent hugepages, not hugepages, afaics.
>
> Hmm. I guess you're right. But why would it be different?

Because in this case explicit hugepage usage reduces the pain instead of
increasing it. And we cannot do much about transparent hugepages being
enabled by default.

Unless I misremember how things work, the problem is/was independent of
anonymous mmap vs. SysV shmem.

Greetings,

Andres

--
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, 21 Aug 2012 18:06:38 +0200 Andres Freund <andres@2ndquadrant.com> wrote:
> On Tuesday, August 21, 2012 05:56:58 PM Robert Haas wrote:
>> On Tue, Aug 21, 2012 at 11:31 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>>> On Tuesday, August 21, 2012 05:30:28 PM Robert Haas wrote:
>>>> On Thu, Aug 16, 2012 at 10:53 PM, David Gould <daveg@sonic.net> wrote:
>>>>> A warning: on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we have
>>>>> had horrible problems caused by transparent_hugepages running
>>>>> postgres on largish systems (128GB to 512GB memory, 32 cores). The
>>>>> system sometimes goes to 99% system time and is very slow and
>>>>> unresponsive, to the point of not successfully completing new tcp
>>>>> connections. Turning off transparent_hugepages fixes it.
>>>>
>>>> Yikes! Any idea WHY that happens?
> Afair there were several bugs that could cause that in earlier versions
> of the hugepage feature. The most prominent was something around never
> really stopping the search for mergeable pages, even though the
> probability of success was small.

This is what I think was going on. We did see a lot (99%) of time spent
in some routine in the VM (I forget exactly which), and my
interpretation was that it was trying to create huge pages from
scattered fragments.

>>>> I'm inclined to think this torpedoes any idea we might have of
>>>> enabling hugepages automatically whenever possible. I think we
>>>> should just add a GUC for this and call it good. If the state of
>>>> the world improves sufficiently in the future, we can adjust, but I
>>>> think for right now we should just do this in the simplest way
>>>> possible and move on.
>>>
>>> He is talking about transparent hugepages, not hugepages, afaics.
>>
>> Hmm. I guess you're right. But why would it be different?
> Because in this case explicit hugepage usage reduces the pain instead
> of increasing it. And we cannot do much about transparent hugepages
> being enabled by default.
> Unless I misremember how things work, the problem is/was independent
> of anonymous mmap vs. SysV shmem.

Explicit hugepages work because the pages can be created early, before
all of memory is fragmented, and you either succeed or fail. Transparent
hugepages use a daemon that looks for processes that might benefit from
huge pages and tries to create huge pages on the fly. On a system that
has been up for some time, memory may be so fragmented that this is just
a waste of time.

Real (as opposed to transparent) hugepages would be a huge win for
applications that try to use high connection counts. Each backend
attached to the postgresql shared memory uses its own set of page table
entries, at the rate of 2kB of PTEs per MB of mapped shared memory. At
8GB of shared buffers and 1000 connections this uses 16GB just for page
tables.

-dg

--
David Gould   510 282 0869   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
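Spelling out the arithmetic behind those last figures (this derivation
assumes the usual x86-64 layout of 8-byte page table entries mapping 4kB
pages, which is where the 2kB-per-MB rate comes from):

\[
\frac{8\,\text{B per PTE}}{4\,\text{kB per page}}
    = 2\,\frac{\text{kB of PTEs}}{\text{MB mapped}},\qquad
8192\,\text{MB} \times 2\,\frac{\text{kB}}{\text{MB}}
    = 16\,\text{MB per backend},\qquad
16\,\text{MB} \times 1000\ \text{connections} = 16\,\text{GB}.
\]

With 2MB huge pages each PTE instead covers 512 times as much memory, so
the same mapping needs only about 1/512th of the page table space, which
matches the VmPTE numbers (6252 kB vs. 60 kB) reported at the start of
the thread.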