Thread: PostgreSQL and HugePage
Dear All,
I want to use the hugepage feature on the Linux platform. My question is whether PostgreSQL supports hugepages by default, and if not, what code needs to be modified?
Thank you for your great support
Hsien-Wen
On Tue, Oct 19, 2010 at 8:39 PM, Hsien-Wen Chu <chu.hsien.wen@gmail.com> wrote:
> I want to use the hugepage feature on the Linux platform. My question is
> whether PostgreSQL supports hugepages by default, and if not, what code
> needs to be modified?

Unfortunately, I don't think this is that simple. PostgreSQL uses sysv shared memory, not POSIX shared memory. I don't know of a way to use sysv shared memory with hugetlb.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 20/10/10 15:10, Robert Haas wrote:
> On Tue, Oct 19, 2010 at 8:39 PM, Hsien-Wen Chu <chu.hsien.wen@gmail.com> wrote:
>> I want to use the hugepage feature on the Linux platform. My question is
>> whether PostgreSQL supports hugepages by default, and if not, what code
>> needs to be modified?
>
> Unfortunately, I don't think this is that simple. PostgreSQL uses sysv
> shared memory, not POSIX shared memory. I don't know of a way to use
> sysv shared memory with hugetlb.

According to:

http://devresources.linuxfoundation.org/dev/robustmutexes/src/fusyn.hg/Documentation/vm/hugetlbpage.txt

shmget and friends are hugetlbpage aware, so it seems it should 'just work'. However, I have not checked...

Cheers

Mark
On 20/10/10 16:05, Mark Kirkwood wrote:
> shmget and friends are hugetlbpage aware, so it seems it should 'just
> work'.

Heh - provided you specify SHM_HUGETLB in the relevant call, that is :-)
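For concreteness, here is a minimal standalone sketch of such a call on Linux. This is an editorial illustration, not PostgreSQL's shared-memory code; it assumes huge pages have already been reserved (e.g. via the vm.nr_hugepages sysctl) and that the segment size is a multiple of the huge page size:

    /* Sketch: allocate a sysv shared memory segment backed by huge pages. */
    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    #ifndef SHM_HUGETLB
    #define SHM_HUGETLB 04000       /* from <linux/shm.h>; not in all libc headers */
    #endif

    int main(void)
    {
        size_t size = 64 * 1024 * 1024;     /* 64 MB, a multiple of 2 MB huge pages */
        int shmid = shmget(IPC_PRIVATE, size,
                           IPC_CREAT | SHM_HUGETLB | 0600);
        if (shmid < 0)
        {
            perror("shmget(SHM_HUGETLB)");  /* typically ENOMEM if none reserved */
            return 1;
        }
        void *addr = shmat(shmid, NULL, 0);
        if (addr == (void *) -1)
        {
            perror("shmat");
            return 1;
        }
        printf("attached huge-page segment at %p\n", addr);
        shmdt(addr);
        shmctl(shmid, IPC_RMID, NULL);      /* mark segment for removal */
        return 0;
    }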
On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
> On 20/10/10 16:05, Mark Kirkwood wrote:
>> shmget and friends are hugetlbpage aware, so it seems it should 'just
>> work'.
>
> Heh - provided you specify SHM_HUGETLB in the relevant call, that is :-)

I had a patch for this against 8.3 that I could update if there is any interest. I suspect it is helpful.

-dg

--
David Gould    daveg@sonic.net    510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
On Tue, Oct 19, 2010 at 11:30 PM, daveg <daveg@sonic.net> wrote:
> On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
>> Heh - provided you specify SHM_HUGETLB in the relevant call, that is :-)
>
> I had a patch for this against 8.3 that I could update if there is any
> interest. I suspect it is helpful.

I think it would be a good feature. Of course, we would need appropriate documentation, and some benchmarks showing that it really works.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Oct 19, 2010 at 11:30 PM, daveg <daveg@sonic.net> wrote:
>> On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
>>> Heh - provided you specify SHM_HUGETLB in the relevant call, that is :-)
>> I had a patch for this against 8.3 that I could update if there is any
>> interest. I suspect it is helpful.
> I think it would be a good feature. Of course, we would need
> appropriate documentation, and some benchmarks showing that it really
> works.

I believe that for the equivalent Solaris option, we just automatically enable it when available. So there'd be no need for user documentation. However, I definitely *would* like to see some benchmarks proving that the change actually does something useful. I've always harbored the suspicion that this is just a knob to satisfy people who need knobs to frob.

			regards, tom lane
On Wed, Oct 20, 2010 at 10:10:00AM -0400, Tom Lane wrote:
> I believe that for the equivalent Solaris option, we just automatically
> enable it when available. So there'd be no need for user documentation.
> However, I definitely *would* like to see some benchmarks proving that
> the change actually does something useful. I've always harbored the
> suspicion that this is just a knob to satisfy people who need knobs to
> frob.

Oracle apparently uses hugepages if they are available, by first trying the call with the SHM_HUGETLB option; if that fails, they reissue the call without it. This article mentions some of the benefits of the larger page sizes with large shared memory regions:

http://appcrawler.com/wordpress/?p=686

Regards,
Ken
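A sketch of that try-then-fall-back pattern (modeled on the behavior Ken describes; hypothetical helper, not taken from Oracle or PostgreSQL source) might look like:

    /* Try huge pages first; retry with ordinary pages if the kernel
     * refuses the request (e.g. no huge pages reserved). */
    #include <sys/ipc.h>
    #include <sys/shm.h>

    #ifndef SHM_HUGETLB
    #define SHM_HUGETLB 04000
    #endif

    static int
    shmget_with_fallback(key_t key, size_t size, int perms)
    {
        int shmid = shmget(key, size, IPC_CREAT | SHM_HUGETLB | perms);

        if (shmid < 0)
            shmid = shmget(key, size, IPC_CREAT | perms);   /* plain pages */
        return shmid;
    }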
On Wed, Oct 20, 2010 at 7:10 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I believe that for the equivalent Solaris option, we just automatically
> enable it when available. So there'd be no need for user documentation.
> However, I definitely *would* like to see some benchmarks proving that
> the change actually does something useful. I've always harbored the
> suspicion that this is just a knob to satisfy people who need knobs to
> frob.

Well, saving a few megabytes of kernel-space memory isn't a bad thing. But I think the major effect is on forking new processes: having to copy that page map is a major cost when you're talking about very large memory footprints. While machine memory has gotten larger, the 4k page size hasn't. I don't think it's a big cost once all the processes have been forked, if you're reusing them, beyond perhaps slightly more efficient cache usage.

--
greg
On Wed, Oct 20, 2010 at 12:17 PM, Greg Stark <gsstark@mit.edu> wrote:
> I don't think it's a big cost once all the processes have been forked,
> if you're reusing them, beyond perhaps slightly more efficient cache
> usage.

Hm, this site claims to get a 13% win just from the reduced tlb misses, using a preload hack with Pg 8.2. That would be pretty substantial.

http://oss.linbit.com/hugetlb/

--
greg
Excerpts from Greg Stark's message of Wed Oct 20 16:28:25 -0300 2010:
> Hm, this site claims to get a 13% win just from the reduced tlb misses,
> using a preload hack with Pg 8.2. That would be pretty substantial.
>
> http://oss.linbit.com/hugetlb/

Wow, is there no other way to get the huge page size than by opening and reading /proc/meminfo?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
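For illustration, a minimal Linux-specific sketch of the /proc/meminfo approach Álvaro mentions (hypothetical helper, not from the linbit patch):

    /* Read the kernel's default huge page size from /proc/meminfo,
     * which contains a line like "Hugepagesize:    2048 kB".
     * Returns the size in bytes, or -1 on failure. */
    #include <stdio.h>

    static long
    get_huge_page_size(void)
    {
        char  line[128];
        long  kb = -1;
        FILE *f = fopen("/proc/meminfo", "r");

        if (f == NULL)
            return -1;
        while (fgets(line, sizeof line, f))
            if (sscanf(line, "Hugepagesize: %ld kB", &kb) == 1)
                break;
        fclose(f);
        return kb > 0 ? kb * 1024 : -1;     /* typically 2 MB on x86-64 */
    }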
On Wed, Oct 20, 2010 at 12:28:25PM -0700, Greg Stark wrote:
> Hm, this site claims to get a 13% win just from the reduced tlb misses,
> using a preload hack with Pg 8.2. That would be pretty substantial.
>
> http://oss.linbit.com/hugetlb/

That was my motivation in trying a patch. TLB misses can be a substantial overhead. I'm not current on the state of play, but working at Sun's benchmark lab on a DB TPC-B benchmark for the first generation of MP systems, something like 30% of all bus traffic was TLB misses. The next iteration of the hardware had a much larger TLB.

I have a client with 512GB memory systems, currently with 128GB configured as postgresql buffer cache. That is 32M TLB entries trying to fit in the few dozen cpu TLB slots. I suspect there may be some contention.

I'll benchmark of course.

-dg

--
David Gould    daveg@sonic.net    510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.

On Wed, Oct 20, 2010 at 3:47 PM, daveg <daveg@sonic.net> wrote:
> I have a client with 512GB memory systems, currently with 128GB configured
> as postgresql buffer cache. That is 32M TLB entries trying to fit in the
> few dozen cpu TLB slots. I suspect there may be some contention.
>
> I'll benchmark of course.

Do you mean 128GB shared buffers, or shared buffers + OS cache? I think that the general wisdom is that performance tails off beyond 8-10GB of shared buffers anyway, so a performance improvement on 128GB shared buffers might not mean much unless you can also show that 128GB shared buffers actually performs better than some smaller amount.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
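To make the arithmetic in daveg's message explicit: a 128 GB buffer cache mapped with 4 kB pages needs 128 GiB / 4 KiB = 33,554,432 (about 32M) page mappings, whereas the same region mapped with 2 MB huge pages needs only 65,536.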
On Wed, Oct 20, 2010 at 1:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Do you mean 128GB shared buffers, or shared buffers + OS cache? I
> think that the general wisdom is that performance tails off beyond
> 8-10GB of shared buffers anyway, so a performance improvement on 128GB
> shared buffers might not mean much unless you can also show that 128GB
> shared buffers actually performs better than some smaller amount.

I'm sure someone will correct me if I'm wrong, but when I looked at this a couple of years ago I believe a side effect of using hugetlbs is that these segments are never swapped out. I made a weak attempt to patch postgres to use hugetlbs when allocating shared memory. If I can find that patch I'll send it out.

Mark
On Tue, Oct 19, 2010 at 8:30 PM, daveg <daveg@sonic.net> wrote:
> On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
>> Heh - provided you specify SHM_HUGETLB in the relevant call, that is :-)
>
> I had a patch for this against 8.3 that I could update if there is any
> interest. I suspect it is helpful.

Oh, probably better than me digging up my broken one. Send it out as is if you don't want to update it. :)

Regards,
Mark
On Thu, Oct 21, 2010 at 08:16:27AM -0700, Mark Wong wrote:
> On Tue, Oct 19, 2010 at 8:30 PM, daveg <daveg@sonic.net> wrote:
>> I had a patch for this against 8.3 that I could update if there is any
>> interest. I suspect it is helpful.
>
> Oh, probably better than me digging up my broken one. Send it out as
> is if you don't want to update it. :)

I'll update it and see if I can get a largish machine to test on, at least with pgbench. But not today alas.

-dg

--
David Gould    daveg@sonic.net    510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
On Thu, Oct 21, 2010 at 12:10:22PM -0700, David Gould wrote:
> I'll update it and see if I can get a largish machine to test on, at
> least with pgbench. But not today alas.

If you'd be so kind as to update it, others can probably find the aforementioned largish machine to test it on :)

Cheers,
David.

--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
As far as I know, FreeBSD implements a similar feature called superpages, much like Solaris. Is it enabled by default, or does some parameter need to be set?
Many thanks
Hsien-Wen
On Wed, Oct 20, 2010 at 10:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Oct 19, 2010 at 11:30 PM, daveg <daveg@sonic.net> wrote:
>> On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
>>> Heh - provided you specify SHM_HUGETLB in the relevant call, that is :-)
>> I had a patch for this against 8.3 that I could update if there is any
>> interest. I suspect it is helpful.
> I think it would be a good feature. Of course, we would need
> appropriate documentation, and some benchmarks showing that it really
> works.
I believe that for the equivalent Solaris option, we just automatically
enable it when available. So there'd be no need for user documentation.
However, I definitely *would* like to see some benchmarks proving that
the change actually does something useful. I've always harbored the
suspicion that this is just a knob to satisfy people who need knobs to
frob.
regards, tom lane
Hsien-Wen Chu <chu.hsien.wen@gmail.com> writes:
> As far as I know, FreeBSD implements a similar feature called superpages,
> much like Solaris. Is it enabled by default, or does some parameter need
> to be set?

The Solaris-specific code is just that if SHM_SHARE_MMU is defined (by <sys/ipc.h>, I think) we include it in the flags parameter for shmat(2). If FreeBSD does it the same way as Solaris, then that should work.

			regards, tom lane
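A rough sketch of the conditional Tom describes (an editorial illustration of the approach, not the actual PostgreSQL source):

    /* Include SHM_SHARE_MMU in the shmat() flags when the platform
     * defines it (Solaris "intimate shared memory"); elsewhere the
     * flag simply isn't set and a plain attach is performed. */
    #include <sys/ipc.h>
    #include <sys/shm.h>

    void *
    attach_shared_segment(int shmid)
    {
        int flags = 0;

    #ifdef SHM_SHARE_MMU
        flags |= SHM_SHARE_MMU;     /* share page tables across processes */
    #endif
        return shmat(shmid, NULL, flags);
    }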