Re: Report: Linux huge pages with Postgres - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: Report: Linux huge pages with Postgres
Date:
Msg-id: 27373.1290989571@sss.pgh.pa.us
In response to: Re: Report: Linux huge pages with Postgres (Kenneth Marshall <ktm@rice.edu>)
Responses: Re: Report: Linux huge pages with Postgres
List: pgsql-hackers
Kenneth Marshall <ktm@rice.edu> writes:
> On Sat, Nov 27, 2010 at 02:27:12PM -0500, Tom Lane wrote:
>> ... A bigger problem is that the shmem request size must be a
>> multiple of the system's hugepage size, which is *not* a constant
>> even though the test patch just uses 2MB as the assumed value.  For a
>> production-grade patch we'd have to scrounge the active value out of
>> someplace in the /proc filesystem (ick).

> I would expect that you can just iterate through the size possibilities
> pretty quickly and just use the first one that works -- no /proc
> groveling.

It's not really that easy, because (at least on the kernel version I
tested) it's not the shmget that fails, it's the later shmat.  Releasing
and reacquiring the shm segment would require significant code
restructuring, and at least on some platforms could produce weird
failure cases --- I seem to recall having heard of kernels where the
release isn't instantaneous, so that you could run up against SHMMAX
for no apparent reason.  Really, you do want to scrape the value out of /proc.
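
For reference, scraping that value amounts to reading the "Hugepagesize:"
line of /proc/meminfo.  A minimal sketch in C (not the test patch; the
function name and the 2MB fallback are just illustrative):

#include <stdio.h>

/*
 * Return the kernel's default huge page size in bytes, as reported by the
 * "Hugepagesize:" line of /proc/meminfo (the value there is in kB).
 * Fall back to an assumed 2MB if the file can't be read or parsed.
 */
static size_t
get_default_huge_page_size(void)
{
    size_t result = 2 * 1024 * 1024;    /* assumed fallback */
    FILE *f = fopen("/proc/meminfo", "r");
    char line[128];
    unsigned long kb;

    if (f == NULL)
        return result;
    while (fgets(line, sizeof(line), f) != NULL)
    {
        if (sscanf(line, "Hugepagesize: %lu kB", &kb) == 1)
        {
            result = (size_t) kb * 1024;
            break;
        }
    }
    fclose(f);
    return result;
}

The shmem request would then be rounded up to a multiple of this value
before the shmget() call, e.g. size = (size + hps - 1) & ~(hps - 1) for a
power-of-two page size hps.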

>> 2. You have to manually allocate some huge pages --- there doesn't
>> seem to be any setting that says "just give them out on demand".
>> I did this:
>> sudo sh -c "echo 600 >/proc/sys/vm/nr_hugepages"
>> which gave me a bit over 1GB of space reserved as huge pages.
>> Again, this'd have to be done over again at each system boot.

> Same.

The fact that hugepages have to be manually managed, and that any that go
unaccounted for represent completely wasted RAM, seems like a pretty
large PITA to me.  I don't see anybody buying into that for gains
measured in single-digit percentages.
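
Just to put numbers on the waste: the reservation shows up in /proc/meminfo
as HugePages_Total and HugePages_Free, so a startup check along these lines
is possible (again only a sketch; the function name and the use of stderr
are illustrative):

#include <stdio.h>

/*
 * Report how many huge pages the administrator has reserved and how many
 * are currently unused, per /proc/meminfo.  Reserved-but-unused pages are
 * RAM that ordinary (non-hugepage) allocations can never touch.
 */
static void
report_huge_page_usage(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    char line[128];
    unsigned long total = 0;
    unsigned long unused = 0;

    if (f == NULL)
        return;
    while (fgets(line, sizeof(line), f) != NULL)
    {
        sscanf(line, "HugePages_Total: %lu", &total);
        sscanf(line, "HugePages_Free: %lu", &unused);
    }
    fclose(f);
    fprintf(stderr, "huge pages reserved: %lu, currently unused: %lu\n",
            total, unused);
}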

> 1GB of shared buffers would not be enough to cause TLB thrashing with
> most processors.

Well, bigger cases would be useful to try, although Simon was claiming
that the TLB starts to fall over at 4MB of working set.  I don't have a
large enough machine to try the sort of test you're suggesting, so if
anyone thinks this is worth pursuing, there's the patch ... go test it.
        regards, tom lane

