Rohit Seth recently added support for the use of large TLB pages on
Linux if the processor architecture supports them (I believe SPARC,
IA32, and IA64 have hugetlb support; more archs will probably be
added). The patch was merged into Linux 2.5.36, so it will more than
likely be in Linux 2.6. For more information on large TLB pages and
why they are generally viewed as improving database performance, see
here:
http://lwn.net/Articles/6535/ (the patch this refers to is an earlier
implementation, I believe, but the idea is the same)
http://lwn.net/Articles/10293/ (item #4)
I'd like to enable PostgreSQL to use large TLB pages, if the OS and
processor support them. In talking to the author of the TLB patches
for Linux (Rohit Seth), he described the current API:
======
1) Only two system calls. These are:
sys_alloc_hugepages(int key, unsigned long addr, unsigned long len, int prot, int flag)
sys_free_hugepages(unsigned long addr)
Key will be equal to zero if user wants these huge pages as private.
A positive int value will be used for unrelated apps to share the same
physical huge pages.
addr is the user preferred address. The kernel may decide to allocate
a different virtual address (depending on availability and alignment
factors).
len is the requested size of memory wanted by user app.
prot could get the value of PROT_READ, PROT_WRITE, PROT_EXEC
flag: The only allowed value right now is IPC_CREAT, which in case of
shared hugepages (across processes) tells the kernel to create a new
segment if none is already created. If this flag is not provided and
there is no hugepage segment corresponding to the "key" then ENOENT is
returned. This is along the lines of the IPC_CREAT flag for the shmget
routine.
On success sys_alloc_hugepages returns the virtual address allocated
by kernel.
=====
So as I understand it, we would basically replace the calls to
shmget(), shmdt(), etc. with these system calls. The behavior will be
slightly different, however -- I'm not sure if this API supports
everything we expect the SysV IPC API to support (e.g. telling the #
of clients attached to a given segment). Can anyone comment on
exactly what functionality we expect when dealing with the storage
mechanism of the shared buffer?
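To make the "# of clients attached" example concrete, here is a rough
sketch of how that count can be read through the existing SysV API,
using shmctl() with IPC_STAT and the shm_nattch field. The helper names
shm_attach_count() and demo_attach_count() are made up for illustration
and are not taken from the PostgreSQL source. As far as I can tell from
the description above, the hugepage calls have no equivalent query.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical helper: return the number of processes currently
 * attached to a SysV shared memory segment, or -1 on error.
 */
long
shm_attach_count(int shmid)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) == -1)
        return -1;
    return (long) ds.shm_nattch;
}

/*
 * Hypothetical demo: create a private segment, attach it once, and
 * return the attach count observed while attached. The segment is
 * detached and removed before returning. Returns -1 on failure.
 */
long
demo_attach_count(size_t size)
{
    int   shmid;
    void *addr;
    long  count;

    assert(size > 0);

    shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (shmid == -1)
        return -1;

    addr = shmat(shmid, NULL, 0);
    if (addr == (void *) -1)
    {
        shmctl(shmid, IPC_RMID, NULL);  /* do not leak the segment */
        return -1;
    }

    count = shm_attach_count(shmid);

    shmdt(addr);
    shmctl(shmid, IPC_RMID, NULL);
    return count;
}
```

Calling demo_attach_count(4096) from a single process should report one
attached client. Any replacement for shmget() and friends would need
some other way to answer the same question, if we decide we depend on
it.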
Any comments would be appreciated.
Cheers,
Neil
--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC