making use of large TLB pages - Mailing list pgsql-hackers

From Neil Conway
Subject making use of large TLB pages
Msg-id 87lm5r1oqd.fsf@mailbox.samurai.com
Responses Re: making use of large TLB pages
List pgsql-hackers
Rohit Seth recently added support for the use of large TLB pages on
Linux if the processor architecture supports them (I believe the
SPARC, IA32, and IA64 have hugetlb support, more archs will probably
be added). The patch was merged into Linux 2.5.36, so it will more
than likely be in Linux 2.6. For more information on large TLB pages
and why they are generally believed to improve database performance,
see here:
       http://lwn.net/Articles/6535/ (the patch this refers to is an
       earlier implementation, I believe, but the idea is the same)
       http://lwn.net/Articles/10293/ (item #4)
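A rough way to see why large pages help is "TLB reach": the amount of memory the TLB can map without a miss, i.e. entries times page size. The sketch below assumes a hypothetical 64-entry data TLB and IA-32 page sizes (4 KB base pages, 4 MB large pages with PAE disabled; 2 MB with PAE). The macro names and `tlb_reach()` are illustrative only.

```c
/* IA-32 page sizes, assuming PAE is disabled (hypothetical names). */
#define PAGE_4K (4ULL * 1024)
#define PAGE_4M (4ULL * 1024 * 1024)

/* TLB reach: bytes mappable without a TLB miss = entries * page size. */
static unsigned long long tlb_reach(unsigned int entries,
                                    unsigned long long page_size)
{
    return (unsigned long long) entries * page_size;
}
```

Under these assumptions, `tlb_reach(64, PAGE_4K)` is 256 KB while `tlb_reach(64, PAGE_4M)` is 256 MB, a 1024x increase, so a large shared buffer pool can be covered by far fewer TLB entries.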
 

I'd like to enable PostgreSQL to use large TLB pages, if the OS and
processor support them. In talking to the author of the TLB patches
for Linux (Rohit Seth), he described the current API:

======
1) Only two system calls. These are:

sys_alloc_hugepages(int key, unsigned long addr, unsigned long len,
                    int prot, int flag)

sys_free_hugepages(unsigned long addr)

key is zero if the user wants these huge pages to be private. A
positive int value lets unrelated apps share the same physical huge
pages.

addr is the user's preferred address.  The kernel may decide to allocate
a different virtual address (depending on availability and alignment
factors).

len is the size of the memory requested by the user app.

prot can take the values PROT_READ, PROT_WRITE, PROT_EXEC.

flag: The only allowed value right now is IPC_CREAT, which in the case
of shared hugepages (across processes) tells the kernel to create a new
segment if none has been created yet.  If this flag is not provided and
there is no hugepage segment corresponding to the "key", then ENOENT is
returned.  This is along the lines of the IPC_CREAT flag for the
shmget() routine.

On success sys_alloc_hugepages returns the virtual address allocated
by the kernel.
=====
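To make the key/flag rules above concrete, here is a small user-space model of the lookup logic as described: key 0 always yields a new private segment, a positive key reuses an existing segment, and a missing segment without IPC_CREAT yields ENOENT. This is not the kernel code; `model_alloc()`, `model_free()` and the fixed-size table are hypothetical, and rejecting negative keys is an assumption (the description does not say).

```c
#include <errno.h>
#include <sys/ipc.h>   /* IPC_CREAT */

#define MODEL_MAX_SEGMENTS 16

/* One slot per (hypothetical) hugepage segment. */
struct model_segment {
    int           in_use;
    int           key;   /* 0 = private, > 0 = shareable */
    unsigned long len;
};

static struct model_segment model_table[MODEL_MAX_SEGMENTS];

/*
 * Hypothetical model of the key/flag rules of sys_alloc_hugepages().
 * Returns a slot index >= 0 on success, or a negative errno value.
 */
static int model_alloc(int key, unsigned long len, int flag)
{
    int i;

    if (key < 0)
        return -EINVAL;            /* assumption: negative keys rejected */

    if (key > 0) {
        /* Shared request: reuse an existing segment with this key. */
        for (i = 0; i < MODEL_MAX_SEGMENTS; i++)
            if (model_table[i].in_use && model_table[i].key == key)
                return i;
        /* No such segment and the caller did not ask to create one. */
        if (!(flag & IPC_CREAT))
            return -ENOENT;
    }

    /* Private request (key == 0) or IPC_CREAT given: take a free slot. */
    for (i = 0; i < MODEL_MAX_SEGMENTS; i++) {
        if (!model_table[i].in_use) {
            model_table[i].in_use = 1;
            model_table[i].key = key;
            model_table[i].len = len;
            return i;
        }
    }
    return -ENOMEM;                /* the model's table is full */
}

/* Model counterpart of sys_free_hugepages(): release a slot. */
static void model_free(int slot)
{
    if (slot >= 0 && slot < MODEL_MAX_SEGMENTS)
        model_table[slot].in_use = 0;
}
```

In this model a first `model_alloc(42, len, 0)` fails with ENOENT, the same call with IPC_CREAT creates the segment, and later calls with key 42 return that same segment, mirroring how shmget() treats IPC_CREAT.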

So as I understand it, we would basically replace the calls to
shmget(), shmdt(), etc. with these system calls. The behavior will be
slightly different, however -- I'm not sure if this API supports
everything we expect the SysV IPC API to support (e.g. reporting the
number of clients attached to a given segment). Can anyone comment on
exactly what functionality we expect when dealing with the storage
mechanism of the shared buffer?
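For comparison, this is how the SysV API answers "how many clients are attached": shmctl() with IPC_STAT fills in a struct shmid_ds whose shm_nattch field is the kernel-maintained attach count. A minimal sketch; the helper name `attached_count()` is hypothetical, and nothing here implies the hugepage calls offer an equivalent.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Return the number of processes currently attached to the SysV shared
 * memory segment shmid, or -1 if shmctl() fails.
 */
static long attached_count(int shmid)
{
    struct shmid_ds buf;

    if (shmctl(shmid, IPC_STAT, &buf) < 0)
        return -1;
    return (long) buf.shm_nattch;   /* incremented by shmat(), decremented by shmdt() */
}
```

For example, after shmget(IPC_PRIVATE, ...) and one shmat(), attached_count() reports 1; after the matching shmdt() it drops back to 0.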

Any comments would be appreciated.

Cheers,

Neil

-- 
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC


