Re: making use of large TLB pages - Mailing list pgsql-hackers

From Tom Lane
Subject Re: making use of large TLB pages
Msg-id 12491.1033225545@sss.pgh.pa.us
In response to Re: making use of large TLB pages  (Neil Conway <neilc@samurai.com>)
Responses Re: making use of large TLB pages  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
Neil Conway <neilc@samurai.com> writes:
> If we used a key that would remain the same between runs of the
> postmaster, this should ensure that there isn't a possibility of two
> independent sets of backends operating on the same data dir. The most
> logical way to do this IMHO would be to just hash the data dir, but I
> suppose the current method of using the port number should work as
> well.

You should stick as closely as possible to the key logic currently used
for SysV shmem keys.  That logic is intended to cope with the case where
someone else is already using the key# that we initially generate, as
well as the case where we discover a collision with a pre-existing
backend set.  (We tell the difference by looking for a magic number at
the start of the shmem segment.)
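To make the two collision cases concrete, here is a minimal sketch of a key search that tells them apart by the magic number. It is not the code in the tree: the magic value, the header layout, the probe callback, and the in-memory table used to exercise it are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value and header layout, for illustration only. */
#define SKETCH_MAGIC 0x53484D31u

typedef struct
{
    uint32_t magic;         /* marks a segment created by our own software */
    int32_t  creator_pid;
} SketchHeader;

typedef enum { SEG_ABSENT, SEG_FOREIGN, SEG_OURS } SegStatus;

/* Classify a segment by looking at the magic number at its start. */
SegStatus
classify_segment(const void *seg, size_t len)
{
    SketchHeader hdr;

    if (seg == NULL)
        return SEG_ABSENT;
    if (len < sizeof(hdr))
        return SEG_FOREIGN;
    memcpy(&hdr, seg, sizeof(hdr));
    return (hdr.magic == SKETCH_MAGIC) ? SEG_OURS : SEG_FOREIGN;
}

/* Hypothetical probe: returns the segment for `key`, or NULL if none exists. */
typedef const void *(*ProbeFn) (long key, size_t *len, void *ctx);

/*
 * Walk keys upward from first_key.
 * Returns 0 if *out_key is unused, 1 if *out_key holds a segment with our
 * magic number (the caller must then worry about a pre-existing backend
 * set), and -1 if max_tries keys were all held by someone else.
 */
int
choose_key(long first_key, int max_tries, ProbeFn probe, void *ctx,
           long *out_key)
{
    assert(probe != NULL && out_key != NULL);

    for (int i = 0; i < max_tries; i++)
    {
        long        key = first_key + i;
        size_t      len = 0;
        const void *seg = probe(key, &len, ctx);

        switch (classify_segment(seg, len))
        {
            case SEG_ABSENT:
                *out_key = key;
                return 0;
            case SEG_OURS:
                *out_key = key;
                return 1;
            case SEG_FOREIGN:
                break;          /* someone else's key: try the next one */
        }
    }
    return -1;
}

/* In-memory stand-in for a real attach call, used only to exercise the loop. */
typedef struct { long key; SketchHeader hdr; } FakeSegment;
typedef struct { const FakeSegment *segs; size_t count; } FakeTable;

const void *
fake_probe(long key, size_t *len, void *ctx)
{
    const FakeTable *t = ctx;

    for (size_t i = 0; i < t->count; i++)
    {
        if (t->segs[i].key == key)
        {
            *len = sizeof(t->segs[i].hdr);
            return &t->segs[i].hdr;
        }
    }
    *len = 0;
    return NULL;
}
```

What to do when the search stops on a segment carrying our magic number is deliberately left to the caller; the sketch only shows how the two cases are distinguished.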

Note that we do not assume the key is the same on each run; that's why
we store it in postmaster.pid.

>         (1) call sys_alloc_hugepages() without IPC_EXCL. If it returns
>             an error, we're in the clear: there's no page matching
>             that key. If it returns a pointer to a previously existing
>             segment, panic: it is very likely that there are some
>             orphaned backends still active.

s/panic/and the PG magic number appears in the segment header, panic/

>         - if we're compiling on a Linux system but the kernel headers
>           don't define the syscalls we need, use some reasonable
>           defaults (e.g. the syscall numbers for the current hugepage
>           syscalls in Linux 2.5)

I think this is overkill, and quite possibly dangerous.  If we don't see
the symbols then don't try to compile the code.
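Roughly what I have in mind, as a sketch only: gate the hugepage path on the kernel headers actually defining the syscall numbers, and supply no fallback numbers of our own. The macro names __NR_alloc_hugepages and __NR_free_hugepages are my assumption about what the 2.5 headers export, and SKETCH_HAVE_HUGEPAGE_SYSCALLS and the helper function are hypothetical.

```c
#include <stdbool.h>

#if defined(__linux__)
#include <sys/syscall.h>        /* pulls in the __NR_* syscall numbers */
#endif

/*
 * Enable the hugepage code only when the headers define both syscalls.
 * No hardwired default numbers: if the symbols are missing, the code is
 * simply not compiled. (Macro names are assumed, not verified.)
 */
#if defined(__NR_alloc_hugepages) && defined(__NR_free_hugepages)
#define SKETCH_HAVE_HUGEPAGE_SYSCALLS 1
#else
#define SKETCH_HAVE_HUGEPAGE_SYSCALLS 0
#endif

/* Lets the controlling logic ask whether the hugepage path exists at all. */
bool
sketch_hugepages_compiled_in(void)
{
#if SKETCH_HAVE_HUGEPAGE_SYSCALLS
    return true;
#else
    return false;               /* caller goes straight to the SysV path */
#endif
}
```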

On the whole it seems that this allows a very nearly one-to-one mapping
to the existing SysV functionality.  We don't have the "number of
connected processes" syscall, perhaps, but we don't need it: if a
hugepages segment exists we can assume the number of connected processes
is greater than 0, and that's all we really need to know.
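Spelled out as a sketch, the in-use test could look like the following. The struct and enum names are hypothetical; for the SysV case the attach count is assumed to come from shmctl() with IPC_STAT (the shm_nattch field), while the hugepage case needs nothing beyond whether the segment exists.

```c
#include <stdbool.h>

typedef enum { SEG_KIND_SYSV, SEG_KIND_HUGEPAGE } SegKind;

/* What we managed to learn about a segment (hypothetical struct). */
typedef struct
{
    SegKind       kind;
    bool          exists;
    unsigned long nattch;       /* only meaningful for SEG_KIND_SYSV */
} SegProbe;

/* True if we must assume some process is still attached to the segment. */
bool
segment_in_use(const SegProbe *p)
{
    if (!p->exists)
        return false;
    if (p->kind == SEG_KIND_SYSV)
        return p->nattch > 0;   /* SysV reports the attach count directly */
    return true;                /* hugepage: existence is taken to imply > 0 */
}
```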

I think it's okay to stuff this support into the existing
port/sysv_shmem.c file, rather than make a separate file (particularly
given your point that we have to be able to fall back to SysV calls at
runtime).  I'd suggest reorganizing the code in that file slightly to
separate the actual syscalls from the controlling logic in
PGSharedMemoryCreate().  You will probably also have to extend the API for
PGSharedMemoryIsInUse() and RecordSharedMemoryInLockFile() to allow
three fields to be recorded in postmaster.pid, not two --- you'll want
a boolean indicating whether the stored key is for a SysV or hugepage
segment.
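A sketch of what a three-field record might look like, purely to illustrate the idea: the line layout ("key id kind"), the struct, and both function names are invented here and are not the existing postmaster.pid format or API.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical three-field record: key, segment id, and segment kind. */
typedef struct
{
    unsigned long key;
    unsigned long id;
    bool          is_hugepage;  /* false = SysV segment, true = hugepage */
} ShmemRecord;

/* Write "key id kind\n" into buf; returns 0 on success, -1 if it won't fit. */
int
format_shmem_record(const ShmemRecord *r, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "%lu %lu %d\n",
                     r->key, r->id, r->is_hugepage ? 1 : 0);

    return (n > 0 && (size_t) n < buflen) ? 0 : -1;
}

/* Parse a line written by format_shmem_record; returns 0 on success. */
int
parse_shmem_record(const char *line, ShmemRecord *out)
{
    unsigned long key;
    unsigned long id;
    int           kind;

    if (sscanf(line, "%lu %lu %d", &key, &id, &kind) != 3)
        return -1;              /* a two-field line is rejected here */
    if (kind != 0 && kind != 1)
        return -1;
    out->key = key;
    out->id = id;
    out->is_hugepage = (kind == 1);
    return 0;
}
```

Whether a lock file with only two fields should be rejected, as this sketch does, or read as a SysV record is a compatibility question the sketch does not settle.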
        regards, tom lane

