Thread: concurrent Postgres on NUMA - howto ?

concurrent Postgres on NUMA - howto ?

From
"Mauricio Breternitz"
Date:
Folks:
    I'm planning a port of Postgres to a multiprocessor
architecture in which all nodes have both local memory
and fast access to a shared memory. Shared memory is more
expensive than local memory.
My intent is to put the shmem & lock structures in
shared memory, but use a copy-in / copy-out approach to
maintain coherence in the buffer cache:
- copy buffer from shared memory on buffer allocate
- write back buffer to shared memory when it is dirtied.
 
Is that enough ?
The idea sketch is as follows (mostly, changes
contained to storage/buffer/bufmgr.c):
- change BufferAlloc, etc., to create a node-local copy
  of the buffer (from shared memory). Copy both the BufferDesc
  entry and the buffer->data array
- change WriteBuffer to copy the (locally changed) buffer
  to shared memory (this is the point at which the BM_DIRTY
  bit is set). [I am assuming the buffer is locked & this
  is a safe time to make the buffer visible to other backends].
 

[Assume, for this discussion, that the sem / locks structs in
shared memory have been ported & work ]. Ditto for the hash access.
My concern is whether that is enough to maintain consistency
in the buffer cache (i.e., are there other places in the code
where a backend might have a leftover pointer to somewhere in
the buffer cache?). In the scheme above, the buffer
cache is not directly accessible to the backend except via this
copy-in / copy-out approach.
[BTW, I think this might be a way of providing a 'cluster'
version of Postgres, by using some global communication module to
obtain/post the 'buffer cache' values]
    thanks        regards            Mauricio
                               mbjsql@hotmail.com






Re: concurrent Postgres on NUMA - howto ?

From
Tom Lane
Date:
"Mauricio Breternitz" <mbjsql@hotmail.com> writes:
>     My concern is whether that is enough to maintain consistency
> in the buffer cache

No, it isn't --- for one thing, WriteBuffer wouldn't cause other
backends to update their copies of the page.  At the very least you'd
need to synchronize where the LockBuffer calls are, not where
WriteBuffer is called.

I really question whether you want to do anything like this at all.
Seems like accessing the shared buffers right where they are will be
fastest; your approach will entail a huge amount of extra data copying.
Considering that a backend doesn't normally touch every byte on a page
that it accesses, I wouldn't be surprised if full-page copying would
net out to being more shared-memory traffic, rather than less.
        regards, tom lane
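
The traffic argument above can be made concrete with a little arithmetic. The following is only a rough sketch, not a measurement: it assumes the default 8192-byte PostgreSQL block size, ignores cache-line granularity and any per-transfer setup cost, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BLCKSZ 8192  /* assumption: default PostgreSQL block size */

/* Bytes moved through shared memory by copy-in/copy-out for one page access.
 * A read-only access copies the page in; a modifying access also copies it out. */
static size_t copy_traffic(int modified)
{
    return modified ? 2 * (size_t) SKETCH_BLCKSZ : (size_t) SKETCH_BLCKSZ;
}

/* Bytes moved by in-place access: only what the backend actually touches. */
static size_t direct_traffic(size_t bytes_read, size_t bytes_written)
{
    assert(bytes_read <= SKETCH_BLCKSZ && bytes_written <= SKETCH_BLCKSZ);
    return bytes_read + bytes_written;
}
```

Under these assumptions, a backend that reads 200 bytes of a page and rewrites 100 of them moves 300 bytes in place but 16384 bytes with full-page copying. Copying only wins if the per-byte cost of bulk transfer is lower by a comparable factor.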


Re: concurrent Postgres on NUMA - howto ?

From
"Mauricio Breternitz"
Date:
Tom:
  Notice that WriteBuffer would just put the fresh copy of the page
out in the shared space.
  Other backends would get the latest copy of the page when
THEY execute BufferAlloc() afterwards. [Remember, backends would
not have a local buffer cache, only (temporary) copies of one buffer
per BufferAlloc()/release pair].
  [Granted about the bandwidth needs. In my target arch,
access to shmem is costlier than local mem, and cannot be done
via pointers (so a lot of code that might have pointers inside the
shmem buffer may need to be tracked down & changed)].
  My idea is to use high-bandwidth access via the copy-in/copy-out
approach (hopefully paying that round-trip cost only once per pair
BufferAlloc -> make buffer dirty).

[My reasoning for this is that a backend needs to have exclusive
access to a buffer when it writes to it. And I think it 'advertises'
the new buffer contents to the world when it sets the BM_DIRTY flag.]
  About your suggestion of LockBuffer as synchronization points -
a simple protocol might be:
   - copy 'in' the buffer on a READ, SHARE, or lock acquire
     (may have to be careful on an upgrade of a READ to a
     write lock)
   - copy 'out' the buffer on a WRITE lock release
  I would appreciate comments and input on this approach, as I
foresee putting a lot of effort into it soon,
        regards
                  Mauricio
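
A minimal sketch of the lock-based protocol described above, assuming the page is copied in on every lock acquire and copied out only on release of an exclusive lock. Everything here is hypothetical illustration rather than PostgreSQL code: the types, the `sketch_` functions, and the counter-based lock (a stand-in that fails instead of blocking). Lock upgrades are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BLCKSZ 8192   /* assumption: default PostgreSQL block size */

typedef enum { SKETCH_SHARE, SKETCH_EXCLUSIVE } SketchLockMode;

/* Stand-in for one page in the slow shared memory. */
typedef struct {
    char data[SKETCH_BLCKSZ];
    int  readers;   /* number of share-lock holders */
    int  writer;    /* 1 while an exclusive lock is held */
} SharedPage;

/* Node-local working copy held by one backend. */
typedef struct {
    char           data[SKETCH_BLCKSZ];
    SharedPage    *home;
    SketchLockMode mode;
} LocalCopy;

/* Acquire the lock, then copy the page in. Returns 0 on success, -1 if
 * the lock is unavailable (a real implementation would block instead). */
static int sketch_lock_buffer(SharedPage *page, LocalCopy *copy,
                              SketchLockMode mode)
{
    if (page->writer || (mode == SKETCH_EXCLUSIVE && page->readers > 0))
        return -1;
    if (mode == SKETCH_EXCLUSIVE)
        page->writer = 1;
    else
        page->readers++;
    memcpy(copy->data, page->data, SKETCH_BLCKSZ);  /* copy in under the lock */
    copy->home = page;
    copy->mode = mode;
    return 0;
}

/* Copy the page out if it was held exclusively, then release the lock. */
static void sketch_unlock_buffer(LocalCopy *copy)
{
    SharedPage *page = copy->home;

    assert(page != NULL);
    if (copy->mode == SKETCH_EXCLUSIVE) {
        memcpy(page->data, copy->data, SKETCH_BLCKSZ);  /* copy out first */
        page->writer = 0;
    } else {
        page->readers--;
    }
    copy->home = NULL;
}
```

Because both copies happen while the lock is held, a reader that acquires the lock after a writer releases it always sees the writer's changes. The cost is one full-page transfer per lock acquire, plus one more per exclusive release.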


>From: Tom Lane <tgl@sss.pgh.pa.us>
>To: "Mauricio Breternitz" <mbjsql@hotmail.com>
>CC: pgsql-hackers@postgresql.org
>Subject: Re: [HACKERS] concurrent Postgres on NUMA - howto ?
>Date: Mon, 23 Apr 2001 19:43:05 -0400
>
>"Mauricio Breternitz" <mbjsql@hotmail.com> writes:
> >     My concern is whether that is enough to maintain consistency
> > in the buffer cache
>
>No, it isn't --- for one thing, WriteBuffer wouldn't cause other
>backends to update their copies of the page.  At the very least you'd
>need to synchronize where the LockBuffer calls are, not where
>WriteBuffer is called.
>
>I really question whether you want to do anything like this at all.
>Seems like accessing the shared buffers right where they are will be
>fastest; your approach will entail a huge amount of extra data copying.
>Considering that a backend doesn't normally touch every byte on a page
>that it accesses, I wouldn't be surprised if full-page copying would
>net out to being more shared-memory traffic, rather than less.
>
>            regards, tom lane




Re: concurrent Postgres on NUMA - howto ?

From
Tom Lane
Date:
"Mauricio Breternitz" <mbjsql@hotmail.com> writes:
>    Notice that WriteBuffer would just put the fresh copy of the page
> out in the shared space.
>    Other backends would get the latest  copy of the page when
> THEY execute BufferAlloc() afterwards.

You seem to be assuming that BufferAlloc is mutually exclusive across
backends --- it's not.  As I said, you'd have to look at transferring
data at LockBuffer time to make this work.

>    [Granted about the bandwidth needs. In my target arch,
> access to shmem is costlier than local mem, and cannot be done
> via pointers

What?  How do you manage to memcpy out of shmem then?

> (so a lot of code that might have pointers inside the
> shmem buffer may need to be tracked down & changed)].

You're correct, Postgres assumes it can have pointers to data inside the
page buffers.  I don't think changing that is feasible.  I find it hard
to believe that you can't have pointers to shmem though; IMHO it's not
shmem if it can't be pointed at.

> [My reasoning for this is that a backend needs to have exclusive
> access to a buffer when it writes to it. And I think it 'advertises'
> the new buffer contents to the world when it sets the BM_DIRTY flag.]

No.  BM_DIRTY only advises the buffer manager that the page must
eventually be written back to disk; it does not have anything to do with
when/whether other backends see data changes within the page.  One more
time: LockBuffer is what you need to be looking at.
        regards, tom lane
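
The failure mode behind this objection can be sketched as a lost update. The example below is a single-threaded simulation of one possible interleaving, with a tiny 16-byte page and hypothetical function names. It assumes, as stated above, that nothing serializes the copy points when they sit at BufferAlloc and WriteBuffer rather than at LockBuffer.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE 16  /* tiny page, for illustration only */

typedef struct { char data[SKETCH_PAGE]; } Page;

/* Hypothetical stand-ins for the proposed copy points. No lock is taken,
 * mirroring the fact that BufferAlloc is not mutually exclusive. */
static void copy_in_at_alloc(const Page *shared, Page *local)
{
    memcpy(local, shared, sizeof *local);
}

static void copy_out_at_write(Page *shared, const Page *local)
{
    memcpy(shared, local, sizeof *shared);
}

/* Returns 1 if backend A's update survives, 0 if it is lost. */
static int lost_update_demo(void)
{
    Page shared, a, b;

    memset(&shared, '.', sizeof shared);

    copy_in_at_alloc(&shared, &a);   /* backend A gets its copy */
    copy_in_at_alloc(&shared, &b);   /* backend B gets one too */

    a.data[0] = 'A';
    copy_out_at_write(&shared, &a);  /* A publishes its change */

    b.data[1] = 'B';                 /* B edits its now-stale copy */
    copy_out_at_write(&shared, &b);  /* B publishes, overwriting A's byte */

    return shared.data[0] == 'A';
}
```

In this interleaving `lost_update_demo()` returns 0: B's write-back restores the old value of byte 0, so A's change is gone. Moving the copies inside the lock acquire and release, as suggested above, rules this interleaving out.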