I was just contemplating how to make Postgres parallel (for DSS, i.e.
decision support, applications)... Has anyone done work on this? It looks
to me like there are a couple of obvious places to add parallel operation:
Stage 1) I/O, perhaps through MPI-IO - would improve table scanning and
load/unload operations. One (or more) PostgreSQL servers would use
MPI-IO/ROMIO to access a parallel file system like PVFS or GPFS (IBM).
Stage 2) Parallel Postgres servers, with the postmaster spawning the
server on a different node (possibly borrowing some code from GNU Queue)
and doing any buffer twiddling for that connection over RPC. Would the
client connection still go through a proxy on the postmaster node (kind
of like MOSIX)?
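
To make the Stage 1 idea a bit more concrete, here is a minimal sketch of
the block-range split behind a parallel table scan. It is only an
illustration: it uses Python threads and an in-memory list of blocks
rather than a real parallel I/O library or file system, and the names
split_blocks, scan_range and parallel_count are made up for this example,
not anything in the Postgres source.

```python
from concurrent.futures import ThreadPoolExecutor

def split_blocks(n_blocks, n_workers):
    """Return contiguous [start, end) block ranges, one per worker."""
    base, extra = divmod(n_blocks, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers each take one additional block.
        end = start + base + (1 if w < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

def scan_range(table, start, end, predicate):
    """Count rows in blocks [start, end) that satisfy the predicate."""
    return sum(1 for block in table[start:end] for row in block if predicate(row))

def parallel_count(table, n_workers, predicate):
    """Scan disjoint block ranges concurrently and add up the partial counts."""
    ranges = split_blocks(len(table), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(
            lambda r: scan_range(table, r[0], r[1], predicate), ranges
        )
        return sum(partials)

# Hypothetical table: 10 blocks holding 4 integer "rows" each (values 0..39).
table = [[b * 4 + i for i in range(4)] for b in range(10)]
even_rows = parallel_count(table, 3, lambda row: row % 2 == 0)
```

In a real setup each worker would presumably read its own byte range of
the table file from the parallel file system, and only the small partial
results would travel back to the coordinating server.
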
It might be more efficient to also use DSM (distributed shared memory)
for the disk buffers and tables, but that would be more complicated, I
think...
To handle the case where data will be updated, any query doing update or
insert type actions would be restricted to run on the node with the
postmaster.
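
A rough sketch of that restriction, assuming a simple router that looks
only at the first keyword of the statement. The QueryRouter class and the
node names are hypothetical, and treating DELETE as a write is my
assumption on top of "update or insert type actions". A real version
would have to look at the parsed query, since things like SELECT ... FOR
UPDATE or functions with side effects would slip past a keyword check.

```python
import itertools

# Statement keywords treated as writes (DELETE is an assumption, see above).
WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE"}

class QueryRouter:
    """Send writes to the postmaster node, spread reads over worker nodes."""

    def __init__(self, postmaster_node, worker_nodes):
        self.postmaster_node = postmaster_node
        # Fall back to the postmaster node if there are no workers.
        self._workers = itertools.cycle(worker_nodes or [postmaster_node])

    def route(self, sql):
        words = sql.split()
        first = words[0].upper() if words else ""
        if first in WRITE_KEYWORDS:
            return self.postmaster_node
        # Everything else is assumed read-only and goes round-robin.
        return next(self._workers)

router = QueryRouter("node0", ["node1", "node2"])
```

With this, router.route("UPDATE t SET x = 1") picks node0, while
successive SELECTs alternate between node1 and node2.
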
I don't see immediately how to make the postmasters parallel.
Thoughts? Anyone?
I was also contemplating the possibility of using some of the techniques
of "Monet" (see Monet on SourceForge), which became open source
relatively recently. It makes heavy use of the notion of decomposed
storage (each attribute stored individually), which can be a huge win for
some kinds of queries as it dramatically reduces I/O, cache misses and
other time-intensive activities. I have not given that as much thought
yet (i.e. I haven't found the right places in the architecture for this
yet). It might be possible to snag the plan and use that to drive Monet
itself (less coding?) instead of trying to build it into the executor
code directly.
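
For anyone not familiar with decomposed storage, here is a toy sketch of
the idea. It is not Monet's actual storage format or API, just plain
Python lists; the sample rows and the function names are invented, and it
assumes every row has the same attributes.

```python
# Hypothetical sample data: three rows with three attributes each.
rows = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 20.0},
    {"id": 3, "region": "east", "amount": 5.5},
]

def decompose(rows):
    """Turn a list of row dicts into one list per attribute."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def sum_row_store(rows, attr):
    """Row layout: every full row is visited to read a single attribute."""
    return sum(row[attr] for row in rows)

def sum_column_store(columns, attr):
    """Decomposed layout: only the one attribute's values are visited."""
    return sum(columns[attr])

columns = decompose(rows)
total = sum_column_store(columns, "amount")
```

Both sums give the same answer, but the decomposed version never looks at
id or region, which is where the savings in I/O and cache misses would
come from on a wide table.
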