Thread: Postgresql on SAN

Postgresql on SAN

From

Yannick Lecaillez

Date:

06 July 2004, 13:27:13

Hi ppl,
What need to do (understand, to devel) to allow several
postgres instance  running from several server to access to the
same data (no replication at all) hosted on a SAN ?
I'm probably wrong but i think this type of dev should
be easier to realize than replication ? Because all node are
always consistent since they use exactly the same data ? 
In fact i'm asking why opensource db go only to the
replication solution than "true" clustering solution using SAN
like Oracle RAC ? I don't see the big advantage of the replication
method (of course, for application which need more than few nodes
hosting "small" db). The price could be an answer but what about a db of
several hundred giga byte ? When a node is added it must contain this
capacity nearly for "nothing" (i thinks its more and more a waste when
number of node grow ...).
Is this a planed feature ? 
Sincerely, Yannick.

Re: Postgresql on SAN

From

Bruno Wolff III

Date:

06 July 2004, 14:02:23

On Tue, Jul 06, 2004 at 18:17:16 +0200, Yannick Lecaillez <yl@itioweb.com> wrote:
> Hi ppl,
> 
>     What need to do (understand, to devel) to allow several
> postgres instance  running from several server to access to the
> same data (no replication at all) hosted on a SAN ?

Only once such instance can be running at once. You want to have some
sort of interlock to make sure this doesn't happen or you will lose
your data. You can use this system for rapid fail over if the primary
server dies.

Re: Postgresql on SAN

From

Richard Huxton

Date:

06 July 2004, 14:02:26

Yannick Lecaillez wrote:
> Hi ppl,
> 
>     What need to do (understand, to devel) to allow several
> postgres instance  running from several server to access to the
> same data (no replication at all) hosted on a SAN ?
> 
>     I'm probably wrong but i think this type of dev should
> be easier to realize than replication ? Because all node are
> always consistent since they use exactly the same data ? 

The problem is the PG uses shared memory to coordinate the various 
backend processes. Sharing the files is not the difficult bit, sharing 
info about who is doing what with which tuple/has which locks is.

>     In fact i'm asking why opensource db go only to the
> replication solution than "true" clustering solution using SAN
> like Oracle RAC ? I don't see the big advantage of the replication
> method (of course, for application which need more than few nodes
> hosting "small" db). The price could be an answer but what about a db of
> several hundred giga byte ? When a node is added it must contain this
> capacity nearly for "nothing" (i thinks its more and more a waste when
> number of node grow ...).

The advantage of replication is that you can just use standard server 
boxes. There is a company offering specialised high-bandwidth hardware 
that is supposed to offer PG clustering, but I'm afraid I know no more 
about it.

--   Richard Huxton  Archonet Ltd

Re: Postgresql on SAN

From

Alvaro Herrera

Date:

06 July 2004, 14:08:41

On Tue, Jul 06, 2004 at 06:17:16PM +0200, Yannick Lecaillez wrote:

>     What need to do (understand, to devel) to allow several
> postgres instance  running from several server to access to the
> same data (no replication at all) hosted on a SAN ?

Clustered shared memory, cluster-wide spinlocks.  And with decent
performance, while at it ...

>     Is this a planed feature ? 

No, I don't think so.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Es filósofo el que disfruta con los enigmas" (G. Coli)

Re: Postgresql on SAN

From

Peter Eisentraut

Date:

06 July 2004, 14:10:56

Yannick Lecaillez wrote:
>     What need to do (understand, to devel) to allow several
> postgres instance  running from several server to access to the
> same data (no replication at all) hosted on a SAN ?

This is impossible.

You can use a SAN if only one node is active at a time, and that is 
indeed a good high-availability solution that is usually easier to set 
up and maintain than replication, but you can't have several instances 
running on the same data at once.

Re: Postgresql on SAN

From

Yannick Lecaillez

Date:

06 July 2004, 14:33:10

Le mar 06/07/2004 à 19:07, Alvaro Herrera a écrit :
> On Tue, Jul 06, 2004 at 06:17:16PM +0200, Yannick Lecaillez wrote:
> 
> >     What need to do (understand, to devel) to allow several
> > postgres instance  running from several server to access to the
> > same data (no replication at all) hosted on a SAN ?
> 
> Clustered shared memory, cluster-wide spinlocks.  And with decent
> performance, while at it ...
Perhaps could be interesting to look at cluster file system which
seems to have same problems and find solution about locking (i.eOpenGFS). 

http://opengfs.sourceforge.net/showdoc.php?docpath=cvsmirror/opengfs/docs/ogfs-locking&doctitle=Locking&docauthor=ben.m.cahill(at)intel.com

Found on google a clustered shared memory (openMosix project)
http://www.unixreview.com/documents/s=8989/ur0404l/

I would have the pgsql-hackers genius for do that :) . I think its the
only feature which force company to buy 50000$ Oracle licence ...

Sincerely, Yannick.

Re: Postgresql on SAN

From

Simon Riggs

Date:

06 July 2004, 18:57:08

On Tue, 2004-07-06 at 18:22, Yannick Lecaillez wrote:
> Le mar 06/07/2004 à 19:07, Alvaro Herrera a écrit :
> > On Tue, Jul 06, 2004 at 06:17:16PM +0200, Yannick Lecaillez wrote:
> >
> > >     What need to do (understand, to devel) to allow several
> > > postgres instance  running from several server to access to the
> > > same data (no replication at all) hosted on a SAN ?
> >
> > Clustered shared memory, cluster-wide spinlocks.  And with decent
> > performance, while at it ...
> Perhaps could be interesting to look at cluster file system which
> seems to have same problems and find solution about locking (i.e
>  OpenGFS).
>
http://opengfs.sourceforge.net/showdoc.php?docpath=cvsmirror/opengfs/docs/ogfs-locking&doctitle=Locking&docauthor=ben.m.cahill(at)intel.com
>
> Found on google a clustered shared memory (openMosix project)
> http://www.unixreview.com/documents/s=8989/ur0404l/
>
> I would have the pgsql-hackers genius for do that :) . I think its the
> only feature which force company to buy 50000$ Oracle licence ...
>

I would note that Oracle first released OPS on UNIX at 7.0.13, in 1993.
Major performance issues were not resolved until 9i emerged, almost 10
years later...

It won't take PostgreSQL 10 years, but its impossible now, as Peter
observes.

Best regards, Simon Riggs

Re: Postgresql on SAN

From

Greg Stark

Date:

06 July 2004, 22:14:08

Simon Riggs <simon@2ndquadrant.com> writes:

> On Tue, 2004-07-06 at 18:22, Yannick Lecaillez wrote:
>
> > I would have the pgsql-hackers genius for do that :) . I think its the
> > only feature which force company to buy 50000$ Oracle licence ...

Fwiw, I think you've underestimated the price on those Oracle licenses by an
order of magnitude at least.

If there are as many companies willing to pony up for some postgres developers
I'm sure there would be people interested, but it's not the kind of project
someone's going to be doing in their spare time. As Oracle found, it's *hard*.
And moreover, it results in a system that's hard to use. Those companies that
need are also ponying up much more than $50k/year just for the DBAs capable of
running such beasts.

Free Software runs on a very different operating model than commercial
software. Instead of a sharp division between paying clients and profiting
developers, most Free Software exists because the programmers themselves found
they had a need and solved it for themselves.

For that reason I would be skeptical about seeing huge clustered postgres
systems a la Oracle OPS, simply because it's a very specialized need, and not
one that any postgres developer is likely to run into on his own. They're more
likely to run screaming when asked to provide such a monster than sit down and
start coding...

What most people need is some way to promise rapid recovery from failures. In
my personal opinion the smoothest most reliable method of providing that is a
PITR-based warm standby machine. I'm overjoyed that someone else saw the same
need and has been working feverishly on that for 7.5. 

There does seem to be an awful lot of people on this list lobbying for some
feature or another. It always seems a bit weird, like a basic misunderstanding
is at play. The developers are working for their employers or for themselves.
It doesn't really matter how many new users the Windows port will bring on,
for example. This isn't some proselytising religion. It'll get done if a
developer needs it either for him- or herself or for a client, not because you
made some convincing argument about how there are lots of other people who
would benefit.

-- 
greg

Re: Postgresql on SAN

From

Yannick Lecaillez

Date:

07 July 2004, 07:49:32

Thanks a lot for all people which answer.

I have this "clustering on SAN" problem today and i think it could be
less harder to implement this today than it was for Oracle in 1993
(since i can find a lot of work in opensource which could be interesting
in this project : distributed lock, clustered shared memory, etc ...).

I will try to have a look about all these things btw, i haven't got
the pretention to suceed to do anything ... Just look and see
a bit more deeply how it could be hard.

Is there something like a "big picture" about postgres internal ?
What path is used from the SQL query to the hard-disk files ? About
the internal locking mechanism ? about the usage of the shared memory ?
I see there is everything described in the postgres doc but no "big
picture" which could help me to understand faster ... :(

I'm using postgres for several year and i have nerver problem with
it i'm really happy with these trigger and inheritance mechanism which
help me more than one time and i would present my respect to those guy
which write this so good piece of software.

I hope to see you later with something to show ...
Sincerly, Yannick.

Re: Postgresql on SAN

From

Bruce Momjian

Date:

07 July 2004, 09:53:31

Yannick Lecaillez wrote:
> Thanks a lot for all people which answer.
> 
> I have this "clustering on SAN" problem today and i think it could be
> less harder to implement this today than it was for Oracle in 1993
> (since i can find a lot of work in opensource which could be interesting
> in this project : distributed lock, clustered shared memory, etc ...).
> 
> I will try to have a look about all these things btw, i haven't got
> the pretention to suceed to do anything ... Just look and see
> a bit more deeply how it could be hard.
> 
> Is there something like a "big picture" about postgres internal ?
> What path is used from the SQL query to the hard-disk files ? About
> the internal locking mechanism ? about the usage of the shared memory ?
> I see there is everything described in the postgres doc but no "big
> picture" which could help me to understand faster ... :(

Have you read the developers FAQ on the developers web page?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: Postgresql on SAN

From

Robert Treat

Date:

07 July 2004, 14:51:28

On Wed, 2004-07-07 at 06:39, Yannick Lecaillez wrote:
> I have this "clustering on SAN" problem today and 
<snip>

Me thinks you've fallen into the trap of proprietary vendors.  Your
problem isn't that you need "clustering on SAN", your problem is you
want some form of high availability solution for your database. You
*think* "clustering on San" is the best solution for this, but others
have come to differing opinions on this which is why we have things like
slony and pgpool (or clusgres for that matter). Not saying your wrong,
just saying that adding a feature shouldn't be your end goal, solving a
problem should be. 


Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

SAN, clustering, MPI, Backplane Re: Postgresql on SAN

From

Andrew Piskorski

Date:

08 July 2004, 09:22:23

On Wed, Jul 07, 2004 at 12:39:34PM +0200, Yannick Lecaillez wrote:
> Thanks a lot for all people which answer.
> 
> I have this "clustering on SAN" problem today and i think it could be
> less harder to implement this today than it was for Oracle in 1993

You want to do clustering for failover/reliability reasons, for
performance/scalability reasons, or for both?

For some stuff to read, see the dozen or so links I posted here:
 http://openacs.org/forums/message-view?message_id=128060

E.g., the Lustre cluster file system claims full POSIX file system
semantics (locking, etc.), so you should certainly be able to run
PostgreSQL on it.  No idea how well that works, but it should
certainly let you do fail over.

Perhaps you could even somehow, eventually, get multiple PostgreSQL
instances on different machines to all cooperate with read/write
access to the same database files over the network.  (And without
using super-expensive SCI hardware the way Clusgres does.)  That might
get you a true cluster RDBMS, if it worked well.

Another thing I've been wondering about, but haven't been able to find
any discussion of:

Just how closely tied is PostgreSQL to its use of shared memory?  At
least in principle, could all direct use of shared memory be replaced
with MPI?  Could that even work at all?  And in the degenerate case
where the MPI implementation is itself actually using shared memory
underneath (rather than sending messages across the network), would
performance still suck?

In other words, if MPI is unsuitable for PostgreSQL, is it because the
SEMANTICS of MPI are inherently unsuitable, or is it just
implementation or performance issues?

What about PostgreSQL specifically makes message passing no good, and
is the same also true for ALL RDBMSs?  What about systems like
Backplane, which claims to be "the only open-source, transactional,
truly distributed database."?
 http://www.backplane.com/

-- 
Andrew Piskorski <atp@piskorski.com>
http://www.piskorski.com/

Re: SAN, clustering, MPI, Backplane Re: Postgresql on

From

Yannick Lecaillez

Date:

09 July 2004, 10:34:35

Le jeu 08/07/2004 à 14:22, Andrew Piskorski a écrit :
> You want to do clustering for failover/reliability reasons, for
> performance/scalability reasons, or for both?
for all that of course :)

> For some stuff to read, see the dozen or so links I posted here:
> 
>   http://openacs.org/forums/message-view?message_id=128060
I already see all these one, but thanks :)

> E.g., the Lustre cluster file system claims full POSIX file system
> semantics (locking, etc.), so you should certainly be able to run
> PostgreSQL on it.  No idea how well that works, but it should
> certainly let you do fail over.
As a standard filesystem on a SAN configured for take care about
only one node can mount the fs at the same time (using stonith for
example).

> Perhaps you could even somehow, eventually, get multiple PostgreSQL
> instances on different machines to all cooperate with read/write
> access to the same database files over the network.  (And without
> using super-expensive SCI hardware the way Clusgres does.)  That might
> get you a true cluster RDBMS, if it worked well.
It's exactly what i want.

> Just how closely tied is PostgreSQL to its use of shared memory?
I see very interesting article about openMosix which support clustered
shared memory and distributed locking and processus migration (at OS
level) : http://howto.ipng.be/MigSHM-openMosix/x90.html

Seems the only things which forbid using Postgres with OpenMosix
is the "Actually PostgreSQL uses shared memory but not the system
semaphores for locking it. Thus, it does not satisfy migShm constraints
and so it cannot benefit from migShm."

migShm constraint are here: http://mcaserta.com/maask/assumptions.html

> What about PostgreSQL specifically makes message passing no good, and
> is the same also true for ALL RDBMSs?  What about systems like
> Backplane, which claims to be "the only open-source, transactional,
> truly distributed database."?
> 
>   http://www.backplane.com/
Thanks for this links, since a time i thinked to found exactly
what i'm searching ... Until i read that :

LIMITATIONS     * Only one datatype is implemented, 'varchar'. In otherwords,       everything is a string.      * We
donot support triggers      * We do not support UNIQUE      * We only support AND clauses - no parenthesis or OR. Yet.
    ....
 

http://www.backplane.com/docs.shtml?doc=2

Re: SAN, clustering, MPI, Backplane Re: Postgresql on SAN

From

Tom Lane

Date:

10 July 2004, 00:48:35

Andrew Piskorski <atp@piskorski.com> writes:
> Another thing I've been wondering about, but haven't been able to find
> any discussion of:
> Just how closely tied is PostgreSQL to its use of shared memory?

Pretty damn closely.  You would not be happy with the performance of
anything that tried to insert a network communication layer into access
to what we think of as shared memory.

For a datapoint, check the list archives for discussions a few months
ago about performance with multiple Xeons.  We were seeing significant
performance degradation simply because the communications architecture
for multiple Xeon chips on one motherboard is badly designed :-(
The particular issue we were able to document was cache-line swapping
for spinlock variables, but AFAICS the issue would not go away even
if we had a magic zero-overhead locking mechanism: the Xeons would
still suck, because of contention for access to the shared variables
that the spinlocks are protecting.

OpenMosix is in the category of "does not work, and would be unusably
slow if it did work" ... AFAIK any similar design would have the same
problem.
        regards, tom lane

Re: SAN, clustering, MPI, Backplane Re: Postgresql on SAN

From

Gaetano Mendola

Date:

12 July 2004, 20:29:57

Tom Lane wrote:

> Andrew Piskorski <atp@piskorski.com> writes:
> 
>>Another thing I've been wondering about, but haven't been able to find
>>any discussion of:
>>Just how closely tied is PostgreSQL to its use of shared memory?
> 
> 
> Pretty damn closely.  You would not be happy with the performance of
> anything that tried to insert a network communication layer into access
> to what we think of as shared memory.
> 
> For a datapoint, check the list archives for discussions a few months
> ago about performance with multiple Xeons.  We were seeing significant
> performance degradation simply because the communications architecture
> for multiple Xeon chips on one motherboard is badly designed :-(
> The particular issue we were able to document was cache-line swapping
> for spinlock variables, but AFAICS the issue would not go away even
> if we had a magic zero-overhead locking mechanism: the Xeons would
> still suck, because of contention for access to the shared variables
> that the spinlocks are protecting.
> 
> OpenMosix is in the category of "does not work, and would be unusably
> slow if it did work" ... AFAIK any similar design would have the same
> problem.

However shall be nice if the postmaster is not selfish as is it now (two
postmastera are not able to work on the same shared memory segment),
projects like cashmere ( www.cs.rochester.edu/research/cashmere/ ) or
this www.tu-chemnitz.de/informatik/HomePages/RA/projects/VIA_SCI/via_sci_hardware.html

are able to run a single database mananged by a postmaster for each node in a
distributed architecture.

I seen these hardware working at CeBIT some years ago and it's possible to setup
any kind of configuration: linear, triangular, cube, ipercube. Basically each node
share part of the local RAM in order to create a big shared memory segment and the
shared memory is managed "without kernel intervention".



Regards
Gaetano Mendola