Thread: Postgresql on SAN
Hi ppl, What need to do (understand, to devel) to allow several postgres instance running from several server to access to the same data (no replication at all) hosted on a SAN ? I'm probably wrong but i think this type of dev should be easier to realize than replication ? Because all node are always consistent since they use exactly the same data ? In fact i'm asking why opensource db go only to the replication solution than "true" clustering solution using SAN like Oracle RAC ? I don't see the big advantage of the replication method (of course, for application which need more than few nodes hosting "small" db). The price could be an answer but what about a db of several hundred giga byte ? When a node is added it must contain this capacity nearly for "nothing" (i thinks its more and more a waste when number of node grow ...). Is this a planed feature ? Sincerely, Yannick.
On Tue, Jul 06, 2004 at 18:17:16 +0200, Yannick Lecaillez <yl@itioweb.com> wrote: > Hi ppl, > > What need to do (understand, to devel) to allow several > postgres instance running from several server to access to the > same data (no replication at all) hosted on a SAN ? Only once such instance can be running at once. You want to have some sort of interlock to make sure this doesn't happen or you will lose your data. You can use this system for rapid fail over if the primary server dies.
Yannick Lecaillez wrote: > Hi ppl, > > What need to do (understand, to devel) to allow several > postgres instance running from several server to access to the > same data (no replication at all) hosted on a SAN ? > > I'm probably wrong but i think this type of dev should > be easier to realize than replication ? Because all node are > always consistent since they use exactly the same data ? The problem is the PG uses shared memory to coordinate the various backend processes. Sharing the files is not the difficult bit, sharing info about who is doing what with which tuple/has which locks is. > In fact i'm asking why opensource db go only to the > replication solution than "true" clustering solution using SAN > like Oracle RAC ? I don't see the big advantage of the replication > method (of course, for application which need more than few nodes > hosting "small" db). The price could be an answer but what about a db of > several hundred giga byte ? When a node is added it must contain this > capacity nearly for "nothing" (i thinks its more and more a waste when > number of node grow ...). The advantage of replication is that you can just use standard server boxes. There is a company offering specialised high-bandwidth hardware that is supposed to offer PG clustering, but I'm afraid I know no more about it. -- Richard Huxton Archonet Ltd
On Tue, Jul 06, 2004 at 06:17:16PM +0200, Yannick Lecaillez wrote: > What need to do (understand, to devel) to allow several > postgres instance running from several server to access to the > same data (no replication at all) hosted on a SAN ? Clustered shared memory, cluster-wide spinlocks. And with decent performance, while at it ... > Is this a planed feature ? No, I don't think so. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "Es filósofo el que disfruta con los enigmas" (G. Coli)
Yannick Lecaillez wrote: > What need to do (understand, to devel) to allow several > postgres instance running from several server to access to the > same data (no replication at all) hosted on a SAN ? This is impossible. You can use a SAN if only one node is active at a time, and that is indeed a good high-availability solution that is usually easier to set up and maintain than replication, but you can't have several instances running on the same data at once.
Le mar 06/07/2004 à 19:07, Alvaro Herrera a écrit : > On Tue, Jul 06, 2004 at 06:17:16PM +0200, Yannick Lecaillez wrote: > > > What need to do (understand, to devel) to allow several > > postgres instance running from several server to access to the > > same data (no replication at all) hosted on a SAN ? > > Clustered shared memory, cluster-wide spinlocks. And with decent > performance, while at it ... Perhaps could be interesting to look at cluster file system which seems to have same problems and find solution about locking (i.eOpenGFS). http://opengfs.sourceforge.net/showdoc.php?docpath=cvsmirror/opengfs/docs/ogfs-locking&doctitle=Locking&docauthor=ben.m.cahill(at)intel.com Found on google a clustered shared memory (openMosix project) http://www.unixreview.com/documents/s=8989/ur0404l/ I would have the pgsql-hackers genius for do that :) . I think its the only feature which force company to buy 50000$ Oracle licence ... Sincerely, Yannick.
On Tue, 2004-07-06 at 18:22, Yannick Lecaillez wrote: > Le mar 06/07/2004 à 19:07, Alvaro Herrera a écrit : > > On Tue, Jul 06, 2004 at 06:17:16PM +0200, Yannick Lecaillez wrote: > > > > > What need to do (understand, to devel) to allow several > > > postgres instance running from several server to access to the > > > same data (no replication at all) hosted on a SAN ? > > > > Clustered shared memory, cluster-wide spinlocks. And with decent > > performance, while at it ... > Perhaps could be interesting to look at cluster file system which > seems to have same problems and find solution about locking (i.e > OpenGFS). > http://opengfs.sourceforge.net/showdoc.php?docpath=cvsmirror/opengfs/docs/ogfs-locking&doctitle=Locking&docauthor=ben.m.cahill(at)intel.com > > Found on google a clustered shared memory (openMosix project) > http://www.unixreview.com/documents/s=8989/ur0404l/ > > I would have the pgsql-hackers genius for do that :) . I think its the > only feature which force company to buy 50000$ Oracle licence ... > I would note that Oracle first released OPS on UNIX at 7.0.13, in 1993. Major performance issues were not resolved until 9i emerged, almost 10 years later... It won't take PostgreSQL 10 years, but its impossible now, as Peter observes. Best regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes: > On Tue, 2004-07-06 at 18:22, Yannick Lecaillez wrote: > > > I would have the pgsql-hackers genius for do that :) . I think its the > > only feature which force company to buy 50000$ Oracle licence ... Fwiw, I think you've underestimated the price on those Oracle licenses by an order of magnitude at least. If there are as many companies willing to pony up for some postgres developers I'm sure there would be people interested, but it's not the kind of project someone's going to be doing in their spare time. As Oracle found, it's *hard*. And moreover, it results in a system that's hard to use. Those companies that need are also ponying up much more than $50k/year just for the DBAs capable of running such beasts. Free Software runs on a very different operating model than commercial software. Instead of a sharp division between paying clients and profiting developers, most Free Software exists because the programmers themselves found they had a need and solved it for themselves. For that reason I would be skeptical about seeing huge clustered postgres systems a la Oracle OPS, simply because it's a very specialized need, and not one that any postgres developer is likely to run into on his own. They're more likely to run screaming when asked to provide such a monster than sit down and start coding... What most people need is some way to promise rapid recovery from failures. In my personal opinion the smoothest most reliable method of providing that is a PITR-based warm standby machine. I'm overjoyed that someone else saw the same need and has been working feverishly on that for 7.5. There does seem to be an awful lot of people on this list lobbying for some feature or another. It always seems a bit weird, like a basic misunderstanding is at play. The developers are working for their employers or for themselves. It doesn't really matter how many new users the Windows port will bring on, for example. This isn't some proselytising religion. It'll get done if a developer needs it either for him- or herself or for a client, not because you made some convincing argument about how there are lots of other people who would benefit. -- greg
Thanks a lot for all people which answer. I have this "clustering on SAN" problem today and i think it could be less harder to implement this today than it was for Oracle in 1993 (since i can find a lot of work in opensource which could be interesting in this project : distributed lock, clustered shared memory, etc ...). I will try to have a look about all these things btw, i haven't got the pretention to suceed to do anything ... Just look and see a bit more deeply how it could be hard. Is there something like a "big picture" about postgres internal ? What path is used from the SQL query to the hard-disk files ? About the internal locking mechanism ? about the usage of the shared memory ? I see there is everything described in the postgres doc but no "big picture" which could help me to understand faster ... :( I'm using postgres for several year and i have nerver problem with it i'm really happy with these trigger and inheritance mechanism which help me more than one time and i would present my respect to those guy which write this so good piece of software. I hope to see you later with something to show ... Sincerly, Yannick.
Yannick Lecaillez wrote: > Thanks a lot for all people which answer. > > I have this "clustering on SAN" problem today and i think it could be > less harder to implement this today than it was for Oracle in 1993 > (since i can find a lot of work in opensource which could be interesting > in this project : distributed lock, clustered shared memory, etc ...). > > I will try to have a look about all these things btw, i haven't got > the pretention to suceed to do anything ... Just look and see > a bit more deeply how it could be hard. > > Is there something like a "big picture" about postgres internal ? > What path is used from the SQL query to the hard-disk files ? About > the internal locking mechanism ? about the usage of the shared memory ? > I see there is everything described in the postgres doc but no "big > picture" which could help me to understand faster ... :( Have you read the developers FAQ on the developers web page? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Wed, 2004-07-07 at 06:39, Yannick Lecaillez wrote: > I have this "clustering on SAN" problem today and <snip> Me thinks you've fallen into the trap of proprietary vendors. Your problem isn't that you need "clustering on SAN", your problem is you want some form of high availability solution for your database. You *think* "clustering on San" is the best solution for this, but others have come to differing opinions on this which is why we have things like slony and pgpool (or clusgres for that matter). Not saying your wrong, just saying that adding a feature shouldn't be your end goal, solving a problem should be. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
On Wed, Jul 07, 2004 at 12:39:34PM +0200, Yannick Lecaillez wrote: > Thanks a lot for all people which answer. > > I have this "clustering on SAN" problem today and i think it could be > less harder to implement this today than it was for Oracle in 1993 You want to do clustering for failover/reliability reasons, for performance/scalability reasons, or for both? For some stuff to read, see the dozen or so links I posted here: http://openacs.org/forums/message-view?message_id=128060 E.g., the Lustre cluster file system claims full POSIX file system semantics (locking, etc.), so you should certainly be able to run PostgreSQL on it. No idea how well that works, but it should certainly let you do fail over. Perhaps you could even somehow, eventually, get multiple PostgreSQL instances on different machines to all cooperate with read/write access to the same database files over the network. (And without using super-expensive SCI hardware the way Clusgres does.) That might get you a true cluster RDBMS, if it worked well. Another thing I've been wondering about, but haven't been able to find any discussion of: Just how closely tied is PostgreSQL to its use of shared memory? At least in principle, could all direct use of shared memory be replaced with MPI? Could that even work at all? And in the degenerate case where the MPI implementation is itself actually using shared memory underneath (rather than sending messages across the network), would performance still suck? In other words, if MPI is unsuitable for PostgreSQL, is it because the SEMANTICS of MPI are inherently unsuitable, or is it just implementation or performance issues? What about PostgreSQL specifically makes message passing no good, and is the same also true for ALL RDBMSs? What about systems like Backplane, which claims to be "the only open-source, transactional, truly distributed database."? http://www.backplane.com/ -- Andrew Piskorski <atp@piskorski.com> http://www.piskorski.com/
Le jeu 08/07/2004 à 14:22, Andrew Piskorski a écrit : > You want to do clustering for failover/reliability reasons, for > performance/scalability reasons, or for both? for all that of course :) > For some stuff to read, see the dozen or so links I posted here: > > http://openacs.org/forums/message-view?message_id=128060 I already see all these one, but thanks :) > E.g., the Lustre cluster file system claims full POSIX file system > semantics (locking, etc.), so you should certainly be able to run > PostgreSQL on it. No idea how well that works, but it should > certainly let you do fail over. As a standard filesystem on a SAN configured for take care about only one node can mount the fs at the same time (using stonith for example). > Perhaps you could even somehow, eventually, get multiple PostgreSQL > instances on different machines to all cooperate with read/write > access to the same database files over the network. (And without > using super-expensive SCI hardware the way Clusgres does.) That might > get you a true cluster RDBMS, if it worked well. It's exactly what i want. > Just how closely tied is PostgreSQL to its use of shared memory? I see very interesting article about openMosix which support clustered shared memory and distributed locking and processus migration (at OS level) : http://howto.ipng.be/MigSHM-openMosix/x90.html Seems the only things which forbid using Postgres with OpenMosix is the "Actually PostgreSQL uses shared memory but not the system semaphores for locking it. Thus, it does not satisfy migShm constraints and so it cannot benefit from migShm." migShm constraint are here: http://mcaserta.com/maask/assumptions.html > What about PostgreSQL specifically makes message passing no good, and > is the same also true for ALL RDBMSs? What about systems like > Backplane, which claims to be "the only open-source, transactional, > truly distributed database."? > > http://www.backplane.com/ Thanks for this links, since a time i thinked to found exactly what i'm searching ... Until i read that : LIMITATIONS * Only one datatype is implemented, 'varchar'. In otherwords, everything is a string. * We donot support triggers * We do not support UNIQUE * We only support AND clauses - no parenthesis or OR. Yet. .... http://www.backplane.com/docs.shtml?doc=2
Andrew Piskorski <atp@piskorski.com> writes: > Another thing I've been wondering about, but haven't been able to find > any discussion of: > Just how closely tied is PostgreSQL to its use of shared memory? Pretty damn closely. You would not be happy with the performance of anything that tried to insert a network communication layer into access to what we think of as shared memory. For a datapoint, check the list archives for discussions a few months ago about performance with multiple Xeons. We were seeing significant performance degradation simply because the communications architecture for multiple Xeon chips on one motherboard is badly designed :-( The particular issue we were able to document was cache-line swapping for spinlock variables, but AFAICS the issue would not go away even if we had a magic zero-overhead locking mechanism: the Xeons would still suck, because of contention for access to the shared variables that the spinlocks are protecting. OpenMosix is in the category of "does not work, and would be unusably slow if it did work" ... AFAIK any similar design would have the same problem. regards, tom lane
Tom Lane wrote: > Andrew Piskorski <atp@piskorski.com> writes: > >>Another thing I've been wondering about, but haven't been able to find >>any discussion of: >>Just how closely tied is PostgreSQL to its use of shared memory? > > > Pretty damn closely. You would not be happy with the performance of > anything that tried to insert a network communication layer into access > to what we think of as shared memory. > > For a datapoint, check the list archives for discussions a few months > ago about performance with multiple Xeons. We were seeing significant > performance degradation simply because the communications architecture > for multiple Xeon chips on one motherboard is badly designed :-( > The particular issue we were able to document was cache-line swapping > for spinlock variables, but AFAICS the issue would not go away even > if we had a magic zero-overhead locking mechanism: the Xeons would > still suck, because of contention for access to the shared variables > that the spinlocks are protecting. > > OpenMosix is in the category of "does not work, and would be unusably > slow if it did work" ... AFAIK any similar design would have the same > problem. However shall be nice if the postmaster is not selfish as is it now (two postmastera are not able to work on the same shared memory segment), projects like cashmere ( www.cs.rochester.edu/research/cashmere/ ) or this www.tu-chemnitz.de/informatik/HomePages/RA/projects/VIA_SCI/via_sci_hardware.html are able to run a single database mananged by a postmaster for each node in a distributed architecture. I seen these hardware working at CeBIT some years ago and it's possible to setup any kind of configuration: linear, triangular, cube, ipercube. Basically each node share part of the local RAM in order to create a big shared memory segment and the shared memory is managed "without kernel intervention". Regards Gaetano Mendola