Thread: Replication Ideas
Hi --

I have been thinking about the issues of multi-master replication and how
to do highly available, load-balanced clustering with PostgreSQL. Here is
my outline, and I am looking for comments on the limitations of how this
would work.

Several PostgreSQL servers would share a virtual IP address, and would
coordinate among themselves which one will act as "master" for the
purposes of a single transaction (though doing this per connection could
be easier). SELECT statements are handled exclusively by the transaction
master, while anything that writes to a database would be sent to all of
the "masters." At the end of each transaction the systems would poll each
other as to whether they were all successful:

1: Any system which is successful in COMMITting the transaction must
ignore any system which fails the transaction until a recovery can be
made.

2: Any system which fails in COMMITting the transaction must cease to be
a master, provided that it receives a signal from any other member of the
cluster indicating that that member succeeded in committing the
transaction.

3: If all nodes fail to commit, then they all remain masters.

Recovery would be done in several steps:

1: The database would be copied to the failed system using pg_dump.
2: A current recovery would be done from the transaction log.
3: This would be repeated in order to ensure that the database is up to
date.
4: When two successive restores have been achieved with no new additions
to the database, the "All Recovered" signal is sent to the cluster and
the node is ready to start processing again. (I need a better way of
doing this.)

Note: Recovery is the problem, I know. My model is only a starting point
for the purposes of discussion, an attempt to bring something to the
conversation. Any thoughts or suggestions?

Best Wishes,
Chris Travers
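To make the end-of-transaction poll concrete, here is a minimal sketch of
rules 1-3 above in Python. The Node class and resolve_commit_poll() helper
are invented purely to illustrate the proposal; this is not PostgreSQL
code, and a real implementation would also need timeouts and the recovery
path described above.

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    is_master: bool = True


def resolve_commit_poll(nodes, committed):
    """Apply rules 1-3 to the outcome of one distributed COMMIT.

    `committed` maps node name -> True if that node committed the
    transaction successfully, False if it failed.
    """
    # Rule 3: if every node failed to commit, they all remain masters.
    if not any(committed.values()):
        return

    for node in nodes:
        if committed[node.name]:
            # Rule 1: a successful node simply ignores the failed ones
            # until they have been recovered (nothing to change here).
            continue
        # Rule 2: a node that failed the COMMIT, while another member
        # reports success, ceases to be a master until it recovers.
        node.is_master = False


cluster = [Node("a"), Node("b"), Node("c")]
resolve_commit_poll(cluster, {"a": True, "b": True, "c": False})
print([(n.name, n.is_master) for n in cluster])
# -> [('a', True), ('b', True), ('c', False)]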
On Sat, 2003-08-23 at 23:27, Chris Travers wrote:
> I have been thinking about the issues of multi-master replication and
> how to do highly available, load-balanced clustering with PostgreSQL.
> Here is my outline, and I am looking for comments on the limitations
> of how this would work.
> [rest of the proposal snipped]

This is vaguely similar to Two Phase Commit, which is a sine qua
non of distributed transactions, which is the s.q.n. of multi-master
replication.

-- 
-----------------------------------------------------------------
Ron Johnson, Jr.     ron.l.johnson@cox.net
Jefferson, LA USA

"Eternal vigilance is the price of liberty: power is ever stealing from
the many to the few. The manna of popular liberty must be gathered each
day, or it is rotten... The hand entrusted with power becomes, either
from human depravity or esprit de corps, the necessary enemy of the
people. Only by continual oversight can the democrat in office be
prevented from hardening into a despot: only by unintermitted agitation
can a people be kept sufficiently awake to principle not to let liberty
be smothered in material prosperity... Never look for an age when the
people can be quiet and safe. At such times despotism, like a shrouding
mist, steals over the mirror of Freedom."
Wendell Phillips
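For readers who have not met the protocol Ron refers to, here is a
minimal two-phase-commit coordinator sketch. The Participant interface
and all names are hypothetical; real systems also have to handle
timeouts, logging, and coordinator failure.

from typing import Protocol


class Participant(Protocol):
    def prepare(self, txn_id: str) -> bool: ...   # phase 1: "can you commit?"
    def commit(self, txn_id: str) -> None: ...    # phase 2, on unanimous yes
    def rollback(self, txn_id: str) -> None: ...  # phase 2, otherwise


def two_phase_commit(txn_id, participants):
    # Phase 1: every participant must vote "yes" (i.e. be prepared).
    votes = [p.prepare(txn_id) for p in participants]
    if all(votes):
        # Phase 2: unanimous yes, so tell everyone to commit.
        for p in participants:
            p.commit(txn_id)
        return True
    # At least one "no" (or failure to answer): roll everyone back.
    for p in participants:
        p.rollback(txn_id)
    return False

The weakness Chris raises in the next message shows up between the two
phases: a participant that has voted yes and then loses contact with the
coordinator is stuck holding its locks until someone tells it the
outcome.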
Ron Johnson wrote:

> This is vaguely similar to Two Phase Commit, which is a sine qua
> non of distributed transactions, which is the s.q.n. of multi-master
> replication.

I may be wrong, but if I recall correctly, one of the problems with a
standard 2-phase commit is that if one server goes down, the other
masters cannot commit their transactions. This would make a clustered
database server have a downtime equivalent to the total downtime of all
of its nodes. This is a real problem. Of course my understanding of Two
Phase Commit may be incorrect, in which case I would appreciate it if
someone could point out where I am wrong.

It had occurred to me that the issue is one of failure handling more
than one of concept. I.e. the problem is how one node's failure is
handled rather than the fundamental structure of Two Phase Commit. If a
single node fails, we don't want that to take down the whole cluster,
and I have actually revised my logic a bit more (to make it even safer).
In this I assume that:

1: General failures on any one node are rare.
2: A failure is more likely to prevent a transaction from being
committed than to allow one to be committed.

This hot-failover solution requires transparency from the client's
perspective -- i.e. the client should not have to choose a different
server should one go down, and should not need to know when a server
comes back up. This also means that we need to assume that a load
balancing solution can be part of the clustering solution. I would
assume that this would require a shared IP address for the public
interface of the cluster and a private communications channel where each
node has a separate IP address (similar to Microsoft's implementation of
Network Load Balancing). Also, different transactions within a single
connection should be able to be handled by different nodes, so if one
node goes down, users don't have to reconnect.

So here is my suggested logic for high-availability/load-balanced
clustering:

1: All nodes recognize each user connection and delegate transactions
rather than connections.

2: At the beginning of a transaction, the nodes decide who will take it.
Any operation which does not change the information or schema of the
database is handled exclusively on that node. Other operations are
distributed across nodes.

3: When the transaction is committed, the nodes "vote" on whether the
commit is valid. Majority rules, and the minority must remove themselves
from the cluster until they can synchronize their databases with the
existing masters. If the vote is split 50/50 (i.e. one node fails in a
2-node cluster), success is considered more likely to be valid than
failure, and the node(s) which failed to commit the transaction must
remove themselves from the cluster until they can recover.

Best Wishes,
Chris Travers
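A small sketch of the vote in step 3, with the 50/50 tie broken in favour
of the nodes that committed. The function name and the vote
representation are invented for illustration only.

def surviving_masters(votes):
    """Return the node names that stay masters after a commit vote.

    `votes` maps node name -> True if that node committed the
    transaction, False if it failed to commit.
    """
    committed = {name for name, ok in votes.items() if ok}
    failed = set(votes) - committed

    if not committed:
        return set(votes)          # nobody committed: everyone stays
    if len(committed) >= len(failed):
        return committed           # majority (or a tie) says the commit stands
    return failed                  # the committing nodes were the minority


# Two-node cluster where one node fails to commit: the tie goes to the
# node that succeeded, and the failed node must leave and resynchronize.
print(surviving_masters({"node_a": True, "node_b": False}))   # {'node_a'}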
On Mon, 2003-08-25 at 12:06, Chris Travers wrote:
> Ron Johnson wrote:
>
> > This is vaguely similar to Two Phase Commit, which is a sine qua
> > non of distributed transactions, which is the s.q.n. of multi-master
> > replication.
>
> I may be wrong, but if I recall correctly, one of the problems with a
> standard 2-phase commit is that if one server goes down, the other
> masters cannot commit their transactions. This would make a clustered
> database server have a downtime equivalent to the total downtime of
> all of its nodes. This is a real problem. Of course my understanding
> of Two Phase Commit may be incorrect, in which case I would appreciate
> it if someone could point out where I am wrong.

Note that I didn't mean to imply that 2PC is sufficient to implement
M-M. The DBMS designer(s) must decide what to do (like queue up changes)
if 2PC fails.

-- 
-----------------------------------------------------------------
Ron Johnson, Jr.     ron.l.johnson@cox.net
Jefferson, LA USA

"Our computers and their computers are the same color. The conversion
should be no problem!"
Unknown
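As a rough illustration of the "queue up changes" option Ron mentions,
the sketch below parks the change set for an unreachable node and replays
it later. The ChangeQueue class and the node.apply() call are
hypothetical, not part of any existing replication system.

from collections import deque


class ChangeQueue:
    """Store-and-forward queue for changes a failed 2PC could not deliver."""

    def __init__(self):
        self.pending = deque()

    def park(self, change_set):
        # Target node was down or voted no: keep the change for later.
        self.pending.append(change_set)

    def drain(self, node):
        """Replay parked changes, in order, once the node is reachable again."""
        while self.pending:
            node.apply(self.pending[0])
            self.pending.popleft()   # drop only after a successful apply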
On Mon, Aug 25, 2003 at 10:06:22AM -0700, Chris Travers wrote:
> Ron Johnson wrote:
>
> > This is vaguely similar to Two Phase Commit, which is a sine qua
> > non of distributed transactions, which is the s.q.n. of multi-master
> > replication.
>
> I may be wrong, but if I recall correctly, one of the problems with a
> standard 2-phase commit is that if one server goes down, the other
> masters cannot commit their transactions.

Before the discussion goes any further, have you read the work related
to Postgres-R? It's a substantially different animal from 2PC AFAIK.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Right now the sectors on the hard disk run clockwise, but I heard a
rumor that you can squeeze 0.2% more throughput by running them
counterclockwise. It's worth the effort. Recommended." (Gerry Pourwelle)
Alvaro Herrera wrote:

> Before the discussion goes any further, have you read the work related
> to Postgres-R? It's a substantially different animal from 2PC AFAIK.

Yes I have. Postgres-R is not a high-availability solution which is
capable of transparent failover, although it is a very useful project on
its own.

Best Wishes,
Chris Travers
Tom Lane wrote:

> Chris Travers <chris@travelamericas.com> writes:
> > Yes I have. Postgres-R is not a high-availability solution which is
> > capable of transparent failover,
>
> What makes you say that? My understanding is it's supposed to survive
> loss of individual servers.
>
> 			regards, tom lane

My mistake. I must have gotten them confused with another
(asynchronous) replication project.

Best Wishes,
Chris Travers
Chris Travers <chris@travelamericas.com> writes:
> Yes I have. Postgres-R is not a high-availability solution which is
> capable of transparent failover,

What makes you say that? My understanding is it's supposed to survive
loss of individual servers.

			regards, tom lane
WARNING: This is getting long ...

Postgres-R is a very interesting and inspiring idea. And I've been
kicking that concept around for a while now. What I don't like about it
is that it requires fundamental changes in the lock mechanism and that
it is based on the assumption of very low lock conflict.

<explain-PG-R>
In Postgres-R a committing transaction sends its write set (WS - a list
of all updates done in this transaction) to the group communication
system (GC). The GC guarantees total order, meaning that all nodes will
receive all WSs in the same order, no matter how they have been sent.

If a node receives back its own WS before any error occurred, it goes
ahead and finalizes the commit. If it receives a foreign WS, it has to
apply the whole WS and commit it before it can process anything else. If
a local transaction, in progress or waiting for its WS to come back,
holds a lock that is required to process such a remote WS, the local
transaction needs to be aborted to unlock its resources ... it lost the
total order race.
</explain-PG-R>

Postgres-R requires that all remote WSs are applied and committed before
a local transaction can commit. Otherwise it couldn't correctly detect a
lock conflict. So there will not be any read-ahead. And since the total
order really counts here, it cannot apply any two remote WSs in
parallel; a race condition could exist where a later WS in the total
order runs faster and locks up a previous one, so we have to squeeze all
remote WSs through one single replication work process. And all the
locally parallel executed transactions that wait for their WSs to come
back have to wait until that poor little worker is done with the whole
pile. Bye bye concurrency. And I don't know how the GC will deal with
the backlog either. It could well choke on it.

I do not see how this will scale well in a multi-SMP-system cluster. At
least the serialization of WSs will become a horror if there is
significant lock contention, like in a standard TPC-C on the district
row containing the order number counter. I don't know for sure, but I
suspect that with this kind of bottleneck, Postgres-R will have to roll
back more than 50% of its transactions when there are more than 4 nodes
under heavy load (like in a benchmark run). That will suck ...

But ... initially I said that it is an inspiring concept ... soooo ...

I am currently hacking around with some C+PL/TclU+Spread constructs that
might form a rude kind of prototype creature.

My changes to the Postgres-R concept are that there will be as many
replicating slave processes as there are, in total, masters out in the
cluster ... yes, it will try to utilize all the CPUs in the cluster! For
failover reliability, a committing transaction will hold before
finalizing the commit and send its "I'm ready" to the GC. Every
replicator that reaches the same state sends "I'm ready" too. Spread
guarantees in SAFE_MESS mode that messages are delivered to all nodes in
a group, or that at least LEAVE/DISCONNECT messages are delivered
before. So if a node receives more than 50% of the "I'm ready" messages,
there is only a very small gap where multiple nodes would have to fail
in the same split second for the majority of nodes NOT to commit. A node
that reported "I'm ready" but lost more than 50% of the cluster before
committing has to roll back and rejoin, or wait for operator
intervention.

Now the idea is to split up the communication into GC distribution
groups per transaction. So working master backends and associated
replication backends will join/leave a unique group for every
transaction in the cluster. This way, the per-process communication is
reduced to the required minimum.

As said, I am hacking on some code ...


Jan

Chris Travers wrote:
> My mistake. I must have gotten them confused with another
> (asynchronous) replication project.

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
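A toy sketch of the behaviour Jan describes inside <explain-PG-R>: a
single replication worker applies remote write sets strictly in the
delivered total order, and a local transaction holding a conflicting
lock loses the race and is aborted. All names here (WriteSet, the
gc_stream queue, abort_local) are invented; this is not Postgres-R or
prototype code.

from dataclasses import dataclass, field


@dataclass
class WriteSet:
    origin_node: str
    locks: set = field(default_factory=set)  # resources this WS must touch
    local: bool = False                       # True if it came from this node


def replication_worker(gc_stream, held_locks, abort_local):
    """Apply every delivered WS one at a time, in total order.

    gc_stream   -- queue.Queue of WriteSet, in the GC's total order
    held_locks  -- dict: resource -> id of the local transaction holding it
    abort_local -- callback that aborts a local transaction by id
    """
    while True:
        ws = gc_stream.get()
        if ws.local:
            # Our own WS came back first: the local transaction won the
            # total-order race and may finalize its commit.
            continue
        # Remote WS: any local transaction holding a conflicting lock
        # lost the race and must be aborted to free its resources.
        for resource in ws.locks:
            if resource in held_locks:
                abort_local(held_locks.pop(resource))
        # ... apply and commit the remote WS here before reading the next
        # message; this single worker is the serialization point Jan is
        # worried about.

Jan's scaling objection is visible in the loop itself: however many CPUs
the cluster has, every remote write set funnels through this one worker.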
Jan Wieck wrote:
> [long description of the Postgres-R concept and the prototype snipped]
As my British friends would say, "Bully for you", and I applaud you
playing, struggling, and learning from this for our sakes. Jeez, all I
think about is me, huh?
On Mon, 25 Aug 2003, Tom Lane wrote:

> Chris Travers <chris@travelamericas.com> writes:
> > Yes I have. Postgres-R is not a high-availability solution which is
> > capable of transparent failover,
>
> What makes you say that? My understanding is it's supposed to survive
> loss of individual servers.

How does it play 'catch up' when a server comes back online?

Note that I did go through the 'docs' on how it works, and am/was quite
impressed at what they were doing ... but, if I have a large network,
say, and one group is connecting to ServerA, and another group to
ServerB, what happens when ServerA and ServerB lose network connectivity
for any period of time? How do they re-sync when the network comes back
up again?
"Marc G. Fournier" <scrappy@hub.org> writes: > On Mon, 25 Aug 2003, Tom Lane wrote: >> What makes you say that? My understanding is it's supposed to survive >> loss of individual servers. > How does it play 'catch up' went a server comes back online? The recovered server has to run through the part of the GCS data stream that it missed the first time. This is not conceptually different from recovering using archived WAL logs (or archived trigger-driven replication data streams). As with using WAL for recovery, you have to be able to archive the message stream until you don't need it any more. regards, tom lane
On Tue, 2003-08-26 at 22:37, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > If you can detect if outside transactions conflict with your
> > transaction, you should be able to determine if the outside
> > transactions conflict with each other.
>
> Uh ... not necessarily. That amounts to assuming that every xact has
> complete knowledge of the actions of every other, which is an
> assumption I'd rather not make. Detecting that what you've done
> conflicts with someone else is one thing, detecting that party B has
> conflicted with party C is another league entirely.

Maybe some sort of Lock Manager? A process running on each node keeps a
tree structure of all locks, requested locks, what is (requested to be)
locked, and the type of lock. If you are running multi-master
replication, each LM keeps in sync with the others, thus creating a
Distributed Lock Manager. (This would also be the key to implementing
database clusters. Of course, the interface to the DLM would have to be
pretty deep within Postgres itself...)

Using a DLM, the postmaster on node_a would know that the postmaster on
node_b has just locked a certain set of tuples and index keys, and
(1) it will queue up its request to lock that data into that node's LM,
(2) which will propagate it to the other nodes;
(3) then, when the node_a postmaster executes its COMMIT WORK, the
node_b postmaster can obtain its desired locks.
(4) If the postmaster on node_[ac-z] needs to lock that same data, it
will similarly queue up to wait until the node_b postmaster executes its
COMMIT WORK.

Notes:
a) this is, of course, not *sufficient* for multi-master
b) yes, you need a fast, low-latency network for the DLM chatter.

This is a tried and true method of synchronization. DEC Rdb/VMS has been
using it for 19 years as the underpinnings of its cluster technology,
and Oracle licensed it from them (well, really Compaq) for its 9i RAC.

-- 
-----------------------------------------------------------------
Ron Johnson, Jr.     ron.l.johnson@cox.net
Jefferson, LA USA

"The UN couldn't break up a cookie fight in a Brownie meeting."
Larry Miller
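Here is a rough, single-process sketch of the lock queueing in steps
(1)-(4). The class is invented for illustration and leaves out lock
modes, deadlock detection, the propagation between nodes, and every
failure case a real DLM has to handle.

from collections import defaultdict, deque


class LockManagerSketch:
    """Toy lock table: one holder per resource, FIFO queue of waiters."""

    def __init__(self):
        self.holder = {}                   # resource -> node currently holding it
        self.waiters = defaultdict(deque)  # resource -> nodes queued behind it

    def request(self, node, resource):
        """Steps (1)/(2): a lock request every node sees in the same order."""
        if resource not in self.holder:
            self.holder[resource] = node   # granted immediately
            return True
        self.waiters[resource].append(node)
        return False                       # caller waits for the grant

    def commit(self, node):
        """Steps (3)/(4): on COMMIT WORK, pass each held lock to the next waiter."""
        granted = []
        for resource, holder in list(self.holder.items()):
            if holder != node:
                continue
            if self.waiters[resource]:
                next_node = self.waiters[resource].popleft()
                self.holder[resource] = next_node
                granted.append((next_node, resource))
            else:
                del self.holder[resource]
        return granted


dlm = LockManagerSketch()
dlm.request("node_b", "tuple:42")    # node_b gets the lock
dlm.request("node_a", "tuple:42")    # node_a queues behind it
print(dlm.commit("node_b"))          # [('node_a', 'tuple:42')]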
On 26 Aug 2003 at 3:01, Marc G. Fournier wrote:

> On Mon, 25 Aug 2003, Tom Lane wrote:
> > What makes you say that? My understanding is it's supposed to survive
> > loss of individual servers.
>
> How does it play 'catch up' when a server comes back online?

<dumb idea>
PITR + an archive-logs daemon? The chances of a node and an archive-log
daemon going down simultaneously are pretty low. If the archive-log
daemon runs on another machine, the MTBF should be pretty acceptable.
</dumb idea>

Bye
Shridhar

--
The Briggs-Chase Law of Program Development: To determine how long it
will take to write and debug a program, take your best estimate,
multiply that by two, add one, and convert to the next higher units.
Ron Johnson wrote:

> Notes:
> a) this is, of course, not *sufficient* for multi-master
> b) yes, you need a fast, low-latency network for the DLM chatter.

"Fast" is an understatement. The DLM you're talking about would (in our
case) need to use Spread's AGREED_MESS or SAFE_MESS service type,
meaning a guarantee of total order. A transaction that needs any type of
lock sends that request into the DLM group and then waits. The incoming
stream of lock messages determines success or failure. With the overhead
of these service types, I don't think one single communication group for
all database backends in the whole cluster, guaranteeing total order,
will be that efficient.

> This is a tried and true method of synchronization. DEC Rdb/VMS
> has been using it for 19 years as the underpinnings of its cluster
> technology, and Oracle licensed it from them (well, really Compaq)
> for its 9i RAC.

Are you sure they're using it that way?


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
On Thu, 2003-08-28 at 16:00, Jan Wieck wrote:
> Ron Johnson wrote:
> > Notes:
> > a) this is, of course, not *sufficient* for multi-master
> > b) yes, you need a fast, low-latency network for the DLM chatter.
>
> "Fast" is an understatement. The DLM you're talking about would (in
> our case) need to use Spread's AGREED_MESS or SAFE_MESS service type,
> meaning a guarantee of total order. A transaction that needs any type
> of lock sends that request into the DLM group and then waits. The
> incoming stream of lock messages determines success or failure. With
> the overhead of these service types, I don't think one single
> communication group for all database backends in the whole cluster,
> guaranteeing total order, will be that efficient.

I guess it's the differing protocols involved. DEC made clustering
(including Rdb/VMS) work over an 80Mbps protocol, back in The Day, and
HPaq says that it works fine now over fast ethernet.

> > This is a tried and true method of synchronization. DEC Rdb/VMS
> > has been using it for 19 years as the underpinnings of its cluster
> > technology, and Oracle licensed it from them (well, really Compaq)
> > for its 9i RAC.
>
> Are you sure they're using it that way?

Not as sure as I am that the sun will rise in the east tomorrow, but,
yes, I am highly confident that O modified the DLM for use in 9i RAC.
Note that O purchased Rdb/VMS from DEC back in 1994, along with the
engineers, so they have long knowledge of how it works in VMS. One of
the reasons they bought Rdb was to merge the technology into their
RDBMS.

-- 
-----------------------------------------------------------------
Ron Johnson, Jr.     ron.l.johnson@cox.net
Jefferson, LA USA

"they love our milk and honey, but preach about another way of living"
Merle Haggard, "The Fighting Side Of Me"
Are these clusters physically together using dedicated LAN lines, or are
they synchronizing over the Internet?

Ron Johnson wrote:

> I guess it's the differing protocols involved. DEC made clustering
> (including Rdb/VMS) work over an 80Mbps protocol, back in The Day, and
> HPaq says that it works fine now over fast ethernet.
> [rest snipped]
On Thu, 2003-08-28 at 17:52, Dennis Gearon wrote:
> Are these clusters physically together using dedicated LAN lines, or
> are they synchronizing over the Internet?

There have been multiple methods over the years. In order:

1. Cluster Interconnect (CI): There's a big box, called the CI, that in
   the early days was really a stripped PDP-11 running an RTOS. Each VAX
   (and, later, Alpha) is connected to the CI via special adapters and
   cables. Disks are connected to "HSC" Storage Controllers, which also
   plug into the CI. Basically, it's a big, intelligent switch. Disk
   sectors pass along the wires from VAX and Alpha to disks and back.
   DLM messages pass along the wires from node to node. With multiple
   CI adapters and HSCs (they were dual-ported) you could set up total
   dual-redundancy. Up to 96 nodes can be clustered. It still works,
   but Memory Channel is preferred now.

2. LAVC - Local Area VAX Cluster: In this scheme, disks are directly
   attached to nodes, and data (disk and DLM) is transferred back and
   forth across the 10Mbps Ethernet. It could travel over TCP/IP or
   DECnet. For obvious reasons, LAVC was a lot cheaper and slower than
   CI.

3. SCSI clusters: SCSI disks are wired to a dual-ported "HSZ" Storage
   Controller. Then, SCSI cards on each of 2 nodes can be wired into a
   port. The SCSI disks can also be wired to a 2nd HSZ, and a 2nd SCSI
   card in each node plugged into that HSZ, so dual-redundancy is
   achieved. With modern versions of VMS, the SCSI drivers can choose
   which SCSI card to send data through, to increase performance. DLM
   messages are passed via TCP/IP. Only 2 nodes can be clustered. A
   related method uses fiber channel disks on "HSG" Storage Controllers.

4. Memory Channel: A higher-speed interconnect. Don't know much about
   it. 128 nodes can be clustered.

Note that since DLM awareness is built deep into VMS and all the RTLs,
every program is cluster-aware, no matter what type of cluster method is
used.

> [earlier discussion of the DLM and Rdb/VMS snipped]

-- 
-----------------------------------------------------------------
Ron Johnson, Jr.     ron.l.johnson@cox.net
Jefferson, LA USA

"Oh, great altar of passive entertainment, bestow upon me thy
discordant images at such speed as to render linear thought impossible"
Calvin, regarding TV