Thread: RE: [GENERAL] Replication of databases (esp. postgres)
I was thinking about creating a trigger on every table that would write an
insert, update, or delete statement into a log file. Each replicated
database would pro-actively read these and post them.

-----Original Message-----
From: Thomas Antepoth [SMTP:t_antepoth@hamm.netsurf.de]
Sent: Monday, February 15, 1999 9:16 PM
To: pgsql-general@postgreSQL.org
Subject: Re: [GENERAL] Replication of databases (esp. postgres)

Dustin, thank you for your answer.

On Mon, 15 Feb 1999, dustin sallings wrote:

> // {Replication of r/w Databases in a two pass scheme w/ renumbering}
> Just one note. Renumbering isn't all that important. I did one
> that had a non-numeric unique ID for identity type columns.

I thought of a field named db_id. This field, combined with a counter that
is unique at least to the local database, should give a unique index to
each relation.

The rule to follow is: every record in a relation with the same db_id and
the same rec_id belongs together.

But think of this situation: DB 1 creates a relation.

    table1      table2
    record1     record1
                record2

Now this relation gets replicated to DB 2. After this is done, DB 2 adds a
record3 to table2 of this relation.

Question: which db_id should DB 2 use?

Its own? Then record3 wouldn't belong to this relation.

The id of DB 1? Then the data may no longer be unique to DB 1, because
DB 1 might enter the same values for record3 in a completely different
relation, and a collision is created.

So I dropped the idea of having a db_id field to identify the records of a
relation without having a master renumber the relations.

have a nice day!

t++
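For concreteness, here is a minimal sketch of the change-logging trigger
Michael describes above, in current PostgreSQL syntax. It is an
illustration under assumptions, not code from the thread: it logs into a
replication_log table rather than a flat file (a plain trigger cannot
write files directly), and the table and function names are made up;
table1 stands in for any replicated table.

    -- Illustrative only: a change-log table plus a row-level trigger that
    -- records which table was touched, which operation ran, and the row data.
    CREATE TABLE replication_log (
        log_id    serial PRIMARY KEY,
        tab_name  text NOT NULL,
        operation text NOT NULL,          -- 'INSERT', 'UPDATE' or 'DELETE'
        row_data  text,
        logged_at timestamp DEFAULT now()
    );

    CREATE OR REPLACE FUNCTION log_change() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'DELETE' THEN
            INSERT INTO replication_log (tab_name, operation, row_data)
            VALUES (TG_TABLE_NAME, TG_OP, OLD::text);
            RETURN OLD;
        END IF;
        INSERT INTO replication_log (tab_name, operation, row_data)
        VALUES (TG_TABLE_NAME, TG_OP, NEW::text);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER table1_log
        AFTER INSERT OR UPDATE OR DELETE ON table1
        FOR EACH ROW EXECUTE PROCEDURE log_change();

Each replica would then poll replication_log and replay the recorded
changes against its own copy.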
Michael,

On Wed, 17 Feb 1999, Michael Davis wrote:

> I was thinking about creating a trigger on every table that would write an
> insert, update, or delete statement into a log file. Each replicated
> database would pro-actively read these and post them.

Right. This is one of the prerequisites for avoiding a timestamp-oriented
replication mechanism, which is known to be vulnerable to the lost-update
phenomenon (e.g. "newnews" on news servers).

But even if you do that, you have to keep the relations between the tables
intact by creating a truly unique record id for each record in each
database.

As Dustin pointed out, he solved this by merging a unique database id with
a sequence into a unique primary key.

This solution is only a partial one, because of the case where database 2
inserts into a relation initially created by database 1.

If the newly inserted record gets the database id of database 2, it will
later not be recognized as belonging to the relation by a
"... where db_id=this_relations_host and recid=this_relations_sequence"
on every host.

If the newly inserted record gets the database id of database 1, it will
surely be recognized, but the sequence number of the record may collide
with a pre-existing record in database 1 when replicated.

Multiple updates to a record of the same relation on different,
non-connected databases are another problem not solved by this mechanism.

Every solution I can imagine relies on a master, which receives temporary
data on inserts, creates unique record ids, and re-exports the data to the
slaves.

A slave will create in its local database a record id which is known to be
"not part of all databases", e.g. all numbers below 100000 are temporary.

Then it exports this data to the master and deletes the temporary data
when the master has acknowledged the import.

The master creates unique record ids among all the temporary data and
re-exports the data to all its slaves.

But what to do, if there are several local masters?

t++
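To make the two ideas Thomas compares concrete, here is a small sketch of
the composite (db_id, rec_id) key and of the "temporary id range" scheme.
The column names and the 100000 boundary come from the discussion above;
everything else (table name, payload column) is illustrative.

    -- Composite key: every row carries the id of the database that created it
    -- plus a sequence value local to that database; the pair is the primary key.
    CREATE TABLE table2 (
        db_id   integer NOT NULL,   -- id of the originating database
        rec_id  integer NOT NULL,   -- sequence value local to that database
        payload text,
        PRIMARY KEY (db_id, rec_id)
    );

    -- Temporary range: a slave reserves ids "known to be not part of all
    -- databases" (here, everything below 100000) for rows that still await
    -- renumbering by the master.
    CREATE SEQUENCE temp_rec_id_seq MINVALUE 1 MAXVALUE 99999;

    INSERT INTO table2 (db_id, rec_id, payload)
    VALUES (2, nextval('temp_rec_id_seq'), 'awaiting renumbering by the master');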
Thomas Antepoth wrote:
>
> But what to do, if there are several local masters?

I may have interpreted the problem incorrectly, but here is a similar
problem I had...

Say I have 5 plants in a large company. Each of these plants has a variety
of machinery and tools which have detailed maintenance records. It doesn't
make sense to have one centralized database, so you have 5 separate ones,
each database with its own copy of the machines deployed at the plant.

Now comes the kicker. About 3-5 times a year there is a major migration of
equipment between the plants. Plant A no longer needs such-and-such a
tool, so the tool moves to plant B. The problem is, the history for that
piece of equipment is tied to several, sometimes a hundred or more,
different production processes and/or related machinery. Also, primary key
conflicts cause huge problems.

To solve this problem, I switched from a numeric key to an alphanumeric
key. I assigned a "birthplace" to each machine, i.e., where it was first
deployed. Each birthplace had its alpha key, e.g., plant 'A', 'B', etc.,
and the new object would be entered using the birthplace key followed by a
unique sequence or 'birth number' at that particular 'birth place'. The
result of the concatenation is a globally unique key that can move with
the equipment from place to place.

Each piece of equipment also has its current location. When you query the
piece of equipment in the local database and it is not at the current
location, you are not allowed to update that 'archive' copy of the data.
To make any changes, you must go over via a remote database link to the
database which 'owns' the equipment.

This seemed to solve the problem. Hope this helps.

Clark
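A sketch of the "birthplace key" scheme Clark describes, under assumed
table and column names; only the prefix-plus-birth-number idea itself is
from the message.

    -- One 'birth number' sequence per plant; the key is the plant letter
    -- concatenated with that number, so it is unique across all plants.
    CREATE SEQUENCE plant_a_birth_seq;

    CREATE TABLE equipment (
        equip_id     text PRIMARY KEY,    -- e.g. 'A' || birth number -> 'A1', 'A2', ...
        current_site char(1) NOT NULL,    -- site that currently 'owns' the row
        description  text
    );

    -- A machine first deployed at plant A:
    INSERT INTO equipment (equip_id, current_site, description)
    VALUES ('A' || nextval('plant_a_birth_seq'), 'A', 'milling machine, line 3');

    -- When it later moves to plant B, the key stays the same;
    -- only the ownership column changes:
    UPDATE equipment SET current_site = 'B' WHERE equip_id = 'A1';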
Thomas Antepoth wrote:
>
> Michael,
>
> As Dustin pointed out, he solved this by merging a unique database
> id with a sequence into a unique primary key.
<snip>
> If the newly inserted record gets the database id of database 2,
> it will later not be recognized as belonging to the relation by a
> "... where db_id=this_relations_host and recid=this_relations_sequence"
> on every host.

You concatenate them. They become a 'single value'. Do not treat them as
separate things; you merge them together to generate the uniqueness. You
_never_ separate them, i.e., you _never_ change the db_id.

> If the newly inserted record gets the database id of database 1,
> it will surely be recognized, but the sequence number of the
> record may collide with a pre-existing record in database 1
> when replicated.

Nope. This is never a problem. This is *exactly* what the concatenation
gives you.

> Multiple updates to a record of the same relation on different,
> non-connected databases are another problem not solved by
> this mechanism.

Correct. If the data moves... you need a pointer telling it where it moved
to. When something moves (a rare event in my case), you go back and update
all of the pointers in the 'local-cache' copy of the tuple.

> Every solution I can imagine relies on a master, which
> receives temporary data on inserts, creates unique record ids,
> and re-exports the data to the slaves.

Nope. You can only query the local cache. If you want to update it, you
need to 'transfer' the ownership of the object (tuple) from one database
to the other. _Then_ you can update the local copy.

> A slave will create in its local database a record id which is
> known to be "not part of all databases", e.g. all numbers below
> 100000 are temporary.

YIKES! I would not do this.... it won't scale, and besides that it's
messy! *evil grin*

> Then it exports this data to the master and deletes the temporary
> data when the master has acknowledged the import.

Why delete it? If you just edited it, chances are you may want to query
it. Sure... you might get 'old' data, but for relatively static data this
is not a problem. A replication service can go around propagating updates
during off-peak times.

> The master creates unique record ids among all the temporary data
> and re-exports the data to all its slaves.
>
> But what to do, if there are several local masters?

You have a 'chain of responsibility'; the Design Patterns book will help
with this one.

> t++

Best,

Clark Evans
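As an illustration of the "only the owning database may update" rule Clark
describes, here is an assumed trigger sketch on the equipment table from
the previous message. The myapp.local_site setting and the function name
are invented for the example; the thread itself does not prescribe how the
rule is enforced.

    -- Each database sets its own site id once per session, e.g.:
    --   SET myapp.local_site = 'A';
    -- The trigger then rejects updates to rows owned by another site.
    CREATE OR REPLACE FUNCTION reject_non_owner_update() RETURNS trigger AS $$
    BEGIN
        IF OLD.current_site <> current_setting('myapp.local_site') THEN
            RAISE EXCEPTION 'row % is owned by site %; update it there',
                OLD.equip_id, OLD.current_site;
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER equipment_owner_check
        BEFORE UPDATE ON equipment
        FOR EACH ROW EXECUTE PROCEDURE reject_non_owner_update();

Transferring ownership then amounts to updating current_site on the owning
database and letting replication carry that change to the other sites.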