Thread: RE: [GENERAL] Replication of databases (esp. postgres)

RE: [GENERAL] Replication of databases (esp. postgres)

From: Michael Davis
I was thinking about creating a trigger on every table that would write an
insert, update, or delete statement into a log file.  Each replicated
database would proactively read these and post them.
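
Roughly, such a trigger might look like this in PL/pgSQL (the log table,
function and trigger names here are just illustrative, and the changes are
collected in a log table rather than a flat file):

    -- Hypothetical log table; the replicated databases would poll it
    -- and apply the recorded changes.
    CREATE TABLE replication_log (
        log_id     serial PRIMARY KEY,
        logged_at  timestamp DEFAULT now(),
        table_name text,
        operation  text,    -- 'INSERT', 'UPDATE' or 'DELETE'
        row_data   text
    );

    CREATE FUNCTION log_change() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'DELETE' THEN
            INSERT INTO replication_log (table_name, operation, row_data)
            VALUES (TG_TABLE_NAME, TG_OP, OLD::text);
            RETURN OLD;
        ELSE
            INSERT INTO replication_log (table_name, operation, row_data)
            VALUES (TG_TABLE_NAME, TG_OP, NEW::text);
            RETURN NEW;
        END IF;
    END;
    $$ LANGUAGE plpgsql;

    -- One such trigger per replicated table, e.g.:
    CREATE TRIGGER table1_log
        AFTER INSERT OR UPDATE OR DELETE ON table1
        FOR EACH ROW EXECUTE PROCEDURE log_change();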

    -----Original Message-----
    From:    Thomas Antepoth [SMTP:t_antepoth@hamm.netsurf.de]
    Sent:    Monday, February 15, 1999 9:16 PM
    To:    pgsql-general@postgreSQL.org
    Subject:    Re: [GENERAL] Replication of databases (esp. postgres)

    Dustin,


    thank you for your answer.

    On Mon, 15 Feb 1999, dustin sallings wrote:

    > // {Replication of r/w Databases in a two pass scheme w/ renumbering}
    >     Just one note.  Renumbering isn't all that important.  I did one
    > that had a non-numeric unique ID for identity type columns.

    I thought of a field named db_id. This field combined with
    a unique (at least to the local database) counter should give a
    unique index to each relation.


    The rule to follow is: Every record in a relation with the
    same db_id and the same rec_id belongs together.


    But think of this situation:

    DB 1 gets a relation.

    table1        table2
    record1       record1
                  record2

    Now this relation gets replicated to DB 2.

    After this is done, DB 2 adds a record3 to
    table2 of this relation.

    Question: Which db_id should DB 2 use?

    Its own? That way record3 wouldn't belong
    to this relation.

    The id of DB 1? Then the data may
    not be unique to DB 1, as DB 1 might enter
    the same values for a record3 of a completely
    different relation, and a collision is created.

    So I dropped the idea of having a db_id field
    to identify the records of a relation without
    renumbering the relations by a master.


    have a nice day! t++


RE: [GENERAL] Replication of databases (esp. postgres)

From: Thomas Antepoth
Michael,

On Wed, 17 Feb 1999, Michael Davis wrote:

> I was thinking about creating a trigger on every table that would write an
> insert, update, or delete statement into a log file.  Each replicated
> database would proactively read these and post them.

Right. This is one of the prerequisites for avoiding a timestamp-oriented
replication mechanism, which is known to be vulnerable to the lost-update
phenomenon (e.g. "newnews" on news servers).

But even if you do that, you have to keep the relations between the
tables intact by creating a truly unique record id for each record
in each database.

As Dustin pointed out, he solved it by merging a unique database
id with a sequence into a unique primary key.
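
In SQL terms, that composite key looks roughly like this (table and
column names are only illustrative):

    CREATE TABLE table2 (
        db_id   integer NOT NULL,  -- id of the database that created the row
        rec_id  integer NOT NULL,  -- sequence value, unique within that database
        payload text,              -- illustrative data column
        PRIMARY KEY (db_id, rec_id)
    );

    -- the membership test done on every host:
    SELECT * FROM table2
     WHERE db_id  = 1      -- this_relations_host
       AND rec_id = 42;    -- this_relations_sequence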

This solution is only a partial one, because of inserts from
database 2 into a relation initially created by database 1.

If the newly inserted record gets the database id of database 2,
it will later not be recognized as belonging to the relation by a
"... where db_id=this_relations_host and recid=this_relations_sequence"
on every host.

If the newly inserted record gets the database id of database 1,
it will surely be recognized, but the sequence number of the
record may collide with a pre-existing record in database 1
when replicated.

Multiple updates to a record of the same relation on different,
non-connected databases are another problem not solved by
this mechanism.

Every solution which I can imagine relies on a master, which
receives temporary data on inserts, creates unique record ids
and re-exports the data to the slaves.

A slave will create in its local database a record id which is
known to be "not part of all databases", e.g. all numbers below
100000 are temporary.

Then it exports this data to the master and deletes the temporary
data once the master has acknowledged the import.

The master creates unique record ids among all temporary data
and re-exports the data to all its slaves.
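
As a sketch of that scheme (the cut-off of 100000 and all names are
only illustrative):

    -- Slave: locally generated ids stay inside the temporary range.
    CREATE SEQUENCE temp_ids MAXVALUE 99999;

    CREATE TABLE orders (
        rec_id  integer PRIMARY KEY,
        payload text
    );

    INSERT INTO orders VALUES (nextval('temp_ids'), 'entered while offline');

    -- Master: after importing the slave's rows, replace the temporary ids
    -- with final ones drawn from a sequence above the cut-off, then
    -- re-export the rows to all slaves.
    CREATE SEQUENCE final_ids START 100000;
    UPDATE orders SET rec_id = nextval('final_ids') WHERE rec_id < 100000;

    -- Slave: once the master has acknowledged the import, drop the
    -- temporary rows and wait for the re-export with the final ids.
    DELETE FROM orders WHERE rec_id < 100000;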

But what to do, if there are several local masters?

t++



Re: [GENERAL] Replication of databases (esp. postgres)

From: Clark Evans
Thomas Antepoth wrote:
>
> But what to do, if there are several local masters?

I may have interpreted the problem incorrectly, but here is
a similar problem I had...

Say I have 5 plants in a large company.  Each of these
plants has a variety of machinery and tools which have
detailed maintenance records.

It doesn't make sense to have one centralized database,
so you have 5 separate ones, each database with its own
copy of the machines deployed at the plant.

Now comes the kicker.  About 3-5 times a year there is
a major migration of equipment between the plants.  Plant A
no longer needs such-and-such a tool, so the tool moves to
plant B.  The problem is, the history for that piece of
equipment is tied to several, sometimes a hundred or
more, different production processes and/or related machinery.
Also, primary key conflicts cause huge problems.

To solve this problem, I switched from a numeric key to
an alphanumeric key.  I assigned a "birthplace" to
each machine, i.e., where it was first deployed.  Each
birthplace had its alpha key, e.g., plant 'A', 'B', etc.,
and the new object would be entered using the birthplace
key followed by a unique sequence or 'birth number' at that
particular 'birth place'.  The result of the concatenation is
a globally unique key that can move with the equipment
from place to place.
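
A sketch of how such a key might be built at insert time (the plant
code, sequence and table names are just illustrative):

    -- One sequence per plant; the plant letter is the 'birthplace'.
    CREATE SEQUENCE plant_a_births;

    CREATE TABLE equipment (
        equipment_id     text PRIMARY KEY,  -- e.g. 'A-000042'
        description      text,
        current_location char(1)            -- plant where the machine lives now
    );

    -- A machine first deployed at plant A:
    INSERT INTO equipment
    VALUES ('A-' || lpad(nextval('plant_a_births')::text, 6, '0'),
            'hydraulic press', 'A');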

Each piece of equipment also has its current location.
If you query a piece of equipment in the local database
and the local database is not its current location, then
you are not allowed to update that 'archive' copy of the data.
To make any changes, you must go over a remote database
link to the database which 'owns' the equipment.
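
That rule could be enforced with something like the trigger below
(assuming each local database knows its own plant code, here hard-coded
as 'B' purely for illustration):

    CREATE FUNCTION refuse_foreign_updates() RETURNS trigger AS $$
    BEGIN
        -- Rows whose current location is another plant are only archive
        -- copies here; changes must be made at the owning database.
        IF OLD.current_location <> 'B' THEN
            RAISE EXCEPTION '% is owned by plant %, update it there',
                            OLD.equipment_id, OLD.current_location;
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER equipment_ownership
        BEFORE UPDATE ON equipment
        FOR EACH ROW EXECUTE PROCEDURE refuse_foreign_updates();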

This seemed to solve the problem.

Hope this helps.

Clark

Re: [GENERAL] Replication of databases (esp. postgres)

From: Clark Evans
Thomas Antepoth wrote:
>
> Michael,
>
> As Dustin pointed out, he solved it by merging a unique database
> id with a sequence into a unique primary key.

<snip>

> If the newly inserted record gets the database id of database 2,
> it will later not be recognized as belonging to the relation by a
> "... where db_id=this_relations_host and recid=this_relations_sequence"
> on every host.

You concatenate them.  They become a 'single value'.

Do not treat them as separate things; you merge them together
to generate the uniqueness.  You _never_ separate them, i.e.,
you _never_ change the db_id.

> If the newly inserted record gets the database id of database 1,
> it will surely be recognized, but the sequence number of the
> record may collide with a pre-existing record in database 1
> when replicated.

Nope.  This is never a problem.  This is *exactly* what
the concatenation gives you.

> Multiple updates to a record of the same relation on different,
> non-connected databases are another problem not solved by
> this mechanism.

Correct.  If the data moves... you need a pointer
telling it where it moved to.  When something moves
(a rare event in my case), you go back and update all of
the pointers in the 'local-cache' copy of the tuple.

> Every solution which I can imagine relies on a master, which
> receives temporary data on inserts, creates unique record ids
> and re-exports the data to the slaves.

Nope.  You can only query the local cache.  If you want to
update it, you need to 'transfer' the ownership of the object
(tuple) from one database to the other.  _Then_ you can
update the local copy.

> A slave will create in its local database a record id which is
> known to be "not part of all databases", e.g. all numbers below
> 100000 are temporary.

YIKES!  I would not do this.... it won't scale, and besides
that it's messy!  *evil grin*

> Then it exports this data to the master and deletes the temporary
> data once the master has acknowledged the import.

Why delete it?  If you just edited it, chances are you may
want to query it.  Sure, you might get 'old' data, but for
relatively static data this is not a problem.  A replication
service can go around propagating updates during off-peak times.

> The master creates unique record ids among all temporary data
> and re-exports the data to all its slaves.
>
> But what to do, if there are several local masters?

You have a 'chain of responsibility'; the Design Patterns book
will help with this one.

>
> t++

Best,

Clark Evans