Thread: replicator

replicator

From
Karel Zak - Zakkr
Date:
Hi,

I looked at your (Philippe's) replicator, but I don't fully understand
your replication concept.

   node1:  SQL --IPC--> node-broker
                      |
                    TCP/IP
                      |
                   master-node --IPC--> replikator
                                        |   |   |
                                          libpq
                                        |   |   |
                                      node2 node..n
 

(Is this the right picture?)

If I understand correctly, all nodes connect to the master node, and the
"replicator" on this master node replicates the data. But the master node
is a critical single point in this concept: if the master node fails,
replication for *all* nodes is lost. Hmm.. but I want to use replication
for high-availability applications...

IMHO there is a problem with node registration / authentication on the
master node. Why not make the concept simpler? For example:
        SQL --IPC--> node-replicator
                        |  |  |
                     via libpq send data to all nodes with
                     current client/backend auth.
 
(there is no master node; every node has a connection to every other node)


Using the replicator as an external process and copying data from SQL to
this replicator via IPC is (your) very good idea.
                        Karel


----------------------------------------------------------------------
Karel Zak <zakkr@zf.jcu.cz>              http://home.zf.jcu.cz/~zakkr/

Docs:        http://docs.linux.cz                    (big docs archive)    
Kim Project: http://home.zf.jcu.cz/~zakkr/kim/        (process manager)
FTP:         ftp://ftp2.zf.jcu.cz/users/zakkr/        (C/ncurses/PgSQL)
-----------------------------------------------------------------------



Re: [HACKERS] replicator

From
Philippe Marchesseault
Date:
Hi Karel!

Karel Wrote:
>    node1:  SQL --IPC--> node-broker
>                       |
>                     TCP/IP
>                       |
>                    master-node --IPC--> replikator
>                                         |   |   |
>                                           libpq
>                                         |   |   |
>                                       node2 node..n
>
>(Is it right picture?)

Yes, you got the concept right. I admit it's a bit complicated. Your comments
made me go back to the drawing board and I found several flaws with the design.
The first one is that this design does not allow us to use Transaction Blocks.
An example might go a long way:

Node 1, Client 1 (1,1) issues a BEGIN statement ---> Node 2, Client 1 (2,1)
(the replicator process) sends this command.
(1,1) sends an INSERT statement. ---> (2,1) sends the INSERT to the backend.
Node 2, Client 2 (2,2) checks (SELECT) the data and the INSERT of (1,1) is not
there. That's normal, it was not committed.

Node 1, Client 2 (1,2) issues a BEGIN statement. ---> (2,1) receives a warning
about the state not being in progress.
(1,2) does some stuff...
(1,2) issues a ROLLBACK statement ---> (2,1) sends the ROLLBACK. Node 2 rolls
back all the work done since (1,1) sent the BEGIN.
(1,1) sends the final COMMIT. It fails on the remote nodes because the
transaction was already rolled back.
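
A minimal sketch of the interleaving above, assuming a toy in-memory stand-in
for the single backend session on node 2. The class SharedConnection, its
warning texts and the row values are hypothetical; this is not libpq and not
the replicator's actual code.

```python
class SharedConnection:
    """Toy model of the one backend session that (2,1) holds on node 2."""

    def __init__(self):
        self.committed = []   # rows visible to other clients such as (2,2)
        self.pending = []     # rows inserted inside the open transaction
        self.in_txn = False
        self.log = []         # warnings the backend would report

    def execute(self, stmt, row=None):
        if stmt == "BEGIN":
            if self.in_txn:
                self.log.append("WARNING: transaction already in progress")
            self.in_txn = True
        elif stmt == "INSERT":
            if self.in_txn:
                self.pending.append(row)
            else:
                self.committed.append(row)  # outside a block: autocommit
        elif stmt == "ROLLBACK":
            self.pending.clear()            # discards work from *every* client
            self.in_txn = False
        elif stmt == "COMMIT":
            if not self.in_txn:
                self.log.append("WARNING: no transaction in progress")
            self.committed.extend(self.pending)
            self.pending.clear()
            self.in_txn = False


def replay_interleaved():
    """Replay the statements of (1,1) and (1,2) over the single shared session."""
    conn = SharedConnection()
    conn.execute("BEGIN")                    # from (1,1)
    conn.execute("INSERT", "row from 1,1")   # from (1,1)
    conn.execute("BEGIN")                    # from (1,2): warning, block already open
    conn.execute("INSERT", "row from 1,2")   # from (1,2)
    conn.execute("ROLLBACK")                 # from (1,2): also drops the row of (1,1)
    conn.execute("COMMIT")                   # from (1,1): nothing left to commit
    return conn
```

After the replay, the row inserted by (1,1) is gone on node 2 even though
(1,1) committed on node 1.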

So the problem is that we have more than one connection on a single link. It
could be fixed by sending the statements in a block only when we do a COMMIT.
But then we might have some performance problems with big blocks of inserts.
Also I am worried about UPDATES that could be done between separate COMMITs
thus putting the database out of sync. :-(
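
The "send the block only on COMMIT" idea could be sketched roughly as follows.
The class CommitBuffer, the client ids and the send_block callback are
assumptions made for illustration. The sketch does nothing about the two
worries above (large blocks, and UPDATEs landing between separate COMMITs).

```python
class CommitBuffer:
    """Buffer each client's transaction block; forward it only on COMMIT."""

    def __init__(self, send_block):
        self.send_block = send_block   # callable taking a list of statements
        self.open = {}                 # client id -> statements buffered so far

    def on_statement(self, client, stmt):
        verb = stmt.split()[0].upper()  # assumes a non-empty statement
        if verb == "BEGIN":
            self.open[client] = []
        elif verb == "ROLLBACK":
            # Nothing was sent yet, so the remote nodes never see this block.
            self.open.pop(client, None)
        elif verb == "COMMIT":
            stmts = self.open.pop(client, [])
            if stmts:
                self.send_block(["BEGIN", *stmts, "COMMIT"])
        elif client in self.open:
            self.open[client].append(stmt)
        else:
            self.send_block([stmt])     # statement outside a block: forward as-is


if __name__ == "__main__":
    sent = []
    buf = CommitBuffer(sent.append)
    buf.on_statement("1,1", "BEGIN")
    buf.on_statement("1,1", "INSERT INTO t VALUES (1)")
    buf.on_statement("1,2", "BEGIN")
    buf.on_statement("1,2", "ROLLBACK")
    buf.on_statement("1,1", "COMMIT")
    print(sent)
```

With this buffering, the ROLLBACK of (1,2) never reaches the remote node, so
the block of (1,1) arrives whole.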

> IMHO is problem with node registration / authentification on master node.
> Why concept is not more upright? As:
>
>         SQL --IPC--> node-replicator
>                         |  |  |
>                      via libpq send data to all nodes with
>                      current client/backend auth.

Yes, the concept can be simpler, but the above would create some performance
problems. If you had many nodes, it would take a long time to send the last
statement. You would have to wait until the statement was completely processed
by all the nodes. A better solution IMHO would be to have a bit more padding
between the node-replicator and the backend.

So it could become:

SQL --IPC--> node-replicator
                           |   |   |
      via TCP send statements to each node
                      replicator (on local node)
                           |
         via libpq send data to
        current (local) backend.
 

>  (not exist any master node, all nodes have connection to all nodes)

Exactly, if the replicator dies only the node dies, everything else keeps
working.
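
One way to picture the masterless fan-out is the sketch below. It assumes each
peer is represented by a plain callable; fan_out and the peer names are
hypothetical, and a real node-replicator would speak TCP to the remote
replicators. The statement goes to all peers concurrently, and a peer that is
down is reported without stopping the others.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(statement, peers):
    """Send one statement to every peer at the same time.

    peers maps a node name to a callable(statement) that raises
    ConnectionError when that node cannot be reached.
    Returns a dict: node name -> "ok" or "failed: <reason>".
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(peers))) as pool:
        futures = {name: pool.submit(send, statement)
                   for name, send in peers.items()}
        for name, future in futures.items():
            try:
                future.result()
                results[name] = "ok"
            except ConnectionError as exc:
                # One dead node must not stop delivery to the others.
                results[name] = f"failed: {exc}"
    return results
```

The caller still waits for the slowest peer, so this only removes the
one-after-another delay, not the wait itself.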

Looking forward to hearing from you,

Philippe Marchesseault

PS: Please excuse me for the DIFF, it's the first time I'm contributing to an
OSS project.



Re: [HACKERS] replicator

From
Karel Zak - Zakkr
Date:
On Mon, 3 Jan 2000, Philippe Marchesseault wrote:

> So it could become:
> 
> SQL --IPC--> node-replicator
>                            |   |   |
>       via TCP send statements to each node
>                       replicator (on local node)
>                            |
>          via libpq send data to
>         current (local) backend.
> 
> >  (not exist any master node, all nodes have connection to all nodes)
> 
> Exactly, if the replicator dies only the node dies, everything else keeps
> working.

Hi,
I have explored a little how replication is designed in Oracle and Sybase
(from the manuals). (Does anyone know of interesting links or publications
about it?)

First, I am sure it is premature to write replication for PgSQL now, while
we do not have an exact concept for it. It needs more suggestions from more
developers. We first need answers to the following questions:

1/ Which replication concept should we choose for PG?
2/ How do we manage transactions across nodes? (And we need to define some
   replication protocol for this.)
3/ How do we integrate replication into the current PG transaction code?
 

My idea (dream :-) is replication that allows full read-write on all nodes
and that uses PG's current transaction method, so that there is no
difference between several backends on one host and several backends on
several hosts. That gives "global transaction consistency".

Currently transactions are managed via IPC (on one host); my dream is to
manage transactions the same way, but between several hosts via TCP. (And to
optimize this by transferring only committed data/commands.)


Any suggestion?


-------------------
Note:
(transaction oriented replication)
Sybase - I. model (only one node is read-write) 
 primary SQL data (READ-WRITE)
        |
 replication agent (transaction log monitoring)
        |
 primary distribution server (one or more repl. servers)
        |         /  |  \
        |      nodes (READ-ONLY)
        |
 secondary dist. server
     /  |  \
 nodes (READ-ONLY)
 

      If the primary SQL is read-write and the other nodes are *read-only*
      => the system works well even when the connection is down (data are
         saved to the replication log, and when the connection is available
         again the log is written to the node).
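
The store-and-forward behaviour of model I might look roughly like this
sketch. ReplicationLog and the send callback are hypothetical names, the log
is kept in memory only, and nothing here is taken from Sybase's actual
implementation.

```python
from collections import deque


class ReplicationLog:
    """Queue records while the link is down; replay them in order later."""

    def __init__(self, send):
        self.send = send          # callable(record); raises ConnectionError if link is down
        self.pending = deque()    # records not yet delivered to the read-only node

    def append(self, record):
        """Log a committed change and try to deliver everything outstanding."""
        self.pending.append(record)
        return self.flush()

    def flush(self):
        """Deliver pending records in order; return how many were delivered."""
        delivered = 0
        while self.pending:
            try:
                self.send(self.pending[0])
            except ConnectionError:
                break             # link still down: keep the rest for later
            self.pending.popleft()
            delivered += 1
        return delivered
```

A record leaves the log only after send returns, so an outage delays delivery
but keeps the original order.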
 

Sybase - II. model (all nodes read-write)
            SQL data 1 --->--+                        NODE I.
               |            |
               ^            |
               |     replication agent 1 (transaction log monitoring)
               V        |
               |        V
               |        |
               |  replication server 1
               |    ^
               V    |
               |  replication server 2                NODE II.
               |         |
               ^         +-<-->--- SQL data 2
               |          |
          replication agent 2 -<--
 



Sorry, I am not sure I redrew the previous picture entirely correctly..
                            Karel