Re: [HACKERS] replicator - Mailing list pgsql-hackers

From Philippe Marchesseault
Subject Re: [HACKERS] replicator
Date
Msg-id 38714B6F.2DECAEC0@Videotron.ca
In response to replicator  (Karel Zak - Zakkr <zakkr@zf.jcu.cz>)
Responses Re: [HACKERS] replicator
List pgsql-hackers
Hi Karel!

Karel wrote:
>    node1:  SQL --IPC--> node-broker
>                       |
>                     TCP/IP
>                       |
>                    master-node --IPC--> replikator
>                                         |   |   |
>                                           libpq
>                                         |   |   |
>                                       node2 node..n
>
>(Is it right picture?)

Yes, you got the concept right. I admit it's a bit complicated. Your comments
made me go back to the drawing board, and I found several flaws in the design.
The first one is that this design does not allow us to use transaction blocks.
An example might go a long way:

Node 1, Client 1 (1,1) issues a BEGIN statement. ---> Node 2, Client 1 (2,1) (the
replicator process) sends this command to the backend.
(1,1) sends an INSERT statement. ---> (2,1) sends the INSERT to the backend.
Node 2, Client 2 (2,2) checks (SELECT) the data, and the INSERT from (1,1) is not
there. That's normal; it was not committed.

Node 1, Client 2 (1,2) issues a BEGIN statement. ---> (2,1) receives a warning
that a transaction is already in progress.
(1,2) does some stuff...
(1,2) issues a ROLLBACK statement. ---> (2,1) sends the ROLLBACK. Node 2 rolls
back everything done since (1,1) sent the BEGIN.
(1,1) sends the final COMMIT. It fails on the remote nodes because the work was
already rolled back.
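
The sequence above can be replayed with a small toy model. This is only a sketch, not backend code: `SharedBackendSession` and `replay_scenario` are hypothetical names, the warning strings are made up for illustration, and the model assumes the single replicator connection carries exactly one transaction state that every forwarded client shares.

```python
class SharedBackendSession:
    """Toy model of ONE backend connection with a single transaction state.

    Hypothetical illustration only; it does not talk to a real backend.
    """

    def __init__(self):
        self.in_transaction = False
        self.pending = []    # rows inserted inside the open transaction block
        self.committed = []  # rows visible to other clients
        self.log = []        # (client, result) for every forwarded command

    def execute(self, client, command, row=None):
        if command == "BEGIN":
            if self.in_transaction:
                # The second BEGIN does not open a new, separate transaction.
                self.log.append((client, "warning: transaction already in progress"))
            else:
                self.in_transaction = True
                self.log.append((client, "ok"))
        elif command == "INSERT":
            if self.in_transaction:
                self.pending.append(row)
            else:
                self.committed.append(row)  # outside a block: applied at once
            self.log.append((client, "ok"))
        elif command == "ROLLBACK":
            # Discards everything since the FIRST BEGIN, whoever issued it.
            self.pending.clear()
            self.in_transaction = False
            self.log.append((client, "ok"))
        elif command == "COMMIT":
            if self.in_transaction:
                self.committed.extend(self.pending)
                self.pending.clear()
                self.in_transaction = False
                self.log.append((client, "ok"))
            else:
                self.log.append((client, "warning: no transaction in progress"))
        else:
            raise ValueError(f"unsupported command: {command}")


def replay_scenario():
    """Forward the interleaved commands of clients (1,1) and (1,2) over one link."""
    link = SharedBackendSession()
    link.execute("1,1", "BEGIN")
    link.execute("1,1", "INSERT", row="row from 1,1")
    link.execute("1,2", "BEGIN")
    link.execute("1,2", "INSERT", row="row from 1,2")
    link.execute("1,2", "ROLLBACK")
    link.execute("1,1", "COMMIT")
    return link
```

After `replay_scenario()`, the `committed` list is empty: the row from (1,1) was discarded by the ROLLBACK that (1,2) issued, and the final COMMIT finds no transaction left to commit.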

So the problem is that we have more than one client connection sharing a single
link. It could be fixed by sending the statements as a block only when we do a
COMMIT. But then we might have some performance problems with big blocks of
INSERTs. Also, I am worried about UPDATEs that could be done between separate
COMMITs, thus putting the databases out of sync. :-(
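
The "send the block only on COMMIT" idea could look roughly like the sketch below. It is a minimal illustration under assumptions: `BufferingReplicator`, `on_statement`, and the `send_block` callback are hypothetical names, statements are assumed to be non-empty strings, and only BEGIN, COMMIT, and ROLLBACK are recognized as transaction control.

```python
class BufferingReplicator:
    """Hold each client's transaction block locally and ship it on COMMIT.

    Hypothetical sketch: send_block is any callable that takes a list of
    statements and delivers it to the remote nodes.
    """

    def __init__(self, send_block):
        self.send_block = send_block
        self.buffers = {}  # client id -> statements collected since BEGIN

    def on_statement(self, client, statement):
        # First keyword of the statement, ignoring a trailing semicolon.
        verb = statement.strip().rstrip(";").split()[0].upper()
        if verb == "BEGIN":
            self.buffers[client] = []
        elif verb == "ROLLBACK":
            # Nothing was sent yet, so there is nothing to undo remotely.
            self.buffers.pop(client, None)
        elif verb == "COMMIT":
            block = self.buffers.pop(client, [])
            if block:
                # The whole block travels as one unit, so blocks from
                # different clients cannot interleave on the shared link.
                self.send_block(["BEGIN"] + block + ["COMMIT"])
        elif client in self.buffers:
            self.buffers[client].append(statement)
        else:
            # Outside a transaction block: forward immediately.
            self.send_block([statement])
```

This keeps the rollback of (1,2) from touching the work of (1,1), but it does nothing about the two concerns above: a large block of INSERTs is still sent all at once at COMMIT time, and conflicting UPDATEs committed on different nodes are not detected.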

> IMHO is problem with node registration / authentification on master node.
> Why concept is not more upright? As:
>
>         SQL --IPC--> node-replicator
>                         |  |  |
>                      via libpq send data to all nodes with
>                      current client/backend auth.

Yes, the concept can be simpler, but the above would create some performance
problems. If you had many nodes, it would take a long time to send the last
statement: you would have to wait until the statement was completely processed
by all the nodes. A better solution IMHO would be to have a bit more padding
between the node-replicator and the backend.

So it could become:

        SQL --IPC--> node-replicator
                        |   |   |
                     via TCP send statements to each node
                            |
                     replicator (on local node)
                            |
                     via libpq send data to
                     current (local) backend.
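
One possible reading of the "padding" is a per-node outbox, so that the node-replicator hands a statement off and returns without waiting for every node to finish processing it. The sketch below is only that reading, not the actual design: `NodeLink`, `NodeReplicator`, and the `apply` callback are hypothetical names, and an in-process queue stands in for the TCP link to each node's local replicator.

```python
import queue
import threading


class NodeLink:
    """Hypothetical stand-in for the TCP link to one node's local replicator."""

    def __init__(self, name, apply):
        self.name = name
        self.outbox = queue.Queue()  # the "padding": statements wait here
        self._apply = apply          # callable(node_name, statement)
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def _drain(self):
        # Runs on the node's side: take statements in order and apply them
        # to the local backend (represented here by the apply callback).
        while True:
            statement = self.outbox.get()
            try:
                self._apply(self.name, statement)
            finally:
                self.outbox.task_done()


class NodeReplicator:
    """Fans a statement out to every node without waiting for any of them."""

    def __init__(self, links):
        self.links = links

    def replicate(self, statement):
        for link in self.links:
            link.outbox.put(statement)  # returns immediately

    def wait_until_flushed(self):
        # Only needed when the caller wants to know all nodes caught up.
        for link in self.links:
            link.outbox.join()
```

Each node still receives its statements in the order they were issued, but a slow node only delays its own outbox rather than the client that sent the statement. The cost is that nodes can briefly lag behind each other.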
 

>  (not exist any master node, all nodes have connection to all nodes)

Exactly, if the replicator dies only the node dies, everything else keeps
working.

Looking forward to hearing from you,

Philippe Marchesseault

PS: Please excuse me for the DIFF, it's the first time I'm contributing to an
OSS project.


