Thread: AW: AW: Postgres Replication

AW: AW: Postgres Replication

From

Zeugswetter Andreas SB

Date:

12 June 2001, 09:58:41

> Here are some disadvantages to using a "trigger based" approach:
> 
> 1) Triggers simply transfer individual data items when they 
> are modified, they do not keep track of transactions.
> 2) The execution of triggers within a database imposes a performance 
> overhead to that database.
> 3) Triggers require careful management by database administrators.  
> Someone needs to keep track of all the "alarms" going off.
> 4) The activation of triggers in a database cannot be easily 
> rolled back or undone.

Yes, points 2 and 3 are a given, although point 2 buys you the functionality
of transparent locking across all involved db servers.
Points 1 and 4 are only the case for a trigger mechanism that does 
not use remote connection and 2-phase commit. 
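
To illustrate the 2-phase commit point, here is a minimal in-memory sketch in Python. The names Participant and replicate_transaction are hypothetical and no real database is involved. It assumes each trigger-fired change is staged on every server and only becomes durable once all servers have voted yes, which is what keeps the transaction boundary intact (point 1) and makes the changes undoable (point 4).

```python
class Participant:
    """Hypothetical stand-in for one database server in the replica set."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare
        self.committed = {}   # state that has been made durable
        self.pending = None   # changes staged by the current transaction

    def prepare(self, changes):
        # Phase 1: stage the changes and vote. Nothing is visible yet.
        if self.fail_on_prepare:
            return False
        self.pending = dict(changes)
        return True

    def commit(self):
        # Phase 2 (all voted yes): make the staged changes durable.
        self.committed.update(self.pending or {})
        self.pending = None

    def rollback(self):
        # Phase 2 (someone voted no): discard the staged changes.
        self.pending = None


def replicate_transaction(changes, participants):
    """Apply all changes of one transaction on every participant, or on none."""
    votes = [p.prepare(changes) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

A real coordinator would also have to log its decision and cope with participants that crash between the two phases; the sketch leaves that out.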

Imho an implementation that opens a separate client connection to the 
replication target is only suited for async replication, and for that a WAL 
based solution would probably impose less overhead.

Andreas

Re: AW: AW: Postgres Replication

From

Darren Johnson

Date:

12 June 2001, 10:39:11


> Imho an implementation that opens a separate client connection to the
> replication target is only suited for async replication, and for that a WAL
> based solution would probably impose less overhead.


Yes, there is significant overhead in opening a client connection,
so Postgres-R creates a pool of backends at startup. Coupled with
the group communication system (Ensemble), this significantly
reduces the issue.
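
The pooling idea can be sketched in a few lines of Python. This is only an illustration under assumptions: Backend and BackendPool are hypothetical names, it is not how Postgres-R is actually written, and the group communication layer is not modeled at all. The point shown is that the startup cost is paid once, and each incoming write set borrows an already-running backend.

```python
import queue


class Backend:
    """Hypothetical stand-in for a pre-started backend process."""

    def __init__(self, backend_id):
        self.backend_id = backend_id
        self.applied = []

    def apply(self, writeset):
        self.applied.append(writeset)


class BackendPool:
    def __init__(self, size):
        # Pay the startup cost once, not once per replicated transaction.
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(Backend(i))

    def run(self, writeset):
        backend = self._idle.get()      # blocks if every backend is busy
        try:
            backend.apply(writeset)
            return backend.backend_id
        finally:
            self._idle.put(backend)     # hand the backend back for reuse
```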


Very good points,

Darren

Re: AW: AW: Postgres Replication

From

reinoud@xs4all.nl (Reinoud van Leeuwen)

Date:

12 June 2001, 18:59:38

On Tue, 12 Jun 2001 15:50:09 +0200, you wrote:

>
>> Here are some disadvantages to using a "trigger based" approach:
>> 
>> 1) Triggers simply transfer individual data items when they 
>> are modified, they do not keep track of transactions.
>> 2) The execution of triggers within a database imposes a performance 
>> overhead to that database.
>> 3) Triggers require careful management by database administrators.  
>> Someone needs to keep track of all the "alarms" going off.
>> 4) The activation of triggers in a database cannot be easily 
>> rolled back or undone.
>
>Yes, points 2 and 3 are a given, although point 2 buys you the functionality
>of transparent locking across all involved db servers.
>Points 1 and 4 are only the case for a trigger mechanism that does 
>not use remote connection and 2-phase commit. 
>
>Imho an implementation that opens a separate client connection to the 
>replication target is only suited for async replication, and for that a WAL 
>based solution would probably impose less overhead.

Well as I read back the thread I see 2 different approaches to
replication:

1: tightly integrated replication.
pro:
- bi-directional (or multidirectional): updates are possible
everywhere
- A cluster of servers always has the same state.
- it does not matter to which server you connect
con:
- network between servers will be a bottleneck, especially if it is a
WAN connection
- only full replication possible
- what happens if one server (or the network between them) is down? Are
commits still possible?

2: async replication
pro:
- long distance possible
- no problems with network outages
- only changes are replicated, selects do not have impact 
- no locking issues across servers
- partial replication possible (many->one (data warehouse), or one->many
(queries possible everywhere, updates only central))
- good for failover situations (backup server is standing by)
con:
- bidirectional replication hard to set up (you'll have to implement
conflict resolution according to your business rules)
- different servers are not guaranteed to be in the same state.
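
As an example of what "conflict resolution according to your business rules" can mean for bidirectional async replication, here is a minimal Python sketch. It assumes one particular rule, last writer wins, and a hypothetical row format of (timestamp, payload) per key; a real deployment might need a very different rule, and clock skew between sites is ignored here.

```python
def merge_last_writer_wins(local, remote):
    """Merge two replicas' rows; each value is a (timestamp, payload) pair."""
    merged = dict(local)
    for key, (ts, payload) in remote.items():
        # Take the remote row if it is new here or strictly newer than ours.
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, payload)
    return merged
```

Note that the losing update is silently discarded, which is exactly why the rule has to come from the business side and not from the replication layer.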

I can think of some scenarios where I would definitely want to
*choose* one of the options. A load-balanced web environment would
likely want the first option, but synchronizing offices in different
continents might not work with 2-phase commit over the network....

And we have not even started talking about *managing* replicated
environments. A lot of fail-over scenarios stop planning after the
backup host has taken control. But how to get back?
-- 
__________________________________________________
"Nothing is as subjective as reality"
Reinoud van Leeuwen       reinoud@xs4all.nl
http://www.xs4all.nl/~reinoud
__________________________________________________

Re: AW: AW: Postgres Replication

From

Tom Lane

Date:

12 June 2001, 19:40:57

reinoud@xs4all.nl (Reinoud van Leeuwen) writes:
> Well as I read back the thread I see 2 different approaches to
> replication:
> ...
> I can think of some scenarios where I would definitely want to
> *choose* one of the options.

Yes.  IIRC, it looks to be possible to support a form of async
replication using the Postgres-R approach: you allow the cluster
to break apart when communications fail, and then rejoin when
your link comes back to life.  (This can work in principle; how
close it is to reality is another question.  But the rejoin operation
is the same as crash recovery, so you have to have it anyway.)
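
The "rejoin is the same as crash recovery" observation can be sketched in Python as follows. This is an assumption-laden illustration, not the Postgres-R design: Replica and catch_up are hypothetical names, and it assumes a single totally ordered log of (sequence number, key, value) entries, so it does not cover writes accepted on both sides of a broken link.

```python
class Replica:
    def __init__(self):
        self.state = {}
        self.last_applied = 0   # sequence number of the last applied log entry

    def apply(self, entry):
        seq, key, value = entry
        if seq <= self.last_applied:
            return              # already applied, so replaying is harmless
        self.state[key] = value
        self.last_applied = seq


def catch_up(replica, log):
    """Replay every log entry the replica missed, in log order."""
    for entry in log:
        replica.apply(entry)
```

Whether the replica was down or merely cut off, the same catch_up call brings it back in line with the log.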

So this seems to me to allow getting most of the benefits of the async
approach.  OTOH it is difficult to see how to go the other way: getting
the benefits of a synchronous solution atop a basically-async
implementation doesn't seem like it can work.
        regards, tom lane