Thread: replication/synchronisation

replication/synchronisation

From: Glenn Davy
Hi all - now here's a stupid question...
For better or worse, a lot of project 'documentation' (wikis et al.) and
project administration occurs in web applications. A lot of my time is
spent, for example, in either Trac or MediaWiki - and in my case these
nearly always have a Postgres backend.

Unfortunately I can no longer just check out the latest documents, take
them offline on my laptop, edit them, and merge them back in later.

So obviously I want to take my laptop with a snapshot of the database
(and wiki software etc.), work on it, then synchronise it back to the
'master'.

My question (finally) is: are there existing tools for that purpose - are
pgpool, PGCluster, or Slony suitable? I'm tempted to try and script it
myself, but am apprehensive primarily about managing id clashes, and of
course managing rows updated in more than one db instance - and whatever
else I haven't even thought of.

thanks for any help
glenn


Re: replication/synchronisation

From: Shane Ambler
On 7/9/2006 12:40, "Glenn Davy" <glenn@tangelosoftware.net> wrote:

> Hi all - now here's a stupid question...
> For better or worse, a lot of project 'documentation' (wikis et al.) and
> project administration occurs in web applications. A lot of my time is
> spent, for example, in either Trac or MediaWiki - and in my case these
> nearly always have a Postgres backend.
>
> Unfortunately I can no longer just check out the latest documents, take
> them offline on my laptop, edit them, and merge them back in later.
>
> So obviously I want to take my laptop with a snapshot of the database
> (and wiki software etc.), work on it, then synchronise it back to the
> 'master'.
>
> My question (finally) is: are there existing tools for that purpose - are
> pgpool, PGCluster, or Slony suitable? I'm tempted to try and script it
> myself, but am apprehensive primarily about managing id clashes, and of
> course managing rows updated in more than one db instance - and whatever
> else I haven't even thought of.
>
Not sure of your exact scenario, but it sounds like you may want to consider
a custom sync app. The ones you mentioned are designed for situations where
multiple servers are running and can 'talk' to each other all the time.

If you are the only one adding/changing the db, it will be easy. If others
are adding/changing it too, then each new row inserted on the laptop copy
would need its id set to null (assuming you use serial for the id) so that
it gets a new id from the server sequence - which means you need to know
the last id the server had issued when you took your copy, so you can tell
which rows were added on the laptop. You would also need a server mod
timestamp and a laptop mod timestamp to know which rows you changed that
should go back to the server. This doubles as a check on whether the server
copy has changed since you took yours - if the server has a newer mod
timestamp for a row you updated, then you decide how to handle the update.
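
Something along these lines - an untested sketch, assuming the laptop
snapshot gets loaded back onto the server in a schema called laptop, that
1000 was the highest id when you took your copy, and that the table and
column names are made up:

-- Each replicated table carries a serial id plus the two mod timestamps.
CREATE TABLE wiki_page (
    id              serial PRIMARY KEY,
    body            text,
    server_modified timestamp,  -- set on every write made on the server
    laptop_modified timestamp   -- set on every write made on the laptop
);

-- New laptop rows (id above the remembered high-water mark) get
-- re-inserted without an explicit id, so the server sequence assigns
-- a fresh one:
INSERT INTO wiki_page (body, server_modified)
SELECT body, now()
FROM laptop.wiki_page
WHERE id > 1000;

-- Laptop edits to pre-existing rows go back only while the server copy
-- is older; anything else is a conflict to handle by hand
-- (assumes the server stamps server_modified on every write):
UPDATE wiki_page AS s
SET body = l.body,
    server_modified = now()
FROM laptop.wiki_page AS l
WHERE s.id = l.id
  AND l.laptop_modified > s.server_modified;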

The other thing to consider - and probably the easiest way - is that if you
are doing the mods through a web front end, why not connect to that from
your laptop and work on the server from wherever you are? If it is on an
internal network, look at setting up a VPN or remote dial-in to the local
network if port forwarding on the router isn't an option.

--

Shane Ambler
Postgres@007Marketing.com

Get Sheeky @ http://Sheeky.Biz



Re: replication/synchronisation

From: Chris Browne
glenn@tangelosoftware.net (Glenn Davy) writes:
> Hi all - now here's a stupid question...
> For better or worse, a lot of project 'documentation' (wikis et al.) and
> project administration occurs in web applications. A lot of my time is
> spent, for example, in either Trac or MediaWiki - and in my case these
> nearly always have a Postgres backend.
>
> Unfortunately I can no longer just check out the latest documents, take
> them offline on my laptop, edit them, and merge them back in later.
>
> So obviously I want to take my laptop with a snapshot of the database
> (and wiki software etc.), work on it, then synchronise it back to the
> 'master'.
>
> My question (finally) is: are there existing tools for that purpose - are
> pgpool, PGCluster, or Slony suitable? I'm tempted to try and script it
> myself, but am apprehensive primarily about managing id clashes, and of
> course managing rows updated in more than one db instance - and whatever
> else I haven't even thought of.

Unfortunately, the "usual suspects" are generally designed to work
when systems are connected more or less all the time.

Furthermore, what you're after seems to be one of the forms of
"multimaster" replication, which is the tougher problem that generally
isn't supported.

In effect, what you're trying to do is akin to what PalmOS and Lotus
Notes solve using synchronization systems.  There aren't any tools I
can readily point you to to help do this with PostgreSQL, alas.
--
let name="cbbrowne" and tld="acm.org" in String.concat "@" [name;tld];;
http://linuxfinances.info/info/wp.html
He who laughs last thinks slowest.

Re: replication/synchronisation

From: Glenn Davy
Hey Chris
On Thu, 2006-09-07 at 18:38 -0400, Chris Browne wrote:
> glenn@tangelosoftware.net (Glenn Davy) writes:
> > Hi all - now here's a stupid question...
> > For better or worse, a lot of project 'documentation' (wikis et al.) and
> > project administration occurs in web applications. A lot of my time is
> > spent, for example, in either Trac or MediaWiki - and in my case these
> > nearly always have a Postgres backend.
> >
> > Unfortunately I can no longer just check out the latest documents, take
> > them offline on my laptop, edit them, and merge them back in later.
> >
> > So obviously I want to take my laptop with a snapshot of the database
> > (and wiki software etc.), work on it, then synchronise it back to the
> > 'master'.
> >
> > My question (finally) is: are there existing tools for that purpose - are
> > pgpool, PGCluster, or Slony suitable? I'm tempted to try and script it
> > myself, but am apprehensive primarily about managing id clashes, and of
> > course managing rows updated in more than one db instance - and whatever
> > else I haven't even thought of.
>
Thanks for replying

> Unfortunately, the "usual suspects" are generally designed to work
> when systems are connected more or less all the time.
>
Yep - that's true.
> Furthermore, what you're after seems to be one of the forms of
> "multimaster" replication,
yes that would make sense

>  which is the tougher problem that generally
> isn't supported.
>
Just wondering what you mean by 'generally'?
> In effect, what you're trying to do is akin to what PalmOS and Lotus
> Notes solve using synchronization systems.  There aren't any tools I
> can readily point you to to help do this with PostgreSQL, alas.
OK, thanks. I'm wondering how Microsoft Access and MS SQL Server seem to
achieve this so easily - it seems to be tied up with that massively long
alphanumeric unique row id (a GUID?). I wonder if adding similar fields
(ensuring uniqueness could prove interesting) to all tables in a given
schema, with triggers or rules to maintain them, would provide a basis for
some sort of simple system? I'm clearly out of my depth here - just
puzzled why it seems so doable on some platforms and nigh impossible on
others.
I guess I'm wondering if there is something intrinsic to Postgres that
makes this idea prohibitive, or is it that the developers already have
their hands full with other features on their minds?

glenn

Re: replication/synchronisation

From: Shane Ambler
On 14/9/2006 15:50, "Glenn Davy" <glenn@tangelosoftware.net> wrote:

>> In effect, what you're trying to do is akin to what PalmOS and Lotus
>> Notes solve using synchronization systems.  There aren't any tools I
>> can readily point you to to help do this with PostgreSQL, alas.
> OK, thanks. I'm wondering how Microsoft Access and MS SQL Server seem to
> achieve this so easily - it seems to be tied up with that massively long

It only seems so easy because they have taken the time to develop the feature.

> alphanumeric unique row id (a GUID?). I wonder if adding similar fields
> (ensuring uniqueness could prove interesting) to all tables in a given
> schema, with triggers or rules to maintain them, would provide a basis for
> some sort of simple system? I'm clearly out of my depth here - just
> puzzled why it seems so doable on some platforms and nigh impossible on
> others.

You would use a unique rowID to match rows between the two copies, plus
two timestamps: one for when the server copy was modified, the other for
when the roaming copy was modified. This lets you know which copy was
modified since the last synchronisation and which way to copy changes.
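
For example, something like this - an untested sketch, assuming a UUID
generator is available (modern PostgreSQL has gen_random_uuid() in
pgcrypto; back then you'd likely generate the value in the application),
and the roaming schema is just a stand-in for wherever the roaming copy's
data gets loaded at sync time:

-- Every table gets a globally unique rowID, so rows created on either
-- copy can never clash, plus the two mod timestamps.
CREATE TABLE doc (
    rowid            uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    body             text,
    server_modified  timestamptz,
    roaming_modified timestamptz
);

-- At sync time the rowID matches rows across the two copies, and the
-- timestamps say which side changed since the snapshot was taken:
SELECT COALESCE(s.rowid, r.rowid) AS rowid,
       CASE
         WHEN r.rowid IS NULL THEN 'added on server'
         WHEN s.rowid IS NULL THEN 'added while roaming'
         WHEN r.roaming_modified IS DISTINCT FROM s.roaming_modified
              THEN 'changed while roaming'
         WHEN r.server_modified IS DISTINCT FROM s.server_modified
              THEN 'changed on server'
         ELSE 'unchanged'
       END AS status
FROM doc s
FULL JOIN roaming.doc r ON r.rowid = s.rowid;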

> I guess I'm wondering if there is something intrinsic to Postgres that
> makes this idea prohibitive, or is it that the developers already have
> their hands full with other features on their minds?

No technical reason, just needs someone to take the time to do it.

> glenn
>

--

Shane Ambler
Postgres@007Marketing.com

Get Sheeky @ http://Sheeky.Biz


Re: replication/synchronisation

From: roy simkes

> Date: Thu, 14 Sep 2006 17:24:56 +0930
> Subject: Re: [NOVICE] replication/synchronisation
> From: pgsql@007Marketing.com
> To: glenn@tangelosoftware.com.au
> CC: pgsql-novice@postgresql.org
>
> On 14/9/2006 15:50, "Glenn Davy" <glenn@tangelosoftware.net> wrote:
>
> >> In effect, what you're trying to do is akin to what PalmOS and Lotus
> >> Notes solve using synchronization systems.  There aren't any tools I
> >> can readily point you to to help do this with PostgreSQL, alas.
> > OK, thanks. I'm wondering how Microsoft Access and MS SQL Server seem to
> > achieve this so easily - it seems to be tied up with that massively long
>
> It only seems so easy because they have taken the time to develop the feature.
>
> > alphanumeric unique row id (a GUID?). I wonder if adding similar fields
> > (ensuring uniqueness could prove interesting) to all tables in a given
> > schema, with triggers or rules to maintain them, would provide a basis
> > for some sort of simple system? I'm clearly out of my depth here - just
> > puzzled why it seems so doable on some platforms and nigh impossible on
> > others.
>
> You would use a unique rowID to match rows between the two copies, plus
> two timestamps: one for when the server copy was modified, the other for
> when the roaming copy was modified. This lets you know which copy was
> modified since the last synchronisation and which way to copy changes.
>
> > I guess I'm wondering if there is something intrinsic to Postgres that
> > makes this idea prohibitive, or is it that the developers already have
> > their hands full with other features on their minds?
>
> No technical reason, just needs someone to take the time to do it.
>  
> > glenn

MSSQL has a different name for this multimaster replication: it calls it peer-to-peer replication. In PostgreSQL and MySQL there are cluster architectures for it, which basically do what you want, synchronous or asynchronous according to your choice. But the thing is, they require a fast connection and are only recommended for local area networks. I looked into multimaster replication for a while and found that a master/multi-slave solution was far better and less problematic.
In this kind of replication you can run into unique-ID problems: two different nodes can enter the same ID, which will cause a conflict. To solve this with 2 nodes, increase the increment of the serial by two; odd numbers will then be used by node 1 and even numbers by node 2. There are two issues with this kind of solution, which is otherwise quite good for multimaster replication. First, what if you intend to add more nodes? Your whole architecture is gone and must be remade. To allow for that, even with two nodes you can increment by 5, which leaves room to add more nodes later. The second issue is the maximum value of the ID: you can reach the max value when only 1/5 of the rows have been entered, because you increase by 5. So you might prefer something other than a plain serial.
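
Something like this - the numbers are only for illustration:

-- Node 1 hands out 1, 6, 11, ...; node 2 hands out 2, 7, 12, ...
-- Jumping by 5 with only two nodes keeps three slots free for later.
CREATE SEQUENCE doc_id_seq START WITH 1 INCREMENT BY 5;  -- run on node 1
CREATE SEQUENCE doc_id_seq START WITH 2 INCREMENT BY 5;  -- run on node 2

-- A bigint id postpones the exhaustion problem, since each node can
-- only ever use 1/5 of the id range:
CREATE TABLE doc (
    id   bigint PRIMARY KEY DEFAULT nextval('doc_id_seq'),
    body text
);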
 
It will be far better to execute all your queries on the master and replicate to the slaves than to try to manage a multi-master, multi-slave system. There may be things wrong in my opinions, things I have missed or misunderstood, but I hope that helps.



Re: replication/synchronisation

From: Shane Ambler
On 14/9/2006 19:18, "roy simkes" <roysimkes@hotmail.com> wrote:

> MSSQL has a different name for this multimaster replication: it calls it
> peer-to-peer replication. In PostgreSQL and MySQL there are cluster
> architectures for it, which basically do what you want, synchronous or
> asynchronous according to your choice. But the thing is, they require a
> fast connection and are only recommended for local area networks. I looked
> into multimaster replication for a while and found that a
> master/multi-slave solution was far better and less problematic.
>
> In this kind of replication you can run into unique-ID problems: two
> different nodes can enter the same ID, which will cause a conflict. To
> solve this with 2 nodes, increase the increment of the serial by two; odd
> numbers will then be used by node 1 and even numbers by node 2. There are
> two issues with this kind of solution, which is otherwise quite good for
> multimaster replication. First, what if you intend to add more nodes? Your
> whole architecture is gone and must be remade. To allow for that, even
> with two nodes you can increment by 5, which leaves room to add more nodes
> later. The second issue is the maximum value of the ID: you can reach the
> max value when only 1/5 of the rows have been entered, because you
> increase by 5. So you might prefer something other than a plain serial.
>
> It will be far better to execute all your queries on the master and
> replicate to the slaves than to try to manage a multi-master, multi-slave
> system. There may be things wrong in my opinions, things I have missed or
> misunderstood, but I hope that helps.

The way I would consider tackling the problem is to have the client
distinguish between the master and the roaming server - maybe with an
identifying query, or a stored proc that knows which one is running and
enters the mod timestamps accordingly: entries on the main server would
fill in a server mod timestamp, and entries on the roaming server would
fill in a roaming mod timestamp.

When synchronising, a null server timestamp in a roaming row would indicate
a row inserted while roaming; you would need to consider renumbering its
rowID and the necessary foreign keys to align with the master's rowIDs.

Maybe a support table could be used to track synchronisation info - last
sync times and the last rowID at each sync - to simplify getting new rows
from the master server; this could also be the way to allow multiple
roaming servers.
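
A rough sketch of the stamping proc and the support table - untested, with
made-up names, and the custom node_role setting is just one way a modern
PostgreSQL could tell the function which server it is running on:

-- Support table: one row per roaming copy, tracking its last sync.
CREATE TABLE sync_state (
    node_name  text PRIMARY KEY,
    last_sync  timestamptz NOT NULL,
    last_rowid bigint NOT NULL   -- highest master rowID at that sync
);

-- One trigger function serves both servers; which timestamp column it
-- fills depends on a setting identifying the running node
-- (e.g. SET myapp.node_role = 'master' or 'roaming' at connect time).
CREATE OR REPLACE FUNCTION stamp_mod_time() RETURNS trigger AS $$
BEGIN
    IF current_setting('myapp.node_role') = 'master' THEN
        NEW.server_modified := now();
    ELSE
        NEW.roaming_modified := now();
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attached to each replicated table (doc here, as in the earlier sketch):
CREATE TRIGGER doc_stamp
    BEFORE INSERT OR UPDATE ON doc
    FOR EACH ROW EXECUTE PROCEDURE stamp_mod_time();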

A simple thing like blog updating wouldn't pose any problem with this
scenario, but a more complex db, such as one with billing/invoicing, would
need other considerations. E.g. an invoice number created by a travelling
rep would need to be unique and remain unchanged across synchronisation if
the invoice was printed away from the master server.


--

Shane Ambler
Postgres@007Marketing.com

Get Sheeky @ http://Sheeky.Biz



Re: replication/synchronisation

From: Chris Browne
glenn@tangelosoftware.net (Glenn Davy) writes:
> Hey Chris
> Thanks for replying
>
>> Unfortunately, the "usual suspects" are generally designed to work
>> when systems are connected more or less all the time.
>>
> Yep - that's true.
>> Furthermore, what you're after seems to be one of the forms of
>> "multimaster" replication,
> yes that would make sense
>
>>  which is the tougher problem that generally
>> isn't supported.
>>
> Just wondering what you mean by 'generally'?

None of the "product-like" things like Slony-I, Mammoth Replicator,
and such support multimaster replication.

>> In effect, what you're trying to do is akin to what PalmOS and Lotus
>> Notes solve using synchronization systems.  There aren't any tools I
>> can readily point you to to help do this with PostgreSQL, alas.

> OK, thanks. I'm wondering how Microsoft Access and MS SQL Server seem
> to achieve this so easily - it seems to be tied up with that massively
> long alphanumeric unique row id (a GUID?). I wonder if adding similar
> fields (ensuring uniqueness could prove interesting) to all tables in a
> given schema, with triggers or rules to maintain them, would provide a
> basis for some sort of simple system? I'm clearly out of my depth here -
> just puzzled why it seems so doable on some platforms and nigh impossible
> on others.
>
> I guess I'm wondering if there is something intrinsic to Postgres that
> makes this idea prohibitive, or is it that the developers already have
> their hands full with other features on their minds?

Conflict resolution is the Big Problem with asynchronous multimaster
replication.

Nobody has a direct answer to it.  People try to design schemas to
avoid conflicts as much as possible.

When conflicts occur, a DBA has to do something manual, figuring out
what the problem was and how to fix it.
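
To make that concrete: with the two-timestamp scheme discussed earlier in
this thread, a conflict is simply a row both sides touched since the last
sync, and about the best a tool can do automatically is surface those rows
for the DBA. A sketch only - :last_sync stands in for the recorded sync
time, and the names follow the earlier illustrations:

SELECT s.rowid
FROM doc s
JOIN roaming.doc r ON r.rowid = s.rowid
WHERE s.server_modified  > :last_sync   -- changed on the master...
  AND r.roaming_modified > :last_sync;  -- ...and on the roaming copy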

Microsoft has spent a lot of developer time on tools to cover this
over.
--
output = reverse("ofni.sesabatadxunil" "@" "enworbbc")
http://linuxfinances.info/info/languages.html
"There  are  almost  unlimited  ways  for making  your  programs  more
complicated or bizarre" -- Arthur Norman