Thread: Replicating databases

Replicating databases

From
Carlos Benkendorf
Date:
Hello,
 
Currently our company has many small stores distributed around the country, and in the current database configuration we have a central database that all the stores access remotely.
 
Every table's primary key includes a column identifying the store the row belongs to. In other words, only the owning store can update a row; other stores can read it, but the system was designed so that a store cannot update information that does not belong to it.
 
Performance is not good because the line connecting a store to the central database is sometimes overloaded. We're thinking of replicating the central database to each store. A store would be able to read all the information from its local database but should only update rows that belong to it.
 
When a store needs to read information about other stores, that data does not need to be current; yesterday's snapshot is good enough.
 
During the night, all the local store databases would be consolidated into a single database and replicated back out to the stores. In the morning, when a store opens, its local database would hold updated, consolidated data.
I would appreciate suggestions about the best way to implement such a solution.
 
Slony-I? SQL scripts?
 
Thanks in advance!
Benkendorf

__________________________________________________
Make calls to other computers with the new Yahoo! Messenger
http://br.beta.messenger.yahoo.com/

Re: Replicating databases

From
Andrew Sullivan
Date:
On Wed, Nov 02, 2005 at 12:06:36PM +0000, Carlos Benkendorf wrote:
> I would appreciate suggestions about how the best way to implement
> such soluction.
>
> Slony-1? SQL scripts?

Maybe a combination.  My natural inclination would be to try to do
this with some tricky views+rules so that each store could write into
its own table (then everybody could replicate, and in fact you could
have the other store data updated, but maybe not as fast as real
time).  The problem is that in the central database, this is going to
turn out to be a big, nasty UNION if there are more than a handful of
stores.
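The views+rules trick could be sketched like this (all table and column names here are hypothetical; this uses the ordinary PostgreSQL CREATE RULE mechanism, with one per-store table per branch of the view):

```sql
-- Each store's rows live in their own physical table.
CREATE TABLE orders_s1 (store_id int, order_id int, total numeric,
                        PRIMARY KEY (store_id, order_id));
CREATE TABLE orders_s2 (store_id int, order_id int, total numeric,
                        PRIMARY KEY (store_id, order_id));

-- The application reads through a single view; on the central
-- database this UNION grows by one branch per store.
CREATE VIEW orders AS
    SELECT * FROM orders_s1
    UNION ALL
    SELECT * FROM orders_s2;

-- On store 1, rules redirect writes on the view into the local table.
CREATE RULE orders_ins AS ON INSERT TO orders DO INSTEAD
    INSERT INTO orders_s1
    VALUES (NEW.store_id, NEW.order_id, NEW.total);
CREATE RULE orders_upd AS ON UPDATE TO orders DO INSTEAD
    UPDATE orders_s1 SET total = NEW.total
     WHERE store_id = OLD.store_id AND order_id = OLD.order_id;
```

Each store would then replicate only its own orders_sN table, and the view hides the partitioning from the application.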

But, you could do this with some batch processing in the night at
each store, such that you pulled local data into a special local
table (into which you'd write, depending on your local store id) and
the non-local table.  Again, you could use a view with rules to allow
writing into these local tables.  Then during the batch processing at
night, you could merge all these changes together, and prepare
special sets to push out to the stores so that they could see
everyone else's day old data.
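The nightly merge itself could be a plain SQL batch run from cron; a sketch with hypothetical table names (PostgreSQL of this era has no built-in upsert, hence delete-then-insert):

```sql
BEGIN;
-- Fold store 1's replicated local table into the consolidated table.
DELETE FROM orders_all
 WHERE (store_id, order_id) IN
       (SELECT store_id, order_id FROM orders_s1);
INSERT INTO orders_all SELECT * FROM orders_s1;
-- ...repeat for each store, then let replication push orders_all
-- back out so every store sees everyone else's day-old data.
COMMIT;
```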

It seems kludgey this way, though.  What you really need is
multimaster with conflict resolution, because you can depend on your
application to cause no conflicts.  Slony is designed to prevent you
from writing into the replicated tables.  Some of the other
master-slave ones don't have that restriction, but they're sort of
dangerous for the same reason.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
The whole tendency of modern prose is away from concreteness.
        --George Orwell

Re: Replicating databases

From
"Jim C. Nasby"
Date:
On Wed, Nov 02, 2005 at 05:45:40PM -0500, Andrew Sullivan wrote:
> On Wed, Nov 02, 2005 at 12:06:36PM +0000, Carlos Benkendorf wrote:
> > I would appreciate suggestions about how the best way to implement
> > such soluction.
> >
> > Slony-1? SQL scripts?
>
> Maybe a combination.  My natural inclination would be to try to do
> this with some tricky views+rules so that each store could write into
> its own table (then everybody could replicate, and in fact you could
> have the other store data updated, but maybe not as fast as real
> time).  The problem is that in the central database, this is going to
> turn out to be a big, nasty UNION if there are more than a handful of
> stores.
>
> But, you could do this with some batch processing in the night at
> each store, such that you pulled local data into a special local
> table (into which you'd write, depending on your local store id) and
> the non-local table.  Again, you could use a view with rules to allow
> writing into these local tables.  Then during the batch processing at
> night, you could merge all these changes together, and prepare
> special sets to push out to the stores so that they could see
> everyone else's day old data.
>
> It seems kludgey this way, though.  What you really need is
> multimaster with conflict resolution, because you can depend on your

Isn't Slony2 supposed to do just that?

> application to cause no conflicts.  Slony is designed to prevent you
> from writing into the replicated tables.  Some of the other
> master-slave ones don't have that restriction, but they're sort of
> dangerous for the same reason.
>
> A
>
> --
> Andrew Sullivan  | ajs@crankycanuck.ca
> The whole tendency of modern prose is away from concreteness.
>         --George Orwell
>
>

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Replicating databases

From
Marc Munro
Date:
Carlos,
What you are asking for is a multi-master replication scheme.  Slony-I
is a single master system, meaning that for each slony cluster only one
node can make changes to the data.

Without breaking slony's normal rules, I believe that there might be a
way to do it, though it will not be pretty.

Basically you would create a slony cluster for each store, which
replicates store data back to the central system.  You will also have a
master cluster that replicates central data to all stores.

For each store you will create a Slony-I cluster CS (i.e. for store 1 you
create cluster C1, for store 2 cluster C2, etc.).

For the central (master) database you will create a cluster CM that
replicates to all stores.

For each application table, T, you will do the following:
- create a table T_S at each source store S, and on the central database
- add T_S to the replication set for CS
- on the central db create a master table T_M
- on the central db, add triggers on T_S that copy all changes into the
master table T_M (T_M will then contain the full set of data from all
stores)
- add T_M to the replication set for cluster CM
- at each store create a view T that does select * from T_S union select
* from T_M
- create instead of triggers on T that cause updates to be performed
only on the underlying local table T_S
- at the central node create a view T that does select * from T_M, (you
don't need instead of triggers for this as the data can only be updated
at the stores)
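On the central database, the trigger in the scheme above (copying changes on a store table T_S into the master table T_M) might look roughly like this in PL/pgSQL. All names are placeholders, and whether Slony-I tolerates extra user triggers on tables it replicates is exactly the sort of question for the Slony list:

```sql
-- Master table on the central db: full data set from all stores.
CREATE TABLE t_m (store_id int, id int, payload text,
                  PRIMARY KEY (store_id, id));

-- Keep t_m in step with every change Slony-I applies to t_s1.
CREATE FUNCTION t_s1_to_t_m() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        INSERT INTO t_m VALUES (NEW.store_id, NEW.id, NEW.payload);
    ELSIF TG_OP = ''UPDATE'' THEN
        UPDATE t_m SET payload = NEW.payload
         WHERE store_id = OLD.store_id AND id = OLD.id;
    ELSE  -- DELETE
        DELETE FROM t_m
         WHERE store_id = OLD.store_id AND id = OLD.id;
    END IF;
    RETURN NULL;  -- AFTER trigger: return value is ignored
END;
' LANGUAGE plpgsql;

CREATE TRIGGER t_s1_copy AFTER INSERT OR UPDATE OR DELETE ON t_s1
    FOR EACH ROW EXECUTE PROCEDURE t_s1_to_t_m();
```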

So, for N stores you will have created N+1 Slony-I clusters and N+1
distinct tables for each distributed table.

This is horrible and a lot of maintenance.  It might work though if the
number of stores is quite small.

You should probably ask the question again on slony1-general.  The
experts there may suggest a better solution.  I have seen talk of
disabling the standard slony triggers to allow this sort of thing but
whether that is more or less nasty is questionable.

Good luck

__
Marc

On Wed, 2005-11-02 at 12:18 -0400, pgsql-general-owner@postgresql.org
wrote:
> Date: Wed, 2 Nov 2005 12:06:36 +0000 (GMT)
> From: Carlos Benkendorf <carlosbenkendorf@yahoo.com.br>
> To: pgsql-general@postgresql.org
> Subject: Replicating databases
> Message-ID: <20051102120637.58061.qmail@web35507.mail.mud.yahoo.com>
>
> Hello,
>
> Currently our company has a lot of small stores distributed around the
> country and in the actual database configuration we have a central
> database and all the small stores accessing it remotely.
>
> All primary key tables were designed with a column identifying the
> store that it belongs. In other words, the store that can update the
> line, other stores can read it but the system was designed in such a
> way that other stores can not update information that do not belong to
> them.
>
> The performance is not good because the line speed that connects the
> store to the central database sometimes is overloaded. We're thinking
> to replicate the central database to each store. The store would be
> able to read all the information from the local database but should
> only update lines that belong to that store.
>
> When a store needs read information about other stores, it is not
> necessary to be updated, it can be a yesterday snapshot.
>
> During the night all the local store databases will be consolidated in
> only one database and replicated again to the stores. In the morning,
> when the store opens, the local database has an updated and
> consolidated data.
> I would appreciate suggestions about how the best way to implement
> such soluction.
>
> Slony-1? SQL scripts?
>
> Thanks in advance!
>
> Benkendorf


Re: Replicating databases

From
Sebastian Hennebrueder
Date:
Carlos Benkendorf schrieb:

> Hello,
>
> Currently our company has a lot of small stores distributed around the
> country and in the actual database configuration we have a central
> database and all the small stores accessing it remotely.
>

Hello Carlos,
There are a number of replication solutions available. To give you a
better answer, here are some questions to get a more precise view of
your problem:
How many tables and stores do you have?
How is your application developed (technology, etc.)?
What is the size of the tables, and how many entries change every day?

--
Best Regards / Viele Grüße

Sebastian Hennebrueder

----

http://www.laliluna.de

* Tutorials for JSP, JavaServer Faces, Struts, Hibernate and EJB
* Seminars and Education at very fair prices.
* Get professional support and consulting for these technologies.

Re: Replicating databases

From
"codeWarrior"
Date:
It doesn't sound to me like replication is the right answer to this
problem... You are setting yourself up to try and defeat one of the major
purposes of a database in a client-server system -- namely -- centralized
storage.

If you add up all the money you are going to spend trying to manage multiple
copies of the same database along with all of the maintenance, support,
bandwidth, and costs of not allowing your end-users access to "real-time"
data (like making bad decisions based on aged data) -- I think you will
agree that it could end up being VERY expensive in the long term.

The part of your plan where you intend to synchronize all of the databases
overnight is still going to be a bottleneck.

A better alternative -- put some money into upgrading your bandwidth --
especially at the PostgreSQL server end -- not necessarily at each location.

FWIW: I have a client with 472+ stores, each using a 56K (fractional T1)
connection to a central PostgreSQL server. They don't have any major
performance problems that I am aware of. If they did -- I can pretty much
guarantee that distributing 472 copies of the database would never ever be
considered as a "solution" to improve performance.



"Carlos Benkendorf" <carlosbenkendorf@yahoo.com.br> wrote in message
news:20051102120637.58061.qmail@web35507.mail.mud.yahoo.com...
Hello,

Currently our company has a lot of small stores distributed around the
country and in the actual database configuration we have a central database
and all the small stores accessing it remotely.

All primary key tables were designed with a column identifying the store
that it belongs. In other words, the store that can update the line, other
stores can read it but the system was designed in such a way that other
stores can not update information that do not belong to them.

The performance is not good because the line speed that connects the store
to the central database sometimes is overloaded. We're thinking to replicate
the central database to each store. The store would be able to read all the
information from the local database but should only update lines that belong
to that store.

When a store needs read information about other stores, it is not necessary
to be updated, it can be a yesterday snapshot.

During the night all the local store databases will be consolidated in only
one database and replicated again to the stores. In the morning, when the
store opens, the local database has an updated and consolidated data.
I would appreciate suggestions about how the best way to implement such
soluction.

Slony-1? SQL scripts?

Thanks in advance!

Benkendorf



Re: Replicating databases

From
Andrew Sullivan
Date:
On Wed, Nov 02, 2005 at 05:23:34PM -0600, Jim C. Nasby wrote:
> > It seems kludgey this way, though.  What you really need is
> > multimaster with conflict resolution, because you can depend on your
>
> Isn't Slony2 supposed to do just that?

Well, to the extent that slony 2 ever is going to do anything, our
original aim was multimaster replication in a single data centre.
Some folks have been working on various approaches that seem to be
bearing real bitter fruit, though, and I don't know how hopeful I am
these days for a general-purpose tool that will do that.

It's certainly a laudable goal, though, and if people want to work on
such a target, I'd sure like to know about it.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
"The year's penultimate month" is not in truth a good way of saying
November.
        --H.W. Fowler

Re: Replicating databases

From
Andrew Sullivan
Date:
On Wed, Nov 02, 2005 at 03:44:26PM -0800, Marc Munro wrote:

> experts there may suggest a better solution.  I have seen talk of
> disabling the standard slony triggers to allow this sort of thing but
> whether that is more or less nasty is questionable.

FWIW, I don't think that's the question; it's more like whether it'd
be merely horribly nasty or likely to break in unexpected and really
painful ways. ;-)  But the discussion around that surely should move
to the Slony list.

--
Andrew Sullivan  | ajs@crankycanuck.ca
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well.
        --Dennis Ritchie

Re: Replicating databases

From
Andrew Sullivan
Date:
On Thu, Nov 03, 2005 at 08:16:01AM -0800, codeWarrior wrote:
> It doesnt sound to me like replication is the right answer to this
> problem... You are setting yourself up to try and defeat one of the major
> purposes of a database in a client-server system -- namely -- centralized
> storage.

While I have a certain amount of sympathy for this view, it's often
the case that centralised storage isn't quite what you want.  After
all, if always-fast is more important than always-right, we prefer
caches and such like.  DNS is the obvious example there.  And if
always-works is more important than always-fast or always-right, then
you have a very powerful incentive to keep things local.

That said, this case does sort of sound like money might be better
spent on improved communications than a humungous amount of work to
Rube up a Goldberg for getting all the data in every store.  But
maybe we don't have the whole picture: maybe communications links
aren't stable in some of these stores, and can't be made so
economically.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
The plural of anecdote is not data.
        --Roger Brinner

Re: Replicating databases

From
"Jim C. Nasby"
Date:
On Thu, Nov 03, 2005 at 04:49:33PM -0500, Andrew Sullivan wrote:
> On Wed, Nov 02, 2005 at 05:23:34PM -0600, Jim C. Nasby wrote:
> > > It seems kludgey this way, though.  What you really need is
> > > multimaster with conflict resolution, because you can depend on your
> >
> > Isn't Slony2 supposed to do just that?
>
> Well, to the extent that slony 2 ever is going to do anything, our
> original aim was multimaster replication in a single data centre.
> Some folks have been working on various approaches that seem to be
> bearing real bitter fruit, though, and I don't know how hopeful I am
> these days for a general-purpose tool that will do that.

How were you going about it such that it would matter if it was in a
local data center or not?
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Replicating databases

From
Christopher Browne
Date:
> On Thu, Nov 03, 2005 at 04:49:33PM -0500, Andrew Sullivan wrote:
>> On Wed, Nov 02, 2005 at 05:23:34PM -0600, Jim C. Nasby wrote:
>> > > It seems kludgey this way, though.  What you really need is
>> > > multimaster with conflict resolution, because you can depend on your
>> >
>> > Isn't Slony2 supposed to do just that?
>>
>> Well, to the extent that slony 2 ever is going to do anything, our
>> original aim was multimaster replication in a single data centre.
>> Some folks have been working on various approaches that seem to be
>> bearing real bitter fruit, though, and I don't know how hopeful I am
>> these days for a general-purpose tool that will do that.
>
> How were you going about it such that it would matter if it was in a
> local data center or not?

The approach under examination bears "remarkable similarity" to the
Postgres-R work at McGill University.

The notion is that transactions are not permitted to commit until they
are ready to commit on all of the servers.

Normally, you'd expect that to be some form of 2 Phase Commit.  There
are some unfortunate things about 2PC that it tries to avoid...

The cleverness is in trying to propagate lock requests as early as
possible in processing (whereas 2PC seems to push a lot of work right
to the END of the transaction).

In essence, this is a "fairly nearly synchronous" replication model.

The result of the "near synchronicity" is that it could only possibly
perform well if all "masters" are in the same local network.  If you
introduce a far-away host that has 2 second latency, that introduces 2
seconds of latency to *every* transaction processed on the system.

Bye, bye, performance...
--
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://cbbrowne.com/info/
If we were meant to fly, we wouldn't keep losing our luggage.

Re: Replicating databases

From
Christopher Browne
Date:
> On Wed, Nov 02, 2005 at 03:44:26PM -0800, Marc Munro wrote:
>> experts there may suggest a better solution.  I have seen talk of
>> disabling the standard slony triggers to allow this sort of thing but
>> whether that is more or less nasty is questionable.
>
> FWIW, I don't think that's the question; it's more like whether it'd
> be merely horribly nasty or likely to break in unexpected and really
> painful ways. ;-)  But the discussion around that surely should move
> to the Slony list.

It seems to me that lots of the "stuff" in Slony-I could be reapplied
to _try_ to create an asynchronous multimaster replication system.

A *major* addition would need to be some form of "conflicts queue."

That's the sort of thing they have in the analogous "O-word"
replication system.

What's a non-starter is to try to reshape the Slony-I project into
"async multimaster."  That would get considerable push-back :-).

But if someone decided to "fork" their own *new* project, perhaps
starting based on one of the releases, that would be an entirely
interesting idea.
--
output = reverse("moc.enworbbc" "@" "enworbbc")
http://cbbrowne.com/info/languages.html
To quote from a friend's conference talk:  "they told me that their
network was physically secure, so I asked them `then what's with all
these do-not-leave-valuables-in-your-desk signs?'".
-- Henry Spencer

Re: Replicating databases

From
"Jim C. Nasby"
Date:
On Thu, Nov 03, 2005 at 10:29:56PM -0500, Christopher Browne wrote:
> > On Wed, Nov 02, 2005 at 03:44:26PM -0800, Marc Munro wrote:
> >> experts there may suggest a better solution.  I have seen talk of
> >> disabling the standard slony triggers to allow this sort of thing but
> >> whether that is more or less nasty is questionable.
> >
> > FWIW, I don't think that's the question; it's more like whether it'd
> > be merely horribly nasty or likely to break in unexpected and really
> > painful ways. ;-)  But the discussion around that surely should move
> > to the Slony list.
>
> It seems to me that lots of the "stuff" in Slony-I could be reapplied
> to _try_ to create an asynchronous multimaster replication system.
>
> A *major* addition would need to be some form of "conflicts queue."
>
> That's the sort of thing they have in the analogous "O-word"
> replication system.
>
> What's a non-starter is to try to reshape the Slony-I project into
> "async multimaster."  That would get considerable push-back :-).
>
> But if someone decided to "fork" their own *new* project, perhaps
> starting based on one of the releases, that would be an entirely
> interesting idea.

Wouldn't async multimaster make use of most all of what slony-I
currently has? ISTM that it would make life a lot easier to use one
combined project rather than two...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Replicating databases

From
Christopher Browne
Date:
>> But if someone decided to "fork" their own *new* project, perhaps
>> starting based on one of the releases, that would be an entirely
>> interesting idea.
>
> Wouldn't async multimaster make use of most all of what slony-I
> currently has? ISTM that it would make life a lot easier to use one
> combined project rather than two...

When you combine projects, you require the participants to participate
in the union of the complexity of the projects.  The project can't
generate releases unless they all coordinate a release, and if their
interests differ, that can be tough to do...  There are OSes we could
name where increasing sets of participants are having that very
effect...

If projects remain largely independent, they can limit themselves to
their respective individual sets of complexities.  That's precisely
why the PostgreSQL project is trying to push as many of the "contrib"
things out to outside projects as possible.

There's a famous saying about "sufficient to the day is the evil
thereof;" we might substitute "project" for "day" in that ;-).
--
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://linuxfinances.info/info/wp.html
"Whenever you  find that you  are on the  side of the majority,  it is
time to reform." -- Mark Twain

Re: Replicating databases

From
vishal saberwal
Date:
I thought pgreplicator did provide multi-master replication, and I did test it as well.

http://pgreplicator.sourceforge.net/

vish

On 11/3/05, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
On Wed, Nov 02, 2005 at 05:23:34PM -0600, Jim C. Nasby wrote:
> > It seems kludgey this way, though.  What you really need is
> > multimaster with conflict resolution, because you can depend on your
>
> Isn't Slony2 supposed to do just that?

Well, to the extent that slony 2 ever is going to do anything, our
original aim was multimaster replication in a single data centre.
Some folks have been working on various approaches that seem to be
bearing real bitter fruit, though, and I don't know how hopeful I am
these days for a general-purpose tool that will do that.

It's certainly a laudable goal, though, and if people want to work on
such a target, I'd sure like to know about it.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
"The year's penultimate month" is not in truth a good way of saying
November.
                --H.W. Fowler

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend