Thread: Replicating databases
Hello,
Currently our company has a lot of small stores distributed around the country, and in the current database configuration we have a central database that all the stores access remotely.
All primary keys were designed with a column identifying the store that the row belongs to. In other words, only the owning store can update the row; other stores can read it, but the system was designed in such a way that stores cannot update information that does not belong to them.
The performance is not good because the line that connects each store to the central database is sometimes overloaded. We're thinking of replicating the central database to each store. Each store would be able to read all the information from its local database but should only update rows that belong to that store.
When a store needs to read information about other stores, it does not need to be current; a snapshot from the previous day is acceptable.
During the night, all the local store databases would be consolidated into a single database and replicated back to the stores. In the morning, when a store opens, its local database would contain updated, consolidated data.
I would appreciate suggestions about the best way to implement such a solution.
Slony-1? SQL scripts?
Thanks in advance!
Benkendorf
On Wed, Nov 02, 2005 at 12:06:36PM +0000, Carlos Benkendorf wrote:
> I would appreciate suggestions about the best way to implement
> such a solution.
>
> Slony-1? SQL scripts?

Maybe a combination. My natural inclination would be to try to do
this with some tricky views+rules so that each store could write into
its own table (then everybody could replicate, and in fact you could
have the other store data updated, but maybe not as fast as real
time). The problem is that in the central database, this is going to
turn out to be a big, nasty UNION if there are more than a handful of
stores.

But you could do this with some batch processing at night at each
store, such that you pulled local data into a special local table
(into which you'd write, depending on your local store id) and the
non-local table. Again, you could use a view with rules to allow
writing into these local tables. Then during the batch processing at
night, you could merge all these changes together and prepare special
sets to push out to the stores so that they could see everyone else's
day-old data.

It seems kludgey this way, though. What you really need is
multimaster with conflict resolution, because you can depend on your
application to cause no conflicts. Slony is designed to prevent you
from writing into the replicated tables. Some of the other
master-slave ones don't have that restriction, but they're sort of
dangerous for the same reason.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
The whole tendency of modern prose is away from concreteness.
    --George Orwell
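[Editor's note: the views+rules approach described above could be sketched roughly as below, at a single store. This is only an illustration; the table `orders`, its columns, and the local/remote split are invented names, and the rule syntax is the PostgreSQL CREATE RULE mechanism of that era.]

```sql
-- Hypothetical sketch of the view + rules approach at one store.
-- "orders_local" holds rows owned by this store (writable);
-- "orders_remote" holds day-old replicated data from all other stores.

CREATE TABLE orders_local (
    store_id  integer NOT NULL,
    order_id  integer NOT NULL,
    amount    numeric,
    PRIMARY KEY (store_id, order_id)
);

CREATE TABLE orders_remote (
    store_id  integer NOT NULL,
    order_id  integer NOT NULL,
    amount    numeric,
    PRIMARY KEY (store_id, order_id)
);

-- The application sees one table-like view over both:
CREATE VIEW orders AS
    SELECT * FROM orders_local
    UNION ALL
    SELECT * FROM orders_remote;

-- Writes through the view are redirected into the local table only,
-- so a store can never modify another store's rows:
CREATE RULE orders_ins AS ON INSERT TO orders DO INSTEAD
    INSERT INTO orders_local VALUES (NEW.store_id, NEW.order_id, NEW.amount);

CREATE RULE orders_upd AS ON UPDATE TO orders DO INSTEAD
    UPDATE orders_local
       SET amount = NEW.amount
     WHERE store_id = OLD.store_id
       AND order_id = OLD.order_id;
```

An UPDATE against a remote row simply matches nothing in `orders_local`, which enforces the ownership rule passively; whether that silent no-op is acceptable, or should instead raise an error, is an application decision.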
On Wed, Nov 02, 2005 at 05:45:40PM -0500, Andrew Sullivan wrote:
> On Wed, Nov 02, 2005 at 12:06:36PM +0000, Carlos Benkendorf wrote:
> > I would appreciate suggestions about the best way to implement
> > such a solution.
> >
> > Slony-1? SQL scripts?
>
> [...]
>
> It seems kludgey this way, though. What you really need is
> multimaster with conflict resolution, because you can depend on your

Isn't Slony2 supposed to do just that?

> application to cause no conflicts. Slony is designed to prevent you
> from writing into the replicated tables. Some of the other
> master-slave ones don't have that restriction, but they're sort of
> dangerous for the same reason.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Carlos,

What you are asking for is a multi-master replication scheme. Slony-I
is a single-master system, meaning that for each slony cluster only
one node can make changes to the data.

Without breaking slony's normal rules, I believe that there might be a
way to do it, though it will not be pretty. Basically you would create
a slony cluster for each store, which replicates store data back to
the central system. You will also have a master cluster that
replicates central data to all stores.

For each store you will create a slony cluster CS (i.e. for store 1
you create cluster C1, for store 2 cluster C2, etc). For the central
(master) database you will create a cluster CM that replicates to all
stores.

For each application table, T, you will do the following:

- create a table T_S at each source store S, and on the central database
- add T_S to the replication set for CS
- on the central db create a master table T_M
- on the central db, add triggers on T_S that copy all changes into the
  master table T_M (T_M will then contain the full set of data from
  all stores)
- add T_M to the replication set for cluster CM
- at each store create a view T that does
  select * from T_S union select * from T_M
- create DO INSTEAD rules on T that cause updates to be performed only
  on the underlying local table T_S
- at the central node create a view T that does select * from T_M
  (you don't need DO INSTEAD rules for this, as the data can only be
  updated at the stores)

So, for N stores you will have created N+1 slony clusters and N+1
distinct tables for each distributed table. This is horrible and a lot
of maintenance, but it might work if the number of stores is quite
small.

You should probably ask the question again on slony1-general. The
experts there may suggest a better solution. I have seen talk of
disabling the standard slony triggers to allow this sort of thing, but
whether that is more or less nasty is questionable.
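[Editor's note: the central-node trigger in the fourth step above could be sketched along these lines. The names `t_s1`, `t_m`, and the column list are placeholders invented for illustration; this assumes a PL/pgSQL function per replicated store table.]

```sql
-- Hypothetical sketch: on the central db, every change that replication
-- applies to the per-store table t_s1 (store 1's copy of table T) is
-- mirrored into the master table t_m, which accumulates all stores' data.
CREATE FUNCTION t_s1_to_t_m() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        DELETE FROM t_m
         WHERE store_id = OLD.store_id AND order_id = OLD.order_id;
        RETURN OLD;
    END IF;
    -- INSERT or UPDATE: replace any existing copy of the row in t_m
    DELETE FROM t_m
     WHERE store_id = NEW.store_id AND order_id = NEW.order_id;
    INSERT INTO t_m (store_id, order_id, amount)
         VALUES (NEW.store_id, NEW.order_id, NEW.amount);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_s1_copy
    AFTER INSERT OR UPDATE OR DELETE ON t_s1
    FOR EACH ROW EXECUTE PROCEDURE t_s1_to_t_m();
```

One such trigger would be needed per store per table, which is part of why the scheme scales so poorly in maintenance terms.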
Good luck
__
Marc

On Wed, 2005-11-02 at 12:18 -0400, pgsql-general-owner@postgresql.org wrote:
> Date: Wed, 2 Nov 2005 12:06:36 +0000 (GMT)
> From: Carlos Benkendorf <carlosbenkendorf@yahoo.com.br>
> To: pgsql-general@postgresql.org
> Subject: Replicating databases
> Message-ID: <20051102120637.58061.qmail@web35507.mail.mud.yahoo.com>
>
> [...]
Carlos Benkendorf schrieb:
> Hello,
>
> Currently our company has a lot of small stores distributed around the
> country and in the actual database configuration we have a central
> database and all the small stores accessing it remotely.

Hello Carlos,

There are a number of replication solutions available. To give you a
better answer, here are some questions that would give a more precise
view of your problem: How many tables and stores do you have? How is
your application developed (technology, etc.)? What is the size of
the tables, and how many entries change every day?

--
Best Regards / Viele Grüße

Sebastian Hennebrueder

----
http://www.laliluna.de

* Tutorials for JSP, JavaServer Faces, Struts, Hibernate and EJB
* Seminars and education at very fair prices
* Get professional support and consulting for these technologies
It doesn't sound to me like replication is the right answer to this
problem... You are setting yourself up to try to defeat one of the
major purposes of a database in a client-server system -- namely,
centralized storage.

If you add up all the money you are going to spend trying to manage
multiple copies of the same database, along with all of the
maintenance, support, and bandwidth costs, and the cost of not
allowing your end-users access to "real-time" data (like making bad
decisions based on aged data), I think you will agree that it could
end up being VERY expensive in the long term. The part of your plan
where you intend to synchronize all of the databases overnight is
still going to be a bottleneck.

A better alternative: put some money into upgrading your bandwidth --
especially at the PostgreSQL server end, not necessarily at each
location.

FWIW: I have a client with 472+ stores, each using a 56K (fractional
T1 pipe) connection to a central PostgreSQL server. They don't have
any major performance problems that I am aware of. If they did, I can
pretty much guarantee that distributing 472 copies of the database
would never be considered as a "solution" to improve performance.

"Carlos Benkendorf" <carlosbenkendorf@yahoo.com.br> wrote in message
news:20051102120637.58061.qmail@web35507.mail.mud.yahoo.com...
> [...]
On Wed, Nov 02, 2005 at 05:23:34PM -0600, Jim C. Nasby wrote:
> > It seems kludgey this way, though. What you really need is
> > multimaster with conflict resolution, because you can depend on your
>
> Isn't Slony2 supposed to do just that?

Well, to the extent that slony 2 ever is going to do anything, our
original aim was multimaster replication in a single data centre.
Some folks have been working on various approaches that seem to be
bearing real bitter fruit, though, and I don't know how hopeful I am
these days for a general-purpose tool that will do that.

It's certainly a laudable goal, though, and if people want to work on
such a target, I'd sure like to know about it.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
"The year's penultimate month" is not in truth a good way of saying
November.
    --H.W. Fowler
On Wed, Nov 02, 2005 at 03:44:26PM -0800, Marc Munro wrote:
> experts there may suggest a better solution. I have seen talk of
> disabling the standard slony triggers to allow this sort of thing but
> whether that is more or less nasty is questionable.

FWIW, I don't think that's the question; it's more like whether it'd
be merely horribly nasty or likely to break in unexpected and really
painful ways. ;-) But the discussion around that surely should move
to the Slony list.

--
Andrew Sullivan | ajs@crankycanuck.ca
This work was visionary and imaginative, and goes to show that
visionary and imaginative work need not end up well.
    --Dennis Ritchie
On Thu, Nov 03, 2005 at 08:16:01AM -0800, codeWarrior wrote:
> It doesn't sound to me like replication is the right answer to this
> problem... You are setting yourself up to try to defeat one of the
> major purposes of a database in a client-server system -- namely,
> centralized storage.

While I have a certain amount of sympathy for this view, it's often
the case that centralised storage isn't quite what you want. After
all, if always-fast is more important than always-right, we prefer
caches and such like. DNS is the obvious example there. And if
always-works is more important than always-fast or always-right, then
you have a very powerful incentive to keep things local.

That said, this case does sort of sound like money might be better
spent on improved communications than a humungous amount of work to
Rube up a Goldberg for getting all the data in every store. But maybe
we don't have the whole picture: maybe communications links aren't
stable in some of these stores, and can't be made so economically.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
The plural of anecdote is not data.
    --Roger Brinner
On Thu, Nov 03, 2005 at 04:49:33PM -0500, Andrew Sullivan wrote:
> On Wed, Nov 02, 2005 at 05:23:34PM -0600, Jim C. Nasby wrote:
> > Isn't Slony2 supposed to do just that?
>
> Well, to the extent that slony 2 ever is going to do anything, our
> original aim was multimaster replication in a single data centre.
> Some folks have been working on various approaches that seem to be
> bearing real bitter fruit, though, and I don't know how hopeful I am
> these days for a general-purpose tool that will do that.

How were you going about it such that it would matter if it was in a
local data center or not?

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
> How were you going about it such that it would matter if it was in a
> local data center or not?

The approach under examination bears "remarkable similarity" to the
Postgres-R work at McGill University. The notion is that transactions
are not permitted to commit until they are ready to commit on all of
the servers.

Normally, you'd expect that to be some form of 2 Phase Commit. There
are some unfortunate things about 2PC that it tries to avoid... The
cleverness is in trying to propagate lock requests as early as
possible in processing (whereas 2PC seems to push a lot of work right
to the END of the transaction).

In essence, this is a "fairly nearly synchronous" replication model.
The result of the "near synchronicity" is that it could only possibly
perform well if all "masters" are in the same local network. If you
introduce a far-away host that has 2 seconds of latency, that
introduces 2 seconds of latency to *every* transaction processed on
the system. Bye, bye, performance...

--
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://cbbrowne.com/info/
If we were meant to fly, we wouldn't keep losing our luggage.
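[Editor's note: the latency argument above can be made concrete with the explicit two-phase commit primitives that PostgreSQL 8.1 exposes. This is a sketch only; the `stock` table, its columns, and the transaction identifier are invented, and it assumes `max_prepared_transactions` is set above zero.]

```sql
-- Hypothetical two-phase commit, as a coordinator would drive it
-- against each participating node (PostgreSQL 8.1+).

-- Phase one, run on every node: do the work, then make it durable
-- without making it visible.
BEGIN;
UPDATE stock SET qty = qty - 1 WHERE store_id = 7 AND item_id = 42;
PREPARE TRANSACTION 'store7-tx-1001';

-- Phase two, run only after EVERY node has prepared successfully:
COMMIT PREPARED 'store7-tx-1001';

-- If any node failed to prepare, the others abort instead:
-- ROLLBACK PREPARED 'store7-tx-1001';
```

Since phase two cannot start until the slowest node finishes phase one, every transaction pays at least one round trip to the most distant master, which is exactly why a 2-second-latency host drags down the whole cluster.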
> On Wed, Nov 02, 2005 at 03:44:26PM -0800, Marc Munro wrote:
>> experts there may suggest a better solution. I have seen talk of
>> disabling the standard slony triggers to allow this sort of thing but
>> whether that is more or less nasty is questionable.
>
> FWIW, I don't think that's the question; it's more like whether it'd
> be merely horribly nasty or likely to break in unexpected and really
> painful ways. ;-) But the discussion around that surely should move
> to the Slony list.

It seems to me that lots of the "stuff" in Slony-I could be reapplied
to _try_ to create an asynchronous multimaster replication system.

A *major* addition would need to be some form of "conflicts queue."
That's the sort of thing they have in the analogous "O-word"
replication system.

What's a non-starter is to try to reshape the Slony-I project into
"async multimaster." That would get considerable push-back :-). But
if someone decided to "fork" their own *new* project, perhaps starting
based on one of the releases, that would be an entirely interesting
idea.

--
output = reverse("moc.enworbbc" "@" "enworbbc")
http://cbbrowne.com/info/languages.html
To quote from a friend's conference talk: "they told me that their
network was physically secure, so I asked them `then what's with all
these do-not-leave-valuables-in-your-desk signs?'".
    -- Henry Spencer
On Thu, Nov 03, 2005 at 10:29:56PM -0500, Christopher Browne wrote:
> It seems to me that lots of the "stuff" in Slony-I could be reapplied
> to _try_ to create an asynchronous multimaster replication system.
>
> A *major* addition would need to be some form of "conflicts queue."
> That's the sort of thing they have in the analogous "O-word"
> replication system.
>
> What's a non-starter is to try to reshape the Slony-I project into
> "async multimaster." That would get considerable push-back :-). But
> if someone decided to "fork" their own *new* project, perhaps
> starting based on one of the releases, that would be an entirely
> interesting idea.

Wouldn't async multimaster make use of most all of what Slony-I
currently has? ISTM that it would make life a lot easier to use one
combined project rather than two...

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
>> But if someone decided to "fork" their own *new* project, perhaps
>> starting based on one of the releases, that would be an entirely
>> interesting idea.
>
> Wouldn't async multimaster make use of most all of what Slony-I
> currently has? ISTM that it would make life a lot easier to use one
> combined project rather than two...

When you combine projects, you require the participants to participate
in the union of the complexity of the projects. The project can't
generate releases unless they all coordinate a release, and if their
interests differ, that can be tough to do... There are OSes we could
name where increasing sets of participants are having that very
effect...

If projects remain largely independent, they can limit themselves to
their respective individual sets of complexities. That's precisely why
the PostgreSQL project is trying to push as many of the "contrib"
things out to outside projects as possible.

There's a famous saying that "sufficient unto the day is the evil
thereof"; we might substitute "project" for "day" in that ;-).

--
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://linuxfinances.info/info/wp.html
"Whenever you find that you are on the side of the majority, it is
time to reform."
    -- Mark Twain
I thought pgreplicator did provide multi-master replication, and I did test it as well.
http://pgreplicator.sourceforge.net/
vish
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend