Thread: peer-to-peer replication with Postgres
I'm considering using a cloud hosting solution for my website. It will probably be either Amazon, Rackspace or Hosting.com; I'm still comparing. Either way, my site will consist of multiple virtual server instances that I can create and destroy as needed. Each virtual machine instance will be self-contained, meaning it'll run the website and its own instance of Postgres. The website will only talk to the local DB instance. However, if I'm running several machine instances, I want all the databases to stay in sync, preferably with as little lag as possible.

This is not a master/slave replication setup where there's one big DB that's always up and everything syncs to it; this is basically total peer-to-peer replication, where any time data is updated on one server, an update command gets sent to all the other servers. I'd also have to address provisioning: when I bring up a new virtual server, I'd need to import the current data into its DB seamlessly.

What's the best way to do this? Looks like something like pgPool might be what I want, but I haven't looked into it deeply yet. Thanks!!

Mike
On Sat, May 8, 2010 at 12:12 AM, Mike Christensen <mike@kitchenpc.com> wrote:
> What's the best way to do this? Looks like something like pgPool
> might be what I want, but I haven't looked into it deeply yet.

I don't think your requirement and Postgres are consistent with each other. Unless your data volume is *so* tiny that copying it takes just a few seconds, this concept just won't work. Besides that, I don't think there is a master-master solution that doesn't impose a lot of overhead and still deals gracefully with nodes disappearing and appearing arbitrarily.
On Fri, May 7, 2010 at 10:12 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> I'm considering using a cloud hosting solution for my website. It
> will probably be either Amazon, Rackspace or Hosting.com. I'm still
> comparing. Either way, my site will consist of multiple virtual
> server instances that I can create and destroy as needed. Each
> virtual machine instance will be self contained, meaning it'll run the
> website and its own instance of postgres. The website will only talk
> to the local DB instance. However, if I'm running several machine
> instances, I want all the databases to keep in sync preferably with as
> little lag as possible.
>
> This is not a master/slave replication issue where there's one big DB
> that's always up and everything syncs to, this is basically total
> peer-to-peer replication where any time data is updated on one server,
> an update command gets sent to all the other servers. I would also
> have to address the issue when I provision a new virtual server, I'd
> have to import the current data into the DB seamlessly.
>
> What's the best way to do this?

I think right now you're stuck coding it up yourself. No small task.

> Looks like something like pgPool
> might be what I want, but I haven't looked into it deeply yet.
> Thanks!!

The only thing that gets close is Bucardo.
Thanks for the advice. In that case, I'll stick with the standard approach of having a single SQL server and several web frontends, and employ a caching mechanism such as memcache as well. Thanks!

Mike

On Mon, May 10, 2010 at 9:30 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Fri, May 7, 2010 at 10:12 PM, Mike Christensen <mike@kitchenpc.com> wrote:
>> I'm considering using a cloud hosting solution for my website. It
>> will probably be either Amazon, Rackspace or Hosting.com. I'm still
>> comparing. Either way, my site will consist of multiple virtual
>> server instances that I can create and destroy as needed. Each
>> virtual machine instance will be self contained, meaning it'll run the
>> website and its own instance of postgres. The website will only talk
>> to the local DB instance. However, if I'm running several machine
>> instances, I want all the databases to keep in sync preferably with as
>> little lag as possible.
>>
>> This is not a master/slave replication issue where there's one big DB
>> that's always up and everything syncs to, this is basically total
>> peer-to-peer replication where any time data is updated on one server,
>> an update command gets sent to all the other servers. I would also
>> have to address the issue when I provision a new virtual server, I'd
>> have to import the current data into the DB seamlessly.
>>
>> What's the best way to do this?
>
> I think right now you're stuck coding it up yourself. No small task.
>
>> Looks like something like pgPool
>> might be what I want, but I haven't looked into it deeply yet.
>> Thanks!!
>
> The only thing that gets close is bucardo.
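The memcache approach mentioned above is usually implemented as "cache-aside": reads check the cache first and fall back to the database on a miss; writes go to the database and invalidate the cached entry. A minimal sketch, assuming nothing beyond the pattern itself; the class name is invented, and a plain dict stands in for a real memcached client (which would expose similar get/set/delete calls):

```python
# Cache-aside sketch. A plain dict stands in for a memcached client;
# the load/save callables stand in for real database queries.

class CacheAsideStore:
    def __init__(self, cache, load_from_db, save_to_db):
        self.cache = cache                # dict here; memcached client in practice
        self.load_from_db = load_from_db  # called on a cache miss
        self.save_to_db = save_to_db      # called on every write

    def get(self, key):
        value = self.cache.get(key)
        if value is None:                 # miss: hit the database
            value = self.load_from_db(key)
            self.cache[key] = value       # populate for subsequent reads
        return value

    def put(self, key, value):
        self.save_to_db(key, value)       # write goes to the database first
        self.cache.pop(key, None)         # then invalidate the stale cache entry


if __name__ == "__main__":
    db = {"recipe:1": "pancakes"}
    store = CacheAsideStore({}, db.get, db.__setitem__)
    print(store.get("recipe:1"))  # loaded from the "database" on a miss
    store.put("recipe:1", "waffles")
    print(store.get("recipe:1"))  # re-loaded after invalidation
```

The invalidate-on-write step is what keeps the frontends from serving stale data between writes.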
On Mon, May 10, 2010 at 7:04 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> Thanks for the advice. In that case, I'll stick with the standard
> approach of having a single SQL server and several web frontends and
> employ a caching mechanism such as memcache as well. Thanks!

And with 9.0 it will be pretty easy to set up hot-read PITR slaves, so you can build a pretty simple failover system.
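For reference, the 9.0 setup being described (hot standby fed by streaming replication) comes down to a handful of settings. A minimal sketch; hostnames and the pg_hba.conf network are placeholders, and a real setup would use password authentication and a base backup to seed the standby:

```
# --- master: postgresql.conf (PostgreSQL 9.0) ---
wal_level = hot_standby          # write enough WAL for a hot standby
max_wal_senders = 3              # allow a few streaming standbys

# --- master: pg_hba.conf (let the standby connect for replication) ---
host  replication  postgres  192.168.0.0/24  md5

# --- standby: postgresql.conf ---
hot_standby = on                 # allow read-only queries during recovery

# --- standby: recovery.conf ---
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432 user=postgres'
```

The standby then serves read-only queries while continuously replaying WAL from the master.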
On Mon, May 10, 2010 at 7:21 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> Man that sounds awesome. I need that now. So does that mean you'd
> have one beefy SQL server for all the updates and everything writes to
> that, and then you'd have a bunch of read-only servers and new data
> trickles into them from the master continuously?

Yep. You can also do something similar, but less efficient, right now with Slony or some other replication engine. But they're less simple to set up and usually less efficient than log shipping.
The concept of updating one database and doing all your reads from another database is kinda confusing to me. Does that mean you have to design your whole app around that concept, and have a different connection string and whatnot for your "writable" database and "read-only" databases? I'm using Castle ActiveRecord, which I'm not even sure supports that (without a ton of custom code, anyway).

Is there any sort of abstraction layer (like at the driver level) that can abstract that away and just make updates go to one DB and reads round-robin to other DBs? Hopefully there's a way to make this design simple to implement.

Mike

On Mon, May 10, 2010 at 6:23 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Mon, May 10, 2010 at 7:21 PM, Mike Christensen <mike@kitchenpc.com> wrote:
>> Man that sounds awesome. I need that now. So does that mean you'd
>> have one beefy SQL server for all the updates and everything writes to
>> that, and then you'd have a bunch of read-only servers and new data
>> trickles into them from the master continuously?
>
> Yep. You can also do something similar but less efficient now with
> slony or some other replication engine. But they're less simple to
> set up and usually less efficient than log shipping.
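Lacking driver support, one way to approximate this at the application level is a small connection router: writes always get the primary's connection string, and reads rotate across the replicas. A sketch of the idea; the class name and DSNs are invented, and a real version would hand back actual driver connections rather than strings:

```python
import itertools


class ConnectionRouter:
    """Route writes to the primary; round-robin reads across replicas."""

    def __init__(self, primary_dsn, replica_dsns):
        self.primary_dsn = primary_dsn
        # cycle() yields an endless round-robin over the read-only replicas
        self._replicas = itertools.cycle(replica_dsns)

    def for_write(self):
        return self.primary_dsn

    def for_read(self):
        return next(self._replicas)


router = ConnectionRouter(
    "host=master dbname=kitchenpc",
    ["host=slave1 dbname=kitchenpc", "host=slave2 dbname=kitchenpc"],
)
print(router.for_read())   # host=slave1 dbname=kitchenpc
print(router.for_read())   # host=slave2 dbname=kitchenpc
print(router.for_write())  # host=master dbname=kitchenpc
```

The catch, raised below in the thread, is that the application has to know up front whether a transaction will write; a router like this can't retroactively move a read-only transaction that later issues an UPDATE.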
On Mon, May 10, 2010 at 8:00 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> The concept of updating one database and doing all your reads from
> another database is kinda confusing to me. Does that mean you have to
> design your whole app around that concept, have a different connection
> string and what not for your "writable" database and "read-only
> databases"? I'm using Castle ActiveRecord which I'm not even sure
> supports that (without a ton of custom code anyway).
>
> Is there any sort of abstraction layer (like in the driver level) that
> can abstract that and just make updates go to one DB and reads
> round-robin to other DBs? Hopefully there's a way to make this design
> simple to implement.

Pretty sure pgpool can do the "read from these dbs, write to this one".
Man that sounds awesome. I need that now. So does that mean you'd have one beefy SQL server for all the updates and everything writes to that, and then you'd have a bunch of read-only servers and new data trickles into them from the master continuously?

Mike

On Mon, May 10, 2010 at 6:09 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Mon, May 10, 2010 at 7:04 PM, Mike Christensen <mike@kitchenpc.com> wrote:
>> Thanks for the advice. In that case, I'll stick with the standard
>> approach of having a single SQL server and several web frontends and
>> employ a caching mechanism such as memcache as well. Thanks!
>
> And with 9.0 it will be pretty easy to setup hot read PITR slaves so
> you can build a pretty simple failover system.
Scott Marlowe wrote:
>> Is there any sort of abstraction layer (like in the driver level) that
>> can abstract that and just make updates go to one DB and reads
>> round-robin to other DBs? Hopefully there's a way to make this design
>> simple to implement.
>
> Pretty sure pgpool can do the "read from these dbs, write to this one".

How would it know if you're going to do updates later on in a transaction?
On Mon, May 10, 2010 at 8:59 PM, John R Pierce <pierce@hogranch.com> wrote:
> Scott Marlowe wrote:
>>> Is there any sort of abstraction layer (like in the driver level) that
>>> can abstract that and just make updates go to one DB and reads
>>> round-robin to other DBs? Hopefully there's a way to make this design
>>> simple to implement.
>>
>> Pretty sure pgpool can do the "read from these dbs, write to this one".
>
> how would it know if you're going to do updates later on in a transaction?

Go here:

http://pgpool.projects.postgresql.org/pgpool-II/doc/pgpool-en.html

and look for replicate_select.
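Concretely, the relevant knobs in pgpool-II's configuration look roughly like this; this is an illustrative fragment from memory, with placeholder hostnames, so check the linked documentation for the exact semantics of each setting:

```
# pgpool.conf (pgpool-II) -- illustrative fragment
master_slave_mode = true       # writes go only to the master backend
load_balance_mode = true       # spread SELECTs across backends
replicate_select  = false      # don't send SELECTs to every node;
                               # let them be load-balanced instead

# backends to balance across (backend 0 is the master)
backend_hostname0 = 'master.example.com'
backend_port0     = 5432
backend_weight0   = 1
backend_hostname1 = 'slave.example.com'
backend_port1     = 5432
backend_weight1   = 1
```

pgpool parses each statement, so SELECTs can be dispatched to a replica while anything that writes is routed to the master, which is how it sidesteps the "what if the transaction writes later" problem for autocommit queries.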
On 10/05/10 23:20, Vick Khera wrote:
> On Sat, May 8, 2010 at 12:12 AM, Mike Christensen <mike@kitchenpc.com> wrote:
>> What's the best way to do this? Looks like something like pgPool
>> might be what I want, but I haven't looked into it deeply yet.
>
> I don't think your requirement and postgres are consistent with each
> other.

Seconded. I think the OP needs to look into "eventually consistent", horizontally scalable databases that are designed for this use case, but which don't usually offer an SQL interface or ACID semantics.

--
Craig Ringer
Tech-related writing: http://soapyfrogs.blogspot.com/