Thread: peer-to-peer replication with Postgres
I'm considering using a cloud hosting solution for my website. It will probably be either Amazon, Rackspace or Hosting.com; I'm still comparing. Either way, my site will consist of multiple virtual server instances that I can create and destroy as needed. Each virtual machine instance will be self-contained, meaning it'll run the website and its own instance of Postgres. The website will only talk to the local DB instance. However, if I'm running several machine instances, I want all the databases to stay in sync, preferably with as little lag as possible.

This is not a master/slave replication setup where there's one big DB that's always up and everything syncs to it; this is basically total peer-to-peer replication, where any time data is updated on one server, an update command gets sent to all the other servers. I'd also have to address provisioning: when I bring up a new virtual server, I'd need to import the current data into its DB seamlessly.

What's the best way to do this? Looks like something like pgPool might be what I want, but I haven't looked into it deeply yet. Thanks!!

Mike
On Sat, May 8, 2010 at 12:12 AM, Mike Christensen <mike@kitchenpc.com> wrote:
> What's the best way to do this? Looks like something like pgPool
> might be what I want, but I haven't looked into it deeply yet.

I don't think your requirement and Postgres are consistent with each other. Unless your data volume is *so* tiny that copying it takes just a few seconds, this concept just won't work. Besides that, I don't think there is a master-master solution that doesn't impose a lot of overhead and still deals gracefully with nodes disappearing and appearing arbitrarily.
On Fri, May 7, 2010 at 10:12 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> I'm considering using a cloud hosting solution for my website. It
> will probably be either Amazon, Rackspace or Hosting.com. I'm still
> comparing. Either way, my site will consist of multiple virtual
> server instances that I can create and destroy as needed. Each
> virtual machine instance will be self contained, meaning it'll run the
> website and its own instance of postgres. The website will only talk
> to the local DB instance. However, if I'm running several machine
> instances, I want all the databases to keep in sync preferably with as
> little lag as possible.
>
> This is not a master/slave replication issue where there's one big DB
> that's always up and everything syncs to, this is basically total
> peer-to-peer replication where any time data is updated on one server,
> an update command gets sent to all the other servers. I would also
> have to address the issue when I provision a new virtual server, I'd
> have to import the current data into the DB seamlessly.
>
> What's the best way to do this?

I think right now you're stuck coding it up yourself. No small task.

> Looks like something like pgPool
> might be what I want, but I haven't looked into it deeply yet.
> Thanks!!

The only thing that gets close is Bucardo.
Thanks for the advice. In that case, I'll stick with the standard approach of having a single SQL server and several web frontends, and employ a caching mechanism such as memcache as well. Thanks!

Mike

On Mon, May 10, 2010 at 9:30 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Fri, May 7, 2010 at 10:12 PM, Mike Christensen <mike@kitchenpc.com> wrote:
>> I'm considering using a cloud hosting solution for my website. It
>> will probably be either Amazon, Rackspace or Hosting.com. I'm still
>> comparing. Either way, my site will consist of multiple virtual
>> server instances that I can create and destroy as needed. Each
>> virtual machine instance will be self contained, meaning it'll run the
>> website and its own instance of postgres. The website will only talk
>> to the local DB instance. However, if I'm running several machine
>> instances, I want all the databases to keep in sync preferably with as
>> little lag as possible.
>>
>> This is not a master/slave replication issue where there's one big DB
>> that's always up and everything syncs to, this is basically total
>> peer-to-peer replication where any time data is updated on one server,
>> an update command gets sent to all the other servers. I would also
>> have to address the issue when I provision a new virtual server, I'd
>> have to import the current data into the DB seamlessly.
>>
>> What's the best way to do this?
>
> I think right now you're stuck coding it up yourself. No small task.
>
>> Looks like something like pgPool
>> might be what I want, but I haven't looked into it deeply yet.
>> Thanks!!
>
> The only thing that gets close is bucardo.
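The memcache approach mentioned above is usually implemented as "cache-aside": reads check the cache first and fall back to the database on a miss; writes go to the database and invalidate the cached entry. A minimal sketch, assuming nothing beyond the pattern itself; the class name is invented, and a plain dict stands in for a real memcached client (which would expose similar get/set/delete calls):

```python
# Cache-aside sketch. A plain dict stands in for a memcached client;
# the load/save callables stand in for real database queries.

class CacheAsideStore:
    def __init__(self, cache, load_from_db, save_to_db):
        self.cache = cache                # dict here; memcached client in practice
        self.load_from_db = load_from_db  # called on a cache miss
        self.save_to_db = save_to_db      # called on every write

    def get(self, key):
        value = self.cache.get(key)
        if value is None:                 # miss: hit the database
            value = self.load_from_db(key)
            self.cache[key] = value       # populate for subsequent reads
        return value

    def put(self, key, value):
        self.save_to_db(key, value)       # write goes to the database first
        self.cache.pop(key, None)         # then invalidate the stale cache entry


if __name__ == "__main__":
    db = {"recipe:1": "pancakes"}
    store = CacheAsideStore({}, db.get, db.__setitem__)
    print(store.get("recipe:1"))  # loaded from the "database" on a miss
    store.put("recipe:1", "waffles")
    print(store.get("recipe:1"))  # re-loaded after invalidation
```

The invalidate-on-write step is what keeps the frontends from serving stale data between writes.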
On Mon, May 10, 2010 at 7:04 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> Thanks for the advice. In that case, I'll stick with the standard
> approach of having a single SQL server and several web frontends and
> employ a caching mechanism such as memcache as well. Thanks!

And with 9.0 it will be pretty easy to set up hot-read PITR slaves, so you can build a pretty simple failover system.
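For reference, the 9.0 setup being described (hot standby fed by streaming replication) comes down to a handful of settings. A minimal sketch; hostnames and the pg_hba.conf network are placeholders, and a real setup would use password authentication and a base backup to seed the standby:

```
# --- master: postgresql.conf (PostgreSQL 9.0) ---
wal_level = hot_standby          # write enough WAL for a hot standby
max_wal_senders = 3              # allow a few streaming standbys

# --- master: pg_hba.conf (let the standby connect for replication) ---
host  replication  postgres  192.168.0.0/24  md5

# --- standby: postgresql.conf ---
hot_standby = on                 # allow read-only queries during recovery

# --- standby: recovery.conf ---
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432 user=postgres'
```

The standby then serves read-only queries while continuously replaying WAL from the master.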
On Mon, May 10, 2010 at 7:21 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> Man that sounds awesome. I need that now. So does that mean you'd
> have one beefy SQL server for all the updates and everything writes to
> that, and then you'd have a bunch of read-only servers and new data
> trickles into them from the master continuously?

Yep. You can also do something similar, but less efficient, right now with Slony or some other replication engine. But they're less simple to set up and usually less efficient than log shipping.
The concept of updating one database and doing all your reads from another database is kinda confusing to me. Does that mean you have to design your whole app around that concept, and have a different connection string and whatnot for your "writable" database and "read-only" databases? I'm using Castle ActiveRecord, which I'm not even sure supports that (without a ton of custom code, anyway).

Is there any sort of abstraction layer (like at the driver level) that can abstract that away and just make updates go to one DB and reads round-robin to other DBs? Hopefully there's a way to make this design simple to implement.

Mike

On Mon, May 10, 2010 at 6:23 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Mon, May 10, 2010 at 7:21 PM, Mike Christensen <mike@kitchenpc.com> wrote:
>> Man that sounds awesome. I need that now. So does that mean you'd
>> have one beefy SQL server for all the updates and everything writes to
>> that, and then you'd have a bunch of read-only servers and new data
>> trickles into them from the master continuously?
>
> Yep. You can also do something similar but less efficient now with
> slony or some other replication engine. But they're less simple to
> set up and usually less efficient than log shipping.
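Lacking driver support, one way to approximate this at the application level is a small connection router: writes always get the primary's connection string, and reads rotate across the replicas. A sketch of the idea; the class name and DSNs are invented, and a real version would hand back actual driver connections rather than strings:

```python
import itertools


class ConnectionRouter:
    """Route writes to the primary; round-robin reads across replicas."""

    def __init__(self, primary_dsn, replica_dsns):
        self.primary_dsn = primary_dsn
        # cycle() yields an endless round-robin over the read-only replicas
        self._replicas = itertools.cycle(replica_dsns)

    def for_write(self):
        return self.primary_dsn

    def for_read(self):
        return next(self._replicas)


router = ConnectionRouter(
    "host=master dbname=kitchenpc",
    ["host=slave1 dbname=kitchenpc", "host=slave2 dbname=kitchenpc"],
)
print(router.for_read())   # host=slave1 dbname=kitchenpc
print(router.for_read())   # host=slave2 dbname=kitchenpc
print(router.for_write())  # host=master dbname=kitchenpc
```

The catch, raised below in the thread, is that the application has to know up front whether a transaction will write; a router like this can't retroactively move a read-only transaction that later issues an UPDATE.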
On Mon, May 10, 2010 at 8:00 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> The concept of updating one database and doing all your reads from
> another database is kinda confusing to me. Does that mean you have to
> design your whole app around that concept, have a different connection
> string and what not for your "writable" database and "read-only
> databases"? I'm using Castle ActiveRecord which I'm not even sure
> supports that (without a ton of custom code anyway).
>
> Is there any sort of abstraction layer (like in the driver level) that
> can abstract that and just make updates go to one DB and reads
> round-robin to other DBs? Hopefully there's a way to make this design
> simple to implement.

Pretty sure pgpool can do the "read from these dbs, write to this one".
Man that sounds awesome. I need that now. So does that mean you'd have one beefy SQL server for all the updates and everything writes to that, and then you'd have a bunch of read-only servers and new data trickles into them from the master continuously?

Mike

On Mon, May 10, 2010 at 6:09 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Mon, May 10, 2010 at 7:04 PM, Mike Christensen <mike@kitchenpc.com> wrote:
>> Thanks for the advice. In that case, I'll stick with the standard
>> approach of having a single SQL server and several web frontends and
>> employ a caching mechanism such as memcache as well. Thanks!
>
> And with 9.0 it will be pretty easy to setup hot read PITR slaves so
> you can build a pretty simple failover system.
Scott Marlowe wrote:
>> Is there any sort of abstraction layer (like in the driver level) that
>> can abstract that and just make updates go to one DB and reads
>> round-robin to other DBs? Hopefully there's a way to make this design
>> simple to implement.
>
> Pretty sure pgpool can do the "read from these dbs, write to this one".

How would it know if you're going to do updates later on in a transaction?
On Mon, May 10, 2010 at 8:59 PM, John R Pierce <pierce@hogranch.com> wrote:
> Scott Marlowe wrote:
>>> Is there any sort of abstraction layer (like in the driver level) that
>>> can abstract that and just make updates go to one DB and reads
>>> round-robin to other DBs? Hopefully there's a way to make this design
>>> simple to implement.
>>
>> Pretty sure pgpool can do the "read from these dbs, write to this one".
>
> how would it know if you're going to do updates later on in a transaction?

Go here:

http://pgpool.projects.postgresql.org/pgpool-II/doc/pgpool-en.html

and look for replicate_select.
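Concretely, the relevant knobs in pgpool-II's configuration look roughly like this; this is an illustrative fragment from memory, with placeholder hostnames, so check the linked documentation for the exact semantics of each setting:

```
# pgpool.conf (pgpool-II) -- illustrative fragment
master_slave_mode = true       # writes go only to the master backend
load_balance_mode = true       # spread SELECTs across backends
replicate_select  = false      # don't send SELECTs to every node;
                               # let them be load-balanced instead

# backends to balance across (backend 0 is the master)
backend_hostname0 = 'master.example.com'
backend_port0     = 5432
backend_weight0   = 1
backend_hostname1 = 'slave.example.com'
backend_port1     = 5432
backend_weight1   = 1
```

pgpool parses each statement, so SELECTs can be dispatched to a replica while anything that writes is routed to the master, which is how it sidesteps the "what if the transaction writes later" problem for autocommit queries.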
On 10/05/10 23:20, Vick Khera wrote:
> On Sat, May 8, 2010 at 12:12 AM, Mike Christensen <mike@kitchenpc.com> wrote:
>> What's the best way to do this? Looks like something like pgPool
>> might be what I want, but I haven't looked into it deeply yet.
>
> I don't think your requirement and postgres are consistent with each
> other.

Seconded. I think the OP needs to look into "eventually consistent", horizontally scalable databases that are designed for this use case, but which don't usually offer an SQL interface or ACID semantics.

--
Craig Ringer
Tech-related writing: http://soapyfrogs.blogspot.com/