Thread: replication from multiple "master" servers to a single read-only slave

replication from multiple "master" servers to a single read-only slave

From
Omar Mehmood
Date:
I'm wondering if it's possible to have a setup with multiple "master" servers replicating to a single slave.  I can
guarantee that each server will generate unique PK values for all tables and all the data is partitioned (logically by
server) across the servers.  I would simply like to have a read-only slave that is a picture of all the servers' data
(relatively up to date).  The individual "master" servers never need to know about each other's data (i.e. they do not
_need_ to "sync" with each other, nor do I want them to be sync'd).
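
For what it's worth, one common way to get that uniqueness guarantee is interleaved sequences; a rough sketch with
made-up table names (assuming 10 masters) would be something like this -- not necessarily my exact setup, just for
illustration:

  -- On master number 3 of 10: start at 3 and step by 10, so the ids this
  -- server generates can never collide with the other masters' ids.
  CREATE SEQUENCE orders_id_seq START WITH 3 INCREMENT BY 10;

  CREATE TABLE orders (
      id      bigint PRIMARY KEY DEFAULT nextval('orders_id_seq'),
      payload text
  );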

Would it be possible to use PostgreSQL PITR feature to support this functionality ?  All of the data
created/updated/deleted per server is unique to that server, so replaying the log to the slave should technically be
safe and the replaying logs from multiple servers should be safe as well (as long as the relative order of replay is
preserved).  I'm just wondering how to get around the numbering of the log (WAL) files and the slave's tracking of the
log files that it has already processed.

I can certainly write my own application log module that runs on each server, ship over the log to the slave machine
and replay the logs to the slave (in the meanwhile ensuring that the order of replay is preserved and all that good
stuff), but I'm trying to find a quick(er) solution for the short term.

Please note that I'd like to avoid using PostgreSQL data partitioning as well as any DB triggers (in case anyone was
going to go down that path as part of the solution).

Omar




Re: replication from multiple "master" servers to a single read-only slave

From
David Fetter
Date:
On Mon, Jan 11, 2010 at 03:02:18PM -0800, Omar Mehmood wrote:
> I'm wondering if it's possible to have a setup with multiple
> "master" servers replicating to a single slave.  I can guarantee
> that each server will generate unique PK values for all tables and
> all the data is partitioned (logically by server) across the
> servers.  I would simply like to have a read-only slave that is a
> picture of all the servers' data (relatively up to date).  The
> individual "master" servers never need to know about each other's
> data (i.e. they do not _need_ to "sync" with each other, nor do I
> want them to be sync'd).
>
> Would it be possible to use PostgreSQL PITR feature to support this
> functionality ?

No, but you could use something like Slony to do this.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: replication from multiple "master" servers to a single read-only slave

From
Omar Mehmood
Date:
Thanks for the suggestions.

I really don't want to use separate schemas for each master to logically partition the data.  I ensure that the data on
each master will not clash with each other (in terms of any DB level constraints such as PK), so I'd much prefer they
all reside in a single schema.  Also, my understanding is that Slony uses DB triggers to track changes (but I want to
avoid using DB triggers).

Another additional constraint-- the master servers may not always have connectivity to the slave machine, so the chosen
mechanism needs to be robust and not assume 100% uptime.

To keep it truly simple, I could do a periodic data dump of the database on each server, ship it to the slave, and
restore.  However, this is way overkill since there won't be that many changes in the data over the period between runs
(e.g. every 15-20 minutes).  Plus, I might end up in a situation where the slave starts to increasingly lag behind over
time (depending on the number of master servers and the amount of data).

I will check out Bucardo.

Omar

--- On Mon, 1/11/10, Ben Chobot <bench@silentmedia.com> wrote:

> From: Ben Chobot <bench@silentmedia.com>
> Subject: Re: [GENERAL] replication from multiple "master" servers to a single read-only slave
> To: "Omar Mehmood" <omarmehmood@yahoo.com>
> Date: Monday, January 11, 2010, 6:58 PM
> I'm pretty sure you can do this with
> Bucardo, and I *think* you can do it via Slony, if you're
> willing to use separate schemas for each master. I know you
> cannot do this with PITR.
>
> On Jan 11, 2010, at 4:02 PM, Omar Mehmood wrote:
>
> > I'm wondering if it's possible to have a setup with
> multiple "master" servers replicating to a single
> slave.  I can guarantee that each server will generate
> unique PK values for all tables and all the data is
> partitioned (logically by server) across the servers. 
> I would simply like to have a read-only slave that is a
> picture of all the servers' data (relatively up to
> date).  The individual "master" servers never need to
> know about each other's data (i.e. they do not _need_ to
> "sync" with each other, nor do I want them to be sync'd).
> >
> > Would it be possible to use PostgreSQL PITR feature to
> support this functionality ?  All of the data
> created/updated/deleted per server is unique to that server,
> so replaying the log to the slave should technically be safe
> and the replaying logs from multiple servers should be safe
> as well (as long as the relative order of replay is
> preserved).  I'm just wondering how to get around the
> numbering of the log (WAL) files and the slave's tracking of
> the log files that it has already processed.
> >
> > I can certainly write my own application log module
> that runs on each server, ship over the log to the slave
> machine and replay the logs to the slave (in the meanwhile
> ensuring that the order of replay is preserved and all that
> good stuff), but I'm trying to find a quick(er) solution for
> the short term.
> >
> > Please note that I'd like to avoid using PostgreSQL
> data partitioning as well as any DB triggers (in case anyone
> was going to go down that path as part of the solution).
> >
> > Omar
> >
> >
> >
> >
>
>




Re: replication from multiple "master" servers to a single read-only slave

From
Craig Ringer
Date:
Omar Mehmood wrote:
> Would it be possible to use PostgreSQL PITR feature to support this
> functionality ?  All of the data created/updated/deleted per server
> is unique to that server, so replaying the log to the slave should
> technically be safe and the replaying logs from multiple servers
> should be safe as well (as long as the relative order of replay is
> preserved).  I'm just wondering how to get around the numbering of
> the log (WAL) files and the slave's tracking of the log files that it
> has already processed.

No, it can't be done with PITR and WAL-shipping. The write-ahead logs
are at too low a level and rely on the block layout of the Pg cluster.
WAL-shipping only works where master and slave start out with the exact
same data directory contents, with all the same block layout, same oids
for tables/types/etc. Just having the same tuples in tables of the same
names is not sufficient.

Given that you can't even WAL-ship from a master to a slave created by
pg_restore from a dump of the master, you can probably see why
WAL-shipping from multiple masters absolutely cannot work.

For this, you need something higher level that replicates at the
tuple-change level. A trigger-based system like Slony or Bucardo is most
likely to fit your needs.

> Please note that I'd like to avoid using PostgreSQL data partitioning
> as well as any DB triggers (in case anyone was going to go down that
> path as part of the solution).

At present Pg doesn't offer safe and convenient C-level hooks for
replication systems to attach to in order to record tuple changes.
Recording of tuple changes for replication must be done with triggers.
As you need a tuple-change level replication system, you're pretty much
out of luck.
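
Just to illustrate the general shape of trigger-based capture -- this is NOT what Slony or Bucardo actually install,
just a toy sketch with invented names -- it boils down to something like:

  -- Toy change-capture sketch: every change on a replicated table is
  -- appended to a queue table, which a separate daemon would later ship
  -- to the slave and apply there.
  CREATE TABLE replication_log (
      id        bigserial PRIMARY KEY,
      tbl       text        NOT NULL,
      op        text        NOT NULL,          -- 'I', 'U' or 'D'
      row_data  text,
      logged_at timestamptz NOT NULL DEFAULT now()
  );

  CREATE OR REPLACE FUNCTION log_change() RETURNS trigger AS $$
  BEGIN
      IF TG_OP = 'DELETE' THEN
          INSERT INTO replication_log (tbl, op, row_data)
          VALUES (TG_TABLE_NAME, 'D', OLD::text);
          RETURN OLD;
      END IF;
      INSERT INTO replication_log (tbl, op, row_data)
      VALUES (TG_TABLE_NAME, substr(TG_OP, 1, 1), NEW::text);
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  -- 'orders' stands in for any table you replicate:
  CREATE TABLE orders (id bigint PRIMARY KEY, payload text);

  CREATE TRIGGER orders_capture
      AFTER INSERT OR UPDATE OR DELETE ON orders
      FOR EACH ROW EXECUTE PROCEDURE log_change();

The real systems are far more careful about ordering, transactions and apply-side conflicts, but that's the level they
work at -- which is also why the triggers are unavoidable today.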

--
Craig Ringer

Re: replication from multiple "master" servers to a single read-only slave

From
Dimitri Fontaine
Date:
Omar Mehmood <omarmehmood@yahoo.com> writes:
> I really don't want to use separate schemas for each master to
> logically partition the data.  I ensure that the data on each master
> will not clash with each other (in terms of any DB level constraints
> such as PK), so I'd much prefer they all reside in a single schema.
> Also, my understanding is that Slony uses DB triggers to track changes
> (but I want to avoid using DB triggers).

If you want to avoid all PostgreSQL features… well I don't see that I
can help you here.

If you were to change your mind the following document talks about how
to federate data from several databases to the same central one, and
using either inheritance or triggers to move the incoming data from the
N origin schemas to the central one.

I guess the question of how the data becomes available for your central queries is not solved by refusing to use a
schema per origin server.

  http://wiki.postgresql.org/wiki/Londiste_Tutorial#Federated_database
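
To make the inheritance variant concrete (names invented here, not lifted from the tutorial), the central server would
look roughly like this:

  -- One schema per origin server; each origin's table inherits from a
  -- single central table, so reads against the parent see every row.
  CREATE SCHEMA origin_a;
  CREATE SCHEMA origin_b;

  CREATE TABLE public.orders (
      id      bigint PRIMARY KEY,
      payload text
  );

  CREATE TABLE origin_a.orders () INHERITS (public.orders);
  CREATE TABLE origin_b.orders () INHERITS (public.orders);

  -- Replication from each master targets its own origin_*.orders table;
  -- the read-only queries simply go through the parent:
  SELECT count(*) FROM public.orders;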

> Another additional constraint-- the master servers may not always have
> connectivity to the slave machine, so the chosen mechanism needs to be
> robust and not assume 100% uptime.

Londiste fits this need.

Regards,
--
dim

Re: replication from multiple "master" servers to a single read-only slave

From
Joshua Tolley
Date:
On Mon, Jan 11, 2010 at 04:18:30PM -0800, Omar Mehmood wrote:
> (but I want to avoid using DB triggers).

<snip>

> I will check out Bucardo.

Bucardo uses triggers just like Slony does. That said, it seems strange that
you'd want to avoid them. Is there any particular reason you want to avoid
them?

Bucardo should handle the disconnection problems you described just fine.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com


Re: replication from multiple "master" servers to a single read-only slave

From
Omar Mehmood
Date:
Thanks for the note.  I'm trying to avoid triggers based on past trauma from walking into a client setup that used a
zillion of them and having to wait on the order of seconds for a simple data-modifying DML statement to execute :)
I've been looking for an alternative that either sits above the database level (without building my own) or hoping
there is a binary logging option.  Anyway, I've been reading up on triggers, replication and synchronization options
for PostgreSQL, and it seems like the performance impact isn't too bad.  In particular, I've been reading up on Bucardo
and it seems to address all my needs -- I also fired off an email to Selena D about it and she has affirmed that it
looks like it will solve my problem.  I'm going to set up an environment for testing and then post my questions
directly to the bucardo-users mailing list.

Omar

--- On Tue, 1/12/10, Joshua Tolley <eggyknap@gmail.com> wrote:

> From: Joshua Tolley <eggyknap@gmail.com>
> Subject: Re: [GENERAL] replication from multiple "master" servers to a single read-only slave
> To: "Omar Mehmood" <omarmehmood@yahoo.com>
> Cc: "Ben Chobot" <bench@silentmedia.com>, pgsql-general@postgresql.org
> Date: Tuesday, January 12, 2010, 2:12 PM
> On Mon, Jan 11, 2010 at 04:18:30PM
> -0800, Omar Mehmood wrote:
> > (but I want to avoid using DB triggers).
>
> <snip>
>
> > I will check out Bucardo.
>
> Bucardo uses triggers just like Slony does. That said, it
> seems strange that
> you'd want to avoid them. Is there any particular reason
> you want to avoid
> them?
>
> Bucardo should handle the disconnection problems you
> described just fine.
>
> --
> Joshua Tolley / eggyknap
> End Point Corporation
> http://www.endpoint.com
>