Re: PG replication across DataCenters - Mailing list pgsql-general

From Bill Moran
Subject Re: PG replication across DataCenters
Date
Msg-id 20131229081927.ee3d1638e65f0f8e1d35b509@potentialtech.com
Whole thread Raw
In response to Re: PG replication across DataCenters  (Sameer Kumar <sameer.kumar@ashnik.com>)
Responses Re: PG replication across DataCenters  (Sameer Kumar <sameer.kumar@ashnik.com>)
List pgsql-general
On Tue, 24 Dec 2013 14:39:42 +0800 Sameer Kumar <sameer.kumar@ashnik.com> wrote:
>
> * Cascading replication chains (a really big deal when you want
>
>   multiple slaves in the secondary facility and don't want to hog
>
>   your bandwidth)
>
> Really? which version of Postgres are we talking about? I think cascaded
> replication facility is available since v9.2
> http://www.postgresql.org/docs/9.2/static/warm-standby.html#CASCADING-REPLICATION

Nice.  Seems it's been longer than I realized since I last evaluated
what streaming replication is capble of.

> * Quick and easy movement of the master to any of the database in
>
>   the cluster without destroying replication.
>
> Again, which version? Re-mastering is made simple in v9.3.

I'm not seeing that in the documentation.  In fact, what I'm finding
seems to suggest the opposite: that each node's master is configured
in a config file, so in the case of a complicated replication setup,
I would have to run around editing config files on multiple servers
to move the master ... unless I'm missing something in the documentation.

> * Seeding of new slaves without interrupting existing nodes (assuming
>
>   your hardware has a little free capacity)
>
> AFAIK, streaming replication does not cause any interruption while you add
> a new node.

The process is still significantly more involved than Slony's subscription
commands.  In our lab setups, I've watched junior DBA's fail time and time
again to get a proper seed with streaming replication.

> In general I do not like trigger based (replication) solutions for huge
> clusters [this is my personal opinion and does not necessarily indicate my
> employer's opinion ;-)] and for databases which has huge write volume
> specifically if you do bulk insert/delete/update operations.

There are definitely drawbacks, I'll grant you that.  If it's raw throughput
you need, Slony probably isn't going to cut it for you.

> I think if it's slony or streaming replication will depend on below factors:
>
> 1) The change-set that you want to replicate contributes how much of your
> total change set? e.g. on a per minute basis if it's 70% or above, I will
> recommend you to go for streaming replication

While this is a strong argument in favor of streaming over Slony, the
70% number seems rather arbitrary, and you're advocating that this point
alone is enough to outweight the other advantages of Slony, which may be
more important in a particular case.

> 2) Do you have too many tables to be added to replication set? lets say
> above 70% of your total tables needs to be replication (unless rest 30%
> have high write operations), then go for streaming replication

Again, this seems arbitrary.  If the management that Slony provides is
needed, then why would I care what percentage of tables are involved?

> 3) Do you too many bulk operations happening on the tables which needs to
> be replicated

This is arguably a shortcoming of trigger-based replication that trumps just
about everything else.  If Slony just can't keep up, then you don't have much
choice.

> 4) To some extent your choice will be influenced by the motivation behind
> replication, DR, HA, reporting application (esp if you are particular about
> replicating only selective tables for reports)

In my experience, this is usually the largest factor.

Once argument in favor of streaming that you missed is when you have no
control over the schema (for example, when it's 3rd party, like an openfire
database).  In those cases, the application frequently omits things like
primary keys (which are required for slony) and has an upgrade process that
assumes it can change database tables without impacting anything else.

> There are few easier ways of managing a slony cluster:
>
> 1)
>
http://shoaibmir.wordpress.com/2009/08/05/setting-up-slony-cluster-with-perltools/<http://shoaibmir.wordpress.com/2009/08/05/setting-up-slony-cluster-with-perltools/>
>
> 2) I think even pgadmin supports slony replication (not sure if its slony-I
> or slony-II)

I'll add dbsteward to this list, as we wrote it (in part) to make slony
management easier on systems that experience frequent change.

--
Bill Moran <wmoran@potentialtech.com>


pgsql-general by date:

Previous
From: Shiv Sharma
Date:
Subject: Do all Postgres queries touch Shared_Buffers at some point?
Next
From: Michael Paquier
Date:
Subject: Re: Do all Postgres queries touch Shared_Buffers at some point?