Re: Very high latency, low bandwidth replication - Mailing list pgsql-general

From Bob Jolliffe
Subject Re: Very high latency, low bandwidth replication
Date
Msg-id CACd=f9cD6qGz8Z3=bcTtTr_opvkRjNtoL2ebfyJqOpzmSz37CA@mail.gmail.com
In response to Re: Very high latency, low bandwidth replication  (Francisco Olarte <folarte@peoplecall.com>)
Responses Re: Very high latency, low bandwidth replication
List pgsql-general
Thanks, Francisco, for these inputs.  I hadn't considered log shipping, as I knew I didn't want to track changes to all tables (and databases).  Setting up a local partial mirror is an interesting thought which hadn't crossed my mind; I'll give that some consideration.

Currently, though, I am thinking of addressing the problem of generating deltas at the application level, rather than using PostgreSQL features which are largely optimized for a slightly different set of circumstances and requirements.

Impressive what can be done with a 2400 baud modem when you set your mind to it.  Fortunately those days are mostly behind us :-)


On 30 June 2014 13:05, Francisco Olarte <folarte@peoplecall.com> wrote:
Hi Bob.

On Mon, Jun 30, 2014 at 10:05 AM, Bob Jolliffe <bobjolliffe@gmail.com> wrote:
> What are people's thoughts about a more optimal solution?  I would like to
> use a more incremental approach to replication.  This does not have to be a
> "live" replication .. asynchronously triggering once every 24 hours is
> sufficient.  Also there are only a subset of tables which are required (the
> rest consist of data which is generated).


If you only need to replicate once every 24 hours, which means you can
tolerate lag, you could try log shipping. Instead of sending the WAL
records from master to standby directly, just spool them, compress them
as much as you can (I would try pglesslog plus an XZ on its output),
and send them once a day. That covers the 'incremental' part. For the
subset of tables, you could try to set up a local partial mirror using
any of the trigger-based replication products and then do log shipping
of that mirror.
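
The spool-and-compress step of the log shipping suggestion above could look roughly like the following. This is only a sketch under assumptions: it supposes WAL segments have already been copied into a spool directory (for example by the server's archive_command), the function name and directory layout are hypothetical, and the pglesslog pass is not shown.

```python
import tarfile
from pathlib import Path


def bundle_spool(spool_dir, archive_path):
    """Pack every file in spool_dir into one xz-compressed tar archive.

    spool_dir and archive_path are hypothetical locations chosen by the
    caller. Returns the list of file names that were packed, in sorted
    order, so the caller can remove them after a successful transfer.
    """
    files = sorted(p for p in Path(spool_dir).iterdir() if p.is_file())
    with tarfile.open(str(archive_path), "w:xz") as tar:
        for p in files:
            # Store only the base name so the archive unpacks flat.
            tar.add(str(p), arcname=p.name)
    return [p.name for p in files]
```

A daily job would call bundle_spool once, send the resulting archive over the slow link, and only then delete the spooled segments it returned.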

Also, the logical replication slot stuff added to the latest version
seems really promising for this kind of thing, but I'm not familiar
enough with it to recommend anything.

Also, depending on your data update patterns, database sizes and
other factors, a trigger-based replication approach can save a lot of
traffic. I mean, if you have records which are heavily updated but
only replicate once a day, you can collapse all of the day's changes
into a single update. I once did a similar thing to transmit deltas
over a 2400bps modem by making daily sorted dumps and sending daily
deltas against the previous day's dump (it needed a bit of coding,
about a couple hundred lines, but produced ridiculously small deltas,
and with a bit of care their application was idempotent, which
simplified recovery from errors).
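
The sorted-dump delta idea described above can be illustrated with a minimal sketch. This is not the original code from that project; it assumes each dump is a list of (key, value) rows sorted by a unique key, and the function names are hypothetical. The delta carries final values rather than individual changes, so many updates to one row during the day collapse into a single entry, and applying the same delta twice gives the same result.

```python
def make_delta(old_rows, new_rows):
    """Merge-walk two dumps sorted by unique key and return (upserts, deletes).

    upserts: (key, value) rows that are new or whose value changed.
    deletes: keys present in the old dump but missing from the new one.
    """
    upserts, deletes = [], []
    i = j = 0
    while i < len(old_rows) and j < len(new_rows):
        old_key, old_val = old_rows[i]
        new_key, new_val = new_rows[j]
        if old_key == new_key:
            if old_val != new_val:
                upserts.append((new_key, new_val))
            i += 1
            j += 1
        elif old_key < new_key:
            deletes.append(old_key)  # row disappeared since the old dump
            i += 1
        else:
            upserts.append((new_key, new_val))  # row is new
            j += 1
    deletes.extend(key for key, _ in old_rows[i:])
    upserts.extend(new_rows[j:])
    return upserts, deletes


def apply_delta(table, upserts, deletes):
    """Apply a delta to a dict-based table; repeating it changes nothing."""
    for key, val in upserts:
        table[key] = val
    for key in deletes:
        table.pop(key, None)  # a missing key is fine, so a retry is harmless
    return table


yesterday = [("1", "a"), ("2", "b"), ("3", "c")]
today = [("1", "a"), ("2", "B"), ("4", "d")]
upserts, deletes = make_delta(yesterday, today)
remote = apply_delta(dict(yesterday), upserts, deletes)
```

Only the changed row, the new row and the deleted key cross the link, and if a transfer is interrupted the receiving side can simply apply the same delta again.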

   Francisco Olarte.
