Re: Replication Node Identifiers and crashsafe Apply Progress - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Replication Node Identifiers and crashsafe Apply Progress
Date
Msg-id 20131121110455.GH7240@alap2.anarazel.de
Whole thread Raw
In response to Re: Replication Node Identifiers and crashsafe Apply Progress  (Steve Singer <steve@ssinger.info>)
List pgsql-hackers
Hi,


On 2013-11-19 18:49:27 -0500, Steve Singer wrote:
> >But for that the receiving side needs to know up to where changes have
> >been applied. One relatively easy solution for that is that the
> >receiving side does something like:
> >UPDATE replication_progress SET lsn = '0/10000600' WHERE source_id = ...;
> >before the end of every replayed transaction. But that obviously will
> >quickly cause bloat.
> 
> I don't see how this is going to cause any more bloat than what
> trigger-based slony does today with sl_confirm and I don't hear a lot of
> complaints about that being a big problem.

FWIW, bloat on slony's tables (including sl_confirm) is one of the major
reasons I've seen people move away from slony for production, and use it
only for upgrades.
It's only really a problem if you have longrunning transactions on the
standby, but that's a pretty major use-case of having replicas.

> This might be because slony doesn't do a commit on the replica for
> every transaction but groups the transactions together, logical slony
> will behave the same way where we would only commit on SYNC
> transactions.

But yes, the grouping of transactions certainly makes for a major
difference. I don't think we want to force solutions to commit
transactions in batches. Not the least because that obviously prohibits
using a standby as a synchronous replica.

> >* Do we want to allow setting (remote_lsn, remote_timestamp,
> >   remote_node_id) via SQL? Currently the remote_node_id can be set as a
> >   GUC, but the other's can't. They probably should be a function that
> >   can be called instead of GUCs?
> 
> A way of advancing the replication pointer via SQL would be nice, otherwise
> I'll just have to write my own C function that I will invoke via SQL (which
> sin't hard but everyone would need to do the same)

But don't you already essentially perform the actual inserts via C in
new slonys? That's mainly the reason I wasn't sure it's needed.

But then, providing a function to do that setup isn't hard.

> What does building up node_id key from (sysid, tlid, remote_dbid,
> local_dbid, name) get us over just mapping from an arbitrary name
> field to a 16 bit node_id ?

It avoids the need to manually assign ids to systems in many cases. I've
seen people complain about that a fair bit.

But it seems pretty clear that a more arbitrary identifier is preferred
so far, so I'll go for that.

Thanks for the comments,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: b21de4e7b32f868a23bdc5507898d36cbe146164 seems to be two bricks shy of a load
Next
From: Andres Freund
Date:
Subject: Re: Replication Node Identifiers and crashsafe Apply Progress