Hi,
On 2013-11-19 18:49:27 -0500, Steve Singer wrote:
> >But for that the receiving side needs to know up to where changes have
> >been applied. One relatively easy solution for that is that the
> >receiving side does something like:
> >UPDATE replication_progress SET lsn = '0/10000600' WHERE source_id = ...;
> >before the end of every replayed transaction. But that obviously will
> >quickly cause bloat.
>
> I don't see how this is going to cause any more bloat than what
> trigger-based slony does today with sl_confirm and I don't hear a lot of
> complaints about that being a big problem.
FWIW, bloat on slony's tables (including sl_confirm) is one of the major
reasons I've seen people move away from slony for production, and use it
only for upgrades.
It's only really a problem if you have long-running transactions on the
standby, but that's a pretty major use-case of having replicas.
> This might be because slony doesn't do a commit on the replica for
> every transaction but groups the transactions together, logical slony
> will behave the same way where we would only commit on SYNC
> transactions.
But yes, the grouping of transactions certainly makes for a major
difference. I don't think we want to force solutions to commit
transactions in batches. Not least because that obviously prohibits
using a standby as a synchronous replica.
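As a rough illustration of why batching matters for bloat, here is a toy
Python model. It is not PostgreSQL or slony code; dead_versions() and its
parameters are made up for this sketch. It assumes one progress UPDATE per
commit, and that no old row version can be cleaned up while a long-running
transaction holds its snapshot.

```python
# Toy model only: counts dead versions of a single progress row.
# All names here are hypothetical; nothing below is a PostgreSQL API.

def dead_versions(n_transactions, batch_size, snapshot_held=True):
    """Return how many dead versions of the progress row are left behind.

    Assumptions of this sketch:
    - the replica commits once per batch of replayed transactions,
    - every commit does exactly one UPDATE of the progress row,
    - every UPDATE leaves the previous row version behind,
    - while an old snapshot is held, none of those versions can be removed,
    - without such a snapshot, cleanup keeps up perfectly.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    if not snapshot_held:
        return 0
    # Ceiling division: a partial final batch still needs its own commit.
    commits = -(-n_transactions // batch_size)
    return commits


if __name__ == "__main__":
    # Per-transaction commits versus SYNC-style batches of 1000.
    print(dead_versions(100_000, batch_size=1))
    print(dead_versions(100_000, batch_size=1000))
```

Under these assumptions the number of dead versions scales with the number
of commits, so committing per transaction is the worst case for a replica
that also serves long-running queries.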
> >* Do we want to allow setting (remote_lsn, remote_timestamp,
> > remote_node_id) via SQL? Currently the remote_node_id can be set as a
> > GUC, but the others can't. They probably should be a function that
> > can be called instead of GUCs?
>
> A way of advancing the replication pointer via SQL would be nice, otherwise
> I'll just have to write my own C function that I will invoke via SQL (which
> isn't hard, but everyone would need to do the same)
But don't you already essentially perform the actual inserts via C in
newer slony versions? That's mainly the reason I wasn't sure it's needed.
But then, providing a function to do that setup isn't hard.
> What does building up node_id key from (sysid, tlid, remote_dbid,
> local_dbid, name) get us over just mapping from an arbitrary name
> field to a 16 bit node_id ?
It avoids the need to manually assign ids to systems in many cases. I've
seen people complain about that a fair bit.
But it seems pretty clear that a more arbitrary identifier is preferred
so far, so I'll go for that.
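To make the two options concrete, a minimal Python sketch of the idea. None
of this is the actual implementation: NodeIdRegistry, derived_name() and the
choice to reserve id 0 are assumptions made purely for illustration.

```python
# Hypothetical sketch: map arbitrary names to 16-bit node ids.

class NodeIdRegistry:
    """Assigns small integer ids to arbitrary node names on first use."""

    MAX_ID = 0xFFFF  # largest value that fits in 16 bits

    def __init__(self):
        self._ids = {}
        # Assumption of this sketch: 0 is reserved for "no remote origin".
        self._next = 1

    def get_or_assign(self, name):
        """Return the id for name, allocating a new one if needed."""
        if name in self._ids:
            return self._ids[name]
        if self._next > self.MAX_ID:
            raise OverflowError("out of 16-bit node ids")
        node_id = self._next
        self._ids[name] = node_id
        self._next += 1
        return node_id


def derived_name(sysid, tlid, remote_dbid, local_dbid, name):
    """Build a name from the tuple, so nobody has to pick ids by hand."""
    return f"{sysid}_{tlid}_{remote_dbid}_{local_dbid}_{name}"


if __name__ == "__main__":
    registry = NodeIdRegistry()
    # A solution can pass any name it likes ...
    print(registry.get_or_assign("slony_node_a"))
    # ... or derive one from the identifying tuple.
    print(registry.get_or_assign(derived_name(1234, 1, 16384, 16385, "")))
```

With an arbitrary name as the key, a solution that wants the automatic
behaviour can still get it by deriving the name from the tuple itself.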
Thanks for the comments,
Andres Freund
-- 
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services