Re: Replication Node Identifiers and crashsafe Apply Progress - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Replication Node Identifiers and crashsafe Apply Progress
Date
Msg-id 20131119165722.GF19293@alap2.anarazel.de
In response to Re: Replication Node Identifiers and crashsafe Apply Progress  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Replication Node Identifiers and crashsafe Apply Progress  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2013-11-19 07:40:30 -0500, Robert Haas wrote:
> > This way, after a crash, restart, disconnect the replay process can look
> > into shared memory and check how far it has already replayed and restart
> > seamlessly. With minimal effort.
> 
> It would be much less invasive for the replication apply code to fsync
> its own state on the apply side.  Obviously, that means doubling the
> fsync rate, which is not appealing, but I think that's still a useful
> way to think about what you're aiming to accomplish here: avoid
> doubling the fsync rate when applying remote transactions in a
> crash-safe manner.

Exactly.
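
To make the goal concrete, here is a toy model, not PostgreSQL code. It
assumes the apply progress (origin node plus remote LSN) is written into
the same log record as the local commit, so one flush makes both
durable. ToyWAL, apply_remote_txn and recover_progress are hypothetical
names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWAL:
    """Hypothetical stand-in for a write-ahead log."""
    records: list = field(default_factory=list)  # durable records
    pending: list = field(default_factory=list)  # appended, not flushed
    fsyncs: int = 0

    def append(self, rec):
        self.pending.append(rec)

    def flush(self):
        # One simulated fsync makes every pending record durable.
        self.records.extend(self.pending)
        self.pending.clear()
        self.fsyncs += 1

    def crash(self):
        # Anything not flushed is lost.
        self.pending.clear()


def apply_remote_txn(wal, node_id, remote_lsn, changes):
    # Assumption: the commit record carries the origin node and remote
    # LSN, so data and apply progress share a single flush.
    wal.append({"type": "commit", "changes": changes,
                "origin_node": node_id, "origin_lsn": remote_lsn})
    wal.flush()


def recover_progress(wal):
    """Rebuild per-node replay progress from durable commit records."""
    progress = {}
    for rec in wal.records:
        if rec["type"] == "commit":
            node = rec["origin_node"]
            progress[node] = max(progress.get(node, 0), rec["origin_lsn"])
    return progress
```

In this model each applied transaction costs one flush, and after a
crash the apply process can resume from whatever recover_progress
reports for its node. Keeping the progress in a separate file would
need a second flush per transaction to give the same guarantee.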

> Although I agree that we need a way to do that, I don't have a
> particularly warm and fuzzy feeling about this particular proposal:
> there are too many bits of it that feel like entirely arbitrary design
> decisions.  If we're going to build a full-fledged logical replication
> solution into core, attempting to obsolete Slony and Bucardo and
> Londiste and everything that's out there, then I think we have a great
> deal of design work that we have to do before we start committing
> things, or even finalizing designs.  If we're going to continue with
> the philosophy of building a toolkit that can serve as a building
> block for multiple solutions, then color me unconvinced that this will
> do the job.

IMO we actually want and need both; wanting something built in doesn't
preclude important use cases that need to be served by other solutions.

I think - while the API certainly needs work - the general idea
integrates pretty well with the fairly generic changeset extraction
mechanism and with possible solutions for replication between postgres
servers.

Note that this really is a draft of what I think is needed, written
after developing a solution to this problem for one specific
replication solution and talking to some people implementing others.
Maybe somebody has a far better idea for implementing this: I am all
ears!

> If we made the xlog system truly extensible, that seems like it'd
> punch your ticket here.  I'm not sure how practical that is, though.

I don't think it is.

> > We previously discussed the topic and some were very adverse to using
> > any sort of numeric node identifiers across systems and suggested that
> > those should only be used internally. So what the attached patch does is
> > to add a new shared system catalog called 'pg_replication_identifier'
> > (suggestions for a better name welcome) which translates a number of
> > identifying traits into a numeric identifier.
> > The set of identifiers currently are:
> > * the sysid of the remote system, combined with the remote TLI
> > * the oid of the local database
> > * the oid of the remote database
> > * an optional name
> > but that's just what we needed in our multimaster prototype, and not
> > what I necessarily think is correct.
> 
> The fact that you've included both local and remote database OIDs
> seems wrong; shouldn't the replication identifier only serve to
> identify the source node, not the replication stream?  What if you
> want to replicate from table A to table B within the same database?

The reason I chose those parameters is that they avoid the need for a
human to assign identifiers in many situations, since they are already
unique. For the cases where they aren't, I've included the "name" to
distinguish several streams.

The reason both source and target database are included is that it
avoids manual work if you want to replicate between two databases in
both directions.
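
As a rough sketch of that mapping (not the patch's actual catalog
code), the lookup could be modelled as below. ReplicationTraits and
IdentifierCatalog are hypothetical names, and the dictionary merely
stands in for the proposed shared catalog.

```python
from typing import NamedTuple, Optional


class ReplicationTraits(NamedTuple):
    """The identifying traits listed above (field names are assumed)."""
    remote_sysid: int
    remote_tli: int
    local_db_oid: int
    remote_db_oid: int
    name: Optional[str] = None


class IdentifierCatalog:
    """Hypothetical stand-in for the shared catalog."""

    def __init__(self):
        self._ids = {}
        self._next_id = 1

    def get_or_create(self, traits):
        # Reuse the numeric identifier if these traits are already
        # known, otherwise hand out the next free one.
        if traits not in self._ids:
            self._ids[traits] = self._next_id
            self._next_id += 1
        return self._ids[traits]


catalog = IdentifierCatalog()
sysid = 6000000000000000001  # made-up system identifier

# Two databases in one cluster replicating in both directions get
# distinct identifiers without anyone assigning a name.
a_to_b = ReplicationTraits(sysid, 1, local_db_oid=16385, remote_db_oid=16384)
b_to_a = ReplicationTraits(sysid, 1, local_db_oid=16384, remote_db_oid=16385)
id_a_to_b = catalog.get_or_create(a_to_b)
id_b_to_a = catalog.get_or_create(b_to_a)

# A second stream with otherwise identical traits needs the name.
id_named = catalog.get_or_create(a_to_b._replace(name="second_stream"))
```

With only the remote database OID in the key, the two directions would
still differ, but a stream from one database into several local
databases would collide unless each were given a name.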

> We need some kind of pretty flexible system here, if we're not to box
> ourselves into a corner.

Agreed. As an alternative we could just have a single - probably longer
than NAMEDATALEN - string to identify replication progress and rely on
the users of the facility to build the identifier automatically
themselves using components that are helpful in their system.
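
A minimal sketch of what that could look like from the user's side,
assuming a free-form string key. build_identifier, the "myrepl" prefix
and the component names are made up for illustration; nothing here is
an existing API.

```python
def build_identifier(prefix, **components):
    """Join a prefix and key=value components into one string.

    Components are sorted by key so the same inputs always yield the
    same identifier, regardless of argument order.
    """
    parts = [prefix] + [f"{key}={value}"
                        for key, value in sorted(components.items())]
    return ";".join(parts)


ident = build_identifier(
    "myrepl",                    # hypothetical replication solution
    sysid=6000000000000000001,   # made-up remote system identifier
    tli=1,
    local_db=16384,
    remote_db=16385,
)
```

Even this small example is longer than the 63 bytes a default
NAMEDATALEN of 64 allows, which is why the string would probably need
to be longer than a regular name.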

Thanks,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


