Thread: [HACKERS] Function to move the position of a replication slot

[HACKERS] Function to move the position of a replication slot

From
Magnus Hagander
Date:
PFA a patch that adds a new function, pg_move_replication_slot, that makes it possible to move the location of a replication slot without actually consuming all the WAL on it.

This can be useful for example to keep replication slots in sync between different servers in a replication cluster.

(Obviously this is intended for 11, as we're well into the freeze for 10. Just to be clear. so I'll go add itto the summer commitfest)

--
Attachment

Re: [HACKERS] Function to move the position of a replication slot

From
Craig Ringer
Date:
On 4 May 2017 at 20:05, Magnus Hagander <magnus@hagander.net> wrote:
> PFA a patch that adds a new function, pg_move_replication_slot, that makes
> it possible to move the location of a replication slot without actually
> consuming all the WAL on it.

> This can be useful for example to keep replication slots in sync between
> different servers in a replication cluster.

It needs a test to ensure it only operates on physical slots. It
should ERROR on a logical slot, since it has no way of correctly
advancing the catalog_xmin or finding a reasonable restart_lsn  for
logical decoding.

I'm still fine with the name, since I plan to add that capability in
pg11 by running through logical decoding and ReorderBufferSkip()ing
each xact until we reach the target lsn.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Function to move the position of a replication slot

From
Magnus Hagander
Date:
On Thu, May 4, 2017 at 2:42 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 4 May 2017 at 20:05, Magnus Hagander <magnus@hagander.net> wrote:
> PFA a patch that adds a new function, pg_move_replication_slot, that makes
> it possible to move the location of a replication slot without actually
> consuming all the WAL on it.

> This can be useful for example to keep replication slots in sync between
> different servers in a replication cluster.

It needs a test to ensure it only operates on physical slots. It
should ERROR on a logical slot, since it has no way of correctly
advancing the catalog_xmin or finding a reasonable restart_lsn  for
logical decoding.

I guess that makes sense, yeah. I didn't try it with that :)

 
I'm still fine with the name, since I plan to add that capability in
pg11 by running through logical decoding and ReorderBufferSkip()ing
each xact until we reach the target lsn.

Cool.
 
--

Re: [HACKERS] Function to move the position of a replication slot

From
Craig Ringer
Date:
On 4 May 2017 at 20:45, Magnus Hagander <magnus@hagander.net> wrote:
> On Thu, May 4, 2017 at 2:42 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>
>> On 4 May 2017 at 20:05, Magnus Hagander <magnus@hagander.net> wrote:
>> > PFA a patch that adds a new function, pg_move_replication_slot, that
>> > makes
>> > it possible to move the location of a replication slot without actually
>> > consuming all the WAL on it.
>>
>> > This can be useful for example to keep replication slots in sync between
>> > different servers in a replication cluster.
>>
>> It needs a test to ensure it only operates on physical slots. It
>> should ERROR on a logical slot, since it has no way of correctly
>> advancing the catalog_xmin or finding a reasonable restart_lsn  for
>> logical decoding.
>
>
> I guess that makes sense, yeah. I didn't try it with that :)

Barring that and matching docs changes, looks fine to me at first glance.

Not sure you need to acquire the spinlock on the slot, since you
acquired the slot for your backend. It won't hurt, but I don't think
it's necessary.

I'm not convinced you need a WARNING for moving the slot backwards. If
one is emitted, it should be a proper ereport with current position
and requested position in errdetail. I'm not sure it's a useful
message though.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Function to move the position of a replication slot

From
Dmitry Dolgov
Date:
> On 4 May 2017 at 20:05, Magnus Hagander <magnus@hagander.net> wrote:
>
> PFA a patch that adds a new function, pg_move_replication_slot, that makes it
> possible to move the location of a replication slot without actually
> consuming all the WAL on it.
 
Just a few questions just a few questions out of curiosity:

* does it make sense to create a few tests for this function in
  `contrib/test_decoding` (as shown in attachment)?

* what should happen if the second argument is `NULL`? There is a verification
  `XLogRecPtrIsInvalid(moveto)`, but it's possible to pass `NULL`, and looks
  like it leads to result different from boolean:

```
SELECT pg_move_replication_slot('regression_slot_5', NULL);
 pg_move_replication_slot
--------------------------
 
(1 row)
```
Attachment

Re: [HACKERS] Function to move the position of a replication slot

From
Petr Jelinek
Date:
On 10/05/17 22:17, Dmitry Dolgov wrote:
>> On 4 May 2017 at 20:05, Magnus Hagander <magnus@hagander.net
> <mailto:magnus@hagander.net>> wrote:
>>
>> PFA a patch that adds a new function, pg_move_replication_slot, that
> makes it
>> possible to move the location of a replication slot without actually
>> consuming all the WAL on it.
>  
> Just a few questions just a few questions out of curiosity:
> 
> * does it make sense to create a few tests for this function in
>   `contrib/test_decoding` (as shown in attachment)?
> 

As Craig said this will not work correctly for logical slots so it
should throw error on those at the moment (at least until we write
something that works, but that's far more complex than this patch).

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: [HACKERS] Function to move the position of a replication slot

From
Peter Eisentraut
Date:
On 5/4/17 08:05, Magnus Hagander wrote:
> PFA a patch that adds a new function, pg_move_replication_slot, that
> makes it possible to move the location of a replication slot without
> actually consuming all the WAL on it.

The name keeps confusing me.  I understand "move" to be "rename" or
possibly "move it elsewhere", but not "move around in".  I understand
that there is an analogy with FETCH/MOVE in cursors, but it's still
confusing.

I would also like to see some documentation for a use case of this.

Anyway, as discussed elsewhere in this thread, this patch needs several
changes.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Function to move the position of a replication slot

From
Andres Freund
Date:
On 2017-08-16 12:24:11 -0400, Peter Eisentraut wrote:
> On 5/4/17 08:05, Magnus Hagander wrote:
> > PFA a patch that adds a new function, pg_move_replication_slot, that
> > makes it possible to move the location of a replication slot without
> > actually consuming all the WAL on it.
> 
> The name keeps confusing me.  I understand "move" to be "rename" or
> possibly "move it elsewhere", but not "move around in".  I understand
> that there is an analogy with FETCH/MOVE in cursors, but it's still
> confusing.

pg_forward_replication_slot()?

- Andres



Re: [HACKERS] Function to move the position of a replication slot

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2017-08-16 12:24:11 -0400, Peter Eisentraut wrote:
>> On 5/4/17 08:05, Magnus Hagander wrote:
>>> PFA a patch that adds a new function, pg_move_replication_slot, that
>>> makes it possible to move the location of a replication slot without
>>> actually consuming all the WAL on it.

>> The name keeps confusing me.  I understand "move" to be "rename" or
>> possibly "move it elsewhere", but not "move around in".  I understand
>> that there is an analogy with FETCH/MOVE in cursors, but it's still
>> confusing.

> pg_forward_replication_slot()?

If I understand what this is meant to do, maybe better
pg_move_replication_slot_lsn() or pg_change_replication_slot_lsn() ?
The point being that you're adjusting the LSN pointer contained
in the slot, which is distinct from the slot itself.
        regards, tom lane



Re: [HACKERS] Function to move the position of a replication slot

From
Andres Freund
Date:
On 2017-08-16 17:06:42 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-08-16 12:24:11 -0400, Peter Eisentraut wrote:
> >> On 5/4/17 08:05, Magnus Hagander wrote:
> >>> PFA a patch that adds a new function, pg_move_replication_slot, that
> >>> makes it possible to move the location of a replication slot without
> >>> actually consuming all the WAL on it.
> 
> >> The name keeps confusing me.  I understand "move" to be "rename" or
> >> possibly "move it elsewhere", but not "move around in".  I understand
> >> that there is an analogy with FETCH/MOVE in cursors, but it's still
> >> confusing.
> 
> > pg_forward_replication_slot()?
> 
> If I understand what this is meant to do, maybe better
> pg_move_replication_slot_lsn() or pg_change_replication_slot_lsn() ?
> The point being that you're adjusting the LSN pointer contained
> in the slot, which is distinct from the slot itself.

I think we should constrain the API to only allow later LSNs than
currently in the slot, rather than arbitrary ones. That's why I was
thinking of "forward".  I'm not convinced it's a good / safe idea to
allow arbitrary values to be set.

Greetings,

Andres Freund



Re: [HACKERS] Function to move the position of a replication slot

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2017-08-16 17:06:42 -0400, Tom Lane wrote:
>> If I understand what this is meant to do, maybe better
>> pg_move_replication_slot_lsn() or pg_change_replication_slot_lsn() ?
>> The point being that you're adjusting the LSN pointer contained
>> in the slot, which is distinct from the slot itself.

> I think we should constrain the API to only allow later LSNs than
> currently in the slot, rather than arbitrary ones. That's why I was
> thinking of "forward".  I'm not convinced it's a good / safe idea to
> allow arbitrary values to be set.

+1 for constraining it like that, but I don't think that's an argument
against using "move" or "change" as the verb.  I don't like "forward"
because that's not the right word.  The only verb senses of "forward"
in my Mac's dictionary are "send a message on to a further destination"
and "help to advance or promote" (the latter usage is pretty obscure IMO).
Neither one seems applicable here.
        regards, tom lane



Re: [HACKERS] Function to move the position of a replication slot

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2017-08-16 17:06:42 -0400, Tom Lane wrote:

> > If I understand what this is meant to do, maybe better
> > pg_move_replication_slot_lsn() or pg_change_replication_slot_lsn() ?
> > The point being that you're adjusting the LSN pointer contained
> > in the slot, which is distinct from the slot itself.
> 
> I think we should constrain the API to only allow later LSNs than
> currently in the slot, rather than arbitrary ones. That's why I was
> thinking of "forward".  I'm not convinced it's a good / safe idea to
> allow arbitrary values to be set.

Hmm.  In terms of safety, it is safe to move the LSN backwards, as long
as the oldest LSN across all slots is not changed -- in other words, the
actual safe limit is the oldest of all slot LSNs, rather than the
current position of the slot being manipulated (which is what you're
saying).

I don't know if this is useful for the use case Magnus described; TBH I
didn't understand that use case.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Function to move the position of a replication slot

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Andres Freund wrote:
>> I think we should constrain the API to only allow later LSNs than
>> currently in the slot, rather than arbitrary ones. That's why I was
>> thinking of "forward".  I'm not convinced it's a good / safe idea to
>> allow arbitrary values to be set.

> Hmm.  In terms of safety, it is safe to move the LSN backwards, as long
> as the oldest LSN across all slots is not changed -- in other words, the
> actual safe limit is the oldest of all slot LSNs, rather than the
> current position of the slot being manipulated (which is what you're
> saying).

True, but you'd need to take some kind of global lock to verify that,
whereas "move forward only" would only need to examine/change the one
slot.
        regards, tom lane



Re: [HACKERS] Function to move the position of a replication slot

From
Andres Freund
Date:
On 2017-08-16 18:14:45 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-08-16 17:06:42 -0400, Tom Lane wrote:
> >> If I understand what this is meant to do, maybe better
> >> pg_move_replication_slot_lsn() or pg_change_replication_slot_lsn() ?
> >> The point being that you're adjusting the LSN pointer contained
> >> in the slot, which is distinct from the slot itself.
> 
> > I think we should constrain the API to only allow later LSNs than
> > currently in the slot, rather than arbitrary ones. That's why I was
> > thinking of "forward".  I'm not convinced it's a good / safe idea to
> > allow arbitrary values to be set.
> 
> +1 for constraining it like that, but I don't think that's an argument
> against using "move" or "change" as the verb.  I don't like "forward"
> because that's not the right word.  The only verb senses of "forward"
> in my Mac's dictionary are "send a message on to a further destination"
> and "help to advance or promote" (the latter usage is pretty obscure IMO).
> Neither one seems applicable here.

I might have thinking too much with the German "version" of the word in
mind.  However looking it up now I do see "to advance or help onward;
promote" and "to advance or play a cassette, digital recording, slide
projector, etc., in the forward direction: " -
http://www.dictionary.com/browse/forward
which kind of seems to fit.  don't know how authoritative which
dictionary is considered to be...

- Andres



Re: [HACKERS] Function to move the position of a replication slot

From
Michael Paquier
Date:
On Thu, Aug 17, 2017 at 7:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2017-08-16 17:06:42 -0400, Tom Lane wrote:
>>> If I understand what this is meant to do, maybe better
>>> pg_move_replication_slot_lsn() or pg_change_replication_slot_lsn() ?
>>> The point being that you're adjusting the LSN pointer contained
>>> in the slot, which is distinct from the slot itself.
>
>> I think we should constrain the API to only allow later LSNs than
>> currently in the slot, rather than arbitrary ones. That's why I was
>> thinking of "forward".  I'm not convinced it's a good / safe idea to
>> allow arbitrary values to be set.
>
> +1 for constraining it like that, but I don't think that's an argument
> against using "move" or "change" as the verb.  I don't like "forward"
> because that's not the right word.  The only verb senses of "forward"
> in my Mac's dictionary are "send a message on to a further destination"
> and "help to advance or promote" (the latter usage is pretty obscure IMO).
> Neither one seems applicable here.

Definitely agreed on that. Any move function would need to check if
the WAL position given by caller is already newer than what's
available in the local pg_wal (minimum of all other slots), with a
shared lock that would need to be taken by xlog.c when recycling past
segments. A forward function works on a single entry, which should be
disabled at the moment of the update. It looks dangerous to me to do
such an operation if there is a consumer of a slot currently on it.
-- 
Michael



Re: [HACKERS] Function to move the position of a replication slot

From
Craig Ringer
Date:
On 17 August 2017 at 07:30, Michael Paquier <michael.paquier@gmail.com> wrote:

Definitely agreed on that. Any move function would need to check if
the WAL position given by caller is already newer than what's
available in the local pg_wal (minimum of all other slots), with a
shared lock that would need to be taken by xlog.c when recycling past
segments. A forward function works on a single entry, which should be
disabled at the moment of the update. It looks dangerous to me to do
such an operation if there is a consumer of a slot currently on it.


pg_advance_replication_slot(...)

ERROR's on logical slot, for now. Physical slots only.

Forward-only.

Future work to allow it to use the logical decoding infrastructure to fast-forward a slot by reading only catalog change information and invalidations, either via a dummy output plugin that filters out all xacts, or by lower level use of the decoding code.

Reasonable?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Function to move the position of a replication slot

From
Michael Paquier
Date:
On Thu, Aug 17, 2017 at 9:19 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> pg_advance_replication_slot(...)
>
> ERROR's on logical slot, for now. Physical slots only.
>
> Forward-only.
>
> Future work to allow it to use the logical decoding infrastructure to
> fast-forward a slot by reading only catalog change information and
> invalidations, either via a dummy output plugin that filters out all xacts,
> or by lower level use of the decoding code.
>
> Reasonable?

Yes. I did not imply logical slots in my previous message, sorry if my
words were incomplete.
-- 
Michael



Re: [HACKERS] Function to move the position of a replication slot

From
Robert Haas
Date:
On Wed, Aug 16, 2017 at 5:55 PM, Andres Freund <andres@anarazel.de> wrote:
> I think we should constrain the API to only allow later LSNs than
> currently in the slot, rather than arbitrary ones. That's why I was
> thinking of "forward".  I'm not convinced it's a good / safe idea to
> allow arbitrary values to be set.

Maybe I shouldn't play the devil's advocate here, but isn't a feature
like this by definition only for people who Know What They Are Doing?
If so, why not let them back the slot up?  I'm sure that will work out
just fine.  They Know What They Are Doing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Function to move the position of a replication slot

From
Andres Freund
Date:
On 2017-08-16 21:25:48 -0400, Robert Haas wrote:
> On Wed, Aug 16, 2017 at 5:55 PM, Andres Freund <andres@anarazel.de> wrote:
> > I think we should constrain the API to only allow later LSNs than
> > currently in the slot, rather than arbitrary ones. That's why I was
> > thinking of "forward".  I'm not convinced it's a good / safe idea to
> > allow arbitrary values to be set.
> 
> Maybe I shouldn't play the devil's advocate here, but isn't a feature
> like this by definition only for people who Know What They Are Doing?
> If so, why not let them back the slot up?  I'm sure that will work out
> just fine.  They Know What They Are Doing.

I have yet to hear a reason for allowing to move things backward etc. So
I'm not sure what the benefit would be. But more importantly I'd like to
make this available to non-superusers at some point, and there I think
it's more important that they can't do bad things. The reason for
allowing it for non-superusers is that I think it's quite a useful
function to be used by an automated system. E.g. to ensure enough, but
not too much, WAL is available for a tertiary standby both on the actual
primary and a failover node.

Greetings,

Andres Freund



Re: [HACKERS] Function to move the position of a replication slot

From
Craig Ringer
Date:
On 17 August 2017 at 09:33, Andres Freund <andres@anarazel.de> wrote:
On 2017-08-16 21:25:48 -0400, Robert Haas wrote:
> On Wed, Aug 16, 2017 at 5:55 PM, Andres Freund <andres@anarazel.de> wrote:
> > I think we should constrain the API to only allow later LSNs than
> > currently in the slot, rather than arbitrary ones. That's why I was
> > thinking of "forward".  I'm not convinced it's a good / safe idea to
> > allow arbitrary values to be set.
>
> Maybe I shouldn't play the devil's advocate here, but isn't a feature
> like this by definition only for people who Know What They Are Doing?
> If so, why not let them back the slot up?  I'm sure that will work out
> just fine.  They Know What They Are Doing.

I have yet to hear a reason for allowing to move things backward etc. So
I'm not sure what the benefit would be. But more importantly I'd like to
make this available to non-superusers at some point, and there I think
it's more important that they can't do bad things. The reason for
allowing it for non-superusers is that I think it's quite a useful
function to be used by an automated system. E.g. to ensure enough, but
not too much, WAL is available for a tertiary standby both on the actual
primary and a failover node.

I strongly agree.

If you really need to move a physical slot back (why?) you can do it with an extension that uses the low level APIs. But I can't see why you would want to.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Function to move the position of a replication slot

From
Magnus Hagander
Date:
On Thu, Aug 17, 2017 at 2:19 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 17 August 2017 at 07:30, Michael Paquier <michael.paquier@gmail.com> wrote:

Definitely agreed on that. Any move function would need to check if
the WAL position given by caller is already newer than what's
available in the local pg_wal (minimum of all other slots), with a
shared lock that would need to be taken by xlog.c when recycling past
segments. A forward function works on a single entry, which should be
disabled at the moment of the update. It looks dangerous to me to do
such an operation if there is a consumer of a slot currently on it.


pg_advance_replication_slot(...)

ERROR's on logical slot, for now. Physical slots only.

Forward-only.

Future work to allow it to use the logical decoding infrastructure to fast-forward a slot by reading only catalog change information and invalidations, either via a dummy output plugin that filters out all xacts, or by lower level use of the decoding code.

Reasonable?


PFA an updated and rebased patch.

Rebased. Now named pg_advance_replication_slot. ERROR on logical slots. Forward only. 

I think that, in the end, covered all the comments?

--
Attachment

Re: [HACKERS] Function to move the position of a replication slot

From
Peter Eisentraut
Date:
On 8/31/17 08:19, Magnus Hagander wrote:
> Rebased. Now named pg_advance_replication_slot. ERROR on logical slots.
> Forward only. 
> 
> I think that, in the end, covered all the comments?

I didn't see any explanation of what this would actually be useful for.
I suppose you could skip over some changes you don't want replicated,
but how do you find to what position to skip?

Logical replication has a similar mechanism, using the function
pg_replication_origin_advance().  Any overlap there?  (Maybe the names
could be aligned.)
(https://www.postgresql.org/docs/devel/static/logical-replication-conflicts.html)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Function to move the position of a replication slot

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 8/31/17 08:19, Magnus Hagander wrote:
>> I think that, in the end, covered all the comments?

> I didn't see any explanation of what this would actually be useful for.
> I suppose you could skip over some changes you don't want replicated,
> but how do you find to what position to skip?

Um ... I can see how you might expect to skip some events in a logical
replication stream and have a chance of things not being utterly broken.
But how can that work for physical replication?  Missed updates are
normally spelled "unrecoverable data corruption" at that level.
        regards, tom lane



Re: [HACKERS] Function to move the position of a replication slot

From
Michael Paquier
Date:
On Sat, Sep 2, 2017 at 12:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> On 8/31/17 08:19, Magnus Hagander wrote:
>>> I think that, in the end, covered all the comments?
>
>> I didn't see any explanation of what this would actually be useful for.
>> I suppose you could skip over some changes you don't want replicated,
>> but how do you find to what position to skip?
>
> Um ... I can see how you might expect to skip some events in a logical
> replication stream and have a chance of things not being utterly broken.
> But how can that work for physical replication?  Missed updates are
> normally spelled "unrecoverable data corruption" at that level.

One use-case possible, even if it is easy to counter it by dropping
and recreating a slot, is to give up with what has been retained and
allow another client to reuse the same slot for a new standby.
-- 
Michael



Re: [HACKERS] Function to move the position of a replication slot

From
Robert Haas
Date:
On Fri, Sep 1, 2017 at 11:30 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 8/31/17 08:19, Magnus Hagander wrote:
>> Rebased. Now named pg_advance_replication_slot. ERROR on logical slots.
>> Forward only.
>>
>> I think that, in the end, covered all the comments?
>
> I didn't see any explanation of what this would actually be useful for.
> I suppose you could skip over some changes you don't want replicated,
> but how do you find to what position to skip?
>
> Logical replication has a similar mechanism, using the function
> pg_replication_origin_advance().  Any overlap there?  (Maybe the names
> could be aligned.)
> (https://www.postgresql.org/docs/devel/static/logical-replication-conflicts.html)

I think you can use this to work around the absence of failover slots.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Function to move the position of a replication slot

From
Andres Freund
Date:
On 2017-09-01 23:37:07 -0400, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> > On 8/31/17 08:19, Magnus Hagander wrote:
> >> I think that, in the end, covered all the comments?
> 
> > I didn't see any explanation of what this would actually be useful for.
> > I suppose you could skip over some changes you don't want replicated,
> > but how do you find to what position to skip?
> 
> Um ... I can see how you might expect to skip some events in a logical
> replication stream and have a chance of things not being utterly broken.
> But how can that work for physical replication?  Missed updates are
> normally spelled "unrecoverable data corruption" at that level.

Consider e.g. a standby that follows master, but isn't a target for a
failover. It can make a fair bit of sense to script things so that
there's also a slot on the standby that's marked to be the primary in
disaster cases. For that you might want to forward the slot on a regular
basis.

I don't quite see how you'd get corruption from a physical slot being
forwarded? I mean you surely can get into the situation that there's
missing WAL from wherever a standby is receiving its WAL, but that'll
"just" break replication.

- Andres



Re: [HACKERS] Function to move the position of a replication slot

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I don't quite see how you'd get corruption from a physical slot being
> forwarded? I mean you surely can get into the situation that there's
> missing WAL from wherever a standby is receiving its WAL, but that'll
> "just" break replication.

Um, doesn't advancing a slot correspond exactly to skipping some amount
of WAL?
        regards, tom lane



Re: [HACKERS] Function to move the position of a replication slot

From
Andres Freund
Date:
On 2017-09-02 18:31:10 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I don't quite see how you'd get corruption from a physical slot being
> > forwarded? I mean you surely can get into the situation that there's
> > missing WAL from wherever a standby is receiving its WAL, but that'll
> > "just" break replication.
> 
> Um, doesn't advancing a slot correspond exactly to skipping some amount
> of WAL?

Not for physical ones, no. The slot is just a marker on the *upstream*
(or a potential upstream) that remembers a standby's current WAL replay
position and, if enabled, it's current xmin. The prevents the upstream
to remove the WAL that the standby still need and if applicable vacuum
from removing rows the standby needs.  If the slot is at the wrong
position exactly the same things that can happen if no slot were in use
can also happen, i.e. "ERROR: requested WAL segment %s has already been removed".

For logical replication such a forward operation would have to be *more*
complicated than for physical rep, because the state that's kept is more
complicated...

- Andres



Re: [HACKERS] Function to move the position of a replication slot

From
Magnus Hagander
Date:
On Sun, Sep 3, 2017 at 12:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Sep 1, 2017 at 11:30 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 8/31/17 08:19, Magnus Hagander wrote:
>> Rebased. Now named pg_advance_replication_slot. ERROR on logical slots.
>> Forward only.
>>
>> I think that, in the end, covered all the comments?
>
> I didn't see any explanation of what this would actually be useful for.
> I suppose you could skip over some changes you don't want replicated,
> but how do you find to what position to skip?
>
> Logical replication has a similar mechanism, using the function
> pg_replication_origin_advance().  Any overlap there?  (Maybe the names
> could be aligned.)
> (https://www.postgresql.org/docs/devel/static/logical-replication-conflicts.html)

I think you can use this to work around the absence of failover slots.


That was the initial usecase I had for this, yes.

It can also be combined with file-based restoring to keep a "blocker" preventing removal before a segment has progressed through a workflow for example.

--

Re: [HACKERS] Function to move the position of a replication slot

From
Michael Paquier
Date:
On Thu, Aug 31, 2017 at 9:19 PM, Magnus Hagander <magnus@hagander.net> wrote:
> PFA an updated and rebased patch.
>
> Rebased. Now named pg_advance_replication_slot. ERROR on logical slots.
> Forward only.
>
> I think that, in the end, covered all the comments?

+   if (backwards)
+       ereport(WARNING,
+               (errmsg("Not moving replication slot backwards!")));
Shouldn't this be an ERROR, mentioning the current position of the slot?

+        ereport(ERROR,
+                (errmsg("Only physical replication slots can be advanced.")));
ERRCODE_FEATURE_NOT_SUPPORTED, no?
-- 
Michael



Re: [HACKERS] Function to move the position of a replication slot

From
Andres Freund
Date:
On 2017-09-05 11:36:47 +0900, Michael Paquier wrote:
> On Thu, Aug 31, 2017 at 9:19 PM, Magnus Hagander <magnus@hagander.net> wrote:
> > PFA an updated and rebased patch.
> >
> > Rebased. Now named pg_advance_replication_slot. ERROR on logical slots.
> > Forward only.
> >
> > I think that, in the end, covered all the comments?
> 
> +   if (backwards)
> +       ereport(WARNING,
> +               (errmsg("Not moving replication slot backwards!")));
> Shouldn't this be an ERROR, mentioning the current position of the slot?
> 
> +        ereport(ERROR,
> +                (errmsg("Only physical replication slots can be advanced.")));
> ERRCODE_FEATURE_NOT_SUPPORTED, no?

Seither of these seem to follow the message guidelines.

Greetings,

Andres Freund



Re: [HACKERS] Function to move the position of a replication slot

From
Michael Paquier
Date:
On Tue, Sep 5, 2017 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-09-05 11:36:47 +0900, Michael Paquier wrote:
>> On Thu, Aug 31, 2017 at 9:19 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> > PFA an updated and rebased patch.
>> >
>> > Rebased. Now named pg_advance_replication_slot. ERROR on logical slots.
>> > Forward only.
>> >
>> > I think that, in the end, covered all the comments?
>>
>> +   if (backwards)
>> +       ereport(WARNING,
>> +               (errmsg("Not moving replication slot backwards!")));
>> Shouldn't this be an ERROR, mentioning the current position of the slot?
>>
>> +        ereport(ERROR,
>> +                (errmsg("Only physical replication slots can be advanced.")));
>> ERRCODE_FEATURE_NOT_SUPPORTED, no?
>
> Seither of these seem to follow the message guidelines.

True as well, and the patch did not get an update in two months to
reflect that. So I am marking it as returned with feedback.
-- 
Michael


Re: [HACKERS] Function to move the position of a replication slot

From
Magnus Hagander
Date:


On Wed, Nov 29, 2017 at 7:48 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Tue, Sep 5, 2017 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-09-05 11:36:47 +0900, Michael Paquier wrote:
>> On Thu, Aug 31, 2017 at 9:19 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> > PFA an updated and rebased patch.
>> >
>> > Rebased. Now named pg_advance_replication_slot. ERROR on logical slots.
>> > Forward only.
>> >
>> > I think that, in the end, covered all the comments?
>>
>> +   if (backwards)
>> +       ereport(WARNING,
>> +               (errmsg("Not moving replication slot backwards!")));
>> Shouldn't this be an ERROR, mentioning the current position of the slot?
>>
>> +        ereport(ERROR,
>> +                (errmsg("Only physical replication slots can be advanced.")));
>> ERRCODE_FEATURE_NOT_SUPPORTED, no?
>
> Seither of these seem to follow the message guidelines.

True as well, and the patch did not get an update in two months to
reflect that. So I am marking it as returned with feedback.

For the purpose of the archives: this patch has been superseded by Petrs work in https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9c7d06d60680c7f00d931233873dee81fdb311c6 which will be in PostgreSQL 11. 


--