Thread: Logical decoding slots can go backwards when used from SQL, docs are wrong

Logical decoding slots can go backwards when used from SQL, docs are wrong

From
Craig Ringer
Date:
Hi all

I think I found a couple of logical decoding issues while writing tests for failover slots.

Despite the docs' claim that a logical slot will replay data "exactly once", a slot's confirmed_lsn can go backwards and the SQL functions can replay the same data more than once. We don't mark a slot as dirty if only its confirmed_lsn is advanced, so it isn't flushed to disk. For failover slots this means it also doesn't get replicated via WAL. After a master crash, or for failover slots after a promote event, the confirmed_lsn will go backwards. Users of the SQL interface must keep track of the last slot position they have safely flushed locally and throw the repeated data away; unlike with the walsender protocol, they have no way to ask the server to skip that data.

Worse, because we don't dirty the slot, even a *clean shutdown* causes the slot's confirmed_lsn to go backwards. That's a bug IMO. We should force a flush of all slots at the shutdown checkpoint, whether dirty or not, to address it.
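For the archives, a minimal reproduction sketch (slot, table, and object names are illustrative; this assumes the bundled test_decoding output plugin):

```sql
-- Create a logical slot using the test_decoding plugin
SELECT pg_create_logical_replication_slot('regression_slot', 'test_decoding');

-- Generate some changes and consume them; this advances confirmed_lsn
CREATE TABLE t(x int);
INSERT INTO t VALUES (1);
SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);

-- Now do "pg_ctl stop -m immediate", restart the server, and run:
SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
-- The INSERT is replayed a second time, because the advanced
-- confirmed_lsn was never flushed to disk.
```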


Barring objections I intend to submit a fix to:

- Document that slots can replay data more than once
- Force flush of all slots on a shutdown checkpoint

Also, pg_logical_slot_get_changes and its _peek_ variant should have a param specifying the starting LSN to read and return. If this is lower than the restart_lsn but non-null it should ERROR; if it's greater than or equal it should use this position instead of starting at the confirmed_lsn.
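A sketch of what that could look like; the start_lsn parameter is hypothetical, not an existing signature, and the LSN value is illustrative:

```sql
-- Hypothetical extended signature with a proposed start_lsn parameter
SELECT * FROM pg_logical_slot_get_changes(
    slot_name     := 'regression_slot',
    upto_lsn      := NULL,
    upto_nchanges := NULL,
    start_lsn     := '0/16B2F80'  -- ERROR if below restart_lsn;
                                  -- otherwise overrides confirmed_lsn
);
```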

Time permitting I'd like to add a pg_logical_slot_confirm function, so you can alternate _peek_changes and _confirm, making it possible to get walsender-interface-like behaviour via the SQL interface. Right now you can't reliably use the SQL interface because _get_changes can succeed locally and advance the slot but the client can fail to receive all the changes due to network issues etc. Sure, the SQL interface is meant mainly for testing, but especially for non-PostgreSQL downstreams I strongly suspect people will want to use it for "real" work rather than have to modify each client driver to support replication protocol extensions.
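The intended usage pattern would be something like the following sketch (pg_logical_slot_confirm is the hypothetical function proposed above; the LSN is illustrative):

```sql
-- Peek without advancing the slot's confirmed position
SELECT * FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL);

-- ... client durably stores the changes, noting the last commit LSN ...

-- Only then advance the slot (hypothetical function)
SELECT pg_logical_slot_confirm('regression_slot', '0/16B2F80');

-- The next peek starts after the confirmed position
SELECT * FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL);
```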

Opinions?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On 11 March 2016 at 08:19, Craig Ringer <craig@2ndquadrant.com> wrote:
Hi all

I think I found a couple of logical decoding issues while writing tests for failover slots.

Despite the docs' claim that a logical slot will replay data "exactly once", a slot's confirmed_lsn can go backwards and the SQL functions can replay the same data more than once. We don't mark a slot as dirty if only its confirmed_lsn is advanced, so it isn't flushed to disk. For failover slots this means it also doesn't get replicated via WAL. After a master crash, or for failover slots after a promote event, the confirmed_lsn will go backwards. Users of the SQL interface must keep track of the last slot position they have safely flushed locally and throw the repeated data away; unlike with the walsender protocol, they have no way to ask the server to skip that data.

Yeah, I read that and thought it was wrong. "At least once, in order" is what we want, how it works, and how it should be described.
 
Worse, because we don't dirty the slot, even a *clean shutdown* causes the slot's confirmed_lsn to go backwards. That's a bug IMO. We should force a flush of all slots at the shutdown checkpoint, whether dirty or not, to address it.

Given the earlier understanding, above, I'm not sure I see why that must happen. But I agree it should happen.
 
Barring objections I intend to submit a fix to:

- Document that slots can replay data more than once
- Force flush of all slots on a shutdown checkpoint

+1
 
Also, pg_logical_slot_get_changes and its _peek_ variant should have a param specifying the starting LSN to read and return. If this is lower than the restart_lsn but non-null it should ERROR; if it's greater than or equal it should use this position instead of starting at the confirmed_lsn.

Maybe. I don't really like changing APIs. Can we add new funcs? Make sure a NULL LSN can be passed as well.

Is the return type of pg_logical_slot_peek_changes() incorrect in the docs?
 
Time permitting I'd like to add a pg_logical_slot_confirm function, so you can alternate _peek_changes and _confirm, making it possible to get walsender-interface-like behaviour via the SQL interface.

I thought that's what replorigins do.
 
Right now you can't reliably use the SQL interface because _get_changes can succeed locally and advance the slot but the client can fail to receive all the changes due to network issues etc. Sure, the SQL interface is meant mainly for testing, but especially for !postgresql downstreams I strongly suspect people will want to use it for "real" work rather than have to modify each client driver to support replication protocol extensions.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11 March 2016 at 17:00, Simon Riggs <simon@2ndquadrant.com> wrote:
 
  
Also, pg_logical_slot_get_changes and its _peek_ variant should have a param specifying the starting LSN to read and return. If this is lower than the restart_lsn but non-null it should ERROR; if it's greater than or equal it should use this position instead of starting at the confirmed_lsn.

Maybe. I don't really like changing APIs. Can we add new funcs? Make sure a NULL LSN can be passed as well.

It'd be compatible with the old SQL API and we've already made one such change in 9.5, to add the parameter to immediately reserve WAL for physical slots. You just CREATE OR REPLACE FUNCTION it in system_views.sql with a DEFAULT so the old signature still works.
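That is, the same pattern used for the immediately-reserve-WAL parameter; roughly this in system_views.sql (a sketch, not the exact catalog definition):

```sql
-- Re-declare the function with a DEFAULT so calls using the old
-- one-argument signature keep working unchanged.
CREATE OR REPLACE FUNCTION pg_create_physical_replication_slot(
        IN slot_name name,
        IN immediately_reserve boolean DEFAULT false,
        OUT slot_name name, OUT xlog_position pg_lsn)
RETURNS record
LANGUAGE internal STRICT VOLATILE
AS 'pg_create_physical_replication_slot';
```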

That works for C-level callers too, unless they try to get clever and DirectFunctionCall3 it. Even then it might; I don't recall off the top of my head, but if not we can defend against that too by testing PG_NARGS in the function itself. This wasn't done for the reserve_wal change and I'm not convinced it's needed even if the fmgr doesn't take care of the default parameters - it's not like that's a function that makes much sense to DirectFunctionCall anyway.

Is the return type of pg_logical_slot_peek_changes() incorrect in the docs?

I don't think so, why?

 
Time permitting I'd like to add a pg_logical_slot_confirm function, so you can alternate _peek_changes and _confirm, making it possible to get walsender-interface-like behaviour via the SQL interface.

I thought that's what replorigins do.

Replication origins provide an easy and efficient way for a PostgreSQL downstream (client) to track its replay state and position in a reliable way. If the client is PostgreSQL then you can use replication origins to track the last flushed LSN, yes; if you're using the walsender interface you must do so. They are not used by the server side of logical decoding on either the walsender or the SQL interface.

If you're using the SQL interface it has its own replay position tracking. Just before pg_logical_slot_get_changes returns it updates the confirm position of the slot, as if you'd sent a replay confirmation from the client on the walsender protocol. The problem is that this happens before we know the client has received and flushed all the data we just sent it. The client might not get all (or any) of it if there are network issues, for example.
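The difference is visible in the catalogs; a sketch, assuming a release that exposes confirmed_flush_lsn in pg_replication_slots:

```sql
-- _peek_changes leaves the slot's position alone...
SELECT * FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL);
SELECT confirmed_flush_lsn FROM pg_replication_slots
 WHERE slot_name = 'regression_slot';   -- unchanged

-- ...while _get_changes advances it before the client has seen anything
SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
SELECT confirmed_flush_lsn FROM pg_replication_slots
 WHERE slot_name = 'regression_slot';   -- advanced, even if the result
                                        -- set is lost in transit
```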

Because the replay position can go backwards the client can and must track the last-safely-replayed position itself, as established earlier, so it can skip over data it has already seen. The problem is that when using the SQL interface the client can't do the reverse - it can't tell the server "send me that data again, please, I only got half of it last time because the network died". The server has advanced the confirmed position and there's no way to ask for the data again over the SQL interface.

That's why I want to add a param to let it do so, replaying from prior to the confirm location or skipping past data that'd otherwise be replayed. If you pass NULL (the default, and the only option in 9.5 or older) for that param then you get data from the server's last recorded confirm location, which is the server's best-effort tracking of your replay position from server-side.

That's also why a pg_logical_slot_confirm function could be handy. It lets you do a _peek_changes then a _confirm when you know you've flushed them, prior to the next peek.

I think the docs for pg_replication_slots are wrong about the confirm location, btw: they say data from before the confirm location can't be replayed, but it can be over the walsender protocol. There's just no way to ask for it over the SQL interface.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

From
Alvaro Herrera
Date:
Craig Ringer wrote:
> Hi all
> 
> I think I found a couple of logical decoding issues while writing tests for
> failover slots.
> 
> Despite the docs' claim that a logical slot will replay data "exactly
> once", a slot's confirmed_lsn can go backwards and the SQL functions can
> replay the same data more than once. We don't mark a slot as dirty if only
> its confirmed_lsn is advanced, so it isn't flushed to disk. For failover
> slots this means it also doesn't get replicated via WAL. After a master
> crash, or for failover slots after a promote event, the confirmed_lsn will
> go backwards.  Users of the SQL interface must keep track of the safely
> locally flushed slot position themselves and throw the repeated data away.
> Unlike with the walsender protocol it has no way to ask the server to skip
> that data.
> 
> Worse, because we don't dirty the slot even a *clean shutdown* causes slot
> confirmed_lsn to go backwards. That's a bug IMO. We should force a flush of
> all slots at the shutdown checkpoint, whether dirty or not, to address it.

Why don't we mark the slot dirty when confirmed_lsn advances?  If we fix
that, doesn't it fix the other problems too?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On 11 March 2016 at 20:15, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Craig Ringer wrote:
> Hi all
>
> I think I found a couple of logical decoding issues while writing tests for
> failover slots.
>
> Despite the docs' claim that a logical slot will replay data "exactly
> once", a slot's confirmed_lsn can go backwards and the SQL functions can
> replay the same data more than once. We don't mark a slot as dirty if only
> its confirmed_lsn is advanced, so it isn't flushed to disk. For failover
> slots this means it also doesn't get replicated via WAL. After a master
> crash, or for failover slots after a promote event, the confirmed_lsn will
> go backwards.  Users of the SQL interface must keep track of the safely
> locally flushed slot position themselves and throw the repeated data away.
> Unlike with the walsender protocol it has no way to ask the server to skip
> that data.
>
> Worse, because we don't dirty the slot even a *clean shutdown* causes slot
> confirmed_lsn to go backwards. That's a bug IMO. We should force a flush of
> all slots at the shutdown checkpoint, whether dirty or not, to address it.

Why don't we mark the slot dirty when confirmed_lsn advances?  If we fix
that, doesn't it fix the other problems too?

Yes, it does.

That'll cause slots to be written out at checkpoints when they otherwise wouldn't have to be, but I'd rather be doing a little more work in this case. Compared to the disk activity from WAL decoding etc the effect should be undetectable anyway.

Andres? Any objection to dirtying a slot when the confirmed lsn advances, so we write it out at the next checkpoint?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On 14/03/16 08:08, Craig Ringer wrote:
> On 11 March 2016 at 20:15, Alvaro Herrera <alvherre@2ndquadrant.com
> <mailto:alvherre@2ndquadrant.com>> wrote:
>
>     Craig Ringer wrote:
>     > Hi all
>     >
>     > I think I found a couple of logical decoding issues while writing tests for
>     > failover slots.
>     >
>     > Despite the docs' claim that a logical slot will replay data "exactly
>     > once", a slot's confirmed_lsn can go backwards and the SQL functions can
>     > replay the same data more than once. We don't mark a slot as dirty if only
>     > its confirmed_lsn is advanced, so it isn't flushed to disk. For failover
>     > slots this means it also doesn't get replicated via WAL. After a master
>     > crash, or for failover slots after a promote event, the confirmed_lsn will
>     > go backwards.  Users of the SQL interface must keep track of the safely
>     > locally flushed slot position themselves and throw the repeated data away.
>     > Unlike with the walsender protocol it has no way to ask the server to skip
>     > that data.
>     >
>     > Worse, because we don't dirty the slot even a *clean shutdown* causes slot
>     > confirmed_lsn to go backwards. That's a bug IMO. We should force a flush of
>     > all slots at the shutdown checkpoint, whether dirty or not, to address it.
>
>     Why don't we mark the slot dirty when confirmed_lsn advances?  If we fix
>     that, doesn't it fix the other problems too?
>
>
> Yes, it does.
>

It will not change the fact that the slot can go backwards even on a 
clean shutdown of the server, however: in walsender the confirmed_lsn 
only changes after a feedback message, so if the client crashes it won't 
get updated (for obvious reasons).

Btw, you can emulate asking for a specific LSN in the SQL interface by 
first calling pg_logical_slot_get_changes with upto_lsn set to whatever 
LSN you expect to start at, but it's ugly.
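Roughly like this (LSN illustrative), at the cost of decoding and discarding everything up to the start point:

```sql
-- Ugly workaround: a throwaway call that consumes changes up to the LSN
-- you want to start from, leaving the slot's confirmed position there...
SELECT count(*)
  FROM pg_logical_slot_get_changes('regression_slot', '0/16B2F80', NULL);

-- ...so the real call then starts from that position.
SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
```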

--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



On 14 March 2016 at 17:16, Petr Jelinek <petr@2ndquadrant.com> wrote:
 
It will not change the fact that slot can go backwards

Sure. I don't consider that a problem in general though. It's similar to the way we lose cached sequence chunks on crash - a performance optimisation with a user visible effect. It needs to be documented correctly, but isn't IMO a problem.

Failing to save the full slot state on clean shutdown is though, IMO. It wouldn't matter if it was only used for sync rep, but since it's also used to manage the resume point for SQL-level logical decoding calls we should make a reasonable effort to preserve it ... and allow the user to handle the situation correctly when we fail to preserve it.
 
You btw can emulate asking for the specific LSN in SQL interface by first calling the pg_logical_slot_get_changes function with upto_lsn set to whatever lsn you expect to start at, but it's ugly.

Ugh.

I didn't realise pg_logical_slot_get_changes could backtrack by setting an upto_lsn in the past. That doesn't seem like something we should really be doing - if it's a limit, and we're already past that limit, we should just be returning the empty set.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On 14/03/16 10:48, Craig Ringer wrote:
>
>     You btw can emulate asking for the specific LSN in SQL interface by
>     first calling the pg_logical_slot_get_changes function with upto_lsn
>     set to whatever lsn you expect to start at, but it's ugly.
>
>
> Ugh.
>
> I didn't realise pg_logical_slot_get_changes could backtrack by setting
> an upto_lsn in the past. That doesn't seem like something we should
> really be doing - if it's a limit, and we're already past that limit, we
> should just be returning the empty set.
>

Not in the past, in the future relative to the old confirmed_lsn at 
least. The point was that the next call will start from whatever LSN you 
specified as upto_lsn.

--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Attached is a patch to mark a logical replication slot as dirty if its confirmed lsn is changed.

Aesthetically I'm not sure if it's better to do it per this patch and call ReplicationSlotMarkDirty after we release the spinlock, add a new ReplicationSlotMarkDirtyLocked() that skips the spinlock acquisition, or add a bool is_locked arg to ReplicationSlotMarkDirty and update all call sites. All will have the same result.

I've confirmed this works as expected as part of the failover slots test suite but it'd be pretty seriously cumbersome to extract. If anyone feels strongly about it I can add a test that shows that 

- start master
- create slot
- do some stuff
- replay from slot
- fast-stop master
- start master
- replay from slot

doesn't see the same data a second time, but if we repeat it using immediate stop it will see the same data when replaying again.

Also attached is another patch to amend the docs to reflect the fact that a slot can actually replay the same change more than once. I avoided the strong temptation to update other parts of the docs nearby.

Both these are fixes that should IMO be applied to 9.6.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachment
The first patch was incorrectly created on top of failover slots not HEAD. Attached patch applies on HEAD.
Attachment

Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

From
"David G. Johnston"
Date:
On Thursday, March 17, 2016, Craig Ringer <craig@2ndquadrant.com> wrote:
The first patch was incorrectly created on top of failover slots not HEAD. Attached patch applies on HEAD.

Lots of logical decoding work ongoing but this one shows as active in the September cf and the comments by Craig indicate potential relevance to 9.6.  Is this still live as-is or has it been subsumed by other threads? 

David J.

Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

From
Craig Ringer
Date:
On 5 June 2016 at 09:54, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thursday, March 17, 2016, Craig Ringer <craig@2ndquadrant.com> wrote:
The first patch was incorrectly created on top of failover slots not HEAD. Attached patch applies on HEAD.

Lots of logical decoding work ongoing but this one shows as active in the September cf and the comments by Craig indicate potential relevance to 9.6.  Is this still live as-is or has it been subsumed by other threads? 


Yes. Both those patches are still pending and should be considered for commit in the next CF. (Of course, I think they should just be committed, but I would, wouldn't I?)


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

From
Petr Jelinek
Date:
On 20/08/16 16:01, Craig Ringer wrote:
> On 5 June 2016 at 09:54, David G. Johnston <david.g.johnston@gmail.com
> <mailto:david.g.johnston@gmail.com>> wrote:
>
>     On Thursday, March 17, 2016, Craig Ringer <craig@2ndquadrant.com
>     <mailto:craig@2ndquadrant.com>> wrote:
>
>         The first patch was incorrectly created on top of failover slots
>         not HEAD. Attached patch applies on HEAD.
>
>
>     Lots of logical decoding work ongoing but this one shows as active
>     in the September cf and the comments by Craig indicate potential
>     relevance to 9.6.  Is this still live as-is or has it been subsumed
>     by other threads?
>
>
>
> Yes. Both those patches are still pending and should be considered for
> commit in the next CF. (Of course, I think they should just be
> committed, but I would, wouldn't I?)
>
>

I think the doc one should definitely go in and possibly be back-patched 
all the way to 9.4. I didn't look at the other one.

--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

From
Simon Riggs
Date:
On 20 August 2016 at 15:04, Petr Jelinek <petr@2ndquadrant.com> wrote:
> On 20/08/16 16:01, Craig Ringer wrote:
>>
>> On 5 June 2016 at 09:54, David G. Johnston <david.g.johnston@gmail.com
>> <mailto:david.g.johnston@gmail.com>> wrote:
>>
>>     On Thursday, March 17, 2016, Craig Ringer <craig@2ndquadrant.com
>>     <mailto:craig@2ndquadrant.com>> wrote:
>>
>>         The first patch was incorrectly created on top of failover slots
>>         not HEAD. Attached patch applies on HEAD.
>>
>>
>>     Lots of logical decoding work ongoing but this one shows as active
>>     in the September cf and the comments by Craig indicate potential
>>     relevance to 9.6.  Is this still live as-is or has it been subsumed
>>     by other threads?
>>
>>
>>
>> Yes. Both those patches are still pending and should be considered for
>> commit in the next CF. (Of course, I think they should just be
>> committed, but I would, wouldn't I?)
>>
>>
>
> I think the doc one should definitely go in and possibly be back-patched all
> the way to 9.4. I didn't look at the other one.

I agree the doc patch should go in, though I suggest rewording it
slightly, like the attached patch.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

From
Craig Ringer
Date:
On 1 September 2016 at 21:19, Simon Riggs <simon@2ndquadrant.com> wrote:

> I agree the doc patch should go in, though I suggest reword it
> slightly, like attached patch.

Thanks. The rewording looks good to me and I think that
Doc-correction-each-change-once.v2.patch is ready for commit.

Meanwhile, thinking about the patch to dirty slots on
confirmed_flush_lsn advance some more, I don't think it's ideal to do
it in LogicalConfirmReceivedLocation(). I'd rather not add more
complexity there, it's already complex enough. Doing it there will
also lead to unnecessary slot write-out being done to slots at normal
checkpoints even if the slot hasn't otherwise changed, possibly in
response to lsn advances sent in response to keepalives due to
activity on other unrelated databases. Slots are always fsync()ed when
written out so we don't want to do it more than we have to.

We really only need to write out slots where only the
confirmed_flush_lsn has changed at a shutdown checkpoint since it's
not really a big worry if it goes backwards on crash, and otherwise it
can't. Even then it only _really_ matters when the SQL interface is
used. Losing the confirmed_flush_lsn is very annoying when using
pg_recvlogical too, and was the original motivation for this patch.
But I'm thinking of instead teaching pg_recvlogical to write out a
status file with its last confirmed point on exit and to be able to
take that as an argument when (re)connecting. Poor-man's replication
origins, effectively.

So here's a simpler patch that just dirties the slot when it's
replayed something from it on the SQL interface, so it's flushed out
next checkpoint or on shutdown. That's the main user visible defect
that should be fixed and it's trivial to do here. It means we'll still
forget the confirmed_flush_lsn on clean shutdown if it was advanced
using the walsender protocol, but *shrug*. That's just a little
inconvenient. I can patch pg_recvlogical separately.

The alternative is probably to add a second, "softer" dirty tracking
method that only causes a write at a clean shutdown or forced
checkpoint - and maybe doesn't bother fsync()ing. That's a bit more
invasive but would work for walsender use as well as the SQL
interface. I don't think it's worth the bother, since in the end
callers have to be prepared for repeated data on crash anyway.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

From
Petr Jelinek
Date:
On 02/09/16 09:58, Craig Ringer wrote:
> On 1 September 2016 at 21:19, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> I agree the doc patch should go in, though I suggest reword it
>> slightly, like attached patch.
>
> Thanks. The rewording looks good to me and I think that
> Doc-correction-each-change-once.v2.patch is ready for commit.
>
> Meanwhile, thinking about the patch to dirty slots on
> confirmed_flush_lsn advance some more, I don't think it's ideal to do
> it in LogicalConfirmReceivedLocation(). I'd rather not add more
> complexity there, it's already complex enough. Doing it there will
> also lead to unnecessary slot write-out being done to slots at normal
> checkpoints even if the slot hasn't otherwise changed, possibly in
> response to lsn advances sent in response to keepalives due to
> activity on other unrelated databases. Slots are always fsync()ed when
> written out so we don't want to do it more than we have to.
>
> We really only need to write out slots where only the
> confirmed_flush_lsn has changed at a shutdown checkpoint since it's
> not really a big worry if it goes backwards on crash, and otherwise it
> can't. Even then it only _really_ matters when the SQL interface is
> used. Losing the confirmed_flush_lsn is very annoying when using
> pg_recvlogical too, and was the original motivation for this patch.
> But I'm thinking of instead teaching pg_recvlogical to write out a
> status file with its last confirmed point on exit and to be able to
> take that as an argument when (re)connecting. Poor-man's replication
> origins, effectively.
>
> So here's a simpler patch that just dirties the slot when it's
> replayed something from it on the SQL interface, so it's flushed out
> next checkpoint or on shutdown. That's the main user visible defect
> that should be fixed and it's trivial to do here. It means we'll still
> forget the confirmed_flush_lsn on clean shutdown if it was advanced
> using the walsender protocol, but *shrug*. That's just a little
> inconvenient. I can patch pg_recvlogical separately.

Okay, that sounds reasonable; the SQL interface is already somewhat 
different from walsender as it does not really "stream", so it makes 
sense to improve the behavior there. As a side note, I would really like 
to have a cursor-like SQL interface which behaves more like the 
walsender one, but that's a very different patch.

>
> The alternative is probably to add a second, "softer" dirty tracking
> method that only causes a write at a clean shutdown or forced
> checkpoint - and maybe doesn't bother fsync()ing. That's a bit more
> invasive but would work for walsender use as well as the SQL
> interface. I don't think it's worth the bother, since in the end
> callers have to be prepared for repeated data on crash anyway.
>

Correct me if I am wrong, but I think the only situation where it would 
matter is on a server that restarts or crashes often (as logical 
decoding then has to do the work many times), and I don't think it's 
worth optimizing for that kind of scenario.

--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

From
Craig Ringer
Date:
On 2 September 2016 at 16:50, Petr Jelinek <petr@2ndquadrant.com> wrote:

>> So here's a simpler patch that just dirties the slot when it's
>> replayed something from it on the SQL interface [...]
>
> Okay that sounds reasonable, the SQL interface is already somewhat different
> than walsender as it does not really "stream" so makes sense to improve the
> behavior there.

Yep. For the walsender interface it's reasonable to expect the
downstream to keep track of its progress - even if right now
pg_recvlogical doesn't do so very well. It's not practical for the SQL
interface.

> As a side note, I would really like to have cursor-like SQL
> interface which behaves more like walsender one but that's very different
> patch.

Me too. Specifically, I'd really like to be able to "peek, confirm,
peek, confirm" and also be able to specify my own start lsn for peek.
But as you say, that's a different patch.

>> The alternative is probably to add a second, "softer" dirty tracking
>> method that only causes a write at a clean shutdown or forced
>> checkpoint - and maybe doesn't bother fsync()ing. That's a bit more
>> invasive but would work for walsender use as well as the SQL
>> interface. I don't think it's worth the bother, since in the end
>> callers have to be prepared for repeated data on crash anyway.
>>
>
> Correct me if I am wrong but I think the only situation where it would
> matter is on server that restarts often or crashes often (as the logical
> decoding then has to do the work many times) but I don't think it's worth
> optimizing for that kind of scenario.

Right. Though it's not so much doing the work many times that's the
concern - this change doesn't affect restart_lsn at all. It's that it
sends already-seen-and-confirmed changes to the output plugin and
therefore the client.

Not that big a deal. As far as I'm concerned, it's the job of users of
the walsender interface to keep track of the position they've consumed
up to and specify it to subsequent START_REPLICATION calls. You can't
do that on the SQL interface, which is why this change is needed
there.

Maybe the docs should be more explicit about the reason to specify
start lsn to START_REPLICATION not just 0/0 though, and also that we
will disregard it if it's less than the stored confirmed_flush_lsn.
Again, separate patch.
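For reference, the walsender-side equivalent of what the SQL interface can't do (slot name and LSN illustrative):

```sql
-- Replication-protocol command, not plain SQL: resume from a
-- client-tracked position rather than 0/0. The server will skip
-- forward to its confirmed_flush_lsn if the requested LSN is older.
START_REPLICATION SLOT "regression_slot" LOGICAL 0/16B2F80;
```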

So the main change becomes the one-liner in my prior mail.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

From
Andres Freund
Date:
On 2016-09-02 10:50:17 +0200, Petr Jelinek wrote:
> Okay that sounds reasonable, the SQL interface is already somewhat different
> than walsender as it does not really "stream" so makes sense to improve the
> behavior there. As a side note, I would really like to have cursor-like SQL
> interface which behaves more like walsender one but that's very different
> patch.

FWIW, with the changes in https://www.postgresql.org/message-id/20160827214829.zo2dfb5jaikii5nw@alap3.anarazel.de
that's starting to be realistic for a function-based interface. By
switching things over to ValuePerCall mode you could actually use a
cursor, without pre-materializing all changes.

Greetings,

Andres Freund



Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

From
Craig Ringer
Date:
On 2 September 2016 at 17:49, Craig Ringer <craig@2ndquadrant.com> wrote:

> So the main change becomes the one-liner in my prior mail.

Per feedback from Simon, updated with a new test in src/test/recovery .

If you revert the change to
src/backend/replication/logical/logicalfuncs.c then the test will
start failing.

I'd quite like to backpatch the fix since it's simple and safe, but I
don't feel that it's hugely important to do so. This is an annoyance,
not a serious data-risking bug.

My only concern here is that we still lose position on crash. So
SQL-interface callers should still be able to cope with it. But they
won't see it happen if they're only testing with normal shutdowns,
host reboots, etc. In practice users aren't likely to notice this
anyway, though, since most people don't restart the server all the
time. I think it's better than what we have.

This issue could be eliminated completely by calling
ReplicationSlotSave() after ReplicationSlotMarkDirty(). This would
force an immediate flush after a non-peeked SQL interface decoding
call. It means more fsync()ing, though, and since the SQL interface
can only be used by polling, that could be a significant performance
hit. We'd want to flush only when the position changed, but even then
it'd mean a sync every time anything gets returned.

The better alternative is to add a variant on
pg_logical_slot_get_changes(...) etc that accepts a start LSN. But
it's not convenient or easy for SQL callers to extract the last commit
LSN received from the last call to pass it to the next one, so this
isn't simple, and is probably best tackled as part of the SQL
interface API change Petr and Andres discussed in this thread.

I'm inclined to just dirty the slot for now, then provide a better
solution by adding the peek/confirm interface discussed upthread down
the track.
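To sketch that peek/confirm pattern: pg_logical_slot_peek_changes() exists today, but pg_logical_slot_confirm() is only the function proposed at the start of this thread, and the slot name and LSN below are invented for the example.

```sql
-- Peek at pending changes without consuming them. The lsn output
-- column tells the client how far it has received.
SELECT lsn, xid, data
FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL);

-- After the client has durably stored everything it received, it would
-- confirm up to that point so the slot can advance:
--
--   SELECT pg_logical_slot_confirm('test_slot', '0/16B2D90');
--
-- (pg_logical_slot_confirm is the proposed, not-yet-existing function;
-- '0/16B2D90' is a placeholder LSN.)
```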

Andres, Petr, got an opinion here?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

From
Craig Ringer
Date:
On 5 September 2016 at 10:41, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 2 September 2016 at 17:49, Craig Ringer <craig@2ndquadrant.com> wrote:
>
>> So the main change becomes the one-liner in my prior mail.
>
> Per feedback from Simon, updated with a new test in src/test/recovery .

... attached this time.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

From
Petr Jelinek
Date:
On 05/09/16 04:41, Craig Ringer wrote:
> On 2 September 2016 at 17:49, Craig Ringer <craig@2ndquadrant.com> wrote:
>
>> So the main change becomes the one-liner in my prior mail.
>
> Per feedback from Simon, updated with a new test in src/test/recovery .
>
> If you revert the change to
> src/backend/replication/logical/logicalfuncs.c then the test will
> start failing.
>
> I'd quite like to backpatch the fix since it's simple and safe, but I
> don't feel that it's hugely important to do so. This is an annoyance
> not a serious data risking bug.
>
> My only concern here is that we still lose position on crash. So
> SQL-interface callers should still be able to cope with it. But they
> won't see it happen if they're only testing with normal shutdowns,
> host reboots, etc. In practice users aren't likely to notice this
> anyway, though, since most people don't restart the server all the
> time. I think it's better than what we have.
>
> This issue could be eliminated completely by calling
> ReplicationSlotSave() after ReplicationSlotMarkDirty(). This would
> force an immediate flush after a non-peeked SQL interface decoding
> call. It means more fsync()ing, though, and the SQL interface can only
> be used by polling so that could be a significant performance hit.
> We'd want to only flush when the position changed, but even then it'll
> mean a sync every time anything gets returned.
>
> The better alternative is to add a variant on
> pg_logical_slot_get_changes(...) etc that accepts a start LSN. But
> it's not convenient or easy for SQL callers to extract the last commit
> LSN received from the last call to pass it to the next one, so this
> isn't simple, and is probably best tackled as part of the SQL
> interface API change Petr and Andres discussed in this thread.
>

It isn't? I thought the lsn is the first column of that function's output?

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

From
Simon Riggs
Date:
On 5 September 2016 at 03:41, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 2 September 2016 at 17:49, Craig Ringer <craig@2ndquadrant.com> wrote:
>
>> So the main change becomes the one-liner in my prior mail.
>
> Per feedback from Simon, updated with a new test in src/test/recovery .
>
> If you revert the change to
> src/backend/replication/logical/logicalfuncs.c then the test will
> start failing.
>
> I'd quite like to backpatch the fix since it's simple and safe, but I
> don't feel that it's hugely important to do so. This is an annoyance
> not a serious data risking bug.

I've committed to HEAD only. We can discuss backpatching elsewhere also.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

From
Craig Ringer
Date:
On 5 September 2016 at 16:33, Petr Jelinek <petr@2ndquadrant.com> wrote:

>> The better alternative is to add a variant on
>> pg_logical_slot_get_changes(...) etc that accepts a start LSN. But
>> it's not convenient or easy for SQL callers to extract the last commit
>> LSN received from the last call to pass it to the next one, so this
>> isn't simple, and is probably best tackled as part of the SQL
>> interface API change Petr and Andres discussed in this thread.
>>
>
> It isn't? I thought the lsn is the first column of that function's output?

It is. While it's a pain to use that from plain SQL (psql, etc.) while
also doing something with the results, it doesn't really matter:
anything driven by simple SQL will get terminated when the server
restarts anyway. So my point above is moot.
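For completeness, a sketch of what a careful SQL caller can already do with the existing functions. The slot name and LSN are invented, and note that the second parameter of pg_logical_slot_get_changes is an upper bound (upto_lsn), not a start position:

```sql
-- Peek first; lsn is the first output column, so the last position
-- received is easy to extract:
SELECT max(lsn) AS last_seen
FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL);

-- After safely storing everything peeked, consume up to that point.
-- '0/16B2D90' stands in for the last_seen value from above:
SELECT count(*)
FROM pg_logical_slot_get_changes('test_slot', '0/16B2D90', NULL);
```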

That doesn't reflect on the slot-dirty patch; it just means there's no
need to do anything extra to fully solve this when we add the
cursor-like interface later.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services