Re: Logical decoding slots can go backwards when used from SQL, docs are wrong - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Logical decoding slots can go backwards when used from SQL, docs are wrong
Date
Msg-id CAMsr+YGF0LQ804Tq5iEKNS=cEXvtDxNrEFdUwbgiNO8TnNqzZA@mail.gmail.com
Whole thread Raw
In response to Re: Logical decoding slots can go backwards when used from SQL, docs are wrong  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On 11 March 2016 at 17:00, Simon Riggs <simon@2ndquadrant.com> wrote:
 
  
Also, pg_logical_slot_get_changes and its _peek_ variant should have a param specifying the starting LSN to read and return. If this is lower than the restart_lsn but non-null it should ERROR; if it's greater than or equal it should use this position instead of starting at the confirmed_lsn.

Maybe. I don't really like changing APIs. Can we add new funcs? Make sure a NULL LSN can be passed as well.

It'd be compatible with the old SQL API and we've already made one such change in 9.5, to add the parameter to immediately reserve WAL for physical slots. You just CREATE OR REPLACE FUNCTION it in system_views.sql with a DEFAULT so the old signature still works.

That works for C-level callers too, unless they try to get clever and DirectFunctionCall3 it. Even then it might, I don't recall off the top of my head, but if not we can defend against that too by testing PG_NARGS in the function its self. This wasn't done for the reserve_wal change and I'm not convinced it's needed even if the fmgr doesn't take care of the default parameters - it's not like that's a function that makes much sense to DirectFunctionCall anyway.

Is the return type of pg_logical_slot_peek_changes() incorrect in the docs?

I don't think so, why?

 
Time permitting I'd like to add a pg_logical_slot_confirm function, so you can aternate _peek_changes and _confirm, making it possible to get walsender-interface-like behaviour via the SQL interface.

I thought thats what replorigins do.

Replication origins provide an easy and efficient way for a PostgreSQL downstream (client) to track its replay state and position in a reliable way. If the client is PostgreSQL then you can use replication origins to track the last flushed LSN, yes. If you're using the walsender interface you must do so. They are not used by the server-side of logical decoding on either walsender or SQL interfaces.

If you're using the SQL interface it has its own replay position tracking. Just before pg_logical_slot_get_changes returns it updates the confirm position of the slot, as if you'd sent a replay confirmation from the client on the walsender protocol. The problem is that this happens before we know the client has received and flushed all the data we just sent it. The client might not get all (or any) of it if there are network issues, for example.

Because the replay position can go backwards the client can and must track the last-safely-replayed position its self, as established earlier, so it can skip over data it already saw. The problem is that when using the SQL interface the client can't do the reverse - it can't tell the server "send me that data again, please, I only got half of it last time because the network died". The server has advanced the confirmed position and there's no way to ask for the data again over the SQL interface.

That's why I want to add a param to let it do so, replaying from prior to the confirm location or skipping past data that'd otherwise be replayed. If you pass NULL (the default, and the only option in 9.5 or older) for that param then you get data from the server's last recorded confirm location, which is the server's best-effort tracking of your replay position from server-side.

That's also why a pg_logical_slot_confirm function could be handy. It lets you do a _peek_changes then a _confirm when you know you've flushed them, prior to the next peek.

I think the docs for pg_replication_slots are wrong about the confirm location btw, it says data from before the confirm location can't be replayed, but it can over the walsender protocol. There's just no way to ask for it over the SQL interface.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: WIP: Detecting SSI conflicts before reporting constraint violations
Next
From: Magnus Hagander
Date:
Subject: Re: auto_explain sample rate