Thread: pg_receivewal starting position

pg_receivewal starting position

From
Ronan Dunklau
Date:
Hello,

I've noticed that pg_receivewal's logic for deciding which LSN to start 
streaming at consists of:
  - looking up the latest WAL file in our destination folder, and resuming 
from there
  - if there isn't one, using the current flush location instead.

This behaviour surprised me when using it with a replication slot: I was 
expecting it to start streaming at the last flushed location from the 
replication slot instead. If you consider a backup tool which will take 
pg_receivewal's output and transfer it somewhere else, using the replication 
slot position would be the easiest way to ensure we don't miss WAL files.

Does that make sense?

I don't know if it should be the default, toggled by a command-line flag, or if 
we should even let the user provide an LSN.

I'd be happy to implement any of that if we agree.

-- 
Ronan Dunklau





Re: pg_receivewal starting position

From
Kyotaro Horiguchi
Date:
At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau <ronan.dunklau@aiven.io> wrote in 
> Hello,
> 
> I've notived that pg_receivewal logic for deciding which LSN to start 
> streaming at consists of:
>   - looking up the latest WAL file in our destination folder, and resume from 
> here
>   - if there isn't, use the current flush location instead.
> 
> This behaviour surprised me when using it with a replication slot: I was 
> expecting it to start streaming at the last flushed location from the 
> replication slot instead. If you consider a backup tool which will take 
> pg_receivewal's output and transfer it somewhere else, using the replication 
> slot position would be the easiest way to ensure we don't miss WAL files.
> 
> Does that make sense ? 
> 
> I don't know if it should be the default, toggled by a command line flag, or if 
> we even should let the user provide a LSN.

*I* think it is completely reasonable (or at least convenient, or less
astonishing) that pg_receivewal starts from the restart_lsn of the
replication slot to be used.  The tool already decides the clean-start LSN
in a somewhat unusual way.  And it seems to me that the proposed behavior
can be the default when -S is specified.

> I'd be happy to implement any of that if we agree.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
On Wednesday, July 28, 2021 at 08:22:30 CEST, Kyotaro Horiguchi wrote:
> At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau <ronan.dunklau@aiven.io>
> wrote in
> > Hello,
> >
> > I've notived that pg_receivewal logic for deciding which LSN to start
> >
> > streaming at consists of:
> >   - looking up the latest WAL file in our destination folder, and resume
> >   from
> >
> > here
> >
> >   - if there isn't, use the current flush location instead.
> >
> > This behaviour surprised me when using it with a replication slot: I was
> > expecting it to start streaming at the last flushed location from the
> > replication slot instead. If you consider a backup tool which will take
> > pg_receivewal's output and transfer it somewhere else, using the
> > replication slot position would be the easiest way to ensure we don't
> > miss WAL files.
> >
> > Does that make sense ?
> >
> > I don't know if it should be the default, toggled by a command line flag,
> > or if we even should let the user provide a LSN.
>
> *I* think it is completely reasonable (or at least convenient or less
> astonishing) that pg_receivewal starts from the restart_lsn of the
> replication slot to use.  The tool already decides the clean-start LSN
> a bit unusual way. And it seems to me that proposed behavior can be
> the default when -S is specified.
>

As of now we can't get a replication slot's restart_lsn over a replication
connection, AFAIK.

This implies that the patch could require the user to specify a maintenance-db
parameter, which we would use, if provided, to fetch the replication slot
info, falling back to the previous behaviour otherwise. I don't really like this
approach, as having the behaviour change depending on whether we supply a
maintenance-db parameter is error-prone for the user.
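
For reference, here is a minimal libpq sketch of what fetching restart_lsn over
such a regular SQL connection looks like; the connection string and the
"myslot" slot name are placeholders, purely for illustration:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* "dbname=postgres" stands in for whatever maintenance-db would point to */
    PGconn     *conn = PQconnectdb("dbname=postgres");
    const char *params[1] = {"myslot"};     /* hypothetical slot name */
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    res = PQexecParams(conn,
                       "SELECT restart_lsn FROM pg_replication_slots "
                       "WHERE slot_name = $1",
                       1, NULL, params, NULL, NULL, 0);
    if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
        printf("restart_lsn = %s\n", PQgetvalue(res, 0, 0));

    PQclear(res);
    PQfinish(conn);
    return 0;
}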

Another option would be to add a new replication command (for example
ACQUIRE_REPLICATION_SLOT <slot_name>) to set the replication slot as the
current one, and return some info about it (restart_lsn at least for a
physical slot).

I don't see any reason not to make it work for logical replication connections
/ slots, but it wouldn't be that useful since we can query the database in
that case.

Acquiring the replication slot instead of just reading it would make sure that
no other process could start the replication between the time we read the
restart_lsn and when we issue START_REPLICATION. START_REPLICATION could then
check if we already have a replication slot, and ensure it is the same one as
the one we're trying to use.

From pg_receivewal's point of view, this would amount to (rough sketch below):

 - check if we currently have WAL in the target directory
   - if we do, proceed as currently done, by computing the start LSN and
     timeline from the last archived WAL file
   - if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT; use the
     restart_lsn as the start LSN if there is one, and don't provide a timeline
   - if we still don't have a start LSN, fall back to using the current server
     WAL position, as is done now.
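
To make that order concrete, here is a rough, self-contained sketch of the
fallback chain; the helpers are hypothetical stubs standing in for the three
sources (not actual pg_receivewal code), and the LSN values are made up:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* stub: start LSN derived from the last WAL file in the target directory */
static XLogRecPtr start_from_existing_wal(void) { return InvalidXLogRecPtr; }
/* stub: restart_lsn returned by the proposed slot-reading command */
static XLogRecPtr start_from_slot(void) { return (XLogRecPtr) 0x3000000; }
/* stub: current WAL flush position reported by the server */
static XLogRecPtr start_from_server(void) { return (XLogRecPtr) 0x4000000; }

int
main(void)
{
    bool        have_slot = true;
    XLogRecPtr  startpos;

    /* 1. prefer what is already present in the target directory */
    startpos = start_from_existing_wal();

    /* 2. otherwise, if a slot is used, start from its restart_lsn
     * (no timeline is provided in this case) */
    if (startpos == InvalidXLogRecPtr && have_slot)
        startpos = start_from_slot();

    /* 3. finally, fall back to the server's current position */
    if (startpos == InvalidXLogRecPtr)
        startpos = start_from_server();

    printf("start streaming at %X/%X\n",
           (unsigned int) (startpos >> 32), (unsigned int) startpos);
    return 0;
}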

What do you think? Which information should we provide about the slot?


--
Ronan Dunklau





Re: pg_receivewal starting position

From
Kyotaro Horiguchi
Date:
At Wed, 28 Jul 2021 12:57:39 +0200, Ronan Dunklau <ronan.dunklau@aiven.io> wrote in
> Le mercredi 28 juillet 2021, 08:22:30 CEST Kyotaro Horiguchi a écrit :
> > At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau <ronan.dunklau@aiven.io>
> > wrote in
> > > I don't know if it should be the default, toggled by a command line flag,
> > > or if we even should let the user provide a LSN.
> >
> > *I* think it is completely reasonable (or at least convenient or less
> > astonishing) that pg_receivewal starts from the restart_lsn of the
> > replication slot to use.  The tool already decides the clean-start LSN
> > a bit unusual way. And it seems to me that proposed behavior can be
> > the default when -S is specified.
> >
>
> As of now we can't get the replication_slot restart_lsn with a replication
> connection AFAIK.
>
> This implies that the patch could require the user to specify a maintenance-db
> parameter, and we would use that if provided to fetch the replication slot
> info, or fallback to the previous behaviour. I don't really like this approach
> as the behaviour changing wether we supply a maintenance-db parameter or not
> is error-prone for the user.
>
> Another option would be to add a new replication command (for example
> ACQUIRE_REPLICATION_SLOT <slot_name>) to set the replication slot as the
> current one, and return some info about it (restart_lsn at least for a
> physical slot).

I hadn't thought it through in detail.  And I forgot that ordinary SQL commands
are prohibited on a physical replication connection.  So we need a
new replication command, but that's not that big a deal.

> I don't see any reason not to make it work for logical replication connections
> / slots, but it wouldn't be that useful since we can query the database in
> that case.

Ordinary SQL queries are usable over a logical replication connection, so
I'm not sure how a logical replication connection would use the command.
However, like you, I wouldn't bother restricting the command to
physical replication, but perhaps the new command should return the
slot type.

> Acquiring the replication slot instead of just reading it would make sure that
> no other process could start the replication between the time we read the
> restart_lsn and when we issue START_REPLICATION. START_REPLICATION could then
> check if we already have a replication slot, and ensure it is the same one as
> the one we're trying to use.

I'm not sure it's worth adding complexity for such strictness.
START_REPLICATION safely fails if someone steals the slot meanwhile.
In the first place there's no means to protect a slot from others
while idle.  One possible problem is the case where START_REPLICATION
successfully acquires the slot after the new command failed.  But that
case doesn't seem worse than the case where someone advances the slot in
our absence.  So I think READ_REPLICATION_SLOT is sufficient.

> From pg_receivewal point of view, this would amount to:
>
>  - check if we currently have wal in the target directory.
>    - if we do, proceed as currently done, by computing the start lsn and
> timeline from the last archived wal
>   - if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT. Use the
> restart_lsn as the start lsn if there is one, and don't provide a timeline
>   - if we still don't have a start_lsn, fallback to using the current server
> wal position as is done.

That's pretty much it.

> What do you think ? Which information should we provide about the slot ?

We need the timeline ID to start with when using restart_lsn.  The
current timeline can be used in most cases, but there are cases where
the LSN is historical.

pg_receivewal doesn't send a replication status report when a segment
is finished. So if pg_receivewal stops just after a segment is
finished, the slot stays at the beginning of the last segment.  Thus
the next run will start from there, creating a duplicate segment.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
On Thursday, July 29, 2021 at 07:32:37 CEST, Kyotaro Horiguchi wrote:
> I didn't thought in details.  But I forgot that ordinary SQL commands
> have been prohibited in physical replication connection.  So we need a
> new replication command but it's not that a big deal.

Thank you for your feedback!


>
> > I don't see any reason not to make it work for logical replication
> > connections / slots, but it wouldn't be that useful since we can query
> > the database in that case.
>
> Ordinary SQL queries are usable on a logical replication slot so
> I'm not sure how logical replication connection uses the command.
> However, like you, I wouldn't bother restricting the command to
> physical replication, but perhaps the new command should return the
> slot type.
>

Ok done in the attached patch.

>
> I'm not sure it's worth adding complexity for such strictness.
> START_REPLICATION safely fails if someone steals the slot meanwhile.
> In the first place there's no means to protect a slot from others
> while idle.  One possible problem is the case where START_REPLICATION
> successfully acquire the slot after the new command failed.  But that
> case doesn't seem worse than the case someone advances the slot while
> absence.  So I think READ_REPLICATION_SLOT is sufficient.
>

Ok, I implemented it like this. I tried to follow the pg_get_replication_slots
approach with regards to how to prevent concurrent modification while reading
the slot.

> > From pg_receivewal point of view, this would amount to:
> >  - check if we currently have wal in the target directory.
> >
> >    - if we do, proceed as currently done, by computing the start lsn and
> >
> > timeline from the last archived wal
> >
> >   - if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT. Use the
> >
> > restart_lsn as the start lsn if there is one, and don't provide a timeline
> >
> >   - if we still don't have a start_lsn, fallback to using the current
> >   server
> >
> > wal position as is done.
>
> That's pretty much it.

Great.

>
> > What do you think ? Which information should we provide about the slot ?
>
> We need the timeline id to start with when using restart_lsn.  The
> current timeline can be used in most cases but there's a case where
> the LSN is historical.

Ok, see below.

>
> pg_receivewal doesn't send a replication status report when a segment
> is finished. So after pg_receivewal stops just after a segment is
> finished, the slot stays at the beginning of the last segment.  Thus
> next time it will start from there, creating a duplicate segment.

I'm not sure I see where the problem is here. If we don't keep the segments in
pg_receivewal's target directory, then it would be the responsibility of
whoever moved them to make sure we don't have duplicates, or to handle them
gracefully.

Even if we were forcing a feedback message after a segment is finished, there
could still be a problem if the feedback never made it to the server even though
the segment was flushed here. It might be interesting to send that feedback
anyway.

Please find attached two patches implementing what we've been discussing.

Patch 0001 adds the new READ_REPLICATION_SLOT command.
It returns, for a given slot, the type, restart_lsn, flush_lsn,
restart_lsn_timeline and flush_lsn_timeline.
The timelines are determined by reading the current timeline history, and
finding the timeline where the record may be found. I didn't find an explicit
test for e.g. IDENTIFY_SYSTEM, so I didn't write one for this new command
either, but it is tested indirectly in patch 0002.
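
To illustrate how a client would consume it, here is a hedged libpq sketch;
the command name and column order follow this version of the patch, the
connection string and the "myslot" slot name are placeholders, and the error
handling is generic client-side handling rather than exactly what
pg_receivewal does:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* replication=true opens a physical walsender connection */
    PGconn     *conn = PQconnectdb("replication=true");
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    res = PQexec(conn, "READ_REPLICATION_SLOT myslot");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
        fprintf(stderr, "READ_REPLICATION_SLOT failed: %s", PQerrorMessage(conn));
    else if (PQntuples(res) == 0)
        printf("slot \"myslot\" does not exist\n");
    else
    {
        /* columns, per this patch: type, restart_lsn, flush_lsn,
         * restart_lsn_timeline, flush_lsn_timeline */
        printf("type=%s restart_lsn=%s restart_lsn_timeline=%s\n",
               PQgetvalue(res, 0, 0),
               PQgetvalue(res, 0, 1),
               PQgetvalue(res, 0, 3));
    }

    PQclear(res);
    PQfinish(conn);
    return 0;
}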

Patch 0002 makes pg_receivewal use that command if we use a replication slot
and the command is available, and use the restart_lsn and restart_lsn_timeline
as a starting point. It also adds a small test to check that we start back
from the previous restart_lsn instead of the current flush position when our
destination directory does not contain any WAL file.

I also noticed we don't test following a timeline switch. It would probably be
good to add that, both for the case where we determine the previous timeline
from the archived segments and for the case where it comes from the new command.
What do you think?


Regards,

--
Ronan Dunklau
Attachment

Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
On Thursday, July 29, 2021 at 11:09:40 CEST, Ronan Dunklau wrote:
> Patch 0001 adds the new READ_REPLICATION_SLOT command.
> It returns for a given slot the type, restart_lsn, flush_lsn,
> restart_lsn_timeline and flush_lsn_timeline.
> The timelines are determined by reading the current timeline history, and
> finding the timeline where we may find the record. I didn't find explicit
> test for eg IDENTIFY_SYSTEM so didn't write one either for this new
> command, but it is tested indirectly in patch 0002.
>
> Patch 0002 makes pg_receivewal use that command if we use a replication slot
> and the command is available, and use the restart_lsn and
> restart_lsn_timeline as a starting point. It also adds a small test to
> check that we start back from the previous restart_lsn instead of the
> current flush position when our destination directory does not contain any
> WAL file.
>
> I also noticed we don't test following a timeline switch. It would probably
> be good to add that, both for the case where we determine the previous
> timeline from the archived segments and when it comes from the new command.
> What do you think ?

Following the discussion at [1], I refactored the implementation into
streamutil and added a third patch making use of it in pg_basebackup itself in
order to fail early if the replication slot doesn't exist, so please find
attached v2 for that.

Best regards,

[1]: https://www.postgresql.org/message-id/flat/CAD21AoDYmv0yJMQnWtCx_kZGwVZnkQSTQ1re2JNSgM0k37afYQ%40mail.gmail.com

--
Ronan Dunklau
Attachment

Re: pg_receivewal starting position

From
Bharath Rupireddy
Date:
On Thu, Aug 26, 2021 at 5:45 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> order to fail early if the replication slot doesn't exist, so please find
> attached v2 for that.

Thanks for the patches. Here are some comments:

1) While the intent of these patches looks good, I have the following
concern with the new replication command READ_REPLICATION_SLOT: what if
pg_receivewal exits (because the user issued a SIGINT or for some other
reason) after flushing the received WAL to disk, but before it sends
feedback (sendFeedback) to the postgres server's walsender, so that the
walsender doesn't get a chance to update the restart_lsn in the
replication slot via PhysicalConfirmReceivedLocation? If pg_receivewal is
started again, isn't it going to get the previous restart_lsn and receive
the last chunk of flushed WAL again?

2) What is the significance of READ_REPLICATION_SLOT for logical
replication slots? I read above that somebody suggested restricting
the walsender to handling READ_REPLICATION_SLOT for physical replication
slots, so that the callers will see a command failure. But I tend to
think that it is cleaner to have this command common to both physical
and logical replication slots, and the callers can have an Assert(type
== 'physical').

3) Isn't it useful to send the active and active_pid info of the replication
slot via READ_REPLICATION_SLOT? pg_receivewal could use Assert(active ==
true && active_pid == getpid()) to ensure that it is
the sole owner of the replication slot. Also, would it be good to send the
wal_status info too?

4) I think the below messages should start with a lowercase letter, and
there are also some typos:
+ pg_log_warning("Could not fetch the replication_slot \"%s\" information "
+ pg_log_warning("Server does not suport fetching the slot's position, "
something like:
+ pg_log_warning("could not fetch replication slot \"%s\" information, "
+    "resuming from current server position instead", replication_slot);
+ pg_log_warning("server does not support fetching replication slot
information, "
+    "resuming from current server position instead");

5) How about emitting the above messages only in the "verbose" case?

6) How about an assertion like below?
+ if (stream.startpos == InvalidXLogRecPtr)
+ {
+ stream.startpos = serverpos;
+ stream.timeline = servertli;
+ }
+
+ Assert(stream.startpos != InvalidXLogRecPtr);

7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?

8) Just an idea: how about we store pg_receivewal's lastFlushPosition
in a file before pg_receivewal exits and compare it with the
restart_lsn that it receives from the replication slot? If
lastFlushPosition == received_restart_lsn, well and good; if not, then
something would have happened and we always start at
lastFlushPosition.

Regards,
Bharath Rupireddy.



Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
On Friday, August 27, 2021 at 05:44:32 CEST, Michael Paquier wrote:
> On Thu, Aug 26, 2021 at 02:14:27PM +0200, Ronan Dunklau wrote:
> > Following the discussion at [1], I refactored the implementation into
> > streamutil and added a third patch making use of it in pg_basebackup
> > itself in order to fail early if the replication slot doesn't exist, so
> > please find attached v2 for that.
>
> Thanks for the split.  That helps a lot.
>

Thank you very much for the review, please find attached an updated patchset.
I've also taken into account some remarks made by Bharath Rupireddy.

> +
> +
>  /*
>   * Run IDENTIFY_SYSTEM through a given connection and give back to caller
>
> The patch series has some noise diffs here and there, you may want to
> clean up that for clarity.

Ok, sorry about that.

>
> +   if (slot == NULL || !slot->in_use)
> +   {
> +       LWLockRelease(ReplicationSlotControlLock);
> +
> +       ereport(ERROR,
> +               (errcode(ERRCODE_UNDEFINED_OBJECT),
> LWLocks are released on ERROR, so no need for LWLockRelease() here.
>

Following your suggestion of not erroring out on a nonexistent slot, this point
is no longer relevant, but thanks for pointing this out anyway.

> +    <listitem>
> +      <para>
> +      Read information about the named replication slot.  This is
> useful to determine which WAL location we should be asking the server
> to start streaming at.
>
> A nit.  You may want to be more careful with the indentation of the
> documentation.  Things are usually limited in width for readability.
> More <literal> markups would be nice for the field names used in the
> descriptions.

Ok.

>
> +   if (slot == NULL || !slot->in_use)
>
>                                                                     [...] +
>       ereport(ERROR,
> +               (errcode(ERRCODE_UNDEFINED_OBJECT),
> +                errmsg("replication slot \"%s\" does not exist",
> +                       cmd->slotname)));
> [...]
> +       if (PQntuples(res) == 0)
> +       {
> +               pg_log_error("replication slot %s does not exist",
> slot_name); +               PQclear(0);
> +               return false;
> So, the backend and ReadReplicationSlot() report an ERROR if a slot
> does not exist but pg_basebackup's GetSlotInformation() does the same
> if there are no tuples returned.  That's inconsistent.  Wouldn't it be
> more instinctive to return a NULL tuple instead if the slot does not
> exist to be able to check after real ERRORs in frontends using this
> interface?

The attached patch returns no tuple at all when the replication slot doesn't
exist. I'm not sure if that's what you meant by returning a NULL tuple?

> A slot in use exists, so the error is a bit confusing here
> anyway, no?

From my understanding, a slot *not* in use doesn't exist anymore, so I
don't really understand this point. Could you clarify?


>
> +    * XXX: should we allow the caller to specify which target timeline it
> wants +    * ?
> +    */
> What are you thinking about here?

I was thinking that maybe instead of walking back the timeline history from
where we currently are on the server, we could allow an additional argument
for the client to specify which timeline it wants. But I guess a replication
slot cannot be present on a past, divergent timeline? I have removed that
suggestion.

>
> -# restarts of pg_receivewal will see this segment as full..
> +# restarts of pg_receivewal will see this segment as full../
> Typo.

Ok.

>
> +   TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 4,
> "restart_lsn_timeline", +                             INT4OID, -1, 0);
> +   TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 5,
> "confirmed_flush_lsn_timeline", +                             INT4OID, -1,
> 0);
> I would call these restart_tli and confirmed_flush_tli., without the
> "lsn" part.

Ok.
>
> The patch for READ_REPLICATION_SLOT could have some tests using a
> connection that has replication=1 in some TAP tests.  We do that in
> 001_stream_rep.pl with SHOW, as one example.

Ok. I added the physical part to 001_stream_rep.pl, using the protocol
interface directly for creating / dropping the slot, and some tests for
logical replication slots to 006_logical_decoding.pl.

>
> -               'slot0'
> +               'slot0',                     '-p',
> +               "$port"
> Something we are missing here?

The thing we're missing here is a wrapper for command_fails_like. I've added
this to PostgresNode.pm.

Best regards,


--
Ronan Dunklau
Attachment

Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
On Saturday, August 28, 2021 at 14:10:25 CEST, Bharath Rupireddy wrote:
> On Thu, Aug 26, 2021 at 5:45 PM Ronan Dunklau <ronan.dunklau@aiven.io>
wrote:
> > order to fail early if the replication slot doesn't exist, so please find
> > attached v2 for that.
>
> Thanks for the patches. Here are some comments:
>

Thank you for this review! Please see the other side of the thread where I
answered Michael Paquier with a new patchset, which includes some of your
proposed modifications.

> 1) While the intent of these patches looks good, I have following
> concern with new replication command READ_REPLICATION_SLOT: what if
> the pg_receivewal exits (because user issued a SIGINT or for some
> reason) after flushing  the received WAL to disk, before it sends
> sendFeedback to postgres server's walsender so that it doesn't get a
> chance to update the restart_lsn in the replication slot via
> PhysicalConfirmReceivedLocation. If the pg_receivewal is started
> again, isn't it going to get the previous restart_lsn and receive the
> last chunk of flushed WAL again?

I've kept the existing directory as the source of truth if we have any WAL
there already. If we don't, we fall back to the restart_lsn of the replication
slot.
So in the event that we start it again somewhere else, where we don't have
access to those WAL files anymore, we could be receiving them again, which in my
opinion is better than losing everything in between in that case.

>
> 2) What is the significance of READ_REPLICATION_SLOT for logical
> replication slots? I read above that somebody suggested to restrict
> the walsender to handle READ_REPLICATION_SLOT for physical replication
> slots so that the callers will see a command failure. But I tend to
> think that it is clean to have this command common for both physical
> and logical replication slots and the callers can have an Assert(type
> == 'physical').

I've updated the patch to make it easy for the caller to check the slot's type
and added a verification for those cases.

In general, I tried to implement the meaning of the different fields exactly as
it's done in the pg_replication_slots view.

>
> 3) Isn't it useful to send active, active_pid info of the replication
> slot via READ_REPLICATION_SLOT? pg_receivewal can use Assert(active ==
> true && active_pid == getpid()) as an assertion to ensure that it is
> the sole owner of the replication slot? Also, is it good send
> wal_status info

Typically we read the slot before attaching to it, since what we want to do is
check if it exists. It may be worthwhile to check that it's not already used by
another backend though.

>
> 4) I think below messages should start with lower case letter and also
> there are some typos:
> + pg_log_warning("Could not fetch the replication_slot \"%s\" information "
> + pg_log_warning("Server does not suport fetching the slot's position, "
> something like:
> + pg_log_warning("could not fetch replication slot \"%s\" information, "
> +    "resuming from current server position instead", replication_slot);
> + pg_log_warning("server does not support fetching replication slot
> information, "
> +    "resuming from current server position instead");
>
I've rephrased it a bit in v3, let me know if that's what you had in mind.


> 5) How about emitting the above messages in case of "verbose"?

I think it is useful to warn the user even if not in the verbose case, but if
the consensus is to move it to verbose only output I can change it.

>
> 6) How about an assertion like below?
> + if (stream.startpos == InvalidXLogRecPtr)
> + {
> + stream.startpos = serverpos;
> + stream.timeline = servertli;
> + }
> +
> +Assert(stream.startpos != InvalidXLogRecPtr)>>

Good idea.

>
> 7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?

From my point of view, I already expected it to use something like that when
using a replication slot. Maybe an option to turn it off could be useful?

>
> 8) Just an idea, how about we store pg_receivewal's lastFlushPosition
> in a file before pg_receivewal exits and compare it with the
> restart_lsn that it received from the replication slot, if
> lastFlushPosition == received_restart_lsn well and good, if not, then
> something would have happened and we always start at the
> lastFlushPosition ?

The patch idea originally came from the fact that some utilities use
pg_receivewal to fetch WAL files and move them elsewhere. In that sense I don't
really see what value this brings compared to the existing (and unmodified) way
of computing the restart position from the already-stored files.

Best regards,

--
Ronan Dunklau





Re: pg_receivewal starting position

From
Bharath Rupireddy
Date:
On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> Thank you for this review ! Please see the other side of the thread where I
> answered Michael Paquier with a new patchset, which includes some of your
> proposed modifications.

Thanks for the updated patches. Here are some comments on v3-0001
patch. I will continue to review 0002 and 0003.

1) Missing full stops "." at the end.
+               <literal>logical</literal>
+               when following the current timeline history
+               position, when following the current timeline history

2) Can we have the "type" column as "slot_type" just to keep in sync
with the pg_replication_slots view?

3) Can we mention the link to pg_replication_slots view in the columns
- "type", "restart_lsn", "confirmed_flush_lsn"?
Something like: the "slot_type"/"restart_lsn"/"confirmed_flush_lsn" is
same as  <link linkend="view-pg-replication-slots"><structname>pg_replication_slots</structname></link>
view.

4) Can we use "read_replication_slot" instead of
"identify_replication_slot", just to be in sync with the actual
command?

5) Are you actually copying the slot contents into the slot_contents
variable here? Isn't it just taking a pointer to the shared memory?
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);

You could just do:
+ Oid dbid;
+ XLogRecPtr restart_lsn;
+ XLogRecPtr confirmed_flush;

+ /* Copy the required slot contents */
+ SpinLockAcquire(&slot->mutex);
+ dbid = slot->data.database;
+ restart_lsn = slot->data.restart_lsn;
+ confirmed_flush = slot->data.confirmed_flush;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);

6) It looks like you are not sending anything as a response to the
READ_REPLICATION_SLOT command, if the slot specified doesn't exist.
You are just calling end_tup_output which just calls rShutdown (points
to donothingCleanup of printsimpleDR)
if (has_value)
do_tup_output(tstate, values, nulls);
end_tup_output(tstate);

Can you test the use case where the pg_receivewal asks
READ_REPLICATION_SLOT with a non-existent replication slot and see
with your v3 patch how it behaves?

Why don't you remove has_value flag and do this in ReadReplicationSlot:
Datum values[5];
bool nulls[5];
MemSet(values, 0, sizeof(values));
MemSet(nulls, 0, sizeof(nulls));

+ dest = CreateDestReceiver(DestRemoteSimple);
+ tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+ do_tup_output(tstate, values, nulls);
+ end_tup_output(tstate);

7) Why don't we use 2 separate variables "restart_tli",
"confirmed_flush_tli" instead of "slots_position_timeline", just to be
more informative?

8) I still have this question: how can a client (pg_receivewal for
instance) know that it is the current owner/user of the slot whose info
it is requesting? As I said upthread, why don't we send the "active"
and "active_pid" fields of the pg_replication_slots view?
Also, it would be good to send the "wal_status" field so that the
client can check that "wal_status" is not "lost".

9) There are two newlines at the end of ReadReplicationSlot. We leave
only one newline after each function definition.
end_tup_output(tstate);
}
<<1stnewline>>
<<2ndnewline>>
/*
 * Handle TIMELINE_HISTORY command.
 */

10) Why do we need to have two test cases for "non-existent" slots?
Isn't the test case after "DROP REPLICATION" enough?
+($ret, $stdout, $stderr) = $node_primary->psql(
+  'postgres', 'READ_REPLICATION_SLOT non_existent_slot;',
+  extra_params => [ '-d', $connstr_rep ]);
+ok( $ret == 0,
+  "READ_REPLICATION_SLOT does not produce an error with non existent slot");
+ok( $stdout eq '',
+    "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");

You can just rename the test case name from:
+    "READ_REPLICATION_SLOT returns no tuple if a slot has been dropped");
to
+    "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");

Regards,
Bharath Rupireddy.



Re: pg_receivewal starting position

From
Bharath Rupireddy
Date:
On Tue, Aug 31, 2021 at 4:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> > Thank you for this review ! Please see the other side of the thread where I
> > answered Michael Paquier with a new patchset, which includes some of your
> > proposed modifications.
>
> Thanks for the updated patches. Here are some comments on v3-0001
> patch. I will continue to review 0002 and 0003.

Continuing my review on the v3 patch set:

0002:
1) I think the following message
"could not fetch replication slot LSN: got %d rows and %d fields,
expected %d rows and %d or more fields"
should be:
"could not read replication slot: got %d rows and %d fields, expected
%d rows and %d or more fields"

2) Why is GetSlotInformation returning InvalidXLogRecPtr? Change it to
return false instead.

3) Alignment issue:
Change below 2 lines:
+ appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s",
+   slot_name);
To 1 line, as it will be under 80 char limit:
+ appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s", slot_name);

4) Shouldn't your GetSlotInformation return whatever
READ_REPLICATION_SLOT outputs as columns, to be more generic?
Callers would pass non-null/null pointers for the outputs they do or don't
need. Please refer to RunIdentifySystem for how it does that.
GetSlotInformation can just read the tuple fields that the callers are interested in.
bool
GetSlotInformation(PGconn *conn, const char *slot_name, char **slot_type,
   XLogRecPtr *restart_lsn, uint32 *restart_lsn_tli,
   XLogRecPtr *confirmed_lsn, XLogRecPtr *confirmed_lsn_tli)
{
if (slot_type)
{
/* get the slot_type value from the received tuple */
}
if (restart_lsn)
{
/* get the restart_lsn value from the received tuple */
}
if (restart_lsn_tli)
{
/* get the restart_lsn_tli value from the received tuple */
}

if (confirmed_lsn)
{
/* get the confirmed_lsn value from the received tuple */
}

if (confirmed_lsn_tli)
{
/* get the confirmed_lsn_tli value from the received tuple */
}
}

5) How about below as GetSlotInformation function comment?
/*
 * Run READ_REPLICATION_SLOT through a given connection and give back to caller
 * following information of the slot if requested:
 * - type
 * - restart lsn
 * - restart lsn timeline
 * - confirmed lsn
 * - confirmed lsn timeline
 */

6) Do you need +#include "pqexpbuffer.h" in pg_receivewal.c?

7) Missing "," after information and it is not required to use "the"
in the messages.
Change below
+ pg_log_warning("could not fetch the replication_slot \"%s\" information "
+    "resuming from the current server position instead", replication_slot);
to:
+ pg_log_warning("could not fetch replication_slot \"%s\" information, "
+    "resuming from current server position instead", replication_slot);

8) A typo "suport". Ignore this comment if you incorporate review comment #10.
Change below
pg_log_warning("server does not suport fetching the slot's position, "
   "resuming from the current server position instead");
to:
pg_log_warning("server does not support getting start LSN from
replication slot, "
   "resuming from current server position instead");

9) I think you should free the memory allocated for slot_type by
GetSlotInformation:
+ if (strcmp(slot_type, "physical") != 0)
+ {
+ pg_log_error("slot \"%s\" is not a physical replication slot",
+ replication_slot);
+ exit(1);
+ }
+
+ pg_free(slot_type);

10) Isn't it PQclear(res);?
+ PQclear(0);

11) I don't think you need to check for the null value of
replication_slot. In the StreamLog it can't be null, so you can safely
remove the below if condition.
+ if (replication_slot)
+ {

12) How about
/* Try to get start position from server's replication slot information */
insted of
+ /* Try to get it from the slot if any, and the server supports it */

13) When you say that the server supports the new
READ_REPLICATION_SLOT command only if its version is >= 150000, shouldn't
the function GetSlotInformation be doing the following:
bool
GetSlotInformation(PGconn *conn,....,bool *is_supported)
{

if (PQserverVersion(conn) < 150000)
{
*is_supported = false;
return false;
}
*is_supported = true;
}
So, the callers will just do:
/* Try to get start position from server's replication slot information */
char    *slot_type = NULL;
bool is_supported;

if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
&stream.timeline, &slot_type, &is_supported))
{
if (!is_supported)
pg_log_warning("server does not support getting start LSN from
replication slot, "
   "resuming from current server position instead");
else
pg_log_warning("could not fetch replication_slot \"%s\" information, "
   "resuming from current server position instead",
   replication_slot);
}

if (slot_type && strcmp(slot_type, "physical") != 0)
{
pg_log_error("slot \"%s\" is not a physical replication slot",
    replication_slot);
exit(1);
}

pg_free(slot_type);
}

14) Instead of just
+ if (strcmp(slot_type, "physical") != 0)
do
+ if (slot_type && strcmp(slot_type, "physical") != 0)

0003:
1) The message should start with lower case: "slot \"%s\" is not a
physical replication slot".
+ pg_log_error("Slot \"%s\" is not a physical replication slot",

2)
+ if (strcmp(slot_type, "physical") != 0)
do
+ if (slot_type && strcmp(slot_type, "physical") != 0)

Regards,
Bharath Rupireddy.



Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
On Tuesday, August 31, 2021 at 13:17:22 CEST, Bharath Rupireddy wrote:
> On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io>
wrote:
> > Thank you for this review ! Please see the other side of the thread where
> > I
> > answered Michael Paquier with a new patchset, which includes some of your
> > proposed modifications.
>
> Thanks for the updated patches. Here are some comments on v3-0001
> patch. I will continue to review 0002 and 0003.

Thank you! I will send a new version shortly, once I address your remarks
concerning patch 0002 (and hopefully 0003 :-) )
>
> 1) Missing full stops "." at the end.
> +               <literal>logical</literal>
> +               when following the current timeline history
> +               position, when following the current timeline history
>

Good catch, I will take care of it for the next version.

> 2) Can we have the "type" column as "slot_type" just to keep in sync
> with the pg_replication_slots view?

You're right, it makes more sense like this.

>
> 3) Can we mention the link to pg_replication_slots view in the columns
> - "type", "restart_lsn", "confirmed_flush_lsn"?
> Something like: the "slot_type"/"restart_lsn"/"confirmed_flush_lsn" is
> same as  <link
> linkend="view-pg-replication-slots"><structname>pg_replication_slots</struc
> tname></link> view.

Same as above, thanks.

>
> 4) Can we use "read_replication_slot" instead of
> "identify_replication_slot", just to be in sync with the actual
> command?

That must have been a leftover from an earlier version of the patch, I will fix
it also.

>
> 5) Are you actually copying the slot contents into the slot_contents
> variable here? Isn't just taking the pointer to the shared memory?
> + /* Copy slot contents while holding spinlock */
> + SpinLockAcquire(&slot->mutex);
> + slot_contents = *slot;
> + SpinLockRelease(&slot->mutex);
> + LWLockRelease(ReplicationSlotControlLock);
>
> You could just do:
> + Oid dbid;
> + XLogRecPtr restart_lsn;
> + XLogRecPtr confirmed_flush;
>
> + /* Copy the required slot contents */
> + SpinLockAcquire(&slot->mutex);
> + dbid = slot.data.database;
> + restart_lsn = slot.data.restart_lsn;
> + confirmed_flush = slot.data.confirmed_flush;
> + SpinLockRelease(&slot->mutex);
> + LWLockRelease(ReplicationSlotControlLock);

It's probably simpler that way.

>
> 6) It looks like you are not sending anything as a response to the
> READ_REPLICATION_SLOT command, if the slot specified doesn't exist.
> You are just calling end_tup_output which just calls rShutdown (points
> to donothingCleanup of printsimpleDR)
> if (has_value)
> do_tup_output(tstate, values, nulls);
> end_tup_output(tstate);

>
> Can you test the use case where the pg_receivewal asks
> READ_REPLICATION_SLOT with a non-existent replication slot and see
> with your v3 patch how it behaves?

>
> Why don't you remove has_value flag and do this in ReadReplicationSlot:
> Datum values[5];
> bool nulls[5];
> MemSet(values, 0, sizeof(values));
> MemSet(nulls, 0, sizeof(nulls));
>
> + dest = CreateDestReceiver(DestRemoteSimple);
> + tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
> + do_tup_output(tstate, values, nulls);
> + end_tup_output(tstate);

As for the API, I think it's cleaner to just send an empty result set instead
of a null tuple in that case but I won't fight over it if there is consensus on
having an all-nulls tuple value instead.

There is indeed a bug, though not here but in the second patch: I still test the
slot type even when we didn't fetch anything. So I will add a test for that
too.
>
> 7) Why don't we use 2 separate variables "restart_tli",
> "confirmed_flush_tli" instead of "slots_position_timeline", just to be
> more informative?

You're right.

>
> 8) I still have the question like, how can a client (pg_receivewal for
> instance) know that it is the current owner/user of the slot it is
> requesting the info? As I said upthread, why don't we send "active"
> and "active_pid" fields of the pg_replication_slots view?
> Also, it would be good to send the "wal_status" field so that the
> client can check if the "wal_status" is not "lost"?

As for pg_receivewal, it can only check that the slot is not active at that
time, since we only acquire the replication slot once we know the start_lsn.
For the basebackup case it's the same thing, as we only want to check if it
exists. As such, I didn't add them as I didn't see the need, but if it can be
useful, why not? I will do that in the next version.

>
> 9) There are 2 new lines at the end of ReadReplicationSlot. We give
> only one new line after each function definition.
> end_tup_output(tstate);
> }
> <<1stnewline>>
> <<2ndnewline>>
> /*
>  * Handle TIMELINE_HISTORY command.
>  */
>

Ok !


> 10) Why do we need to have two test cases for "non-existent" slots?
> Isn't the test case after "DROP REPLICATION" enough?
> +($ret, $stdout, $stderr) = $node_primary->psql(
> +  'postgres', 'READ_REPLICATION_SLOT non_existent_slot;',
> +  extra_params => [ '-d', $connstr_rep ]);
> +ok( $ret == 0,
> +  "READ_REPLICATION_SLOT does not produce an error with non existent
> slot"); +ok( $stdout eq '',
> +    "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");
>
> You can just rename the test case name from:
> +    "READ_REPLICATION_SLOT returns no tuple if a slot has been dropped");
> to
> +    "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");
>

I wanted to test both the case where no slot by this name exists, and the case
where it has been dropped, and hence is still referenced but marked as not
"in_use". Maybe it's not worth it and we can remove the first case as you
suggest.

Best regards,

--
Ronan Dunklau





Re: pg_receivewal starting position

From
Bharath Rupireddy
Date:
On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> > 7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?
>
> From my point of view, I already expected it to use something like that when
> using a replication slot. Maybe an option to turn it off could be useful ?

IMO, pg_receivewal should have a way to turn off/on using
READ_REPLICATION_SLOT. Imagine if the postgres server doesn't support
READ_REPLICATION_SLOT (a lower version) but for some reason
pg_receivewal (running separately) is upgraded to the version that uses
READ_REPLICATION_SLOT; knowing that the server doesn't support
READ_REPLICATION_SLOT, why should the user let pg_receivewal run
unnecessary code?

Regards,
Bharath Rupireddy.



Re: pg_receivewal starting position

From
Kyotaro Horiguchi
Date:
At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> > > 7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?
> >
> > From my point of view, I already expected it to use something like that when
> > using a replication slot. Maybe an option to turn it off could be useful ?
> 
> IMO, pg_receivewal should have a way to turn off/on using
> READ_REPLICATION_SLOT. Imagine if the postgres server doesn't support
> READ_REPLICATION_SLOT (a lower version) but for some reasons the
> pg_receivewal(running separately) is upgraded to the version that uses
> READ_REPLICATION_SLOT, knowing that the server doesn't support
> READ_REPLICATION_SLOT why should user let pg_receivewal run an
> unnecessary code?

If I read the patch correctly, the situation above is warned about with the
following message, and then we continue to the next step, giving up on
identifying the start position from slot data.

> "server does not suport fetching the slot's position, resuming from the current server position instead"

(The message looks a bit too long, though..)

However, if the operator doesn't know the server is old, pg_receivewal
starts streaming from an unexpected position, which is a kind of
disaster. So I'm inclined to agree with Bharath, but I would rather imagine
an option to explicitly specify how to determine the start position.

--start-source=[server,wal,slot]  specify starting-LSN source, default is
                     trying all of them in the order of wal, slot, server. 

I don't think the option needs to accept multiple values at once.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_receivewal starting position

From
Bharath Rupireddy
Date:
On Thu, Sep 2, 2021 at 11:15 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> > On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> > > > 7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?
> > >
> > > From my point of view, I already expected it to use something like that when
> > > using a replication slot. Maybe an option to turn it off could be useful ?
> >
> > IMO, pg_receivewal should have a way to turn off/on using
> > READ_REPLICATION_SLOT. Imagine if the postgres server doesn't support
> > READ_REPLICATION_SLOT (a lower version) but for some reasons the
> > pg_receivewal(running separately) is upgraded to the version that uses
> > READ_REPLICATION_SLOT, knowing that the server doesn't support
> > READ_REPLICATION_SLOT why should user let pg_receivewal run an
> > unnecessary code?
>
> If I read the patch correctly the situation above is warned by the
> following message then continue to the next step giving up to identify
> start position from slot data.
>
> > "server does not suport fetching the slot's position, resuming from the current server position instead"
>
> (The message looks a bit too long, though..)
>
> However, if the operator doesn't know the server is old, pg_receivewal
> starts streaming from unexpected position, which is a kind of
> disaster. So I'm inclined to agree to Bharath, but rather I imagine of
> an option to explicitly specify how to determine the start position.
>
> --start-source=[server,wal,slot]  specify starting-LSN source, default is
>                      trying all of them in the order of wal, slot, server.
>
> I don't think the option doesn't need to accept multiple values at once.

If --start-source = 'wal' fails, then pg_receivewal should show an
error saying "cannot find start position from <<user-specified-wal>>
directory, try with "server" or "slot" for --start-source". We might
end up having similar errors for other options as well. Isn't this going
to create unnecessary complexity?

The existing way pg_receivewal fetches the start position, i.e. first from
the WAL directory and then from the server's current position, isn't known to
the user at all; there is no verbose message and nothing specified in the docs.
Why do we need to expose this with the --start-source option? IMO, we can
keep it that way and just have a way to turn off the new
behaviour that we are proposing here, i.e. fetching the start position
from the slot's restart_lsn.

Regards,
Bharath Rupireddy.



Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
On Thursday, September 2, 2021 at 08:42:22 CEST, Bharath Rupireddy wrote:
> On Thu, Sep 2, 2021 at 11:15 AM Kyotaro Horiguchi
>
> <horikyota.ntt@gmail.com> wrote:
> > At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote in>
> > > On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io>
wrote:
> > > > > 7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an
> > > > > option?
> > > >
> > > > From my point of view, I already expected it to use something like
> > > > that when using a replication slot. Maybe an option to turn it off
> > > > could be useful ?> >
> > > IMO, pg_receivewal should have a way to turn off/on using
> > > READ_REPLICATION_SLOT. Imagine if the postgres server doesn't support
> > > READ_REPLICATION_SLOT (a lower version) but for some reasons the
> > > pg_receivewal(running separately) is upgraded to the version that uses
> > > READ_REPLICATION_SLOT, knowing that the server doesn't support
> > > READ_REPLICATION_SLOT why should user let pg_receivewal run an
> > > unnecessary code?
> >
> > If I read the patch correctly the situation above is warned by the
> > following message then continue to the next step giving up to identify
> > start position from slot data.
> >
> > > "server does not suport fetching the slot's position, resuming from the
> > > current server position instead">
> > (The message looks a bit too long, though..)
> >
> > However, if the operator doesn't know the server is old, pg_receivewal
> > starts streaming from unexpected position, which is a kind of
> > disaster. So I'm inclined to agree to Bharath, but rather I imagine of
> > an option to explicitly specify how to determine the start position.
> >
> > --start-source=[server,wal,slot]  specify starting-LSN source, default is
> >
> >                      trying all of them in the order of wal, slot, server.
> >
> > I don't think the option doesn't need to accept multiple values at once.
>
> If --start-source = 'wal' fails, then pg_receivewal should show up an
> error saying "cannot find start position from <<user-specified-wal>>
> directory, try with "server" or "slot" for --start-source". We might
> end having similar errors for other options as well. Isn't this going
> to create unnecessary complexity?
>
> The existing way the pg_receivewal fetches start pos i.e. first from
> wal directory and then from server start position, isn't known to the
> user at all, no verbose message or anything specified in the docs. Why
> do we need to expose this with the --start-source option? IMO, we can
> keep it that way and we can just have a way to turn off the new
> behaviour that we are proposing here, i.e.fetching the start position
> from the slot's restart_lsn.

Then it should probably be documented. We write in the docs that it is
strongly recommended to use a replication slot, but do not mention how we
resume from what has already been processed.

If someone really cares about having control over how the start position is
defined instead of relying on the auto-detection, it would be wiser to add a
--startpos parameter similar to the endpos one, which would override everything
else, instead of different flags for different behaviours.

Regards,

--
Ronan Dunklau





Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
On Thursday, September 2, 2021 at 09:28:29 CEST, Michael Paquier wrote:
> On Thu, Sep 02, 2021 at 02:45:54PM +0900, Kyotaro Horiguchi wrote:
> > At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote in If I read the patch
> > correctly the situation above is warned by the following message then
> > continue to the next step giving up to identify start position from slot
> > data.
>
> Better to fallback to the past behavior if attempting to use a
> pg_receivewal >= 15 with a PG instance older than 14.
>
> >> "server does not suport fetching the slot's position, resuming from the
> >> current server position instead">
> > (The message looks a bit too long, though..)
>
> Agreed.  Falling back to a warning is not the best answer we can have
> here, as there could be various failure types, and for some of them a
> hard failure is adapted;
> - Failure in the backend while running READ_REPLICATION_SLOT.  This
> should imply a hard failure, no?
> - Slot that does not exist.  In this case, we could fall back to the
> current write position of the server.
>
> by default if the slot information cannot be retrieved.
> Something that's disturbing me in patch 0002 is that we would ignore
> the results of GetSlotInformation() if any error happens, even if
> there is a problem in the backend, like an OOM.  We should be careful
> about the semantics here.

Ok !

>
> > However, if the operator doesn't know the server is old, pg_receivewal
> > starts streaming from unexpected position, which is a kind of
> > disaster. So I'm inclined to agree to Bharath, but rather I imagine of
> > an option to explicitly specify how to determine the start position.
> >
> > --start-source=[server,wal,slot]  specify starting-LSN source, default is
> >
> >                      trying all of them in the order of wal, slot, server.
> >
> > I don't think the option doesn't need to accept multiple values at once.
>
> What is the difference between "wal" and "server"?  "wal" stands for
> the start position of the set of files stored in the location
> directory, and "server" is the location that we'd receive from the
> server?  I don't think that we need that because, when using a slot,
> we know that we can rely on the LSN that the slot retains for
> pg_receivewal as that should be the same point as what has been
> streamed last.  Could there be an argument instead for changing the
> default and rely on the slot information rather than scanning the
> local WAL archive path for the start point when using --slot?  When
> using pg_receivewal as a service, relying on a scan of the WAL archive
> directory if there is no slot and fallback to an invalid LSN if there
> is nothing is fine by me, but I think that just relying on the slot
> information is saner as the backend makes sure that nothing is
> missing.  That's also more useful when streaming changes from a single
> slot from multiple locations (stream to location 1 with a slot, stop
> pg_receivewal, stream to location 2 that completes 1 with the same
> slot).

One benefit I see in first trying to get it from the local WAL files is that
we may end up in a state where the WAL has been flushed to disk but we couldn't
advance the replication slot. In that case it is better to resume from the
point on disk. Maybe taking max(slot_lsn, local_file_lsn) would work best
for the use case you're describing.
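
Roughly, with a hypothetical helper (illustration only, not actual code):

#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

XLogRecPtr
choose_start_lsn(XLogRecPtr slot_lsn, XLogRecPtr local_file_lsn)
{
    /* if only one source knows a position, use it */
    if (slot_lsn == InvalidXLogRecPtr)
        return local_file_lsn;
    if (local_file_lsn == InvalidXLogRecPtr)
        return slot_lsn;
    /* otherwise resume from whichever is further ahead */
    return (slot_lsn > local_file_lsn) ? slot_lsn : local_file_lsn;
}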

>
> +        pg_log_error("Slot \"%s\" is not a physical replication slot",
> +                     replication_slot);
> In 0003, the format of this error is not really project-like.
> Something like that perhaps would be more adapted:
> "cannot use the slot provided, physical slot expected."
>
> I am not really convinced about the need of getting the active state
> and the PID used in the backend when fetcing the slot data,
> particularly if that's just for some frontend-side checks.  The
> backend has safeguards already for all that.

I could see the use for sending active_pid for use within pg_basebackup: at
least we could fail early if the slot is already in use. But at the same time,
maybe it won't be in use anymore once we need it.

>
> While looking at that, I have applied de1d4fe to add
> PostgresNode::command_fails_like(), coming from 0003, and put my hands
> on 0001 as per the attached, as the starting point.  That basically
> comes down to all the points raised upthread, plus some tweaks for
> things I bumped into to get the semantics of the command to what looks
> like the right shape.

Thanks, I was about to send a new patchset with basically the same thing. It
would be nice to know when we are working on the same thing concurrently, to
avoid duplicate effort in the future. I'll rebase and send the updated version
for patches 0002 and 0003 of my original proposal once we reach an agreement
over the behaviour / options of pg_receivewal.

Also, considering the number of different fields to be filled in by
GetSlotInformation, my local branch groups them into a dedicated struct, which
is more convenient than having X possibly-null arguments.

--
Ronan Dunklau





Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
On Thursday, September 2, 2021 at 10:37:20 CEST, Michael Paquier wrote:
> On Thu, Sep 02, 2021 at 10:08:26AM +0200, Ronan Dunklau wrote:
> > I could see the use for sending active_pid for use within pg_basebackup:
> > at
> > least we could fail early if the slot is already in use. But at the same
> > time, maybe it won't be in use anymore once we need it.
>
> There is no real concurrent protection with this design.  You could
> read that the slot is not active during READ_REPLICATION_SLOT just to
> find out after in the process of pg_basebackup streaming WAL that it
> became in use in-between.  And the backend-side protections would kick
> in at this stage.
>
> Hmm.  The logic doing the decision-making with pg_receivewal may
> become more tricky when it comes to pg_replication_slots.wal_status,
> max_slot_wal_keep_size and pg_replication_slots.safe_wal_size.  The
> number of cases we'd like to consider directly impacts the amount of
> data sent through READ_REPLICATION_SLOT.  That's not really different
> from deciding on a failure, a success or a retry with active_pid at an
> earlier or a later stage of a base backup.  pg_receivewal, on the
> contrary, can just rely on what the backend tells when it begins
> streaming.  So I'd prefer keeping things simple and limit the number
> of fields a minimum for this command.

Ok. Please find attached new patches rebased on master.

0001 is yours without any modification.

0002 for pg_receivewal tried to simplify the logic of information to return,
by using a dedicated struct for this. This accounts for Bharath's demands to
return every possible field.
In particular, an is_logical field simplifies the detection of the type of slot.
In my opinion it makes sense to simplify it like this on the client side while
being more open-minded on the server side if we ever need to provide a new
type of slot. Also, GetSlotInformation now returns an enum to be able to
handle the different modes of failures, which differ between pg_receivewal and
pg_basebackup.

0003 is the pg_basebackup one, not much changed except for the concerns you
had about the log message and handling of different failure modes.

There is still the concern raised by Bharath about being able to select the
way to fetch the replication slot information for the user, and what should or
should not be included in the documentation. I am in favor of documenting the
process of selecting the wal start, and maybe include a --start-lsn option for
the user to override it, but that's maybe for another patch.

--
Ronan Dunklau
Attachment

Re: pg_receivewal starting position

From
Bharath Rupireddy
Date:
On Fri, Sep 3, 2021 at 3:28 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> There is still the concern raised by Bharath about being able to select the
> way to fetch the replication slot information for the user, and what should or
> should not be included in the documentation. I am in favor of documenting the
> process of selecting the wal start, and maybe include a --start-lsn option for
> the user to override it, but that's maybe for another patch.

Let's hear from others.

Thanks for the patches. I have some quick comments on the v5 patch-set:

0001:
1) Do you also want to MemSet values too in ReadReplicationSlot?

2) When if clause has single statement we don't generally use "{" and "}"
+ if (slot == NULL || !slot->in_use)
+ {
+ LWLockRelease(ReplicationSlotControlLock);
+ }
you can just have:
+ if (slot == NULL || !slot->in_use)
+ LWLockRelease(ReplicationSlotControlLock);

3) This is still not copying the slot contents, as I said earlier you
can just take the required info into some local variables instead of
taking the slot pointer to slot_contents pointer.
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);

4) As I said earlier, why can't we have separate tli variables for
more readability instead of just one slots_position_timeline and
timeline_history variable? And you are not even resetting those 2
variables after if
(!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)), you might end
up sending the restart_lsn timelineid for confirmed_flush. So, better
use two separate variables. In fact you can use block local variables:
+ if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+ {
+ List    *tl_history= NIL;
+ TimeLineID tli;
+ tl_history= readTimeLineHistory(ThisTimeLineID);
+ tli = tliOfPointInHistory(slot_contents.data.restart_lsn,
+   tl_history);
+ values[i] = Int32GetDatum(tli);
+ nulls[i] = false;
+ }
similarly for confirmed_flush.

5) I still don't see the need for below test case:
+ "READ_REPLICATION_SLOT does not produce an error with non existent slot");
when we have
+ "READ_REPLICATION_SLOT does not produce an error with dropped slot");
Because for a user, dropped or non existent slot is same, it's just
that for dropped slot we internally don't delete its entry from the
shared memory.

0002:
1) Imagine GetSlotInformation always returns
READ_REPLICATION_SLOT_ERROR, don't you think StreamLog enters an
infinite loop there? Instead, why don't you just exit(1); instead of
return; and retry? Similarly for READ_REPLICATION_SLOT_NONEXISTENT? I
think, you can just do exit(1), no need to retry.
+ case READ_REPLICATION_SLOT_ERROR:
+
+ /*
+ * Error has been logged by GetSlotInformation, return and
+ * maybe retry
+ */
+ return;

2) Why is it returning READ_REPLICATION_SLOT_OK when slot_info isn't
passed? And why are you having this check after you connect to the
server and fetch the data?
+ /* If no slotinformation has been passed, we can return immediately */
+ if (slot_info == NULL)
+ {
+ PQclear(res);
+ return READ_REPLICATION_SLOT_OK;
+ }
Instead you can just have a single assert:

+ Assert(slot_name && slot_info );

3) How about
pg_log_error("could not read replication slot:
instead of
pg_log_error("could not fetch replication slot:

4) Why are you having the READ_REPLICATION_SLOT_OK case in default?
+ default:
+ if (slot_info.is_logical)
+ {
+ /*
+ * If the slot is not physical we can't expect to
+ * recover from that
+ */
+ pg_log_error("cannot use the slot provided, physical slot expected.");
+ exit(1);
+ }
+ stream.startpos = slot_info.restart_lsn;
+ stream.timeline = slot_info.restart_tli;
+ }
You can just have another case statement for READ_REPLICATION_SLOT_OK
and in the default you can throw an error "unknown read replication
slot status" or some other better working and exit(1);

5) Do you want initialize slot_info to 0?
+ if (replication_slot)
+ {
+ SlotInformation slot_info;
+ MemSet(slot_info, 0, sizeof(SlotInformation));

6) This isn't readable:
+ slot_info->is_logical = strcmp(PQgetvalue(res, 0, 0), "logical") == 0;
How about:
if (strcmp(PQgetvalue(res, 0, 0), "logical") == 0)
   slot_info->is_logical = true;
You don't need to set it false, because you would have
MemSet(slot_info) in the caller.

7) How about
uint32 hi;
uint32 lo;
instead of
+ uint32 hi,
+ lo;

8) Move  SlotInformation * slot_info) to the next line as it crosses
the 80 char limit.
+GetSlotInformation(PGconn *conn, const char *slot_name,
SlotInformation * slot_info)

9) Instead of a boolean is_logical, I would rather suggest to use an
enum or #define macros the slot types correctly, because someday we
might introduce new type slots and somebody wants is_physical kind of
variable name?
+typedef struct SlotInformation {
+ bool    is_logical;

Regards,
Bharath Rupireddy.



Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
On Monday, September 6, 2021 at 06:22:30 CEST, Michael Paquier wrote:
> On Fri, Sep 03, 2021 at 11:58:27AM +0200, Ronan Dunklau wrote:
> > 0002 for pg_receivewal tried to simplify the logic of information to
> > return, by using a dedicated struct for this. This accounts for Bharath's
> > demands to return every possible field.
> > In particular, an is_logical field simplifies the detection of the type of
> > slot. In my opinion it makes sense to simplify it like this on the client
> > side while being more open-minded on the server side if we ever need to
> > provide a new type of slot. Also, GetSlotInformation now returns an enum
> > to be able to handle the different modes of failures, which differ
> > between pg_receivewal and pg_basebackup.
>
> +   if (PQserverVersion(conn) < 150000)
> +       return READ_REPLICATION_SLOT_UNSUPPORTED;
> [...]
> +typedef enum {
> +   READ_REPLICATION_SLOT_OK,
> +   READ_REPLICATION_SLOT_UNSUPPORTED,
> +   READ_REPLICATION_SLOT_ERROR,
> +   READ_REPLICATION_SLOT_NONEXISTENT
> +} ReadReplicationSlotStatus;
>
> Do you really need this much complication?  We could treat the
> unsupported case and the non-existing case the same way: we don't know
> so just assign InvalidXlogRecPtr or NULL to the fields of the
> structure, and make GetSlotInformation() return false just on error,
> with some pg_log_error() where adapted in its internals.

I actually started with the implementation you propose, but changed my mind
while writing it because I realised it's easier to reason about like this,
instead of failing silently during READ_REPLICATION_SLOT to fail a bit later
when actually trying to start the replication slot because it doesn't exist.
Either the user expects the replication slot to exist, and in this case we
should retry the whole loop in the hope of getting an interesting LSN, or the
user doesn't and shouldn't even pass a replication_slot name to begin with.

>
> > There is still the concern raised by Bharath about being able to select
> > the
> > way to fetch the replication slot information for the user, and what
> > should or should not be included in the documentation. I am in favor of
> > documenting the process of selecting the wal start, and maybe include a
> > --start-lsn option for the user to override it, but that's maybe for
> > another patch.
>
> The behavior of pg_receivewal that you are implementing should be
> documented.  We don't say either how the start location is selected
> when working on an existing directory, so I would recommend to add a
> paragraph in the description section to detail all that, as of:
> - First a scan of the existing archives is done.
> - If nothing is found, and if there is a slot, request the slot
> information.
> - If still nothing (aka slot undefined, or command not supported), use
> the last flush location.

Sounds good, I will add another patch for the documentation of this.
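
As a rough outline of how that ordering could look inside pg_receivewal's
StreamLog() (a sketch only: FindStreamingStart() and RunIdentifySystem()
already exist, the GetSlotInformation() call is the new step, and
servertli/serverpos stand for StreamLog's existing locals):

    /* 1. Scan the existing local archive for a completed segment. */
    stream.startpos = FindStreamingStart(&stream.timeline);

    /* 2. If nothing was found, ask the server about the slot. */
    if (stream.startpos == InvalidXLogRecPtr && replication_slot != NULL)
    {
        if (!GetSlotInformation(conn, replication_slot,
                                &stream.startpos, &stream.timeline))
            return;             /* error already logged, caller may retry */
    }

    /* 3. Still nothing: fall back to the server's current flush location. */
    if (stream.startpos == InvalidXLogRecPtr)
    {
        if (!RunIdentifySystem(conn, NULL, &servertli, &serverpos, NULL))
            exit(1);
        stream.startpos = serverpos;
        stream.timeline = servertli;
    }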

>
> As a whole, I am not really convinced that we need a new option for
> that as long as we rely on a slot with pg_receivewal as these are used
> to make sure that we avoid holes in the WAL archives.
>
> Regarding pg_basebackup, Daniel has proposed a couple of days ago a
> different solution to trap errors earlier, which would cover the case
> dealt with here:
> https://www.postgresql.org/message-id/0F69E282-97F9-4DB7-8D6D-F927AA6340C8@y
> esql.se

I will take a look.

> We should not mimic in the frontend errors that are safely trapped
> in the backend with the proper locks, in any case.

I don't understand what you mean by this? My original proposal was for the
command to actually attach to the replication slot while reading it, thus
keeping a lock on it to prevent dropping or concurrent access on the server.
>
> While on it, READ_REPLICATION_SLOT returns a confirmed LSN when
> grabbing the data of a logical slot.  We are not going to use that
> with pg_recvlogical as by default START_STREAMING 0/0 would just use
> the confirmed LSN.  Do we have use cases where this information would
> help?  There is the argument of consistency with physical slots and
> that this can be helpful to do sanity checks for clients, but that's
> rather thin.

If we don't, we should rename the command to READ_PHYSICAL_REPLICATION_SLOT
and error out on the server if the slot is not actually a physical one to
spare the client from performing those checks. I still think it's better to
support both cases as opposed to having two completely different APIs
(READ_(PHYSICAL)_REPLICATION_SLOT for physical ones on a replication
connection, pg_replication_slots view for logical ones) as it would enable
more for third-party clients at a relatively low maintenance cost for us.

--
Ronan Dunklau





Re: pg_receivewal starting position

From
Bharath Rupireddy
Date:
On Mon, Sep 6, 2021 at 12:20 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
>
> On Monday, September 6, 2021 at 06:22:30 CEST, Michael Paquier wrote:
> > On Fri, Sep 03, 2021 at 11:58:27AM +0200, Ronan Dunklau wrote:
> > > 0002 for pg_receivewal tried to simplify the logic of information to
> > > return, by using a dedicated struct for this. This accounts for Bharath's
> > > demands to return every possible field.
> > > In particular, an is_logical field simplifies the detection of the type of
> > > slot. In my opinion it makes sense to simplify it like this on the client
> > > side while being more open-minded on the server side if we ever need to
> > > provide a new type of slot. Also, GetSlotInformation now returns an enum
> > > to be able to handle the different modes of failures, which differ
> > > between pg_receivewal and pg_basebackup.
> >
> > +   if (PQserverVersion(conn) < 150000)
> > +       return READ_REPLICATION_SLOT_UNSUPPORTED;
> > [...]
> > +typedef enum {
> > +   READ_REPLICATION_SLOT_OK,
> > +   READ_REPLICATION_SLOT_UNSUPPORTED,
> > +   READ_REPLICATION_SLOT_ERROR,
> > +   READ_REPLICATION_SLOT_NONEXISTENT
> > +} ReadReplicationSlotStatus;
> >
> > Do you really need this much complication?  We could treat the
> > unsupported case and the non-existing case the same way: we don't know
> > so just assign InvalidXlogRecPtr or NULL to the fields of the
> > structure, and make GetSlotInformation() return false just on error,
> > with some pg_log_error() where adapted in its internals.
>
> I actually started with the implementation you propose, but changed my mind
> while writing it because I realised it's easier to reason about like this,
> instead of failing silently during READ_REPLICATION_SLOT to fail a bit later
> when actually trying to start the replication slot because it doesn't exist.
> Either the user expects the replication slot to exist, and in this case we
> should retry the whole loop in the hope of getting an interesting LSN, or the
> user doesn't and shouldn't even pass a replication_slot name to begin with.

I don't think so we need to retry the whole loop if the
READ_REPLICATION_SLOT command fails as pg_receivewal might enter an
infinite loop there. IMO, we should just exit(1) if
READ_REPLICATION_SLOT fails.

> > > There is still the concern raised by Bharath about being able to select
> > > the
> > > way to fetch the replication slot information for the user, and what
> > > should or should not be included in the documentation. I am in favor of
> > > documenting the process of selecting the wal start, and maybe include a
> > > --start-lsn option for the user to override it, but that's maybe for
> > > another patch.
> >
> > The behavior of pg_receivewal that you are implementing should be
> > documented.  We don't say either how the start location is selected
> > when working on an existing directory, so I would recommend to add a
> > paragraph in the description section to detail all that, as of:
> > - First a scan of the existing archives is done.
> > - If nothing is found, and if there is a slot, request the slot
> > information.
> > - If still nothing (aka slot undefined, or command not supported), use
> > the last flush location.
>
> Sounds good, I will add another patch for the documentation of this.

+1.

> > While on it, READ_REPLICATION_SLOT returns a confirmed LSN when
> > grabbing the data of a logical slot.  We are not going to use that
> > with pg_recvlogical as by default START_STREAMING 0/0 would just use
> > the confirmed LSN.  Do we have use cases where this information would
> > help?  There is the argument of consistency with physical slots and
> > that this can be helpful to do sanity checks for clients, but that's
> > rather thin.
>
> If we don't, we should rename the command to READ_PHYSICAL_REPLICATION_SLOT
> and error out on the server if the slot is not actually a physical one to
> spare the client from performing those checks. I still think it's better to
> support both cases as opposed to having two completely different APIs
> (READ_(PHYSICAL)_REPLICATION_SLOT for physical ones on a replication
> connection, pg_replication_slots view for logical ones) as it would enable
> more for third-party clients at a relatively low maintenance cost for us.

-1 for READ_PHYSICAL_REPLICATION_SLOT or failing on the server. What
happens if we have another slot type "PHYSIOLOGICAL" or "FOO" or "BAR"
some other? IMO, READ_REPLICATION_SLOT should just return info of all
slots. The clients know better how to deal with the slot type.
Although, we don't have a use case for logical slots with the
READ_REPLICATION_SLOT command, let's not change it.

If others are still concerned about the unnecessary slot being
returned, you might consider passing in a parameter to
READ_REPLICATION_SLOT command, something like below. But this too
looks complex to me. I would vote for what the existing patch does
with READ_REPLICATION_SLOT.
READ_REPLICATION_SLOT 'slot_name' 'physical'; only returns physical
slot info with the given name.
READ_REPLICATION_SLOT 'slot_name' 'logical'; only returns logical slot
with the given name.

Regards,
Bharath Rupireddy.



Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
On Friday, September 3, 2021 at 17:49:34 CEST, you wrote:
> On Fri, Sep 3, 2021 at 3:28 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> > There is still the concern raised by Bharath about being able to select
> > the
> > way to fetch the replication slot information for the user, and what
> > should or should not be included in the documentation. I am in favor of
> > documenting the process of selecting the wal start, and maybe include a
> > --start-lsn option for the user to override it, but that's maybe for
> > another patch.
>
> Let's hear from others.
>
> Thanks for the patches. I have some quick comments on the v5 patch-set:
>
> 0001:
> 1) Do you also want to MemSet values too in ReadReplicationSlot?
>
> 2) When if clause has single statement we don't generally use "{" and "}"
> + if (slot == NULL || !slot->in_use)
> + {
> + LWLockRelease(ReplicationSlotControlLock);
> + }
> you can just have:
> + if (slot == NULL || !slot->in_use)
> + LWLockRelease(ReplicationSlotControlLock);
>
> 3) This is still not copying the slot contents, as I said earlier you
> can just take the required info into some local variables instead of
> taking the slot pointer to slot_contents pointer.
> + /* Copy slot contents while holding spinlock */
> + SpinLockAcquire(&slot->mutex);
> + slot_contents = *slot;
> + SpinLockRelease(&slot->mutex);
> + LWLockRelease(ReplicationSlotControlLock);
>
> 4) As I said earlier, why can't we have separate tli variables for
> more readability instead of just one slots_position_timeline and
> timeline_history variable? And you are not even resetting those 2
> variables after if
> (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)), you might end
> up sending the restart_lsn timelineid for confirmed_flush. So, better
> use two separate variables. In fact you can use block local variables:
> + if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
> + {
> + List    *tl_history= NIL;
> + TimeLineID tli;
> + tl_history= readTimeLineHistory(ThisTimeLineID);
> + tli = tliOfPointInHistory(slot_contents.data.restart_lsn,
> +   tl_history);
> + values[i] = Int32GetDatum(tli);
> + nulls[i] = false;
> + }
> similarly for confirmed_flush.
>
> 5) I still don't see the need for below test case:
> + "READ_REPLICATION_SLOT does not produce an error with non existent slot");
> when we have
> + "READ_REPLICATION_SLOT does not produce an error with dropped slot");
> Because for a user, dropped or non existent slot is same, it's just
> that for dropped slot we internally don't delete its entry from the
> shared memory.
>

Thank you for reiterating those concerns. As I said, I haven't touched
Michael's version of the patch. I was incorporating those changes in my branch
before he sent this, so I'll probably merge both before sending an updated
patch.

> 0002:
> 1) Imagine GetSlotInformation always returns
> READ_REPLICATION_SLOT_ERROR, don't you think StreamLog enters an
> infinite loop there? Instead, why don't you just exit(1); instead of
> return; and retry? Similarly for READ_REPLICATION_SLOT_NONEXISTENT? I
> think, you can just do exit(1), no need to retry.
> + case READ_REPLICATION_SLOT_ERROR:
> +
> + /*
> + * Error has been logged by GetSlotInformation, return and
> + * maybe retry
> + */
> + return;

This is the same behaviour we had before: if there is an error,
pg_receivewal retries the command. There was no special case for the
replication slot not existing before, so I don't see why we should change it
now?

Eg:

2021-09-06 09:11:07.774 CEST [95853] ERROR:  replication slot "nonexistent"
does not exist
2021-09-06 09:11:07.774 CEST [95853] STATEMENT:  START_REPLICATION SLOT
"nonexistent" 0/1000000 TIMELINE 1
pg_receivewal: error: could not send replication command "START_REPLICATION":
ERROR:  replication slot "nonexistent" does not exist
pg_receivewal: disconnected; waiting 5 seconds to try again

Users may rely on it to keep retrying in the background until the slot has
been created for example.

>
> 2) Why is it returning READ_REPLICATION_SLOT_OK when slot_info isn't
> passed? And why are you having this check after you connect to the
> server and fetch the data?
> + /* If no slotinformation has been passed, we can return immediately */
> + if (slot_info == NULL)
> + {
> + PQclear(res);
> + return READ_REPLICATION_SLOT_OK;
> + }
> Instead you can just have a single assert:
>
> + Assert(slot_name && slot_info );

At first it was so that we didn't have to fill in all required information if we
don't need to, but it turns out pg_basebackup also has to check for the
slot's type. I agree we should not support the null slot_info case anymore.

>
> 3) How about
> pg_log_error("could not read replication slot:
> instead of
> pg_log_error("could not fetch replication slot:

Ok.
>
> 4) Why are you having the READ_REPLICATION_SLOT_OK case in default?
> + default:
> + if (slot_info.is_logical)
> + {
> + /*
> + * If the slot is not physical we can't expect to
> + * recover from that
> + */
> + pg_log_error("cannot use the slot provided, physical slot expected.");
> + exit(1);
> + }
> + stream.startpos = slot_info.restart_lsn;
> + stream.timeline = slot_info.restart_tli;
> + }
> You can just have another case statement for READ_REPLICATION_SLOT_OK
> and in the default you can throw an error "unknown read replication
> slot status" or some other better working and exit(1);

Ok.

>
> 5) Do you want initialize slot_info to 0?
> + if (replication_slot)
> + {
> + SlotInformation slot_info;
> + MemSet(slot_info, 0, sizeof(SlotInformation));
>
> 6) This isn't readable:
> + slot_info->is_logical = strcmp(PQgetvalue(res, 0, 0), "logical") == 0;
> How about:
> if (strcmp(PQgetvalue(res, 0, 0), "logical") == 0)
>    slot_info->is_logical = true;
> You don't need to set it false, because you would have
> MemSet(slot_info) in the caller.
>

Isn't it preferable to fill in the whole struct without any regard to its
prior state? I will modify the is_logical assignment to make it more readable
though.


> 7) How about
> uint32 hi;
> uint32 lo;
> instead of
> + uint32 hi,
> + lo;
>

Ok.

> 8) Move  SlotInformation * slot_info) to the next line as it crosses
> the 80 char limit.
> +GetSlotInformation(PGconn *conn, const char *slot_name,
> SlotInformation * slot_info)

Ok.

>
> 9) Instead of a boolean is_logical, I would rather suggest to use an
> enum or #define macros the slot types correctly, because someday we
> might introduce new type slots and somebody wants is_physical kind of
> variable name?
> +typedef struct SlotInformation {
> + bool    is_logical;

That's the reason why the READ_REPLICATION_SLOT command returns a slot type as
a string, but the intermediate representation on the client doesn't really
need to be concerned about this: it's the client after all and it will be very
easy to change it locally if we need to. Thinking about it, for our use case
we should really use is_physical instead of is_logical. But I guess all of
this is moot if we end up deciding not to support the logical case anymore?

Best regards,

--
Ronan Dunklau





Re: pg_receivewal starting position

From
Michael Paquier
Date:
On Mon, Sep 06, 2021 at 12:37:29PM +0530, Bharath Rupireddy wrote:
> -1 for READ_PHYSICAL_REPLICATION_SLOT or failing on the server. What
> happens if we have another slot type "PHYSIOLOGICAL" or "FOO" or "BAR"
> some other? IMO, READ_REPLICATION_SLOT should just return info of all
> slots. The clients know better how to deal with the slot type.
> Although, we don't have a use case for logical slots with the
> READ_REPLICATION_SLOT command, let's not change it.

Using READ_REPLICATION_SLOT as the command name is fine, and it could
be extended with more fields if necessary, implemented now with only
what we think is useful.  Returning errors on cases that are still not
supported yet is fine, say for logical slots if we decide to reject
the case for now, and restrictions can always be lifted in the
future.
--
Michael

Attachment

Re: pg_receivewal starting position

From
Michael Paquier
Date:
On Mon, Sep 06, 2021 at 04:17:28PM +0900, Michael Paquier wrote:
> Using READ_REPLICATION_SLOT as the command name is fine, and it could
> be extended with more fields if necessary, implemented now with only
> what we think is useful.  Returning errors on cases that are still not
> supported yet is fine, say for logical slots if we decide to reject
> the case for now, and restrictions can always be lifted in the
> future.

And marked as RwF as this was three weeks ago.  Please feel free to
register a new entry if this is being worked on.
--
Michael

Attachment

Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
Hello,

Following recommendations, I stripped most of the features from the patch. For
now we support only physical replication slots, and only provide the two fields
of interest (restart_lsn, restart_tli) in addition to the slot type (fixed at
physical) to not paint ourselves in a corner.

I also removed the part about pg_basebackup since other fixes have been
proposed for that case.

On Friday, October 1, 2021 at 09:05:18 CEST, Michael Paquier wrote:
> On Mon, Sep 06, 2021 at 04:17:28PM +0900, Michael Paquier wrote:
> > Using READ_REPLICATION_SLOT as the command name is fine, and it could
> > be extended with more fields if necessary, implemented now with only
> > what we think is useful.  Returning errors on cases that are still not
> > supported yet is fine, say for logical slots if we decide to reject
> > the case for now, and restrictions can always be lifted in the
> > future.
>
> And marked as RwF as this was three weeks ago.  Please feel free to
> register a new entry if this is being worked on.
> --
> Michael


--
Ronan Dunklau
Attachment

Re: pg_receivewal starting position

From
Michael Paquier
Date:
On Tue, Oct 19, 2021 at 05:32:55PM +0200, Ronan Dunklau wrote:
> Following recommendations, I stripped most of the features from the patch. For
> now we support only physical replication slots, and only provide the two fields
> of interest (restart_lsn, restart_tli) in addition to the slot type (fixed at
> physical) to not paint ourselves in a corner.
>
> I also removed the part about pg_basebackup since other fixes have been
> proposed for that case.

Patch 0001 looks rather clean.  I have a couple of comments.

+       if (OidIsValid(slot_contents.data.database))
+           elog(ERROR, "READ_REPLICATION_SLOT is only supported for physical slots");

elog() can only be used for internal errors.  Errors that can be
triggered by a user should use ereport() instead.
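
For instance, something along these lines (a sketch; the error code and
message wording are illustrative, not necessarily the final ones):

    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("cannot use %s with a logical replication slot",
                    "READ_REPLICATION_SLOT")));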

+ok($stdout eq '||',
+   "READ_REPLICATION_SLOT returns NULL values if slot does not exist");
[...]
+ok($stdout =~ 'physical\|[^|]*\|1',
+   "READ_REPLICATION_SLOT returns tuple corresponding to the slot");
Isn't result pattern matching something we usually test with like()?

+($ret, $stdout, $stderr) = $node_primary->psql(
+   'postgres',
+   "READ_REPLICATION_SLOT $slotname;",
+   extra_params => [ '-d', $connstr_rep ]);
No need for extra_params in this test.  You can just pass down
"replication => 1" instead, no?

--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
[...]
+ok($stderr=~ 'READ_REPLICATION_SLOT is only supported for physical slots',
+   'Logical replication slot is not supported');
This one should use like().

+        <para>
+        The slot's <literal>restart_lsn</literal> can also beused as a starting
+        point if the target directory is empty.
+        </para>
I am not sure that there is a need for this addition as the same thing
is said when describing the lookup ordering.

+      If nothing is found and a slot is specified, use the
+      <command>READ_REPLICATION_SLOT</command>
+      command.
It may be clearer to say that the position is retrieved from the
command.

+bool
+GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr
*restart_lsn, TimeLineID* restart_tli)
+{
Could you extend that so as we still run the command but don't crash
if the caller specifies NULL for any of the result fields?  This would
be handy.
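
Something like the following at the end of the function, for instance (a
sketch; slot_restart_lsn and slot_restart_tli are hypothetical locals holding
the values parsed from the result):

    /* Only write through output pointers the caller actually provided. */
    if (restart_lsn != NULL)
        *restart_lsn = slot_restart_lsn;    /* parsed earlier from res */
    if (restart_tli != NULL)
        *restart_tli = slot_restart_tli;    /* parsed earlier from res */

    PQclear(res);
    return true;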

+   if (PQgetisnull(res, 0, 0))
+   {
+       PQclear(res);
+       pg_log_error("replication slot \"%s\" does not exist",
slot_name);
+       return false;
+   }
+   if (PQntuples(res) != 1 || PQnfields(res) < 3)
+   {
+       pg_log_error("could not fetch replication slot: got %d rows
and %d fields, expected %d rows and %d or more fields",
+                    PQntuples(res), PQnfields(res), 1, 3);
+       PQclear(res);
+       return false;
+   }
Wouldn't it be better to reverse the order of these two checks?

I don't mind the addition of the slot type being part of the result of
READ_REPLICATION_SLOT even if it is not mandatory (?), but at least
GetSlotInformation() should check after it.

+# Setup the slot, and connect to it a first time
+$primary->run_log(
+   [ 'pg_receivewal', '--slot', $slot_name, '--create-slot' ],
+   'creating a replication slot');
+$primary->psql('postgres',
+   'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
Wouldn't it be simpler to use CREATE_REPLICATION_SLOT with RESERVE_WAL
here, rather than going through pg_receivewal?  It seems to me that
this would be cheaper without really impacting the coverage.
--
Michael

Attachment

Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
On Wednesday, October 20, 2021 at 07:13:15 CEST, Michael Paquier wrote:
> On Tue, Oct 19, 2021 at 05:32:55PM +0200, Ronan Dunklau wrote:
> > Following recommendations, I stripped most of the features from the patch.
> > For now we support only physical replication slots, and only provide the
> > two fields of interest (restart_lsn, restart_tli) in addition to the slot
> > type (fixed at physical) to not paint ourselves in a corner.
> >
> > I also removed the part about pg_basebackup since other fixes have been
> > proposed for that case.
>
> Patch 0001 looks rather clean.  I have a couple of comments.

Thank you for the quick review!


>
> +       if (OidIsValid(slot_contents.data.database))
> +           elog(ERROR, "READ_REPLICATION_SLOT is only supported for
> physical slots");
>
> elog() can only be used for internal errors.  Errors that can be
> triggered by a user should use ereport() instead.

Ok.
>
> +ok($stdout eq '||',
> +   "READ_REPLICATION_SLOT returns NULL values if slot does not exist");
> [...]
> +ok($stdout =~ 'physical\|[^|]*\|1',
> +   "READ_REPLICATION_SLOT returns tuple corresponding to the slot");
> Isn't result pattern matching something we usually test with like()?

Ok.
>
> +($ret, $stdout, $stderr) = $node_primary->psql(
> +   'postgres',
> +   "READ_REPLICATION_SLOT $slotname;",
> +   extra_params => [ '-d', $connstr_rep ]);
> No need for extra_params in this test.  You can just pass down
> "replication => 1" instead, no?

In that test file, every replication connection is obtained by using
connstr_rep so I thought it would be best to use the same thing.

>
> --- a/src/test/recovery/t/006_logical_decoding.pl
> +++ b/src/test/recovery/t/006_logical_decoding.pl
> [...]
> +ok($stderr=~ 'READ_REPLICATION_SLOT is only supported for physical slots',
> +   'Logical replication slot is not supported');
> This one should use like().

Ok.

>
> +        <para>
> +        The slot's <literal>restart_lsn</literal> can also beused as a
> starting +        point if the target directory is empty.
> +        </para>
> I am not sure that there is a need for this addition as the same thing
> is said when describing the lookup ordering.

Ok, removed.

>
> +      If nothing is found and a slot is specified, use the
> +      <command>READ_REPLICATION_SLOT</command>
> +      command.
> It may be clearer to say that the position is retrieved from the
> command.

Ok, done. The doc also uses the active voice here now.

>
> +bool
> +GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr
> *restart_lsn, TimeLineID* restart_tli)
> +{
> Could you extend that so as we still run the command but don't crash
> if the caller specifies NULL for any of the result fields?  This would
> be handy.

Done.

>
> +   if (PQgetisnull(res, 0, 0))
> +   {
> +       PQclear(res);
> +       pg_log_error("replication slot \"%s\" does not exist",
> slot_name);
> +       return false;
> +   }
> +   if (PQntuples(res) != 1 || PQnfields(res) < 3)
> +   {
> +       pg_log_error("could not fetch replication slot: got %d rows
> and %d fields, expected %d rows and %d or more fields",
> +                    PQntuples(res), PQnfields(res), 1, 3);
> +       PQclear(res);
> +       return false;
> +   }
> Wouldn't it be better to reverse the order of these two checks?

Yes it is, and the PQntuples condition should be removed from the first error
test.
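
In other words, roughly (a sketch based on the snippet quoted above, simply
checking the result shape before looking at any value):

    if (PQntuples(res) != 1 || PQnfields(res) < 3)
    {
        pg_log_error("could not fetch replication slot: got %d rows and %d fields, expected %d rows and %d or more fields",
                     PQntuples(res), PQnfields(res), 1, 3);
        PQclear(res);
        return false;
    }

    if (PQgetisnull(res, 0, 0))
    {
        pg_log_error("replication slot \"%s\" does not exist", slot_name);
        PQclear(res);
        return false;
    }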

>
> I don't mind the addition of the slot type being part of the result of
> READ_REPLICATION_SLOT even if it is not mandatory (?), but at least
> GetSlotInformation() should check after it.

Ok.

>
> +# Setup the slot, and connect to it a first time
> +$primary->run_log(
> +   [ 'pg_receivewal', '--slot', $slot_name, '--create-slot' ],
> +   'creating a replication slot');
> +$primary->psql('postgres',
> +   'INSERT INTO test_table VALUES (generate_series(1,100));');
> +$primary->psql('postgres', 'SELECT pg_switch_wal();');
> +$nextlsn =
> +  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
> +chomp($nextlsn);
> Wouldn't it be simpler to use CREATE_REPLICATION_SLOT with RESERVE_WAL
> here, rather than going through pg_receivewal?  It seems to me that
> this would be cheaper without really impacting the coverage.

You're right, we can skip two invocations of pg_receivewal like this (for the
slot creation + for starting the slot a first time).


--
Ronan Dunklau
Attachment

Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
On Wednesday, October 20, 2021 at 11:40:18 CEST, you wrote:
> > +# Setup the slot, and connect to it a first time
> > +$primary->run_log(
> > +   [ 'pg_receivewal', '--slot', $slot_name, '--create-slot' ],
> > +   'creating a replication slot');
> > +$primary->psql('postgres',
> > +   'INSERT INTO test_table VALUES (generate_series(1,100));');
> > +$primary->psql('postgres', 'SELECT pg_switch_wal();');
> > +$nextlsn =
> > +  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
> > +chomp($nextlsn);
> > Wouldn't it be simpler to use CREATE_REPLICATION_SLOT with RESERVE_WAL
> > here, rather than going through pg_receivewal?  It seems to me that
> > this would be cheaper without really impacting the coverage.
>
> You're right, we can skip two invocations of pg_receivewal like this (for
> the slot creation + for starting the slot a first time).

After sending the previous patch suite, I figured it would be worthwhile to
also have tests covering timeline switches, which were not covered before.
So please find attached a new version with an additional patch for those tests,
covering both the "resume from the last known archive" and "resume from the
replication slot's position" cases.

--
Ronan Dunklau
Attachment

Re: pg_receivewal starting position

From
Michael Paquier
Date:
On Wed, Oct 20, 2021 at 02:58:26PM +0200, Ronan Dunklau wrote:
> After sending the previous patch suite, I figured it would be worthwhile to
> also have tests covering timeline switches, which was not covered before.

That seems independent to me.  I'll take a look.

> So please find attached a new version with an additional patch for those tests,
> covering both the "resume from the last known archive" and "resume from the
> replication slot's position" cases.

So, taking things in order, I have looked at 0003 and 0001, and
attached are refined versions for both of them.

0003 is an existing hole in the docs, which I think we had better
address first and backpatch, taking into account that the starting
point calculation considers compressed segments when looking for
completed segments.

Regarding 0001, I have found the last test to check for NULL values
returned by READ_REPLICATION_SLOT after dropping the slot overlaps
with the first test, so I have removed that.  I have expanded a bit
the use of like(), and there were some confusion with
PostgresNode::psql and some extra arguments (see DROP_REPLICATION_SLOT
and CREATE_REPLICATION_SLOT, and no need for return values in the
CREATE case either).  Some comments, docs and code have been slightly
tweaked.

Here are some comments about 0002.

+       /* The commpand should always return precisely one tuple */
s/commpand/command/

+       pg_log_error("could not fetch replication slot: got %d rows and %d fields, expected %d rows and %d or more
fields",
+                    PQntuples(res), PQnfields(res), 1, 3);
Should this be "could not read" instead?

+       if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+       {
+           pg_log_error("could not parse slot's restart_lsn \"%s\"",
+                        PQgetvalue(res, 0, 1));
+           PQclear(res);
+           return false;
+       }
Wouldn't it be saner to initialize *restart_lsn and *restart_tli to
some default values at the top of GetSlotInformation() instead, if
they are specified by the caller?  And I think that we should still
complain even if restart_lsn is NULL.

On a quick read of 0004, I find the split of the logic with
change_timeline() a bit hard to understand.  It looks like we should
be able to make a cleaner split, but I am not sure how that would
look, though.
--
Michael

Attachment

Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
On Thursday, October 21, 2021 at 07:35:08 CEST, Michael Paquier wrote:
> On Wed, Oct 20, 2021 at 02:58:26PM +0200, Ronan Dunklau wrote:
> > After sending the previous patch suite, I figured it would be worthwhile
> > to
> > also have tests covering timeline switches, which was not covered before.
>
> That seems independent to me.  I'll take a look.
>
> > So please find attached a new version with an additional patch for those
> > tests, covering both the "resume from the last known archive" and "resume from
> > the replication slot's position" cases.
>
> So, taking things in order, I have looked at 0003 and 0001, and
> attached are refined versions for both of them.
>
> 0003 is an existing hole in the docs, which I think we had better
> address first and backpatch, taking into account that the starting
> point calculation considers compressed segments when looking for
> completed segments.

Ok, do you want me to propose a different patch for previous versions?

>
> Regarding 0001, I have found the last test to check for NULL values
> returned by READ_REPLICATION_SLOT after dropping the slot overlaps
> with the first test, so I have removed that.  I have expanded a bit
> the use of like(), and there were some confusion with
> PostgresNode::psql and some extra arguments (see DROP_REPLICATION_SLOT
> and CREATE_REPLICATION_SLOT, and no need for return values in the
> CREATE case either).  Some comments, docs and code have been slightly
> tweaked.

Thank you for this.


>
> Here are some comments about 0002.
>
> +       /* The commpand should always return precisely one tuple */
> s/commpand/command/
>
> +       pg_log_error("could not fetch replication slot: got %d rows and %d
> fields, expected %d rows and %d or more fields", +
> PQntuples(res), PQnfields(res), 1, 3);
> Should this be "could not read" instead?
>
> +       if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
> +       {
> +           pg_log_error("could not parse slot's restart_lsn \"%s\"",
> +                        PQgetvalue(res, 0, 1));
> +           PQclear(res);
> +           return false;
> +       }
> Wouldn't it be saner to initialize *restart_lsn and *restart_tli to
> some default values at the top of GetSlotInformation() instead, if
> they are specified by the caller?

Ok.

> And I think that we should still
> complain even if restart_lsn is NULL.

Do you mean restart_lsn as the pointer argument to the function, or
restart_lsn as the field returned by the command? If it's the former, I'll
change it, but if it's the latter it is expected that we sometimes run this on a
slot where WAL has never been reserved yet.

>
> On a quick read of 0004, I find the split of the logic with
> change_timeline() a bit hard to understand.  It looks like we should
> be able to make a cleaner split, but I am not sure how that would
> look, though.

Thanks, at least if the proposal to test this seems sensible I can move
forward. I wanted to avoid having a lot of code duplication since the test
setup is a bit more complicated.
My first approach was to split it into two functions, setup_standby and
change_timeline, but then realized that what would happen between the two
invocations would basically be the same for the two test cases, so I ended up
with that patch. I'll see if I can find a better way of organizing that
code.

--
Ronan Dunklau





Re: pg_receivewal starting position

From
Michael Paquier
Date:
On Thu, Oct 21, 2021 at 08:29:54AM +0200, Ronan Dunklau wrote:
> Ok, do you want me to propose a different patch for previous versions?

That's not necessary.  Thanks for proposing.

> Do you mean restart_lsn as the pointer argument to the function, or
> restart_lsn as the field returned by the command? If it's the former, I'll
> change it, but if it's the latter it is expected that we sometimes run this on a
> slot where WAL has never been reserved yet.

restart_lsn as the pointer of the function.
--
Michael

Attachment

Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
On Thursday, October 21, 2021 at 09:21:44 CEST, Michael Paquier wrote:
> On Thu, Oct 21, 2021 at 08:29:54AM +0200, Ronan Dunklau wrote:
> > Ok, do you want me to propose a different patch for previous versions ?
>
> That's not necessary.  Thanks for proposing.
>
> > Do you mean restart_lsn as the pointer argument to the function, or
> > restart_lsn as the field returned by the command? If it's the former, I'll
> > change it, but if it's the latter it is expected that we sometimes run this
> > on a slot where WAL has never been reserved yet.
>
> restart_lsn as the pointer of the function.

Done. I haven't touched the timeline switch test patch for now, but I still
include it here for completeness.





--
Ronan Dunklau
Attachment

Re: pg_receivewal starting position

From
Michael Paquier
Date:
On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote:
> Done. I haven't touched the timeline switch test patch for now, but I still
> include it here for completeness.

Thanks.  I have applied and back-patched 0001, then looked again at
0002 that adds READ_REPLICATION_SLOT:
- Change the TLI to use int8 rather than int4, so as we will always be
right with TimelineID which is unsigned (this was discussed upthread
but I got back on it after more thoughts, to avoid any future
issues).
- Added an extra initialization for the set of Datum values, just as
an extra safety net.
- There was a bug with the timeline returned when executing the
command while in recovery as ThisTimeLineID is 0 in the context of a
standby, but we need to support the case of physical slots even when
streaming archives from a standby.  The fix is similar to what we do
for IDENTIFY_SYSTEM, where we need to use the timeline currently
replayed from GetXLogReplayRecPtr(), before looking at the past
timeline history using restart_lsn and the replayed TLI (a rough sketch of
this follows below).

With that in place, I think that we are good now for this part.
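
For reference, a rough sketch of that timeline lookup (not the committed code;
it assumes the backend helpers RecoveryInProgress(), GetXLogReplayRecPtr(),
readTimeLineHistory() and tliOfPointInHistory(), plus the slot_contents local
from the patch):

    TimeLineID  current_tli;
    List       *timeline_history;
    TimeLineID  restart_tli;

    if (RecoveryInProgress())
        (void) GetXLogReplayRecPtr(&current_tli);   /* replayed TLI on a standby */
    else
        current_tli = ThisTimeLineID;               /* our own TLI on a primary */

    timeline_history = readTimeLineHistory(current_tli);
    restart_tli = tliOfPointInHistory(slot_contents.data.restart_lsn,
                                      timeline_history);
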
--
Michael

Attachment

Re: pg_receivewal starting position

From
Bharath Rupireddy
Date:
On Sat, Oct 23, 2021 at 1:14 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote:
> > Done. I haven't touched the timeline switch test patch for now, but I still
> > include it here for completeness.
>
> Thanks.  I have applied and back-patched 0001, then looked again at
> 0002 that adds READ_REPLICATION_SLOT:
> - Change the TLI to use int8 rather than int4, so as we will always be
> right with TimelineID which is unsigned (this was discussed upthread
> but I got back on it after more thoughts, to avoid any future
> issues).
> - Added an extra initialization for the set of Datum values, just as
> an extra safety net.
> - There was a bug with the timeline returned when executing the
> command while in recovery as ThisTimeLineID is 0 in the context of a
> standby, but we need to support the case of physical slots even when
> streaming archives from a standby.  The fix is similar to what we do
> for IDENTIFY_SYSTEM, where we need to use the timeline currently
> replayed from GetXLogReplayRecPtr(), before looking at the past
> timeline history using restart_lsn and the replayed TLI.
>
> With that in place, I think that we are good now for this part.

Thanks for the updated patch. I have the following comments on v10:

1) It's better to initialize nulls with false, so we can avoid setting
them to true. The columns that are not null are going to outnumber the
columns with null values, so we could avoid some of the
nulls[i] = false; instructions.
+ MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
I suggest we do the following. The number of instances of hitting the
"else" parts will be less.
MemSet(nulls, false, READ_REPLICATION_SLOT_COLS * sizeof(bool));

    if (slot == NULL || !slot->in_use)
    {
        MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
        LWLockRelease(ReplicationSlotControlLock);
    }

if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
{
}
else
   nulls[i] = true;

if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
{
}
else
   nulls[i] = true;

2) As I previously mentioned, we are not copying the slot contents
while holding the spinlock, it's just we are taking the memory address
and releasing the lock, so there is a chance that the memory we are
looking at can become unavailable or stale while we access
slot_contents. So, I suggest we do the memcpy of the *slot to
slot_contents. I'm sure the memcpy ing the entire ReplicationSlot
contents will be costlier, so let's just take the info that we need
(data.database, data.restart_lsn) into local variables while we hold
the spin lock
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);

The code will look like following:
        Oid            database;
        XLogRecPtr restart_lsn;

        /* Take required information from slot contents while holding
spinlock */
        SpinLockAcquire(&slot->mutex);
        database= slot->data.database;
        restart_lsn= slot->data.restart_lsn;
        SpinLockRelease(&slot->mutex);
        LWLockRelease(ReplicationSlotControlLock);

3) The data that the new command returns to the client can actually
become stale while it is captured and in transit to the client as we
release the spinlock and other backends can drop or alter the info.
So, it's better we talk about this in the documentation of the new
command and also in the comments saying "clients will have to deal
with it."

4) How about we be more descriptive about the error added? This will
help identify for which replication slot the command has failed from
tons of server logs which really help in debugging and analysis.
I suggest we have this:
errmsg("cannot use \"%s\" command with logical replication slot
\"%s\"", "READ_REPLICATION_SLOT", cmd->slotname);
instead of just a plain, non-informative, generic message:
errmsg("cannot use \"%s\" with logical replication slots",

Regards,
Bharath Rupireddy.



Re: pg_receivewal starting position

From
Michael Paquier
Date:
On Sat, Oct 23, 2021 at 10:46:30PM +0530, Bharath Rupireddy wrote:
> 1) It's better to initialize nulls with false, so we can avoid setting
> them to true. The columns that are not null are going to outnumber the
> columns with null values, so we could avoid some of the
> nulls[i] = false; instructions.

I don't think that this is an improvement in this case as in the
default case we'd return a tuple full of NULL values if the slot does
not exist, so the existing code is simpler when we don't look at the
slot contents.

> 2) As I previously mentioned, we are not copying the slot contents
> while holding the spinlock, it's just we are taking the memory address
> and releasing the lock, so there is a chance that the memory we are
> looking at can become unavailable or stale while we access
> slot_contents. So, I suggest we do the memcpy of the *slot to
> slot_contents. I'm sure the memcpy ing the entire ReplicationSlot
> contents will be costlier, so let's just take the info that we need
> (data.database, data.restart_lsn) into local variables while we hold
> the spin lock

The style of the patch is consistent with what we do in other areas
(see pg_get_replication_slots as one example).

> + /* Copy slot contents while holding spinlock */
> + SpinLockAcquire(&slot->mutex);
> + slot_contents = *slot;

And what this does is to copy the contents of the slot into a local
area (note that we use a NameData pointing to an area with
NAMEDATALEN).  Aka if the contents of *slot are changed by whatever
reason (this cannot change as of the LWLock acquired), what we have
saved is unchanged as of this command's context.

> 3) The data that the new command returns to the client can actually
> become stale while it is captured and in transit to the client as we
> release the spinlock and other backends can drop or alter the info.
> So, it's better we talk about this in the documentation of the new
> command and also in the comments saying "clients will have to deal
> with it."

The same can be said with IDENTIFY_SYSTEM when the flushed location
becomes irrelevant.  I am not sure that this has any need to apply
here.  We could add that this is useful to get a streaming start
position though.

> 4) How about we be more descriptive about the error added? This will
> help identify for which replication slot the command has failed from
> tons of server logs which really help in debugging and analysis.
> I suggest we have this:
> errmsg("cannot use \"%s\" command with logical replication slot
> \"%s\"", "READ_REPLICATION_SLOT", cmd->slotname);
> instead of just a plain, non-informative, generic message:
> errmsg("cannot use \"%s\" with logical replication slots",

Yeah.  I don't mind adding the slot name in this string.
--
Michael

Attachment

Re: pg_receivewal starting position

From
Bharath Rupireddy
Date:
On Sun, Oct 24, 2021 at 4:40 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Sat, Oct 23, 2021 at 10:46:30PM +0530, Bharath Rupireddy wrote:
> > 2) As I previously mentioned, we are not copying the slot contents
> > while holding the spinlock, it's just we are taking the memory address
> > and releasing the lock, so there is a chance that the memory we are
> > looking at can become unavailable or stale while we access
> > slot_contents. So, I suggest we do the memcpy of the *slot to
> > slot_contents. I'm sure the memcpy ing the entire ReplicationSlot
> > contents will be costlier, so let's just take the info that we need
> > (data.database, data.restart_lsn) into local variables while we hold
> > the spin lock
>
> The style of the patch is consistent with what we do in other areas
> (see pg_get_replication_slots as one example).
>
> > + /* Copy slot contents while holding spinlock */
> > + SpinLockAcquire(&slot->mutex);
> > + slot_contents = *slot;
>
> And what this does is to copy the contents of the slot into a local
> area (note that we use a NameData pointing to an area with
> NAMEDATALEN).  Aka if the contents of *slot are changed by whatever
> reason (this cannot change as of the LWLock acquired), what we have
> saved is unchanged as of this command's context.

pg_get_replication_slots holds the ReplicationSlotControlLock until
the end of the function so it can be assured that *slot contents will
not change. In ReadReplicationSlot, the ReplicationSlotControlLock is
released immediately after taking *slot pointer into slot_contents.
Isn't it better if we hold the lock until the end of the function so
that we can avoid the slot contents becoming stale problems?

Having said that, the ReplicationSlotCreate,
SearchNamedReplicationSlot (when need_lock is true) etc. release the
lock immediately. I'm not sure if I should ignore this staleness
problem in ReadReplicationSlot.

> > 3) The data that the new command returns to the client can actually
> > become stale while it is captured and in transit to the client as we
> > release the spinlock and other backends can drop or alter the info.
> > So, it's better we talk about this in the documentation of the new
> > command and also in the comments saying "clients will have to deal
> > with it."
>
> The same can be said with IDENTIFY_SYSTEM when the flushed location
> becomes irrelevant.  I am not sure that this has any need to apply
> here.  We could add that this is useful to get a streaming start
> position though.

I think that's okay, let's not make any changes or add any comments in
regards to the above. The client is basically bound to get the
snapshot of the data at the time it requests the database.

> > 4) How about we be more descriptive about the error added? This will
> > help identify for which replication slot the command has failed from
> > tons of server logs which really help in debugging and analysis.
> > I suggest we have this:
> > errmsg("cannot use \"%s\" command with logical replication slot
> > \"%s\"", "READ_REPLICATION_SLOT", cmd->slotname);
> > instead of just a plain, non-informative, generic message:
> > errmsg("cannot use \"%s\" with logical replication slots",
>
> Yeah.  I don't mind adding the slot name in this string.

Thanks.

Regards,
Bharath Rupireddy.



Re: pg_receivewal starting position

From
Michael Paquier
Date:
On Sun, Oct 24, 2021 at 09:08:01AM +0530, Bharath Rupireddy wrote:
> pg_get_replication_slots holds the ReplicationSlotControlLock until
> the end of the function so it can be assured that *slot contents will
> not change. In ReadReplicationSlot, the ReplicationSlotControlLock is
> released immediately after taking *slot pointer into slot_contents.
> Isn't it better if we hold the lock until the end of the function so
> that we can avoid the slot contents becoming stale problems?

The reason is different in the case of pg_get_replication_slots().  We
have to hold ReplicationSlotControlLock for the whole duration of the
shared memory scan to return a consistent set of
information to the user, for all the slots.
--
Michael

Attachment

Re: pg_receivewal starting position

From
Bharath Rupireddy
Date:
On Sun, Oct 24, 2021 at 12:21 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sun, Oct 24, 2021 at 09:08:01AM +0530, Bharath Rupireddy wrote:
> > pg_get_replication_slots holds the ReplicationSlotControlLock until
> > the end of the function so it can be assured that *slot contents will
> > not change. In ReadReplicationSlot, the ReplicationSlotControlLock is
> > released immediately after taking *slot pointer into slot_contents.
> > Isn't it better if we hold the lock until the end of the function so
> > that we can avoid the slot contents becoming stale problems?
>
> The reason is different in the case of pg_get_replication_slots().  We
> have to hold ReplicationSlotControlLock for the whole duration of the
> shared memory scan to return back to the user a consistent set of
> information to the user, for all the slots.

Thanks. I've no further comments on the v10 patch.

Regards,
Bharath Rupireddy.



Re: pg_receivewal starting position

From
Michael Paquier
Date:
On Sun, Oct 24, 2021 at 07:20:57PM +0530, Bharath Rupireddy wrote:
> Thanks. I've no further comments on the v10 patch.

Okay, thanks.  I have applied this part, then.
--
Michael

Attachment

Re: pg_receivewal starting position

From
Michael Paquier
Date:
On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote:
> Done. I haven't touched the timeline switch test patch for now, but I still
> include it here for completeness.

As 0001 and 0002 have been applied, I have put my hands on 0003, and
found a couple of issues upon review.

+       Assert(slot_name != NULL);
+       Assert(restart_lsn != NULL);
There is no need for those asserts, as we should support the case
where the caller gives NULL for those variables.

+       if (PQserverVersion(conn) < 150000)
+               return false;
Returning false is incorrect for older server versions, as we won't
fall back to the old method when streaming from an older server.  What
this needs to do is return true and set restart_lsn to
InvalidXLogRecPtr, so that pg_receivewal just streams from the
current flush location.  "false" should only be returned on error,
with pg_log_error().
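
In other words, something like this (rough sketch of the convention, not
the exact code; "query" is assumed to already hold the
READ_REPLICATION_SLOT command for slot_name):

    if (PQserverVersion(conn) < 150000)
    {
        /* not an error: the caller falls back to the flush location */
        if (restart_lsn)
            *restart_lsn = InvalidXLogRecPtr;
        return true;
    }

    res = PQexec(conn, query);
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        /* a real error: report it and make the caller stop */
        pg_log_error("could not read replication slot \"%s\": %s",
                     slot_name, PQerrorMessage(conn));
        PQclear(res);
        return false;
    }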

+$primary->psql('postgres',
+       'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
There is no need to switch twice to a new WAL segment as we just need
to be sure that the WAL segment of the restart_lsn is the one
archived.  Note that RESERVE_WAL uses the last redo point, so it is
better to use a checkpoint and reduce the number of logs we stream
into the new location.

Better to add some --no-sync to the new commands of pg_receivewal, to
not stress the I/O more than necessary.  I have added some extra -n
while on it to avoid loops on failure.

Attached is the updated patch I am finishing with, which is rather
clean now.  I have tweaked a couple of things while on it, and
documented better the new GetSlotInformation() in streamutil.c.
--
Michael

Attachment

Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
Le lundi 25 octobre 2021, 08:51:23 CEST Michael Paquier a écrit :
> On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote:
> > Done. I haven't touched the timeline switch test patch for now, but I
> > still
> > include it here for completeness.
>
> As 0001 and 0002 have been applied, I have put my hands on 0003, and
> found a couple of issues upon review.
>
> +       Assert(slot_name != NULL);
> +       Assert(restart_lsn != NULL);
> There is no need for those asserts, as we should support the case
> where the caller gives NULL for those variables.

Does it make sense, though? The NULL slot_name case is pretty
straightforward, as it will be handled by string formatting, but in the case
of a NULL restart_lsn we have no way of knowing whether the command was
issued at all.

>
> +       if (PQserverVersion(conn) < 150000)
> +               return false;
> Returning false is incorrect for older server versions as we won't
> fallback to the old method when streaming from older server.  What
> this needs to do is return true and set restart_lsn to
> InvalidXLogRecPtr, so as pg_receivewal would just stream from the
> current flush location.  "false" should just be returned on error,
> with pg_log_error().

Thank you, this was an oversight when moving from the more complicated error
handling code.

>
> +$primary->psql('postgres',
> +       'INSERT INTO test_table VALUES (generate_series(1,100));');
> +$primary->psql('postgres', 'SELECT pg_switch_wal();');
> +$nextlsn =
> +  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
> +chomp($nextlsn);
> There is no need to switch twice to a new WAL segment as we just need
> to be sure that the WAL segment of the restart_lsn is the one
> archived.  Note that RESERVE_WAL uses the last redo point, so it is
> better to use a checkpoint and reduce the number of logs we stream
> into the new location.
>
> Better to add some --no-sync to the new commands of pg_receivewal, to
> not stress the I/O more than necessary.  I have added some extra -n
> while on it to avoid loops on failure.
>
> Attached is the updated patch I am finishing with, which is rather
> clean now.  I have tweaked a couple of things while on it, and
> documented better the new GetSlotInformation() in streamutil.c.
> --
> Michael


--
Ronan Dunklau





Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
Le lundi 25 octobre 2021, 00:43:11 CEST Michael Paquier a écrit :
> On Sun, Oct 24, 2021 at 07:20:57PM +0530, Bharath Rupireddy wrote:
> > Thanks. I've no further comments on the v10 patch.
>
> Okay, thanks.  I have applied this part, then.
> --
> Michael

Thank you all for your work on this patch.

--
Ronan Dunklau





Re: pg_receivewal starting position

From
Michael Paquier
Date:
On Mon, Oct 25, 2021 at 09:15:32AM +0200, Ronan Dunklau wrote:
> Does it make sense though ? The NULL slot_name case handling is pretty
> straight forward has it will be handled by string formatting, but in the case
> of a null restart_lsn, we have no way of knowing if the command was issued at
> all.

If I am following your point, I don't think that it matters much here,
and it seems useful to me to be able to pass NULL for both of them, so
that one can check whether the slot exists or not with an API designed
this way.
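
For instance, a hypothetical caller (not something in the patch) could do
the following, assuming the routine reports a missing slot through its
return value:

    /* probe for the slot, ignoring its restart_lsn and timeline */
    if (PQserverVersion(conn) >= 150000)
    {
        if (GetSlotInformation(conn, slot_name, NULL, NULL))
            pg_log_info("replication slot \"%s\" is readable", slot_name);
        else
            pg_log_info("replication slot \"%s\" could not be read", slot_name);
    }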
--
Michael

Attachment

Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
Le lundi 25 octobre 2021, 09:40:10 CEST Michael Paquier a écrit :
> On Mon, Oct 25, 2021 at 09:15:32AM +0200, Ronan Dunklau wrote:
> > Does it make sense though ? The NULL slot_name case handling is pretty
> > straight forward has it will be handled by string formatting, but in the
> > case of a null restart_lsn, we have no way of knowing if the command was
> > issued at all.
>
> If I am following your point, I don't think that it matters much here,
> and it seems useful to me to be able to pass NULL for both of them, so
> as one can check if the slot exists or not with an API designed this
> way.

You're right, but I'm afraid we would have to check the server version twice
in any case other than the basic pg_receivewal one (once in the function
itself, and once before calling it if we want a meaningful result). Maybe we
should move the version check outside the GetSlotInformation function to avoid
this, and let it fail with a syntax error when the server doesn't support it?

--
Ronan Dunklau





Re: pg_receivewal starting position

From
Michael Paquier
Date:
On Mon, Oct 25, 2021 at 09:50:01AM +0200, Ronan Dunklau wrote:
> Le lundi 25 octobre 2021, 09:40:10 CEST Michael Paquier a écrit :
>> On Mon, Oct 25, 2021 at 09:15:32AM +0200, Ronan Dunklau wrote:
>>> Does it make sense though ? The NULL slot_name case handling is pretty
>>> straight forward has it will be handled by string formatting, but in the
>>> case of a null restart_lsn, we have no way of knowing if the command was
>>> issued at all.
>>
>> If I am following your point, I don't think that it matters much here,
>> and it seems useful to me to be able to pass NULL for both of them, so
>> as one can check if the slot exists or not with an API designed this
>> way.
>
> You're right, but I'm afraid we would have to check the server version twice
> in any case different from the basic pg_receivewal on (once in the function
> itself, and one before calling it if we want a meaningful result). Maybe we
> should move the version check outside the GetSlotInformation function to avoid
> this, and let it fail with a syntax error when the server doesn't support it ?

With the approach taken by the patch, we fall back silently to the
previous behavior if we connect to a server <= 14, and rely on the new
behavior with a server >= 15, ensuring compatibility.  Why would you
want to make sure that the command is executed when we should just
enforce that the old behavior is what happens when there is a slot
defined and a backend <= 14?
--
Michael

Attachment

Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
Le lundi 25 octobre 2021, 10:10:13 CEST Michael Paquier a écrit :
> On Mon, Oct 25, 2021 at 09:50:01AM +0200, Ronan Dunklau wrote:
> > Le lundi 25 octobre 2021, 09:40:10 CEST Michael Paquier a écrit :
> >> On Mon, Oct 25, 2021 at 09:15:32AM +0200, Ronan Dunklau wrote:
> >>> Does it make sense though ? The NULL slot_name case handling is pretty
> >>> straight forward has it will be handled by string formatting, but in the
> >>> case of a null restart_lsn, we have no way of knowing if the command was
> >>> issued at all.
> >>
> >> If I am following your point, I don't think that it matters much here,
> >> and it seems useful to me to be able to pass NULL for both of them, so
> >> as one can check if the slot exists or not with an API designed this
> >> way.
> >
> > You're right, but I'm afraid we would have to check the server version
> > twice in any case different from the basic pg_receivewal on (once in the
> > function itself, and one before calling it if we want a meaningful
> > result). Maybe we should move the version check outside the
> > GetSlotInformation function to avoid this, and let it fail with a syntax
> > error when the server doesn't support it ?
> With the approach taken by the patch, we fall down silently to the
> previous behavior if we connect to a server <= 14, and rely on the new
> behavior with a server >= 15, ensuring compatibility.  Why would you
> want to make sure that the command is executed when we should just
> enforce that the old behavior is what happens when there is a slot
> defined and a backend <= 14?

Sorry I haven't been clear. For the use case of this patch, the current
approach is perfect (and we supply the restart_lsn).

However, if we want to support the case of "just check if the slot exists", we
need to make sure the command is actually executed, and check the version
before calling the function, which means the check would be executed twice.

What I'm proposing is just that we leave the responsibility of checking
PQserverVersion() to the caller, and remove it from GetSlotInformation, i.e.:

-               if (replication_slot != NULL)
+               if (replication_slot != NULL && PQserverVersion(conn) >= 150000)
                {
                        if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
                                                &stream.timeline))

That way, if we introduce a caller wanting to use this function as an API to
check that a slot exists, the convention of checking the server version
beforehand will be consistent.



--
Ronan Dunklau





Re: pg_receivewal starting position

From
Michael Paquier
Date:
On Mon, Oct 25, 2021 at 10:24:46AM +0200, Ronan Dunklau wrote:
> However, if we want to support the case of "just check if the slot exists", we
> need to make sure the command is actually executed, and check the version
> before calling the function, which would make the check executed twice.
>
> What I'm proposing is just that we let the responsibility of checking
> PQServerVersion() to the caller, and remove it from GetSlotInformation, ie:
>
> -               if (replication_slot != NULL)
> +               if (replication_slot != NULL && PQserverVersion(conn) >= 150000)
>                 {
>                         if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
>                                                 &stream.timeline))
>
> That way, if we introduce a caller wanting to use this function as an API to
> check a slot exists, the usage of checking the server version beforehand will
> be consistent.

Ah, good point.  My apologies for not following.  Indeed, the patch
makes this part of the routine a bit blurry.  It is fine by me to do
as you suggest, and let the caller do the version check as you
propose, while making the routine fail if directly called for an older
server.
--
Michael

Attachment

Re: pg_receivewal starting position

From
Bharath Rupireddy
Date:
On Mon, Oct 25, 2021 at 12:21 PM Michael Paquier <michael@paquier.xyz> wrote:
> Attached is the updated patch I am finishing with, which is rather
> clean now.  I have tweaked a couple of things while on it, and
> documented better the new GetSlotInformation() in streamutil.c.

Thanks for the v11 patch, here are some comments:

1) Remove the extra whitespace in between "t" and ":"
+ pg_log_error("could not read replication slot : %s",

2) I think we should tweak the below error message
+ pg_log_error("could not read replication slot \"%s\": %s",
                                slot_name, PQerrorMessage(conn));
to
        pg_log_error("could not read replication slot \"%s\": %s",
                     "IDENTIFY_SYSTEM", PQerrorMessage(conn));
Having the slot name in the error message helps to isolate the error
message from the tons of server logs that get generated.

3) Also, for the same reasons stated above, change the below error message
pg_log_error("could not read replication slot: got %d rows and %d
fields, expected %d rows and %d or more fields",
to
pg_log_error("could not read replication slot  \"%s\": got %d rows and
%d fields, expected %d rows and %d or more fields", slot_name,....

4) Also for the same reasons, change below
+ pg_log_error("could not parse slot's restart_lsn \"%s\"",
to
pg_log_error("could not parse replicaton slot  \"%s\" restart_lsn \"%s\"",
                         slot_name, PQgetvalue(res, 0, 1));

5) I think we should also have an assertion for the timeline ID:
    Assert(stream.startpos != InvalidXLogRecPtr);
    Assert(stream.timeline != 0);

6) Why do we need these two assignments?
+ if (*restart_lsn)
+ *restart_lsn = lsn_loc;
+ if (restart_tli != NULL)
+ *restart_tli = tli_loc;

+ /* Assign results if requested */
+ if (restart_lsn)
+ *restart_lsn = lsn_loc;
+ if (restart_tli)
+ *restart_tli = tli_loc;

I think we can just get rid of lsn_loc and tli_loc, initialize
*restart_lsn = InvalidXLogRecPtr and *restart_tli = 0 at the start of
the function, and directly assign the requested values to *restart_lsn
and *restart_tli; also see comment (8).

7) Let's be consistent, change the following
+
+ if (*restart_lsn)
+ *restart_lsn = lsn_loc;
+ if (restart_tli != NULL)
+ *restart_tli = tli_loc;
to
+
+ if (restart_lsn)
+ *restart_lsn = lsn_loc;
+ if (restart_tli != NULL)
+ *restart_tli = tli_loc;

8) Let's extract only the values asked for by the caller; change:
+ /* restart LSN */
+ if (!PQgetisnull(res, 0, 1))

+ /* current TLI */
+ if (!PQgetisnull(res, 0, 2))
+ tli_loc = (TimeLineID) atol(PQgetvalue(res, 0, 2));

to

+ /* restart LSN */
+ if (restart_lsn && !PQgetisnull(res, 0, 1))

+ /* current TLI */
+ if (restart_tli && !PQgetisnull(res, 0, 2))

9) 80char limit crossed:
+GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr
*restart_lsn, TimeLineID *restart_tli)

10) Missing word "command", and use "issued to the server", so change the below:
+      <command>READ_REPLICATION_SLOT</command> is issued to retrieve the
to
+      <command>READ_REPLICATION_SLOT</command> command is issued to
the server to retrieve the

11) Will replication_slot ever be NULL? If it is ever NULL, then we
don't reach this far, right? We would already have seen the pg_log_error("%s needs a slot
to be specified using --slot"). Please remove the if condition below:
+ * server may not support this option.
+ */
+ if (replication_slot != NULL)

We can just add Assert(slot_name); in GetSlotInformation().

12) How about the following:
"If a starting point cannot be calculated with the previous method, a
<command>READ_REPLICATION_SLOT</command> command with the provided
slot is issued to the server to retrieve the slot's restart_lsn and
timeline ID"
instead of
+      and if a replication slot is used, an extra
+      <command>READ_REPLICATION_SLOT</command> is issued to retrieve the
+      slot's <literal>restart_lsn</literal> to use as starting point.

The IDENTIFY_SYSTEM description starts with "If a starting point
cannot be calculated....":
    <listitem>
     <para>
      If a starting point cannot be calculated with the previous method,
      the latest WAL flush location is used as reported by the server from
      a <literal>IDENTIFY_SYSTEM</literal> command.

Regards,
Bharath Rupireddy.



Re: pg_receivewal starting position

From
Michael Paquier
Date:
On Mon, Oct 25, 2021 at 02:40:05PM +0530, Bharath Rupireddy wrote:
> 2) I think we should tweak the below error message
> to
>         pg_log_error("could not read replication slot \"%s\": %s",
>                      "IDENTIFY_SYSTEM", PQerrorMessage(conn));
> Having slot name in the error message helps to isolate the error
> message from tons of server logs that gets generated.

Yes, this suggestion makes sense.

> 3) Also for the same reasons stated as above, change the below error message
> pg_log_error("could not read replication slot: got %d rows and %d
> fields, expected %d rows and %d or more fields",
> to
> pg_log_error("could not read replication slot  \"%s\": got %d rows and
> %d fields, expected %d rows and %d or more fields", slot_name,....

We can even get rid of "or more" to match the condition used.

> 4) Also for the same reasons, change below
> + pg_log_error("could not parse slot's restart_lsn \"%s\"",
> to
> pg_log_error("could not parse replicaton slot  \"%s\" restart_lsn \"%s\"",
>                          slot_name, PQgetvalue(res, 0, 1));

Appending the slot name makes sense.

> 5) I think we should also have assertion for the timeline id:
>     Assert(stream.startpos != InvalidXLogRecPtr);
>     Assert(stream.timeline!= 0);

Okay.

> 6) Why do we need these two assignements?
> I think we can just get rid of lsn_loc and tli_loc, initlaize
> *restart_lsn = InvalidXLogRecPtr and *restart_tli = 0 at the start of
> the function and directly assign the requrested values to *restart_lsn
> and *restart_tli, also see comment (8).

FWIW, I find the style of the patch easier to follow.

> 9) 80char limit crossed:
> +GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr
> *restart_lsn, TimeLineID *restart_tli)

pgindent says nothing.

> 10) Missing word "command", and use "issued to the server", so change the below:
> +      <command>READ_REPLICATION_SLOT</command> is issued to retrieve the
> to
> +      <command>READ_REPLICATION_SLOT</command> command is issued to
> the server to retrieve the

Okay.

> 11) Will replication_slot ever be NULL? If it ever be null, then we
> don't reach this far right? We see the pg_log_error("%s needs a slot
> to be specified using --slot". Please revmove below if condition:
> + * server may not support this option.

Did you notice that this applies when creating or dropping a slot, for
code paths entirely different than what we are dealing with here?
--
Michael

Attachment

Re: pg_receivewal starting position

From
Bharath Rupireddy
Date:
On Mon, Oct 25, 2021 at 4:19 PM Michael Paquier <michael@paquier.xyz> wrote:
> > 6) Why do we need these two assignements?
> > I think we can just get rid of lsn_loc and tli_loc, initlaize
> > *restart_lsn = InvalidXLogRecPtr and *restart_tli = 0 at the start of
> > the function and directly assign the requrested values to *restart_lsn
> > and *restart_tli, also see comment (8).
>
> FWIW, I find the style of the patch easier to follow.

Then, please change if (*restart_lsn) and if (*restart_tli) to if
(restart_lsn) and if (restart_tli), just to be consistent with the
other parts of the patch and the existing code of RunIdentifySystem():
    if (*restart_lsn)
        *restart_lsn = lsn_loc;
    if (restart_tli != NULL)
        *restart_tli = tli_loc;

> > 11) Will replication_slot ever be NULL? If it ever be null, then we
> > don't reach this far right? We see the pg_log_error("%s needs a slot
> > to be specified using --slot". Please revmove below if condition:
> > + * server may not support this option.
>
> Did you notice that this applies when creating or dropping a slot, for
> code paths entirely different than what we are dealing with here?

StreamLog() isn't reached for the create and drop slot cases, see [1]. I
suggest removing replication_slot != NULL and having Assert(slot_name)
in GetSlotInformation():
        /*
         * Try to get the starting point from the slot.  This is supported in
         * PostgreSQL 15 and up.
         */
        if (PQserverVersion(conn) >= 150000)
        {
            if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
                                    &stream.timeline))
            {
                /* Error is logged by GetSlotInformation() */
                return;
            }
        }

Here is another comment on the patch:
Remove the extra new line above the GetSlotInformation() definition:
  return true;
 }

+    -----> REMOVE THIS
+/*
+ * Run READ_REPLICATION_SLOT through a given connection and give back to

Apart from the above, the v12 patch LGTM.

[1]
    /*
     * Drop a replication slot.
     */
    if (do_drop_slot)
    {
        if (verbose)
            pg_log_info("dropping replication slot \"%s\"", replication_slot);

        if (!DropReplicationSlot(conn, replication_slot))
            exit(1);
        exit(0);
    }

    /* Create a replication slot */
    if (do_create_slot)
    {
        if (verbose)
            pg_log_info("creating replication slot \"%s\"", replication_slot);

        if (!CreateReplicationSlot(conn, replication_slot, NULL,
false, true, false,
                                   slot_exists_ok, false))
            exit(1);
        exit(0);
    }

Regards,
Bharath Rupireddy.



Re: pg_receivewal starting position

From
Michael Paquier
Date:
On Mon, Oct 25, 2021 at 05:46:57PM +0530, Bharath Rupireddy wrote:
> StreamLog() isn't reached for create and drop slot cases, see [1]. I
> suggest to remove replication_slot != NULL and have Assert(slot_name)
> in GetSlotInformation():
>         /*
>          * Try to get the starting point from the slot.  This is supported in
>          * PostgreSQL 15 and up.
>          */
>         if (PQserverVersion(conn) >= 150000)
>         {
>             if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
>                                     &stream.timeline))
>             {
>                 /* Error is logged by GetSlotInformation() */
>                 return;
>             }
>         }

Please note that it is possible to use pg_receivewal without a slot,
which is the default case, so we cannot do what you are suggesting
here.  An assertion on slot_name in GetSlotInformation() is not that
helpful either in my opinion, as we would just crash a couple of lines
down the road.

I have changed the patch per Ronan's suggestion to have the version
check out of GetSlotInformation(), addressed what you have reported,
and the result looked good.  So I have applied this part.
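
For the archives, the starting-point logic of StreamLog() now goes roughly
like this (paraphrased sketch, not the exact committed code):

    /* 1. resume from the latest WAL file found in the archive directory */
    stream.startpos = FindStreamingStart(&stream.timeline);

    if (stream.startpos == InvalidXLogRecPtr &&
        replication_slot != NULL &&
        PQserverVersion(conn) >= 150000)
    {
        /* 2. else use the slot's restart_lsn, via READ_REPLICATION_SLOT */
        if (!GetSlotInformation(conn, replication_slot,
                                &stream.startpos, &stream.timeline))
            return;             /* error already logged */
    }

    if (stream.startpos == InvalidXLogRecPtr)
    {
        /* 3. else use the current flush location, via IDENTIFY_SYSTEM */
        if (!RunIdentifySystem(conn, NULL, &stream.timeline,
                               &stream.startpos, NULL))
            exit(1);
    }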

What remains on this thread is the addition of new tests to make sure
that pg_receivewal is able to follow a timeline switch.  Now that we
can restart from a slot, that should be a bit easier to implement as
a test by creating a slot on a standby.  Ronan, are you planning to
send a new patch for this part?
--
Michael

Attachment

Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
Le mardi 26 octobre 2021, 03:15:40 CEST Michael Paquier a écrit :
> I have changed the patch per Ronan's suggestion to have the version
> check out of GetSlotInformation(), addressed what you have reported,
> and the result looked good.  So I have applied this part.

Thanks!
>
> What remains on this thread is the addition of new tests to make sure
> that pg_receivewal is able to follow a timeline switch.  Now that we
> can restart from a slot that should be a bit easier to implemented as
> a test by creating a slot on a standby.  Ronan, are you planning to
> send a new patch for this part?

Yes, I will try to simplify the logic of the patch I sent last week. I'll keep
you posted here soon.


--
Ronan Dunklau





Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
Le mardi 26 octobre 2021, 08:27:47 CEST Ronan Dunklau a écrit :
> Yes, I will try to simplify the logic of the patch I sent last week. I'll
> keep you posted here soon.

I was able to simplify it quite a bit, by using only one standby for both test
scenarios.

This test case verifies that after a timeline switch, if we resume from a
previous state, we will archive:
 - segments from the old timeline
 - segments from the new timeline
 - the timeline history file itself.

I chose to check against a full segment from the previous timeline, but it
would have been possible to check that the latest timeline segment was
partial. I chose not to, in the unlikely event that we promote at an exact
segment boundary. I don't think it matters much, since partial WAL files are
already covered by other tests.

--
Ronan Dunklau
Attachment

Re: pg_receivewal starting position

From
Kyotaro Horiguchi
Date:
At Tue, 26 Oct 2021 11:01:46 +0200, Ronan Dunklau <ronan.dunklau@aiven.io> wrote in
> Le mardi 26 octobre 2021, 08:27:47 CEST Ronan Dunklau a écrit :
> > Yes, I will try to simplify the logic of the patch I sent last week. I'll
> > keep you posted here soon.
>
> I was able to simplify it quite a bit, by using only one standby for both test
> scenarios.
>
> This test case verify that after a timeline switch, if we resume from a
> previous state we will archive:
>  - segments from the old timeline
>  - segments from the new timeline
>  - the timeline history file itself.
>
> I chose to check against a full segment from the previous timeline, but it
> would have been possible to check that the latest timeline segment was
> partial. I chose not not, in the unlikely event we promote at an exact segment
> boundary. I don't think it matters much, since partial wal files are already
> covered by other tests.

+my @walfiles = glob "$slot_dir/*";

This is not used.

Each pg_receivewal run stalls for about 10 or more seconds before
finishing, which is not great from the standpoint of the recently
increasing test run time.

Maybe we want to advance the LSN a bit after taking $nextlsn, and then pass
"-s 1" to pg_receivewal.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_receivewal starting position

From
Kyotaro Horiguchi
Date:
At Wed, 27 Oct 2021 11:17:28 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> Each pg_receivewal run stalls for about 10 or more seconds before
> finishing, which is not great from the standpoint of recently
> increasing test run time.
> 
> Maybe we want to advance LSN a bit, after taking $nextlsn then pass
> "-s 1" to pg_receivewal.

Hmm. Sorry, my fingers slipped.  We don't need '-s 1' for this purpose.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
Le mercredi 27 octobre 2021, 04:17:28 CEST Kyotaro Horiguchi a écrit :
> +my @walfiles = glob "$slot_dir/*";
>
> This is not used.
>

Sorry, fixed in the attached version.

> Each pg_receivewal run stalls for about 10 or more seconds before
> finishing, which is not great from the standpoint of recently
> increasing test run time.

> Maybe we want to advance LSN a bit, after taking $nextlsn then pass
> "-s 1" to pg_receivewal.

I incorrectly assumed it was due to the promotion time, without looking into
it. In fact, you're right: the LSN was not incremented after we fetched the
end LSN, and thus we would wait for quite a while. I fixed that too.

Thank you for the review!

--
Ronan Dunklau
Attachment

Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
Le mercredi 27 octobre 2021, 10:00:40 CEST Ronan Dunklau a écrit :
> Le mercredi 27 octobre 2021, 04:17:28 CEST Kyotaro Horiguchi a écrit :
> > +my @walfiles = glob "$slot_dir/*";
> >
> > This is not used.
>
> Sorry, fixed in attached version.
>
> > Each pg_receivewal run stalls for about 10 or more seconds before
> > finishing, which is not great from the standpoint of recently
> > increasing test run time.
> >
> > Maybe we want to advance LSN a bit, after taking $nextlsn then pass
> > "-s 1" to pg_receivewal.
>
> I incorrectly assumed it was due to the promotion time without looking into
> it. In fact, you're right the LSN was not incremented after we fetched the
> end lsn, and thus we would wait for quite a while. I fixed that too.
>
> Thank you for the review !

Sorry, I sent an intermediate version of the patch; here is the correct one.

--
Ronan Dunklau
Attachment

Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
Le jeudi 28 octobre 2021, 14:31:36 CEST Michael Paquier a écrit :
> On Wed, Oct 27, 2021 at 10:11:00AM +0200, Ronan Dunklau wrote:
> > Sorry I sent an intermediary version of the patch, here is the correct
> > one.
>
> While looking at this patch, I have figured out a simple way to make
> the tests faster without impacting the coverage.  The size of the WAL
> segments archived is a bit of a bottleneck as they need to be zeroed
> by pg_receivewal at creation.  This finishes by being a waste of time,
> even if we don't flush the data.  So I'd like to switch the test so as
> we use a segment size of 1MB, first.
>
> A second thing is that we copy too many segments than really needed
> when using the slot's restart_lsn as starting point as RESERVE_WAL
> would use the current redo location, so it seems to me that a
> checkpoint is in order before the slot creation.  A third thing is
> that generating some extra data after the end LSN we want to use makes
> the test much faster at the end.
>
> With those three methods combined, the test goes down from 11s to 9s
> here.  Attached is a patch I'd like to apply to make the test
> cheaper.

Interesting ideas, thanks. For the record, the time drops from ~4.5s to 3s on
average on my machine.
I think that if we reduce the size of the generate_series batches here, this
should probably be done everywhere. With what we do, though, inserting a
single row should work just as well; I wonder why we insist on inserting a
hundred rows? I updated your patch with that small modification; it also
makes the code less verbose.


>
> I also had a look at your patch.  Here are some comments.
>
> +# Cleanup the previous stream directories to reuse them
> +unlink glob "'${stream_dir}/*'";
> +unlink glob "'${slot_dir}/*'";
> I think that we'd better choose a different location for the
> archives.  Keeping the segments of the previous tests is useful for
> debugging if a previous step of the test failed.

Ok.
>
> +$standby->psql('',
> +  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
> +  replication => 1);
> Here as well we could use a restart point to reduce the number of
> segments archived.

The restart point should be very close, as we don't generate any activity on
the primary between the backup and the slot's creation. I'm not sure adding
the complexity of triggering a checkpoint on the primary and waiting for the
standby to catch up on it would be that useful.
>
> +# Now, try to resume after the promotion, from the folder.
> +$standby->command_ok(
> +   [ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
> +    '--slot', $folder_slot, '--no-sync'],
> +   "Stream some wal after promoting, resuming from the folder's position");
> What is called the "resume-from-folder" test in the patch is IMO
> too costly and brings little extra value, requiring two commands of
> pg_receivewal (one to populate the folder and one to test the TLI jump
> from the previously-populated point) to test basically the same thing
> as when the starting point is taken from the slot.  Except that
> restarting from the slot is twice cheaper.  The point of the
> resume-from-folder case is to make sure that we restart from the point
> of the archive folder rather than the slot's restart_lsn, but your
> test fails this part, in fact, because the first command populating
> the archive folder also uses "--slot $folder_slot", updating the
> slot's restart_lsn before the second pg_receivewal uses this slot
> again.
>
> I think, as a whole, that testing for the case where an archive folder
> is populated (without a slot!), followed by a second command where we
> use a slot that has a restart_lsn older than the archive's location,
> to not be that interesting.  If we include such a test, there is no
> need to include that within the TLI jump part, in my opinion.  So I
> think that we had better drop this part of the patch, and keep only
> the case where we resume from a slot for the TLI jump.

You're right about the test not being that interesting in that case. I thought
it would be worthwhile to test both cases, but resume_from_folder doesn't
actually exercise any code that was not already called before (timeline
computation from segment name).

> The commands of pg_receivewal included in the test had better use -n
> so as there is no infinite loop on failure.

Ok.

--
Ronan Dunklau
Attachment

Re: pg_receivewal starting position

From
Ronan Dunklau
Date:
Le vendredi 29 octobre 2021, 04:27:51 CEST Michael Paquier a écrit :
> On Thu, Oct 28, 2021 at 03:55:12PM +0200, Ronan Dunklau wrote:
> > Interesting ideas, thanks. For the record, the time drops from ~4.5s to 3s
> > on average on my machine.
> > I think if you reduce the size of the generate_series batches, this should
> > probably be reduced everywhere. With what we do though, inserting a single
> > line should work just as well, I wonder why we insist on inserting a
> > hundred lines ? I updated your patch with that small modification, it
> > also makes the code less verbose.
>
> Thanks for the extra numbers.  I have added your suggestions,
> switching the dummy table to use a primary key with different values,
> while on it, as there is an argument that it makes debugging easier,
> and applied the speedup patch.

Thanks!

>
> >> +$standby->psql('',
> >> +  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
> >> +  replication => 1);
> >> Here as well we could use a restart point to reduce the number of
> >> segments archived.
> >
> > The restart point should be very close, as we don't generate any activity
> > on the primary between the backup and the slot's creation. I'm not sure
> > adding the complexity of triggering a checkpoint on the primary and
> > waiting for the standby to catch up on it would be that useful.
>
> Yes, you are right here.  The base backup taken from the primary
> at this point ensures a fresh point.
>
> +# This test is split in two, using the same standby: one test check the
> +# resume-from-folder case, the other the resume-from-slot one.
> This comment needs a refresh, as the resume-from-folder case is no
> more.
>

Done.

> +$standby->psql(
> +  'postgres',
> +  "SELECT pg_promote(wait_seconds => 300)");
> This could be $standby->promote.
>

Oh, didn't know about that.

> +# Switch wal to make sure it is not a partial file but a complete
> segment.
> +$primary->psql('postgres', 'INSERT INTO test_table VALUES (1);');
> +$primary->psql('postgres', 'SELECT pg_switch_wal();');
> +$primary->wait_for_catchup($standby, 'replay', $primary->lsn('write'));
> This INSERT needs a slight change to adapt to the primary key of the
> table.  This one is on me :p

Done.

>
> Anyway, is this first segment switch really necessary?  From the data
> archived by pg_receivewal in the command testing the TLI jump, we
> finish with the following contents (contents generated after fixing
> the three INSERTs):
> 00000001000000000000000B
> 00000001000000000000000C
> 00000002000000000000000D
> 00000002000000000000000E.partial
> 00000002.history
>
> So, even if we don't do the first switch, we'd still have one
> completed segment on the previous timeline, before switching to the
> new timeline and the next segment (pg_receivewal is a bit inconsistent
> with the backend here, by the way, as the first segment on the new
> timeline would map with the last segment of the old timeline, but here
> we have a clean switch as of stop_streaming in pg_receivewal.c).

The first completed segment on the previous timeline comes from the fact that
we stream from the restart point. I removed the switch and instead use the
WAL file name of the replication slot's restart point. This means querying
both the standby (to get the replication slot's restart_lsn) and the primary
(to have access to pg_walfile_name).

We could use a single query on the primary (using the primary's checkpoint LSN
instead), but it feels a bit convoluted just to avoid a query on the standby.


--
Ronan Dunklau
Attachment