Thread: pg_receivewal starting position
Hello,

I've noticed that pg_receivewal's logic for deciding which LSN to start streaming at consists of:
- looking up the latest WAL file in our destination folder, and resuming from there
- if there isn't one, using the current flush location instead.

This behaviour surprised me when using it with a replication slot: I was expecting it to start streaming at the last flushed location from the replication slot instead. If you consider a backup tool which takes pg_receivewal's output and transfers it somewhere else, using the replication slot position would be the easiest way to ensure we don't miss WAL files.

Does that make sense?

I don't know if it should be the default, toggled by a command line flag, or if we should even let the user provide an LSN. I'd be happy to implement any of that if we agree.

--
Ronan Dunklau
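For reference, the current decision boils down to roughly the following (a simplified pseudo-C sketch modeled on pg_receivewal's StreamLog() and FindStreamingStart(), not the actual source; conn and stream stand for the globals used in pg_receivewal.c):

    /* 1. Try to resume from the newest segment in the target directory. */
    stream.startpos = FindStreamingStart(&stream.timeline);

    /* 2. Otherwise, fall back to the server's current flush position. */
    if (stream.startpos == InvalidXLogRecPtr)
    {
        if (!RunIdentifySystem(conn, NULL, &stream.timeline,
                               &stream.startpos, NULL))
            exit(1);
    }
    /* Note: the replication slot's restart_lsn is never consulted. */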
At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau <ronan.dunklau@aiven.io> wrote in
> Hello,
>
> I've noticed that pg_receivewal's logic for deciding which LSN to start streaming at consists of:
> - looking up the latest WAL file in our destination folder, and resuming from there
> - if there isn't one, using the current flush location instead.
>
> This behaviour surprised me when using it with a replication slot: I was expecting it to start streaming at the last flushed location from the replication slot instead. If you consider a backup tool which takes pg_receivewal's output and transfers it somewhere else, using the replication slot position would be the easiest way to ensure we don't miss WAL files.
>
> Does that make sense?
>
> I don't know if it should be the default, toggled by a command line flag, or if we should even let the user provide an LSN.

*I* think it is completely reasonable (or at least convenient, and less astonishing) that pg_receivewal starts from the restart_lsn of the replication slot it is told to use. The tool already decides its clean-start LSN in a somewhat unusual way. And it seems to me that the proposed behavior can be the default when -S is specified.

> I'd be happy to implement any of that if we agree.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On Wednesday 28 July 2021 at 08:22:30 CEST, Kyotaro Horiguchi wrote:
> *I* think it is completely reasonable (or at least convenient, and less astonishing) that pg_receivewal starts from the restart_lsn of the replication slot it is told to use. The tool already decides its clean-start LSN in a somewhat unusual way. And it seems to me that the proposed behavior can be the default when -S is specified.

As of now we can't get a replication slot's restart_lsn over a replication connection, AFAIK.

This implies that the patch could require the user to specify a maintenance-db parameter, which we would use, if provided, to fetch the replication slot info, falling back to the previous behaviour otherwise. I don't really like this approach, as having the behaviour change depending on whether we supply a maintenance-db parameter is error-prone for the user.

Another option would be to add a new replication command (for example ACQUIRE_REPLICATION_SLOT <slot_name>) to set the replication slot as the current one, and return some information about it (restart_lsn at least for a physical slot). I don't see any reason not to make it work for logical replication connections / slots, but it wouldn't be that useful since we can query the database in that case.

Acquiring the replication slot instead of just reading it would make sure that no other process could start replication between the time we read the restart_lsn and the time we issue START_REPLICATION. START_REPLICATION could then check whether we already hold a replication slot, and ensure it is the same one as the one we're trying to use.

From pg_receivewal's point of view, this would amount to the following (see the sketch below):
- check if we currently have WAL in the target directory;
- if we do, proceed as currently done, computing the start LSN and timeline from the last archived WAL;
- if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT, use its restart_lsn as the start LSN if there is one, and don't provide a timeline;
- if we still don't have a start LSN, fall back to using the current server WAL position, as is done today.

What do you think? Which information should we provide about the slot?

--
Ronan Dunklau
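In the same pseudo-C as the sketch earlier in the thread, the proposed flow would look something like this (ACQUIRE_REPLICATION_SLOT is only a proposal at this point, and GetSlotRestartLSN is a hypothetical helper):

    /* 1. Prefer the WAL already present in the target directory. */
    stream.startpos = FindStreamingStart(&stream.timeline);

    /* 2. Otherwise ask the server for the slot's restart_lsn; the
     *    timeline is deliberately left unset here. */
    if (stream.startpos == InvalidXLogRecPtr && replication_slot != NULL)
        GetSlotRestartLSN(conn, replication_slot, &stream.startpos);

    /* 3. Last resort: the server's current position, as today. */
    if (stream.startpos == InvalidXLogRecPtr)
    {
        if (!RunIdentifySystem(conn, NULL, &stream.timeline,
                               &stream.startpos, NULL))
            exit(1);
    }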
At Wed, 28 Jul 2021 12:57:39 +0200, Ronan Dunklau <ronan.dunklau@aiven.io> wrote in
> Another option would be to add a new replication command (for example ACQUIRE_REPLICATION_SLOT <slot_name>) to set the replication slot as the current one, and return some information about it (restart_lsn at least for a physical slot).

I hadn't thought about the details. I had forgotten that ordinary SQL commands are prohibited on a physical replication connection, so we do need a new replication command, but that's not that big a deal.

> I don't see any reason not to make it work for logical replication connections / slots, but it wouldn't be that useful since we can query the database in that case.

Ordinary SQL queries are usable on a logical replication connection, so I'm not sure how a logical replication client would use the command. However, like you, I wouldn't bother restricting the command to physical replication; perhaps the new command should return the slot type, though.

> Acquiring the replication slot instead of just reading it would make sure that no other process could start replication between the time we read the restart_lsn and the time we issue START_REPLICATION. START_REPLICATION could then check whether we already hold a replication slot, and ensure it is the same one as the one we're trying to use.

I'm not sure it's worth adding complexity for such strictness. START_REPLICATION safely fails if someone steals the slot in the meantime. In the first place there's no way to protect a slot from others while we are idle. One possible problem is the case where START_REPLICATION successfully acquires the slot after the new command failed, but that case doesn't seem worse than someone advancing the slot in our absence. So I think READ_REPLICATION_SLOT is sufficient.

> From pg_receivewal's point of view, this would amount to the following:
> - check if we currently have WAL in the target directory;
> - if we do, proceed as currently done, computing the start LSN and timeline from the last archived WAL;
> - if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT, use its restart_lsn as the start LSN if there is one, and don't provide a timeline;
> - if we still don't have a start LSN, fall back to using the current server WAL position, as is done today.

That's pretty much it.

> What do you think? Which information should we provide about the slot?

We need the timeline ID to start with when using restart_lsn. The current timeline can be used in most cases, but there's a case where the LSN is historical.

pg_receivewal doesn't send a replication status report when a segment is finished. So if pg_receivewal stops just after a segment is finished, the slot stays at the beginning of the last segment, and next time it will start from there, creating a duplicate segment.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
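To make that failure mode concrete: a forced status report when a segment is closed would look roughly like this in pg_receivewal's segment-finish path (a sketch assuming receivelog.c's internal sendFeedback() helper; this is not part of the posted patches):

    /* Report the flushed position so that the slot's restart_lsn can
     * advance past the just-completed segment before we exit. */
    if (!sendFeedback(conn, blockpos, feGetCurrentTimestamp(), false))
        return false;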
On Thursday 29 July 2021 at 07:32:37 CEST, Kyotaro Horiguchi wrote:
> I hadn't thought about the details. I had forgotten that ordinary SQL commands are prohibited on a physical replication connection, so we do need a new replication command, but that's not that big a deal.

Thank you for your feedback!

> Ordinary SQL queries are usable on a logical replication connection, so I'm not sure how a logical replication client would use the command. However, like you, I wouldn't bother restricting the command to physical replication; perhaps the new command should return the slot type, though.

Ok, done in the attached patch.

> I'm not sure it's worth adding complexity for such strictness. START_REPLICATION safely fails if someone steals the slot in the meantime. So I think READ_REPLICATION_SLOT is sufficient.

Ok, I implemented it like this. I tried to follow the pg_get_replication_slots approach with regards to preventing concurrent modification while reading the slot.

> We need the timeline ID to start with when using restart_lsn. The current timeline can be used in most cases, but there's a case where the LSN is historical.

Ok, see below.

> pg_receivewal doesn't send a replication status report when a segment is finished. So if pg_receivewal stops just after a segment is finished, the slot stays at the beginning of the last segment, and next time it will start from there, creating a duplicate segment.

I'm not sure I see where the problem is here. If we don't keep the segments in pg_receivewal's target directory, then it is the responsibility of whoever moved them to make sure we don't end up with duplicates, or to handle them gracefully. Even if we forced a feedback message after each finished segment, there could still be a problem if the feedback never made it to the server even though the segment was complete here. It might be interesting to send that feedback anyway.

Please find attached two patches implementing what we've been discussing.

Patch 0001 adds the new READ_REPLICATION_SLOT command. For a given slot it returns the type, restart_lsn, flush_lsn, restart_lsn_timeline and flush_lsn_timeline. The timelines are determined by reading the current timeline history and finding the timeline where we may find the record. I didn't find an explicit test for e.g. IDENTIFY_SYSTEM, so I didn't write one for this new command either, but it is tested indirectly in patch 0002.
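As a concrete illustration of the command from the client side, a libpq caller could consume it like this (a sketch assuming the column order listed above; "myslot" is a placeholder and error handling is omitted):

    PGresult   *res = PQexec(conn, "READ_REPLICATION_SLOT myslot");

    if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
    {
        const char *slot_type   = PQgetvalue(res, 0, 0);  /* "physical" or "logical" */
        const char *restart_lsn = PQgetvalue(res, 0, 1);  /* e.g. "0/3000148" */

        /* remaining columns: flush_lsn, restart_lsn_timeline,
         * flush_lsn_timeline */
    }
    PQclear(res);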
Patch 0002 makes pg_receivewal use that command when a replication slot is in use and the command is available, using the restart_lsn and restart_lsn_timeline as the starting point. It also adds a small test to check that we start back from the previous restart_lsn instead of the current flush position when our destination directory does not contain any WAL file.

I also noticed we don't test following a timeline switch. It would probably be good to add that, both for the case where we determine the previous timeline from the archived segments and for the case where it comes from the new command. What do you think?

Regards,

--
Ronan Dunklau
On Thursday 29 July 2021 at 11:09:40 CEST, Ronan Dunklau wrote:
> Patch 0001 adds the new READ_REPLICATION_SLOT command. For a given slot it returns the type, restart_lsn, flush_lsn, restart_lsn_timeline and flush_lsn_timeline.
>
> Patch 0002 makes pg_receivewal use that command when a replication slot is in use and the command is available, using the restart_lsn and restart_lsn_timeline as the starting point.

Following the discussion at [1], I refactored the implementation into streamutil and added a third patch making use of it in pg_basebackup itself, in order to fail early if the replication slot doesn't exist. Please find attached v2 with that.

Best regards,

[1]: https://www.postgresql.org/message-id/flat/CAD21AoDYmv0yJMQnWtCx_kZGwVZnkQSTQ1re2JNSgM0k37afYQ%40mail.gmail.com

--
Ronan Dunklau
On Thu, Aug 26, 2021 at 5:45 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> order to fail early if the replication slot doesn't exist, so please find
> attached v2 for that.

Thanks for the patches. Here are some comments:

1) While the intent of these patches looks good, I have the following concern with the new replication command READ_REPLICATION_SLOT: what if pg_receivewal exits (because the user issued a SIGINT, or for some other reason) after flushing the received WAL to disk, but before it sends feedback to the walsender, so that the server doesn't get a chance to update the restart_lsn of the replication slot via PhysicalConfirmReceivedLocation? If pg_receivewal is started again, isn't it going to get the previous restart_lsn and receive the last chunk of flushed WAL again?

2) What is the significance of READ_REPLICATION_SLOT for logical replication slots? I read above that somebody suggested restricting the walsender to handling READ_REPLICATION_SLOT for physical replication slots only, so that callers would see a command failure otherwise. But I tend to think it is cleaner to have this command common to both physical and logical replication slots, and let the callers have an Assert(type == 'physical').

3) Isn't it useful to send the active and active_pid info of the replication slot via READ_REPLICATION_SLOT? pg_receivewal could use Assert(active == true && active_pid == getpid()) to ensure that it is the sole owner of the replication slot. Also, is it good to send the wal_status info?

4) I think the messages below should start with a lower case letter, and there are also some typos:
+ pg_log_warning("Could not fetch the replication_slot \"%s\" information "
+ pg_log_warning("Server does not suport fetching the slot's position, "
Something like:
+ pg_log_warning("could not fetch replication slot \"%s\" information, "
+ "resuming from current server position instead", replication_slot);
+ pg_log_warning("server does not support fetching replication slot information, "
+ "resuming from current server position instead");

5) How about emitting the above messages only in the "verbose" case?

6) How about an assertion like below?
+ if (stream.startpos == InvalidXLogRecPtr)
+ {
+ stream.startpos = serverpos;
+ stream.timeline = servertli;
+ }
+
+ Assert(stream.startpos != InvalidXLogRecPtr);

7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?

8) Just an idea: how about we store pg_receivewal's lastFlushPosition in a file before pg_receivewal exits and compare it with the restart_lsn that it receives from the replication slot? If lastFlushPosition == received_restart_lsn, well and good; if not, then something would have happened and we always start at lastFlushPosition.

Regards,
Bharath Rupireddy.
On Friday 27 August 2021 at 05:44:32 CEST, Michael Paquier wrote:
> On Thu, Aug 26, 2021 at 02:14:27PM +0200, Ronan Dunklau wrote:
> > Following the discussion at [1], I refactored the implementation into
> > streamutil and added a third patch making use of it in pg_basebackup
> > itself, in order to fail early if the replication slot doesn't exist, so
> > please find attached v2 for that.
>
> Thanks for the split. That helps a lot.

Thank you very much for the review; please find attached an updated patchset. I've also taken into account some remarks made by Bharath Rupireddy.

> +
> +
> /*
>  * Run IDENTIFY_SYSTEM through a given connection and give back to caller
>
> The patch series has some noise diffs here and there, you may want to
> clean that up for clarity.

Ok, sorry about that.

> + if (slot == NULL || !slot->in_use)
> + {
> + LWLockRelease(ReplicationSlotControlLock);
> +
> + ereport(ERROR,
> + (errcode(ERRCODE_UNDEFINED_OBJECT),
>
> LWLocks are released on ERROR, so no need for LWLockRelease() here.

Following your suggestion of not erroring out on a nonexistent slot, this point is no longer relevant, but thanks for pointing it out anyway.

> + <listitem>
> + <para>
> + Read information about the named replication slot. This is
> useful to determine which WAL location we should be asking the server
> to start streaming at.
>
> A nit. You may want to be more careful with the indentation of the
> documentation. Things are usually limited in width for readability.
> More <literal> markups would be nice for the field names used in the
> descriptions.

Ok.

> + if (slot == NULL || !slot->in_use)
> [...]
> + ereport(ERROR,
> + (errcode(ERRCODE_UNDEFINED_OBJECT),
> + errmsg("replication slot \"%s\" does not exist",
> + cmd->slotname)));
> [...]
> + if (PQntuples(res) == 0)
> + {
> + pg_log_error("replication slot %s does not exist", slot_name);
> + PQclear(0);
> + return false;
>
> So, the backend's ReadReplicationSlot() reports an ERROR if a slot
> does not exist, but pg_basebackup's GetSlotInformation() does the same
> if there are no tuples returned. That's inconsistent. Wouldn't it be
> more intuitive to return a NULL tuple instead if the slot does not
> exist, to be able to distinguish real ERRORs in frontends using this
> interface?

The attached patch returns no tuple at all when the replication slot doesn't exist. I'm not sure if that's what you meant by returning a NULL tuple?

> A slot in use exists, so the error is a bit confusing here anyway, no?

From my understanding, a slot *not* in use doesn't exist anymore, so I don't really understand this point. Could you clarify?

> + * XXX: should we allow the caller to specify which target timeline it
> + * wants?
> + */
>
> What are you thinking about here?

I was thinking that maybe, instead of walking back the timeline history from where we currently are on the server, we could allow an additional argument for the client to specify which timeline it wants. But I guess a replication slot cannot be present for a past, divergent timeline? I have removed that suggestion.

> -# restarts of pg_receivewal will see this segment as full..
> +# restarts of pg_receivewal will see this segment as full../
>
> Typo.

Ok.

> + TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 4, "restart_lsn_timeline",
> + INT4OID, -1, 0);
> + TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 5, "confirmed_flush_lsn_timeline",
> + INT4OID, -1, 0);
>
> I would call these restart_tli and confirmed_flush_tli, without the
> "lsn" part.

Ok.
> The patch for READ_REPLICATION_SLOT could have some tests using a
> connection that has replication=1 in some TAP tests. We do that in
> 001_stream_rep.pl with SHOW, as one example.

Ok. I added the physical part to 001_stream_rep.pl, using the protocol interface directly for creating / dropping the slot, and some tests for logical replication slots to 006_logical_decoding.pl.

> - 'slot0'
> + 'slot0', '-p',
> + "$port"
>
> Something we are missing here?

The thing we're missing here is a wrapper for command_fails_like; I've added this to PostgresNode.pm.

Best regards,

--
Ronan Dunklau
On Saturday 28 August 2021 at 14:10:25 CEST, Bharath Rupireddy wrote:
> Thanks for the patches. Here are some comments:

Thank you for this review! Please see the other side of the thread, where I answered Michael Paquier with a new patchset which includes some of your proposed modifications.

> 1) While the intent of these patches looks good, I have the following concern with the new replication command READ_REPLICATION_SLOT: what if pg_receivewal exits after flushing the received WAL to disk, but before it sends feedback to the walsender? If pg_receivewal is started again, isn't it going to get the previous restart_lsn and receive the last chunk of flushed WAL again?

I've kept the existing directory as the source of truth if we have any WAL there already. If we don't, we fall back to the restart_lsn of the replication slot. So in the event that we start again from somewhere else where we don't have access to those WAL files anymore, we could receive them again, which in my opinion is better than losing everything in between.

> 2) What is the significance of READ_REPLICATION_SLOT for logical replication slots?

I've updated the patch to make it easy for the caller to check the slot's type, and added a verification for those cases. In general, I tried to implement the meaning of the different fields exactly as is done in the pg_replication_slots view.

> 3) Isn't it useful to send the active and active_pid info of the replication slot via READ_REPLICATION_SLOT?

Typically we read the slot before attaching to it, since what we want to do is check whether it exists. It may be worthwhile to check that it's not already used by another backend, though.

> 4) I think the messages below should start with a lower case letter, and there are also some typos.

I've rephrased them a bit in v3; let me know if that's what you had in mind.

> 5) How about emitting the above messages only in the "verbose" case?

I think it is useful to warn the user even in the non-verbose case, but if the consensus is to move them to verbose-only output I can change it.

> 6) How about an assertion like below?
> + if (stream.startpos == InvalidXLogRecPtr)
> + {
> + stream.startpos = serverpos;
> + stream.timeline = servertli;
> + }
> +
> + Assert(stream.startpos != InvalidXLogRecPtr);

Good idea.

> 7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?

From my point of view, I already expected it to use something like that when using a replication slot. Maybe an option to turn it off could be useful?

> 8) Just an idea: how about we store pg_receivewal's lastFlushPosition in a file before pg_receivewal exits and compare it with the restart_lsn that it receives from the replication slot?

The patch idea originally came from the fact that some utilities use pg_receivewal to fetch WAL and move it elsewhere. In that sense, I don't really see what value this brings compared to the existing (and unmodified) way of computing the restart position from the already-stored files.

Best regards,

--
Ronan Dunklau
On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> Thank you for this review! Please see the other side of the thread, where I
> answered Michael Paquier with a new patchset which includes some of your
> proposed modifications.

Thanks for the updated patches. Here are some comments on the v3-0001 patch. I will continue to review 0002 and 0003.

1) Missing full stops "." at the end:
+ <literal>logical</literal>
+ when following the current timeline history
+ position, when following the current timeline history

2) Can we have the "type" column as "slot_type", just to keep in sync with the pg_replication_slots view?

3) Can we mention the link to the pg_replication_slots view for the columns "type", "restart_lsn" and "confirmed_flush_lsn"? Something like: the "slot_type"/"restart_lsn"/"confirmed_flush_lsn" is the same as in the <link linkend="view-pg-replication-slots"><structname>pg_replication_slots</structname></link> view.

4) Can we use "read_replication_slot" instead of "identify_replication_slot", just to be in sync with the actual command?

5) Are you actually copying the slot contents into the slot_contents variable here? Isn't it just taking a pointer into shared memory?
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);

You could just do:
+ Oid dbid;
+ XLogRecPtr restart_lsn;
+ XLogRecPtr confirmed_flush;

+ /* Copy the required slot contents */
+ SpinLockAcquire(&slot->mutex);
+ dbid = slot->data.database;
+ restart_lsn = slot->data.restart_lsn;
+ confirmed_flush = slot->data.confirmed_flush;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);

6) It looks like you are not sending anything as a response to the READ_REPLICATION_SLOT command if the specified slot doesn't exist. You are just calling end_tup_output, which just calls rShutdown (pointing to donothingCleanup of printsimpleDR):
if (has_value)
do_tup_output(tstate, values, nulls);
end_tup_output(tstate);

Can you test the use case where pg_receivewal issues READ_REPLICATION_SLOT with a non-existent replication slot and see how your v3 patch behaves?

Why don't you remove the has_value flag and do this in ReadReplicationSlot:
Datum values[5];
bool nulls[5];
MemSet(values, 0, sizeof(values));
MemSet(nulls, 0, sizeof(nulls));

+ dest = CreateDestReceiver(DestRemoteSimple);
+ tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+ do_tup_output(tstate, values, nulls);
+ end_tup_output(tstate);

7) Why don't we use two separate variables, "restart_tli" and "confirmed_flush_tli", instead of just one "slots_position_timeline", to be more informative?

8) I still have the question: how can a client (pg_receivewal for instance) know that it is the current owner/user of the slot it is requesting the info for? As I said upthread, why don't we send the "active" and "active_pid" fields of the pg_replication_slots view? Also, it would be good to send the "wal_status" field, so that the client can check that the "wal_status" is not "lost".

9) There are two newlines at the end of ReadReplicationSlot. We give only one newline after each function definition.
end_tup_output(tstate);
}
<<1st newline>>
<<2nd newline>>
/*
 * Handle TIMELINE_HISTORY command.
 */

10) Why do we need two test cases for "non-existent" slots? Isn't the test case after DROP_REPLICATION_SLOT enough?
+($ret, $stdout, $stderr) = $node_primary->psql(
+ 'postgres', 'READ_REPLICATION_SLOT non_existent_slot;',
+ extra_params => [ '-d', $connstr_rep ]);
+ok( $ret == 0,
+ "READ_REPLICATION_SLOT does not produce an error with non existent slot");
+ok( $stdout eq '',
+ "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");

You can just rename the test case from:
+ "READ_REPLICATION_SLOT returns no tuple if a slot has been dropped");
to:
+ "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");

Regards,
Bharath Rupireddy.
On Tue, Aug 31, 2021 at 4:47 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> Thanks for the updated patches. Here are some comments on the v3-0001
> patch. I will continue to review 0002 and 0003.

Continuing my review on the v3 patch set:

0002:

1) I think the message "could not fetch replication slot LSN: got %d rows and %d fields, expected %d rows and %d or more fields" should be: "could not read replication slot: got %d rows and %d fields, expected %d rows and %d or more fields"

2) Why is GetSlotInformation returning InvalidXLogRecPtr? Change it to return false instead.

3) Alignment issue: change the two lines below:
+ appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s",
+ slot_name);
to one line, as it stays under the 80-character limit:
+ appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s", slot_name);

4) Shouldn't GetSlotInformation return whatever output columns READ_REPLICATION_SLOT produces, to be more generic? Callers would pass non-null/null pointers for the outputs they do or don't need. Please refer to RunIdentifySystem for how that is done. GetSlotInformation can then read just the fields the callers are interested in:

bool GetSlotInformation(PGconn *conn, const char *slot_name, char **slot_type, XLogRecPtr *restart_lsn, uint32 *restart_lsn_tli, XLogRecPtr *confirmed_lsn, XLogRecPtr *confirmed_lsn_tli)
{
if (slot_type) { /* get the slot_type value from the received tuple */ }
if (restart_lsn) { /* get the restart_lsn value from the received tuple */ }
if (restart_lsn_tli) { /* get the restart_lsn_tli value from the received tuple */ }
if (confirmed_lsn) { /* get the confirmed_lsn value from the received tuple */ }
if (confirmed_lsn_tli) { /* get the confirmed_lsn_tli value from the received tuple */ }
}

5) How about the below as the GetSlotInformation function comment?
/*
 * Run READ_REPLICATION_SLOT through a given connection and give back to caller
 * the following information about the slot, if requested:
 * - type
 * - restart lsn
 * - restart lsn timeline
 * - confirmed lsn
 * - confirmed lsn timeline
 */

6) Do you need +#include "pqexpbuffer.h" in pg_receivewal.c?

7) Missing "," after "information", and "the" is not needed in the messages. Change:
+ pg_log_warning("could not fetch the replication_slot \"%s\" information "
+ "resuming from the current server position instead", replication_slot);
to:
+ pg_log_warning("could not fetch replication_slot \"%s\" information, "
+ "resuming from current server position instead", replication_slot);

8) A typo: "suport". (Ignore this comment if you incorporate review comment #13.) Change:
pg_log_warning("server does not suport fetching the slot's position, "
"resuming from the current server position instead");
to:
pg_log_warning("server does not support getting start LSN from replication slot, "
"resuming from current server position instead");

9) I think you should free the memory allocated to slot_type by GetSlotInformation:
+ if (strcmp(slot_type, "physical") != 0)
+ {
+ pg_log_error("slot \"%s\" is not a physical replication slot",
+ replication_slot);
+ exit(1);
+ }
+
+ pg_free(slot_type);

10) Isn't it PQclear(res);?
+ PQclear(0);

11) I don't think you need to check for a null value of replication_slot. In StreamLog it can't be null, so you can safely remove the if condition below:
+ if (replication_slot)
+ {

12) How about /* Try to get start position from server's replication slot information */ instead of:
+ /* Try to get it from the slot if any, and the server supports it */

13) When you say that the server supports the new READ_REPLICATION_SLOT command only on version 15 or newer, isn't the function GetSlotInformation doing the following:

bool GetSlotInformation(PGconn *conn, ..., bool *is_supported)
{
if (PQserverVersion(conn) < 150000)
{
*is_supported = false;
return false;
}
*is_supported = true;
}

So, the callers will just do:

/* Try to get start position from server's replication slot information */
char *slot_type = NULL;
bool is_supported;
if (!GetSlotInformation(conn, replication_slot, &stream.startpos, &stream.timeline, &slot_type, &is_supported))
{
if (!is_supported)
pg_log_warning("server does not support getting start LSN from replication slot, "
"resuming from current server position instead");
else
pg_log_warning("could not fetch replication_slot \"%s\" information, "
"resuming from current server position instead", replication_slot);
}
if (slot_type && strcmp(slot_type, "physical") != 0)
{
pg_log_error("slot \"%s\" is not a physical replication slot", replication_slot);
exit(1);
}
pg_free(slot_type);

14) Instead of just:
+ if (strcmp(slot_type, "physical") != 0)
do:
+ if (slot_type && strcmp(slot_type, "physical") != 0)

0003:

1) The message should start with a lower case letter: "slot \"%s\" is not a physical replication slot".
+ pg_log_error("Slot \"%s\" is not a physical replication slot",

2) Instead of:
+ if (strcmp(slot_type, "physical") != 0)
do:
+ if (slot_type && strcmp(slot_type, "physical") != 0)

Regards,
Bharath Rupireddy.
On Tuesday 31 August 2021 at 13:17:22 CEST, Bharath Rupireddy wrote:
> Thanks for the updated patches. Here are some comments on the v3-0001
> patch. I will continue to review 0002 and 0003.

Thank you! I will send a new version shortly, once I address your remarks concerning patch 0002 (and hopefully 0003 :-) ).

> 1) Missing full stops "." at the end:
> + <literal>logical</literal>
> + when following the current timeline history
> + position, when following the current timeline history

Good catch, I will take care of it in the next version.

> 2) Can we have the "type" column as "slot_type", just to keep in sync
> with the pg_replication_slots view?

You're right, it makes more sense like this.

> 3) Can we mention the link to the pg_replication_slots view for the
> columns "type", "restart_lsn" and "confirmed_flush_lsn"?

Same as above, thanks.

> 4) Can we use "read_replication_slot" instead of
> "identify_replication_slot", just to be in sync with the actual
> command?

That must have been a leftover from an earlier version of the patch, I will fix it as well.

> 5) Are you actually copying the slot contents into the slot_contents
> variable here? Isn't it just taking a pointer into shared memory?

It's probably simpler that way.

> 6) It looks like you are not sending anything as a response to the
> READ_REPLICATION_SLOT command if the specified slot doesn't exist.

As for the API, I think it's cleaner to just send an empty result set instead of a NULL tuple in that case, but I won't fight over it if there is consensus on having an all-nulls tuple value instead. There is indeed a bug, though not here but in the second patch: I still test the slot type even when we didn't fetch anything. So I will add a test for that too.

> 7) Why don't we use two separate variables, "restart_tli" and
> "confirmed_flush_tli", instead of just one "slots_position_timeline",
> to be more informative?

You're right.

> 8) I still have the question: how can a client (pg_receivewal for
> instance) know that it is the current owner/user of the slot it is
> requesting the info for? As I said upthread, why don't we send the
> "active" and "active_pid" fields of the pg_replication_slots view?
> Also, it would be good to send the "wal_status" field, so that the
> client can check that the "wal_status" is not "lost".

As for pg_receivewal, it can only check that the slot is not active at that time, since we only acquire the replication slot once we know the start LSN. For the pg_basebackup case it's the same thing, as we only want to check that the slot exists. As such, I didn't add those fields because I didn't see the need, but if they can be useful, why not? I will do that in the next version.

> 9) There are two newlines at the end of ReadReplicationSlot. We give
> only one newline after each function definition.

Ok!

> 10) Why do we need two test cases for "non-existent" slots? Isn't the
> test case after DROP_REPLICATION_SLOT enough?

I wanted to test both the case where no slot by this name exists, and the case where it has been dropped and is hence still referenced but marked as not "in_use". Maybe it's not worth it and we can remove the first case as you suggest.

Best regards,

--
Ronan Dunklau
On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> > 7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?
>
> From my point of view, I already expected it to use something like that when
> using a replication slot. Maybe an option to turn it off could be useful?

IMO, pg_receivewal should have a way to turn the use of READ_REPLICATION_SLOT off and on. Imagine the postgres server doesn't support READ_REPLICATION_SLOT (a lower version) but for some reason pg_receivewal (running separately) is upgraded to a version that uses READ_REPLICATION_SLOT; knowing that the server doesn't support READ_REPLICATION_SLOT, why should the user let pg_receivewal run unnecessary code?

Regards,
Bharath Rupireddy.
At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> IMO, pg_receivewal should have a way to turn the use of
> READ_REPLICATION_SLOT off and on. Imagine the postgres server doesn't
> support READ_REPLICATION_SLOT (a lower version) but for some reason
> pg_receivewal (running separately) is upgraded to a version that uses
> READ_REPLICATION_SLOT; knowing that the server doesn't support
> READ_REPLICATION_SLOT, why should the user let pg_receivewal run
> unnecessary code?

If I read the patch correctly, the situation above is warned about with the following message, after which pg_receivewal continues to the next step, giving up on identifying the start position from the slot data:

> "server does not suport fetching the slot's position, resuming from the current server position instead"

(The message looks a bit too long, though.)

However, if the operator doesn't know the server is old, pg_receivewal starts streaming from an unexpected position, which is a kind of disaster. So I'm inclined to agree with Bharath, but rather I imagine an option to explicitly specify how to determine the start position:

--start-source=[server,wal,slot]: specify the starting-LSN source; the default is to try all of them in the order wal, slot, server.

I don't think the option needs to accept multiple values at once.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
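A sketch of how such an option could drive the decision (entirely hypothetical; neither the option nor start_position_from_slot() exists in the posted patches):

    typedef enum { START_SOURCE_AUTO, START_SOURCE_WAL,
                   START_SOURCE_SLOT, START_SOURCE_SERVER } StartSource;

    /* START_SOURCE_AUTO (the default) tries wal, then slot, then server. */
    if (source == START_SOURCE_AUTO || source == START_SOURCE_WAL)
        stream.startpos = FindStreamingStart(&stream.timeline);

    if (stream.startpos == InvalidXLogRecPtr &&
        (source == START_SOURCE_AUTO || source == START_SOURCE_SLOT))
        stream.startpos = start_position_from_slot();  /* READ_REPLICATION_SLOT */

    if (stream.startpos == InvalidXLogRecPtr &&
        (source == START_SOURCE_AUTO || source == START_SOURCE_SERVER))
        RunIdentifySystem(conn, NULL, &stream.timeline, &stream.startpos, NULL);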
On Thu, Sep 2, 2021 at 11:15 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> However, if the operator doesn't know the server is old, pg_receivewal
> starts streaming from an unexpected position, which is a kind of
> disaster. So I'm inclined to agree with Bharath, but rather I imagine an
> option to explicitly specify how to determine the start position:
>
> --start-source=[server,wal,slot]: specify the starting-LSN source; the
> default is to try all of them in the order wal, slot, server.
>
> I don't think the option needs to accept multiple values at once.

If --start-source=wal fails, then pg_receivewal should show an error saying "cannot find start position from <<user-specified-wal>> directory, try with "server" or "slot" for --start-source". We might end up having similar errors for the other options as well. Isn't this going to create unnecessary complexity?

The existing way pg_receivewal fetches the start position, i.e. first from the WAL directory and then from the server start position, isn't known to the user at all: no verbose message, and nothing in the docs. Why do we need to expose this with a --start-source option? IMO, we can keep it that way and just have a way to turn off the new behaviour that we are proposing here, i.e. fetching the start position from the slot's restart_lsn.

Regards,
Bharath Rupireddy.
On Thursday 2 September 2021 at 08:42:22 CEST, Bharath Rupireddy wrote:
> The existing way pg_receivewal fetches the start position, i.e. first from
> the WAL directory and then from the server start position, isn't known to
> the user at all: no verbose message, and nothing in the docs. Why do we
> need to expose this with a --start-source option? IMO, we can keep it that
> way and just have a way to turn off the new behaviour that we are proposing
> here, i.e. fetching the start position from the slot's restart_lsn.

Then it should probably be documented. We write in the docs that it is strongly recommended to use a replication slot, but do not mention how we resume from what has already been processed.

If someone really cares about having control over how the start position is defined, instead of relying on the auto-detection, it would be wiser to add a --startpos parameter similar to the --endpos one, which would override everything else, rather than different flags for different behaviours.

Regards,

--
Ronan Dunklau
On Thursday 2 September 2021 at 09:28:29 CEST, Michael Paquier wrote:
> On Thu, Sep 02, 2021 at 02:45:54PM +0900, Kyotaro Horiguchi wrote:
> > If I read the patch correctly, the situation above is warned about with
> > the following message, after which pg_receivewal continues to the next
> > step, giving up on identifying the start position from the slot data.
>
> Better to fall back to the past behavior when attempting to use a
> pg_receivewal >= 15 with a PG instance older than 15.
>
> > > "server does not suport fetching the slot's position, resuming from the
> > > current server position instead"
> >
> > (The message looks a bit too long, though.)
>
> Agreed. Falling back to a warning is not the best answer we can have
> here, as there could be various failure types, and for some of them a
> hard failure is adapted:
> - Failure in the backend while running READ_REPLICATION_SLOT. This
> should imply a hard failure, no?
> - Slot that does not exist. In this case, we could fall back to the
> current write position of the server by default if the slot information
> cannot be retrieved.
>
> Something that's disturbing me in patch 0002 is that we would ignore
> the results of GetSlotInformation() if any error happens, even if
> there is a problem in the backend, like an OOM. We should be careful
> about the semantics here.

Ok!

> > --start-source=[server,wal,slot]: specify the starting-LSN source; the
> > default is to try all of them in the order wal, slot, server.
>
> What is the difference between "wal" and "server"? "wal" stands for
> the start position of the set of files stored in the target directory,
> and "server" is the location that we'd receive from the server? I don't
> think that we need that because, when using a slot, we know that we can
> rely on the LSN that the slot retains for pg_receivewal, as that should
> be the same point as what was streamed last. Could there be an argument
> instead for changing the default to rely on the slot information rather
> than scanning the local WAL archive path for the start point when using
> --slot? When using pg_receivewal as a service, relying on a scan of the
> WAL archive directory if there is no slot, and falling back to an invalid
> LSN if there is nothing, is fine by me, but I think that just relying on
> the slot information is saner, as the backend makes sure that nothing is
> missing. That's also more useful when streaming changes from a single
> slot to multiple locations (stream to location 1 with a slot, stop
> pg_receivewal, stream to location 2 that completes 1 with the same slot).

One benefit I see in first trying to get the position from the local WAL files is that we may end up in a state where WAL has been flushed to disk but we couldn't advance the replication slot. In that case it is better to resume from the point on disk. Maybe taking max(slot_lsn, local_file_lsn) would work best for the use case you're describing (see the sketch below).

> + pg_log_error("Slot \"%s\" is not a physical replication slot",
> + replication_slot);
>
> In 0003, the format of this error is not really project-like.
> Something like this perhaps would be better adapted:
> "cannot use the slot provided, physical slot expected."
>
> I am not really convinced about the need for getting the active state
> and the PID used in the backend when fetching the slot data,
> particularly if that's just for some frontend-side checks. The
> backend has safeguards already for all that.

I could see the use in sending active_pid for pg_basebackup: at least we could fail early if the slot is already in use. But at the same time, maybe it won't be in use anymore once we need it.

> While looking at that, I have applied de1d4fe to add
> PostgresNode::command_fails_like(), coming from 0003, and put my hands
> on 0001 as per the attached, as the starting point. That basically
> comes down to all the points raised upthread, plus some tweaks for
> things I bumped into to get the semantics of the command to what looks
> like the right shape.

Thanks, I was about to send a new patchset with basically the same thing. It would be nice to know when we are working on the same thing concurrently in the future, to avoid duplicated effort.

I'll rebase and send the updated version of patches 0002 and 0003 of my original proposal once we reach an agreement on the behaviour / options of pg_receivewal. Also, considering the number of different fields to be filled by the GetSlotInformation function, my local branch groups them into a dedicated struct, which is more convenient than having several possibly-NULL arguments.

--
Ronan Dunklau
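Concretely, the "take whichever is further ahead" idea mentioned above would be as simple as the following (variable names are illustrative only; Max() is the usual PostgreSQL macro):

    /* Resume from the locally flushed WAL if it is ahead of the slot's
     * restart_lsn, and from the slot otherwise. */
    stream.startpos = Max(local_file_lsn, slot_restart_lsn);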
On Thursday 2 September 2021 at 10:37:20 CEST, Michael Paquier wrote:
> On Thu, Sep 02, 2021 at 10:08:26AM +0200, Ronan Dunklau wrote:
> > I could see the use in sending active_pid for pg_basebackup: at least
> > we could fail early if the slot is already in use. But at the same
> > time, maybe it won't be in use anymore once we need it.
>
> There is no real concurrency protection with this design. You could
> read that the slot is not active during READ_REPLICATION_SLOT, only to
> find out later, while pg_basebackup is streaming WAL, that it became
> in use in between. And the backend-side protections would kick in at
> that stage.
>
> Hmm. The logic doing the decision-making with pg_receivewal may
> become more tricky when it comes to pg_replication_slots.wal_status,
> max_slot_wal_keep_size and pg_replication_slots.safe_wal_size. The
> number of cases we'd like to consider directly impacts the amount of
> data sent through READ_REPLICATION_SLOT. That's not really different
> from deciding between a failure, a success or a retry with active_pid
> at an earlier or a later stage of a base backup. pg_receivewal, on the
> contrary, can just rely on what the backend tells it when it begins
> streaming. So I'd prefer keeping things simple and limiting the number
> of fields to a minimum for this command.

Ok. Please find attached new patches rebased on master.

0001 is yours without any modification.

0002, for pg_receivewal, tries to simplify the logic of which information to return by using a dedicated struct for it. This accounts for Bharath's request to return every possible field. In particular, an is_logical field simplifies the detection of the type of slot. In my opinion it makes sense to simplify it like this on the client side, while being more open-minded on the server side in case we ever need to provide a new type of slot. Also, GetSlotInformation now returns an enum, to be able to handle the different failure modes, which differ between pg_receivewal and pg_basebackup (see the sketch below).

0003 is the pg_basebackup one; not much has changed except for the concerns you had about the log message and the handling of the different failure modes.

There is still the concern raised by Bharath about letting the user select the way the replication slot information is fetched, and what should or should not be included in the documentation. I am in favor of documenting the process of selecting the WAL start position, and maybe including a --start-lsn option for the user to override it, but that's maybe for another patch.

--
Ronan Dunklau
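For readers without the attachments, the client-side API described above plausibly has the following shape (the authoritative definitions are in the attached v5 patches; the UNSUPPORTED value in particular is a guess):

    typedef struct SlotInformation
    {
        bool        is_logical;
        XLogRecPtr  restart_lsn;
        TimeLineID  restart_tli;
        XLogRecPtr  confirmed_flush_lsn;
        TimeLineID  confirmed_flush_tli;
    } SlotInformation;

    typedef enum
    {
        READ_REPLICATION_SLOT_OK,
        READ_REPLICATION_SLOT_UNSUPPORTED,   /* server older than 15 */
        READ_REPLICATION_SLOT_ERROR,
        READ_REPLICATION_SLOT_NONEXISTENT
    } ReadReplicationSlotStatus;

    extern ReadReplicationSlotStatus GetSlotInformation(PGconn *conn,
                                                        const char *slot_name,
                                                        SlotInformation *slot_info);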
On Fri, Sep 3, 2021 at 3:28 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote: > There is still the concern raised by Bharath about being able to select the > way to fetch the replication slot information for the user, and what should or > should not be included in the documentation. I am in favor of documenting the > process of selecting the WAL start, and maybe include a --start-lsn option for > the user to override it, but that's maybe for another patch. Let's hear from others. Thanks for the patches. I have some quick comments on the v5 patch-set: 0001: 1) Do you also want to MemSet values in ReadReplicationSlot? 2) When an if clause has a single statement we don't generally use "{" and "}" + if (slot == NULL || !slot->in_use) + { + LWLockRelease(ReplicationSlotControlLock); + } you can just have: + if (slot == NULL || !slot->in_use) + LWLockRelease(ReplicationSlotControlLock); 3) This is still not copying the slot contents; as I said earlier, you can just take the required info into some local variables instead of assigning the slot pointer to the slot_contents pointer. + /* Copy slot contents while holding spinlock */ + SpinLockAcquire(&slot->mutex); + slot_contents = *slot; + SpinLockRelease(&slot->mutex); + LWLockRelease(ReplicationSlotControlLock); 4) As I said earlier, why can't we have separate tli variables for more readability instead of just one slots_position_timeline and timeline_history variable? And you are not even resetting those 2 variables after if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)), you might end up sending the restart_lsn timeline ID for confirmed_flush. So, better use two separate variables. In fact you can use block-local variables: + if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)) + { + List *tl_history = NIL; + TimeLineID tli; + tl_history = readTimeLineHistory(ThisTimeLineID); + tli = tliOfPointInHistory(slot_contents.data.restart_lsn, + tl_history); + values[i] = Int32GetDatum(tli); + nulls[i] = false; + } similarly for confirmed_flush. 5) I still don't see the need for the below test case: + "READ_REPLICATION_SLOT does not produce an error with non existent slot"); when we have + "READ_REPLICATION_SLOT does not produce an error with dropped slot"); Because for a user, a dropped or non-existent slot is the same; it's just that for a dropped slot we internally don't delete its entry from the shared memory. 0002: 1) Imagine GetSlotInformation always returns READ_REPLICATION_SLOT_ERROR, don't you think StreamLog enters an infinite loop there? Instead, why don't you just exit(1); instead of return; and retry? Similarly for READ_REPLICATION_SLOT_NONEXISTENT? I think you can just do exit(1), no need to retry. + case READ_REPLICATION_SLOT_ERROR: + + /* + * Error has been logged by GetSlotInformation, return and + * maybe retry + */ + return; 2) Why is it returning READ_REPLICATION_SLOT_OK when slot_info isn't passed? And why are you having this check after you connect to the server and fetch the data? + /* If no slotinformation has been passed, we can return immediately */ + if (slot_info == NULL) + { + PQclear(res); + return READ_REPLICATION_SLOT_OK; + } Instead you can just have a single assert: + Assert(slot_name && slot_info); 3) How about pg_log_error("could not read replication slot: instead of pg_log_error("could not fetch replication slot: 4) Why are you having the READ_REPLICATION_SLOT_OK case in default?
+ default: + if (slot_info.is_logical) + { + /* + * If the slot is not physical we can't expect to + * recover from that + */ + pg_log_error("cannot use the slot provided, physical slot expected."); + exit(1); + } + stream.startpos = slot_info.restart_lsn; + stream.timeline = slot_info.restart_tli; + } You can just have another case statement for READ_REPLICATION_SLOT_OK and in the default you can throw an error "unknown read replication slot status" or some other better wording, and exit(1); 5) Do you want to initialize slot_info to 0? + if (replication_slot) + { + SlotInformation slot_info; + MemSet(slot_info, 0, sizeof(SlotInformation)); 6) This isn't readable: + slot_info->is_logical = strcmp(PQgetvalue(res, 0, 0), "logical") == 0; How about: if (strcmp(PQgetvalue(res, 0, 0), "logical") == 0) slot_info->is_logical = true; You don't need to set it false, because you would have MemSet(slot_info) in the caller. 7) How about uint32 hi; uint32 lo; instead of + uint32 hi, + lo; 8) Move SlotInformation * slot_info) to the next line as it crosses the 80 char limit. +GetSlotInformation(PGconn *conn, const char *slot_name, SlotInformation * slot_info) 9) Instead of a boolean is_logical, I would rather suggest using an enum or #define macros for the slot types, because someday we might introduce new slot types and somebody might want an is_physical kind of variable name. +typedef struct SlotInformation { + bool is_logical; Regards, Bharath Rupireddy.
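To make the suggested restructuring concrete, here is a sketch of how StreamLog's switch could look with each status getting its own case; this is hypothetical code following the review comments above, not the actual patch.

    /* Sketch; assumes a SlotInformation slot_info declared earlier. */
    switch (GetSlotInformation(conn, replication_slot, &slot_info))
    {
        case READ_REPLICATION_SLOT_OK:
            if (slot_info.is_logical)
            {
                /* A logical slot cannot be used here; give up early. */
                pg_log_error("cannot use the slot provided, physical slot expected.");
                exit(1);
            }
            stream.startpos = slot_info.restart_lsn;
            stream.timeline = slot_info.restart_tli;
            break;
        case READ_REPLICATION_SLOT_UNSUPPORTED:
            break;      /* old server: fall back to the previous behavior */
        case READ_REPLICATION_SLOT_ERROR:
        case READ_REPLICATION_SLOT_NONEXISTENT:
            exit(1);    /* error has already been logged */
        default:
            pg_log_error("unknown read replication slot status");
            exit(1);
    }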
Le lundi 6 septembre 2021, 06:22:30 CEST Michael Paquier a écrit : > On Fri, Sep 03, 2021 at 11:58:27AM +0200, Ronan Dunklau wrote: > > 0002 for pg_receivewal tried to simplify the logic of the information to > > return, by using a dedicated struct for this. This accounts for Bharath's > > demands to return every possible field. > > In particular, an is_logical field simplifies the detection of the type of > > slot. In my opinion it makes sense to simplify it like this on the client > > side while being more open-minded on the server side if we ever need to > > provide a new type of slot. Also, GetSlotInformation now returns an enum > > to be able to handle the different modes of failure, which differ > > between pg_receivewal and pg_basebackup. > > + if (PQserverVersion(conn) < 150000) > + return READ_REPLICATION_SLOT_UNSUPPORTED; > [...] > +typedef enum { > + READ_REPLICATION_SLOT_OK, > + READ_REPLICATION_SLOT_UNSUPPORTED, > + READ_REPLICATION_SLOT_ERROR, > + READ_REPLICATION_SLOT_NONEXISTENT > +} ReadReplicationSlotStatus; > > Do you really need this much complication? We could treat the > unsupported case and the non-existing case the same way: we don't know > so just assign InvalidXlogRecPtr or NULL to the fields of the > structure, and make GetSlotInformation() return false just on error, > with some pg_log_error() where adapted in its internals. I actually started with the implementation you propose, but changed my mind while writing it because I realised it's easier to reason about like this, instead of failing silently during READ_REPLICATION_SLOT to fail a bit later when actually trying to start the replication slot because it doesn't exist. Either the user expects the replication slot to exist, and in this case we should retry the whole loop in the hope of getting an interesting LSN, or the user doesn't and shouldn't even pass a replication_slot name to begin with. > > > There is still the concern raised by Bharath about being able to select > > the > > way to fetch the replication slot information for the user, and what > > should or should not be included in the documentation. I am in favor of > > documenting the process of selecting the WAL start, and maybe include a > > --start-lsn option for the user to override it, but that's maybe for > > another patch. > > The behavior of pg_receivewal that you are implementing should be > documented. We don't say either how the start location is selected > when working on an existing directory, so I would recommend to add a > paragraph in the description section to detail all that, as of: > - First a scan of the existing archives is done. > - If nothing is found, and if there is a slot, request the slot > information. > - If still nothing (aka slot undefined, or command not supported), use > the last flush location. Sounds good, I will add another patch for the documentation of this. > > As a whole, I am not really convinced that we need a new option for > that as long as we rely on a slot with pg_receivewal as these are used > to make sure that we avoid holes in the WAL archives. > > Regarding pg_basebackup, Daniel has proposed a couple of days ago a > different solution to trap errors earlier, which would cover the case > dealt with here: > https://www.postgresql.org/message-id/0F69E282-97F9-4DB7-8D6D-F927AA6340C8@yesql.se I will take a look. > We should not mimic in the frontend errors that are safely trapped > in the backend with the proper locks, in any case. I don't understand what you mean by this ?
My original proposal was for the command to actually attach to the replication slot while reading it, thus keeping a lock on it to prevent dropping or concurrent access on the server. > > While on it, READ_REPLICATION_SLOT returns a confirmed LSN when > grabbing the data of a logical slot. We are not going to use that > with pg_recvlogical as by default START_REPLICATION 0/0 would just use > the confirmed LSN. Do we have use cases where this information would > help? There is the argument of consistency with physical slots and > that this can be helpful to do sanity checks for clients, but that's > rather thin. If we don't, we should rename the command to READ_PHYSICAL_REPLICATION_SLOT and error out on the server if the slot is not actually a physical one to spare the client from performing those checks. I still think it's better to support both cases as opposed to having two completely different APIs (READ_(PHYSICAL)_REPLICATION_SLOT for physical ones on a replication connection, pg_replication_slots view for logical ones) as it would enable more for third-party clients at a relatively low maintenance cost for us. -- Ronan Dunklau
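For reference, issuing the command from a third-party client is straightforward. A minimal libpq sketch follows; it assumes a replication connection and the three-column result shape (slot_type, restart_lsn, restart_tli) discussed in this thread.

    /* Sketch: read a physical slot's position over a replication
     * connection (the conninfo must include replication=true). */
    PGresult   *res = PQexec(conn, "READ_REPLICATION_SLOT myslot");

    if (PQresultStatus(res) == PGRES_TUPLES_OK &&
        PQntuples(res) == 1 && !PQgetisnull(res, 0, 0))
        printf("type=%s restart_lsn=%s restart_tli=%s\n",
               PQgetvalue(res, 0, 0),      /* slot_type */
               PQgetvalue(res, 0, 1),      /* restart_lsn */
               PQgetvalue(res, 0, 2));     /* restart_tli */
    PQclear(res);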
On Mon, Sep 6, 2021 at 12:20 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote: > > Le lundi 6 septembre 2021, 06:22:30 CEST Michael Paquier a écrit : > > On Fri, Sep 03, 2021 at 11:58:27AM +0200, Ronan Dunklau wrote: > > > 0002 for pg_receivewal tried to simplify the logic of the information to > > > return, by using a dedicated struct for this. This accounts for Bharath's > > > demands to return every possible field. > > > In particular, an is_logical field simplifies the detection of the type of > > > slot. In my opinion it makes sense to simplify it like this on the client > > > side while being more open-minded on the server side if we ever need to > > > provide a new type of slot. Also, GetSlotInformation now returns an enum > > > to be able to handle the different modes of failure, which differ > > > between pg_receivewal and pg_basebackup. > > > > + if (PQserverVersion(conn) < 150000) > > + return READ_REPLICATION_SLOT_UNSUPPORTED; > > [...] > > +typedef enum { > > + READ_REPLICATION_SLOT_OK, > > + READ_REPLICATION_SLOT_UNSUPPORTED, > > + READ_REPLICATION_SLOT_ERROR, > > + READ_REPLICATION_SLOT_NONEXISTENT > > +} ReadReplicationSlotStatus; > > > > Do you really need this much complication? We could treat the > > unsupported case and the non-existing case the same way: we don't know > > so just assign InvalidXlogRecPtr or NULL to the fields of the > > structure, and make GetSlotInformation() return false just on error, > > with some pg_log_error() where adapted in its internals. > > I actually started with the implementation you propose, but changed my mind > while writing it because I realised it's easier to reason about like this, > instead of failing silently during READ_REPLICATION_SLOT to fail a bit later > when actually trying to start the replication slot because it doesn't exist. > Either the user expects the replication slot to exist, and in this case we > should retry the whole loop in the hope of getting an interesting LSN, or the > user doesn't and shouldn't even pass a replication_slot name to begin with. I don't think we need to retry the whole loop if the READ_REPLICATION_SLOT command fails, as pg_receivewal might enter an infinite loop there. IMO, we should just exit(1) if READ_REPLICATION_SLOT fails. > > > There is still the concern raised by Bharath about being able to select > > > the > > > way to fetch the replication slot information for the user, and what > > > should or should not be included in the documentation. I am in favor of > > > documenting the process of selecting the WAL start, and maybe include a > > > --start-lsn option for the user to override it, but that's maybe for > > > another patch. > > > > The behavior of pg_receivewal that you are implementing should be > > documented. We don't say either how the start location is selected > > when working on an existing directory, so I would recommend to add a > > paragraph in the description section to detail all that, as of: > > - First a scan of the existing archives is done. > > - If nothing is found, and if there is a slot, request the slot > > information. > > - If still nothing (aka slot undefined, or command not supported), use > > the last flush location. > > Sounds good, I will add another patch for the documentation of this. +1. > > While on it, READ_REPLICATION_SLOT returns a confirmed LSN when > > grabbing the data of a logical slot. We are not going to use that > > with pg_recvlogical as by default START_REPLICATION 0/0 would just use > > the confirmed LSN.
Do we have use cases where this information would > > help? There is the argument of consistency with physical slots and > > that this can be helpful to do sanity checks for clients, but that's > > rather thin. > > If we don't, we should rename the command to READ_PHYSICAL_REPLICATION_SLOT > and error out on the server if the slot is not actually a physical one to > spare the client from performing those checks. I still think it's better to > support both cases as opposed to having two completely different APIs > (READ_(PHYSICAL)_REPLICATION_SLOT for physical ones on a replication > connection, pg_replication_slots view for logical ones) as it would enable > more for third-party clients at a relatively low maintenance cost for us. -1 for READ_PHYSICAL_REPLICATION_SLOT or failing on the server. What happens if we have another slot type "PHYSIOLOGICAL" or "FOO" or "BAR" or some other type? IMO, READ_REPLICATION_SLOT should just return info for any slot type. The clients know better how to deal with the slot type. Although we don't have a use case for logical slots with the READ_REPLICATION_SLOT command, let's not change it. If others are still concerned about the unnecessary slot being returned, you might consider passing in a parameter to the READ_REPLICATION_SLOT command, something like below. But this too looks complex to me. I would vote for what the existing patch does with READ_REPLICATION_SLOT. READ_REPLICATION_SLOT 'slot_name' 'physical'; only returns physical slot info with the given name. READ_REPLICATION_SLOT 'slot_name' 'logical'; only returns logical slot info with the given name. Regards, Bharath Rupireddy.
Le vendredi 3 septembre 2021 17:49:34 CEST, vous avez écrit : > On Fri, Sep 3, 2021 at 3:28 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote: > > There is still the concern raised by Bharath about being able to select > > the > > way to fetch the replication slot information for the user, and what > > should or should not be included in the documentation. I am in favor of > > documenting the process of selecting the WAL start, and maybe include a > > --start-lsn option for the user to override it, but that's maybe for > > another patch. > > Let's hear from others. > > Thanks for the patches. I have some quick comments on the v5 patch-set: > > 0001: > 1) Do you also want to MemSet values in ReadReplicationSlot? > > 2) When an if clause has a single statement we don't generally use "{" and "}" > + if (slot == NULL || !slot->in_use) > + { > + LWLockRelease(ReplicationSlotControlLock); > + } > you can just have: > + if (slot == NULL || !slot->in_use) > + LWLockRelease(ReplicationSlotControlLock); > > 3) This is still not copying the slot contents; as I said earlier, you > can just take the required info into some local variables instead of > assigning the slot pointer to the slot_contents pointer. > + /* Copy slot contents while holding spinlock */ > + SpinLockAcquire(&slot->mutex); > + slot_contents = *slot; > + SpinLockRelease(&slot->mutex); > + LWLockRelease(ReplicationSlotControlLock); > > 4) As I said earlier, why can't we have separate tli variables for > more readability instead of just one slots_position_timeline and > timeline_history variable? And you are not even resetting those 2 > variables after if > (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)), you might end > up sending the restart_lsn timeline ID for confirmed_flush. So, better > use two separate variables. In fact you can use block-local variables: > + if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)) > + { > + List *tl_history = NIL; > + TimeLineID tli; > + tl_history = readTimeLineHistory(ThisTimeLineID); > + tli = tliOfPointInHistory(slot_contents.data.restart_lsn, > + tl_history); > + values[i] = Int32GetDatum(tli); > + nulls[i] = false; > + } > similarly for confirmed_flush. > > 5) I still don't see the need for the below test case: > + "READ_REPLICATION_SLOT does not produce an error with non existent slot"); > when we have > + "READ_REPLICATION_SLOT does not produce an error with dropped slot"); > Because for a user, a dropped or non-existent slot is the same; it's just > that for a dropped slot we internally don't delete its entry from the > shared memory. > Thank you for reiterating those concerns. As I said, I haven't touched Michael's version of the patch. I was incorporating those changes in my branch before he sent this, so I'll probably merge both before sending an updated patch. > 0002: > 1) Imagine GetSlotInformation always returns > READ_REPLICATION_SLOT_ERROR, don't you think StreamLog enters an > infinite loop there? Instead, why don't you just exit(1); instead of > return; and retry? Similarly for READ_REPLICATION_SLOT_NONEXISTENT? I > think you can just do exit(1), no need to retry. > + case READ_REPLICATION_SLOT_ERROR: > + > + /* > + * Error has been logged by GetSlotInformation, return and > + * maybe retry > + */ > + return; This is the same behaviour we had before: if there is an error with pg_receivewal we retry the command. There was no special case for the replication slot not existing before, so I don't see why we should change it now ?
Eg: 2021-09-06 09:11:07.774 CEST [95853] ERROR: replication slot "nonexistent" does not exist 2021-09-06 09:11:07.774 CEST [95853] STATEMENT: START_REPLICATION SLOT "nonexistent" 0/1000000 TIMELINE 1 pg_receivewal: error: could not send replication command "START_REPLICATION": ERROR: replication slot "nonexistent" does not exist pg_receivewal: disconnected; waiting 5 seconds to try again Users may rely on it to keep retrying in the background until the slot has been created, for example. > > 2) Why is it returning READ_REPLICATION_SLOT_OK when slot_info isn't > passed? And why are you having this check after you connect to the > server and fetch the data? > + /* If no slotinformation has been passed, we can return immediately */ > + if (slot_info == NULL) > + { > + PQclear(res); > + return READ_REPLICATION_SLOT_OK; > + } > Instead you can just have a single assert: > > + Assert(slot_name && slot_info); At first it was so that we didn't have to fill in all required information if we don't need to, but it turns out pg_basebackup also has to check for the slot's type. I agree we should not support the null slot_info case anymore. > > 3) How about > pg_log_error("could not read replication slot: > instead of > pg_log_error("could not fetch replication slot: Ok. > > 4) Why are you having the READ_REPLICATION_SLOT_OK case in default? > + default: > + if (slot_info.is_logical) > + { > + /* > + * If the slot is not physical we can't expect to > + * recover from that > + */ > + pg_log_error("cannot use the slot provided, physical slot expected."); > + exit(1); > + } > + stream.startpos = slot_info.restart_lsn; > + stream.timeline = slot_info.restart_tli; > + } > You can just have another case statement for READ_REPLICATION_SLOT_OK > and in the default you can throw an error "unknown read replication > slot status" or some other better wording, and exit(1); Ok. > > 5) Do you want to initialize slot_info to 0? > + if (replication_slot) > + { > + SlotInformation slot_info; > + MemSet(slot_info, 0, sizeof(SlotInformation)); > > 6) This isn't readable: > + slot_info->is_logical = strcmp(PQgetvalue(res, 0, 0), "logical") == 0; > How about: > if (strcmp(PQgetvalue(res, 0, 0), "logical") == 0) > slot_info->is_logical = true; > You don't need to set it false, because you would have > MemSet(slot_info) in the caller. > Isn't it preferable to fill in the whole struct without any regard to its prior state ? I will modify the is_logical assignment to make it more readable though. > 7) How about > uint32 hi; > uint32 lo; > instead of > + uint32 hi, > + lo; > Ok. > 8) Move SlotInformation * slot_info) to the next line as it crosses > the 80 char limit. > +GetSlotInformation(PGconn *conn, const char *slot_name, > SlotInformation * slot_info) Ok. > > 9) Instead of a boolean is_logical, I would rather suggest using an > enum or #define macros for the slot types, because someday we > might introduce new slot types and somebody might want an is_physical > kind of variable name. > +typedef struct SlotInformation { > + bool is_logical; That's the reason why the READ_REPLICATION_SLOT command returns a slot type as a string, but the intermediate representation on the client doesn't really need to be concerned about this: it's the client after all and it will be very easy to change it locally if we need to. Thinking about it, for our use case we should really use is_physical instead of is_logical. But I guess all of this is moot if we end up deciding not to support the logical case anymore ? Best regards, -- Ronan Dunklau
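As a side note on point 7, the usual frontend pattern for turning the textual LSN into an XLogRecPtr with separate hi/lo variables looks like the sketch below; the error message wording and the return value are illustrative.

    uint32      hi;
    uint32      lo;

    /* Parse a textual LSN of the form "X/X" into an XLogRecPtr. */
    if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
    {
        pg_log_error("could not parse slot's restart_lsn \"%s\"",
                     PQgetvalue(res, 0, 1));
        PQclear(res);
        return READ_REPLICATION_SLOT_ERROR;
    }
    slot_info->restart_lsn = ((uint64) hi) << 32 | lo;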
On Mon, Sep 06, 2021 at 12:37:29PM +0530, Bharath Rupireddy wrote: > -1 for READ_PHYSICAL_REPLICATION_SLOT or failing on the server. What > happens if we have another slot type "PHYSIOLOGICAL" or "FOO" or "BAR" > or some other type? IMO, READ_REPLICATION_SLOT should just return info for any > slot type. The clients know better how to deal with the slot type. > Although we don't have a use case for logical slots with the > READ_REPLICATION_SLOT command, let's not change it. Using READ_REPLICATION_SLOT as the command name is fine, and it could be extended with more fields if necessary, implemented now with only what we think is useful. Returning errors on cases that are still not supported yet is fine, say for logical slots if we decide to reject the case for now, and restrictions can always be lifted in the future. -- Michael
On Mon, Sep 06, 2021 at 04:17:28PM +0900, Michael Paquier wrote: > Using READ_REPLICATION_SLOT as the command name is fine, and it could > be extended with more fields if necessary, implemented now with only > what we think is useful. Returning errors on cases that are still not > supported yet is fine, say for logical slots if we decide to reject > the case for now, and restrictions can always be lifted in the > future. And marked as RwF, as this was three weeks ago. Please feel free to register a new entry if this is being worked on. -- Michael
Hello, Following recommendations, I stripped most of the features from the patch. For now we support only physical replication slots, and only provide the two fields of interest (restart_lsn, restart_tli) in addition to the slot type (fixed at physical) to not paint ourselves into a corner. I also removed the part about pg_basebackup since other fixes have been proposed for that case. Le vendredi 1 octobre 2021, 09:05:18 CEST Michael Paquier a écrit : > On Mon, Sep 06, 2021 at 04:17:28PM +0900, Michael Paquier wrote: > > Using READ_REPLICATION_SLOT as the command name is fine, and it could > > be extended with more fields if necessary, implemented now with only > > what we think is useful. Returning errors on cases that are still not > > supported yet is fine, say for logical slots if we decide to reject > > the case for now, and restrictions can always be lifted in the > > future. > > And marked as RwF, as this was three weeks ago. Please feel free to > register a new entry if this is being worked on. > -- > Michael -- Ronan Dunklau
On Tue, Oct 19, 2021 at 05:32:55PM +0200, Ronan Dunklau wrote: > Following recommendations, I stripped most of the features from the patch. For > now we support only physical replication slots, and only provide the two fields > of interest (restart_lsn, restart_tli) in addition to the slot type (fixed at > physical) to not paint ourselves into a corner. > > I also removed the part about pg_basebackup since other fixes have been > proposed for that case. Patch 0001 looks rather clean. I have a couple of comments. + if (OidIsValid(slot_contents.data.database)) + elog(ERROR, "READ_REPLICATION_SLOT is only supported for physical slots"); elog() can only be used for internal errors. Errors that can be triggered by a user should use ereport() instead. +ok($stdout eq '||', + "READ_REPLICATION_SLOT returns NULL values if slot does not exist"); [...] +ok($stdout =~ 'physical\|[^|]*\|1', + "READ_REPLICATION_SLOT returns tuple corresponding to the slot"); Isn't result pattern matching something we usually test with like()? +($ret, $stdout, $stderr) = $node_primary->psql( + 'postgres', + "READ_REPLICATION_SLOT $slotname;", + extra_params => [ '-d', $connstr_rep ]); No need for extra_params in this test. You can just pass down "replication => 1" instead, no? --- a/src/test/recovery/t/006_logical_decoding.pl +++ b/src/test/recovery/t/006_logical_decoding.pl [...] +ok($stderr=~ 'READ_REPLICATION_SLOT is only supported for physical slots', + 'Logical replication slot is not supported'); This one should use like(). + <para> + The slot's <literal>restart_lsn</literal> can also be used as a starting + point if the target directory is empty. + </para> I am not sure that there is a need for this addition as the same thing is said when describing the lookup ordering. + If nothing is found and a slot is specified, use the + <command>READ_REPLICATION_SLOT</command> + command. It may be clearer to say that the position is retrieved from the command. +bool +GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr *restart_lsn, TimeLineID* restart_tli) +{ Could you extend that so that we still run the command but don't crash if the caller specifies NULL for any of the result fields? This would be handy. + if (PQgetisnull(res, 0, 0)) + { + PQclear(res); + pg_log_error("replication slot \"%s\" does not exist", slot_name); + return false; + } + if (PQntuples(res) != 1 || PQnfields(res) < 3) + { + pg_log_error("could not fetch replication slot: got %d rows and %d fields, expected %d rows and %d or more fields", + PQntuples(res), PQnfields(res), 1, 3); + PQclear(res); + return false; + } Wouldn't it be better to reverse the order of these two checks? I don't mind the addition of the slot type being part of the result of READ_REPLICATION_SLOT even if it is not mandatory (?), but at least GetSlotInformation() should check after it. +# Setup the slot, and connect to it a first time +$primary->run_log( + [ 'pg_receivewal', '--slot', $slot_name, '--create-slot' ], + 'creating a replication slot'); +$primary->psql('postgres', + 'INSERT INTO test_table VALUES (generate_series(1,100));'); +$primary->psql('postgres', 'SELECT pg_switch_wal();'); +$nextlsn = + $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();'); +chomp($nextlsn); Wouldn't it be simpler to use CREATE_REPLICATION_SLOT with RESERVE_WAL here, rather than going through pg_receivewal? It seems to me that this would be cheaper without really impacting the coverage. -- Michael
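Spelling out the suggested reordering of those two checks, a sketch (keeping the message wording from the quoted patch):

    /* Validate the shape of the result first... */
    if (PQntuples(res) != 1 || PQnfields(res) < 3)
    {
        pg_log_error("could not fetch replication slot: got %d rows and %d fields, expected %d rows and %d or more fields",
                     PQntuples(res), PQnfields(res), 1, 3);
        PQclear(res);
        return false;
    }
    /* ...then check for a NULL slot type, which is how the server
     * reports a non-existent slot. */
    if (PQgetisnull(res, 0, 0))
    {
        pg_log_error("replication slot \"%s\" does not exist", slot_name);
        PQclear(res);
        return false;
    }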
Le mercredi 20 octobre 2021, 07:13:15 CEST Michael Paquier a écrit : > On Tue, Oct 19, 2021 at 05:32:55PM +0200, Ronan Dunklau wrote: > > Following recommendations, I stripped most of the features from the patch. > > For now we support only physical replication slots, and only provide the > > two fields of interest (restart_lsn, restart_tli) in addition to the slot > > type (fixed at physical) to not paint ourselves into a corner. > > > > I also removed the part about pg_basebackup since other fixes have been > > proposed for that case. > > Patch 0001 looks rather clean. I have a couple of comments. Thank you for the quick review ! > > + if (OidIsValid(slot_contents.data.database)) > + elog(ERROR, "READ_REPLICATION_SLOT is only supported for > physical slots"); > > elog() can only be used for internal errors. Errors that can be > triggered by a user should use ereport() instead. Ok. > > +ok($stdout eq '||', > + "READ_REPLICATION_SLOT returns NULL values if slot does not exist"); > [...] > +ok($stdout =~ 'physical\|[^|]*\|1', > + "READ_REPLICATION_SLOT returns tuple corresponding to the slot"); > Isn't result pattern matching something we usually test with like()? Ok. > > +($ret, $stdout, $stderr) = $node_primary->psql( > + 'postgres', > + "READ_REPLICATION_SLOT $slotname;", > + extra_params => [ '-d', $connstr_rep ]); > No need for extra_params in this test. You can just pass down > "replication => 1" instead, no? In that test file, every replication connection is obtained by using connstr_rep so I thought it would be best to use the same thing. > > --- a/src/test/recovery/t/006_logical_decoding.pl > +++ b/src/test/recovery/t/006_logical_decoding.pl > [...] > +ok($stderr=~ 'READ_REPLICATION_SLOT is only supported for physical slots', > + 'Logical replication slot is not supported'); > This one should use like(). Ok. > > + <para> > + The slot's <literal>restart_lsn</literal> can also be used as a > starting + point if the target directory is empty. > + </para> > I am not sure that there is a need for this addition as the same thing > is said when describing the lookup ordering. Ok, removed. > > + If nothing is found and a slot is specified, use the > + <command>READ_REPLICATION_SLOT</command> > + command. > It may be clearer to say that the position is retrieved from the > command. Ok, done. The doc also uses the active voice here now. > > +bool > +GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr > *restart_lsn, TimeLineID* restart_tli) > +{ > Could you extend that so that we still run the command but don't crash > if the caller specifies NULL for any of the result fields? This would > be handy. Done. > > + if (PQgetisnull(res, 0, 0)) > + { > + PQclear(res); > + pg_log_error("replication slot \"%s\" does not exist", > slot_name); > + return false; > + } > + if (PQntuples(res) != 1 || PQnfields(res) < 3) > + { > + pg_log_error("could not fetch replication slot: got %d rows > and %d fields, expected %d rows and %d or more fields", > + PQntuples(res), PQnfields(res), 1, 3); > + PQclear(res); > + return false; > + } > Wouldn't it be better to reverse the order of these two checks? Yes it is, and the PQntuples condition should be removed from the first error test. > > I don't mind the addition of the slot type being part of the result of > READ_REPLICATION_SLOT even if it is not mandatory (?), but at least > GetSlotInformation() should check after it. Ok.
> > +# Setup the slot, and connect to it a first time > +$primary->run_log( > + [ 'pg_receivewal', '--slot', $slot_name, '--create-slot' ], > + 'creating a replication slot'); > +$primary->psql('postgres', > + 'INSERT INTO test_table VALUES (generate_series(1,100));'); > +$primary->psql('postgres', 'SELECT pg_switch_wal();'); > +$nextlsn = > + $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();'); > +chomp($nextlsn); > Wouldn't it be simpler to use CREATE_REPLICATION_SLOT with RESERVE_WAL > here, rather than going through pg_receivewal? It seems to me that > this would be cheaper without really impacting the coverage. You're right, we can skip two invocations of pg_receivewal like this (for the slot creation + for starting the slot a first time). -- Ronan Dunklau
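For reference, creating the slot directly over the replication connection then comes down to something like this libpq sketch; the slot name and error handling are illustrative.

    /* Sketch: create a physical slot and reserve WAL immediately,
     * replacing the extra pg_receivewal invocations in the test. */
    PGresult   *res = PQexec(conn,
        "CREATE_REPLICATION_SLOT \"test_slot\" PHYSICAL RESERVE_WAL");

    if (PQresultStatus(res) != PGRES_TUPLES_OK)
        pg_log_error("could not create replication slot: %s",
                     PQerrorMessage(conn));
    PQclear(res);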
Le mercredi 20 octobre 2021 11:40:18 CEST, vous avez écrit : > > +# Setup the slot, and connect to it a first time > > +$primary->run_log( > > + [ 'pg_receivewal', '--slot', $slot_name, '--create-slot' ], > > + 'creating a replication slot'); > > +$primary->psql('postgres', > > + 'INSERT INTO test_table VALUES (generate_series(1,100));'); > > +$primary->psql('postgres', 'SELECT pg_switch_wal();'); > > +$nextlsn = > > + $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();'); > > +chomp($nextlsn); > > Wouldn't it be simpler to use CREATE_REPLICATION_SLOT with RESERVE_WAL > > here, rather than going through pg_receivewal? It seems to me that > > this would be cheaper without really impacting the coverage. > > You're right, we can skip two invocations of pg_receivewal like this (for > the slot creation + for starting the slot a first time). After sending the previous patch suite, I figured it would be worthwhile to also have tests covering timeline switches, which were not covered before. So please find attached a new version with an additional patch for those tests, covering both "resume from last known archive" and "resume from the replication slot's position" cases. -- Ronan Dunklau
On Wed, Oct 20, 2021 at 02:58:26PM +0200, Ronan Dunklau wrote: > After sending the previous patch suite, I figured it would be worthwhile to > also have tests covering timeline switches, which were not covered before. That seems independent to me. I'll take a look. > So please find attached a new version with an additional patch for those tests, > covering both "resume from last known archive" and "resume from the > replication slot's position" cases. So, taking things in order, I have looked at 0003 and 0001, and attached are refined versions for both of them. 0003 covers an existing hole in the docs, which I think we had better address first and backpatch, taking into account that the starting point calculation considers compressed segments when looking for completed segments. Regarding 0001, I have found that the last test, checking for NULL values returned by READ_REPLICATION_SLOT after dropping the slot, overlaps with the first test, so I have removed that. I have expanded a bit the use of like(), and there was some confusion with PostgresNode::psql and some extra arguments (see DROP_REPLICATION_SLOT and CREATE_REPLICATION_SLOT, and no need for return values in the CREATE case either). Some comments, docs and code have been slightly tweaked. Here are some comments about 0002. + /* The commpand should always return precisely one tuple */ s/commpand/command/ + pg_log_error("could not fetch replication slot: got %d rows and %d fields, expected %d rows and %d or more fields", + PQntuples(res), PQnfields(res), 1, 3); Should this be "could not read" instead? + if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2) + { + pg_log_error("could not parse slot's restart_lsn \"%s\"", + PQgetvalue(res, 0, 1)); + PQclear(res); + return false; + } Wouldn't it be saner to initialize *restart_lsn and *restart_tli to some default values at the top of GetSlotInformation() instead, if they are specified by the caller? And I think that we should still complain even if restart_lsn is NULL. On a quick read of 0004, I find the split of the logic with change_timeline() a bit hard to understand. It looks like we should be able to make a cleaner split, but I am not sure how that would look, though. -- Michael
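Putting the initialization suggestion into code, the top of GetSlotInformation() might start like this sketch (not the committed shape):

    bool
    GetSlotInformation(PGconn *conn, const char *slot_name,
                       XLogRecPtr *restart_lsn, TimeLineID *restart_tli)
    {
        /* Give the out-parameters sane defaults up front, so that any
         * early return leaves the caller with usable values. */
        if (restart_lsn)
            *restart_lsn = InvalidXLogRecPtr;
        if (restart_tli)
            *restart_tli = 0;

        /* ... run READ_REPLICATION_SLOT and fill in the results ... */
        return true;
    }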
Le jeudi 21 octobre 2021, 07:35:08 CEST Michael Paquier a écrit : > On Wed, Oct 20, 2021 at 02:58:26PM +0200, Ronan Dunklau wrote: > > After sending the previous patch suite, I figured it would be worthwhile > > to > > also have tests covering timeline switches, which were not covered before. > > That seems independent to me. I'll take a look. > > > So please find attached a new version with an additional patch for those > > tests, covering both "resume from last known archive" and "resume from > > the replication slot's position" cases. > > So, taking things in order, I have looked at 0003 and 0001, and > attached are refined versions for both of them. > > 0003 covers an existing hole in the docs, which I think we had better > address first and backpatch, taking into account that the starting > point calculation considers compressed segments when looking for > completed segments. Ok, do you want me to propose a different patch for previous versions ? > > Regarding 0001, I have found that the last test, checking for NULL values > returned by READ_REPLICATION_SLOT after dropping the slot, overlaps > with the first test, so I have removed that. I have expanded a bit > the use of like(), and there was some confusion with > PostgresNode::psql and some extra arguments (see DROP_REPLICATION_SLOT > and CREATE_REPLICATION_SLOT, and no need for return values in the > CREATE case either). Some comments, docs and code have been slightly > tweaked. Thank you for this. > > Here are some comments about 0002. > > + /* The commpand should always return precisely one tuple */ > s/commpand/command/ > > + pg_log_error("could not fetch replication slot: got %d rows and %d > fields, expected %d rows and %d or more fields", + > PQntuples(res), PQnfields(res), 1, 3); > Should this be "could not read" instead? > > + if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2) > + { > + pg_log_error("could not parse slot's restart_lsn \"%s\"", > + PQgetvalue(res, 0, 1)); > + PQclear(res); > + return false; > + } > Wouldn't it be saner to initialize *restart_lsn and *restart_tli to > some default values at the top of GetSlotInformation() instead, if > they are specified by the caller? Ok. > And I think that we should still > complain even if restart_lsn is NULL. Do you mean restart_lsn as the pointer argument to the function, or restart_lsn as the field returned by the command ? If it's the first, I'll change it, but if it's the latter it is expected that we sometimes run this on a slot where WAL has never been reserved yet. > > On a quick read of 0004, I find the split of the logic with > change_timeline() a bit hard to understand. It looks like we should > be able to make a cleaner split, but I am not sure how that would > look, though. Thanks, at least if the proposal to test this seems sensible I can move forward. I wanted to avoid having a lot of code duplication since the test setup is a bit more complicated. My first approach was to split it into two functions, setup_standby and change_timeline, but I then realized that what would happen between the two invocations would basically be the same for the two test cases, so I ended up with that patch. I'll try to see if I can see a better way of organizing that code. -- Ronan Dunklau
On Thu, Oct 21, 2021 at 08:29:54AM +0200, Ronan Dunklau wrote: > Ok, do you want me to propose a different patch for previous versions ? That's not necessary. Thanks for proposing. > Do you mean restart_lsn as the pointer argument to the function, or > restart_lsn as the field returned by the command ? If it's the first, I'll > change it, but if it's the latter it is expected that we sometimes run this > on a slot where WAL has never been reserved yet. restart_lsn as the pointer of the function. -- Michael
Le jeudi 21 octobre 2021, 09:21:44 CEST Michael Paquier a écrit : > On Thu, Oct 21, 2021 at 08:29:54AM +0200, Ronan Dunklau wrote: > > Ok, do you want me to propose a different patch for previous versions ? > > That's not necessary. Thanks for proposing. > > > Do you mean restart_lsn as the pointer argument to the function, or > > restart_lsn as the field returned by the command ? If it's the first, I'll > > change it, but if it's the latter it is expected that we sometimes run this > > on a slot where WAL has never been reserved yet. > > restart_lsn as the pointer of the function. Done. I haven't touched the timeline switch test patch for now, but I still include it here for completeness. -- Ronan Dunklau
On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote: > Done. I haven't touched the timeline switch test patch for now, but I still > include it here for completeness. Thanks. I have applied and back-patched 0001, then looked again at 0002 that adds READ_REPLICATION_SLOT: - Change the TLI to use int8 rather than int4, so that we will always be right with TimelineID, which is unsigned (this was discussed upthread but I got back on it after more thought, to avoid any future issues). - Added an extra initialization for the set of Datum values, just as an extra safety net. - There was a bug with the timeline returned when executing the command while in recovery as ThisTimeLineID is 0 in the context of a standby, but we need to support the case of physical slots even when streaming archives from a standby. The fix is similar to what we do for IDENTIFY_SYSTEM, where we need to use the timeline currently replayed from GetXLogReplayRecPtr(), before looking at the past timeline history using restart_lsn and the replayed TLI. With that in place, I think that we are good now for this part. -- Michael
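The standby fix follows the same pattern as IDENTIFY_SYSTEM; roughly like the sketch below (not the committed code):

    TimeLineID  current_tli;

    /* On a standby, ThisTimeLineID is 0, so derive the current timeline
     * from the replay position instead. */
    if (RecoveryInProgress())
        (void) GetXLogReplayRecPtr(&current_tli);
    else
        current_tli = ThisTimeLineID;

    /* current_tli can then be fed to readTimeLineHistory() to find the
     * timeline that contains the slot's restart_lsn. */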
On Sat, Oct 23, 2021 at 1:14 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote: > > Done. I haven't touched the timeline switch test patch for now, but I still > > include it here for completeness. > > Thanks. I have applied and back-patched 0001, then looked again at > 0002 that adds READ_REPLICATION_SLOT: > - Change the TLI to use int8 rather than int4, so that we will always be > right with TimelineID, which is unsigned (this was discussed upthread > but I got back on it after more thought, to avoid any future > issues). > - Added an extra initialization for the set of Datum values, just as > an extra safety net. > - There was a bug with the timeline returned when executing the > command while in recovery as ThisTimeLineID is 0 in the context of a > standby, but we need to support the case of physical slots even when > streaming archives from a standby. The fix is similar to what we do > for IDENTIFY_SYSTEM, where we need to use the timeline currently > replayed from GetXLogReplayRecPtr(), before looking at the past > timeline history using restart_lsn and the replayed TLI. > > With that in place, I think that we are good now for this part. Thanks for the updated patch. I have the following comments on v10: 1) It's better to initialize nulls with false; then we can avoid setting them to true. The instances where the columns are not null are going to be more than the columns with null values, so we could avoid some of the nulls[i] = false; instructions. + MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool)); I suggest we do the following. The number of instances of hitting the "else" parts will be less. MemSet(nulls, false, READ_REPLICATION_SLOT_COLS * sizeof(bool)); if (slot == NULL || !slot->in_use) { MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool)); LWLockRelease(ReplicationSlotControlLock); } if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)) { } else nulls[i] = true; if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)) { } else nulls[i] = true; 2) As I previously mentioned, we are not copying the slot contents while holding the spinlock, it's just that we are taking the memory address and releasing the lock, so there is a chance that the memory we are looking at can become unavailable or stale while we access slot_contents. So, I suggest we do the memcpy of the *slot to slot_contents. I'm sure memcpy'ing the entire ReplicationSlot contents will be costlier, so let's just take the info that we need (data.database, data.restart_lsn) into local variables while we hold the spinlock. + /* Copy slot contents while holding spinlock */ + SpinLockAcquire(&slot->mutex); + slot_contents = *slot; + SpinLockRelease(&slot->mutex); + LWLockRelease(ReplicationSlotControlLock); The code will look like the following: Oid database; XLogRecPtr restart_lsn; /* Take required information from slot contents while holding spinlock */ SpinLockAcquire(&slot->mutex); database = slot->data.database; restart_lsn = slot->data.restart_lsn; SpinLockRelease(&slot->mutex); LWLockRelease(ReplicationSlotControlLock); 3) The data that the new command returns to the client can actually become stale while it is captured and in transit to the client as we release the spinlock and other backends can drop or alter the info. So, it's better we talk about this in the documentation of the new command and also in the comments saying "clients will have to deal with it." 4) How about we be more descriptive about the error added?
This will help identify, from tons of server logs, for which replication slot the command has failed, which really helps in debugging and analysis. I suggest we have this: errmsg("cannot use \"%s\" command with logical replication slot \"%s\"", "READ_REPLICATION_SLOT", cmd->slotname); instead of just a plain, non-informative, generic message: errmsg("cannot use \"%s\" with logical replication slots", Regards, Bharath Rupireddy.
On Sat, Oct 23, 2021 at 10:46:30PM +0530, Bharath Rupireddy wrote: > 1) It's better to initialize nulls with false; then we can avoid setting > them to true. The instances where the columns are not null are going > to be more than the columns with null values, so we could avoid some > of the nulls[i] = false; instructions. I don't think that this is an improvement in this case as in the default case we'd return a tuple full of NULL values if the slot does not exist, so the existing code is simpler when we don't look at the slot contents. > 2) As I previously mentioned, we are not copying the slot contents > while holding the spinlock, it's just that we are taking the memory address > and releasing the lock, so there is a chance that the memory we are > looking at can become unavailable or stale while we access > slot_contents. So, I suggest we do the memcpy of the *slot to > slot_contents. I'm sure memcpy'ing the entire ReplicationSlot > contents will be costlier, so let's just take the info that we need > (data.database, data.restart_lsn) into local variables while we hold > the spinlock. The style of the patch is consistent with what we do in other areas (see pg_get_replication_slots as one example). > + /* Copy slot contents while holding spinlock */ > + SpinLockAcquire(&slot->mutex); > + slot_contents = *slot; And what this does is copy the contents of the slot into a local area (note that we use a NameData pointing to an area with NAMEDATALEN). Aka, if the contents of *slot are changed for whatever reason (they cannot change while the LWLock is held), what we have saved is unchanged in this command's context. > 3) The data that the new command returns to the client can actually > become stale while it is captured and in transit to the client as we > release the spinlock and other backends can drop or alter the info. > So, it's better we talk about this in the documentation of the new > command and also in the comments saying "clients will have to deal > with it." The same can be said with IDENTIFY_SYSTEM when the flushed location becomes irrelevant. I am not sure that this needs to apply here. We could add that this is useful to get a streaming start position though. > 4) How about we be more descriptive about the error added? This will > help identify, from tons of server logs, for which replication slot > the command has failed, which really helps in debugging and analysis. > I suggest we have this: > errmsg("cannot use \"%s\" command with logical replication slot > \"%s\"", "READ_REPLICATION_SLOT", cmd->slotname); > instead of just a plain, non-informative, generic message: > errmsg("cannot use \"%s\" with logical replication slots", Yeah. I don't mind adding the slot name in this string. -- Michael
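To spell the point out in C terms, the assignment in the quoted snippet copies the slot by value; it does not keep a pointer into shared memory:

    ReplicationSlot slot_contents;      /* local variable, not shared */

    SpinLockAcquire(&slot->mutex);
    slot_contents = *slot;      /* memberwise copy under the spinlock */
    SpinLockRelease(&slot->mutex);
    LWLockRelease(ReplicationSlotControlLock);

    /* From here on, concurrent changes to *slot cannot affect what was
     * captured in slot_contents. */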
On Sun, Oct 24, 2021 at 4:40 AM Michael Paquier <michael@paquier.xyz> wrote: > On Sat, Oct 23, 2021 at 10:46:30PM +0530, Bharath Rupireddy wrote: > > 2) As I previously mentioned, we are not copying the slot contents > > while holding the spinlock, it's just that we are taking the memory address > > and releasing the lock, so there is a chance that the memory we are > > looking at can become unavailable or stale while we access > > slot_contents. So, I suggest we do the memcpy of the *slot to > > slot_contents. I'm sure memcpy'ing the entire ReplicationSlot > > contents will be costlier, so let's just take the info that we need > > (data.database, data.restart_lsn) into local variables while we hold > > the spinlock. > > The style of the patch is consistent with what we do in other areas > (see pg_get_replication_slots as one example). > > > + /* Copy slot contents while holding spinlock */ > > + SpinLockAcquire(&slot->mutex); > > + slot_contents = *slot; > > And what this does is copy the contents of the slot into a local > area (note that we use a NameData pointing to an area with > NAMEDATALEN). Aka, if the contents of *slot are changed for whatever > reason (they cannot change while the LWLock is held), what we have > saved is unchanged in this command's context. pg_get_replication_slots holds the ReplicationSlotControlLock until the end of the function so it can be assured that *slot contents will not change. In ReadReplicationSlot, the ReplicationSlotControlLock is released immediately after taking the *slot pointer into slot_contents. Isn't it better if we hold the lock until the end of the function so that we can avoid the problem of the slot contents becoming stale? Having said that, the ReplicationSlotCreate, SearchNamedReplicationSlot (when need_lock is true) etc. release the lock immediately. I'm not sure if I should ignore this staleness problem in ReadReplicationSlot. > > 3) The data that the new command returns to the client can actually > > become stale while it is captured and in transit to the client as we > > release the spinlock and other backends can drop or alter the info. > > So, it's better we talk about this in the documentation of the new > > command and also in the comments saying "clients will have to deal > > with it." > > The same can be said with IDENTIFY_SYSTEM when the flushed location > becomes irrelevant. I am not sure that this needs to apply > here. We could add that this is useful to get a streaming start > position though. I think that's okay, let's not make any changes or add any comments with regard to the above. The client is basically bound to get the snapshot of the data at the time it queries the database. > > 4) How about we be more descriptive about the error added? This will > > help identify, from tons of server logs, for which replication slot > > the command has failed, which really helps in debugging and analysis. > > I suggest we have this: > > errmsg("cannot use \"%s\" command with logical replication slot > > \"%s\"", "READ_REPLICATION_SLOT", cmd->slotname); > > instead of just a plain, non-informative, generic message: > > errmsg("cannot use \"%s\" with logical replication slots", > > Yeah. I don't mind adding the slot name in this string. Thanks. Regards, Bharath Rupireddy.
On Sun, Oct 24, 2021 at 09:08:01AM +0530, Bharath Rupireddy wrote: > pg_get_replication_slots holds the ReplicationSlotControlLock until > the end of the function so it can be assured that *slot contents will > not change. In ReadReplicationSlot, the ReplicationSlotControlLock is > released immediately after taking the *slot pointer into slot_contents. > Isn't it better if we hold the lock until the end of the function so > that we can avoid the problem of the slot contents becoming stale? The reason is different in the case of pg_get_replication_slots(). We have to hold ReplicationSlotControlLock for the whole duration of the shared memory scan to return a consistent set of information to the user, for all the slots. -- Michael
On Sun, Oct 24, 2021 at 12:21 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Sun, Oct 24, 2021 at 09:08:01AM +0530, Bharath Rupireddy wrote: > > pg_get_replication_slots holds the ReplicationSlotControlLock until > > the end of the function so it can be assured that *slot contents will > > not change. In ReadReplicationSlot, the ReplicationSlotControlLock is > > released immediately after taking the *slot pointer into slot_contents. > > Isn't it better if we hold the lock until the end of the function so > > that we can avoid the problem of the slot contents becoming stale? > > The reason is different in the case of pg_get_replication_slots(). We > have to hold ReplicationSlotControlLock for the whole duration of the > shared memory scan to return a consistent set of information to the > user, for all the slots. Thanks. I've no further comments on the v10 patch. Regards, Bharath Rupireddy.
On Sun, Oct 24, 2021 at 07:20:57PM +0530, Bharath Rupireddy wrote: > Thanks. I've no further comments on the v10 patch. Okay, thanks. I have applied this part, then. -- Michael
On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote: > Done. I haven't touched the timeline switch test patch for now, but I still > include it here for completeness. As 0001 and 0002 have been applied, I have put my hands on 0003, and found a couple of issues upon review. + Assert(slot_name != NULL); + Assert(restart_lsn != NULL); There is no need for those asserts, as we should support the case where the caller gives NULL for those variables. + if (PQserverVersion(conn) < 150000) + return false; Returning false is incorrect for older server versions as we won't fall back to the old method when streaming from an older server. What this needs to do is return true and set restart_lsn to InvalidXLogRecPtr, so that pg_receivewal would just stream from the current flush location. "false" should just be returned on error, with pg_log_error(). +$primary->psql('postgres', + 'INSERT INTO test_table VALUES (generate_series(1,100));'); +$primary->psql('postgres', 'SELECT pg_switch_wal();'); +$nextlsn = + $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();'); +chomp($nextlsn); There is no need to switch twice to a new WAL segment as we just need to be sure that the WAL segment of the restart_lsn is the one archived. Note that RESERVE_WAL uses the last redo point, so it is better to use a checkpoint and reduce the number of logs we stream into the new location. Better to add some --no-sync to the new commands of pg_receivewal, to not stress the I/O more than necessary. I have added some extra -n while on it to avoid loops on failure. Attached is the updated patch I am finishing with, which is rather clean now. I have tweaked a couple of things while on it, and documented better the new GetSlotInformation() in streamutil.c. -- Michael
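The corrected handling of older servers would then look roughly like this sketch of the behavior described above:

    /* Servers older than 15 do not know READ_REPLICATION_SLOT: report
     * success with no position, so that the caller falls back to the
     * current flush location; reserve "false" for actual errors. */
    if (PQserverVersion(conn) < 150000)
    {
        if (restart_lsn)
            *restart_lsn = InvalidXLogRecPtr;
        return true;
    }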
Le lundi 25 octobre 2021, 08:51:23 CEST Michael Paquier a écrit : > On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote: > > Done. I haven't touched the timeline switch test patch for now, but I > > still > > include it here for completeness. > > As 0001 and 0002 have been applied, I have put my hands on 0003, and > found a couple of issues upon review. > > + Assert(slot_name != NULL); > + Assert(restart_lsn != NULL); > There is no need for those asserts, as we should support the case > where the caller gives NULL for those variables. Does it make sense though ? The NULL slot_name case handling is pretty straightforward, as it will be handled by string formatting, but in the case of a null restart_lsn, we have no way of knowing if the command was issued at all. > > + if (PQserverVersion(conn) < 150000) > + return false; > Returning false is incorrect for older server versions as we won't > fall back to the old method when streaming from an older server. What > this needs to do is return true and set restart_lsn to > InvalidXLogRecPtr, so that pg_receivewal would just stream from the > current flush location. "false" should just be returned on error, > with pg_log_error(). Thank you, this was an oversight when moving from the more complicated error handling code. > > +$primary->psql('postgres', > + 'INSERT INTO test_table VALUES (generate_series(1,100));'); > +$primary->psql('postgres', 'SELECT pg_switch_wal();'); > +$nextlsn = > + $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();'); > +chomp($nextlsn); > There is no need to switch twice to a new WAL segment as we just need > to be sure that the WAL segment of the restart_lsn is the one > archived. Note that RESERVE_WAL uses the last redo point, so it is > better to use a checkpoint and reduce the number of logs we stream > into the new location. > > Better to add some --no-sync to the new commands of pg_receivewal, to > not stress the I/O more than necessary. I have added some extra -n > while on it to avoid loops on failure. > > Attached is the updated patch I am finishing with, which is rather > clean now. I have tweaked a couple of things while on it, and > documented better the new GetSlotInformation() in streamutil.c. > -- > Michael -- Ronan Dunklau
On Monday 25 October 2021 00:43:11 CEST, Michael Paquier wrote:
> On Sun, Oct 24, 2021 at 07:20:57PM +0530, Bharath Rupireddy wrote:
> > Thanks. I've no further comments on the v10 patch.
>
> Okay, thanks. I have applied this part, then.

Thank you all for your work on this patch.

--
Ronan Dunklau
On Mon, Oct 25, 2021 at 09:15:32AM +0200, Ronan Dunklau wrote:
> Does it make sense though? The NULL slot_name case handling is pretty
> straightforward, as it will be handled by string formatting, but in the
> case of a null restart_lsn we have no way of knowing if the command was
> issued at all.

If I am following your point, I don't think that it matters much here, and it seems useful to me to be able to pass NULL for both of them, so that one can check whether the slot exists with an API designed this way.

--
Michael
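With both output pointers NULL, the call degenerates into a pure existence check; a hypothetical usage sketch (the slot name and error wording are made up, not from the patch):

    /* Hypothetical: only ask whether the slot can be read at all. */
    if (!GetSlotInformation(conn, "my_slot", NULL, NULL))
        pg_log_error("could not read replication slot \"%s\"", "my_slot");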
On Monday 25 October 2021 09:40:10 CEST, Michael Paquier wrote:
> On Mon, Oct 25, 2021 at 09:15:32AM +0200, Ronan Dunklau wrote:
> > Does it make sense though? The NULL slot_name case handling is pretty
> > straightforward, as it will be handled by string formatting, but in the
> > case of a null restart_lsn we have no way of knowing if the command was
> > issued at all.
>
> If I am following your point, I don't think that it matters much here,
> and it seems useful to me to be able to pass NULL for both of them, so
> that one can check whether the slot exists with an API designed this
> way.

You're right, but I'm afraid we would have to check the server version twice in any case other than the basic pg_receivewal one (once in the function itself, and once before calling it if we want a meaningful result). Maybe we should move the version check outside the GetSlotInformation function to avoid this, and let it fail with a syntax error when the server doesn't support it?

--
Ronan Dunklau
On Mon, Oct 25, 2021 at 09:50:01AM +0200, Ronan Dunklau wrote:
> On Monday 25 October 2021 09:40:10 CEST, Michael Paquier wrote:
>> On Mon, Oct 25, 2021 at 09:15:32AM +0200, Ronan Dunklau wrote:
>>> Does it make sense though? The NULL slot_name case handling is pretty
>>> straightforward, as it will be handled by string formatting, but in the
>>> case of a null restart_lsn we have no way of knowing if the command was
>>> issued at all.
>>
>> If I am following your point, I don't think that it matters much here,
>> and it seems useful to me to be able to pass NULL for both of them, so
>> that one can check whether the slot exists with an API designed this
>> way.
>
> You're right, but I'm afraid we would have to check the server version
> twice in any case other than the basic pg_receivewal one (once in the
> function itself, and once before calling it if we want a meaningful
> result). Maybe we should move the version check outside the
> GetSlotInformation function to avoid this, and let it fail with a syntax
> error when the server doesn't support it?

With the approach taken by the patch, we fall back silently to the previous behavior if we connect to a server <= 14, and rely on the new behavior with a server >= 15, ensuring compatibility. Why would you want to make sure that the command is executed, when we should just enforce that the old behavior is what happens when there is a slot defined and a backend <= 14?

--
Michael
On Monday 25 October 2021 10:10:13 CEST, Michael Paquier wrote:
> On Mon, Oct 25, 2021 at 09:50:01AM +0200, Ronan Dunklau wrote:
> > On Monday 25 October 2021 09:40:10 CEST, Michael Paquier wrote:
> >> On Mon, Oct 25, 2021 at 09:15:32AM +0200, Ronan Dunklau wrote:
> >>> Does it make sense though? The NULL slot_name case handling is pretty
> >>> straightforward, as it will be handled by string formatting, but in the
> >>> case of a null restart_lsn we have no way of knowing if the command was
> >>> issued at all.
> >>
> >> If I am following your point, I don't think that it matters much here,
> >> and it seems useful to me to be able to pass NULL for both of them, so
> >> that one can check whether the slot exists with an API designed this
> >> way.
> >
> > You're right, but I'm afraid we would have to check the server version
> > twice in any case other than the basic pg_receivewal one (once in the
> > function itself, and once before calling it if we want a meaningful
> > result). Maybe we should move the version check outside the
> > GetSlotInformation function to avoid this, and let it fail with a syntax
> > error when the server doesn't support it?
>
> With the approach taken by the patch, we fall back silently to the
> previous behavior if we connect to a server <= 14, and rely on the new
> behavior with a server >= 15, ensuring compatibility. Why would you
> want to make sure that the command is executed, when we should just
> enforce that the old behavior is what happens when there is a slot
> defined and a backend <= 14?

Sorry, I haven't been clear. For the use case of this patch, the current approach is perfect (and we supply the restart_lsn). However, if we want to support the case of "just check if the slot exists", we need to make sure the command is actually executed, and check the version before calling the function, which would make the check execute twice.

What I'm proposing is just that we leave the responsibility of checking PQserverVersion() to the caller, and remove it from GetSlotInformation, i.e.:

-   if (replication_slot != NULL)
+   if (replication_slot != NULL && PQserverVersion(conn) >= 150000)
    {
        if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
                                &stream.timeline))

That way, if we introduce a caller wanting to use this function as an API to check that a slot exists, the usage of checking the server version beforehand will be consistent.

--
Ronan Dunklau
On Mon, Oct 25, 2021 at 10:24:46AM +0200, Ronan Dunklau wrote:
> However, if we want to support the case of "just check if the slot
> exists", we need to make sure the command is actually executed, and check
> the version before calling the function, which would make the check
> execute twice.
>
> What I'm proposing is just that we leave the responsibility of checking
> PQserverVersion() to the caller, and remove it from GetSlotInformation,
> i.e.:
>
> -   if (replication_slot != NULL)
> +   if (replication_slot != NULL && PQserverVersion(conn) >= 150000)
>     {
>         if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
>                                 &stream.timeline))
>
> That way, if we introduce a caller wanting to use this function as an API
> to check that a slot exists, the usage of checking the server version
> beforehand will be consistent.

Ah, good point. My apologies for not following. Indeed, the patch makes this part of the routine a bit blurry. It is fine by me to let the caller do the version check as you propose, while making the routine fail if called directly against an older server.

--
Michael
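Concretely, the routine would then carry no client-side version test at all: a pre-15 server simply rejects the unknown command, and the failure surfaces through the routine's normal error path. A sketch (the slot name is a placeholder; the message wording follows the existing streamutil.c style):

    /* Inside GetSlotInformation(), sketch: no PQserverVersion() gate. */
    PGresult   *res = PQexec(conn, "READ_REPLICATION_SLOT \"my_slot\"");

    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        /*
         * On a pre-15 server this reports the server's syntax error for
         * the unknown READ_REPLICATION_SLOT command.
         */
        pg_log_error("could not send replication command \"%s\": %s",
                     "READ_REPLICATION_SLOT", PQerrorMessage(conn));
        PQclear(res);
        return false;
    }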
On Mon, Oct 25, 2021 at 12:21 PM Michael Paquier <michael@paquier.xyz> wrote:
> Attached is the updated patch I am finishing with, which is rather
> clean now. I have tweaked a couple of things while on it, and
> documented better the new GetSlotInformation() in streamutil.c.

Thanks for the v11 patch, here are some comments:

1) Remove the extra whitespace between "slot" and ":"
+   pg_log_error("could not read replication slot : %s",

2) I think we should tweak the below error message
+   pg_log_error("could not read replication slot \"%s\": %s", slot_name, PQerrorMessage(conn));
to
    pg_log_error("could not read replication slot \"%s\": %s", "IDENTIFY_SYSTEM", PQerrorMessage(conn));
Having the slot name in the error message helps to isolate it from the tons of server logs that get generated.

3) Also, for the same reasons stated above, change the below error message
    pg_log_error("could not read replication slot: got %d rows and %d fields, expected %d rows and %d or more fields",
to
    pg_log_error("could not read replication slot \"%s\": got %d rows and %d fields, expected %d rows and %d or more fields", slot_name, ...

4) Also for the same reasons, change the below
+   pg_log_error("could not parse slot's restart_lsn \"%s\"",
to
    pg_log_error("could not parse replication slot \"%s\" restart_lsn \"%s\"", slot_name, PQgetvalue(res, 0, 1));

5) I think we should also have an assertion for the timeline ID:
    Assert(stream.startpos != InvalidXLogRecPtr);
    Assert(stream.timeline != 0);

6) Why do we need these two assignments?
+   if (*restart_lsn)
+       *restart_lsn = lsn_loc;
+   if (restart_tli != NULL)
+       *restart_tli = tli_loc;

+   /* Assign results if requested */
+   if (restart_lsn)
+       *restart_lsn = lsn_loc;
+   if (restart_tli)
+       *restart_tli = tli_loc;
I think we can just get rid of lsn_loc and tli_loc, initialize *restart_lsn = InvalidXLogRecPtr and *restart_tli = 0 at the start of the function, and directly assign the requested values to *restart_lsn and *restart_tli; also see comment (8).

7) Let's be consistent, change the following
+   if (*restart_lsn)
+       *restart_lsn = lsn_loc;
+   if (restart_tli != NULL)
+       *restart_tli = tli_loc;
to
+   if (restart_lsn)
+       *restart_lsn = lsn_loc;
+   if (restart_tli != NULL)
+       *restart_tli = tli_loc;

8) Let's extract only the values asked for by the caller, change:
+   /* restart LSN */
+   if (!PQgetisnull(res, 0, 1))
+   /* current TLI */
+   if (!PQgetisnull(res, 0, 2))
+       tli_loc = (TimeLineID) atol(PQgetvalue(res, 0, 2));
to
+   /* restart LSN */
+   if (restart_lsn && !PQgetisnull(res, 0, 1))
+   /* current TLI */
+   if (restart_tli && !PQgetisnull(res, 0, 2))

9) 80-char limit crossed:
+GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr *restart_lsn, TimeLineID *restart_tli)

10) Missing word "command", and use "issued to the server", so change the below:
+   <command>READ_REPLICATION_SLOT</command> is issued to retrieve the
to
    <command>READ_REPLICATION_SLOT</command> command is issued to the server to retrieve the

11) Will replication_slot ever be NULL? If it is ever NULL, then we don't reach this far, right? We see pg_log_error("%s needs a slot to be specified using --slot" beforehand. Please remove the below if condition:
+   * server may not support this option.
+   */
+   if (replication_slot != NULL)
We can just add Assert(slot_name); in GetSlotInformation().

12) How about the following:
"If a starting point cannot be calculated with the previous method, the <command>READ_REPLICATION_SLOT</command> command with the provided slot is issued to the server to retrieve the slot's restart_lsn and timeline ID"
instead of
+   and if a replication slot is used, an extra
+   <command>READ_REPLICATION_SLOT</command> is issued to retrieve the
+   slot's <literal>restart_lsn</literal> to use as starting point.
The IDENTIFY_SYSTEM description starts with "If a starting point cannot be calculated...":
    <listitem>
     <para>
      If a starting point cannot be calculated with the previous method,
      the latest WAL flush location is used as reported by the server from
      a <literal>IDENTIFY_SYSTEM</literal> command.

Regards,
Bharath Rupireddy.
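Putting several of these suggestions together, the parsing part of GetSlotInformation() could look roughly like this sketch, assuming the command returns slot_type, restart_lsn and restart_tli in columns 0 to 2 as the snippets above imply (an illustration, not the committed code):

    /*
     * Sketch: res holds the result of READ_REPLICATION_SLOT, already
     * checked for PGRES_TUPLES_OK; slot_name is the function argument.
     */
    uint32      hi,
                lo;
    XLogRecPtr  lsn_loc = InvalidXLogRecPtr;
    TimeLineID  tli_loc = 0;

    if (PQntuples(res) != 1 || PQnfields(res) < 3)
    {
        pg_log_error("could not read replication slot \"%s\": got %d rows and %d fields, expected %d rows and %d fields",
                     slot_name, PQntuples(res), PQnfields(res), 1, 3);
        PQclear(res);
        return false;
    }

    /* restart LSN, only extracted when the caller asked for it (comment 8) */
    if (restart_lsn && !PQgetisnull(res, 0, 1))
    {
        if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
        {
            pg_log_error("could not parse restart_lsn \"%s\" for replication slot \"%s\"",
                         PQgetvalue(res, 0, 1), slot_name);
            PQclear(res);
            return false;
        }
        lsn_loc = ((uint64) hi) << 32 | lo;
    }

    /* timeline of the restart LSN, same treatment */
    if (restart_tli && !PQgetisnull(res, 0, 2))
        tli_loc = (TimeLineID) atol(PQgetvalue(res, 0, 2));

    if (restart_lsn)
        *restart_lsn = lsn_loc;
    if (restart_tli)
        *restart_tli = tli_loc;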
On Mon, Oct 25, 2021 at 02:40:05PM +0530, Bharath Rupireddy wrote:
> 2) I think we should tweak the below error message
> to
> pg_log_error("could not read replication slot \"%s\": %s",
> "IDENTIFY_SYSTEM", PQerrorMessage(conn));
> Having the slot name in the error message helps to isolate it from the
> tons of server logs that get generated.

Yes, this suggestion makes sense.

> 3) Also, for the same reasons stated above, change the below error message
> pg_log_error("could not read replication slot: got %d rows and %d
> fields, expected %d rows and %d or more fields",
> to
> pg_log_error("could not read replication slot \"%s\": got %d rows and
> %d fields, expected %d rows and %d or more fields", slot_name, ...

We can even get rid of "or more" to match the condition used.

> 4) Also for the same reasons, change the below
> + pg_log_error("could not parse slot's restart_lsn \"%s\"",
> to
> pg_log_error("could not parse replication slot \"%s\" restart_lsn \"%s\"",
> slot_name, PQgetvalue(res, 0, 1));

Appending the slot name makes sense.

> 5) I think we should also have an assertion for the timeline ID:
> Assert(stream.startpos != InvalidXLogRecPtr);
> Assert(stream.timeline != 0);

Okay.

> 6) Why do we need these two assignments?
> I think we can just get rid of lsn_loc and tli_loc, initialize
> *restart_lsn = InvalidXLogRecPtr and *restart_tli = 0 at the start of
> the function, and directly assign the requested values to *restart_lsn
> and *restart_tli; also see comment (8).

FWIW, I find the style of the patch easier to follow.

> 9) 80-char limit crossed:
> +GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr
> *restart_lsn, TimeLineID *restart_tli)

pgindent says nothing.

> 10) Missing word "command", and use "issued to the server", so change the below:
> + <command>READ_REPLICATION_SLOT</command> is issued to retrieve the
> to
> + <command>READ_REPLICATION_SLOT</command> command is issued to
> the server to retrieve the

Okay.

> 11) Will replication_slot ever be NULL? If it is ever NULL, then we
> don't reach this far, right? We see pg_log_error("%s needs a slot
> to be specified using --slot" beforehand. Please remove the below if
> condition:
> + * server may not support this option.

Did you notice that this applies when creating or dropping a slot, for code paths entirely different from what we are dealing with here?

--
Michael
On Mon, Oct 25, 2021 at 4:19 PM Michael Paquier <michael@paquier.xyz> wrote:
> > 6) Why do we need these two assignments?
> > I think we can just get rid of lsn_loc and tli_loc, initialize
> > *restart_lsn = InvalidXLogRecPtr and *restart_tli = 0 at the start of
> > the function, and directly assign the requested values to *restart_lsn
> > and *restart_tli; also see comment (8).
>
> FWIW, I find the style of the patch easier to follow.

Then, please change if (*restart_lsn) and if (*restart_tli) to if (restart_lsn) and if (restart_tli), just to be consistent with the other parts of the patch and the existing code of RunIdentifySystem():

    if (*restart_lsn)
        *restart_lsn = lsn_loc;
    if (restart_tli != NULL)
        *restart_tli = tli_loc;

> > 11) Will replication_slot ever be NULL? If it is ever NULL, then we
> > don't reach this far, right? We see pg_log_error("%s needs a slot
> > to be specified using --slot" beforehand. Please remove the below if
> > condition:
> > + * server may not support this option.
>
> Did you notice that this applies when creating or dropping a slot, for
> code paths entirely different from what we are dealing with here?

StreamLog() isn't reached for the create and drop slot cases, see [1]. I suggest removing replication_slot != NULL and having Assert(slot_name) in GetSlotInformation():

    /*
     * Try to get the starting point from the slot. This is supported in
     * PostgreSQL 15 and up.
     */
    if (PQserverVersion(conn) >= 150000)
    {
        if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
                                &stream.timeline))
        {
            /* Error is logged by GetSlotInformation() */
            return;
        }
    }

Here is another comment on the patch: remove the extra new line above the GetSlotInformation() definition:

    return true;
 }
+                 <----- REMOVE THIS
+/*
+ * Run READ_REPLICATION_SLOT through a given connection and give back to

Apart from the above, the v12 patch LGTM.

[1]
    /*
     * Drop a replication slot.
     */
    if (do_drop_slot)
    {
        if (verbose)
            pg_log_info("dropping replication slot \"%s\"", replication_slot);

        if (!DropReplicationSlot(conn, replication_slot))
            exit(1);
        exit(0);
    }

    /* Create a replication slot */
    if (do_create_slot)
    {
        if (verbose)
            pg_log_info("creating replication slot \"%s\"", replication_slot);

        if (!CreateReplicationSlot(conn, replication_slot, NULL, false, true,
                                   false, slot_exists_ok, false))
            exit(1);
        exit(0);
    }

Regards,
Bharath Rupireddy.
On Mon, Oct 25, 2021 at 05:46:57PM +0530, Bharath Rupireddy wrote:
> StreamLog() isn't reached for the create and drop slot cases, see [1]. I
> suggest removing replication_slot != NULL and having Assert(slot_name)
> in GetSlotInformation():
> /*
>  * Try to get the starting point from the slot. This is supported in
>  * PostgreSQL 15 and up.
>  */
> if (PQserverVersion(conn) >= 150000)
> {
>     if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
>                             &stream.timeline))
>     {
>         /* Error is logged by GetSlotInformation() */
>         return;
>     }
> }

Please note that it is possible to use pg_receivewal without a slot, which is the default case, so we cannot do what you are suggesting here. An assertion on slot_name in GetSlotInformation() is not that helpful either, in my opinion, as we would just crash a couple of lines down the road.

I have changed the patch per Ronan's suggestion to move the version check out of GetSlotInformation(), addressed what you have reported, and the result looked good. So I have applied this part.

What remains on this thread is the addition of new tests to make sure that pg_receivewal is able to follow a timeline switch. Now that we can restart from a slot, that should be a bit easier to implement as a test by creating a slot on a standby. Ronan, are you planning to send a new patch for this part?

--
Michael
On Tuesday 26 October 2021 03:15:40 CEST, Michael Paquier wrote:
> I have changed the patch per Ronan's suggestion to move the version
> check out of GetSlotInformation(), addressed what you have reported,
> and the result looked good. So I have applied this part.

Thanks!

> What remains on this thread is the addition of new tests to make sure
> that pg_receivewal is able to follow a timeline switch. Now that we
> can restart from a slot, that should be a bit easier to implement as
> a test by creating a slot on a standby. Ronan, are you planning to
> send a new patch for this part?

Yes, I will try to simplify the logic of the patch I sent last week. I'll keep you posted here soon.

--
Ronan Dunklau
On Tuesday 26 October 2021 08:27:47 CEST, Ronan Dunklau wrote:
> Yes, I will try to simplify the logic of the patch I sent last week. I'll
> keep you posted here soon.

I was able to simplify it quite a bit, by using only one standby for both test scenarios.

This test case verifies that after a timeline switch, if we resume from a previous state, we will archive:
- segments from the old timeline
- segments from the new timeline
- the timeline history file itself

I chose to check against a full segment from the previous timeline, but it would have been possible to check that the latest timeline segment was partial. I chose not to, in the unlikely event we promote at an exact segment boundary. I don't think it matters much, since partial WAL files are already covered by other tests.

--
Ronan Dunklau
At Tue, 26 Oct 2021 11:01:46 +0200, Ronan Dunklau <ronan.dunklau@aiven.io> wrote in
> On Tuesday 26 October 2021 08:27:47 CEST, Ronan Dunklau wrote:
> > Yes, I will try to simplify the logic of the patch I sent last week. I'll
> > keep you posted here soon.
>
> I was able to simplify it quite a bit, by using only one standby for both
> test scenarios.
>
> This test case verifies that after a timeline switch, if we resume from a
> previous state, we will archive:
> - segments from the old timeline
> - segments from the new timeline
> - the timeline history file itself
>
> I chose to check against a full segment from the previous timeline, but it
> would have been possible to check that the latest timeline segment was
> partial. I chose not to, in the unlikely event we promote at an exact
> segment boundary. I don't think it matters much, since partial WAL files
> are already covered by other tests.

+my @walfiles = glob "$slot_dir/*";

This is not used.

Each pg_receivewal run stalls for about 10 or more seconds before finishing, which is not great given how test run times have been increasing lately.

Maybe we want to advance the LSN a bit after taking $nextlsn, then pass "-s 1" to pg_receivewal.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
At Wed, 27 Oct 2021 11:17:28 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> Each pg_receivewal run stalls for about 10 or more seconds before
> finishing, which is not great given how test run times have been
> increasing lately.
>
> Maybe we want to advance the LSN a bit after taking $nextlsn, then pass
> "-s 1" to pg_receivewal.

Hmm. Sorry, my fingers slipped. We don't need '-s 1' for this purpose.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On Wednesday 27 October 2021 04:17:28 CEST, Kyotaro Horiguchi wrote:
> +my @walfiles = glob "$slot_dir/*";
>
> This is not used.

Sorry, fixed in the attached version.

> Each pg_receivewal run stalls for about 10 or more seconds before
> finishing, which is not great given how test run times have been
> increasing lately.
> Maybe we want to advance the LSN a bit after taking $nextlsn, then pass
> "-s 1" to pg_receivewal.

I incorrectly assumed it was due to the promotion time, without looking into it. In fact, you're right: the LSN was not incremented after we fetched the end LSN, and thus we would wait for quite a while. I fixed that too.

Thank you for the review!

--
Ronan Dunklau
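The stall comes from how the stop condition works: pg_receivewal only exits once it has received WAL past --endpos, so when nothing is written after $nextlsn is taken, there is nothing left to receive and the tool just waits. A simplified sketch of the check, modeled on the stop_streaming() callback in pg_receivewal.c (not a verbatim copy):

    #include "postgres_fe.h"
    #include "access/xlogdefs.h"    /* XLogRecPtr, InvalidXLogRecPtr */

    static XLogRecPtr endpos;       /* the --endpos value, file scope */

    static bool
    stop_streaming(XLogRecPtr xlogpos, uint32 timeline, bool segment_finished)
    {
        /*
         * Stop only after receiving a position past endpos: if no WAL is
         * generated after --endpos was computed, this never triggers and
         * the run stalls in the streaming loop.
         */
        if (endpos != InvalidXLogRecPtr && endpos < xlogpos)
            return true;
        return false;
    }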
On Wednesday 27 October 2021 10:00:40 CEST, Ronan Dunklau wrote:
> On Wednesday 27 October 2021 04:17:28 CEST, Kyotaro Horiguchi wrote:
> > +my @walfiles = glob "$slot_dir/*";
> >
> > This is not used.
>
> Sorry, fixed in the attached version.
>
> > Each pg_receivewal run stalls for about 10 or more seconds before
> > finishing, which is not great given how test run times have been
> > increasing lately.
> >
> > Maybe we want to advance the LSN a bit after taking $nextlsn, then pass
> > "-s 1" to pg_receivewal.
>
> I incorrectly assumed it was due to the promotion time, without looking
> into it. In fact, you're right: the LSN was not incremented after we
> fetched the end LSN, and thus we would wait for quite a while. I fixed
> that too.
>
> Thank you for the review!

Sorry, I sent an intermediate version of the patch; here is the correct one.

--
Ronan Dunklau
On Thursday 28 October 2021 14:31:36 CEST, Michael Paquier wrote:
> On Wed, Oct 27, 2021 at 10:11:00AM +0200, Ronan Dunklau wrote:
> > Sorry, I sent an intermediate version of the patch; here is the correct
> > one.
>
> While looking at this patch, I have figured out a simple way to make
> the tests faster without impacting the coverage. The size of the WAL
> segments archived is a bit of a bottleneck, as they need to be zeroed
> by pg_receivewal at creation. This ends up being a waste of time, even
> if we don't flush the data. So I'd like to switch the test to use a
> segment size of 1MB, first.
>
> A second thing is that we copy more segments than really needed when
> using the slot's restart_lsn as starting point, as RESERVE_WAL would
> use the current redo location, so it seems to me that a checkpoint is
> in order before the slot creation. A third thing is that generating
> some extra data after the end LSN we want to use makes the test much
> faster at the end.
>
> With those three methods combined, the test goes down from 11s to 9s
> here. Attached is a patch I'd like to apply to make the test cheaper.

Interesting ideas, thanks. For the record, the time drops from ~4.5s to 3s on average on my machine.

I think if you reduce the size of the generate_series batches, this should probably be reduced everywhere. Given what we do, though, inserting a single row should work just as well; I wonder why we insist on inserting a hundred rows? I updated your patch with that small modification; it also makes the code less verbose.

> I also had a look at your patch. Here are some comments.
>
> +# Cleanup the previous stream directories to reuse them
> +unlink glob "'${stream_dir}/*'";
> +unlink glob "'${slot_dir}/*'";
> I think that we'd better choose a different location for the archives.
> Keeping the segments of the previous tests is useful for debugging if
> a previous step of the test failed.

Ok.

> +$standby->psql('',
> +    "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
> +    replication => 1);
> Here as well we could use a restart point to reduce the number of
> segments archived.

The restart point should be very close, as we don't generate any activity on the primary between the backup and the slot's creation. I'm not sure adding the complexity of triggering a checkpoint on the primary and waiting for the standby to catch up on it would be that useful.

> +# Now, try to resume after the promotion, from the folder.
> +$standby->command_ok(
> +    [ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
> +      '--slot', $folder_slot, '--no-sync' ],
> +    "Stream some wal after promoting, resuming from the folder's position");
> What is called the "resume-from-folder" test in the patch is IMO too
> costly and brings little extra value, requiring two commands of
> pg_receivewal (one to populate the folder and one to test the TLI jump
> from the previously-populated point) to test basically the same thing
> as when the starting point is taken from the slot. Except that
> restarting from the slot is twice cheaper. The point of the
> resume-from-folder case is to make sure that we restart from the point
> of the archive folder rather than the slot's restart_lsn, but your
> test fails this part, in fact, because the first command populating
> the archive folder also uses "--slot $folder_slot", updating the
> slot's restart_lsn before the second pg_receivewal uses this slot
> again.
>
> I think, as a whole, that testing for the case where an archive folder
> is populated (without a slot!), followed by a second command where we
> use a slot that has a restart_lsn older than the archive's location,
> is not that interesting. If we include such a test, there is no need
> to include that within the TLI jump part, in my opinion. So I think
> that we had better drop this part of the patch, and keep only the case
> where we resume from a slot for the TLI jump.

You're right about the test not being that interesting in that case. I thought it would be worthwhile to test both cases, but resume-from-folder doesn't actually exercise any code that was not already called before (timeline computation from the segment name).

> The commands of pg_receivewal included in the test had better use -n,
> so that there is no infinite loop on failure.

Ok.

--
Ronan Dunklau
On Friday 29 October 2021 04:27:51 CEST, Michael Paquier wrote:
> On Thu, Oct 28, 2021 at 03:55:12PM +0200, Ronan Dunklau wrote:
> > Interesting ideas, thanks. For the record, the time drops from ~4.5s to
> > 3s on average on my machine.
> > I think if you reduce the size of the generate_series batches, this
> > should probably be reduced everywhere. Given what we do, though,
> > inserting a single row should work just as well; I wonder why we insist
> > on inserting a hundred rows? I updated your patch with that small
> > modification; it also makes the code less verbose.
>
> Thanks for the extra numbers. I have added your suggestions, switching
> the dummy table to use a primary key with different values while on it,
> as there is an argument that it makes debugging easier, and applied the
> speedup patch.

Thanks!

> >> +$standby->psql('',
> >> +    "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
> >> +    replication => 1);
> >> Here as well we could use a restart point to reduce the number of
> >> segments archived.
> >
> > The restart point should be very close, as we don't generate any
> > activity on the primary between the backup and the slot's creation. I'm
> > not sure adding the complexity of triggering a checkpoint on the primary
> > and waiting for the standby to catch up on it would be that useful.
>
> Yes, you are right here. The base backup taken from the primary at this
> point ensures a fresh point.
>
> +# This test is split in two, using the same standby: one test checks the
> +# resume-from-folder case, the other the resume-from-slot one.
> This comment needs a refresh, as the resume-from-folder case is no more.

Done.

> +$standby->psql(
> +    'postgres',
> +    "SELECT pg_promote(wait_seconds => 300)");
> This could be $standby->promote.

Oh, I didn't know about that.

> +# Switch WAL to make sure it is not a partial file but a complete segment.
> +$primary->psql('postgres', 'INSERT INTO test_table VALUES (1);');
> +$primary->psql('postgres', 'SELECT pg_switch_wal();');
> +$primary->wait_for_catchup($standby, 'replay', $primary->lsn('write'));
> This INSERT needs a slight change to adapt to the primary key of the
> table.

This one is on me :p Done.

> Anyway, is this first segment switch really necessary? From the data
> archived by pg_receivewal in the command testing the TLI jump, we finish
> with the following contents (contents generated after fixing the three
> INSERTs):
>
> 00000001000000000000000B
> 00000001000000000000000C
> 00000002000000000000000D
> 00000002000000000000000E.partial
> 00000002.history
>
> So, even if we don't do the first switch, we'd still have one completed
> segment on the previous timeline, before switching to the new timeline
> and the next segment (pg_receivewal is a bit inconsistent with the
> backend here, by the way, as the first segment on the new timeline would
> map to the last segment of the old timeline, but here we have a clean
> switch as of stop_streaming in pg_receivewal.c).

The first completed segment on the previous timeline comes from the fact that we stream from the restart point. I removed the switch to use the WAL file name of the replication slot's restart point instead. This means querying both the standby (to get the replication slot's restart_lsn) and the primary (to have access to pg_walfile_name). We could use a single query on the primary (using the primary's checkpoint LSN instead), but it feels a bit convoluted just to avoid a query on the standby.

--
Ronan Dunklau
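For reference, the computation behind pg_walfile_name() is just a couple of macros from the backend headers; a small sketch of deriving the names the test looks for (the wrapper function is hypothetical, the macros are PostgreSQL's):

    #include "postgres_fe.h"
    #include "access/xlog_internal.h"   /* XLByteToSeg, XLogFileName,
                                         * TLHistoryFileName, MAXFNAMELEN */

    /* Hypothetical helper: names derived from the slot's restart point. */
    static void
    expected_archive_names(XLogRecPtr restart_lsn, TimeLineID tli, int wal_segsz)
    {
        char        walname[MAXFNAMELEN];
        char        histname[MAXFNAMELEN];
        XLogSegNo   segno;

        XLByteToSeg(restart_lsn, segno, wal_segsz);     /* LSN -> segment number */
        XLogFileName(walname, tli, segno, wal_segsz);   /* e.g. 00000001000000000000000B */
        TLHistoryFileName(histname, tli + 1);           /* e.g. 00000002.history */
    }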