Thread: pg_recvlogical prints bogus error when interrupted
Hi, While reviewing https://postgr.es/m/CAD21AoBe2o2D%3Dxyycsxw2bQOD%3DzPj7ETuJ5VYGN%3DdpoTiCMRJQ%40mail.gmail.com I noticed that pg_recvlogical prints "pg_recvlogical: error: unexpected termination of replication stream: " when signalled with SIGINT/TERM. Oddly enough, that looks to have "always" been the case, even though clearly the code tried to make provisions for a different outcome. It looks to me like all that's needed is to gate the block printing the message with an !time_to_abort. I also then noticed that we don't fsync the output file in cases of errors - that seems wrong to me? Looks to me like that block should be moved till after the error:? Greetings, Andres Freund
On Thu, Oct 20, 2022 at 3:10 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > While reviewing > https://postgr.es/m/CAD21AoBe2o2D%3Dxyycsxw2bQOD%3DzPj7ETuJ5VYGN%3DdpoTiCMRJQ%40mail.gmail.com > I noticed that pg_recvlogical prints > "pg_recvlogical: error: unexpected termination of replication stream: " > > when signalled with SIGINT/TERM. > > Oddly enough, that looks to have "always" been the case, even though clearly > the code tried to make provisions for a different outcome. > > > It looks to me like all that's needed is to gate the block printing the > message with an !time_to_abort. +1. How about emitting a message like its friend pg_receivewal, like the attached patch? > I also then noticed that we don't fsync the output file in cases of errors - > that seems wrong to me? Looks to me like that block should be moved till after > the error:? How about something like the attached patch? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
At Thu, 20 Oct 2022 13:28:45 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > On Thu, Oct 20, 2022 at 3:10 AM Andres Freund <andres@anarazel.de> wrote: > > > > Hi, > > > > While reviewing > > https://postgr.es/m/CAD21AoBe2o2D%3Dxyycsxw2bQOD%3DzPj7ETuJ5VYGN%3DdpoTiCMRJQ%40mail.gmail.com > > I noticed that pg_recvlogical prints > > "pg_recvlogical: error: unexpected termination of replication stream: " > > > > when signalled with SIGINT/TERM. > > > > Oddly enough, that looks to have "always" been the case, even though clearly > > the code tried to make provisions for a different outcome. > > > > > > It looks to me like all that's needed is to gate the block printing the > > message with an !time_to_abort. +1 > +1. How about emitting a message like its friend pg_receivewal, like > the attached patch? I'm not a fan of treating SIGINT as an error in this case. It calls prepareToTerminate() when time_to_abort and everything goes fine after then. So I think we should do the same thing after receiving an interrupt. This also does file-sync naturally as a part of normal shutdown. I'm also not a fan of doing fsync at error. > > I also then noticed that we don't fsync the output file in cases of errors - > > that seems wrong to me? Looks to me like that block should be moved till after > > the error:? > > How about something like the attached patch? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
On Fri, Oct 21, 2022 at 7:52 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > +1. How about emitting a message like its friend pg_receivewal, like > > the attached patch? > > I'm not a fan of treating SIGINT as an error in this case. It calls > prepareToTerminate() when time_to_abort and everything goes fine after > then. So I think we should do the same thing after receiving an > interrupt. This also does file-sync naturally as a part of normal > shutdown. I'm also not a fan of doing fsync at error. I think the pg_recvlogical can gracefully exit on both SIGINT and SIGTERM to keep things simple. > > > I also then noticed that we don't fsync the output file in cases of errors - > > > that seems wrong to me? Looks to me like that block should be moved till after > > > the error:? > > > > How about something like the attached patch? The attached patch (pg_recvlogical_graceful_interrupt.text) has a couple of problems, I believe. We're losing prepareToTerminate() with keepalive true and we're not skipping pg_log_error("unexpected termination of replication stream: %s" upon interrupt, after all we're here discussing how to avoid it. I came up with the attached v2 patch, please have a look. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Mon, Oct 24, 2022 at 8:15 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Fri, Oct 21, 2022 at 7:52 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > > > > +1. How about emitting a message like its friend pg_receivewal, like > > > the attached patch? > > > > I'm not a fan of treating SIGINT as an error in this case. It calls > > prepareToTerminate() when time_to_abort and everything goes fine after > > then. So I think we should do the same thing after receiving an > > interrupt. This also does file-sync naturally as a part of normal > > shutdown. I'm also not a fan of doing fsync at error. > > I think the pg_recvlogical can gracefully exit on both SIGINT and > SIGTERM to keep things simple. > > > > > I also then noticed that we don't fsync the output file in cases of errors - > > > > that seems wrong to me? Looks to me like that block should be moved till after > > > > the error:? > > > > > > How about something like the attached patch? > > The attached patch (pg_recvlogical_graceful_interrupt.text) has a > couple of problems, I believe. We're losing prepareToTerminate() with > keepalive true and we're not skipping pg_log_error("unexpected > termination of replication stream: %s" upon interrupt, after all we're > here discussing how to avoid it. > > I came up with the attached v2 patch, please have a look. FWIW, I added it to CF - https://commitfest.postgresql.org/40/3966/. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 2022-10-24 08:15:11 +0530, Bharath Rupireddy wrote: > I came up with the attached v2 patch, please have a look. Thanks for working on this! > + /* When we get SIGINT/SIGTERM, we exit */ > + if (ready_to_exit) > + { > + /* > + * Try informing the server about our exit, but don't wait around > + * or retry on failure. > + */ > + (void) PQputCopyEnd(conn, NULL); > + (void) PQflush(conn); > + time_to_abort = ready_to_exit; This doesn't strike me as great - because the ready_to_exit isn't checked in the loop around StreamLogicalLog(), we'll reconnect if something else causes StreamLogicalLog() to return. Why do we need both time_to_abort and ready_to_exit? Perhaps worth noting that time_to_abort is still an sig_atomic_t, but isn't modified in a signal handler, which seems a bit unnecessarily confusing. Greetings, Andres Freund
On Fri, Oct 28, 2022 at 4:41 AM Andres Freund <andres@anarazel.de> wrote: > > On 2022-10-24 08:15:11 +0530, Bharath Rupireddy wrote: > > > > + /* When we get SIGINT/SIGTERM, we exit */ > > + if (ready_to_exit) > > + { > > + /* > > + * Try informing the server about our exit, but don't wait around > > + * or retry on failure. > > + */ > > + (void) PQputCopyEnd(conn, NULL); > > + (void) PQflush(conn); > > + time_to_abort = ready_to_exit; > > This doesn't strike me as great - because the ready_to_exit isn't checked in > the loop around StreamLogicalLog(), we'll reconnect if something else causes > StreamLogicalLog() to return. Fixed. > Why do we need both time_to_abort and ready_to_exit? Intention to have ready_to_exit is to be able to distinguish between SIGINT/SIGTERM and aborting when endpos is reached so that necessary code is skipped/executed and proper logs are printed. > Perhaps worth noting that > time_to_abort is still an sig_atomic_t, but isn't modified in a signal > handler, which seems a bit unnecessarily confusing. time_to_abort is just a static variable, no? +static bool time_to_abort = false; +static volatile sig_atomic_t ready_to_exit = false; Please see the attached v3 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: not tested Hello The patch applies and tests fine. I like the way to have both ready_to_exit and time_to_abort variables to control the exitsequence. I think the (void) cast can be removed in front of PQputCopyEnd(), PQflush for consistency purposes as it doesnot give warnings and everywhere else does not have those casts. thank you Cary
On Mon, Oct 24, 2022 at 08:15:11AM +0530, Bharath Rupireddy wrote: > The attached patch (pg_recvlogical_graceful_interrupt.text) has a > couple of problems, I believe. We're losing prepareToTerminate() with > keepalive true and we're not skipping pg_log_error("unexpected > termination of replication stream: %s" upon interrupt, after all we're > here discussing how to avoid it. > > I came up with the attached v2 patch, please have a look. This thread has slipped through the feature freeze deadline. Would people be OK to do something now on HEAD? A backpatch is also in order, IMO, as the current behavior looks confusing under SIGINT and SIGTERM. -- Michael
Attachment
On Tue, Apr 11, 2023 at 11:42 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Oct 24, 2022 at 08:15:11AM +0530, Bharath Rupireddy wrote: > > The attached patch (pg_recvlogical_graceful_interrupt.text) has a > > couple of problems, I believe. We're losing prepareToTerminate() with > > keepalive true and we're not skipping pg_log_error("unexpected > > termination of replication stream: %s" upon interrupt, after all we're > > here discussing how to avoid it. > > > > I came up with the attached v2 patch, please have a look. > > This thread has slipped through the feature freeze deadline. Would > people be OK to do something now on HEAD? A backpatch is also in > order, IMO, as the current behavior looks confusing under SIGINT and > SIGTERM. IMO, +1 for HEAD/PG16 and +0.5 for backpatching as it may not be so critical to backpatch all the way down. What may happen without this patch is that the output file isn't fsync-ed upon SIGINT/SIGTERM. Well, is it a critical issue on production servers? On Fri, Apr 7, 2023 at 5:12 AM Cary Huang <cary.huang@highgo.ca> wrote: > > The following review has been posted through the commitfest application: > > The patch applies and tests fine. I like the way to have both ready_to_exit and time_to_abort variables to control theexit sequence. I think the (void) cast can be removed in front of PQputCopyEnd(), PQflush for consistency purposes asit does not give warnings and everywhere else does not have those casts. Thanks for reviewing. I removed the (void) casts like elsewhere in the code, however, I didn't change such casts in prepareToTerminate() to not create a diff. I'm attaching the v4 patch for further review. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, Apr 27, 2023 at 11:24:52AM +0530, Bharath Rupireddy wrote: > IMO, +1 for HEAD/PG16 and +0.5 for backpatching as it may not be so > critical to backpatch all the way down. What may happen without this > patch is that the output file isn't fsync-ed upon SIGINT/SIGTERM. > Well, is it a critical issue on production servers? It is also true that it's been this way for years with basically nobody complaining outside this thread. So there is also an argument about leaving v16 out of the picture, and do that only in 17~ to avoid playing with the branch stability more than necessary? I see 7 open items as of today, and there are TAP tests linked to pg_recvlogical. That should be OK, because none of these tests rely on specific signals, but the buildfarm has some weird setups, as well.. -- Michael
Attachment
On Thu Apr 27, 2023 at 12:54 AM CDT, Bharath Rupireddy wrote: > Thanks for reviewing. I removed the (void) casts like elsewhere in the > code, however, I didn't change such casts in prepareToTerminate() to > not create a diff. > > I'm attaching the v4 patch for further review. Bharath, I signed up to review the patch for the commitfest. The patch looks pretty good to me, but I would like to come to a conclusion on what Andres posted earlier in the thread. > Why do we need both time_to_abort and ready_to_exit? I am trying to understand why we need both as well. Maybe I am missing something important :). > /* It is not unexepected termination error when Ctrl-C'ed. */ My only other comment is that it would be nice to have the word "an" before unexpected. -- Tristan Partin Neon (https://neon.tech)
On Thu, Jul 06, 2023 at 10:29:10AM -0500, Tristan Partin wrote: > On Thu Apr 27, 2023 at 12:54 AM CDT, Bharath Rupireddy wrote: >> Why do we need both time_to_abort and ready_to_exit? > > I am trying to understand why we need both as well. Maybe I am missing > something important :). As StreamLogicalLog() states once it leaves its main loop because time_to_abort has been switched to true, we want a clean exit. I think that this patch is just a more complicated way to avoid doing twice the operations done by prepareToTerminate(). So how about moving the prepareToTerminate() call outside the main streaming loop and call it when time_to_abort is true? Then, I would suggest to change the keepalive argument of prepareToTerminate() to an enum able to handle three values to log the reason why the tool is stopping: the end of WAL, an interruption or a keepalive when logging. There are two of them now, but we want a third mode for the signals. > > /* It is not unexepected termination error when Ctrl-C'ed. */ > > My only other comment is that it would be nice to have the word "an" > before unexpected. s/unexepected/unexpected/. Still, it seems to me that we don't need this comment. -- Michael
Attachment
On Mon, Jul 10, 2023 at 01:44:45PM +0900, Michael Paquier wrote: > As StreamLogicalLog() states once it leaves its main loop because > time_to_abort has been switched to true, we want a clean exit. I > think that this patch is just a more complicated way to avoid doing > twice the operations done by prepareToTerminate(). So how about > moving the prepareToTerminate() call outside the main streaming loop > and call it when time_to_abort is true? Then, I would suggest to > change the keepalive argument of prepareToTerminate() to an enum able > to handle three values to log the reason why the tool is stopping: the > end of WAL, an interruption or a keepalive when logging. There are > two of them now, but we want a third mode for the signals. It took me some time to come back to this one, but attached is what I had in mind. This stuff has three reasons to stop: keepalive, end LSN or signal. This makes the code easier to follow. Thoughts or comments? -- Michael
Attachment
On Wed, Jul 19, 2023 at 8:04 AM Michael Paquier <michael@paquier.xyz> wrote: > > It took me some time to come back to this one, but attached is what I > had in mind. This stuff has three reasons to stop: keepalive, end LSN > or signal. This makes the code easier to follow. > > Thoughts or comments? Thanks. I have some comments on v5: 1. I don't think we need a stop_lsn variable, the cur_record_lsn can help if it's defined outside the loop. With this, the patch can further be simplified as attached v6. 2. And, I'd prefer not to assume the stop_reason as signal by default. While it works for now because the remaining place where time_to_abort is set to true is only in the signal handler, it is not extensible, if there's another exit condition comes in future that sets time_to_abort to true. 3. pg_log_info("end position %X/%X reached on signal", .... For signal, end position is a bit vague wording and I think we can just say pg_log_info("received interrupt signal, exiting"); like pg_receivewal. We really can't have a valid stop_lsn for signal exit because that depends on when signal arrives in the while loop. If at all, someone wants to know the last received LSN - they can look at the other messages that pg_recvlogical emits - pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 (slot myslot). 4. With v5, it was taking a while to exit after the first CTRL+C, see multiple CTRL+Cs at the end: ubuntu::~/postgres/inst/bin$ ./pg_recvlogical --slot=lsub1_repl_slot --file=lsub1.data --start --verbose pg_recvlogical: starting log streaming at 0/0 (slot lsub1_repl_slot) pg_recvlogical: streaming initiated pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot lsub1_repl_slot) pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 (slot lsub1_repl_slot) pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 (slot lsub1_repl_slot) pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 (slot lsub1_repl_slot) ^Cpg_recvlogical: end position 0/2867D70 reached on signal ^C^C^C^C^C^C^C^C^C^C^C^C ^C^C^C^C^C^C^C^C^C^C^C^C 5. FWIW, on HEAD we'd get the following and the patch emits a better messaging: ubuntu:~/postgres/inst/bin$ ./pg_recvlogical --slot=lsub1_repl_slot --file=lsub1.data --start --dbname='host=localhost port=5432 dbname=postgres user=ubuntu' --verbose pg_recvlogical: starting log streaming at 0/0 (slot lsub1_repl_slot) pg_recvlogical: streaming initiated pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot lsub1_repl_slot) pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 (slot lsub1_repl_slot) pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 (slot lsub1_repl_slot) pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 (slot lsub1_repl_slot) ^Cpg_recvlogical: error: unexpected termination of replication stream: Attaching v6 patch with the above changes to v6. Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Wed, Jul 19, 2023 at 12:07 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > 4. With v5, it was taking a while to exit after the first CTRL+C, see > multiple CTRL+Cs at the end: > ubuntu::~/postgres/inst/bin$ ./pg_recvlogical --slot=lsub1_repl_slot > --file=lsub1.data --start --verbose > pg_recvlogical: starting log streaming at 0/0 (slot lsub1_repl_slot) > pg_recvlogical: streaming initiated > pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot lsub1_repl_slot) > pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 > (slot lsub1_repl_slot) > pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 > (slot lsub1_repl_slot) > pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 > (slot lsub1_repl_slot) > ^Cpg_recvlogical: end position 0/2867D70 reached on signal > ^C^C^C^C^C^C^C^C^C^C^C^C > ^C^C^C^C^C^C^C^C^C^C^C^C I think the delay is expected for the reason specified below and is not because of any of the changes in v5. As far as CTRL+C is concerned, it is a clean exit and hence we can't escape the while(1) loop. /* * We're doing a client-initiated clean exit and have sent CopyDone to * the server. Drain any messages, so we don't miss a last-minute * ErrorResponse. The walsender stops generating XLogData records once * it sees CopyDone, so expect this to finish quickly. After CopyDone, * it's too late for sendFeedback(), even if this were to take a long * time. Hence, use synchronous-mode PQgetCopyData(). */ while (1) { -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Jul 19, 2023 at 12:07:21PM +0530, Bharath Rupireddy wrote: > 1. I don't think we need a stop_lsn variable, the cur_record_lsn can > help if it's defined outside the loop. With this, the patch can > further be simplified as attached v6. Okay by me. > 2. And, I'd prefer not to assume the stop_reason as signal by default. > While it works for now because the remaining place where time_to_abort > is set to true is only in the signal handler, it is not extensible, if > there's another exit condition comes in future that sets time_to_abort > to true. Yeah, I was also wondering whether this should be set by the signal handler in this case while storing the reason statically on a specific sig_atomic_t. > 3. pg_log_info("end position %X/%X reached on signal", .... For > signal, end position is a bit vague wording and I think we can just > say pg_log_info("received interrupt signal, exiting"); like > pg_receivewal. We really can't have a valid stop_lsn for signal exit > because that depends on when signal arrives in the while loop. If at > all, someone wants to know the last received LSN - they can look at > the other messages that pg_recvlogical emits - pg_recvlogical: > confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 (slot myslot). + case STREAM_STOP_SIGNAL: + pg_log_info("received interrupt signal, exiting"); + break; Still it is useful to report the location we have finished with when stopping on a signal, no? Why couldn't we use "lsn" here, aka cur_record_lsn? > 4. With v5, it was taking a while to exit after the first CTRL+C, see > multiple CTRL+Cs at the end: > ubuntu::~/postgres/inst/bin$ ./pg_recvlogical --slot=lsub1_repl_slot > --file=lsub1.data --start --verbose > pg_recvlogical: starting log streaming at 0/0 (slot lsub1_repl_slot) > pg_recvlogical: streaming initiated > pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot lsub1_repl_slot) > pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 > (slot lsub1_repl_slot) > pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 > (slot lsub1_repl_slot) > pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 > (slot lsub1_repl_slot) > ^Cpg_recvlogical: end position 0/2867D70 reached on signal > ^C^C^C^C^C^C^C^C^C^C^C^C > ^C^C^C^C^C^C^C^C^C^C^C^C Isn't that where we'd need to look at a long change but we need to stop cleanly? That sounds expected to me. -- Michael
Attachment
On Wed, Jul 19, 2023 at 1:41 PM Michael Paquier <michael@paquier.xyz> wrote: > > > 3. pg_log_info("end position %X/%X reached on signal", .... For > > signal, end position is a bit vague wording and I think we can just > > say pg_log_info("received interrupt signal, exiting"); like > > pg_receivewal. We really can't have a valid stop_lsn for signal exit > > because that depends on when signal arrives in the while loop. If at > > all, someone wants to know the last received LSN - they can look at > > the other messages that pg_recvlogical emits - pg_recvlogical: > > confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 (slot myslot). > > + case STREAM_STOP_SIGNAL: > + pg_log_info("received interrupt signal, exiting"); > + break; > > Still it is useful to report the location we have finished with when > stopping on a signal, no? Why couldn't we use "lsn" here, aka > cur_record_lsn? Printing LSN on signal exit won't be correct - if signal is received before cur_record_lsn gets assigned, we will be showing an old LSN if it was previously assigned or invalid LSN if it wasn't assigned previously. Signal arrival and processing are indeterministic, so we can't always show the right info. Instead, we can just be simple in the messaging without an lsn like pg_receivewal. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Jul 19, 2023 at 01:33:15PM +0530, Bharath Rupireddy wrote: > I think the delay is expected for the reason specified below and is > not because of any of the changes in v5. As far as CTRL+C is > concerned, it is a clean exit and hence we can't escape the while(1) > loop. Yes, that's also what I am expecting. That's costly when replaying a large change chunk, but we also want a clean exit on a signal as the code comments document, so.. -- Michael
Attachment
On Wed, Jul 19, 2023 at 01:46:02PM +0530, Bharath Rupireddy wrote: > Printing LSN on signal exit won't be correct - if signal is received > before cur_record_lsn gets assigned, we will be showing an old LSN if > it was previously assigned or invalid LSN if it wasn't assigned > previously. Signal arrival and processing are indeterministic, so we > can't always show the right info. I think that there's an argument to be made because cur_record_lsn will be set before coming back to the beginning of the replay loop when a stop is triggered by a signal. > Instead, we can just be simple in the messaging without an lsn like > pg_receivewal. Anyway, I'm OK with simple for now as it looks that you don't feel about that either, and the patch is enough to fix the report of this thread. And one would get periodic information in --verbose mode depending the sync message frequency, as well. So, I have applied v6 after fixing two issues with it: - I have kept the reason as an argument of prepareToTerminate(), to be able to take advantage of the switch structure where compilers would generate a warning if adding a new value to StreamStopReason. - cur_record_lsn was not initialized at the beginning of StreamLogicalLog(), which is where the compiler complains about the case of receiving a signal before entering in the replay loop, after establising a connection. -- Michael