Re: Movement of restart_lsn position movement of logical replication slots is very slow - Mailing list pgsql-hackers

From Jammie
Subject Re: Movement of restart_lsn position movement of logical replication slots is very slow
Msg-id CAFt1pcp=WwaqOqEPq4pie+_SDxdM2wZS6Aoi+kg1h8_OXhL8fQ@mail.gmail.com
In response to Re: Movement of restart_lsn position movement of logical replication slots is very slow  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Movement of restart_lsn position movement of logical replication slots is very slow
List pgsql-hackers
Sorry, I don't have a debug setup handy. However, the SQL commands now work to move the restart_lsn of the slots when run standalone from psql.

A few follow-up questions:

What is catalog_xmin in pg_replication_slots, and what role does it play in moving the restart_lsn of the slot?

Is it possible that some special transaction could cause the private slot to go stale?

I do see that catalog_xmin in the private slot is also stuck, along with restart_lsn, although confirmed_flush_lsn is updated correctly in pg_replication_slots from the JDBC code.

Regards
Shailesh

On Thu, Dec 24, 2020 at 12:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 23, 2020 at 7:06 PM Jammie <shailesh.jamloki@gmail.com> wrote:
>
> Thanks Amit for the response.
> Two things :
> 1) In our observation via psql, the advance command also does not move the restart_lsn immediately. It behaves similarly to our approach of updating confirmed_flush_lsn via the stream.
> 2) I accept the point that we might be facing the issue because we are not reading from the stream. But the question is why we are able to move the restart_lsn most of the time by updating confirmed_flush_lsn via pgJDBC, while only occasionally it lags far behind.
>

I am not sure why you are seeing such behavior. Is it possible for you
to debug the code? Both confirmed_flush_lsn and restart_lsn are
advanced in LogicalConfirmReceivedLocation. You can add elog to print
the values to see the progress. Here, the point to note is that even
though we update confirmed_flush_lsn every time with the new value,
restart_lsn is updated only when candidate_restart_valid has a valid
value each time after a call to LogicalConfirmReceivedLocation. We
update candidate_restart_valid in
LogicalIncreaseRestartDecodingForSlot, which is called only during
decoding of an XLOG_RUNNING_XACTS record. So, it is not clear to me how
restart_lsn is getting advanced in your case without decoding. I think
if you add some elogs in the code to track the values of
candidate_restart_valid, confirmed_flush_lsn, and restart_lsn, you
might get some clue.

--
With Regards,
Amit Kapila.
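
The advancement rule described in the quoted reply can be illustrated with a small model. This is only a sketch in Python, not PostgreSQL source: the class SlotModel, its method names, and the use of plain integers for LSNs are hypothetical, and the exact comparison used (candidate_restart_valid <= confirmed LSN) is an assumption about how a "valid" candidate is applied. It shows why confirmed_flush_lsn can keep moving while restart_lsn stays put until a running-xacts record has been decoded.

```python
from dataclasses import dataclass

INVALID_LSN = 0  # stand-in for an invalid/unset LSN in this toy model


@dataclass
class SlotModel:
    """Toy model of a logical slot's LSN bookkeeping (not PostgreSQL code)."""
    restart_lsn: int = INVALID_LSN
    confirmed_flush_lsn: int = INVALID_LSN
    candidate_restart_valid: int = INVALID_LSN
    candidate_restart_lsn: int = INVALID_LSN

    def note_running_xacts(self, current_lsn: int, restart_lsn: int) -> None:
        """Models the role of LogicalIncreaseRestartDecodingForSlot:
        decoding a running-xacts record proposes a new restart point.
        Simplified: a pending candidate is never overwritten here."""
        if self.candidate_restart_valid == INVALID_LSN:
            self.candidate_restart_valid = current_lsn
            self.candidate_restart_lsn = restart_lsn

    def confirm_received(self, lsn: int) -> None:
        """Models the role of LogicalConfirmReceivedLocation:
        confirmed_flush_lsn always moves; restart_lsn moves only if a
        candidate exists and the client has confirmed past it (assumed rule)."""
        self.confirmed_flush_lsn = lsn
        if (self.candidate_restart_valid != INVALID_LSN
                and self.candidate_restart_valid <= lsn):
            self.restart_lsn = self.candidate_restart_lsn
            self.candidate_restart_valid = INVALID_LSN
            self.candidate_restart_lsn = INVALID_LSN


if __name__ == "__main__":
    slot = SlotModel(restart_lsn=100, confirmed_flush_lsn=100)

    slot.confirm_received(200)  # nothing decoded yet: no candidate exists
    print(slot.restart_lsn, slot.confirmed_flush_lsn)  # 100 200

    slot.note_running_xacts(current_lsn=250, restart_lsn=180)
    slot.confirm_received(240)  # confirmation is still below the candidate
    print(slot.restart_lsn, slot.confirmed_flush_lsn)  # 100 240

    slot.confirm_received(300)  # confirmation has passed the candidate
    print(slot.restart_lsn, slot.confirmed_flush_lsn)  # 180 300
```

In this model, a client that only sends feedback without any decoding taking place never gets a candidate set, so restart_lsn stays where it was. That matches the symptom reported in the thread, but the model cannot say whether it is the actual cause.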
