Thread: Movement of the restart_lsn position of logical replication slots is very slow

Hello,

We have two logical replication slots in our PostgreSQL database (version 11) instance and we are using pgJDBC to stream data from these two slots. We make sure that we regularly send feedback and update the confirmed_flush_lsn (every 10 minutes) for both slots to the same position. However, from our data we have seen that the restart_lsn movement of the two is not in sync, and most of the time one of them lags so far behind that it holds WAL files unnecessarily. Here are some data points to illustrate the problem:

Thu Dec 10 05:37:13 CET 2020
                      slot_name       |  restart_lsn  | confirmed_flush_lsn
--------------------------------------+---------------+---------------------
 db_dsn_metadata_src_private          | 48FB/F3000208 | 48FB/F3000208
 db_dsn_metadata_src_shared           | 48FB/F3000208 | 48FB/F3000208
(2 rows)



Thu Dec 10 13:53:46 CET 2020
                      slot_name      |  restart_lsn  | confirmed_flush_lsn
-------------------------------------+---------------+---------------------
 db_dsn_metadata_src_private         | 48FC/2309B150 | 48FC/233AA1D0
 db_dsn_metadata_src_shared          | 48FC/233AA1D0 | 48FC/233AA1D0
(2 rows)


Thu Dec 10 17:13:51 CET 2020
                      slot_name      |  restart_lsn  | confirmed_flush_lsn
-------------------------------------+---------------+---------------------
 db_dsn_metadata_src_private         | 4900/B4C3AE8  | 4900/94FDF908
 db_dsn_metadata_src_shared          | 48FD/D2F66F10 | 4900/94FDF908
(2 rows)

Though we are calling setFlushLsn() and forceUpdateStatus() regularly for both slots' streams, the slot named "private" is far behind its confirmed_flush_lsn, and the slot named "shared" is also behind its confirmed_flush_lsn, though not as far. Since the restart_lsn is not moving fast enough, this causes a lot of trouble with WAL file management and does not allow the old files to be deleted to free up disk space.


Please note that for the second slot we are not reading from the stream; we are only sending feedback.
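
Roughly, our loop looks like the sketch below (a simplified, untested sketch modelled on the pgJDBC replication example, using the method names from the pgJDBC API; the connection URL, credentials, output-plugin options and error handling are omitted, and in the real code the feedback is sent every ~10 minutes, not per message):

import java.nio.ByteBuffer;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.postgresql.PGConnection;
import org.postgresql.PGProperty;
import org.postgresql.replication.LogSequenceNumber;
import org.postgresql.replication.PGReplicationStream;

public class SlotFeedbackSketch {
    static PGReplicationStream openStream(String url, Properties props, String slot) throws Exception {
        Connection con = DriverManager.getConnection(url, props);
        PGConnection repl = con.unwrap(PGConnection.class);
        return repl.getReplicationAPI()
                .replicationStream()
                .logical()
                .withSlotName(slot)
                .withStatusInterval(10, TimeUnit.SECONDS)
                .start();
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        PGProperty.ASSUME_MIN_SERVER_VERSION.set(props, "9.4");
        PGProperty.REPLICATION.set(props, "database");
        PGProperty.PREFER_QUERY_MODE.set(props, "simple");

        String url = "jdbc:postgresql://localhost:5432/mydb";   // placeholder
        PGReplicationStream shared = openStream(url, props, "db_dsn_metadata_src_shared");
        PGReplicationStream priv   = openStream(url, props, "db_dsn_metadata_src_private");

        while (true) {
            // Only the shared slot is actually decoded/read.
            ByteBuffer msg = shared.readPending();
            if (msg == null) {
                TimeUnit.MILLISECONDS.sleep(10);
                continue;
            }
            // ... process msg ...

            // In the real application this feedback is sent every ~10 minutes;
            // it is done per message here only to keep the sketch short. The
            // same LSN taken from the shared stream is reported on both streams.
            LogSequenceNumber lsn = shared.getLastReceiveLSN();
            shared.setAppliedLSN(lsn);
            shared.setFlushedLSN(lsn);
            shared.forceUpdateStatus();

            priv.setAppliedLSN(lsn);
            priv.setFlushedLSN(lsn);
            priv.forceUpdateStatus();
        }
    }
}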

How can this problem be solved? Are there any general guidelines to overcome this issue?

Regards

Shailesh

On Mon, Dec 14, 2020 at 9:30 AM Jammie <shailesh.jamloki@gmail.com> wrote:
>
> Hello,
>
> We have two logical replication slots in our postgresql database (version-11) instance and we are using pgJDBC to
> stream data from these two slots.
>

IIUC, you are using some out-of-core output plugin to stream the data?
Are you using the walsender mechanism to decode the changes from the
slots, or the SQL APIs?

> We are ensuring that when we regularly send feedback and update the confirmed_flush_lsn (every 10 minutes) for both
> the slots to the same position. However From our data we have seen that the restart_lsn movement of the two are not in
> sync and most of the time one of them lags too far behind to hold the WAL files unnecessarily. Here are some data points
> to indicate the problem
>
> Thu Dec 10 05:37:13 CET 2020
>                       slot_name       |  restart_lsn  | confirmed_flush_lsn
> --------------------------------------+---------------+---------------------
>  db_dsn_metadata_src_private          | 48FB/F3000208 | 48FB/F3000208
>  db_dsn_metadata_src_shared           | 48FB/F3000208 | 48FB/F3000208
> (2 rows)
>
>
>
> Thu Dec 10 13:53:46 CET 2020
>                       slot_name      |  restart_lsn  | confirmed_flush_lsn
> -------------------------------------+---------------+---------------------
>  db_dsn_metadata_src_private         | 48FC/2309B150 | 48FC/233AA1D0
>  db_dsn_metadata_src_shared          | 48FC/233AA1D0 | 48FC/233AA1D0
> (2 rows)
>
>
> Thu Dec 10 17:13:51 CET 2020
>                       slot_name      |  restart_lsn  | confirmed_flush_lsn
> -------------------------------------+---------------+---------------------
>  db_dsn_metadata_src_private         | 4900/B4C3AE8  | 4900/94FDF908
>  db_dsn_metadata_src_shared          | 48FD/D2F66F10 | 4900/94FDF908
> (2 rows)
>
> Though we are using setFlushLsn() and forceStatusUpdate for both the slot's stream regularly still the slot with name
> private is far behind the confirmed_flush_lsn and slot with name shared is also behind with confirmed_flush_lsn but not
> too far. Since the restart_lsn is not moving fast enough, causing lot of issues with WAL log file management and not
> allowing to delete them to free up disk space
>

What is this setFlushLsn? I am not able to find it in the PG code. If
it is a reference to some outside code then please provide a link to
that code. In general, the restart_lsn and confirmed_flush_lsn are
advanced in different ways, so you might see some difference, but it
should not be this much. The confirmed_flush_lsn is updated when we
get confirmation from the downstream node about the flush_lsn, but
restart_lsn is only incremented based on the LSN required by the
oldest in-progress transaction.
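
To quantify the gap, something like the following rough sketch (run over a regular, non-replication JDBC connection; the URL is a placeholder) shows how far each slot's restart_lsn trails its confirmed_flush_lsn and how much WAL the slot is retaining:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SlotLagCheck {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:postgresql://localhost:5432/mydb");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT slot_name, restart_lsn, confirmed_flush_lsn, "
                   + "       pg_wal_lsn_diff(confirmed_flush_lsn, restart_lsn) AS restart_lag_bytes, "
                   + "       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_wal_bytes "
                   + "FROM pg_replication_slots")) {
            while (rs.next()) {
                // restart_lag_bytes: how far restart_lsn trails the confirmed position
                // retained_wal_bytes: how much WAL the slot forces the server to keep
                System.out.printf("%s lag=%d retained=%d%n",
                        rs.getString("slot_name"),
                        rs.getLong("restart_lag_bytes"),
                        rs.getLong("retained_wal_bytes"));
            }
        }
    }
}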

>
> Please note that for the second slot we are not doing reading from the stream rather just sending the feedback.
>

Here, does the second slot refer to 'shared' or 'private'? It is not
very clear what you mean by "we are not doing reading from the
stream"; do you mean to say that decoding happens on the slot but the
output plugin just throws away the streamed data and in the end you
just send the feedback?

--
With Regards,
Amit Kapila.



Thanks Amit for the response

We are using the pgJDBC sample program here:
https://jdbc.postgresql.org/documentation/head/replication.html

The setFlushLSN is coming from pgJDBC only.

The GitHub repository for the pgJDBC APIs is available here:
https://github.com/pgjdbc/pgjdbc


The second slot refers to the "private" slot.

So "we are not doing reading from the stream" ==> it means that we make the readPending call only on the shared slot; we then get the lastReceivedLSN() from that stream and
send it back as the confirmed_flush_lsn for both the private and the shared slot. We don't make a readPending call on the private slot. We will use the private slot only when we don't have a choice; it is kind of a reserve slot for us.

We are also doing forceUpdateStatus() for both the slots.

Questions:
1) "The confirmed_flush_lsn is updated when we get confirmation
from the downstream node about the flush_lsn but restart_lsn is only
incremented based on the LSN required by the oldest in-progress
transaction." ==> As explained above, we are updating both slots with the same LSN (setFlushLSN is an API to update the confirmed_flush_lsn), so I don't understand why one is left behind.
2) What other factors might delay the update of a slot's restart_lsn?
3) Does this behaviour change in PG 13?

Regards
Shailesh


On Tue, Dec 15, 2020 at 11:00 AM Jammie <shailesh.jamloki@gmail.com> wrote:
>
> Thanks Amit for the response
>
> We are using pgJDBC sample program here
> https://jdbc.postgresql.org/documentation/head/replication.html
>
> the setFlushLSN is coming from the pgJDBC only.
>
> git hub for APIs of pgJDBC methods available.
>
> https://github.com/pgjdbc/pgjdbc
>
> The second slot refers to "private" slot.
>
> So ""we are not doing reading from the stream' ==> It means that we are having readPending call only from the shared
slotthen we get the lastReceivedLSN() from stream and
 
> send it back to stream as confirmed_flush_lsn for both private and shared slot. We dont do readPending call to
privateslot. we will use private slot only when we dont have choice. It is kind of reserver slot for us.
 
>

I think this (not performing read/decode on the private slot) could be
the reason why it is lagging behind. If you want to use it as a reserve
slot then you probably want to at least perform
pg_replication_slot_advance() to move it to the required position. The
restart_lsn won't move unless you read/decode from that slot.
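
For example, something along these lines (only a sketch; the slot name and target LSN are placeholders, and the slot must not be in active use by another connection when you call it):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class AdvanceReserveSlot {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:postgresql://localhost:5432/mydb");
             PreparedStatement ps = con.prepareStatement(
                     // pg_replication_slot_advance(slot_name, upto_lsn) is available in PG 11+
                     "SELECT end_lsn FROM pg_replication_slot_advance(?, ?::pg_lsn)")) {
            ps.setString(1, "db_dsn_metadata_src_private");   // the reserve slot
            ps.setString(2, "48FC/233AA1D0");                  // e.g. the shared slot's confirmed_flush_lsn
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    System.out.println("slot advanced to " + rs.getString("end_lsn"));
                }
            }
        }
    }
}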

-- 
With Regards,
Amit Kapila.



Thanks Amit for the response.
Two things:
1) In our observation via psql, the advance command also does not move the restart_lsn immediately. It behaves similarly to our approach of updating the confirmed_flush_lsn via the stream.
2) I accept the point that because we are not reading from the stream we might be facing this issue. But the question is why we are able to move the restart_lsn most of the time just by updating the confirmed_flush_lsn via pgJDBC, and why only occasionally it lags too far behind.

Regards
Shailesh

 

However, when the situation arises and one slot gets behind, it never recovers, and there is no way to recover from it even after reading using the advance function or the pg_logical_slot_get_changes() SQL command.

On Wed, Dec 23, 2020 at 7:06 PM Jammie <shailesh.jamloki@gmail.com> wrote:
>
> Thanks Amit for the response.
> Two things :
> 1) In our observation via PSQL the advance command as well do not move the restart_lsn immediately. It is similar to
> our approach that use the confirmed_flush_lsn via stream
> 2) I am ok to understand the point that we are not reading from the stream so we might be facing the issue. But the
> question is why we are able to move the restart_lsn most of the time by updating the confirmed_flush_lsn via pgJDBC. But
> only occasionally it lags behind too far behind.
>

I am not sure why you are seeing such behavior. Is it possible for you
to debug the code? Both confirmed_flush_lsn and restart_lsn are
advanced in LogicalConfirmReceivedLocation. You can add elog calls to
print the values and see the progress. Here, the point to note is that
even though we update confirmed_flush_lsn every time with the new
value, restart_lsn is updated only when candidate_restart_valid has a
valid value at the time LogicalConfirmReceivedLocation is called. We
update candidate_restart_valid in
LogicalIncreaseRestartDecodingForSlot, which is called only while
decoding an XLOG_RUNNING_XACTS record. So, it is not clear to me how
in your case restart_lsn is getting advanced without decoding. I think
if you add some elogs in the code to track the values of
candidate_restart_valid, confirmed_flush_lsn, and restart_lsn, you
might get some clue.

--
With Regards,
Amit Kapila.



Sorry, I don't have the debug setup handy. However, the SQL commands do now work to move the restart_lsn of the slots in standalone tests from psql.

A few follow-up questions.

What is catalog_xmin in pg_replication_slots, and what role does it play in moving the restart_lsn of the slot?

I am just checking the possibility that a special kind of transaction could cause the private slot to go stale.

I do see that on the private slot catalog_xmin is also stuck along with restart_lsn, even though from the JDBC code the confirmed_flush_lsn is updated correctly in pg_replication_slots.

Regards
Shailesh

On Thu, Dec 24, 2020 at 7:30 PM Jammie <shailesh.jamloki@gmail.com> wrote:
>
> Sorry dont have the debug setup handy. However the sql commands now works though to move the restart_lsn of the slots
> in standlone code from psql.
>
>  A few followup questions.
>
> What is catalog_xmin in the pg_replication_slots ? and how is it playing role in moving the restart_lsn of the slot.
>
> I am just checking possibility that if a special transaction can cause private slot to stale ?
>

Yeah, it is possible if some old transaction is active in the
system. The restart_lsn is the LSN required by the oldest transaction.
But it is strange that it affects only one of the slots.
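
You could look for such transactions with something like the sketch below (the connection URL is a placeholder); forgotten prepared transactions can hold back the horizon too, so pg_prepared_xacts is worth a look as well:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HorizonCheck {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:postgresql://localhost:5432/mydb");
             Statement st = con.createStatement()) {
            // Sessions whose open transaction or snapshot is holding back the xmin horizon.
            try (ResultSet rs = st.executeQuery(
                    "SELECT pid, state, xact_start, age(backend_xmin) AS xmin_age, query "
                  + "FROM pg_stat_activity WHERE backend_xmin IS NOT NULL "
                  + "ORDER BY age(backend_xmin) DESC")) {
                while (rs.next()) {
                    System.out.printf("pid=%d state=%s xact_start=%s xmin_age=%d%n",
                            rs.getInt("pid"), rs.getString("state"),
                            rs.getString("xact_start"), rs.getLong("xmin_age"));
                }
            }
            // Prepared (two-phase) transactions pin the horizon as well.
            try (ResultSet rs = st.executeQuery(
                    "SELECT gid, prepared FROM pg_prepared_xacts ORDER BY prepared")) {
                while (rs.next()) {
                    System.out.printf("prepared gid=%s since %s%n",
                            rs.getString("gid"), rs.getString("prepared"));
                }
            }
        }
    }
}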

-- 
With Regards,
Amit Kapila.



Hi Amit, 
Thanks for the response.
Can you please let me know what pg_current_wal_lsn() returns?

Is this position the LSN of the next log record to be created, or is it the LSN of the last log record already created and inserted into the log?

The documentation says it returns the current WAL write location.

Regards
Shailesh

On Tue, Jan 12, 2021 at 9:15 AM Jammie <shailesh.jamloki@gmail.com> wrote:
>
> Hi Amit,
> Thanks for the response .
> Can you please let me know what pg_current_wal_lsn returns ?
>
> is this position the LSN of the next log record to be created, or is it the LSN of the last log record already
> created and inserted in the log?
>

This is the position up to which we have already written the WAL to
the kernel but not yet flushed to disk.
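
If it helps, the three related functions can be compared side by side with a rough sketch like this (the connection URL is a placeholder), keeping in mind that insert >= write >= flush:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class WalPositions {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:postgresql://localhost:5432/mydb");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT pg_current_wal_insert_lsn() AS insert_lsn, "   // reserved in WAL buffers
                   + "       pg_current_wal_lsn()        AS write_lsn, "    // written out to the kernel
                   + "       pg_current_wal_flush_lsn()  AS flush_lsn")) {  // known flushed to disk
            if (rs.next()) {
                System.out.println("insert=" + rs.getString("insert_lsn")
                        + " write=" + rs.getString("write_lsn")
                        + " flush=" + rs.getString("flush_lsn"));
            }
        }
    }
}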

-- 
With Regards,
Amit Kapila.