Hello,
We have two logical replication slots in our postgresql database (version-11) instance and we are using pgJDBC to stream data from these two slots. We are ensuring that when we regularly send feedback and update the confirmed_flush_lsn (every 10 minutes) for both the slots to the same position. However From our data we have seen that the restart_lsn movement of the two are not in sync and most of the time one of them lags too far behind to hold the WAL files unnecessarily. Here are some data points to indicate the problem
Thu Dec 10 05:37:13 CET 2020 slot_name | restart_lsn | confirmed_flush_lsn
--------------------------------------+---------------+--------------------- db_dsn_metadata_src_private | 48FB/F3000208 | 48FB/F3000208 db_dsn_metadata_src_shared | 48FB/F3000208 | 48FB/F3000208
(2 rows)
Thu Dec 10 13:53:46 CET 2020 slot_name | restart_lsn | confirmed_flush_lsn
-------------------------------------+---------------+--------------------- db_dsn_metadata_src_private | 48FC/2309B150 | 48FC/233AA1D0 db_dsn_metadata_src_shared | 48FC/233AA1D0 | 48FC/233AA1D0
(2 rows)
Thu Dec 10 17:13:51 CET 2020 slot_name | restart_lsn | confirmed_flush_lsn
-------------------------------------+---------------+--------------------- db_dsn_metadata_src_private | 4900/B4C3AE8 | 4900/94FDF908 db_dsn_metadata_src_shared | 48FD/D2F66F10 | 4900/94FDF908
(2 rows)
Though we are using setFlushLsn() and forceStatusUpdate for both the slot's stream regularly still the slot with name private is far behind the confirmed_flush_lsn and slot with name shared is also behind with confirmed_flush_lsn but not too far. Since the restart_lsn is not moving fast enough, causing lot of issues with WAL log file management and not allowing to delete them to free up disk space
Please note that for the second slot we are not doing reading from the stream rather just sending the feedback.
How can this problem be solved? Are there any general guidelines to overcome this issue ?
Regards
Shailesh