Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size - Mailing list pgsql-hackers

From vignesh C
Subject Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size
Date
Msg-id CALDaNm09cRzke52UN5zx33PT390whU92oXY4gfOSZEo17CLPjw@mail.gmail.com
List pgsql-hackers
On Mon, 30 Dec 2024 at 09:34, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> Hi,
>
> Currently, there is a risk that pg_createsubscriber may fail to
> complete successfully when the max_slot_wal_keep_size value is set too
> low. This can occur if the WAL is removed before the standby using the
> replication slot is able to complete replication, as the required WAL
> files are no longer available.
>
> I was able to reproduce this issue using the following steps:
> 1) Set up a streaming replication environment.
> 2) Run pg_createsubscriber in a debugger.
> 3) Pause pg_createsubscriber at the setup_recovery stage.
> 4) Perform several operations on the primary node to generate a large
>    volume of WAL, causing older WAL segments to be removed due to the low
>    max_slot_wal_keep_size setting.
> 5) Once the necessary WAL segments are deleted, continue the execution of
>    pg_createsubscriber.
> At this point, pg_createsubscriber fails with the following error:
> 2024-12-29 01:21:37.590 IST [427353] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000000000000003 has already been removed
> 2024-12-29 01:21:37.592 IST [427345] LOG:  waiting for WAL to become available at 0/3000110
> 2024-12-29 01:21:42.593 IST [427358] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
> 2024-12-29 01:21:42.593 IST [427358] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000000000000003 has already been removed
>
> This issue was previously reported in [1], with a suggestion to raise
> a warning in [2]. I’ve implemented a patch that logs a warning in
> dry-run mode. This will give users the opportunity to adjust the
> max_slot_wal_keep_size value before running the command.
>
> Thoughts?
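To connect the log lines above: both LSNs reported there (0/3000110 and 0/3000000) lie in WAL segment 000000010000000000000003, which is exactly the file the primary reports as already removed. A minimal C sketch of that mapping follows, assuming the default 16 MB wal_segment_size and timeline 1; the file-name layout (%08X%08X%08X) mirrors PostgreSQL's XLogFileName macro, but the helper names are hypothetical. The last function is a deliberately simplified model of the max_slot_wal_keep_size limit (value in MB, -1 meaning unlimited) that ignores wal_keep_size, segment rounding and checkpoint timing; it only illustrates why a low setting lets WAL that a slot still needs be removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: the default wal_segment_size of 16 MB. */
#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/* Parse an LSN in the "X/Y" hex form used in server logs, e.g. "0/3000110". */
static uint64_t parse_lsn(const char *text)
{
    const char *slash = strchr(text, '/');

    if (slash == NULL)
        return 0;                                /* not in X/Y form */
    uint64_t hi = strtoul(text, NULL, 16);       /* stops at the '/' */
    uint64_t lo = strtoul(slash + 1, NULL, 16);
    return (hi << 32) | (lo & 0xFFFFFFFFu);
}

/*
 * Write the 24-hex-digit name of the segment containing lsn, using the
 * same %08X%08X%08X layout as PostgreSQL's XLogFileName macro.
 */
static void wal_file_name(char *buf, size_t len, uint32_t tli, uint64_t lsn)
{
    uint64_t segno = lsn / WAL_SEG_SIZE;
    uint64_t segs_per_id = UINT64_C(0x100000000) / WAL_SEG_SIZE;

    assert(len >= 25);                           /* 24 hex digits + NUL */
    snprintf(buf, len, "%08X%08X%08X", (unsigned int) tli,
             (unsigned int) (segno / segs_per_id),
             (unsigned int) (segno % segs_per_id));
}

/*
 * Simplified model only: with a limit of max_slot_wal_keep_size_mb MB,
 * WAL more than that far behind current_lsn may be removed even though a
 * slot whose restart LSN is restart_lsn still needs it; -1 means no limit.
 */
static bool slot_wal_may_be_removed(uint64_t current_lsn, uint64_t restart_lsn,
                                    int max_slot_wal_keep_size_mb)
{
    if (max_slot_wal_keep_size_mb < 0)
        return false;
    uint64_t limit = (uint64_t) max_slot_wal_keep_size_mb * 1024 * 1024;
    return current_lsn > restart_lsn && current_lsn - restart_lsn > limit;
}
```

For example, wal_file_name(buf, sizeof buf, 1, parse_lsn("0/3000110")) yields "000000010000000000000003", the segment named in the FATAL lines.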

+1 for throwing a warning in dry-run mode

A few comments:
1) We can do this check only in dry-run mode. It is not required in
non-dry-run mode, as there is not much the user can do once the tool is
running. We can change this:
+       if (max_slot_wal_keep_size != -1)
+       {
+               pg_log_warning("publisher requires 'max_slot_wal_keep_size = -1', but only %d remain",
+                                          max_slot_wal_keep_size);
+               pg_log_warning_detail("Change the 'max_slot_wal_keep_size' configuration on the publisher to -1.");
+       }

to:
+       if (dry_run && max_slot_wal_keep_size != -1)
+       {
+               pg_log_warning("publisher requires 'max_slot_wal_keep_size = -1', but only %d remain",
+                                          max_slot_wal_keep_size);
+               pg_log_warning_detail("Change the 'max_slot_wal_keep_size' configuration on the publisher to -1.");
+       }
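The gating proposed in comment 1 amounts to a pure function of two inputs. The sketch below is illustrative only: the function name is hypothetical, pg_log_warning() and pg_log_warning_detail() are replaced by writing into a caller-supplied buffer, and max_slot_wal_keep_size is assumed to have already been fetched from the publisher as an integer in MB, as the %d in the patch suggests. The message text is copied unchanged from the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model of the check suggested in comment 1: warn only in
 * dry-run mode, and only when max_slot_wal_keep_size (in MB) is not -1.
 * Returns true when a warning was produced. The text is written into buf
 * as a stand-in for pg_log_warning()/pg_log_warning_detail().
 */
static bool check_max_slot_wal_keep_size(bool dry_run, int max_slot_wal_keep_size,
                                         char *buf, size_t len)
{
    if (!dry_run || max_slot_wal_keep_size == -1)
        return false;           /* nothing the user could act on, or no limit */

    assert(buf != NULL && len > 0);
    snprintf(buf, len,
             "publisher requires 'max_slot_wal_keep_size = -1', but only %d remain\n"
             "Change the 'max_slot_wal_keep_size' configuration on the publisher to -1.",
             max_slot_wal_keep_size);
    return true;
}
```

With dry_run false the function never warns, whatever the setting; with dry_run true it warns for any value other than -1.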

2) This warning message is not quite right; can we change it to
"publisher requires max_slot_wal_keep_size to be -1, but is set to %d"?
+       if (max_slot_wal_keep_size != -1)
+       {
+               pg_log_warning("publisher requires 'max_slot_wal_keep_size = -1', but only %d remain",
+                                          max_slot_wal_keep_size);
+               pg_log_warning_detail("Change the 'max_slot_wal_keep_size' configuration on the publisher to -1.");
+       }

3) Also, the configuration parameter name could be passed through a format
specifier, as is done in the earlier case.
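Taken together, comments 2 and 3 suggest something along these lines. This is a hedged sketch rather than proposed patch text: format_keep_size_warning() is a hypothetical helper, snprintf() into a buffer stands in for pg_log_warning(), and the wording is the one proposed in comment 2 with the parameter name moved into a %s argument, which keeps the GUC name itself out of the translatable string.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats the wording from comment 2, with the
 * parameter name passed as an argument (comment 3) instead of being
 * written into the format string. Returns snprintf()'s result, i.e. the
 * length the full message needs, excluding the terminating NUL.
 */
static int format_keep_size_warning(char *buf, size_t len,
                                    int max_slot_wal_keep_size)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "publisher requires %s to be -1, but is set to %d",
                    "max_slot_wal_keep_size", max_slot_wal_keep_size);
}
```

With a setting of 10 this produces "publisher requires max_slot_wal_keep_size to be -1, but is set to 10".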

Regards,
Vignesh


