Hi,
On 4/24/23 11:45 AM, Amit Kapila wrote:
> On Mon, Apr 24, 2023 at 11:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Mon, Apr 24, 2023 at 11:24 AM Drouvot, Bertrand
>> <bertranddrouvot.pg@gmail.com> wrote:
>>>
>>
>> Few comments:
>> ============
>>
>
> +# We can not test if the WAL file still exists immediately.
> +# We need to let some time to the standby to actually "remove" it.
> +my $i = 0;
> +while (1)
> +{
> + last if !-f $standby_walfile;
> + if ($i++ == 10 * $default_timeout)
> + {
> + die
> + "could not determine if WAL file has been retained or not, can't continue";
> + }
> + usleep(100_000);
> +}
>
> Is this adhoc wait required because we can't guarantee that the
> checkpoint is complete on standby even after using wait_for_catchup?

Yes, the restart point on the standby has not necessarily completed even after
wait_for_catchup is done.

> Is there a guarantee that it can never fail on some slower machines?
>
We wait here for at most 10 * $default_timeout iterations of 100ms each (that
is, 3 minutes) before timing out. Would you prefer a longer maximum wait?
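
To spell out where the 3 minutes comes from, here is a rough sketch of the
arithmetic. It assumes $default_timeout is 180 seconds, which is what I would
expect when PG_TEST_TIMEOUT_DEFAULT is not set; the other variable names are
only illustrative and are not part of the patch.

```perl
use strict;
use warnings;

# Assumption: 180 seconds, the value expected when PG_TEST_TIMEOUT_DEFAULT
# is not set in the environment.
my $default_timeout = 180;

# Each loop iteration in the test sleeps usleep(100_000), i.e. 0.1 second.
my $sleep_us = 100_000;

# The loop dies once $i reaches 10 * $default_timeout.
my $max_iterations = 10 * $default_timeout;

# Total sleep time before giving up, in seconds.
my $max_wait_s = $max_iterations * $sleep_us / 1_000_000;

printf "%d iterations, %d seconds (%.0f minutes)\n",
  $max_iterations, $max_wait_s, $max_wait_s / 60;
```

So the bound scales with the timeout: a slower machine that sets a larger
PG_TEST_TIMEOUT_DEFAULT gets a proportionally longer wait.
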
> BTW, for the second test is it necessary that we first ensure that the
> WAL file has not been retained on the primary?
>
I was not sure it was worth it either. The idea was that it is useless to verify
the file is removed on the standby if we are not 100% sure it has been removed
on the primary first. But yeah, we can get rid of this test if you prefer.

Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com