On Wed, Mar 13, 2024 at 01:12:28PM +1300, David Steele wrote:
> On 2/29/24 16:55, Michael Paquier wrote:
>> +unlink("$backup_dir/postmaster.pid")
>> + or BAIL_OUT("unable to unlink $backup_dir/postmaster.pid");
>> +unlink("$backup_dir/postmaster.opts")
>> + or BAIL_OUT("unable to unlink $backup_dir/postmaster.opts");
>> +unlink("$backup_dir/global/pg_control")
>> + or BAIL_OUT("unable to unlink $backup_dir/global/pg_control");
>>
>> RELCACHE_INIT_FILENAME as well?
>
> I'm not trying to implement the full exclusion list here, just enough to get
> the test working. Since exclusions are optional according to the docs I
> don't think we need them for a valid tests.
Okay. Fine by me at the end.
>> +# Rather than writing out backup_label, try to recover the backup without
>> +# backup_label to demonstrate that recovery will not work correctly without it,
>> +# i.e. the canary table will be missing and the cluster will be corrupt. Provide
>> +# only the WAL segment that recovery will think it needs.
>>
>> Okay, why not. No objections to this addition. I am a bit surprised
>> that this is not scanning some of the logs produced by the startup
>> process for particular patterns.
>
> Not sure what to look for here. There are no distinct messages for crash
> recovery. Perhaps there should be?
The closest thing I can think of here would be "database system was
not properly shut down; automatic recovery in progress" as we don't
have InArchiveRecovery, after checking that the canary is missing. If
you don't like this suggestion, feel free to say so, of course :)
>> +# Save backup_label into the backup directory and recover using the primary's
>> +# archive. This time recovery will succeed and the canary table will be
>> +# present.
>>
>> Here are well, I think that we should add some log_contains() with
>> pre-defined patterns to show that recovery has completed the way we
>> want it with a backup_label up to the end-of-backup record.
>
> Sure, I added a check for the new log message when recovering with a
> backup_label.
+ok($node_replica->log_contains('completed backup recovery with redo LSN'),
+ 'verify backup recovery performed with backup_label');
Okay for this choice. I was thinking first about "starting backup
recovery with redo LSN", closer to the area where the backup_label is
read.
--
Michael