Re: recovery starting when backup_label exists, but not recovery.signal - Mailing list pgsql-hackers
From | David Steele |
---|---|
Subject | Re: recovery starting when backup_label exists, but not recovery.signal |
Date | |
Msg-id | c4909bdd-4a4d-31b7-c705-aabf3f1273e0@pgmasters.net |
In response to | Re: recovery starting when backup_label exists, but not recovery.signal (Fujii Masao <masao.fujii@gmail.com>) |
List | pgsql-hackers |
On 9/27/19 4:34 AM, Fujii Masao wrote:
> On Fri, Sep 27, 2019 at 3:36 AM David Steele <david@pgmasters.net> wrote:
>>
>> On 9/24/19 1:25 AM, Fujii Masao wrote:
>>>
>>> When backup_label exists, the startup process enters archive recovery mode
>>> even if recovery.signal file doesn't exist. In this case, the startup process
>>> tries to retrieve WAL files by using restore_command. Then, at the beginning
>>> of the archive recovery, the contents of backup_label are copied to pg_control
>>> and backup_label file is removed. This would be an intentional behavior.
>>
>>> But I think the problem is that, if the server shuts down during that
>>> archive recovery, the restart of the server may cause the recovery to fail
>>> because neither backup_label nor recovery.signal exist and the server
>>> doesn't enter an archive recovery mode. Is this intentional, too? Seems no.
>>>
>>> So the problematic scenario is:
>>>
>>> 1. the server starts with backup_label, but not recovery.signal.
>>> 2. the startup process enters an archive recovery mode because
>>>    backup_label exists.
>>> 3. the contents of backup_label are copied to pg_control and
>>>    backup_label is deleted.
>>
>> Do you mean deleted or renamed to backup_label.old?
>
> Sorry for the confusing wording.
> I meant the following code that renames backup_label to .old, in StartupXLOG().

Right, that makes sense.

>>
>> I assume you have a repro? Can you give more details?
>
> What I did is:
>
> 1. Start PostgreSQL server with WAL archiving enabled.
> 2. Take an online backup by using pg_basebackup, for example,
>    $ pg_basebackup -D backup
> 3. Execute many write SQL statements to generate lots of WAL files. During
>    that execution, perform CHECKPOINT to remove some WAL files from the
>    pg_wal directory. You need to repeat these until you confirm that there
>    are many WAL files that have already been removed from pg_wal but exist
>    only in the archive area.
> 4. Shutdown the server.
> 5. Remove PGDATA and restore it from backup.
> 6. Set up restore_command.
> 7. (Forget to put recovery.signal)
>    That is, in this scenario, you want to recover the database up to
>    the latest WAL records in the archive area. So you need to start archive
>    recovery by setting restore_command and putting recovery.signal.
>    But the problem happens when you forget to put recovery.signal.
> 8. Start PostgreSQL server.
> 9. Shutdown the server while it's restoring archived WAL files and replaying
>    them. At this point, you will notice that the archive recovery starts
>    even though recovery.signal doesn't exist. So even archived WAL files
>    are successfully restored at this step.
> 10. Restart PostgreSQL server. Since neither backup_label nor recovery.signal
>    exists, crash recovery starts and fails to restore the archived WAL files.
>    So you fail to recover the database up to the latest WAL record
>    in the archive directory. The recovery will finish at an early point.

Yes, I see it now. I did not have enough WAL to make it work before, as I suspected.

>>> One idea to fix this issue is to make the above step #3 remember that
>>> backup_label existed, in pg_control. Then we should make the subsequent
>>> recovery enter an archive recovery mode if pg_control indicates that
>>> even if neither backup_label nor recovery.signal exist. Thought?
>>
>> That seems pretty invasive to me at this stage. I'd like to reproduce
>> it and see if there are alternatives.
>>
>> Also, are you sure this is a new behavior?
>
> In v11 or before, if backup_label exists but not recovery.conf,
> the startup process doesn't enter an archive recovery mode. It starts
> crash recovery in that case. So the behavior is somewhat different
> between versions.

Agreed. Since recovery options can be used in the presence of
backup_label *or* recovery.signal (or standby.signal for that matter),
this does represent a change in behavior. And it doesn't appear to be
a beneficial change.

Regards,
-- 
-David
david@pgmasters.net
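The following toy POSIX shell model makes the startup decision in this thread concrete. It is not PostgreSQL code. `decide_recovery_mode` is a hypothetical helper, and the REMEMBERED argument stands in for the pg_control flag proposed above; no such flag exists. The model only looks at which marker files are present, following the behavior reported in the thread. On v11, "signal" means recovery.conf.

```sh
#!/bin/sh
# Hypothetical model of the recovery-mode decision described in this thread.
# NOT PostgreSQL source: it only considers which marker files exist.
#
# decide_recovery_mode VERSION BACKUP_LABEL SIGNAL STANDBY REMEMBERED
#   VERSION       server major version (e.g. 11 or 12)
#   BACKUP_LABEL  yes/no: backup_label exists in PGDATA
#   SIGNAL        yes/no: recovery.signal exists (recovery.conf on v11)
#   STANDBY       yes/no: standby.signal exists (standby_mode=on on v11)
#   REMEMBERED    yes/no: hypothetical pg_control flag from the proposed fix
decide_recovery_mode() {
  version=$1 backup_label=$2 signal=$3 standby=$4 remembered=$5
  if [ "$standby" = yes ]; then
    echo standby
  elif [ "$signal" = yes ]; then
    echo archive
  elif [ "$version" -ge 12 ] && { [ "$backup_label" = yes ] || [ "$remembered" = yes ]; }; then
    # v12 as reported: backup_label alone is enough to start archive recovery
    echo archive
  else
    # v11 and earlier, or no marker left after backup_label was renamed
    echo crash
  fi
}

# Steps 8-10 of the reproduction: first start has backup_label only.
first=$(decide_recovery_mode 12 yes no no no)
# backup_label has been renamed to backup_label.old; the server is restarted.
restart=$(decide_recovery_mode 12 no no no no)
# Same restart, assuming pg_control remembered that backup_label existed.
restart_fixed=$(decide_recovery_mode 12 no no no yes)
# v11 with backup_label but no recovery.conf.
old=$(decide_recovery_mode 11 yes no no no)

echo "v12 first start: $first"
echo "v12 restart: $restart"
echo "v12 restart with fix: $restart_fixed"
echo "v11 first start: $old"
```

Run as a script, this prints `archive`, `crash`, `archive` and `crash` for the four cases. That matches steps 8 and 10 of the reproduction and the v11 behavior described above. The model ignores everything except file presence so that the version difference stays easy to see.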