Thread: [BUG] non archived WAL removed during production crash recovery
[BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
Hello, A colleague of mine reported an expected behavior. On production cluster is in crash recovery, eg. after killing a backend, the WALs ready to be archived are removed before being archived. See in attachment the reproduction script "non-arch-wal-on-recovery.bash". This behavior has been introduced in 78ea8b5daab9237fd42d7a8a836c1c451765499f. Function XLogArchiveCheckDone() badly consider the in crashed recovery production cluster as a standby without archive_mode=always. So the check conclude the WAL can be removed safely. bool inRecovery = RecoveryInProgress(); /* * The file is always deletable if archive_mode is "off". On standbys * archiving is disabled if archive_mode is "on", and enabled with * "always". On a primary, archiving is enabled if archive_mode is "on" * or "always". */ if (!((XLogArchivingActive() && !inRecovery) || (XLogArchivingAlways() && inRecovery))) return true; Please find in attachment a patch that fix this issue using the following test instead: if (!((XLogArchivingActive() && !StandbyModeRequested) || (XLogArchivingAlways() && inRecovery))) return true; I'm not sure if we should rely on StandbyModeRequested for the second part of the test as well thought. What was the point to rely on RecoveryInProgress() to get the recovery status from shared mem? Regards,
Attachment
On 2020/04/01 0:22, Jehan-Guillaume de Rorthais wrote: > Hello, > > A colleague of mine reported an expected behavior. > > On production cluster is in crash recovery, eg. after killing a backend, the > WALs ready to be archived are removed before being archived. > > See in attachment the reproduction script "non-arch-wal-on-recovery.bash". > > This behavior has been introduced in 78ea8b5daab9237fd42d7a8a836c1c451765499f. > Function XLogArchiveCheckDone() badly consider the in crashed recovery > production cluster as a standby without archive_mode=always. So the check > conclude the WAL can be removed safely. Thanks for the report! Yeah, this seems a bug. > bool inRecovery = RecoveryInProgress(); > > /* > * The file is always deletable if archive_mode is "off". On standbys > * archiving is disabled if archive_mode is "on", and enabled with > * "always". On a primary, archiving is enabled if archive_mode is "on" > * or "always". > */ > if (!((XLogArchivingActive() && !inRecovery) || > (XLogArchivingAlways() && inRecovery))) > return true; > > Please find in attachment a patch that fix this issue using the following test > instead: > > if (!((XLogArchivingActive() && !StandbyModeRequested) || > (XLogArchivingAlways() && inRecovery))) > return true; > > I'm not sure if we should rely on StandbyModeRequested for the second part of > the test as well thought. What was the point to rely on RecoveryInProgress() to > get the recovery status from shared mem? Since StandbyModeRequested is the startup process-local variable, it basically cannot be used in XLogArchiveCheckDone() that other process may call. So another approach would be necessary... One straight idea is to add new shmem flag indicating whether we are in standby mode or not. Another is to make the startup process remove .ready file if necessary. If it's not easy to fix the issue, we might need to just revert the commit that introduced the issue at first. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Wed, 1 Apr 2020 17:27:22 +0900 Fujii Masao <masao.fujii@oss.nttdata.com> wrote: [...] > > bool inRecovery = RecoveryInProgress(); > > > > /* > > * The file is always deletable if archive_mode is "off". On standbys > > * archiving is disabled if archive_mode is "on", and enabled with > > * "always". On a primary, archiving is enabled if archive_mode is "on" > > * or "always". > > */ > > if (!((XLogArchivingActive() && !inRecovery) || > > (XLogArchivingAlways() && inRecovery))) > > return true; > > > > Please find in attachment a patch that fix this issue using the following > > test instead: > > > > if (!((XLogArchivingActive() && !StandbyModeRequested) || > > (XLogArchivingAlways() && inRecovery))) > > return true; > > > > I'm not sure if we should rely on StandbyModeRequested for the second part > > of the test as well thought. What was the point to rely on > > RecoveryInProgress() to get the recovery status from shared mem? > > Since StandbyModeRequested is the startup process-local variable, > it basically cannot be used in XLogArchiveCheckDone() that other process > may call. Ok, you answered my wondering about using recovery status from shared mem. This was obvious. Thanks for your help! I was wondering if we could use "ControlFile->state != DB_IN_CRASH_RECOVERY". It seems fine during crash recovery as the control file is updated before the checkpoint, but it doesn't feel right for other code paths where the control file might not be up-to-date on filesystem, right ? > So another approach would be necessary... One straight idea is to > add new shmem flag indicating whether we are in standby mode or not. I was thinking about setting XLogCtlData->SharedRecoveryInProgress as an enum using: enum RecoveryState { NOT_IN_RECOVERY = 0 IN_CRASH_RECOVERY, IN_ARCHIVE_RECOVERY } Please, find in attachment a patch implementing this. Plus, I added a second commit to add one test in regard with this bug. > Another is to make the startup process remove .ready file if necessary. I'm not sure to understand this one. Regards,
Attachment
Hello. At Wed, 1 Apr 2020 18:17:35 +0200, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote in > Please, find in attachment a patch implementing this. The patch partially reintroduces the issue the patch have fixed. Specifically a standby running a crash recovery wrongly marks a WAL file as ".ready" if it is extant in pg_wal without accompanied by .ready file. Perhaps checking '.ready' before the checking for archive-mode would be sufficient. > Plus, I added a second commit to add one test in regard with this bug. > > > Another is to make the startup process remove .ready file if necessary. > > I'm not sure to understand this one. -- Kyotaro Horiguchi NTT Open Source Software Center
Sorry, it was quite ambiguous. At Thu, 02 Apr 2020 13:04:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > At Wed, 1 Apr 2020 18:17:35 +0200, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote in > > Please, find in attachment a patch implementing this. > > The patch partially reintroduces the issue the patch have > fixed. Specifically a standby running a crash recovery wrongly marks a > WAL file as ".ready" if it is extant in pg_wal without accompanied by > .ready file. The patch partially reintroduces the issue the commit 78ea8b5daa have fixed. Specifically a standby running a crash recovery wrongly marks a WAL file as ".ready" if it is extant in pg_wal without accompanied by .ready file. > Perhaps checking '.ready' before the checking for archive-mode would > be sufficient. > > > Plus, I added a second commit to add one test in regard with this bug. > > > > > Another is to make the startup process remove .ready file if necessary. > > > > I'm not sure to understand this one. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020/04/02 13:07, Kyotaro Horiguchi wrote: > Sorry, it was quite ambiguous. > > At Thu, 02 Apr 2020 13:04:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in >> At Wed, 1 Apr 2020 18:17:35 +0200, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote in >>> Please, find in attachment a patch implementing this. >> >> The patch partially reintroduces the issue the patch have >> fixed. Specifically a standby running a crash recovery wrongly marks a >> WAL file as ".ready" if it is extant in pg_wal without accompanied by >> .ready file. > > The patch partially reintroduces the issue the commit 78ea8b5daa have > fixed. Specifically a standby running a crash recovery wrongly marks a > WAL file as ".ready" if it is extant in pg_wal without accompanied by > .ready file. On second thought, I think that we should discuss what the desirable behavior is before the implentation. Especially what's unclear to me is whether to remove such WAL files in archive recovery case with archive_mode=on. Those WAL files would be required when recovering from the backup taken before that archive recovery happens. So it seems unsafe to remove them in that case. Therefore, IMO that the patch should change the code so that no unarchived WAL files are removed not only in crash recovery but also archive recovery. Thought? Of course, this change would lead to the issue that the past unarchived WAL files keep remaining in the case of warm-standby using archive recovery. But this issue looks unavoidable. If users want to avoid that, archive_mode should be set to always. Also I'm a bit wondering if it's really safe to remove such unarchived WAL files even in the standby case with archive_mode=on. I would need more time to think that. >> Perhaps checking '.ready' before the checking for archive-mode would >> be sufficient. >> >>> Plus, I added a second commit to add one test in regard with this bug. >>> >>>> Another is to make the startup process remove .ready file if necessary. >>> >>> I'm not sure to understand this one. I was thinking to make the startup process remove such unarchived WAL files if archive_mode=on and StandbyModeRequested/ArchiveRecoveryRequested is true. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
At Thu, 2 Apr 2020 14:19:15 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > On 2020/04/02 13:07, Kyotaro Horiguchi wrote: > > Sorry, it was quite ambiguous. > > At Thu, 02 Apr 2020 13:04:43 +0900 (JST), Kyotaro Horiguchi > > <horikyota.ntt@gmail.com> wrote in > >> At Wed, 1 Apr 2020 18:17:35 +0200, Jehan-Guillaume de Rorthais > >> <jgdr@dalibo.com> wrote in > >>> Please, find in attachment a patch implementing this. > >> > >> The patch partially reintroduces the issue the patch have > >> fixed. Specifically a standby running a crash recovery wrongly marks a > >> WAL file as ".ready" if it is extant in pg_wal without accompanied by > >> .ready file. > > The patch partially reintroduces the issue the commit 78ea8b5daa have > > fixed. Specifically a standby running a crash recovery wrongly marks a > > WAL file as ".ready" if it is extant in pg_wal without accompanied by > > .ready file. > > On second thought, I think that we should discuss what the desirable > behavior is before the implentation. Especially what's unclear to me Agreed. > is whether to remove such WAL files in archive recovery case with > archive_mode=on. Those WAL files would be required when recovering > from the backup taken before that archive recovery happens. > So it seems unsafe to remove them in that case. I'm not sure I'm getting the intention correctly, but I think it responsibility of the operator to provide a complete set of archived WAL files for a backup. Could you elaborate what operation steps are you assuming of? > Therefore, IMO that the patch should change the code so that > no unarchived WAL files are removed not only in crash recovery > but also archive recovery. Thought? Agreed if "an unarchived WAL" means "a WAL file that is marked .ready" and it should be archived immediately. My previous mail is written based on the same thought. In a very narrow window, if server crashed or killed after a segment is finished but before marking the file as .ready, the file doesn't have .ready but should be archived. If we need to get rid of such a window, it would help to mark a WAL file as ".busy" at creation time. > Of course, this change would lead to the issue that the past > unarchived > WAL files keep remaining in the case of warm-standby using archive > recovery. But this issue looks unavoidable. If users want to avoid > that, > archive_mode should be set to always. > > Also I'm a bit wondering if it's really safe to remove such unarchived > WAL files even in the standby case with archive_mode=on. I would need > more time to think that. > > >> Perhaps checking '.ready' before the checking for archive-mode would > >> be sufficient. > >> > >>> Plus, I added a second commit to add one test in regard with this bug. > >>> > >>>> Another is to make the startup process remove .ready file if > >>>> necessary. > >>> > >>> I'm not sure to understand this one. > > I was thinking to make the startup process remove such unarchived WAL > files > if archive_mode=on and StandbyModeRequested/ArchiveRecoveryRequested > is true. As mentioned above, I don't understand the point of preserving WAL files that are either marked as .ready or not marked at all on a standby with archive_mode=on. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020/04/02 16:23, Kyotaro Horiguchi wrote: > At Thu, 2 Apr 2020 14:19:15 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in >> On 2020/04/02 13:07, Kyotaro Horiguchi wrote: >>> Sorry, it was quite ambiguous. >>> At Thu, 02 Apr 2020 13:04:43 +0900 (JST), Kyotaro Horiguchi >>> <horikyota.ntt@gmail.com> wrote in >>>> At Wed, 1 Apr 2020 18:17:35 +0200, Jehan-Guillaume de Rorthais >>>> <jgdr@dalibo.com> wrote in >>>>> Please, find in attachment a patch implementing this. >>>> >>>> The patch partially reintroduces the issue the patch have >>>> fixed. Specifically a standby running a crash recovery wrongly marks a >>>> WAL file as ".ready" if it is extant in pg_wal without accompanied by >>>> .ready file. >>> The patch partially reintroduces the issue the commit 78ea8b5daa have >>> fixed. Specifically a standby running a crash recovery wrongly marks a >>> WAL file as ".ready" if it is extant in pg_wal without accompanied by >>> .ready file. >> >> On second thought, I think that we should discuss what the desirable >> behavior is before the implentation. Especially what's unclear to me > > Agreed. > >> is whether to remove such WAL files in archive recovery case with >> archive_mode=on. Those WAL files would be required when recovering >> from the backup taken before that archive recovery happens. >> So it seems unsafe to remove them in that case. > > I'm not sure I'm getting the intention correctly, but I think it > responsibility of the operator to provide a complete set of archived > WAL files for a backup. Could you elaborate what operation steps are > you assuming of? Please imagine the case where you need to do archive recovery from the database snapshot taken while there are many WAL files with .ready files. Those WAL files have not been archived yet. In this case, ISTM those WAL files should not be removed until they are archived, when archive_mode = on. >> Therefore, IMO that the patch should change the code so that >> no unarchived WAL files are removed not only in crash recovery >> but also archive recovery. Thought? > > Agreed if "an unarchived WAL" means "a WAL file that is marked .ready" > and it should be archived immediately. My previous mail is written > based on the same thought. Ok, so our *current* consensus seems the followings. Right? - If archive_mode=off, any WAL files with .ready files are removed in crash recovery, archive recoery and standby mode. - If archive_mode=on, WAL files with .ready files are removed only in standby mode. In crash recovery and archive recovery cases, they keep remaining and would be archived after recovery finishes (i.e., during normal processing). - If archive_mode=always, in crash recovery, archive recovery and standby mode, WAL files with .ready files are archived if WAL archiver is running. That is, WAL files with .ready files are removed when either archive_mode!=always in standby mode or archive_mode=off. > In a very narrow window, if server crashed or killed after a segment > is finished but before marking the file as .ready, the file doesn't > have .ready but should be archived. If we need to get rid of such a > window, it would help to mark a WAL file as ".busy" at creation time. > >> Of course, this change would lead to the issue that the past >> unarchived >> WAL files keep remaining in the case of warm-standby using archive >> recovery. But this issue looks unavoidable. If users want to avoid >> that, >> archive_mode should be set to always. >> >> Also I'm a bit wondering if it's really safe to remove such unarchived >> WAL files even in the standby case with archive_mode=on. I would need >> more time to think that. >> >>>> Perhaps checking '.ready' before the checking for archive-mode would >>>> be sufficient. >>>> >>>>> Plus, I added a second commit to add one test in regard with this bug. >>>>> >>>>>> Another is to make the startup process remove .ready file if >>>>>> necessary. >>>>> >>>>> I'm not sure to understand this one. >> >> I was thinking to make the startup process remove such unarchived WAL >> files >> if archive_mode=on and StandbyModeRequested/ArchiveRecoveryRequested >> is true. > > As mentioned above, I don't understand the point of preserving WAL > files that are either marked as .ready or not marked at all on a > standby with archive_mode=on. Maybe yes. But I'm not confident about that there is no such case. Anyway I'm fine to fix the bug based on the above consensus at first. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Thu, 02 Apr 2020 13:07:34 +0900 (JST) Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > Sorry, it was quite ambiguous. > > At Thu, 02 Apr 2020 13:04:43 +0900 (JST), Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote in > > At Wed, 1 Apr 2020 18:17:35 +0200, Jehan-Guillaume de Rorthais > > <jgdr@dalibo.com> wrote in > > > Please, find in attachment a patch implementing this. > > > > The patch partially reintroduces the issue the patch have > > fixed. Specifically a standby running a crash recovery wrongly marks a > > WAL file as ".ready" if it is extant in pg_wal without accompanied by > > .ready file. > > The patch partially reintroduces the issue the commit 78ea8b5daa have > fixed. Specifically a standby running a crash recovery wrongly marks a > WAL file as ".ready" if it is extant in pg_wal without accompanied by > .ready file. As far as I understand StartupXLOG(), NOT_IN_RECOVERY and IN_CRASH_RECOVERY are only set for production clusters, not standby ones. So the following test should never catch standby cluster as : (XLogArchivingActive() && inRecoveryState != IN_ARCHIVE_RECOVERY) Forgive me if I'm wrong, but do I miss something? Regards,
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Thu, 2 Apr 2020 19:38:59 +0900 Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > On 2020/04/02 16:23, Kyotaro Horiguchi wrote: > > At Thu, 2 Apr 2020 14:19:15 +0900, Fujii Masao > > <masao.fujii@oss.nttdata.com> wrote in [...] > >> is whether to remove such WAL files in archive recovery case with > >> archive_mode=on. Those WAL files would be required when recovering > >> from the backup taken before that archive recovery happens. > >> So it seems unsafe to remove them in that case. > > > > I'm not sure I'm getting the intention correctly, but I think it > > responsibility of the operator to provide a complete set of archived > > WAL files for a backup. Could you elaborate what operation steps are > > you assuming of? > > Please imagine the case where you need to do archive recovery > from the database snapshot taken while there are many WAL files > with .ready files. Those WAL files have not been archived yet. > In this case, ISTM those WAL files should not be removed until > they are archived, when archive_mode = on. If you rely on snapshot without pg_start/stop_backup, I agree. Theses WAL should be archived if: * archive_mode >= on for primary * archive_mode = always for standby > >> Therefore, IMO that the patch should change the code so that > >> no unarchived WAL files are removed not only in crash recovery > >> but also archive recovery. Thought? > > > > Agreed if "an unarchived WAL" means "a WAL file that is marked .ready" > > and it should be archived immediately. My previous mail is written > > based on the same thought. > > Ok, so our *current* consensus seems the followings. Right? > > - If archive_mode=off, any WAL files with .ready files are removed in > crash recovery, archive recoery and standby mode. yes > - If archive_mode=on, WAL files with .ready files are removed only in > standby mode. In crash recovery and archive recovery cases, they keep > remaining and would be archived after recovery finishes (i.e., during > normal processing). yes > - If archive_mode=always, in crash recovery, archive recovery and > standby mode, WAL files with .ready files are archived if WAL archiver > is running. yes > That is, WAL files with .ready files are removed when either > archive_mode!=always in standby mode or archive_mode=off. sounds fine to me. [...] > >>>>>> Another is to make the startup process remove .ready file if > >>>>>> necessary. > >>>>> > >>>>> I'm not sure to understand this one. > >> > >> I was thinking to make the startup process remove such unarchived WAL > >> files > >> if archive_mode=on and StandbyModeRequested/ArchiveRecoveryRequested > >> is true. Ok, understood. > > As mentioned above, I don't understand the point of preserving WAL > > files that are either marked as .ready or not marked at all on a > > standby with archive_mode=on. > > Maybe yes. But I'm not confident about that there is no such case. Well, it seems to me that this is what you suggested few paragraph away: «.ready files are removed when either archive_mode!=always in standby mode»
On 2020/04/02 22:49, Jehan-Guillaume de Rorthais wrote: > On Thu, 2 Apr 2020 19:38:59 +0900 > Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >> On 2020/04/02 16:23, Kyotaro Horiguchi wrote: >>> At Thu, 2 Apr 2020 14:19:15 +0900, Fujii Masao >>> <masao.fujii@oss.nttdata.com> wrote in > [...] >>>> is whether to remove such WAL files in archive recovery case with >>>> archive_mode=on. Those WAL files would be required when recovering >>>> from the backup taken before that archive recovery happens. >>>> So it seems unsafe to remove them in that case. >>> >>> I'm not sure I'm getting the intention correctly, but I think it >>> responsibility of the operator to provide a complete set of archived >>> WAL files for a backup. Could you elaborate what operation steps are >>> you assuming of? >> >> Please imagine the case where you need to do archive recovery >> from the database snapshot taken while there are many WAL files >> with .ready files. Those WAL files have not been archived yet. >> In this case, ISTM those WAL files should not be removed until >> they are archived, when archive_mode = on. > > If you rely on snapshot without pg_start/stop_backup, I agree. Theses WAL > should be archived if: > > * archive_mode >= on for primary > * archive_mode = always for standby > >>>> Therefore, IMO that the patch should change the code so that >>>> no unarchived WAL files are removed not only in crash recovery >>>> but also archive recovery. Thought? >>> >>> Agreed if "an unarchived WAL" means "a WAL file that is marked .ready" >>> and it should be archived immediately. My previous mail is written >>> based on the same thought. >> >> Ok, so our *current* consensus seems the followings. Right? >> >> - If archive_mode=off, any WAL files with .ready files are removed in >> crash recovery, archive recoery and standby mode. > > yes > >> - If archive_mode=on, WAL files with .ready files are removed only in >> standby mode. In crash recovery and archive recovery cases, they keep >> remaining and would be archived after recovery finishes (i.e., during >> normal processing). > > yes > >> - If archive_mode=always, in crash recovery, archive recovery and >> standby mode, WAL files with .ready files are archived if WAL archiver >> is running. > > yes > >> That is, WAL files with .ready files are removed when either >> archive_mode!=always in standby mode or archive_mode=off. > > sounds fine to me. > > [...] >>>>>>>> Another is to make the startup process remove .ready file if >>>>>>>> necessary. >>>>>>> >>>>>>> I'm not sure to understand this one. >>>> >>>> I was thinking to make the startup process remove such unarchived WAL >>>> files >>>> if archive_mode=on and StandbyModeRequested/ArchiveRecoveryRequested >>>> is true. > > Ok, understood. > >>> As mentioned above, I don't understand the point of preserving WAL >>> files that are either marked as .ready or not marked at all on a >>> standby with archive_mode=on. >> >> Maybe yes. But I'm not confident about that there is no such case. > > Well, it seems to me that this is what you suggested few paragraph away: > > «.ready files are removed when either archive_mode!=always in standby mode» Yes, so I'm fine with that as the first consensus because the behavior is obviously better than the current one. *If* the case where no WAL files should be removed is found, I'd just like to propose the additional patch. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020/04/02 22:02, Jehan-Guillaume de Rorthais wrote: > On Thu, 02 Apr 2020 13:07:34 +0900 (JST) > Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > >> Sorry, it was quite ambiguous. >> >> At Thu, 02 Apr 2020 13:04:43 +0900 (JST), Kyotaro Horiguchi >> <horikyota.ntt@gmail.com> wrote in >>> At Wed, 1 Apr 2020 18:17:35 +0200, Jehan-Guillaume de Rorthais >>> <jgdr@dalibo.com> wrote in >>>> Please, find in attachment a patch implementing this. >>> >>> The patch partially reintroduces the issue the patch have >>> fixed. Specifically a standby running a crash recovery wrongly marks a >>> WAL file as ".ready" if it is extant in pg_wal without accompanied by >>> .ready file. >> >> The patch partially reintroduces the issue the commit 78ea8b5daa have >> fixed. Specifically a standby running a crash recovery wrongly marks a >> WAL file as ".ready" if it is extant in pg_wal without accompanied by >> .ready file. > > As far as I understand StartupXLOG(), NOT_IN_RECOVERY and IN_CRASH_RECOVERY are > only set for production clusters, not standby ones. DB_IN_CRASH_RECOVERY can be set even in standby mode. For example, if you start the standby from the cold backup of the primary, since InArchiveRecovery is false at the beginning of the recovery, DB_IN_CRASH_RECOVERY is set in that moment. But then after all the valid WAL in pg_wal have been replayed, InArchiveRecovery is set to true and DB_IN_ARCHIVE_RECOVERY is set. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Thu, 2 Apr 2020 23:58:00 +0900 Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > On 2020/04/02 22:02, Jehan-Guillaume de Rorthais wrote: > > On Thu, 02 Apr 2020 13:07:34 +0900 (JST) > > Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > >> Sorry, it was quite ambiguous. > >> > >> At Thu, 02 Apr 2020 13:04:43 +0900 (JST), Kyotaro Horiguchi > >> <horikyota.ntt@gmail.com> wrote in > >>> At Wed, 1 Apr 2020 18:17:35 +0200, Jehan-Guillaume de Rorthais > >>> <jgdr@dalibo.com> wrote in > >>>> Please, find in attachment a patch implementing this. > >>> > >>> The patch partially reintroduces the issue the patch have > >>> fixed. Specifically a standby running a crash recovery wrongly marks a > >>> WAL file as ".ready" if it is extant in pg_wal without accompanied by > >>> .ready file. > >> > >> The patch partially reintroduces the issue the commit 78ea8b5daa have > >> fixed. Specifically a standby running a crash recovery wrongly marks a > >> WAL file as ".ready" if it is extant in pg_wal without accompanied by > >> .ready file. > > > > As far as I understand StartupXLOG(), NOT_IN_RECOVERY and IN_CRASH_RECOVERY > > are only set for production clusters, not standby ones. > > DB_IN_CRASH_RECOVERY can be set even in standby mode. For example, > if you start the standby from the cold backup of the primary, In cold backup? Then ControlFile->state == DB_SHUTDOWNED, right? Unless I'm wrong, this should be catched by: if (ArchiveRecoveryRequested && ( [...] || ControlFile->state == DB_SHUTDOWNED)) { InArchiveRecovery = true; if (StandbyModeRequested) StandbyMode = true; } With InArchiveRecovery=true, we later set DB_IN_ARCHIVE_RECOVERY instead of DB_IN_CRASH_RECOVERY. > since InArchiveRecovery is false at the beginning of the recovery, > DB_IN_CRASH_RECOVERY is set in that moment. But then after all the valid > WAL in pg_wal have been replayed, InArchiveRecovery is set to true and > DB_IN_ARCHIVE_RECOVERY is set. However, I suppose this is true if you restore a backup from a snapshot without backup_label, right?
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Thu, 2 Apr 2020 23:55:46 +0900 Fujii Masao <masao.fujii@oss.nttdata.com> wrote: [...] > > Well, it seems to me that this is what you suggested few paragraph away: > > > > «.ready files are removed when either archive_mode!=always in standby > > mode» > > Yes, so I'm fine with that as the first consensus because the behavior > is obviously better than the current one. *If* the case where no WAL files > should be removed is found, I'd just like to propose the additional patch. Do you mean to want to produce the next patch yourself?
At Thu, 2 Apr 2020 17:44:50 +0200, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote in > On Thu, 2 Apr 2020 23:55:46 +0900 > Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > [...] > > > Well, it seems to me that this is what you suggested few paragraph away: > > > > > > «.ready files are removed when either archive_mode!=always in standby > > > mode» > > > > Yes, so I'm fine with that as the first consensus because the behavior > > is obviously better than the current one. *If* the case where no WAL files > > should be removed is found, I'd just like to propose the additional patch. > > Do you mean to want to produce the next patch yourself? No. Fujii-san is saying that he will address it, if the fix made in this thread is found to be imperfect later. He suspects that WAL files should be preserved at least in certain cases even if the file persists forever, but the consensus here is to remove files under certain conditions so as not to no WAL file persists in pg_wal directory. Feel free to propose the next version! regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020/04/03 0:37, Jehan-Guillaume de Rorthais wrote: > On Thu, 2 Apr 2020 23:58:00 +0900 > Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >> On 2020/04/02 22:02, Jehan-Guillaume de Rorthais wrote: >>> On Thu, 02 Apr 2020 13:07:34 +0900 (JST) >>> Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: >>> >>>> Sorry, it was quite ambiguous. >>>> >>>> At Thu, 02 Apr 2020 13:04:43 +0900 (JST), Kyotaro Horiguchi >>>> <horikyota.ntt@gmail.com> wrote in >>>>> At Wed, 1 Apr 2020 18:17:35 +0200, Jehan-Guillaume de Rorthais >>>>> <jgdr@dalibo.com> wrote in >>>>>> Please, find in attachment a patch implementing this. >>>>> >>>>> The patch partially reintroduces the issue the patch have >>>>> fixed. Specifically a standby running a crash recovery wrongly marks a >>>>> WAL file as ".ready" if it is extant in pg_wal without accompanied by >>>>> .ready file. >>>> >>>> The patch partially reintroduces the issue the commit 78ea8b5daa have >>>> fixed. Specifically a standby running a crash recovery wrongly marks a >>>> WAL file as ".ready" if it is extant in pg_wal without accompanied by >>>> .ready file. >>> >>> As far as I understand StartupXLOG(), NOT_IN_RECOVERY and IN_CRASH_RECOVERY >>> are only set for production clusters, not standby ones. >> >> DB_IN_CRASH_RECOVERY can be set even in standby mode. For example, >> if you start the standby from the cold backup of the primary, > > In cold backup? Then ControlFile->state == DB_SHUTDOWNED, right? > > Unless I'm wrong, this should be catched by: > > if (ArchiveRecoveryRequested && ( [...] || > ControlFile->state == DB_SHUTDOWNED)) > { > InArchiveRecovery = true; > if (StandbyModeRequested) > StandbyMode = true; > } > > With InArchiveRecovery=true, we later set DB_IN_ARCHIVE_RECOVERY instead of > DB_IN_CRASH_RECOVERY. Yes, you're right. So I had to mention one more condition in my previous email. The condition is that the cold backup was taken from the server that was shutdowned with immdiate mode. In this case, the code block that you pointed is skipped and InArchiveRecovery is not set to true there. >> since InArchiveRecovery is false at the beginning of the recovery, >> DB_IN_CRASH_RECOVERY is set in that moment. But then after all the valid >> WAL in pg_wal have been replayed, InArchiveRecovery is set to true and >> DB_IN_ARCHIVE_RECOVERY is set. > > However, I suppose this is true if you restore a backup from a snapshot > without backup_label, right? Maybe yes. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020/04/03 10:14, Kyotaro Horiguchi wrote: > At Thu, 2 Apr 2020 17:44:50 +0200, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote in >> On Thu, 2 Apr 2020 23:55:46 +0900 >> Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> [...] >>>> Well, it seems to me that this is what you suggested few paragraph away: >>>> >>>> «.ready files are removed when either archive_mode!=always in standby >>>> mode» >>> >>> Yes, so I'm fine with that as the first consensus because the behavior >>> is obviously better than the current one. *If* the case where no WAL files >>> should be removed is found, I'd just like to propose the additional patch. >> >> Do you mean to want to produce the next patch yourself? > > No. Fujii-san is saying that he will address it, if the fix made in > this thread is found to be imperfect later. > > He suspects that WAL files should be preserved at least in certain > cases even if the file persists forever, but the consensus here is to > remove files under certain conditions so as not to no WAL file > persists in pg_wal directory. Yes. > Feel free to propose the next version! Yes! BTW, now I'm thinking that the flag in shmem should be updated when the startup process sets StandbyModeRequested to true at the beginning of the recovery. That is, - Add something like SharedStandbyModeRequested into XLogCtl. This field should be initialized with false; - Set XLogCtl->SharedStandbyModeRequested to true when the startup process detects the standby.signal file and sets the local variable StandbyModeRequested to true. - Make XLogArchiveCheckDone() use XLogCtl->SharedStandbyModeRequested to know whether the server is in standby mode or not. Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Thu, 2 Apr 2020 23:55:46 +0900 Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > On 2020/04/02 22:49, Jehan-Guillaume de Rorthais wrote: >> On Thu, 2 Apr 2020 19:38:59 +0900 >> Fujii Masao <masao.fujii@oss.nttdata.com> wrote: [...] >>> That is, WAL files with .ready files are removed when either >>> archive_mode!=always in standby mode or archive_mode=off. >> >> sounds fine to me. To some extends, it appears to me this sentence was relatively close to my previous patch, as far as you exclude IN_CRASH_RECOVERY from the shortcut: XLogArchiveCheckDone(const char *xlog) { [...] if ( (inRecoveryState != IN_CRASH_RECOVERY) && ( (inRecoveryState == NOT_IN_RECOVERY && !XLogArchivingActive()) || (inRecoveryState == IN_ARCHIVE_RECOVERY && !XLogArchivingAlways()))) return true; Which means that only .done cleanup would occurs during CRASH_RECOVERY and .ready files might be created if no .done exists. No matter the futur status of the cluster: primary or standby. Normal shortcut will apply during first checkpoint after the crash recovery step. This should handle the case where a backup without backup_label (taken from a snapshot or after a shutdown with immediate) is restored to build a standby. Please, find in attachment a third version of my patch "0001-v3-Fix-WAL-retention-during-production-crash-recovery.patch". "0002-v1-Add-test-on-non-archived-WAL-during-crash-recovery.patch" is left untouched. But I'm considering adding some more tests relative to this discussion. > BTW, now I'm thinking that the flag in shmem should be updated when > the startup process sets StandbyModeRequested to true at the beginning > of the recovery. That is, > > - Add something like SharedStandbyModeRequested into XLogCtl. This field > should be initialized with false; > - Set XLogCtl->SharedStandbyModeRequested to true when the startup > process detects the standby.signal file and sets the local variable > StandbyModeRequested to true. > - Make XLogArchiveCheckDone() use XLogCtl->SharedStandbyModeRequested > to know whether the server is in standby mode or not. > > Thought? I try to avoid a new flag in memory with the proposal in attachment of this email. It seems to me various combinaisons of booleans with subtle differences around the same subject makes it a bit trappy and complicated to understand. If my proposal is rejected, I'll be happy to volunteer to add XLogCtl->SharedStandbyModeRequested though. It seems like a simple enough fix as well. Regards,
Attachment
On 2020/04/04 1:26, Jehan-Guillaume de Rorthais wrote: > On Thu, 2 Apr 2020 23:55:46 +0900 > Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >> On 2020/04/02 22:49, Jehan-Guillaume de Rorthais wrote: >>> On Thu, 2 Apr 2020 19:38:59 +0900 >>> Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > [...] >>>> That is, WAL files with .ready files are removed when either >>>> archive_mode!=always in standby mode or archive_mode=off. >>> >>> sounds fine to me. > > To some extends, it appears to me this sentence was relatively close to my > previous patch, as far as you exclude IN_CRASH_RECOVERY from the shortcut: > > XLogArchiveCheckDone(const char *xlog) > { > [...] > if ( (inRecoveryState != IN_CRASH_RECOVERY) && ( > (inRecoveryState == NOT_IN_RECOVERY && !XLogArchivingActive()) || > (inRecoveryState == IN_ARCHIVE_RECOVERY && !XLogArchivingAlways()))) > return true; > > Which means that only .done cleanup would occurs during CRASH_RECOVERY > and .ready files might be created if no .done exists. No matter the futur status > of the cluster: primary or standby. Normal shortcut will apply during first > checkpoint after the crash recovery step. > > This should handle the case where a backup without backup_label (taken from a > snapshot or after a shutdown with immediate) is restored to build a standby. > > Please, find in attachment a third version of my patch > "0001-v3-Fix-WAL-retention-during-production-crash-recovery.patch". Thanks for updating the patch! Here are my review comments: - bool SharedRecoveryInProgress; + RecoveryState SharedRecoveryInProgress; Since the patch changes the meaning of this variable, the name of the variable should be changed? Otherwise, the current name seems confusing. + SpinLockAcquire(&XLogCtl->info_lck); + XLogCtl->SharedRecoveryInProgress = IN_CRASH_RECOVERY; + SpinLockRelease(&XLogCtl->info_lck); As I explained upthread, this code can be reached and IN_CRASH_RECOVERY can be set even in standby or archive recovery. Is this right behavior that you're expecting? Even in crash recovery case, GetRecoveryState() returns IN_ARCHIVE_RECOVERY until this code is reached. Also when WAL replay is not necessary (e.g., restart of the server shutdowed cleanly before), GetRecoveryState() returns IN_ARCHIVE_RECOVERY because this code is not reached. Aren't these fragile? If XLogArchiveCheckDone() is only user of GetRecoveryState(), they would be ok. But if another user will appear in the future, it seems very easy to mistake. At least those behaviors should be commented in GetRecoveryState(). - if (!((XLogArchivingActive() && !inRecovery) || - (XLogArchivingAlways() && inRecovery))) + if ( (inRecoveryState != IN_CRASH_RECOVERY) && ( + (inRecoveryState == NOT_IN_RECOVERY && !XLogArchivingActive()) && + (inRecoveryState == IN_ARCHIVE_RECOVERY && !XLogArchivingAlways()))) return true; The last condition seems to cause XLogArchiveCheckDone() to return true in archive recovery mode with archive_mode=on, then cause unarchived WAL files with .ready to be removed. Is my understanding right? If yes, that behavior doesn't seem to match with our consensus, i.e., WAL files with .ready should not be removed in that case. +/* Recovery state */ +typedef enum RecoveryState +{ + NOT_IN_RECOVERY = 0, + IN_CRASH_RECOVERY, + IN_ARCHIVE_RECOVERY +} RecoveryState; Isn't it better to add more comments here? For example, what does "Recovery state" mean? Which state is used in standby mode? Why? ,etc. Is it really ok not to have the value indicating standby mode? These enum values names are confusing because the variables with similar names already exist. For example, IN_CRASH_RECOVERY vs. DB_IN_CRASH_RECOVERY. So IMO it seems better to rename them, .e.g., by adding the prefix. > > "0002-v1-Add-test-on-non-archived-WAL-during-crash-recovery.patch" is left > untouched. But I'm considering adding some more tests relative to this > discussion. > >> BTW, now I'm thinking that the flag in shmem should be updated when >> the startup process sets StandbyModeRequested to true at the beginning >> of the recovery. That is, >> >> - Add something like SharedStandbyModeRequested into XLogCtl. This field >> should be initialized with false; >> - Set XLogCtl->SharedStandbyModeRequested to true when the startup >> process detects the standby.signal file and sets the local variable >> StandbyModeRequested to true. >> - Make XLogArchiveCheckDone() use XLogCtl->SharedStandbyModeRequested >> to know whether the server is in standby mode or not. >> >> Thought? > > I try to avoid a new flag in memory with the proposal in attachment of this > email. It seems to me various combinaisons of booleans with subtle differences > around the same subject makes it a bit trappy and complicated to understand. Ok, so firstly I try to review your patch! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Sat, 4 Apr 2020 02:49:50 +0900 Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > On 2020/04/04 1:26, Jehan-Guillaume de Rorthais wrote: > > On Thu, 2 Apr 2020 23:55:46 +0900 > > Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > >> On 2020/04/02 22:49, Jehan-Guillaume de Rorthais wrote: > >>> On Thu, 2 Apr 2020 19:38:59 +0900 > >>> Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > [...] > >>>> That is, WAL files with .ready files are removed when either > >>>> archive_mode!=always in standby mode or archive_mode=off. > >>> > >>> sounds fine to me. > > > > To some extends, it appears to me this sentence was relatively close to my > > previous patch, as far as you exclude IN_CRASH_RECOVERY from the shortcut: > > > > XLogArchiveCheckDone(const char *xlog) > > { > > [...] > > if ( (inRecoveryState != IN_CRASH_RECOVERY) && ( > > (inRecoveryState == NOT_IN_RECOVERY && !XLogArchivingActive()) || > > (inRecoveryState == IN_ARCHIVE_RECOVERY > > && !XLogArchivingAlways()))) return true; > > > > Which means that only .done cleanup would occurs during CRASH_RECOVERY > > and .ready files might be created if no .done exists. No matter the futur > > status of the cluster: primary or standby. Normal shortcut will apply > > during first checkpoint after the crash recovery step. > > > > This should handle the case where a backup without backup_label (taken from > > a snapshot or after a shutdown with immediate) is restored to build a > > standby. > > > > Please, find in attachment a third version of my patch > > "0001-v3-Fix-WAL-retention-during-production-crash-recovery.patch". > > Thanks for updating the patch! Here are my review comments: > > - bool SharedRecoveryInProgress; > + RecoveryState SharedRecoveryInProgress; > > Since the patch changes the meaning of this variable, the name of > the variable should be changed? Otherwise, the current name seems > confusing. Indeed, fixed using SharedRecoveryState > + SpinLockAcquire(&XLogCtl->info_lck); > + XLogCtl->SharedRecoveryInProgress = > IN_CRASH_RECOVERY; > + SpinLockRelease(&XLogCtl->info_lck); > > As I explained upthread, this code can be reached and IN_CRASH_RECOVERY > can be set even in standby or archive recovery. Is this right behavior that > you're expecting? Yes. This patch avoids archive cleanup during crash recovery altogether, whatever the requested status for the cluster. > Even in crash recovery case, GetRecoveryState() returns IN_ARCHIVE_RECOVERY > until this code is reached. I tried to stick as close as possible with "ControlFile->state" and old XLogCtl->SharedRecoveryInProgress variables. That's why it's initialized as IN_ARCHIVE_RECOVERY as XLogCtl->SharedRecoveryInProgress was init as true as well. The status itself is set during StartupXLOG when the historical code actually tries to define and record the real state between DB_IN_ARCHIVE_RECOVERY and DB_IN_CRASH_RECOVERY. > Also when WAL replay is not necessary (e.g., restart of the server shutdowed > cleanly before), GetRecoveryState() returns IN_ARCHIVE_RECOVERY because this > code is not reached. It is set to NOT_IN_RECOVERY at the end of StartupXLOG, in the same place we set ControlFile->state = DB_IN_PRODUCTION. So GetRecoveryState() returns NOT_IN_RECOVERY as soon as StartupXLOG is done when no WAL replay is necessary. > Aren't these fragile? If XLogArchiveCheckDone() is only user of > GetRecoveryState(), they would be ok. But if another user will appear in the > future, it seems very easy to mistake. At least those behaviors should be > commented in GetRecoveryState(). We certainly can set SharedRecoveryState earlier, in XLOGShmemInit, based on the ControlFile->state value. In my understanding, anything different than DB_SHUTDOWNED or DB_SHUTDOWNED_IN_RECOVERY can be considered as a crash recovery. Based on this XLOGShmemInit can init SharedRecoveryState to IN_ARCHIVE_RECOVERY or IN_CRASH_RECOVERY. With either values, RecoveryInProgress() keep returning the same result so any current code relying on RecoveryInProgress() is compatible. I'm not sure who would need this information before the WAL machinery is up, but is it safe enough in your opinion for futur usage of GetRecoveryState()? Do you think this information might be useful before the WAL machinery is set? Current "user" (eg. restoreTwoPhaseData()) only need to know if we are in recovery, whatever the reason. The patch in attachment set SharedRecoveryState to either IN_ARCHIVE_RECOVERY or IN_CRASH_RECOVERY from XLOGShmemInit based on the ControlFile state. It feels strange though to set this so far away from ControlFile->state = DB_IN_CRASH_RECOVERY... > - if (!((XLogArchivingActive() && !inRecovery) || > - (XLogArchivingAlways() && inRecovery))) > + if ( (inRecoveryState != IN_CRASH_RECOVERY) && ( > + (inRecoveryState == NOT_IN_RECOVERY > && !XLogArchivingActive()) && > + (inRecoveryState == IN_ARCHIVE_RECOVERY > && !XLogArchivingAlways()))) return true; > > The last condition seems to cause XLogArchiveCheckDone() to return > true in archive recovery mode with archive_mode=on, then cause > unarchived WAL files with .ready to be removed. Is my understanding right? > If yes, that behavior doesn't seem to match with our consensus, i.e., > WAL files with .ready should not be removed in that case. We wrote: >> That is, WAL files with .ready files are removed when either >> archive_mode!=always in standby mode or archive_mode=off. > > sounds fine to me. So if in standby mode and archive_mode is not "always", the .ready files should be removed. In current patch, I split the conditions in the sake of clarity. > +/* Recovery state */ > +typedef enum RecoveryState > +{ > + NOT_IN_RECOVERY = 0, > + IN_CRASH_RECOVERY, > + IN_ARCHIVE_RECOVERY > +} RecoveryState; > > Isn't it better to add more comments here? For example, what does > "Recovery state" mean? Which state is used in standby mode? Why? ,etc. Explanations added. > Is it really ok not to have the value indicating standby mode? Unless I'm wrong, this shared state only indicates why we are recovering WAL, it does not need reflect the status of the instance in shared memory. StandbyMode is already available for the startup process. Would it be useful to share this mode outside of the startup process? > These enum values names are confusing because the variables with > similar names already exist. For example, IN_CRASH_RECOVERY vs. > DB_IN_CRASH_RECOVERY. So IMO it seems better to rename them, > .e.g., by adding the prefix. Well, I lack of imagination. So I picked CRASH_RECOVERING and ARCHIVE_RECOVERING. > > "0002-v1-Add-test-on-non-archived-WAL-during-crash-recovery.patch" is left > > untouched. But I'm considering adding some more tests relative to this > > discussion. > > > >> BTW, now I'm thinking that the flag in shmem should be updated when > >> the startup process sets StandbyModeRequested to true at the beginning > >> of the recovery. That is, > >> > >> - Add something like SharedStandbyModeRequested into XLogCtl. This field > >> should be initialized with false; > >> - Set XLogCtl->SharedStandbyModeRequested to true when the startup > >> process detects the standby.signal file and sets the local variable > >> StandbyModeRequested to true. > >> - Make XLogArchiveCheckDone() use XLogCtl->SharedStandbyModeRequested > >> to know whether the server is in standby mode or not. > >> > >> Thought? > > > > I try to avoid a new flag in memory with the proposal in attachment of this > > email. It seems to me various combinaisons of booleans with subtle > > differences around the same subject makes it a bit trappy and complicated > > to understand. > > Ok, so firstly I try to review your patch! Thank you for your review and help! In attachment: * fix proposal: 0001-v4-Fix-WAL-retention-during-production-crash-recovery.patch * add test: 0002-v2-Add-test-on-non-archived-WAL-during-crash-recovery.patch 0002-v2 fix conflict with current master. Regards,
Attachment
On Tue, Apr 07, 2020 at 05:17:36PM +0200, Jehan-Guillaume de Rorthais wrote: > I'm not sure who would need this information before the WAL machinery is up, > but is it safe enough in your opinion for futur usage of GetRecoveryState()? Do > you think this information might be useful before the WAL machinery is set? > Current "user" (eg. restoreTwoPhaseData()) only need to know if we are in > recovery, whatever the reason. (I had this thread marked as something to look at, but could not. Sorry for the delay). > src/test/recovery/t/011_crash_recovery.pl | 16 ++++++++++++++-- > 1 file changed, 14 insertions(+), 2 deletions(-) > > diff --git a/src/test/recovery/t/011_crash_recovery.pl b/src/test/recovery/t/011_crash_recovery.pl > index ca6e92b50d..ce2e899891 100644 > --- a/src/test/recovery/t/011_crash_recovery.pl > +++ b/src/test/recovery/t/011_crash_recovery.pl > @@ -15,11 +15,17 @@ if ($Config{osname} eq 'MSWin32') May I ask why this new test is added to 011_crash_recovery.pl which is aimed at testing crash and redo, while we have 002_archiving.pl that is dedicated to archiving in a more general manner? -- Michael
Attachment
At Tue, 7 Apr 2020 17:17:36 +0200, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote in > > +/* Recovery state */ > > +typedef enum RecoveryState > > +{ > > + NOT_IN_RECOVERY = 0, > > + IN_CRASH_RECOVERY, > > + IN_ARCHIVE_RECOVERY > > +} RecoveryState; I'm not sure the complexity is required here. Are we asuume that archive_mode can be changed before restarting? At Thu, 2 Apr 2020 15:49:15 +0200, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote in > > Ok, so our *current* consensus seems the followings. Right? > > > > - If archive_mode=off, any WAL files with .ready files are removed in > > crash recovery, archive recoery and standby mode. > > yes If archive_mode = off no WAL files are marked as ".ready". > > - If archive_mode=on, WAL files with .ready files are removed only in > > standby mode. In crash recovery and archive recovery cases, they keep > > remaining and would be archived after recovery finishes (i.e., during > > normal processing). > > yes > > > - If archive_mode=always, in crash recovery, archive recovery and > > standby mode, WAL files with .ready files are archived if WAL archiver > > is running. > > yes So if we assume archive_mode won't be changed after a crash before restarting, if archive_mode = on on a standy, WAL files are not marked as ".ready". If it is "always", WAL files that are to be archived are marked as ".ready". Finally, the condition reduces to: If archiver is running, archive ".ready" files. Otherwise ignore ".ready" and just remove WAL files after use. > > > That is, WAL files with .ready files are removed when either > > archive_mode!=always in standby mode or archive_mode=off. > > sounds fine to me. That situation implies that archive_mode has been changed. Can we guarantee the completeness of the archive in the case? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Wed, 8 Apr 2020 11:23:45 +0900 Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Apr 07, 2020 at 05:17:36PM +0200, Jehan-Guillaume de Rorthais wrote: > > I'm not sure who would need this information before the WAL machinery is up, > > but is it safe enough in your opinion for futur usage of > > GetRecoveryState()? Do you think this information might be useful before > > the WAL machinery is set? Current "user" (eg. restoreTwoPhaseData()) only > > need to know if we are in recovery, whatever the reason. > > (I had this thread marked as something to look at, but could not. > Sorry for the delay). (no worries :)) > > src/test/recovery/t/011_crash_recovery.pl | 16 ++++++++++++++-- > > 1 file changed, 14 insertions(+), 2 deletions(-) > > > > diff --git a/src/test/recovery/t/011_crash_recovery.pl > > b/src/test/recovery/t/011_crash_recovery.pl index ca6e92b50d..ce2e899891 > > 100644 --- a/src/test/recovery/t/011_crash_recovery.pl > > +++ b/src/test/recovery/t/011_crash_recovery.pl > > @@ -15,11 +15,17 @@ if ($Config{osname} eq 'MSWin32') > > May I ask why this new test is added to 011_crash_recovery.pl which is > aimed at testing crash and redo, while we have 002_archiving.pl that > is dedicated to archiving in a more general manner? I thought it was a better place because the test happen during crash recovery. In the meantime, while working on other tests related to $SUBJECT and the current consensus, I was wondering if a new file would be a better place anyway.
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Wed, 08 Apr 2020 17:39:09 +0900 (JST) Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > At Tue, 7 Apr 2020 17:17:36 +0200, Jehan-Guillaume de Rorthais > <jgdr@dalibo.com> wrote in > > > +/* Recovery state */ > > > +typedef enum RecoveryState > > > +{ > > > + NOT_IN_RECOVERY = 0, > > > + IN_CRASH_RECOVERY, > > > + IN_ARCHIVE_RECOVERY > > > +} RecoveryState; > > I'm not sure the complexity is required here. Are we asuume that > archive_mode can be changed before restarting? I assume it can yes. Eg., one can restore a PITR backup as a standby and change the value of archive_mode to either off, on or always. > At Thu, 2 Apr 2020 15:49:15 +0200, Jehan-Guillaume de Rorthais > <jgdr@dalibo.com> wrote in > > > Ok, so our *current* consensus seems the followings. Right? > > > > > > - If archive_mode=off, any WAL files with .ready files are removed in > > > crash recovery, archive recoery and standby mode. > > > > yes > > If archive_mode = off no WAL files are marked as ".ready". Sure, on the primary side. What if you build a standby from a backup with archive_mode=on with some .ready files in there? > > > - If archive_mode=on, WAL files with .ready files are removed only in > > > standby mode. In crash recovery and archive recovery cases, they keep > > > remaining and would be archived after recovery finishes (i.e., during > > > normal processing). > > > > yes > > > > > - If archive_mode=always, in crash recovery, archive recovery and > > > standby mode, WAL files with .ready files are archived if WAL archiver > > > is running. > > > > yes > > So if we assume archive_mode won't be changed after a crash before > restarting, if archive_mode = on on a standy, WAL files are not marked > as ".ready". .ready files can be inherited from the old primary when building the standby, depending on the method. See previous explanations from Fujii-san: https://www.postgresql.org/message-id/flat/ca964b3a-61a0-902e-c7b3-3abbc01a921f%40oss.nttdata.com#ddd6cbad6c5e576e2e1ae53868ca3eea > If it is "always", WAL files that are to be archived are > marked as ".ready". Finally, the condition reduces to: > > If archiver is running, archive ".ready" files. Otherwise ignore > ".ready" and just remove WAL files after use. > > > > > That is, WAL files with .ready files are removed when either > > > archive_mode!=always in standby mode or archive_mode=off. > > > > sounds fine to me. > > That situation implies that archive_mode has been changed. Why? archive_mode may have been "always" on the primary when eg. a snapshot has been created. Regards,
Hello, Jehan. At Wed, 8 Apr 2020 15:26:03 +0200, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote in > On Wed, 08 Apr 2020 17:39:09 +0900 (JST) > Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > At Tue, 7 Apr 2020 17:17:36 +0200, Jehan-Guillaume de Rorthais > > <jgdr@dalibo.com> wrote in > > > > +/* Recovery state */ > > > > +typedef enum RecoveryState > > > > +{ > > > > + NOT_IN_RECOVERY = 0, > > > > + IN_CRASH_RECOVERY, > > > > + IN_ARCHIVE_RECOVERY > > > > +} RecoveryState; > > > > I'm not sure the complexity is required here. Are we asuume that > > archive_mode can be changed before restarting? > > I assume it can yes. Eg., one can restore a PITR backup as a standby and change > the value of archive_mode to either off, on or always. Thanks. I was confused. The original issue was restarted master can miss files in archive. To fix that, it's sufficient not ignoring .ready. It is more than that. > > At Thu, 2 Apr 2020 15:49:15 +0200, Jehan-Guillaume de Rorthais > > <jgdr@dalibo.com> wrote in > > > > Ok, so our *current* consensus seems the followings. Right? > > > > > > > > - If archive_mode=off, any WAL files with .ready files are removed in > > > > crash recovery, archive recoery and standby mode. > > > > > > yes > > > > If archive_mode = off no WAL files are marked as ".ready". > > Sure, on the primary side. > > What if you build a standby from a backup with archive_mode=on with > some .ready files in there? Well. Backup doesn't have nothing in archive_status directory if it is taken by pg_basebackup. If the backup is created other way, it can have some (as Fujii-san mentioned). Master with archive_mode != off and standby with archive_mode=always should archive WAL files that are not marked .done, but standby with archive_mode == on should not. The commit intended that but the mistake here is it thinks that inRecovery represents whether it is running as a standby or not, but actually it is true on primary during crash recovery. On the other hand, with the patch, standby with archive_mode=on wrongly archives WAL files during crash recovery. What we should check there is, as the commit was intended, not whether it is under crash or archive recovery, but whether it is running as primary or standby. > > If it is "always", WAL files that are to be archived are > > marked as ".ready". Finally, the condition reduces to: > > > > If archiver is running, archive ".ready" files. Otherwise ignore > > ".ready" and just remove WAL files after use. > > > > > > > That is, WAL files with .ready files are removed when either > > > > archive_mode!=always in standby mode or archive_mode=off. > > > > > > sounds fine to me. > > > > That situation implies that archive_mode has been changed. > > Why? archive_mode may have been "always" on the primary when eg. a snapshot has > been created. .ready files are created only when archive_mode != off. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Thu, 09 Apr 2020 11:26:57 +0900 (JST) Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: [...] > > > At Thu, 2 Apr 2020 15:49:15 +0200, Jehan-Guillaume de Rorthais > > > <jgdr@dalibo.com> wrote in > > > > > Ok, so our *current* consensus seems the followings. Right? > > > > > > > > > > - If archive_mode=off, any WAL files with .ready files are removed in > > > > > crash recovery, archive recoery and standby mode. > > > > > > > > yes > > > > > > If archive_mode = off no WAL files are marked as ".ready". > > > > Sure, on the primary side. > > > > What if you build a standby from a backup with archive_mode=on with > > some .ready files in there? > > Well. Backup doesn't have nothing in archive_status directory if it is > taken by pg_basebackup. If the backup is created other way, it can > have some (as Fujii-san mentioned). Master with archive_mode != off > and standby with archive_mode=always should archive WAL files that are > not marked .done, but standby with archive_mode == on should not. The > commit intended that Unless I'm wrong, the commit avoids creating .ready files on standby when a WAL has neither .done or .ready status file. > but the mistake here is it thinks that inRecovery represents whether it is > running as a standby or not, but actually it is true on primary during crash > recovery. Indeed. > On the other hand, with the patch, standby with archive_mode=on > wrongly archives WAL files during crash recovery. "without the patch" you mean? You are talking about 78ea8b5daab, right? > What we should check there is, as the commit was intended, not whether > it is under crash or archive recovery, but whether it is running as > primary or standby. Yes. > > > If it is "always", WAL files that are to be archived are > > > marked as ".ready". Finally, the condition reduces to: > > > > > > If archiver is running, archive ".ready" files. Otherwise ignore > > > ".ready" and just remove WAL files after use. > > > > > > > > > That is, WAL files with .ready files are removed when either > > > > > archive_mode!=always in standby mode or archive_mode=off. > > > > > > > > sounds fine to me. > > > > > > That situation implies that archive_mode has been changed. > > > > Why? archive_mode may have been "always" on the primary when eg. a snapshot > > has been created. > > .ready files are created only when archive_mode != off. Yes, on a primary, they are created when archive_mode > off. On standby, when archive_mode=always. If a primary had archive_mode=always, a standby created from its backup will still have archive_mode=always, with no changes. Maybe I miss your point, sorry. Regards,
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Wed, 8 Apr 2020 13:58:30 +0200 Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote: [...] > > May I ask why this new test is added to 011_crash_recovery.pl which is > > aimed at testing crash and redo, while we have 002_archiving.pl that > > is dedicated to archiving in a more general manner? > > I thought it was a better place because the test happen during crash recovery. > > In the meantime, while working on other tests related to $SUBJECT and the > current consensus, I was wondering if a new file would be a better place > anyway. So, 002_archiving.pl deals more with testing recovering on hot standby side than archiving. Maybe it could be renamed? While discussing this, I created a new file to add some more tests about WAL archiving and how recovery deal with them. Please, find the patch in attachment. I'll be able to move them elsewhere later, depending on the conclusions of this discussion. Regards,
Attachment
By the way, I haven't noticed that Cc: didn't contain -hackers. Added it. At Thu, 9 Apr 2020 11:35:12 +0200, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote in > On Thu, 09 Apr 2020 11:26:57 +0900 (JST) > Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > [...] > > > > At Thu, 2 Apr 2020 15:49:15 +0200, Jehan-Guillaume de Rorthais > > > > <jgdr@dalibo.com> wrote in > > > > > > Ok, so our *current* consensus seems the followings. Right? > > > > > > > > > > > > - If archive_mode=off, any WAL files with .ready files are removed in > > > > > > crash recovery, archive recoery and standby mode. > > > > > > > > > > yes > > > > > > > > If archive_mode = off no WAL files are marked as ".ready". > > > > > > Sure, on the primary side. > > > > > > What if you build a standby from a backup with archive_mode=on with > > > some .ready files in there? > > > > Well. Backup doesn't have nothing in archive_status directory if it is > > taken by pg_basebackup. If the backup is created other way, it can > > have some (as Fujii-san mentioned). Master with archive_mode != off > > and standby with archive_mode=always should archive WAL files that are > > not marked .done, but standby with archive_mode == on should not. The > > commit intended that > > Unless I'm wrong, the commit avoids creating .ready files on standby when a WAL > has neither .done or .ready status file. Right. > > but the mistake here is it thinks that inRecovery represents whether it is > > running as a standby or not, but actually it is true on primary during crash > > recovery. > > Indeed. > > > On the other hand, with the patch, standby with archive_mode=on > > wrongly archives WAL files during crash recovery. > > "without the patch" you mean? You are talking about 78ea8b5daab, right? No. I menat the v4 patch in [1]. [1] https://www.postgresql.org/message-id/20200407171736.61906608%40firost Prior to appllying the patch (that is the commit 78ea..), XLogArchiveCheckDone() correctly returns true (= allow to remove it) for the same condition. The proposed patch does the folloing thing. if (!XLogArchivingActive() || recoveryState == ARCHIVE_RECOVERING && !XLogArchivingAlways()) return true; It doesn't return for the condition "recoveryState=CRASH_RECOVERING and archive_mode = on". Then the WAL files is mitakenly marked as ".ready" if not marked yet. By the way, the code seems not following the convention a bit here. Let the inserting code be in the same style to the existing code around. + if ( ! XLogArchivingActive() ) I think we don't put the spaces within the parentheses above. | ARCHIVE_RECOVERING/CRASH_RECOVERING/NOT_IN_RECOVERY The first two and the last one are in different style. *I* prefer them (if we use it) be "IN_ARCHIVE_RECOVERY/IN_CRASH_RECOVERY/NOT_IN_RECOVERY". > > What we should check there is, as the commit was intended, not whether > > it is under crash or archive recovery, but whether it is running as > > primary or standby. > > Yes. > > > > > If it is "always", WAL files that are to be archived are > > > > marked as ".ready". Finally, the condition reduces to: > > > > > > > > If archiver is running, archive ".ready" files. Otherwise ignore > > > > ".ready" and just remove WAL files after use. > > > > > > > > > > > That is, WAL files with .ready files are removed when either > > > > > > archive_mode!=always in standby mode or archive_mode=off. > > > > > > > > > > sounds fine to me. > > > > > > > > That situation implies that archive_mode has been changed. > > > > > > Why? archive_mode may have been "always" on the primary when eg. a snapshot > > > has been created. > > > > .ready files are created only when archive_mode != off. > > Yes, on a primary, they are created when archive_mode > off. On standby, when > archive_mode=always. If a primary had archive_mode=always, a standby created > from its backup will still have archive_mode=always, with no changes. > Maybe I miss your point, sorry. Sorry, it was ambiguous. > That is, WAL files with .ready files are removed when either > archive_mode!=always in standby mode or archive_mode=off. If we have .ready file when archive_mode = off, the cluster (or the original of the copy cluster) should have been running in archive = on or always. That is, archive_mode has been changed. But anyway that discussion would not be in much relevance. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
By the way, I haven't noticed that Cc: didn't contain -hackers. Added it. At Thu, 9 Apr 2020 11:35:12 +0200, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote in > On Thu, 09 Apr 2020 11:26:57 +0900 (JST) > Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > [...] > > > > At Thu, 2 Apr 2020 15:49:15 +0200, Jehan-Guillaume de Rorthais > > > > <jgdr@dalibo.com> wrote in > > > > > > Ok, so our *current* consensus seems the followings. Right? > > > > > > > > > > > > - If archive_mode=off, any WAL files with .ready files are removed in > > > > > > crash recovery, archive recoery and standby mode. > > > > > > > > > > yes > > > > > > > > If archive_mode = off no WAL files are marked as ".ready". > > > > > > Sure, on the primary side. > > > > > > What if you build a standby from a backup with archive_mode=on with > > > some .ready files in there? > > > > Well. Backup doesn't have nothing in archive_status directory if it is > > taken by pg_basebackup. If the backup is created other way, it can > > have some (as Fujii-san mentioned). Master with archive_mode != off > > and standby with archive_mode=always should archive WAL files that are > > not marked .done, but standby with archive_mode == on should not. The > > commit intended that > > Unless I'm wrong, the commit avoids creating .ready files on standby when a WAL > has neither .done or .ready status file. Right. > > but the mistake here is it thinks that inRecovery represents whether it is > > running as a standby or not, but actually it is true on primary during crash > > recovery. > > Indeed. > > > On the other hand, with the patch, standby with archive_mode=on > > wrongly archives WAL files during crash recovery. > > "without the patch" you mean? You are talking about 78ea8b5daab, right? No. I menat the v4 patch in [1]. [1] https://www.postgresql.org/message-id/20200407171736.61906608%40firost Prior to appllying the patch (that is the commit 78ea..), XLogArchiveCheckDone() correctly returns true (= allow to remove it) for the same condition. The proposed patch does the folloing thing. if (!XLogArchivingActive() || recoveryState == ARCHIVE_RECOVERING && !XLogArchivingAlways()) return true; It doesn't return for the condition "recoveryState=CRASH_RECOVERING and archive_mode = on". Then the WAL files is mitakenly marked as ".ready" if not marked yet. By the way, the code seems not following the convention a bit here. Let the inserting code be in the same style to the existing code around. + if ( ! XLogArchivingActive() ) I think we don't put the spaces within the parentheses above. | ARCHIVE_RECOVERING/CRASH_RECOVERING/NOT_IN_RECOVERY The first two and the last one are in different style. *I* prefer them (if we use it) be "IN_ARCHIVE_RECOVERY/IN_CRASH_RECOVERY/NOT_IN_RECOVERY". > > What we should check there is, as the commit was intended, not whether > > it is under crash or archive recovery, but whether it is running as > > primary or standby. > > Yes. > > > > > If it is "always", WAL files that are to be archived are > > > > marked as ".ready". Finally, the condition reduces to: > > > > > > > > If archiver is running, archive ".ready" files. Otherwise ignore > > > > ".ready" and just remove WAL files after use. > > > > > > > > > > > That is, WAL files with .ready files are removed when either > > > > > > archive_mode!=always in standby mode or archive_mode=off. > > > > > > > > > > sounds fine to me. > > > > > > > > That situation implies that archive_mode has been changed. > > > > > > Why? archive_mode may have been "always" on the primary when eg. a snapshot > > > has been created. > > > > .ready files are created only when archive_mode != off. > > Yes, on a primary, they are created when archive_mode > off. On standby, when > archive_mode=always. If a primary had archive_mode=always, a standby created > from its backup will still have archive_mode=always, with no changes. > Maybe I miss your point, sorry. Sorry, it was ambiguous. > That is, WAL files with .ready files are removed when either > archive_mode!=always in standby mode or archive_mode=off. If we have .ready file when archive_mode = off, the cluster (or the original of the copy cluster) should have been running in archive = on or always. That is, archive_mode has been changed. But anyway that discussion would not be in much relevance. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Thu, 9 Apr 2020 18:46:22 +0200, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote in > On Wed, 8 Apr 2020 13:58:30 +0200 > Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote: > [...] > > > May I ask why this new test is added to 011_crash_recovery.pl which is > > > aimed at testing crash and redo, while we have 002_archiving.pl that > > > is dedicated to archiving in a more general manner? > > > > I thought it was a better place because the test happen during crash recovery. > > > > In the meantime, while working on other tests related to $SUBJECT and the > > current consensus, I was wondering if a new file would be a better place > > anyway. > > So, 002_archiving.pl deals more with testing recovering on hot standby side > than archiving. Maybe it could be renamed? I have the same feeling with Michael. The test that archives are created correctly seems to fit the file. It would be unintentionally that the file is not exercising archiving so much. > While discussing this, I created a new file to add some more tests about WAL > archiving and how recovery deal with them. Please, find the patch in > attachment. I'll be able to move them elsewhere later, depending on the > conclusions of this discussion. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Apr 10, 2020 at 11:14:54AM +0900, Kyotaro Horiguchi wrote: > I have the same feeling with Michael. The test that archives are > created correctly seems to fit the file. It would be unintentionally > that the file is not exercising archiving so much. I have been finally looking at this thread and the latest patch set, sorry for the late reply. XLogCtl->XLogCacheBlck = XLOGbuffers - 1; - XLogCtl->SharedRecoveryInProgress = true; XLogCtl->SharedHotStandbyActive = false; XLogCtl->SharedPromoteIsTriggered = false; XLogCtl->WalWriterSleeping = false; [...] + switch (ControlFile->state) + { + case DB_SHUTDOWNED: + case DB_SHUTDOWNED_IN_RECOVERY: + XLogCtl->SharedRecoveryState = ARCHIVE_RECOVERING; + break; + default: + XLogCtl->SharedRecoveryState = CRASH_RECOVERING; + } It seems to me that the initial value of SharedRecoveryState should be CRASH_RECOVERING all the time no? StartupXLOG() is a code path taken even if starting cleanly, and the flag would be reset correctly at the end to NOT_IN_RECOVERY. +typedef enum RecoveryState +{ + NOT_IN_RECOVERY = 0, /* currently in production */ + CRASH_RECOVERING, /* recovering from a crash */ + ARCHIVE_RECOVERING /* recovering archives as requested */ +} RecoveryState; I also have some issues with the name of those variables, here is an idea for the three states: - RECOVERY_STATE_CRASH - RECOVERT_STATE_ARCHIVE - RECOVERY_STATE_NONE I would recommend to use the same suffix for those variables to ease grepping. /* - * Local copy of SharedRecoveryInProgress variable. True actually means "not - * known, need to check the shared state". + * This is false when SharedRecoveryState is not ARCHIVE_RECOVERING. + * True actually means "not known, need to check the shared state". */ A double negation sounds wrong to me. And actually this variable is false when the shared state is set to NOT_IN_RECOVERY, and true when the state is either CRASH_RECOVERING or ARCHIVE_RECOVERING because it means that recovery is running, be it archive recovery or crash recovery, so the comment is wrong. + /* The file is always deletable if archive_mode is "off". */ + if ( ! XLogArchivingActive() ) + return true; [...] + if ( recoveryState == ARCHIVE_RECOVERING && !XLogArchivingAlways() ) return true; Incorrect indentation. UpdateControlFile(); + + SpinLockAcquire(&XLogCtl->info_lck); + XLogCtl->SharedRecoveryState = ARCHIVE_RECOVERING; + SpinLockRelease(&XLogCtl->info_lck); + LWLockRelease(ControlFileLock); Here, the shared flag is updated while holding ControlFileLock to ensure a consistent state of things within shared memory, so it would be nice to add a comment about that. +RecoveryState +GetRecoveryState(void) +{ + volatile XLogCtlData *xlogctl = XLogCtl; + + return xlogctl->SharedRecoveryState; +} Er, you need to acquire info_lck to look at the state here, no? /* - * The file is always deletable if archive_mode is "off". On standbys - * archiving is disabled if archive_mode is "on", and enabled with - * "always". On a primary, archiving is enabled if archive_mode is "on" - * or "always". + * On standbys, the file is deletable if archive_mode is not + * "always". */ It would be good to mention that while in crash recovery, files are not considered as deletable. I agree that having a separate .pl file for the tests of this thread is just cleaner. Here are more comments about these. +# temporary fail archive_command for futur tests +$node->safe_psql('postgres', q{ + ALTER SYSTEM SET archive_command TO 'false'; + SELECT pg_reload_conf(); +}); That's likely portable on Windows even if you skip the tests there, and I am not sure that it is a good idea to rely on it being in PATH. Depending on the system, the path of the command is also likely going to be different. As here the goal is to prevent the archiver to do its work, why not relying on the configuration where archive_mode is set but archive_command is not? This would cause the archiver to be a no-op process, and .ready files will remain around. You could then replace the lookup of pg_stat_archiver with poll_query_until() and a query that makes use of pg_stat_file() to make sure that the .ready exists when needed. + ok( -f "$node_data/pg_wal/archive_status/000000010000000000000001.ready", + "WAL still ready to archive in archive_status"); It would be good to mention in the description the check applies to a primary. +# test recovery without archive_mode=always does not keep .ready WALs +$standby1 = get_new_node('standby'); +$standby1->init_from_backup($node, 'backup', has_restoring => 1); +$standby1_data = $standby1->data_dir; +$standby1->start; +$standby1->safe_psql('postgres', q{CHECKPOINT}); For readability archive_mode = on should be added to the configuration file? Okay, this is inherited from the primary, still that would avoid any issues with this code is refactored in some way. "WAL waiting to be archived in backup removed with archive_mode=on on standby" ); That should be "WAL segment" or "WAL file", but not WAL. Regarding the tests on a standby, it seems to me that the following is necessary: 1) Test that with archive_mode = on, segments are not marked with .ready. 2) Test that with archive_mode = always, segments are marked with .ready during archive recovery. 3) Test that with archive_mode = always, segments are not removed during crash recovery. I can see tests for 1) and 2), but not 3). Could you add a stop('immediate')+start() for $standby2 at the end of 020_archive_status.pl and check that the .ready file is still there after crash recovery? The end of the tests actually relies on the fact that archive_command is set to "false" when the cold backup is taken, before resetting it. I think that it would be cleaner to enforce the configuration you want to test before starting each standby. It becomes easier to understand the flow of the test for the reader. -- Michael
Attachment
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Mon, 13 Apr 2020 16:14:14 +0900 Michael Paquier <michael@paquier.xyz> wrote: [...] > XLogCtl->XLogCacheBlck = XLOGbuffers - 1; > - XLogCtl->SharedRecoveryInProgress = true; > XLogCtl->SharedHotStandbyActive = false; > XLogCtl->SharedPromoteIsTriggered = false; > XLogCtl->WalWriterSleeping = false; > [...] > + switch (ControlFile->state) > + { > + case DB_SHUTDOWNED: > + case DB_SHUTDOWNED_IN_RECOVERY: > + XLogCtl->SharedRecoveryState = ARCHIVE_RECOVERING; > + break; > + default: > + XLogCtl->SharedRecoveryState = CRASH_RECOVERING; > + } > It seems to me that the initial value of SharedRecoveryState should be > CRASH_RECOVERING all the time no? StartupXLOG() is a code path taken > even if starting cleanly, and the flag would be reset correctly at the > end to NOT_IN_RECOVERY. Previous version of the patch was setting CRASH_RECOVERING. Fujii-san reported (the 4 Apr 2020 02:49:50 +0900) that it could be useful to expose a better value until relevant code is reached in StartupXLOG() so GetRecoveryState() returns a safer value for futur use. As I answered upthread, I'm not sure who would need this information before the WAL machinery is up though. Note that ARCHIVE_RECOVERING and CRASH_RECOVERING are compatible with the previous behavior. Maybe the solution would be to init with CRASH_RECOVERING and add some comment in GetRecoveryState() to warn the state is "enforced" after the XLOG machinery is started and is init'ed to RECOVERING in the meantime? I initialized it to CRASH_RECOVERING in the new v5 patch and added a comment to GetRecoveryState(). > +typedef enum RecoveryState > +{ > + NOT_IN_RECOVERY = 0, /* currently in production */ > + CRASH_RECOVERING, /* recovering from a crash */ > + ARCHIVE_RECOVERING /* recovering archives as requested */ > +} RecoveryState; > I also have some issues with the name of those variables, here is an > idea for the three states: > - RECOVERY_STATE_CRASH > - RECOVERT_STATE_ARCHIVE > - RECOVERY_STATE_NONE > I would recommend to use the same suffix for those variables to ease > grepping. Sounds really good to me. Thanks! > /* > - * Local copy of SharedRecoveryInProgress variable. True actually means "not > - * known, need to check the shared state". > + * This is false when SharedRecoveryState is not ARCHIVE_RECOVERING. > + * True actually means "not known, need to check the shared state". > */ > A double negation sounds wrong to me. And actually this variable is > false when the shared state is set to NOT_IN_RECOVERY, and true when > the state is either CRASH_RECOVERING or ARCHIVE_RECOVERING because it > means that recovery is running, be it archive recovery or crash > recovery, so the comment is wrong. Indeed, sorry. Fixed. > + /* The file is always deletable if archive_mode is "off". */ > + if ( ! XLogArchivingActive() ) > + return true; > [...] > + if ( recoveryState == ARCHIVE_RECOVERING && !XLogArchivingAlways() ) > return true; > Incorrect indentation. Is it the spaces as reported by Horiguchi-san? I removed them in latest patch. > UpdateControlFile(); > + > + SpinLockAcquire(&XLogCtl->info_lck); > + XLogCtl->SharedRecoveryState = ARCHIVE_RECOVERING; > + SpinLockRelease(&XLogCtl->info_lck); > + > LWLockRelease(ControlFileLock); > Here, the shared flag is updated while holding ControlFileLock to > ensure a consistent state of things within shared memory, so it would > be nice to add a comment about that. Indeed. the original code had no such comment and I asked myself the same question. Added. > +RecoveryState > +GetRecoveryState(void) > +{ > + volatile XLogCtlData *xlogctl = XLogCtl; > + > + return xlogctl->SharedRecoveryState; > +} > Er, you need to acquire info_lck to look at the state here, no? Yes, fixed. > /* > - * The file is always deletable if archive_mode is "off". On standbys > - * archiving is disabled if archive_mode is "on", and enabled with > - * "always". On a primary, archiving is enabled if archive_mode is "on" > - * or "always". > + * On standbys, the file is deletable if archive_mode is not > + * "always". > */ > It would be good to mention that while in crash recovery, files are > not considered as deletable. Well, in fact, I am still wondering about this. I was hesitant to add a shortcut like: /* no WAL segment cleanup during crash recovery */ if (recoveryState == RECOVERT_STATE_CRASH) return false; But, what if for example we crashed for lack of disk space during intensive wirtes? During crash recovery, any WAL marked as .done could be removed and allow the system to start again and maybe make even further WAL cleanup by archiving some more WAL without competing with previous high write ratio. When we recover a primary, this behavior seems conform with any value of archive_mode, even if it has been changed after crash and before starting it. On a standby, we might create .ready files, but they will be removed during the first restartpoint if needed. > I agree that having a separate .pl file for the tests of this thread > is just cleaner. Here are more comments about these. > > +# temporary fail archive_command for futur tests > +$node->safe_psql('postgres', q{ > + ALTER SYSTEM SET archive_command TO 'false'; > + SELECT pg_reload_conf(); > +}); > That's likely portable on Windows even if you skip the tests there, > and I am not sure that it is a good idea to rely on it being in PATH. > Depending on the system, the path of the command is also likely going > to be different. As here the goal is to prevent the archiver to do > its work, why not relying on the configuration where archive_mode is > set but archive_command is not? This would cause the archiver to be a > no-op process, and .ready files will remain around. You could then > replace the lookup of pg_stat_archiver with poll_query_until() and a > query that makes use of pg_stat_file() to make sure that the .ready > exists when needed. I needed a failure so I can test pg_stat_archiver reports it as well. In my mind, using "false" would either trigger a failure because false returns 1 or... a failure because the command is not found. In either way, the result is the same. Using poll_query_until+pg_stat_file, is a good idea, but not enough as archiver reports a failure some moment after the .ready signal file appears. > + ok( -f "$node_data/pg_wal/archive_status/000000010000000000000001.ready", > + "WAL still ready to archive in archive_status"); > It would be good to mention in the description the check applies to a > primary. done > +# test recovery without archive_mode=always does not keep .ready WALs > +$standby1 = get_new_node('standby'); > +$standby1->init_from_backup($node, 'backup', has_restoring => 1); > +$standby1_data = $standby1->data_dir; > +$standby1->start; > +$standby1->safe_psql('postgres', q{CHECKPOINT}); > For readability archive_mode = on should be added to the configuration > file? Okay, this is inherited from the primary, still that would > avoid any issues with this code is refactored in some way. added > "WAL waiting to be archived in backup removed with archive_mode=on > on > standby" ); That should be "WAL segment" or "WAL file", but not WAL. updated everywhere. > Regarding the tests on a standby, it seems to me that the following > is necessary: > 1) Test that with archive_mode = on, segments are not marked with > .ready. > 2) Test that with archive_mode = always, segments are marked with > .ready during archive recovery. > 3) Test that with archive_mode = always, segments are not removed > during crash recovery. > I can see tests for 1) and 2), Not really. The current tests does not check that segments created *after* the backup are marked or not with .ready file on standby. I added these tests plus some various other ones. > but not 3). Could you add a > stop('immediate')+start() for $standby2 at the end of > 020_archive_status.pl and check that the .ready file is still there > after crash recovery? done. > The end of the tests actually relies on the > fact that archive_command is set to "false" when the cold backup is > taken, before resetting it. I think that it would be cleaner to > enforce the configuration you want to test before starting each > standby. It becomes easier to understand the flow of the test for the > reader. done as well. Thank you for your review!
Attachment
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Fri, 10 Apr 2020 11:00:31 +0900 (JST) Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: [...] > > > but the mistake here is it thinks that inRecovery represents whether it is > > > running as a standby or not, but actually it is true on primary during > > > crash recovery. > > > > Indeed. > > > > > On the other hand, with the patch, standby with archive_mode=on > > > wrongly archives WAL files during crash recovery. > > > > "without the patch" you mean? You are talking about 78ea8b5daab, right? > > No. I menat the v4 patch in [1]. > > [1] https://www.postgresql.org/message-id/20200407171736.61906608%40firost > > Prior to appllying the patch (that is the commit 78ea..), > XLogArchiveCheckDone() correctly returns true (= allow to remove it) > for the same condition. > > The proposed patch does the folloing thing. > > if (!XLogArchivingActive() || > recoveryState == ARCHIVE_RECOVERING && !XLogArchivingAlways()) > return true; > > It doesn't return for the condition "recoveryState=CRASH_RECOVERING > and archive_mode = on". Then the WAL files is mitakenly marked as > ".ready" if not marked yet. Indeed. But .ready files are then deleted during the first restartpoint. I'm not sure how to fix this behavior without making the code too complex. This is discussed in my last answer to Michael few minutes ago as well. > By the way, the code seems not following the convention a bit > here. Let the inserting code be in the same style to the existing code > around. > > + if ( ! XLogArchivingActive() ) > > I think we don't put the spaces within the parentheses above. Indeed, This is fixed in patch v5 sent few minutes ago. > | ARCHIVE_RECOVERING/CRASH_RECOVERING/NOT_IN_RECOVERY > > The first two and the last one are in different style. *I* prefer them > (if we use it) be "IN_ARCHIVE_RECOVERY/IN_CRASH_RECOVERY/NOT_IN_RECOVERY". I like Michael's proposal. See v5 of the patch. Thank you for your review! Regards,
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Fri, 10 Apr 2020 11:00:31 +0900 (JST) Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: [...] > > > but the mistake here is it thinks that inRecovery represents whether it is > > > running as a standby or not, but actually it is true on primary during > > > crash recovery. > > > > Indeed. > > > > > On the other hand, with the patch, standby with archive_mode=on > > > wrongly archives WAL files during crash recovery. > > > > "without the patch" you mean? You are talking about 78ea8b5daab, right? > > No. I menat the v4 patch in [1]. > > [1] https://www.postgresql.org/message-id/20200407171736.61906608%40firost > > Prior to appllying the patch (that is the commit 78ea..), > XLogArchiveCheckDone() correctly returns true (= allow to remove it) > for the same condition. > > The proposed patch does the folloing thing. > > if (!XLogArchivingActive() || > recoveryState == ARCHIVE_RECOVERING && !XLogArchivingAlways()) > return true; > > It doesn't return for the condition "recoveryState=CRASH_RECOVERING > and archive_mode = on". Then the WAL files is mitakenly marked as > ".ready" if not marked yet. Indeed. But .ready files are then deleted during the first restartpoint. I'm not sure how to fix this behavior without making the code too complex. This is discussed in my last answer to Michael few minutes ago as well. > By the way, the code seems not following the convention a bit > here. Let the inserting code be in the same style to the existing code > around. > > + if ( ! XLogArchivingActive() ) > > I think we don't put the spaces within the parentheses above. Indeed, This is fixed in patch v5 sent few minutes ago. > | ARCHIVE_RECOVERING/CRASH_RECOVERING/NOT_IN_RECOVERY > > The first two and the last one are in different style. *I* prefer them > (if we use it) be "IN_ARCHIVE_RECOVERY/IN_CRASH_RECOVERY/NOT_IN_RECOVERY". I like Michael's proposal. See v5 of the patch. Thank you for your review! Regards,
On Tue, Apr 14, 2020 at 06:03:19PM +0200, Jehan-Guillaume de Rorthais wrote: Thanks for the new version. > On Mon, 13 Apr 2020 16:14:14 +0900 Michael Paquier <michael@paquier.xyz> wrote: >> It seems to me that the initial value of SharedRecoveryState should be >> CRASH_RECOVERING all the time no? StartupXLOG() is a code path taken >> even if starting cleanly, and the flag would be reset correctly at the >> end to NOT_IN_RECOVERY. > > Previous version of the patch was setting CRASH_RECOVERING. Fujii-san reported > (the 4 Apr 2020 02:49:50 +0900) that it could be useful to expose a better value > until relevant code is reached in StartupXLOG() so GetRecoveryState() returns > a safer value for futur use. > > As I answered upthread, I'm not sure who would need this information before the > WAL machinery is up though. Note that ARCHIVE_RECOVERING and CRASH_RECOVERING > are compatible with the previous behavior. I am not sure either if we need this information until the startup process is up, but even if we need it I'd rather keep the code consistent with the past practice, which was that SharedRecoveryInProgress got set to true, the equivalent of crash recovery as that's more generic than the archive recovery switch. > Maybe the solution would be to init with CRASH_RECOVERING and add some comment > in GetRecoveryState() to warn the state is "enforced" after the XLOG machinery > is started and is init'ed to RECOVERING in the meantime? > > I initialized it to CRASH_RECOVERING in the new v5 patch and added a comment > to GetRecoveryState(). Not sure that the comment is worth it. Your description of the state looks enough, and the code is clear that we have just an initial shared memory state in this case. >> + /* The file is always deletable if archive_mode is "off". */ >> + if ( ! XLogArchivingActive() ) >> + return true; >> [...] >> + if ( recoveryState == ARCHIVE_RECOVERING && !XLogArchivingAlways() ) >> return true; >> Incorrect indentation. > > Is it the spaces as reported by Horiguchi-san? I removed them in latest patch. Yes. >> It would be good to mention that while in crash recovery, files are >> not considered as deletable. > > Well, in fact, I am still wondering about this. I was hesitant to add a > shortcut like: > > /* no WAL segment cleanup during crash recovery */ > if (recoveryState == RECOVERT_STATE_CRASH) > return false; > > But, what if for example we crashed for lack of disk space during intensive > writes? During crash recovery, any WAL marked as .done could be removed and > allow the system to start again and maybe make even further WAL cleanup by > archiving some more WAL without competing with previous high write ratio. > > When we recover a primary, this behavior seems conform with any value of > archive_mode, even if it has been changed after crash and before starting it. > On a standby, we might create .ready files, but they will be removed during the > first restartpoint if needed. I guess that you mean .ready and not .done here? Removing .done files does not matter as the segments related to them are already gone. Even with that, why should we need to make the backend smarter about the removal of .ready files during crash recovery. It seems to me that we should keep them, and an operator could always come by himself and do some manual cleanup to free some space in the pg_wal partition. > I needed a failure so I can test pg_stat_archiver reports it as well. In my > mind, using "false" would either trigger a failure because false returns 1 or... > a failure because the command is not found. In either way, the result is the > same. > > Using poll_query_until+pg_stat_file, is a good idea, but not enough as > archiver reports a failure some moment after the .ready signal file appears. Using an empty string makes the test more portable, but while I looked at it I have found out that it leads to delays in the archiver except if you force the generation of more segments in the test, causing the logic to get more complicated with the manipulation of the .ready and .done files. And I was then finding myself to add an archive_timeout.. Anyway, this reduced the readability of the test so I am pretty much giving up on this idea. >> The end of the tests actually relies on the >> fact that archive_command is set to "false" when the cold backup is >> taken, before resetting it. I think that it would be cleaner to >> enforce the configuration you want to test before starting each >> standby. It becomes easier to understand the flow of the test for the >> reader. > > done as well. I have put my hands on the code, and attached is a cleaned up version for the backend part. Below are some notes. + * RECOVERT_STATE_ARCHIVE is set for archive recovery or for a standby. Typo here that actually comes from my previous email, and that you blindly copy-pasted, repeated five times in the tree actually. + RecoveryState recoveryState = GetRecoveryState(); + + /* The file is always deletable if archive_mode is "off". */ + if (!XLogArchivingActive()) + return true; There is no point to call GetRecoveryState() is archive_mode = off. + * There's two different reasons to recover WAL: when standby mode is requested + * or after a crash to catchup with consistency. Nit: s/There's/There are/. Anyway, I don't see much point in keeping this comment as the description of each state value should be enough, so I have removed it. I am currently in the middle of reviewing the test and there are a couple of things that can be improved but I lack of time today, so I'll continue tomorrow on it. There is no need to send two separate patches by the way as the code paths touched are different. -- Michael
Attachment
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Thu, 16 Apr 2020 17:11:00 +0900 Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Apr 14, 2020 at 06:03:19PM +0200, Jehan-Guillaume de Rorthais wrote: > > Thanks for the new version. Thank you for v6. > > On Mon, 13 Apr 2020 16:14:14 +0900 Michael Paquier <michael@paquier.xyz> > > wrote: > >> It seems to me that the initial value of SharedRecoveryState should be > >> CRASH_RECOVERING all the time no? StartupXLOG() is a code path taken > >> even if starting cleanly, and the flag would be reset correctly at the > >> end to NOT_IN_RECOVERY. > > > > Previous version of the patch was setting CRASH_RECOVERING. Fujii-san > > reported (the 4 Apr 2020 02:49:50 +0900) that it could be useful to expose > > a better value until relevant code is reached in StartupXLOG() so > > GetRecoveryState() returns a safer value for futur use. > > > > As I answered upthread, I'm not sure who would need this information before > > the WAL machinery is up though. Note that ARCHIVE_RECOVERING and > > CRASH_RECOVERING are compatible with the previous behavior. > > I am not sure either if we need this information until the startup > process is up, but even if we need it I'd rather keep the code > consistent with the past practice, which was that > SharedRecoveryInProgress got set to true, the equivalent of crash > recovery as that's more generic than the archive recovery switch. OK. > > Maybe the solution would be to init with CRASH_RECOVERING and add some > > comment in GetRecoveryState() to warn the state is "enforced" after the > > XLOG machinery is started and is init'ed to RECOVERING in the meantime? > > > > I initialized it to CRASH_RECOVERING in the new v5 patch and added a comment > > to GetRecoveryState(). > > Not sure that the comment is worth it. Your description of the state > looks enough, and the code is clear that we have just an initial > shared memory state in this case. OK. Will remove later after your review of the tests. > >> It would be good to mention that while in crash recovery, files are > >> not considered as deletable. > > > > Well, in fact, I am still wondering about this. I was hesitant to add a > > shortcut like: > > > > /* no WAL segment cleanup during crash recovery */ > > if (recoveryState == RECOVERT_STATE_CRASH) > > return false; > > > > But, what if for example we crashed for lack of disk space during intensive > > writes? During crash recovery, any WAL marked as .done could be removed and > > allow the system to start again and maybe make even further WAL cleanup by > > archiving some more WAL without competing with previous high write ratio. > > > > When we recover a primary, this behavior seems conform with any value of > > archive_mode, even if it has been changed after crash and before starting > > it. On a standby, we might create .ready files, but they will be removed > > during the first restartpoint if needed. > > I guess that you mean .ready and not .done here? No, I meant .done. > Removing .done files does not matter as the segments related to them are > already gone. Not necessarily. There's a time windows between the moment the archiver set the .done file and when the checkpointer removes the associated WAL file. So, after a PANIC because of lack of space, WAL associated with .done files but not removed yet will be removed during the crash recovery. > Even with that, why should we need to make the backend smarter about > the removal of .ready files during crash recovery. It seems to me > that we should keep them, and an operator could always come by himself > and do some manual cleanup to free some space in the pg_wal partition. We are agree on this. > > I needed a failure so I can test pg_stat_archiver reports it as well. In my > > mind, using "false" would either trigger a failure because false returns 1 > > or... a failure because the command is not found. In either way, the result > > is the same. > > > > Using poll_query_until+pg_stat_file, is a good idea, but not enough as > > archiver reports a failure some moment after the .ready signal file > > appears. > > Using an empty string makes the test more portable, but while I looked > at it I have found out that it leads to delays in the archiver except > if you force the generation of more segments in the test, causing the > logic to get more complicated with the manipulation of the .ready and > .done files. And I was then finding myself to add an > archive_timeout.. Anyway, this reduced the readability of the test so > I am pretty much giving up on this idea. Unless I'm wrong, the empty string does not raise an error in pg_stat_archiver, and I wanted to add a test on this as well. > >> The end of the tests actually relies on the > >> fact that archive_command is set to "false" when the cold backup is > >> taken, before resetting it. I think that it would be cleaner to > >> enforce the configuration you want to test before starting each > >> standby. It becomes easier to understand the flow of the test for the > >> reader. > > > > done as well. > > I have put my hands on the code, and attached is a cleaned up > version for the backend part. Below are some notes. > > + * RECOVERT_STATE_ARCHIVE is set for archive recovery or for a > standby. > Typo here that actually comes from my previous email, and that you > blindly copy-pasted, repeated five times in the tree actually. Oops...yes, even in a comment with RECOVERT_STATE_NONE :/ Sorry. > + RecoveryState recoveryState = GetRecoveryState(); > + > + /* The file is always deletable if archive_mode is "off". */ > + if (!XLogArchivingActive()) > + return true; > There is no point to call GetRecoveryState() is archive_mode = off. good catch! > + * There's two different reasons to recover WAL: when standby mode is > requested > + * or after a crash to catchup with consistency. > Nit: s/There's/There are/. Anyway, I don't see much point in keeping > this comment as the description of each state value should be enough, > so I have removed it. OK > I am currently in the middle of reviewing the test and there are a > couple of things that can be improved but I lack of time today, so > I'll continue tomorrow on it. There is no need to send two separate > patches by the way as the code paths touched are different. OK. Thanks for your review! Let me know if you want me to add/change/fix some tests. Regards,
On Fri, Apr 17, 2020 at 12:07:39AM +0200, Jehan-Guillaume de Rorthais wrote: > On Thu, 16 Apr 2020 17:11:00 +0900 Michael Paquier <michael@paquier.xyz> wrote: >> Removing .done files does not matter as the segments related to them are >> already gone. > > Not necessarily. There's a time windows between the moment the archiver set > the .done file and when the checkpointer removes the associated WAL file. > So, after a PANIC because of lack of space, WAL associated with .done files but > not removed yet will be removed during the crash recovery. Not sure that it is something that matters for this thread though, so if necessary I think that it could be be discussed separately. >> Even with that, why should we need to make the backend smarter about >> the removal of .ready files during crash recovery. It seems to me >> that we should keep them, and an operator could always come by himself >> and do some manual cleanup to free some space in the pg_wal partition. > > We are agree on this. Okay. > Unless I'm wrong, the empty string does not raise an error in pg_stat_archiver, > and I wanted to add a test on this as well. Exactly, it won't raise an error. Instead I switched to use a poll query with pg_stat_file() and .ready files, but this has proved to delay the test considerably if we did not create more segments. And your approach has the merit to be more simple with only two segments manipulated for the whole test. So I have tried first my idea, noticed the mess it introduced, and just kept your approach. > Thanks for your review! Let me know if you want me to add/change/fix some tests. Thanks, I have worked more on the test, refactoring pieces related to the segment names, adjusting some comments and fixing some of the logic. Note that you introduced something incorrect at the creation of $standby2 as you have been updating postgresql.conf.auto for $standby1. I have noticed an extra issue while looking at the backend pieces today: at the beginning of the REDO loop we forgot one place where SharedRecoveryState *has* to be updated to a correct state (around the comment "Update pg_control to show that we are..." in xlog.c) as the startup process may decide to switch the control file state to DB_IN_ARCHIVE_RECOVERY or DB_IN_CRASH_RECOVERY, but we forgot to update the new shared flag at this early stage. It did not matter before because SharedRecoveryInProgress would be only "true" for both, but that's not the case anymore as we need to make the difference between crash recovery and archive recovery in the new flag. There is no actual need to update SharedRecoveryState to RECOVERY_STATE_CRASH as the initial shared memory state is RECOVERY_STATE_CRASH, but updating the flag makes the code more consistent IMO so I updated it anyway in the attached. I have the feeling that I need to work a bit more on this patch, but my impression is that we are getting to something committable here. Thoughts? -- Michael
Attachment
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Fri, 17 Apr 2020 15:50:43 +0900 Michael Paquier <michael@paquier.xyz> wrote: > On Fri, Apr 17, 2020 at 12:07:39AM +0200, Jehan-Guillaume de Rorthais wrote: > > On Thu, 16 Apr 2020 17:11:00 +0900 Michael Paquier <michael@paquier.xyz> > > wrote: > >> Removing .done files does not matter as the segments related to them are > >> already gone. > > > > Not necessarily. There's a time windows between the moment the archiver set > > the .done file and when the checkpointer removes the associated WAL file. > > So, after a PANIC because of lack of space, WAL associated with .done files > > but not removed yet will be removed during the crash recovery. > > Not sure that it is something that matters for this thread though, so > if necessary I think that it could be be discussed separately. OK. However, unless I'm wrong, what I am describing as an desired behavior is the current behavior of XLogArchiveCheckDone. So, we might want to decide if v8 should return false during crash recovery no matter the archive_mode setup, or if we keep the curent behavior. I vote for keeping it this way. [...] > > Unless I'm wrong, the empty string does not raise an error in > > pg_stat_archiver, and I wanted to add a test on this as well. > > Exactly, it won't raise an error. Instead I switched to use a poll > query with pg_stat_file() and .ready files, but this has proved to > delay the test considerably if we did not create more segments. And > your approach has the merit to be more simple with only two segments > manipulated for the whole test. So I have tried first my idea, > noticed the mess it introduced, and just kept your approach. Maybe we could use something more common for all plateform? Eg.: archive_command='this command does not exist' At least, we would have the same error everywhere, as far as it could matter... > > Thanks for your review! Let me know if you want me to add/change/fix some > > tests. > > Thanks, I have worked more on the test, refactoring pieces related to > the segment names, adjusting some comments and fixing some of the > logic. Note that you introduced something incorrect at the creation > of $standby2 as you have been updating postgresql.conf.auto for > $standby1. erf, last minute quick edit with lack of review on my side :( > I have noticed an extra issue while looking at the backend pieces > today: at the beginning of the REDO loop we forgot one place where > SharedRecoveryState *has* to be updated to a correct state (around > the comment "Update pg_control to show that we are..." in xlog.c) as > the startup process may decide to switch the control file state to > DB_IN_ARCHIVE_RECOVERY or DB_IN_CRASH_RECOVERY, but we forgot to > update the new shared flag at this early stage. It did not matter > before because SharedRecoveryInProgress would be only "true" for both, > but that's not the case anymore as we need to make the difference > between crash recovery and archive recovery in the new flag. There is > no actual need to update SharedRecoveryState to RECOVERY_STATE_CRASH > as the initial shared memory state is RECOVERY_STATE_CRASH, but > updating the flag makes the code more consistent IMCRASHO so I updated it > anyway in the attached. Grmbl...I had this logic the other way around: init with RECOVERY_STATE_RECOVERY and set to CRASH in this exact if/then/else block... I removed it in v4 when setting XLogCtl->SharedRecoveryState to RECOVERY or CRASH based on ControlFile->state. Sorry, I forgot it after discussing the init value in v5 :( Regards,
On Fri, Apr 17, 2020 at 03:33:04PM +0200, Jehan-Guillaume de Rorthais wrote: > On Fri, 17 Apr 2020 15:50:43 +0900 Michael Paquier <michael@paquier.xyz> wrote: >> Not sure that it is something that matters for this thread though, so >> if necessary I think that it could be discussed separately. > > OK. However, unless I'm wrong, what I am describing as a desired behavior > is the current behavior of XLogArchiveCheckDone. So, we might want to decide if > v8 should return false during crash recovery no matter the archive_mode setup, > or if we keep the current behavior. I vote for keeping it this way. I would rather avoid that now, as we don't check explicitely for crash recovery in this code path. And for the purpose of this patch it is fine to stick with the extra check on a standby with (RECOVERY_STATE_ARCHIVE && archive_mode = always). > Maybe we could use something more common for all plateform? Eg.: > > archive_command='this command does not exist' > > At least, we would have the same error everywhere, as far as it could matter... Yeah. We could try to do with "false" as command anyway, and see what the buildfarm thinks. As the test is skipped on Windows, I would assume that it does not matter much anyway. Let's see what others think about this piece. I don't have plans to touch again this patch until likely the middle of next week. >> Thanks, I have worked more on the test, refactoring pieces related to >> the segment names, adjusting some comments and fixing some of the >> logic. Note that you introduced something incorrect at the creation >> of $standby2 as you have been updating postgresql.conf.auto for >> $standby1. > > erf, last minute quick edit with lack of review on my side :( No problem. It happens. >> I have noticed an extra issue while looking at the backend pieces >> today: at the beginning of the REDO loop we forgot one place where >> SharedRecoveryState *has* to be updated to a correct state (around >> the comment "Update pg_control to show that we are..." in xlog.c) as >> the startup process may decide to switch the control file state to >> DB_IN_ARCHIVE_RECOVERY or DB_IN_CRASH_RECOVERY, but we forgot to >> update the new shared flag at this early stage. It did not matter >> before because SharedRecoveryInProgress would be only "true" for both, >> but that's not the case anymore as we need to make the difference >> between crash recovery and archive recovery in the new flag. There is >> no actual need to update SharedRecoveryState to RECOVERY_STATE_CRASH >> as the initial shared memory state is RECOVERY_STATE_CRASH, but >> updating the flag makes the code more consistent IMCRASHO so I updated it >> anyway in the attached. > > Grmbl...I had this logic the other way around: init with > RECOVERY_STATE_RECOVERY and set to CRASH in this exact if/then/else block... I > removed it in v4 when setting XLogCtl->SharedRecoveryState to RECOVERY or CRASH > based on ControlFile->state. > > Sorry, I forgot it after discussing the init value in v5 :( Indeed. The extra initialization was part of v4, and got removed as of v5. Still, it seems to me that this part was not complete without updating the shared memory field correctly at the beginning of the REDO processing as the last version of the patch does. -- Michael
Attachment
At Sat, 18 Apr 2020 18:26:11 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Fri, Apr 17, 2020 at 03:33:04PM +0200, Jehan-Guillaume de Rorthais wrote: > > On Fri, 17 Apr 2020 15:50:43 +0900 Michael Paquier <michael@paquier.xyz> wrote: > >> Not sure that it is something that matters for this thread though, so > >> if necessary I think that it could be discussed separately. > > > > OK. However, unless I'm wrong, what I am describing as a desired behavior > > is the current behavior of XLogArchiveCheckDone. So, we might want to decide if > > v8 should return false during crash recovery no matter the archive_mode setup, > > or if we keep the current behavior. I vote for keeping it this way. > > I would rather avoid that now, as we don't check explicitely for crash > recovery in this code path. And for the purpose of this patch it is > fine to stick with the extra check on a standby with > (RECOVERY_STATE_ARCHIVE && archive_mode = always). The commit 78ea8b5daa intends that WAL segments are properly removed on standby with archive_mode=on by not marking .ready. The v7 actually let such segments be marked .ready, but they are finally removed after entering archive recovery. It preserves the patch's intention in that perspective. (I'd rather prefer to distinguish "ArchiveRecoveryRequested" somehow but it would be more complex and it is not the agreement on this thread.) As the result, +1 to what v7 is doing and discussing on earlier removal of such WAL segments separately if needed. > > Maybe we could use something more common for all plateform? Eg.: > > > > archive_command='this command does not exist' > > > > At least, we would have the same error everywhere, as far as it could matter... > > Yeah. We could try to do with "false" as command anyway, and see what > the buildfarm thinks. As the test is skipped on Windows, I would > assume that it does not matter much anyway. Let's see what others > think about this piece. I don't have plans to touch again this patch > until likely the middle of next week. Couldn't we use "/" as a globally-results-in-failure command? But that doesn't increment failed_count. The reason is pgarch_archiveXLog exits with FATAL for "is a directory" error. The comment asserts that we exit with FATAL for SIGINT or SIGQUIT and if so it is enough to check only exit-by-signal case. The following fix worked. diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index 01ffd6513c..def6a68063 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -595,7 +595,7 @@ pgarch_archiveXlog(char *xlog) * "command not found" type of error. If we overreact it's no big * deal, the postmaster will just start the archiver again. */ - int lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG; + int lev = wait_result_is_any_signal(rc, false) ? FATAL : LOG; if (WIFEXITED(rc)) { I didn't tested it on Windows (I somehow broke my repo and it's too slow to clone.) but system("/") returned 1 and I think that result increments the counter. > >> Thanks, I have worked more on the test, refactoring pieces related to > >> the segment names, adjusting some comments and fixing some of the > >> logic. Note that you introduced something incorrect at the creation > >> of $standby2 as you have been updating postgresql.conf.auto for > >> $standby1. > > > > erf, last minute quick edit with lack of review on my side :( > > No problem. It happens. > > >> I have noticed an extra issue while looking at the backend pieces > >> today: at the beginning of the REDO loop we forgot one place where > >> SharedRecoveryState *has* to be updated to a correct state (around > >> the comment "Update pg_control to show that we are..." in xlog.c) as > >> the startup process may decide to switch the control file state to > >> DB_IN_ARCHIVE_RECOVERY or DB_IN_CRASH_RECOVERY, but we forgot to > >> update the new shared flag at this early stage. It did not matter > >> before because SharedRecoveryInProgress would be only "true" for both, > >> but that's not the case anymore as we need to make the difference > >> between crash recovery and archive recovery in the new flag. There is > >> no actual need to update SharedRecoveryState to RECOVERY_STATE_CRASH > >> as the initial shared memory state is RECOVERY_STATE_CRASH, but > >> updating the flag makes the code more consistent IMCRASHO so I updated it > >> anyway in the attached. > > > > Grmbl...I had this logic the other way around: init with > > RECOVERY_STATE_RECOVERY and set to CRASH in this exact if/then/else block... I > > removed it in v4 when setting XLogCtl->SharedRecoveryState to RECOVERY or CRASH > > based on ControlFile->state. > > > > Sorry, I forgot it after discussing the init value in v5 :( > > Indeed. The extra initialization was part of v4, and got removed as > of v5. Still, it seems to me that this part was not complete without > updating the shared memory field correctly at the beginning of the > REDO processing as the last version of the patch does. I may not be following the discussion, but I think it is reasonable that SharedRecoveryState is initialized as CRASH then moves to ARCHIVE as needed and finished by NONE. That transition also stables RecoveryInProgress(). Other minor comments: + RECOVERY_STATE_NONE /* currently in production */ I think it would be better be RECOVERY_STATE_DONE. By the way I noticed that RecoveryState is exactly a subset of DBState. And changes of SharedRecoveryState happens side-by-side with ControlFileData->state in most places. Coundn't we just usee ControlFile->state instead of SharedRecoveryState? By the way I found a typo. +# Recovery tests for the achiving with a standby partially check s/achiving/archiving/ regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Apr 20, 2020 at 04:02:31PM +0900, Kyotaro Horiguchi wrote: > At Sat, 18 Apr 2020 18:26:11 +0900, Michael Paquier <michael@paquier.xyz> wrote in > As the result, +1 to what v7 is doing and discussing on earlier > removal of such WAL segments separately if needed. Thanks for the extra review. >> Yeah. We could try to do with "false" as command anyway, and see what >> the buildfarm thinks. As the test is skipped on Windows, I would >> assume that it does not matter much anyway. Let's see what others >> think about this piece. I don't have plans to touch again this patch >> until likely the middle of next week. > > Couldn't we use "/" as a globally-results-in-failure command? But > that doesn't increment failed_count. The reason is pgarch_archiveXLog > exits with FATAL for "is a directory" error. The comment asserts that > we exit with FATAL for SIGINT or SIGQUIT and if so it is enough to > check only exit-by-signal case. The following fix worked. Yeah, I was working on this stuff today and I noticed this problem. I was just going to send an email on the matter with a more portable patch and you also just beat me to it with this one :) So yes, using "false" may be a bad idea because we cannot rely on the case where the command does not exist in an environment in this test. After more testing, I have been hit hard about the fact that the archiver exits immediately if an archive command cannot be found (errcode = 127), and it does not report this failure back to pg_stat_archiver, which would cause the test to wait until the timeout of poll_query_until() kills the test. There is however an extra method not mentioned yet on this thread: we know that cp/copy is portable enough per the buildfarm, so let's use a copy command that we know *will* fail. A simple way of doing this is a command where the origin file does not exist. > --- a/src/backend/postmaster/pgarch.c > +++ b/src/backend/postmaster/pgarch.c > @@ -595,7 +595,7 @@ pgarch_archiveXlog(char *xlog) > * "command not found" type of error. If we overreact it's no big > * deal, the postmaster will just start the archiver again. > */ > - int lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG; > + int lev = wait_result_is_any_signal(rc, false) ? FATAL : LOG; > > if (WIFEXITED(rc)) > { > > I didn't tested it on Windows (I somehow broke my repo and it's too > slow to clone.) but system("/") returned 1 and I think that result > increments the counter. No, this would be a behavior change, which is not acceptable in my view. (By the way, just nuke your full repo if it does not work anymore on Windows, this method works). >> Indeed. The extra initialization was part of v4, and got removed as >> of v5. Still, it seems to me that this part was not complete without >> updating the shared memory field correctly at the beginning of the >> REDO processing as the last version of the patch does. > > I may not be following the discussion, but I think it is reasonable > that SharedRecoveryState is initialized as CRASH then moves to ARCHIVE > as needed and finished by NONE. That transition also stables > RecoveryInProgress(). Thought as well about that over the weekend, and that's still the best option to me. > I think it would be better be RECOVERY_STATE_DONE. I like this suggestion better than the original in v7. > By the way I noticed that RecoveryState is exactly a subset of > DBState. And changes of SharedRecoveryState happens side-by-side with > ControlFileData->state in most places. Coundn't we just usee > ControlFile->state instead of SharedRecoveryState? I actually found confusing to use the same thing, because then the reader would thing that SharedRecoveryState could be set to more values but we don't want that. > By the way I found a typo. > > +# Recovery tests for the achiving with a standby partially check > s/achiving/archiving/ Thanks, fixed. Attached is an updated patch, where I tweaked more comments. Jehan-Guillaume, who is your colleague who found originally about this problem? We should credit him in the commit message. -- Michael
Attachment
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Mon, 20 Apr 2020 16:34:44 +0900 Michael Paquier <michael@paquier.xyz> wrote: [...] > > By the way I noticed that RecoveryState is exactly a subset of > > DBState. And changes of SharedRecoveryState happens side-by-side with > > ControlFileData->state in most places. Coundn't we just usee > > ControlFile->state instead of SharedRecoveryState? > > I actually found confusing to use the same thing, because then the > reader would thing that SharedRecoveryState could be set to more > values but we don't want that. I thought about this while studying various possible fix. https://www.postgresql.org/message-id/flat/20200401181735.11100908%40firost#6192afba4e4549b8d9bac03168bad46b The problem is that we would have to read the controldata file each time we wonder if a segment should be archived/removed. Moreover, the controldata file might not be in sync quickly enough with the real state for some other code path or futur needs. [...] > Attached is an updated patch, where I tweaked more comments. > > Jehan-Guillaume, who is your colleague who found originally about this > problem? We should credit him in the commit message. Indeed, Benoît Lobréau reported this behavior to me. Regards,
On Mon, Apr 20, 2020 at 02:22:35PM +0200, Jehan-Guillaume de Rorthais wrote: > The problem is that we would have to read the controldata file each time we > wonder if a segment should be archived/removed. Moreover, the controldata > file might not be in sync quickly enough with the real state for some other > code path or futur needs. I don't think that this is what Horiguchi-san meant here. What I got from his previous message would be to be to copy the shared value from the control file when necessary, and have the shared state use only a subset of the existing values of DBState, aka: - DB_IN_CRASH_RECOVERY - DB_IN_ARCHIVE_RECOVERY - DB_IN_PRODUCTION Still, that sounds wrong to me because then somebody would be tempted to change the shared value thinking that things like DB_SHUTDOWNING, DB_SHUTDOWNED_* or DB_STARTUP are valid but we don't want that here. Note that there may be a case for DB_STARTUP to be used in XLOGShmemInit(), but I'd rather let the code use the safest default, DB_IN_CRASH_RECOVERY to control that we won't remove .ready files by default until the startup process sees fit to do the actual switch depending on the checkpoint record lookup, if archive recovery was actually requested, etc. > Indeed, Benoît Lobréau reported this behavior to me. Noted. Thanks for the information. I don't think that I have ever met Benoît in person, do I? Tell him that I owe him one beer or a beverage of his choice when we meet IRL, and that he had better use this message-id to make me keep my promise :) -- Michael
Attachment
At Tue, 21 Apr 2020 11:15:01 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Mon, Apr 20, 2020 at 02:22:35PM +0200, Jehan-Guillaume de Rorthais wrote: > > The problem is that we would have to read the controldata file each time we > > wonder if a segment should be archived/removed. Moreover, the controldata > > file might not be in sync quickly enough with the real state for some other > > code path or futur needs. > > I don't think that this is what Horiguchi-san meant here. What I got > from his previous message would be to be to copy the shared value from > the control file when necessary, and have the shared state use only a > subset of the existing values of DBState, aka: > - DB_IN_CRASH_RECOVERY > - DB_IN_ARCHIVE_RECOVERY > - DB_IN_PRODUCTION First I thought as above, but I thought that we could use ControlFile->state itself in this case, by regarding the symbols less than DB_IN_CRASH_RECOVERY as RECOVERY_STATE_CRASH. I don't think there's no problem if the update to DB_IN_ARCHIVE_RECOVERY reaches checkpointer with some delay. > Still, that sounds wrong to me because then somebody would be tempted > to change the shared value thinking that things like DB_SHUTDOWNING, > DB_SHUTDOWNED_* or DB_STARTUP are valid but we don't want that here. That is not an issue if we just use DBState to know whether we have started archive recovery. > Note that there may be a case for DB_STARTUP to be used in > XLOGShmemInit(), but I'd rather let the code use the safest default, > DB_IN_CRASH_RECOVERY to control that we won't remove .ready files by > default until the startup process sees fit to do the actual switch > depending on the checkpoint record lookup, if archive recovery was > actually requested, etc. I'm not sure I read this correctly. But I think I agree to this. + if (!XLogArchivingAlways() && + GetRecoveryState() == RECOVERY_STATE_ARCHIVE) Is rewritten as + if (!XLogArchivingAlways() && + GetDBState() > DB_IN_CRASH_RECOVERY) FWIW, what annoyed me is there are three variables that are quite similar but has different domains, ControlFile->state, XLogCtl->SharedRecoveryState, and LocalRecoveryInProgress. I didn't mind there were two, but three seems a bit too many to me. But it may be different issue. > > Indeed, Benoît Lobréau reported this behavior to me. > > Noted. Thanks for the information. I don't think that I have ever > met Benoît in person, do I? Tell him that I owe him one beer or a > beverage of his choice when we meet IRL, and that he had better use > this message-id to make me keep my promise :) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Apr 21, 2020 at 12:09:25PM +0900, Kyotaro Horiguchi wrote: > + if (!XLogArchivingAlways() && > + GetRecoveryState() == RECOVERY_STATE_ARCHIVE) > > Is rewritten as > > + if (!XLogArchivingAlways() && > + GetDBState() > DB_IN_CRASH_RECOVERY) > > FWIW, what annoyed me is there are three variables that are quite > similar but has different domains, ControlFile->state, > XLogCtl->SharedRecoveryState, and LocalRecoveryInProgress. I didn't > mind there were two, but three seems a bit too many to me. That's actually the pattern I would avoid for clarity. There is no need to add more dependencies to the entries of DBState for the sake of this patch, and this smells like a trap if more values are added to it in an order that does not match what we have been assuming in the context of this thread. -- Michael
Attachment
At Tue, 21 Apr 2020 13:57:39 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Tue, Apr 21, 2020 at 12:09:25PM +0900, Kyotaro Horiguchi wrote: > > + if (!XLogArchivingAlways() && > > + GetRecoveryState() == RECOVERY_STATE_ARCHIVE) > > > > Is rewritten as > > > > + if (!XLogArchivingAlways() && > > + GetDBState() > DB_IN_CRASH_RECOVERY) > > > > FWIW, what annoyed me is there are three variables that are quite > > similar but has different domains, ControlFile->state, > > XLogCtl->SharedRecoveryState, and LocalRecoveryInProgress. I didn't > > mind there were two, but three seems a bit too many to me. > > That's actually the pattern I would avoid for clarity. There is no > need to add more dependencies to the entries of DBState for the sake > of this patch, and this smells like a trap if more values are added to > it in an order that does not match what we have been assuming in the > context of this thread. Yes. Anywaay that would be another issue, if it is an issue. I'm fine with the current state. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Tue, 21 Apr 2020 15:08:17 +0900 (JST) Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > At Tue, 21 Apr 2020 13:57:39 +0900, Michael Paquier <michael@paquier.xyz> > wrote in > > On Tue, Apr 21, 2020 at 12:09:25PM +0900, Kyotaro Horiguchi wrote: > > > + if (!XLogArchivingAlways() && > > > + GetRecoveryState() == RECOVERY_STATE_ARCHIVE) > > > > > > Is rewritten as > > > > > > + if (!XLogArchivingAlways() && > > > + GetDBState() > DB_IN_CRASH_RECOVERY) > > > > > > FWIW, what annoyed me is there are three variables that are quite > > > similar but has different domains, ControlFile->state, > > > XLogCtl->SharedRecoveryState, and LocalRecoveryInProgress. I didn't > > > mind there were two, but three seems a bit too many to me. In fact, my original goal [1] was to avoid adding another shared boolean related to the same topic. So I understand your feeling. I played with this idea again, based on your argument that there's no problem if the update to DB_IN_ARCHIVE_RECOVERY reaches checkpointer with some delay. The other point I feel bad with is to open and check the controlfile again and again for each segment to archive, even on running production instance. It's not that it would be heavy, but it feels overkill to fetch this information that should be available more easily. That leads me an idea where we would keep the ControlFile data up-to-date in shared memory. There's a few duplicates between ControlFile and XLogCtl, so maybe it could make the code a little simpler at some other places than just fixing $SUBJECT using DBState? I'm not sure of the implications and impacts though. This seems way bigger than the current fix and with many traps on the way. Maybe we could discuss this in another thread if you think it deserves it? Regards, [1] https://www.postgresql.org/message-id/flat/20200403182625.0fccc6fd%40firost#28a756094a4b1f3dd24927fb6311927d
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
Hello, I did another round of review of v8. - LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress; + LocalRecoveryInProgress = (xlogctl->SharedRecoveryState != RECOVERY_STATE_DONE); Do we need to acquire info_lck to look at the state here, as we do in GetRecoveryState()? Why is it missing from previous code where SharedRecoveryInProgress was protected by info_lck as well? Plus, the new line length overflow the 80-column, but I'm not sure where to break this line. +if ($Config{osname} eq 'MSWin32') +{ + + # some Windows Perls at least don't like IPC::Run's start/kill_kill regime. + plan skip_all => "Test fails on Windows perl"; +} In fact, this was inherited from 011_crash_recovery.pl where I originally added some tests. As 020_archive_status.pl doesn't use IPC::Run, the comment is wrong. But I wonder if this whole block is really needed. Unfortunately I can't test on MSWin32 :/ On Tue, 21 Apr 2020 11:15:01 +0900 Michael Paquier <michael@paquier.xyz> wrote: > > Indeed, Benoît Lobréau reported this behavior to me. > > Noted. Thanks for the information. I don't think that I have ever > met Benoît in person, do I? I don't think so. > Tell him that I owe him one beer or a beverage of his choice when we meet > IRL, and that he had better use this message-id to make me keep my promise :) I told him (but I'm sure he was reading anyway :)). Regards,
On Wed, Apr 22, 2020 at 12:00:22AM +0200, Jehan-Guillaume de Rorthais wrote: > That leads me an idea where we would keep the ControlFile data up-to-date in > shared memory. There's a few duplicates between ControlFile and XLogCtl, so > maybe it could make the code a little simpler at some other places than just > fixing $SUBJECT using DBState? I'm not sure of the implications and impacts > though. This seems way bigger than the current fix and with many traps on the > way. Maybe we could discuss this in another thread if you think it deserves it? It seems to me that this could have wider applications than just the recovery state, no? I would recommend to keep this discussion on a separate thread to give more visibility to the topic. -- Michael
Attachment
On Wed, Apr 22, 2020 at 12:41:21AM +0200, Jehan-Guillaume de Rorthais wrote: > Do we need to acquire info_lck to look at the state here, as we do in > GetRecoveryState()? Why is it missing from previous code where > SharedRecoveryInProgress was protected by info_lck as well? Please see 1a3d104. > Plus, the new line length overflow the 80-column, but I'm not sure where to > break this line. pgindent has been run on v8, and it did not complain. > In fact, this was inherited from 011_crash_recovery.pl where I originally > added some tests. As 020_archive_status.pl doesn't use IPC::Run, the comment is > wrong. But I wonder if this whole block is really needed. Unfortunately I can't > test on MSWin32 :/ You are right here. The restriction can be removed, and I have checked that the test from v8 is able to pass on my Windows dev VM. -- Michael
Attachment
On Wed, Apr 22, 2020 at 10:19:35AM +0900, Michael Paquier wrote: > You are right here. The restriction can be removed, and I have > checked that the test from v8 is able to pass on my Windows dev VM. Attached are versions for each branch down to 9.5. While working on the backpatch, I have not found major conflicts except one thing: up to 10, Postgres does WAL segment recycling after two completed checkpoints, and the 8th test of the script relies on the behavior of 11~ of one completed checkpoint (first .ready file present in the cold backup but removed removed from $standby1). I have taken the simplest approach to fix the test by checking that the .ready file actually exists, while the rest of the test remains the same. It is worth noting that for 9.5 and 9.6 the test had compatibility issues with the renaming of pg_xlog to pg_wal, including paths and functions. The calls to poll_query_until() also needed tweaks, but I got the tests to work. -- Michael
Attachment
- 0001-Fix-handling-of-WAL-segments-ready-to-be-arch-master.patch
- 0001-Fix-handling-of-WAL-segments-ready-to-be-archived-12.patch
- 0001-Fix-handling-of-WAL-segments-ready-to-be-archived-11.patch
- 0001-Fix-handling-of-WAL-segments-ready-to-be-archived-10.patch
- 0001-Fix-handling-of-WAL-segments-ready-to-be-archived-96.patch
- 0001-Fix-handling-of-WAL-segments-ready-to-be-archived-95.patch
- signature.asc
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Wed, 22 Apr 2020 10:19:35 +0900 Michael Paquier <michael@paquier.xyz> wrote: > On Wed, Apr 22, 2020 at 12:41:21AM +0200, Jehan-Guillaume de Rorthais wrote: > > Do we need to acquire info_lck to look at the state here, as we do in > > GetRecoveryState()? Why is it missing from previous code where > > SharedRecoveryInProgress was protected by info_lck as well? > > Please see 1a3d104. Understood. Interesting. Thank you. > > Plus, the new line length overflow the 80-column, but I'm not sure where to > > break this line. > > pgindent has been run on v8, and it did not complain. OK. > > In fact, this was inherited from 011_crash_recovery.pl where I originally > > added some tests. As 020_archive_status.pl doesn't use IPC::Run, the > > comment is wrong. But I wonder if this whole block is really needed. > > Unfortunately I can't test on MSWin32 :/ > > You are right here. The restriction can be removed, and I have > checked that the test from v8 is able to pass on my Windows dev VM. Thanks!
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Wed, 22 Apr 2020 16:32:23 +0900 Michael Paquier <michael@paquier.xyz> wrote: > On Wed, Apr 22, 2020 at 10:19:35AM +0900, Michael Paquier wrote: > > You are right here. The restriction can be removed, and I have > > checked that the test from v8 is able to pass on my Windows dev VM. > > Attached are versions for each branch down to 9.5. While working on > the backpatch, I have not found major conflicts except one thing: > up to 10, Postgres does WAL segment recycling after two completed > checkpoints, and the 8th test of the script relies on the behavior of > 11~ of one completed checkpoint (first .ready file present in the cold > backup but removed removed from $standby1). I have taken the simplest > approach to fix the test by checking that the .ready file actually > exists, while the rest of the test remains the same. This test seems useless to me. It should either be removed or patched to test the signal has been removed after a second restartpoint. Please, find in attachment a patch for 9.6 implementing this. If it seems reasonable to you, I can create the backpatch to 9.5. > It is worth noting that for 9.5 and 9.6 the test had compatibility > issues with the renaming of pg_xlog to pg_wal, including paths and > functions. The calls to poll_query_until() also needed tweaks, but > I got the tests to work. Thanks for the backpatching work! Regards,
Attachment
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Wed, 22 Apr 2020 17:58:24 +0200 Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote: > On Wed, 22 Apr 2020 16:32:23 +0900 > Michael Paquier <michael@paquier.xyz> wrote: > > [...] > [...] > [...] > > This test seems useless to me. It should either be removed or patched to test > the signal has been removed after a second restartpoint. > > Please, find in attachment a patch for 9.6 implementing this. If it seems > reasonable to you, I can create the backpatch to 9.5. I found an extra useless line of code in v9 patch. Please, find in attachment v10. Sorry for this. Regards,
Attachment
On Wed, Apr 22, 2020 at 06:17:17PM +0200, Jehan-Guillaume de Rorthais wrote: > I found an extra useless line of code in v9 patch. Please, find in > attachment v10. Sorry for this. Thanks for helping here, your changes make sense. This looks mostly fine to me except that part: +$standby1->poll_query_until('postgres', + qq{ SELECT pg_xlog_location_diff('$primary_lsn', pg_last_xlog_replay_location()) = 0 }) + or die "Timed out while waiting for xlog replay"; Here we should check if $primary_lsn is at least pg_last_xlog_replay_location(). Checking for an equality may stuck the test if more WAL gets replayed. For example you could have a concurrent autovacuum generating WAL. -- Michael
Attachment
FWIW, the test for 10- looks fine, but I have one qustion. + 'archive success reported in pg_stat_archiver for WAL segment $segment_name_ This seems intending to show an actual segment name in the message, but it is really shown as "... WAL segment $segment_name_1". Is that intended? At Thu, 23 Apr 2020 08:46:18 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Wed, Apr 22, 2020 at 06:17:17PM +0200, Jehan-Guillaume de Rorthais wrote: > > I found an extra useless line of code in v9 patch. Please, find in > > attachment v10. Sorry for this. > > Thanks for helping here, your changes make sense. This looks mostly > fine to me except that part: > +$standby1->poll_query_until('postgres', > + qq{ SELECT pg_xlog_location_diff('$primary_lsn', pg_last_xlog_replay_location()) = 0 }) > + or die "Timed out while waiting for xlog replay"; > Here we should check if $primary_lsn is at least > pg_last_xlog_replay_location(). Checking for an equality may stuck > the test if more WAL gets replayed. For example you could have a > concurrent autovacuum generating WAL. Autovacuum is turned off in this case, but anyway other kinds of WAL records can be generated. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Thu, 23 Apr 2020 14:05:46 +0900 (JST) Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > FWIW, the test for 10- looks fine, but I have one qustion. > > + 'archive success reported in pg_stat_archiver for WAL segment > $segment_name_ > > This seems intending to show an actual segment name in the message, > but it is really shown as "... WAL segment $segment_name_1". Is that > intended? Good catch. Fixed in v11. Thank you. > At Thu, 23 Apr 2020 08:46:18 +0900, Michael Paquier <michael@paquier.xyz> > wrote in > > On Wed, Apr 22, 2020 at 06:17:17PM +0200, Jehan-Guillaume de Rorthais > > wrote: > > > I found an extra useless line of code in v9 patch. Please, find in > > > attachment v10. Sorry for this. > > > > Thanks for helping here, your changes make sense. This looks mostly > > fine to me except that part: > > +$standby1->poll_query_until('postgres', > > + qq{ SELECT pg_xlog_location_diff('$primary_lsn', > > pg_last_xlog_replay_location()) = 0 }) > > + or die "Timed out while waiting for xlog replay"; > > Here we should check if $primary_lsn is at least > > pg_last_xlog_replay_location(). Checking for an equality may stuck > > the test if more WAL gets replayed. For example you could have a > > concurrent autovacuum generating WAL. > > Autovacuum is turned off in this case, but anyway other kinds of WAL > records can be generated. make sense. Fixed in v11. Please, find in v11 for version 9.5, 9.6 and 10. Regards,
Attachment
On Thu, Apr 23, 2020 at 06:59:53PM +0200, Jehan-Guillaume de Rorthais wrote: > Please, find in v11 for version 9.5, 9.6 and 10. I have worked more on that using your v11, tweaked few comments in the new test parts for 9.5~10, and applied the whole. Thanks all for your work. I am keeping now an eye on the buildfarm. -- Michael
Attachment
Hi, On 2020-04-24 08:54:02 +0900, Michael Paquier wrote: > On Thu, Apr 23, 2020 at 06:59:53PM +0200, Jehan-Guillaume de Rorthais wrote: > > Please, find in v11 for version 9.5, 9.6 and 10. > > I have worked more on that using your v11, tweaked few comments in the > new test parts for 9.5~10, and applied the whole. Thanks all for your > work. I am keeping now an eye on the buildfarm. I am confused by this commit. You added shared memory to differentiate between crash recovery and standby mode/archive recovery, correct? You write: > This commit fixes the regression by tracking in shared memory if a live > cluster is either in crash recovery or archive recovery as the handling > of WAL segments ready to be archived is different in both cases (those > WAL segments should not be removed during crash recovery), and by using > this new shared memory state to decide if a segment can be recycled or > not. But don't we pretty much know this already from the state of the system? During crash recovery there's nothing running RemoveOldXLogFiles() but the startup process. Right? And in DB_IN_ARCHIVE_RECOVERY it should only happen as part of restartpoints (i.e. the checkpointer). Did you add the new shared state to avoid deducing things from the "environment"? If so, it should really be mentioned in the commit message & code. Because: > Previously, it was not possible to know if a cluster was in crash > recovery or archive recovery as the shared state was able to track only > if recovery was happening or not, leading to the problem. really doesn't make that obvious. Greetings, Andres Freund
Michael Paquier <michael@paquier.xyz> writes: > I have worked more on that using your v11, tweaked few comments in the > new test parts for 9.5~10, and applied the whole. Thanks all for your > work. I am keeping now an eye on the buildfarm. Looks like the news is not good :-( I see that my own florican is one of the failing critters, though it failed only on HEAD which seems odd. Any suggestions what to look for? regards, tom lane
On Thu, Apr 23, 2020 at 10:21:15PM -0400, Tom Lane wrote: > Looks like the news is not good :-( Yes, I was looking at that for the last couple of hours, and just pushed something to put back the buildfarm to a green state for now (based on the first results things seem stable now) by removing the defective subset of tests. > I see that my own florican is one of the failing critters, though > it failed only on HEAD which seems odd. Any suggestions what to > look for? The issue comes from the parts of the test where we expect some .ready files to exist (or not) after triggering a restartpoint to force some segments to be recycled. And looking more at it, I suspect that the issue is actually that we don't make sure in the test that the standbys started have replayed up to the segment switch record triggered on the primary (the one within generate_series(10,20)), and then the follow-up restart point does not actually recycle the segments we expect to recycle. That's more likely going to be a problem on slower machines as the window gets wider between the moment the standbys reach their consistency point and the moment the switch record is replayed. -- Michael
Attachment
On Thu, Apr 23, 2020 at 06:48:56PM -0700, Andres Freund wrote: > But don't we pretty much know this already from the state of the system? > During crash recovery there's nothing running RemoveOldXLogFiles() but > the startup process. Right? And in DB_IN_ARCHIVE_RECOVERY it should only > happen as part of restartpoints (i.e. the checkpointer). > > Did you add the new shared state to avoid deducing things from the > "environment"? If so, it should really be mentioned in the commit > message & code. Because: Hmm. Sorry, I see your point. The key of the logic here is from XLogArchiveCheckDone() which could be called from other processes than the startup process. There is one code path at the end of a base backup for backup history files where not using a shared state would be a problem. -- Michael
Attachment
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Fri, 24 Apr 2020 12:43:51 +0900 Michael Paquier <michael@paquier.xyz> wrote: > On Thu, Apr 23, 2020 at 10:21:15PM -0400, Tom Lane wrote: > > Looks like the news is not good :-( > > Yes, I was looking at that for the last couple of hours, and just > pushed something to put back the buildfarm to a green state for now > (based on the first results things seem stable now) by removing the > defective subset of tests. > > > I see that my own florican is one of the failing critters, though > > it failed only on HEAD which seems odd. Any suggestions what to > > look for? > > The issue comes from the parts of the test where we expect some .ready > files to exist (or not) after triggering a restartpoint to force some > segments to be recycled. And looking more at it, I suspect that the > issue is actually that we don't make sure in the test that the > standbys started have replayed up to the segment switch record > triggered on the primary (the one within generate_series(10,20)), and > then the follow-up restart point does not actually recycle the > segments we expect to recycle. That's more likely going to be a > problem on slower machines as the window gets wider between the moment > the standbys reach their consistency point and the moment the switch > record is replayed. Indeed. In regard with your fix, as we don't know if the standby caught up with the latest available record, there's really no point to keep this test either: # Recovery with archive_mode=on should not create .ready files. # Note that this segment did not exist in the backup. ok( !-f "$standby1_data/$segment_path_2_ready", ".ready file for WAL segment $segment_name_2 not created on standby when archive_mode=on on standby" ); I agree the three tests could be removed as they were not covering the bug we were chasing. However, they might still be useful to detect futur non expected behavior changes. If you agree with this, please, find in attachment a patch proposal against HEAD that recreate these three tests **after** a waiting loop on both standby1 and standby2. This waiting loop is inspired from the tests in 9.5 -> 10. Regards,
Attachment
On Fri, Apr 24, 2020 at 03:03:00PM +0200, Jehan-Guillaume de Rorthais wrote: > I agree the three tests could be removed as they were not covering the bug we > were chasing. However, they might still be useful to detect futur non expected > behavior changes. If you agree with this, please, find in attachment a patch > proposal against HEAD that recreate these three tests **after** a waiting loop > on both standby1 and standby2. This waiting loop is inspired from the tests in > 9.5 -> 10. FWIW, I would prefer keeping all three tests as well. So.. I have spent more time on this problem and mereswin here is a very good sample because it failed all three tests: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mereswine&dt=2020-04-24%2006%3A03%3A53 For standby2, we get this failure: ok 11 - .ready file for WAL segment 000000010000000000000001 existing in backup is kept with archive_mode=always on standby not ok 12 - .ready file for WAL segment 000000010000000000000002 created with archive_mode=always on standby Then, looking at 020_archive_status_standby2.log, we have the following logs: 2020-04-24 02:08:32.032 PDT [9841:3] 020_archive_status.pl LOG: statement: CHECKPOINT [...] 2020-04-24 02:08:32.303 PDT [9821:7] LOG: restored log file "000000010000000000000002" from archive In this case, the test forced a checkpoint to test the segment recycling *before* the extra restored segment we'd like to work on was actually restored. So it looks like my initial feeling about the timing issue was right, and I am also able to reproduce the original set of failures by adding a manual sleep to delay restores of segments, like that for example: --- a/src/backend/access/transam/xlogarchive.c +++ b/src/backend/access/transam/xlogarchive.c @@ -74,6 +74,8 @@ RestoreArchivedFile(char *path, const char *xlogfname, if (recoveryRestoreCommand == NULL || strcmp(recoveryRestoreCommand, "") == 0) goto not_available; + pg_usleep(10 * 1000000); /* 10s */ + /* With your patch the problem does not show up anymore even with the delay added, so I would like to apply what you have sent and add back those tests. For now, I would just patch HEAD though as that's not worth the risk of destabilizing stable branches in the buildfarm. > $primary->poll_query_until('postgres', > q{SELECT archived_count FROM pg_stat_archiver}, '1') > - or die "Timed out while waiting for archiving to finish"; > + or die "Timed out while waiting for archiving to finish"; Some noise in the patch. This may have come from some unfinished business with pgindent. -- Michael
Attachment
At Mon, 27 Apr 2020 16:49:45 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Fri, Apr 24, 2020 at 03:03:00PM +0200, Jehan-Guillaume de Rorthais wrote: > > I agree the three tests could be removed as they were not covering the bug we > > were chasing. However, they might still be useful to detect futur non expected > > behavior changes. If you agree with this, please, find in attachment a patch > > proposal against HEAD that recreate these three tests **after** a waiting loop > > on both standby1 and standby2. This waiting loop is inspired from the tests in > > 9.5 -> 10. > > FWIW, I would prefer keeping all three tests as well. > > So.. I have spent more time on this problem and mereswin here is a > very good sample because it failed all three tests: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mereswine&dt=2020-04-24%2006%3A03%3A53 > > For standby2, we get this failure: > ok 11 - .ready file for WAL segment 000000010000000000000001 existing > in backup is kept with archive_mode=always on standby > not ok 12 - .ready file for WAL segment 000000010000000000000002 > created with archive_mode=always on standby > > Then, looking at 020_archive_status_standby2.log, we have the > following logs: > 2020-04-24 02:08:32.032 PDT [9841:3] 020_archive_status.pl LOG: > statement: CHECKPOINT > [...] > 2020-04-24 02:08:32.303 PDT [9821:7] LOG: restored log file > "000000010000000000000002" from archive > > In this case, the test forced a checkpoint to test the segment > recycling *before* the extra restored segment we'd like to work on was > actually restored. So it looks like my initial feeling about the > timing issue was right, and I am also able to reproduce the original > set of failures by adding a manual sleep to delay restores of > segments, like that for example: > --- a/src/backend/access/transam/xlogarchive.c > +++ b/src/backend/access/transam/xlogarchive.c > @@ -74,6 +74,8 @@ RestoreArchivedFile(char *path, const char *xlogfname, > if (recoveryRestoreCommand == NULL || > strcmp(recoveryRestoreCommand, "") == 0) > goto not_available; > > + pg_usleep(10 * 1000000); /* 10s */ > + > /* > > With your patch the problem does not show up anymore even with the > delay added, so I would like to apply what you have sent and add back > those tests. For now, I would just patch HEAD though as that's not > worth the risk of destabilizing stable branches in the buildfarm. Agreed to the diagnosis and the fix. The fix reliably cause a restart point then the restart point manipulats the status files the right way before the CHECKPOINT command resturns, in the both cases. If I would add something to the fix, the following line may need a comment. +# Wait for the checkpoint record is replayed so that the following +# CHECKPOINT causes a restart point reliably. |+$standby1->poll_query_until('postgres', |+ qq{ SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '$primary_lsn') >= 0 } regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Mon, 27 Apr 2020 18:21:07 +0900 (JST) Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > At Mon, 27 Apr 2020 16:49:45 +0900, Michael Paquier <michael@paquier.xyz> > wrote in [...] > > With your patch the problem does not show up anymore even with the > > delay added, so I would like to apply what you have sent and add back > > those tests. For now, I would just patch HEAD though as that's not > > worth the risk of destabilizing stable branches in the buildfarm. Good for me. Thanks! > Agreed to the diagnosis and the fix. The fix reliably cause a restart > point then the restart point manipulats the status files the right way > before the CHECKPOINT command resturns, in the both cases. > > If I would add something to the fix, the following line may need a > comment. > > +# Wait for the checkpoint record is replayed so that the following > +# CHECKPOINT causes a restart point reliably. > |+$standby1->poll_query_until('postgres', > |+ qq{ SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), > '$primary_lsn') >= 0 } Agree. Regards,
On Mon, Apr 27, 2020 at 06:21:07PM +0900, Kyotaro Horiguchi wrote: > Agreed to the diagnosis and the fix. The fix reliably cause a restart > point then the restart point manipulats the status files the right way > before the CHECKPOINT command resturns, in the both cases. Thanks for checking! > If I would add something to the fix, the following line may need a > comment. > > +# Wait for the checkpoint record is replayed so that the following > +# CHECKPOINT causes a restart point reliably. > |+$standby1->poll_query_until('postgres', > |+ qq{ SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '$primary_lsn') >= 0 } Makes sense, added a comment and applied to HEAD. I have also improved the comment around the split with pg_switch_wal(), and actually simplified the test to use as wait point the return value from the function. -- Michael
Attachment
Re: [BUG] non archived WAL removed during production crash recovery
From
Jehan-Guillaume de Rorthais
Date:
On Tue, 28 Apr 2020 08:01:38 +0900 Michael Paquier <michael@paquier.xyz> wrote: > On Mon, Apr 27, 2020 at 06:21:07PM +0900, Kyotaro Horiguchi wrote: > > Agreed to the diagnosis and the fix. The fix reliably cause a restart > > point then the restart point manipulats the status files the right way > > before the CHECKPOINT command resturns, in the both cases. > > Thanks for checking! > > > If I would add something to the fix, the following line may need a > > comment. > > > > +# Wait for the checkpoint record is replayed so that the following > > +# CHECKPOINT causes a restart point reliably. > > |+$standby1->poll_query_until('postgres', > > |+ qq{ SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), > > '$primary_lsn') >= 0 } > > Makes sense, added a comment and applied to HEAD. I have also > improved the comment around the split with pg_switch_wal(), and > actually simplified the test to use as wait point the return value > from the function. Thank you guys for your help, reviews and commits!
On Tue, Apr 28, 2020 at 01:58:47AM +0200, Jehan-Guillaume de Rorthais wrote: > Thank you guys for your help, reviews and commits! The buildfarm does not complain after the latest commit, meaning that we are good here. Thanks for your efforts. -- Michael