Re: [Bug fix]There is the case archive_timeout parameter is ignoredafter recovery works. - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: [Bug fix]There is the case archive_timeout parameter is ignoredafter recovery works. |
Date | |
Msg-id | 10a11134-c00e-26c7-bacd-d8c5bce99207@oss.nttdata.com Whole thread Raw |
In response to | Re: [Bug fix]There is the case archive_timeout parameter isignored after recovery works. (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses |
RE: [Bug fix]There is the case archive_timeout parameter is ignoredafter recovery works.
|
List | pgsql-hackers |
On 2020/06/29 16:41, Kyotaro Horiguchi wrote: > Hello. > > At Mon, 29 Jun 2020 04:35:11 +0000, "higuchi.daisuke@fujitsu.com" <higuchi.daisuke@fujitsu.com> wrote in >> Hi, >> >> I found the bug about archive_timeout parameter. >> There is the case archive_timeout parameter is ignored after recovery works. > ... >> [Problem] >> When the value of archive_timeout is smaller than that of checkpoint_timeout and recovery works, archive_timeout is ignoredin the first WAL archiving. >> Once WAL is archived, the archive_timeout seems to be valid after that. > ... >> Currently, cur_timeout is set according to only checkpoint_timeout when it is during recovery. >> Even during recovery, the cur_timeout should be calculated including archive_timeout as well as checkpoint_timeout, Ithink. >> I attached the patch to solve this problem. > > Unfortunately the diff command in your test script doesn't show me > anything, but I can understand what you are thinking is a problem, > maybe. But the patch doesn't seem the fix for the issue. > > Archiving works irrelevantly from that parameter. Completed WAL > segments are immediately marked as ".ready" and archiver does its task > immediately independently from checkpointer. The parameter, as > described in documentation, forces the server to switch to a new WAL > segment file periodically so that it can be archived, that is, it > works only on primary. On the other hand on standby, a WAL segment is > not marked as ".ready" until any data for the *next* segment comes. So > the patch is not the fix for the issue. The problems that you're describing and Daisuke-san reported are really the same? The reported problem seems that checkpointer can sleep on the latch for more than archive_timeout just after recovery and cannot switch WAL files promptly even if necessary. The cause of this problem is that the checkpointer's sleep time is calculated from both checkpoint_timeout and archive_timeout during normal running, but calculated only from checkpoint_timeout during recovery. So Daisuke-san's patch tries to change that so that it's calculated from both of them even during recovery. No? - if (XLogArchiveTimeout > 0 && !RecoveryInProgress()) + if (XLogArchiveTimeout > 0) { elapsed_secs = now - last_xlog_switch_time; - if (elapsed_secs >= XLogArchiveTimeout) + if (elapsed_secs >= XLogArchiveTimeout && !RecoveryInProgress()) continue; /* no sleep for us ... */ cur_timeout = Min(cur_timeout, XLogArchiveTimeout - elapsed_secs); last_xlog_switch_time is not updated during recovery. So "elapsed_secs" can be large and cur_timeout can be negative. Isn't this problematic? As another approach, what about waking the checkpointer up at the end of recovery like we already do for walsenders? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
pgsql-hackers by date: