Re: Unnecessary WAL archiving after failover - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: Unnecessary WAL archiving after failover |
Date | |
Msg-id | 20120605063730.GA26031@tornado.leadboat.com |
In response to | Re: Unnecessary WAL archiving after failover (Fujii Masao <masao.fujii@gmail.com>) |
List | pgsql-hackers |
On Fri, Mar 23, 2012 at 11:03:27PM +0900, Fujii Masao wrote:
> > On Wed, Feb 29, 2012 at 5:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> >> In streaming replication, after failover, new master might have lots
> >> of un-applied WAL files with old timeline ID. They are the WAL files
> >> which were recycled as a future ones when the server was running as a
> >> standby. Since they will never be used later, they don't need to be
> >> archived after failover. But since they have neither .ready nor .done
> >> file in archive_status, checkpoints after failover newly create .ready
> >> files for them, and then finally they are archived. Which might cause
> >> disk I/O spike both in WAL and archive storage.

If the old master archived later WAL that the new master never restored, won't
this attempt to archive a file under a name that already exists in the archive?
The documentation says this:

    The archive command should generally be designed to refuse to overwrite
    any pre-existing archive file. This is an important safety feature to
    preserve the integrity of your archive in case of administrator error
    (such as sending the output of two different servers to the same archive
    directory).

    It is advisable to test your proposed archive command to ensure that it
    indeed does not overwrite an existing file, and that it returns nonzero
    status in this case.

Archiving on the new master would halt until the operator intervenes.

> >> To avoid the above problem, I think that un-applied WAL files with old
> >> timeline ID should be marked as already-archived and recycled
> >> immediately at the end of recovery. Thought?

A small hazard comes to mind. If the administrator manually copied
post-timeline-divergence segments from the failed master to the new master's
pg_xlog, the current implementation loads them into the archive for you. The
new master could never apply those files locally, but they might be useful for
alternate recoveries down the previous timeline.
Nonetheless, we can just as reasonably specify that it's not a role of the new
master to provide this service. Call the fact that it did so in previous
releases an implementation artifact.

What about instead creating an archive status file at recycle time and deleting
it as we begin to populate the file? That distinguishes copied-in, unarchived
segments from recycled ones.

Incidentally, RemoveOldXlogFiles() has this comment:

    /*
     * We ignore the timeline part of the XLOG segment identifiers in
     * deciding whether a segment is still needed. This ensures that we
     * won't prematurely remove a segment from a parent timeline. We could
     * probably be a little more proactive about removing segments of
     * non-parent timelines, but that would be a whole lot more
     * complicated.

Should both instances of "parent" be "child" or "descendant"?

> Just after failover, there can be three kinds of WAL files in new
> master's pg_xlog directory:
>
> (1) WAL files which were recycled to by restartpoint
>
> I've already explained upthread the issue which these WAL files cause
> after failover.
>
>
> (2) WAL files which were restored from the archive
>
> In 9.1 or before, the restored WAL files don't remain after failover
> because they are always restored onto the temporary filename
> "RECOVERYXLOG". So the issue which I explain from now doesn't exist
> in 9.1 or before.
>
> In 9.2dev, as the result of supporting cascade replication,
> an archived WAL file is restored onto correct file name so that
> cascading walsender can send it to another standby. This restored

The documentation still says this:

    WAL segments that cannot be found in the archive will be sought in
    pg_xlog/; this allows use of recent un-archived segments. However,
    segments that are available from the archive will be used in preference
    to files in pg_xlog/. The system will not overwrite the existing contents
    of pg_xlog/ when retrieving archived files.

I gather the last sentence is now false?
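
As a rough illustration of the recycle-time status file idea raised above, the following Python sketch models pg_xlog and its archive_status subdirectory as plain directories. It is not PostgreSQL code: every function name is hypothetical, and using .done as the marker is an assumption, since the message does not say which status file would be created. It only shows how a marker written at recycle time could let a checkpoint-style scan tell recycled segments apart from copied-in ones.

```python
import os
import tempfile

STATUS_SUFFIXES = (".ready", ".done")


def status_path(xlog_dir, segment, suffix):
    """Path of a status marker for a segment (layout modeled on archive_status)."""
    return os.path.join(xlog_dir, "archive_status", segment + suffix)


def has_status(xlog_dir, segment):
    """True if the segment has either a .ready or a .done marker."""
    return any(os.path.exists(status_path(xlog_dir, segment, s))
               for s in STATUS_SUFFIXES)


def recycle_segment(xlog_dir, old_name, new_name):
    """Rename a segment to a future name and mark it at recycle time.

    Assumption: the marker is a .done file; the thread leaves this open.
    """
    os.rename(os.path.join(xlog_dir, old_name),
              os.path.join(xlog_dir, new_name))
    for suffix in STATUS_SUFFIXES:
        old_marker = status_path(xlog_dir, old_name, suffix)
        if os.path.exists(old_marker):
            os.remove(old_marker)
    open(status_path(xlog_dir, new_name, ".done"), "w").close()


def begin_populating(xlog_dir, segment):
    """Drop the recycle-time marker once real WAL starts going into the file."""
    marker = status_path(xlog_dir, segment, ".done")
    if os.path.exists(marker):
        os.remove(marker)


def checkpoint_mark_ready(xlog_dir):
    """Create .ready for every segment that has no status file at all.

    This mirrors the checkpoint behavior described in the quoted text.
    Returns the names of the segments that were newly marked.
    """
    marked = []
    for name in sorted(os.listdir(xlog_dir)):
        if not os.path.isfile(os.path.join(xlog_dir, name)):
            continue  # skip the archive_status directory itself
        if not has_status(xlog_dir, name):
            open(status_path(xlog_dir, name, ".ready"), "w").close()
            marked.append(name)
    return marked


def demo():
    with tempfile.TemporaryDirectory() as xlog_dir:
        os.mkdir(os.path.join(xlog_dir, "archive_status"))
        # A segment recycled to a future name while running as a standby.
        open(os.path.join(xlog_dir, "000000010000000000000005"), "w").close()
        recycle_segment(xlog_dir, "000000010000000000000005",
                        "000000010000000000000009")
        # A segment an administrator copied in by hand: no status file.
        open(os.path.join(xlog_dir, "00000001000000000000000A"), "w").close()
        return checkpoint_mark_ready(xlog_dir)
```

In this model, demo() marks only the copied-in segment 00000001000000000000000A as .ready; the recycled segment keeps its marker until begin_populating() is called for it, so it is never queued for archiving while it is still an empty placeholder.
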
> WAL file has neither .ready nor .done archive status file. After
> failover, checkpoint checks the archive status file of the restored
> WAL file to attempt to recycle it, finds that it has neither .ready
> nor .done, and creates .ready. Because of existence of .ready,
> it will be archived again even though it obviously already exists in
> the archival storage :(
>
> To prevent a restored WAL file from being archived again, I think
> that .done should be created whenever WAL file is successfully
> restored (of course this should happen only when archive_mode is
> enabled). Thought?

Your proposed fix makes sense, and I cannot think of any disadvantage.
Concerning only doing it when archive_mode=on, would there ever be a case where
a segment is restored under archive_mode=off, then the server restarted with
archive_mode=on and an archival attempted on that segment?

> (3) WAL files which were streamed from the master
>
> These WAL files also don't have any archive status, so checkpoint
> creates .ready for them after failover. And then, all or many of
> them will be archived at a time, which would cause I/O spike on
> both WAL and archival storage.
>
> To avoid this problem, I think that we should change walreceiver
> so that it creates .ready as soon as it completes the WAL file. Also
> we should change the archiver process so that it starts up even in
> standby mode and archives the WAL files.
>
> If each server has its own archival storage, the above solution would
> work fine. But if all servers share the archival storage, multiple archiver
> processes in those servers might archive the same WAL file to
> the shared area at the same time. Is this OK? If not, to avoid this,
> we might need to separate archive_mode into two: one for normal mode
> (i.e., master), another for standby mode.
> If the archive is shared,
> we can ensure that only one archiver in the master copies the WAL file
> at the same time by disabling WAL archiving in standby mode but
> enabling it in normal mode. Thought?

I don't think we should remove the recommendation to make archive_command fail
when the archive already has the file. However, the new master is likely to
have at least one segment not appearing in the archive along with some
already-archived segments. There's certainly a use case for completing the
shared archive with local-only segments. I think this also ties into the
prerequisites for letting former peers of the new master begin to follow the
new master without fresh base backups. More thought is needed here.

Thanks,
nm
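
The overwrite concern discussed in this message can be illustrated with a small Python sketch. Here archive_segment() is a hypothetical stand-in for an archive_command that follows the documented recommendation: it refuses to overwrite an existing archive file and reports a nonzero status instead. The directory layout and names are assumptions made for the example, and a real archive_command is a shell command run by the archiver, not a Python function.

```python
import os
import shutil
import tempfile


def archive_segment(segment_path, archive_dir):
    """Copy a WAL segment into archive_dir without ever overwriting.

    Returns 0 on success and 1 if a file of that name already exists,
    mimicking the exit status an archive_command would report.
    """
    dest = os.path.join(archive_dir, os.path.basename(segment_path))
    try:
        # O_EXCL makes the create fail if dest exists, so the existence
        # check and the file creation are a single step.
        fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return 1
    with os.fdopen(fd, "wb") as out, open(segment_path, "rb") as src:
        shutil.copyfileobj(src, out)
    return 0


def demo_shared_archive():
    """Old and new master both try to archive a segment of the same name."""
    with tempfile.TemporaryDirectory() as root:
        archive = os.path.join(root, "archive")
        old_master = os.path.join(root, "old_master")
        new_master = os.path.join(root, "new_master")
        for d in (archive, old_master, new_master):
            os.mkdir(d)

        name = "000000010000000000000007"
        with open(os.path.join(old_master, name), "wb") as f:
            f.write(b"old master WAL")
        with open(os.path.join(new_master, name), "wb") as f:
            f.write(b"new master WAL")

        first = archive_segment(os.path.join(old_master, name), archive)
        second = archive_segment(os.path.join(new_master, name), archive)
        with open(os.path.join(archive, name), "rb") as f:
            kept = f.read()
        return first, second, kept
```

In this model the second attempt returns 1 and the archive keeps the old master's copy, which corresponds to the situation where archiving on the new master would stall until an operator steps in.
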