Re: Unnecessary WAL archiving after failover - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Unnecessary WAL archiving after failover |
Date | |
Msg-id | CAHGQGwGgPO6GUNQ99yry8ykpg6ZHfz1W4OMdvcPWf2vxVcZVtQ@mail.gmail.com |
In response to | Re: Unnecessary WAL archiving after failover (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Unnecessary WAL archiving after failover
Re: Unnecessary WAL archiving after failover Re: Unnecessary WAL archiving after failover |
List | pgsql-hackers |
On Thu, Mar 22, 2012 at 12:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Feb 29, 2012 at 5:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Hi,
>>
>> In streaming replication, after failover, new master might have lots
>> of un-applied WAL files with old timeline ID. They are the WAL files
>> which were recycled as a future ones when the server was running as a
>> standby. Since they will never be used later, they don't need to be
>> archived after failover. But since they have neither .ready nor .done
>> file in archive_status, checkpoints after failover newly create .ready
>> files for them, and then finally they are archived. Which might cause
>> disk I/O spike both in WAL and archive storage.
>>
>> To avoid the above problem, I think that un-applied WAL files with old
>> timeline ID should be marked as already-archived and recycled
>> immediately at the end of recovery. Thought?
>
> I'm not an expert on this, but that makes sense to me.

Thanks for agreeing with my idea.

On second thought, I found other issues with WAL archiving after failover,
so let me clarify the issues again.

Just after failover, there can be three kinds of WAL files in the new
master's pg_xlog directory:

(1) WAL files which were recycled by a restartpoint

I've already explained upthread the issue that these WAL files cause after
failover.

(2) WAL files which were restored from the archive

In 9.1 or before, the restored WAL files don't remain after failover
because they are always restored onto the temporary filename
"RECOVERYXLOG". So the issue I'm about to explain doesn't exist in 9.1 or
before.

In 9.2dev, as a result of supporting cascading replication, an archived WAL
file is restored onto its correct file name so that a cascading walsender
can send it to another standby. This restored WAL file has neither a .ready
nor a .done archive status file. After failover, a checkpoint checks the
archive status of the restored WAL file when attempting to recycle it,
finds that it has neither .ready nor .done, and creates .ready. Because
.ready exists, the file will be archived again even though it obviously
already exists in the archival storage :(

To prevent a restored WAL file from being archived again, I think that
.done should be created whenever a WAL file is successfully restored (of
course this should happen only when archive_mode is enabled). Thoughts?

Since this is an oversight in cascading replication, I'm planning to
implement the patch for 9.2dev.

(3) WAL files which were streamed from the master

These WAL files also don't have any archive status file, so a checkpoint
creates .ready for them after failover. Then all or many of them will be
archived at once, which would cause an I/O spike on both WAL and archival
storage.

To avoid this problem, I think that we should change walreceiver so that it
creates .ready as soon as it completes a WAL file. We should also change
the archiver process so that it starts up even in standby mode and archives
the WAL files.

If each server has its own archival storage, the above solution would work
fine. But if all servers share the archival storage, the archiver processes
in those servers might archive the same WAL file to the shared area at the
same time. Is this OK? If not, to avoid this, we might need to separate
archive_mode into two settings: one for normal mode (i.e., master) and
another for standby mode. If the archive is shared, we can ensure that only
the archiver in the master copies a given WAL file by disabling WAL
archiving in standby mode and enabling it in normal mode. Thoughts?

Invoking the archiver process in standby mode is a new feature, not a bug
fix. It's too late to propose a new feature for 9.2, so I'll propose this
for 9.3.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
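
For illustration only, the archive status handling described in (2) and (3)
can be modelled with a few lines of Python. This is a simplified sketch, not
PostgreSQL code: the function names (mark_restored, checkpoint_scan), the
segment names, and the use of a temporary directory in place of
pg_xlog/archive_status are all assumptions made for the example. It shows
that a segment with neither marker gets .ready at checkpoint time, while a
restored segment that was given .done, as proposed above, is left alone.

```python
import tempfile
from pathlib import Path


def status_path(status_dir: Path, segment: str, suffix: str) -> Path:
    """Path of the archive status marker for a WAL segment."""
    return status_dir / f"{segment}{suffix}"


def mark_restored(status_dir: Path, segment: str, archive_mode: bool = True) -> None:
    """Hypothetical model of the proposal in (2): create .done for a
    segment restored from the archive, only when archive_mode is enabled."""
    if archive_mode:
        status_path(status_dir, segment, ".done").touch()


def checkpoint_scan(status_dir: Path, segments: list[str]) -> list[str]:
    """Simplified model of the checkpoint behaviour described in the mail:
    a segment with neither .ready nor .done gets a new .ready marker.
    Returns the segments that were newly marked .ready."""
    newly_ready = []
    for seg in segments:
        ready = status_path(status_dir, seg, ".ready")
        done = status_path(status_dir, seg, ".done")
        if not ready.exists() and not done.exists():
            ready.touch()
            newly_ready.append(seg)
    return newly_ready


def demo() -> list[str]:
    # Hypothetical segment names; a temp dir stands in for archive_status.
    with tempfile.TemporaryDirectory() as tmp:
        status_dir = Path(tmp)
        restored = "000000010000000000000010"   # restored from the archive
        streamed = "000000010000000000000011"   # streamed, no status file
        mark_restored(status_dir, restored)
        return checkpoint_scan(status_dir, [restored, streamed])


if __name__ == "__main__":
    print(demo())
```

In this model only the streamed segment is queued for archiving after
failover, which is the case that (3) proposes to handle by having
walreceiver create .ready as each file is completed.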