Re: pg_basebackup -x/X doesn't play well with archive_mode & wal_keep_segments - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: pg_basebackup -x/X doesn't play well with archive_mode & wal_keep_segments
Date
Msg-id CAHGQGwFNBywzAf1CxQmWyAL2ap-9WxK76XqtX+qHhpBPNJON_w@mail.gmail.com
In response to pg_basebackup -x/X doesn't play well with archive_mode & wal_keep_segments  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: pg_basebackup -x/X doesn't play well with archive_mode & wal_keep_segments  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Fri, Dec 5, 2014 at 9:28 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi,
>
> We've recently observed a case where, after a promotion, a postgres
> server suddenly started to archive a large amount of old WAL.
>
> After some digging the problem is this:
>
> pg_basebackup -X creates files in pg_xlog/ without creating the
> corresponding .done file. Note that walreceiver *does* create them. The
> standby in this case, just like the master, had a significant
> wal_keep_segments.  RemoveOldXlogFiles() then, during recovery restart
> points, calls XLogArchiveCheckDone() which in turn does:
>         /* Retry creation of the .ready file */
>         XLogArchiveNotify(xlog);
>         return false;
> if there's neither a .done nor a .ready file present and archive_mode is
> enabled. These segments then aren't removed because there's a .ready
> present and they're never archived as long as the node is a standby
> because we don't do archiving on standbys.
> Once the node is promoted archiver will be started and suddenly archive
> all these files - which might be months old.
>
> An additional, at first strange, nice detail is that a lot of the
> .ready files had nearly the same timestamps. Turns out that's due to
> wal_keep_segments. Initially RemoveOldXlogFiles() doesn't process the
> files because they're newer than allowed due to wal_keep_segments. Then
> every checkpoint a couple segments would be old enough to reach
> XLogArchiveCheckDone() which then'd create the .ready marker... But not
> all at once :)
>
>
> So I think we just need to make pg_basebackup create the .ready
> files.

s/.ready/.done/? If so, +1.
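
To make the failure mode concrete, the check described above can be
modelled roughly as follows. This is only a simplified sketch, not the
actual xlog code: SegmentStatus and segment_removable() are hypothetical
names, and the marker files in archive_status are reduced to two flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one segment's archive status markers. */
typedef struct SegmentStatus
{
    bool has_done;   /* <segment>.done exists */
    bool has_ready;  /* <segment>.ready exists */
} SegmentStatus;

/*
 * Simplified model of the check made before removing an old segment.
 * Returns true if the segment may be removed. If archiving is enabled
 * and neither marker exists, it creates the .ready marker and refuses
 * removal, which is the path a segment written by pg_basebackup -X
 * ends up on.
 */
static bool
segment_removable(SegmentStatus *seg, bool archive_mode)
{
    assert(seg != NULL);

    if (!archive_mode)
        return true;            /* nothing to wait for */
    if (seg->has_done)
        return true;            /* already archived, or forced done */
    if (seg->has_ready)
        return false;           /* still waiting for the archiver */

    seg->has_ready = true;      /* retry creation of the .ready file */
    return false;
}
```

In this model, a segment that starts with has_done set is removable on
the first check, while one with no marker at all picks up a .ready
marker and stays around until an archiver processes it, which on a
standby only happens after promotion.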

> Given that the walreceiver and restore_command already
> unconditionally do XLogArchiveForceDone() I think we'd follow the
> established precedent. Arguably it could make sense to archive files
> again on the standby after a promotion as they aren't guaranteed to have
> been on the then primary. But we don't have any infrastructure anyway
> for that and walsender doesn't do so, so it doesn't seem to make any
> sense to do that for pg_basebackup.
>
> Independently of this bug, there's also some debatable behaviour about
> what happens if a node with a high wal_keep_segments turns on
> archive_mode. Suddenly all those old files are archived... I think it
> might be a good idea to simply always create .done files when a WAL
> segment is finished while archive_mode is disabled.

+1
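
As a rough illustration of that second proposal, the choice of marker
for a finished segment could be modelled like this. The names
MarkerKind and marker_for_finished_segment() are hypothetical, and the
done_when_disabled flag exists only to contrast the current behaviour
with the proposed one; it is not an actual setting.

```c
#include <stdbool.h>

/* Hypothetical: which status marker a finished segment gets. */
typedef enum MarkerKind
{
    MARKER_NONE,    /* no marker file is created */
    MARKER_READY,   /* <segment>.ready: waiting to be archived */
    MARKER_DONE     /* <segment>.done: nothing left to archive */
} MarkerKind;

/*
 * Sketch of the decision made when a WAL segment is finished.
 * done_when_disabled = false models the behaviour discussed in this
 * thread (no marker while archive_mode is off); true models the
 * proposal to always create .done in that case.
 */
static MarkerKind
marker_for_finished_segment(bool archive_mode, bool done_when_disabled)
{
    if (archive_mode)
        return MARKER_READY;
    return done_when_disabled ? MARKER_DONE : MARKER_NONE;
}
```

With the proposal in place, segments finished while archive_mode is off
already carry a .done marker, so turning archive_mode on later would
not cause them to be archived.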

Regards,

-- 
Fujii Masao


