Re: pg_rewind WAL segments deletion pitfall - Mailing list pgsql-hackers

From Polina Bungina
Subject Re: pg_rewind WAL segments deletion pitfall
Date
Msg-id CAAtGL4A1UN7gxWjsmm-PxT0qmfQc8WQRT+Ag-YBUO-98pN9x4Q@mail.gmail.com
Whole thread Raw
In response to Re: pg_rewind WAL segments deletion pitfall  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: pg_rewind WAL segments deletion pitfall
Re: pg_rewind WAL segments deletion pitfall
List pgsql-hackers

Hello Kayotaro,


Here is the new version of the patch that includes the changes you suggested. It is smaller now but I doubt if it is as easy to understand as it used to be.


The need of manipulations with the target’s pg_wal/archive_status directory is a question to discuss…

At first glance it seems to be useless for .ready files: checkpointer process will anyway recreate them if archiving is enabled on the rewound old primary and we will finally have them in the archive. As for the .done files, it seems reasonable to follow the pg_basebackup logic and keep .done files together with the corresponding segments (those between the last common checkpoint and the point of divergence) to protect them from being archived once again.

But on the other hand it seems to be not that straightforward: imaging we have WAL segment X on the target along with X.done file and we decide to preserve them both (or we download it from archive and force .done file creation), while archive_mode was set to ‘always’ and the source (promoted replica) also still has WAL segment X and X.ready file. After pg_rewind we will end up with both X.ready and X.done, which seems to be not a good situation (but most likely not critical either).


Regards,

Polina Bungina



On Wed, Aug 31, 2022 at 7:30 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
At Tue, 30 Aug 2022 11:01:58 +0200, Alexander Kukushkin <cyberdemn@gmail.com> wrote in
> On Tue, 30 Aug 2022 at 10:27, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> wrote:
>
> >
> > Hmm. Doesn't it work to ignoring tli then? All segments that their
> > segment number is equal to or larger than the checkpoint locaiton are
> > preserved regardless of TLI?
> >
>
> If we ignore TLI there is a chance that we may retain some unnecessary (or
> just wrong) files.

Right. I mean I don't think thats a problem and we can rely on
postgres itself for later cleanup. Theoretically some out-of-range tli
or segno files are left alone but they surely will be gone soon after
the server starts.

> > > Also, we need to take into account the divergency LSN. Files after it are
> > > not required.
> >
> > They are removed at the later checkpoints. But also we can remove
> > segments that are out of the range between the last common checkpoint
> > and divergence point ignoring TLI.
>
>
> Everything that is newer last_common_checkpoint_seg could be removed (but
> it already happens automatically, because these files are missing on the
> new primary).
> WAL files that are older than last_common_checkpoint_seg could be either
> removed or at least not copied from the new primary.
..
> The current implementation relies on tracking WAL files being open while
> searching for the last common checkpoint. It automatically starts from the
> divergence_seg, automatically finishes at last_common_checkpoint_seg, and
> last but not least, automatically handles timeline changes. I don't think
> that manually written code that decides what to do from the WAL file name
> (and also takes into account TLI) could be much simpler than the current
> approach.

Yeah, I know.  My expectation is taking the simplest way for the same
effect. My concern was the additional hash. On second thought, I
concluded that we should that on the existing filehash.

We can just add a FILE_ACTION_NONE entry to the file hash from
SimpleXLogPageRead.  Since this happens before decide_file_action()
call, decide_file_action() should ignore the entries with
FILE_ACTION_NONE. Also we need to call filehash_init() earlier.

> Actually, since we start doing some additional "manipulations" with files
> in pg_wal, we probably should do a symmetric action with files inside
> pg_wal/archive_status

In that sense, pg_rewind rather should place missing
archive_status/*.done for segments including restored ones seen while
finding checkpoint.  This is analogous of the behavior with
pg_basebackup and pg_receivewal. Also we should add FILE_ACTION_NONE
entries for .done files for segments read while finding checkpoint.

What do you think about that?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment

pgsql-hackers by date:

Previous
From: "houzj.fnst@fujitsu.com"
Date:
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Next
From: Ranier Vilela
Date:
Subject: Re: Avoid overhead open-close indexes (catalog updates)