Re: pg_rewind WAL segments deletion pitfall - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: pg_rewind WAL segments deletion pitfall
Date
Msg-id 20220831.143031.834123808244621694.horikyota.ntt@gmail.com
In response to Re: pg_rewind WAL segments deletion pitfall  (Alexander Kukushkin <cyberdemn@gmail.com>)
Responses Re: pg_rewind WAL segments deletion pitfall
Re: pg_rewind WAL segments deletion pitfall
List pgsql-hackers
At Tue, 30 Aug 2022 11:01:58 +0200, Alexander Kukushkin <cyberdemn@gmail.com> wrote in 
> On Tue, 30 Aug 2022 at 10:27, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> wrote:
> 
> >
> > Hmm. Doesn't it work to ignore TLI then? All segments whose
> > segment number is equal to or larger than the checkpoint location are
> > preserved regardless of TLI?
> >
> 
> If we ignore TLI there is a chance that we may retain some unnecessary (or
> just wrong) files.

Right. I mean I don't think that's a problem, and we can rely on
postgres itself for later cleanup. Theoretically some files with an
out-of-range TLI or segno are left behind, but they will surely be
gone soon after the server starts.
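
To make the "ignore TLI" rule concrete, here is a rough C sketch. The
helper names (wal_name_to_segno, keep_segment_ignoring_tli) are
hypothetical and not part of pg_rewind. The file name layout (8 hex
digits each for TLI, log and seg) follows the convention parsed by
XLogFromFileName, and a 16MB segment size is assumed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: default 16MB WAL segment size. */
#define WAL_SEGMENT_SIZE (16 * 1024 * 1024)

/*
 * Hypothetical helper: split a 24-character WAL file name into its
 * timeline ID and segment number.  Returns false for anything that is
 * not a plain segment name (e.g. "00000002.history").
 */
static bool
wal_name_to_segno(const char *fname, uint32_t *tli, uint64_t *segno)
{
    unsigned int t, log, seg;

    if (strlen(fname) != 24 ||
        strspn(fname, "0123456789ABCDEF") != 24)
        return false;
    if (sscanf(fname, "%08X%08X%08X", &t, &log, &seg) != 3)
        return false;

    *tli = t;
    /* Number of segments per 4GB "xlogid" is 0x100000000 / segment size. */
    *segno = (uint64_t) log * (UINT64_C(0x100000000) / WAL_SEGMENT_SIZE) + seg;
    return true;
}

/*
 * Hypothetical rule: keep every segment at or after the checkpoint's
 * segment, whatever its timeline.  The TLI is parsed but not consulted.
 */
static bool
keep_segment_ignoring_tli(const char *fname, uint64_t checkpoint_segno)
{
    uint32_t tli;
    uint64_t segno;

    if (!wal_name_to_segno(fname, &tli, &segno))
        return false;           /* not a segment file; rule does not apply */
    (void) tli;
    return segno >= checkpoint_segno;
}
```

With a checkpoint in segment 5, both 000000010000000000000007 and
000000020000000000000005 would be kept under this rule, while
000000010000000000000004 would not.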

> > > Also, we need to take into account the divergency LSN. Files after it are
> > > not required.
> >
> > They are removed at the later checkpoints. But also we can remove
> > segments that are out of the range between the last common checkpoint
> > and divergence point ignoring TLI.
> 
> 
> Everything that is newer than last_common_checkpoint_seg could be removed (but
> it already happens automatically, because these files are missing on the
> new primary).
> WAL files that are older than last_common_checkpoint_seg could be either
> removed or at least not copied from the new primary.
..
> The current implementation relies on tracking WAL files being open while
> searching for the last common checkpoint. It automatically starts from the
> divergence_seg, automatically finishes at last_common_checkpoint_seg, and
> last but not least, automatically handles timeline changes. I don't think
> that manually written code that decides what to do from the WAL file name
> (and also takes into account TLI) could be much simpler than the current
> approach.

Yeah, I know.  My expectation is to take the simplest way to get the
same effect. My concern was the additional hash. On second thought, I
concluded that we should do that on the existing filehash.

We can just add a FILE_ACTION_NONE entry to the file hash from
SimpleXLogPageRead.  Since this happens before the
decide_file_action() call, decide_file_action() should ignore entries
that already have FILE_ACTION_NONE. Also we need to call
filehash_init() earlier.
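
A toy model of that ordering, only to illustrate the idea. This is not
pg_rewind code: the toy_* names are hypothetical, a small fixed array
stands in for the real file hash, and the decision step is reduced to a
single stand-in outcome.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the real file actions. */
typedef enum
{
    TOY_ACTION_UNDECIDED,
    TOY_ACTION_NONE,
    TOY_ACTION_REMOVE
} toy_action;

typedef struct
{
    const char *path;           /* caller keeps the string alive */
    toy_action  action;
} toy_entry;

#define TOY_MAX_ENTRIES 64
static toy_entry toy_table[TOY_MAX_ENTRIES];
static size_t toy_count = 0;

/* Linear search standing in for a hash lookup-or-insert. */
static toy_entry *
toy_lookup_or_insert(const char *path)
{
    for (size_t i = 0; i < toy_count; i++)
        if (strcmp(toy_table[i].path, path) == 0)
            return &toy_table[i];
    if (toy_count == TOY_MAX_ENTRIES)
        return NULL;
    toy_table[toy_count].path = path;
    toy_table[toy_count].action = TOY_ACTION_UNDECIDED;
    return &toy_table[toy_count++];
}

/* What the page-read callback would do for each segment it opens. */
static void
toy_mark_segment_read(const char *path)
{
    toy_entry *e = toy_lookup_or_insert(path);

    if (e != NULL)
        e->action = TOY_ACTION_NONE;
}

/* The later decision pass leaves pre-decided entries untouched. */
static void
toy_decide_actions(void)
{
    for (size_t i = 0; i < toy_count; i++)
    {
        if (toy_table[i].action != TOY_ACTION_UNDECIDED)
            continue;
        /* Stand-in outcome for a file that exists only on the target. */
        toy_table[i].action = TOY_ACTION_REMOVE;
    }
}
```

The point is only the ordering: the table must exist before the WAL
scan, and the decision pass must not overwrite what the scan recorded.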

> Actually, since we start doing some additional "manipulations" with files
> in pg_wal, we probably should do a symmetric action with files inside
> pg_wal/archive_status

In that sense, pg_rewind should rather place the missing
archive_status/*.done files for the segments, including restored
ones, seen while finding the checkpoint.  This is analogous to the
behavior of pg_basebackup and pg_receivewal. Also we should add
FILE_ACTION_NONE entries for the .done files of segments read while
finding the checkpoint.
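
For reference, the marker path for a given segment could be built as in
the sketch below. The helper name done_marker_path is hypothetical; only
the pg_wal/archive_status/<segment>.done layout is taken from the
discussion above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the archive_status ".done" marker path
 * for a WAL segment name.  Returns false if the buffer is too small.
 */
static bool
done_marker_path(const char *segname, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "pg_wal/archive_status/%s.done", segname);

    return n >= 0 && (size_t) n < buflen;
}
```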

What do you think about that?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


