Re: pg_rewind WAL segments deletion pitfall - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: pg_rewind WAL segments deletion pitfall
Msg-id 20220927.165054.2142431385277288474.horikyota.ntt@gmail.com
In response to Re: pg_rewind WAL segments deletion pitfall  (Polina Bungina <bungina@gmail.com>)
Responses Re: pg_rewind WAL segments deletion pitfall
List pgsql-hackers
At Thu, 1 Sep 2022 13:33:09 +0200, Polina Bungina <bungina@gmail.com> wrote in 
> Here is the new version of the patch that includes the changes you
> suggested. It is smaller now but I doubt if it is as easy to understand as
> it used to be.

pg_rewind works in two steps. First, it constructs a file map that
decides the action for each file; second, it performs file
operations according to that file map. So, if we are going to do
something to some files, I think that action should be recorded in
the file map.
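To illustrate the two-step shape described above, here is a minimal
sketch. All names (plan_entry, plan_decide_all, plan_execute) are
hypothetical and much simpler than the real filemap.c definitions; the
decision rule is a placeholder, and the second step only counts the
operations instead of touching the disk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real pg_rewind types. */
typedef enum
{
    PLAN_UNDECIDED = 0,
    PLAN_NONE,                  /* leave the target file alone */
    PLAN_COPY,                  /* copy the file from the source */
    PLAN_REMOVE                 /* delete the file from the target */
} plan_action;

typedef struct
{
    const char *path;           /* path relative to the data directory */
    int         on_source;      /* nonzero if present on the source */
    int         on_target;      /* nonzero if present on the target */
    plan_action action;
} plan_entry;

/* Step 1: record one action per entry. No file operations happen here. */
static void
plan_decide_all(plan_entry *entries, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        plan_entry *e = &entries[i];

        if (e->on_source && !e->on_target)
            e->action = PLAN_COPY;
        else if (!e->on_source && e->on_target)
            e->action = PLAN_REMOVE;
        else
            e->action = PLAN_NONE;  /* placeholder; real logic is more involved */
    }
}

/*
 * Step 2: act only on what step 1 recorded. Returns the number of
 * entries that would need a file operation.
 */
static size_t
plan_execute(const plan_entry *entries, size_t n)
{
    size_t      performed = 0;

    for (size_t i = 0; i < n; i++)
    {
        assert(entries[i].action != PLAN_UNDECIDED);
        if (entries[i].action == PLAN_COPY ||
            entries[i].action == PLAN_REMOVE)
            performed++;        /* a real tool would copy or unlink here */
    }
    return performed;
}
```

The point of the split is that anything the second step does is visible
in the map beforehand, so a "keep this WAL segment" decision belongs in
the map as well.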

Regarding the patch, pg_rewind starts reading segments from the
divergence point back to the nearest checkpoint, then moves forward
during rewinding. So, the fact that SimpleXLogPageRead has read a
segment suggests that the segment is required during the next startup.
So I don't think we need to move around the keepWalSeg flag.  All
files that are wanted while rewinding should be preserved
unconditionally.

It's annoying that the file paths for the file map and open(2) have
different top directories. But sharing the same path string between the
two seems rather ugly..

I feel uncomfortable directly touching the internals of file_entry_t
outside filemap.c. I'd like to hide the internals in filemap.c, but
pg_rewind already does that..
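One way the points above could fit together, as a rough sketch only:
the WAL reader hands the bare segment file name to a small function
owned by the file map module, which builds the data-directory-relative
path itself and records that the segment must be kept. The names
(kw_add, kw_contains) and the fixed-size storage are assumptions made
for illustration, not the patch's or pg_rewind's actual API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KW_MAX_ENTRIES 64
#define KW_PATH_LEN    128

/* Private to the (hypothetical) file map module. */
static char kw_paths[KW_MAX_ENTRIES][KW_PATH_LEN];
static int  kw_count = 0;

/* Returns 1 if the relative path is already recorded, 0 otherwise. */
static int
kw_contains(const char *path)
{
    for (int i = 0; i < kw_count; i++)
    {
        if (strcmp(kw_paths[i], path) == 0)
            return 1;
    }
    return 0;
}

/*
 * Called each time the reader opens a segment. The caller passes only
 * the segment file name; the "pg_wal/" prefix used by the file map is
 * added here, so the caller never sees the map's internals.
 */
static void
kw_add(const char *segname)
{
    char        path[KW_PATH_LEN];
    int         len;

    len = snprintf(path, sizeof(path), "pg_wal/%s", segname);
    assert(len > 0 && (size_t) len < sizeof(path));

    if (kw_contains(path))
        return;                 /* a segment is read page by page, so repeats are expected */

    assert(kw_count < KW_MAX_ENTRIES);
    strcpy(kw_paths[kw_count++], path);
}
```

The later decision step would then consult kw_contains() for each
target file and leave matching segments in place, whatever else it
would have decided for them.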

+        /*
+         * Some entries (WAL segments) already have an action assigned
+         * (see SimpleXLogPageRead()).
+         */
+        if (entry->action == FILE_ACTION_NONE)
+            continue;
         entry->action = decide_file_action(entry);

It might be more reasonable to call decide_file_action() only when the
action is still undecided (FILE_ACTION_UNDECIDED).
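A minimal sketch of that alternative, using hypothetical names rather
than the real filemap.c definitions: entries whose action was already
set earlier (for example while reading WAL) are skipped whatever that
action is, and only undecided entries go through the decision function.
The decision function here is a placeholder that always answers
"remove", purely so the effect of the guard is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real pg_rewind types. */
typedef enum
{
    GUARD_UNDECIDED = 0,
    GUARD_NONE,
    GUARD_COPY,
    GUARD_REMOVE
} guard_action;

typedef struct
{
    const char *path;
    guard_action action;
} guard_entry;

/* Placeholder policy, standing in for the real decision logic. */
static guard_action
guard_decide(const guard_entry *entry)
{
    (void) entry;
    return GUARD_REMOVE;
}

static void
guard_decide_all(guard_entry *entries, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        /* Respect any action assigned before this pass. */
        if (entries[i].action != GUARD_UNDECIDED)
            continue;
        entries[i].action = guard_decide(&entries[i]);
    }
}
```

Testing for "still undecided" rather than for one specific preassigned
value keeps the loop correct if other code paths later preassign
different actions.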

> The need of manipulations with the target’s pg_wal/archive_status directory
> is a question to discuss…
>
> At first glance it seems to be useless for .ready files: checkpointer
> process will anyway recreate them if archiving is enabled on the rewound
> old primary and we will finally have them in the archive. As for the .done
> files, it seems reasonable to follow the pg_basebackup logic and keep .done
> files together with the corresponding segments (those between the last
> common checkpoint and the point of divergence) to protect them from being
> archived once again.
> 
> But on the other hand it seems to be not that straightforward: imagine we
> have WAL segment X on the target along with X.done file and we decide to
> preserve them both (or we download it from archive and force .done file
> creation), while archive_mode was set to ‘always’ and the source (promoted
> replica) also still has WAL segment X and X.ready file. After pg_rewind we
> will end up with both X.ready and X.done, which seems to be not a good
> situation (but most likely not critical either).

Thanks for the thought. Yes, it's not so straightforward. And, as you
mentioned, the worst result of not doing that is that some
already-archived segments are archived again at the next run, which is
generally harmless. So I think we're ok to ignore that in this patch and
create another patch if we still want to do that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


