Re: pg_rewind WAL segments deletion pitfall - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: pg_rewind WAL segments deletion pitfall
Date
Msg-id 20220830.163945.294488629720711896.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: pg_rewind WAL segments deletion pitfall  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
At Tue, 30 Aug 2022 14:50:26 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> IFAIS pg_rewind doesn't. -c option contrarily restores the all
> segments after the last (common) checkpoint and all of them are left
> alone after pg_rewind finishes.  postgres itself removes the WAL files
> after recovery. After-promotion cleanup and checkpoint revmoes the
> files on the previous timeline.
> 
> Before pg_rewind runs in the repro below, the old primary has the
> following segments.
> 
> TLI1:  2 8 9 A B C D
> 
> Just after pg_rewind finishes, the old primary has the following
> segments.
> 
> TLI1:  2 3   5 6 7
> TLI2:      4        (and 00000002.history)
> 
> pg_rewind copied 1-2 to 1-3 and 2-4 and history file from the new
1> primary, 1-4 to 1-7 from archive. After rewind finished, 1-4,1-8 to
> 1-D have been removed since the new primary didn't have them.
> 
> Recovery starts from 1-3 and promotes at 0/4_000000. postgres removes
> 1-5 to 1-7 by post-promotion cleanup and removes 1-2 to 1-4 by a
> restartpoint. All of the segments are useless after the old primary
> promotes.
> 
> When the old primary starts, it uses 1-3 and 2-4 for recovery and
> fails to fetch 2-5 from the new primary.  But it is not an issue of
> pg_rewind at all.

Ah. I think I understand what you are mentioning. If the new primary
didn't have the segment 1-3 to 1-6, pg_rewind removes it.  The new
primary doesn't have it in pg_wal nor in archive. The old primary has
it in its archive.  So get out from the situation, we need to the
following *two* things before the old primary can start:

1. copy 1-3 to 1-6 from the archive of the *old* primary
2. copy 2-7 and later from the archive of the *new* primary

Since pg_rewind have copied in to the old primary's pg_wal, removing them just have users to perform the task
duplicatedly,as you stated.
 

Okay, I completely understand the problem and convinced that it is
worth changing the behavior.

However, the proposed patch looks too complex to me.  It can be done
by just comparing xlog file name and the last checkpoint location and
TLI in decide_file_actions().

regards.

=====
# killall -9 postgres
# rm -r oldprim newprim oldarch newarch oldprim.log newprim.log
mkdir newarch oldarch
initdb -k -D oldprim
echo "archive_mode = 'on'">> oldprim/postgresql.conf
echo "archive_command = 'echo "archive %f" >&2; cp %p `pwd`/oldarch/%f'">> oldprim/postgresql.conf
pg_ctl -D oldprim -o '-p 5432' -l oldprim.log start
psql -p 5432 -c 'create table t(a int)'
pg_basebackup -D newprim -p 5432
echo "primary_conninfo='host=/tmp port=5432'">> newprim/postgresql.conf
echo "archive_command = 'echo "archive %f" >&2; cp %p `pwd`/newarch/%f'">> newprim/postgresql.conf
touch newprim/standby.signal
pg_ctl -D newprim -o '-p 5433' -l newprim.log start

# the last common checkpoint
psql -p 5432 -c 'checkpoint'

# record approx. diverging WAL segment
start_wal=`psql -p 5433 -Atc "select pg_walfile_name(pg_last_wal_replay_lsn() - (select setting from pg_settings where
name= 'wal_segment_size')::int);"`
 
for i in $(seq 1 5); do psql -p 5432 -c 'insert into t values(0); select pg_switch_wal();'; done
psql -p 5432 -c 'checkpoint'
pg_ctl -D newprim promote

# old rprimary loses diverging WAL segment
for i in $(seq 1 4); do psql -p 5432 -c 'insert into t values(0); select pg_switch_wal();'; done
psql -p 5432 -c 'checkpoint;'
psql -p 5433 -c 'checkpoint;'

# old primary cannot archive any more
echo "archive_command = 'false'">> oldprim/postgresql.conf
pg_ctl -D oldprim reload
pg_ctl -D oldprim stop

# rewind the old primary, using its own archive
# pg_rewind -D oldprim --source-server='port=5433' # should fail
echo "restore_command = 'echo "restore %f" >&2; cp `pwd`/oldarch/%f %p'">> oldprim/postgresql.conf
pg_rewind -D oldprim --source-server='port=5433' -c

# advance WAL on the old primary; new primary loses the launching WAL seg
for i in $(seq 1 4); do psql -p 5433 -c 'insert into t values(0); select pg_switch_wal();'; done
psql -p 5433 -c 'checkpoint'
echo "primary_conninfo='host=/tmp port=5433'">> oldprim/postgresql.conf
touch oldprim/standby.signal

#### copy the missing file of the old timeline
## cp oldarch/00000001000000000000000[3456] oldprim/pg_wal
## cp newarch/00000002000000000000000* oldprim/pg_wal

postgres -D oldprim  # fails with "WAL file has been removed"


# The alternative of copying-in
# echo "restore_command = 'echo "restore %f" >&2; cp `pwd`/newarch/%f %p'">> oldprim/postgresql.conf

# copy-in WAL files from new primary's archive to old primary
(cd newarch;
for f in `ls`; do
  if [[ "$f" > "$start_wal" ]]; then echo copy $f; cp $f ../oldprim/pg_wal; fi
done)

postgres -D oldprim
=====

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: postgres_fdw hint messages
Next
From: "wangw.fnst@fujitsu.com"
Date:
Subject: RE: Data is copied twice when specifying both child and parent table in publication