Re: pg_rewind WAL segments deletion pitfall - Mailing list pgsql-hackers

From Alexander Kukushkin
Subject Re: pg_rewind WAL segments deletion pitfall
Msg-id CAFh8B=kyrzXbsuyhM-Fydu6TG3kyu9=AFCyf4tG4cYrfw3897A@mail.gmail.com
In response to Re: pg_rewind WAL segments deletion pitfall  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
Hello Kyotaro,

On Tue, 30 Aug 2022 at 07:50, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

 
> So, if I understand you correctly, the issue you are complaining is
> not about the WAL segments on the old timeline but about those on the
> new timeline, which don't have a business with what pg_rewind does. As
> the same with the case of pg_basebackup, the missing segments need to
> be somehow copied from the new primary since the old primary never had
> the chance to have them before.

No, we are complaining exactly about WAL segments from the old timeline that are removed by pg_rewind.
Those segments haven't been archived by the old primary and the new primary already recycled them.
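
To make "haven't been archived" concrete: PostgreSQL marks segments that are still waiting for archive_command with a `.ready` file under pg_wal/archive_status. The following is only a minimal sketch for inspecting that directory before running pg_rewind; the function name list_unarchived_wal is hypothetical and not part of PostgreSQL, and the listing may also include timeline history files.

```shell
# Hypothetical helper (not a PostgreSQL tool): list entries that are still
# waiting to be archived in the given data directory.
# Assumes the standard layout <datadir>/pg_wal/archive_status/<name>.ready.
list_unarchived_wal() {
  datadir=$1
  for f in "$datadir"/pg_wal/archive_status/*.ready; do
    # If the glob matched nothing, the pattern comes back verbatim; skip it.
    [ -e "$f" ] || continue
    basename "$f" .ready
  done
}

# Example call (prints nothing if there are no pending entries):
#   list_unarchived_wal oldprim
```

Run against the stopped old primary, anything printed here exists only in its pg_wal and not in its archive.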


 

> Thus I don't follow this..

I slightly modified your script so that it reproduces the problem.
 

====
mkdir newarch oldarch
initdb -k -D oldprim
echo "archive_mode = 'on'">> oldprim/postgresql.conf
echo "archive_command = 'echo "archive %f" >&2; cp %p `pwd`/oldarch/%f'">> oldprim/postgresql.conf
pg_ctl -D oldprim -o '-p 5432' -l oldprim.log start
psql -p 5432 -c 'create table t(a int)'
pg_basebackup -D newprim -p 5432
echo "primary_conninfo='host=/tmp port=5432'">> newprim/postgresql.conf
echo "archive_command = 'echo "archive %f" >&2; cp %p `pwd`/newarch/%f'">> newprim/postgresql.conf
touch newprim/standby.signal
pg_ctl -D newprim -o '-p 5433' -l newprim.log start

# the last common checkpoint
psql -p 5432 -c 'checkpoint'

# old primary cannot archive any more
echo "archive_command = 'false'">> oldprim/postgresql.conf
pg_ctl -D oldprim reload
# advance WAL on the old primary; four WAL segments will never make it to the archive
for i in $(seq 1 4); do psql -p 5432 -c 'insert into t values(0); select pg_switch_wal();'; done

# record approx. diverging WAL segment
# (query the standby: pg_last_wal_replay_lsn() returns NULL on a server that is not in recovery)
start_wal=`psql -p 5433 -Atc "select pg_walfile_name(pg_last_wal_replay_lsn() - (select setting from pg_settings where name = 'wal_segment_size')::int);"`
pg_ctl -D newprim promote

# old primary loses diverging WAL segment
for i in $(seq 1 4); do psql -p 5432 -c 'insert into t values(0); select pg_switch_wal();'; done
psql -p 5432 -c 'checkpoint;'
psql -p 5433 -c 'checkpoint;'

pg_ctl -D oldprim stop

# rewind the old primary, using its own archive
# pg_rewind -D oldprim --source-server='port=5433' # should fail
echo "restore_command = 'echo "restore %f" >&2; cp `pwd`/oldarch/%f %p'">> oldprim/postgresql.conf
pg_rewind -D oldprim --source-server='port=5433' -c

# advance WAL on the new primary; new primary loses the launching WAL seg
for i in $(seq 1 4); do psql -p 5433 -c 'insert into t values(0); select pg_switch_wal();'; done
psql -p 5433 -c 'checkpoint'
echo "primary_conninfo='host=/tmp port=5433'">> oldprim/postgresql.conf
touch oldprim/standby.signal

postgres -D oldprim  # fails with "WAL file has been removed"

# The alternative of copying-in
# echo "restore_command = 'echo "restore %f" >&2; cp `pwd`/newarch/%f %p'">> oldprim/postgresql.conf

# copy-in WAL files from new primary's archive to old primary
(cd newarch;
for f in `ls`; do
  if [[ "$f" > "$start_wal" ]]; then echo copy $f; cp $f ../oldprim/pg_wal; fi
done)

postgres -D oldprim  # also fails with "requested WAL segment XXX has already been removed"
====
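
A note on the copy-in loop in the script: the test [[ "$f" > "$start_wal" ]] is a plain string comparison. It gives a usable ordering because WAL segment file names are fixed-width hexadecimal, with the timeline ID in the first 8 characters, so a segment on a later timeline always compares greater than any segment on an earlier one. A small sketch that splits a name into its fields (the helper name wal_name_parts is hypothetical, used only for illustration):

```shell
# Hypothetical helper: split a WAL segment file name into its three fields.
# Assumes the standard 24-character name: 8 hex digits each for the
# timeline ID, the high part of the segment number, and the low part.
wal_name_parts() {
  name=$1
  tli=$(printf '%s' "$name" | cut -c1-8)
  log=$(printf '%s' "$name" | cut -c9-16)
  seg=$(printf '%s' "$name" | cut -c17-24)
  echo "tli=$tli log=$log seg=$seg"
}

# Example call:
wal_name_parts 000000020000000000000005
# prints: tli=00000002 log=00000000 seg=00000005
```

Because the timeline comes first, the loop copies every new-timeline segment from newarch, regardless of its position relative to start_wal.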

Regards,
--
Alexander Kukushkin
