At Wed, 28 Sep 2022 08:50:12 +0000, "Lahnov, Igor" <Igor.Lahnov@nexign.com> wrote in
> Hi,
> After failover all stand by nodes could not start streaming wal recovery.
> Streaming recovery start from 1473/A5000000, but standby start at 1473/A5FFEE08, this seems to be the problem.
It's not a problem at all. It is quite normal for a standby to start
streaming from the beginning of a WAL segment.
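As a rough illustration of that arithmetic (a sketch of my own, not
server code, assuming the default 16MB wal_segment_size; the helper
names are made up for this example), rounding the standby's position
down to a segment boundary gives the streaming start point seen in
the log:

```python
# Assumption: default wal_segment_size of 16MB; adjust if initdb used another value.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert an LSN written as 'hi/lo' (hexadecimal) into a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(value: int) -> str:
    """Convert a 64-bit integer back into the 'hi/lo' hexadecimal form."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def segment_start(lsn: str, segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Round an LSN down to the first byte of the WAL segment containing it."""
    value = parse_lsn(lsn)
    return format_lsn(value - value % segment_size)


print(segment_start("1473/A5FFEE08"))  # prints 1473/A5000000
```

So 1473/A5000000 is simply the start of the segment that contains
1473/A5FFEE08.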
> What can we do in this case to restore?
> Is it possible to shift wal streaming recovery point on primary?
> Can checkpoint on primary help in this situation?
> 2022-09-26 14:08:23.672 [3747868] LOG: started streaming WAL from primary at 1473/A5000000 on timeline 18
> 2022-09-26 14:08:24.363 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0
> 2022-09-26 14:08:24.366 [3747868] FATAL: terminating walreceiver process due to administrator command
This seems to mean someone emptied primary_conninfo.
> 2022-09-26 14:08:24.366 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0
> 2022-09-26 14:08:24.366 [3747796] LOG: invalid record length at 1473/A5FFEE08: wanted 24, got 0
I don't fully understand the situation. The only scenario I can come
up with that leads to this state is that the standby somehow restored
an incomplete WAL segment from the primary. For example, someone
copied the currently active WAL file from pg_wal to the archive on
the primary, or restore_command on the standby fetches WAL files from
pg_wal on the primary instead of from its archive. Neither is a
normal operation.
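If you want to check whether a restored segment looks incomplete,
something like the following might help. It is only a rough sketch
under my own assumptions (16MB segments, run against a copy of the
file, and both function names are hypothetical): it reports whether
the file holds nothing but zero bytes from the failing position
onward. A "got 0" record length means the length field read there
was zero, so a zero-filled tail would be consistent with a segment
copied before it was fully written. It does not prove anything by
itself, since a recycled segment can contain old data instead of
zeros.

```python
# Assumption: default wal_segment_size of 16MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def offset_in_segment(lsn: str, segment_size: int = WAL_SEGMENT_SIZE) -> int:
    """Byte offset of an LSN ('hi/lo' in hex) inside its WAL segment file."""
    hi, lo = lsn.split("/")
    return ((int(hi, 16) << 32) | int(lo, 16)) % segment_size


def zero_filled_from(path: str, offset: int) -> bool:
    """Return True if every byte from offset to the end of the file is zero."""
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            chunk = f.read(1024 * 1024)
            if not chunk:
                return True
            if chunk.count(0) != len(chunk):
                return False
```

For the position in the log, offset_in_segment("1473/A5FFEE08") is
0xFFEE08, which would be the offset to pass together with the path
of the copied segment file.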
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center