At Fri, 23 Feb 2024 04:04:03 +0000, Mark Schloss <Mark.Schloss@austrac.gov.au> wrote in
> <2024-02-23 07:50:05.637 AEDT [1957121]: [1-1] user=,db= > LOG: started streaming WAL from primary at 6/B0000000 on
timeline5
> <2024-02-23 07:50:05.696 AEDT [1957117]: [6-1] user=,db= > LOG: invalid magic number 0000 in log segment
0000000500000006000000B0,offset 0
This appears to suggest that the WAL file that the standby fetched was
zero-filled on the primary side, which cannot happen by a normal
operation. A preallocated WAL segment can be zero-filled but it cannot
be replicated under normal operations.
> <2024-02-22 14:20:23.383 AEDT [565231]: [6-1] user=,db= > FATAL: terminating walreceiver process due to
administratorcommand
This may suggest a config reload with some parameter changes.
One possible scenario matching the log lines could be that someone
switched primary_conninfo to a newly-restored primary. However, if the
new primary had older data than the previously connected primary,
possibly leading to the situation where the segment 0..5..6..B0 on it
was a preallocated one that was filled with zeros, the standby could
end up fetching the zero-filled WAL segment (file) and might fail this
way. If this is the case, such operations should be avoided.
Unfortunately, due to the lack of a reproducer or detailed operations
that took place there, the best I can do now is to guess a possible
scenario as described above. I'm not sure how come the situation
actually arose.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center