On Wed, Aug 29, 2018 at 08:59:16AM +0200, Alexander Kukushkin wrote:
> Why does block 72478 of the index relfile not meet our expectations
> (it contains so few tuples)?
> The answer to this question is in the page header. The LSN written in
> the index page header is AB3/56BF3B68.
> That has only one meaning: while postgres was working before the
> crash, it managed to apply the WAL stream until at least AB3/56BF3B68,
> which is far ahead of "Minimum recovery ending location: AB3/4A1B3118".

Yeah, that's the pinpoint. Do you know by chance what was the content
of the control file for each standby you have upgraded to 9.6.10 before
starting them with the new binaries? You mentioned a cluster of three
nodes, so I guess that you have two standbys, and that one of them did
not see the symptoms discussed here, while the other saw them. Do you
still have the logs of the recovery just after starting the other
standby with 9.4.10 which did not see the symptom? All your standbys
are using the background worker, which would cause the btree deletion
code to be exercised, right?

I am trying to work on a reproducer with a bgworker starting once
recovery has reached consistency, without success yet. Does your
cluster generate some XLOG_PARAMETER_CHANGE records? In some cases,
9.4.8 could have updated minRecoveryPoint to go backward, which is
something that commit 8d68ee6 has been working on addressing.

Did you also try to use local WAL segments up to where AB3/56BF3B68 is
applied, and also have a restore_command so that extra WAL segments
would be fetched from the archive?
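
For reference, a rough sketch (in Python, not taken from PostgreSQL
itself) of how the two LSNs quoted above compare, and which WAL segment
files they would fall into. It assumes the default 16MB segment size
and timeline 1, which are assumptions and not values known for this
cluster; parse_lsn and wal_segment_name are illustrative helper names.

```python
# Assumption: default WAL segment size; the cluster may be built differently.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text):
    """Convert an 'X/Y' LSN string into a 64-bit integer position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_segment_name(lsn, timeline=1):
    """Return the 24-character WAL file name holding this LSN.

    The timeline defaults to 1, which is an assumption.
    """
    segno = lsn // WAL_SEGMENT_SIZE
    segs_per_xlogid = 0x100000000 // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


page_lsn = parse_lsn("AB3/56BF3B68")            # LSN in the index page header
min_recovery_point = parse_lsn("AB3/4A1B3118")  # from the control file

# True means the page was written past the recorded minimum recovery point.
print(page_lsn > min_recovery_point)
# Distance in bytes of WAL between the two positions.
print(page_lsn - min_recovery_point)
# Segment files that would need to be replayed to cover the gap.
print(wal_segment_name(min_recovery_point), wal_segment_name(page_lsn))
```

Under those assumptions, replay would need every segment from the one
holding the minimum recovery point up to the one holding the page LSN.
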
--
Michael