On Wed, Oct 22, 2025 at 4:44 AM badfilez@gmail.com <badfilez@gmail.com> wrote:
> Do I get it right:
> this corruption was somehow transferred to the replicas first, and then WAL replay was attempted over the corrupted index?
I don't know what happened. There are just too many possibilities for
me to even guess.
> Why did it not crash the master then?
Probably because the REDO routine just doesn't run there. The
corruption on the primary *might* have led to a crash in some other
place.
In short, it's hard (perhaps impossible) to make a strong guarantee that
there won't be a SIGSEGV when the database is corrupt.
> On 22/10/2025 09:19, badfilez@gmail.com wrote:
>
> Hi,
>
> Thank you,
> there are still 2 broken indexes in the master DB,
> one of them exactly matches the said relation 151181595.
>
> still,
> is it proper WAL apply behavior to segfault in such a case?
It's not ideal. We try to avoid that. But even if the REDO routine
didn't SIGSEGV, it would still have to fail in some other way (given
the kind of corruption that we see here).
The only advantage of not segfaulting is that the standby can at least
continue to accept queries for a while. But it will still inevitably
fall further and further behind, and so you'd still have to recreate
the replica (possibly only after resolving the corruption on the
primary) to get things working again. The important thing is to try to
determine where the corruption came from, so that the same underlying
problem doesn't cause more corruption in the future, and to repair the
corruption that you already have.
--
Peter Geoghegan