Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node - Mailing list pgsql-hackers

From Andres Freund
Subject Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node
Msg-id 20200203134918.rtyclvh2u2ny7bto@alap3.anarazel.de
In response to Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
Hi,

On 2020-01-21 15:41:54 +0900, Fujii Masao wrote:
> On 2020/01/21 13:39, Michael Paquier wrote:
> > On Tue, Jan 21, 2020 at 08:45:14AM +0530, Amit Kapila wrote:
> > > The original email doesn't say so.  I might be missing something, but
> > > can you explain what makes you think so.
> >
> > Oops.  Incorrect thread, I was thinking about this one previously:
> > https://www.postgresql.org/message-id/822113470.250068.1573246011818@connect.xfinity.com
> >
> > Re-reading the root of the thread, I am still not sure what we could
> > do, as that's rather tricky.

Did anybody consider the proposal at
https://www.postgresql.org/message-id/20191223005430.yhf4n3zr4ojwbcn2%40alap3.anarazel.de ?
I think we're going to have to do something like that to actually fix
the problem, rather than polish around the edges.


> See here:
> https://www.postgresql.org/message-id/20190927061414.GF8485@paquier.xyz

On 2019-09-27 15:14:14 +0900, Michael Paquier wrote:
> Wrapping the call of smgrtruncate() within RelationTruncate() to use a
> critical section would make things worse from the user perspective on
> the primary, no?  If the physical truncation fails, we would still
> fail WAL replay on the standby, but instead of generating an ERROR in
> the session of the user attempting the TRUNCATE, the whole primary
> would be taken down.

FWIW, to me this argument just doesn't make any sense - even if a few
people have argued it.

A failure in the FS truncate currently leaves the cluster in a
corrupted state in multiple ways:
1) Dirty buffer contents were thrown away, and going forward their old
   contents will be read back.
2) We have WAL logged something that we haven't done. That's *obviously*
   something *completely* violating WAL logging rules. And it breaks WAL
   replay (including locally, should we crash before the next
   checkpoint - there could be subsequent WAL records relying on the
   block's existence).

That's so obviously worse than a PANIC restart, that I really don't
understand the "worse from the user perspective" argument from your
email above.  Obviously it sucks that the error might re-occur during
recovery. But that's something that usually actually can be fixed -
whereas the data corruption can't.
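
The two outcomes can be illustrated with a toy model. This is only a
sketch in Python, not PostgreSQL code: Node, truncate() and replay() are
hypothetical names, the "disk" is a list of block contents, and the order
of steps (WAL-log, drop buffers, then shrink the file) is an assumption
taken from points 1) and 2) above. It also assumes that the filesystem
problem has been fixed by the time crash recovery runs.

```python
class TruncateError(Exception):
    """Models an ERROR: only the current session's statement fails."""


class Panic(Exception):
    """Models a PANIC: the whole server stops and must run recovery."""


class Node:
    """Hypothetical stand-in for one server's copy of a single relation."""

    def __init__(self, nblocks):
        self.disk = ["old"] * nblocks  # on-disk block contents
        self.dirty = {}                # unflushed buffers: block -> contents

    def read(self, blkno):
        # Prefer the dirty buffer, else fall back to the on-disk copy.
        return self.dirty.get(blkno, self.disk[blkno])

    def drop_buffers(self, nblocks):
        # Discard every buffer at or past the new end of the relation.
        self.dirty = {b: v for b, v in self.dirty.items() if b < nblocks}

    def fs_truncate(self, nblocks, fail=False):
        if fail:
            raise OSError("simulated filesystem truncate failure")
        del self.disk[nblocks:]


def truncate(node, wal, nblocks, fs_fails, critical_section):
    wal.append(("truncate", nblocks))  # 1. WAL-log the truncation
    node.drop_buffers(nblocks)         # 2. throw away buffers past the end
    try:
        node.fs_truncate(nblocks, fail=fs_fails)  # 3. shrink the file
    except OSError as exc:
        if critical_section:
            raise Panic(str(exc))      # failure takes the server down
        raise TruncateError(str(exc))  # failure is a plain session ERROR


def replay(node, wal):
    for kind, arg in wal:
        if kind == "truncate":
            node.drop_buffers(arg)
            node.fs_truncate(arg)
        elif kind == "update":
            blkno, value = arg
            if blkno >= len(node.disk):
                raise Panic("WAL references missing block %d" % blkno)
            node.disk[blkno] = value


def demo_error_path():
    primary, standby, wal = Node(4), Node(4), []
    primary.dirty[3] = "new"  # unflushed change to block 3
    try:
        truncate(primary, wal, 2, fs_fails=True, critical_section=False)
    except TruncateError:
        pass  # the session sees an ERROR, the primary keeps running
    stale = primary.read(3)  # dirty contents are gone, old data comes back
    # The primary still has block 3, so a later WAL record may touch it.
    primary.disk[3] = "newer"
    wal.append(("update", (3, "newer")))
    try:
        replay(standby, wal)
        standby_ok = True
    except Panic:
        standby_ok = False
    return stale, len(primary.disk), standby_ok


def demo_panic_path():
    primary, wal = Node(4), []
    primary.dirty[3] = "new"
    try:
        truncate(primary, wal, 2, fs_fails=True, critical_section=True)
    except Panic:
        primary.dirty.clear()  # crash restart: shared buffers are lost
        replay(primary, wal)   # recovery redoes the logged truncation
    return len(primary.disk)
```

In this model demo_error_path() returns ("old", 4, False): the primary
reads back stale contents, still has all four blocks, and the standby
cannot replay the later record. demo_panic_path() returns 2: after the
restart the relation matches what the WAL says.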


> The original proposal, i.e., holding the interrupts during
> the truncation, is worth considering? It is not a perfect
> solution but might improve the situation a bit.

I don't think it's useful in isolation.

Greetings,

Andres Freund


