Re: Corruption during WAL replay - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Corruption during WAL replay
Date
Msg-id CA+fd4k5q+yGLS4wvY35Vqst7Uvw2JRQUJEhOtFLuMRFzjx3PAw@mail.gmail.com
Whole thread Raw
In response to Re: Corruption during WAL replay  (Andres Freund <andres@anarazel.de>)
Responses Re: Corruption during WAL replay  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On Mon, 13 Apr 2020 at 17:40, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2020-04-13 15:24:55 +0900, Masahiko Sawada wrote:
> > On Sat, 11 Apr 2020 at 09:00, Teja Mupparti <tejeswarm@hotmail.com> wrote:
> > >
> > > Thanks Andres and Kyotaro for the quick review.  I have fixed the typos and also included the critical section
(emulatedit with try-catch block since palloc()s are causing issues in the truncate code). This time I used git
format-patch.
> > >
> >
> > I briefly looked at the latest patch but I'm not sure it's the right
> > thing here to use PG_TRY/PG_CATCH to report the PANIC error. For
> > example, with the following code you changed, we will always end up
> > with emitting a PANIC "failed to truncate the relation" regardless of
> > the actual cause of the error.
> >
> > +   PG_CATCH();
> > +   {
> > +       ereport(PANIC, (errcode(ERRCODE_INTERNAL_ERROR),
> > +           errmsg("failed to truncate the relation")));
> > +   }
> > +   PG_END_TRY();
> >
> > And the comments of RelationTruncate() mentions:
>
> I think that's just a workaround for mdtruncate not being usable in
> critical sections.
>
>
> > /*
> >  * We WAL-log the truncation before actually truncating, which means
> >  * trouble if the truncation fails. If we then crash, the WAL replay
> >  * likely isn't going to succeed in the truncation either, and cause a
> >  * PANIC. It's tempting to put a critical section here, but that cure
> >  * would be worse than the disease. It would turn a usually harmless
> >  * failure to truncate, that might spell trouble at WAL replay, into a
> >  * certain PANIC.
> >  */
>
> Yea, but that reasoning is just plain *wrong*. It's *never* ok to WAL
> log something and then not perform the action. This leads to to primary
> / replica getting out of sync, crash recovery potentially not completing
> (because of records referencing the should-be-truncated pages), ...
>
>
> > As a second idea, I wonder if we can defer truncation until commit
> > time like smgrDoPendingDeletes mechanism. The sequence would be:
>
> This is mostly an issue during [auto]vacuum partially truncating the end
> of the file. We intentionally release the AEL regularly to allow other
> accesses to continue.
>
> For transactional truncations we don't go down this path (as we create a
> new relfilenode).
>
>
> > At RelationTruncate(),
> > 1. WAL logging.
> > 2. Remember buffers to be dropped.
>
> You definitely cannot do that, as explained above.

Ah yes, you're right.

So it seems to me currently what we can do for this issue would be to
enclose the truncation operation in a critical section. IIUC it's not
enough just to reverse the order of dropping buffers and physical file
truncation because it cannot solve the problem of inconsistency on the
standby. And as Horiguchi-san mentioned, there is no need to reverse
that order if we envelop the truncation operation by a critical
section because we can recover page changes during crash recovery. The
strategy of writing out all dirty buffers before dropping buffers,
proposed as (a) in [1], also seems not enough.

Regards,

[1] https://www.postgresql.org/message-id/20191207001232.klidxnm756wqxvwx%40alap3.anarazel.deDoing
sync before truncation


--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: doc review for parallel vacuum
Next
From: tushar
Date:
Subject: Re: [Proposal] Global temporary tables