Re: Corruption during WAL replay - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Corruption during WAL replay |
Date | |
Msg-id | CA+fd4k5q+yGLS4wvY35Vqst7Uvw2JRQUJEhOtFLuMRFzjx3PAw@mail.gmail.com Whole thread Raw |
In response to | Re: Corruption during WAL replay (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Corruption during WAL replay
|
List | pgsql-hackers |
On Mon, 13 Apr 2020 at 17:40, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2020-04-13 15:24:55 +0900, Masahiko Sawada wrote: > > On Sat, 11 Apr 2020 at 09:00, Teja Mupparti <tejeswarm@hotmail.com> wrote: > > > > > > Thanks Andres and Kyotaro for the quick review. I have fixed the typos and also included the critical section (emulatedit with try-catch block since palloc()s are causing issues in the truncate code). This time I used git format-patch. > > > > > > > I briefly looked at the latest patch but I'm not sure it's the right > > thing here to use PG_TRY/PG_CATCH to report the PANIC error. For > > example, with the following code you changed, we will always end up > > with emitting a PANIC "failed to truncate the relation" regardless of > > the actual cause of the error. > > > > + PG_CATCH(); > > + { > > + ereport(PANIC, (errcode(ERRCODE_INTERNAL_ERROR), > > + errmsg("failed to truncate the relation"))); > > + } > > + PG_END_TRY(); > > > > And the comments of RelationTruncate() mentions: > > I think that's just a workaround for mdtruncate not being usable in > critical sections. > > > > /* > > * We WAL-log the truncation before actually truncating, which means > > * trouble if the truncation fails. If we then crash, the WAL replay > > * likely isn't going to succeed in the truncation either, and cause a > > * PANIC. It's tempting to put a critical section here, but that cure > > * would be worse than the disease. It would turn a usually harmless > > * failure to truncate, that might spell trouble at WAL replay, into a > > * certain PANIC. > > */ > > Yea, but that reasoning is just plain *wrong*. It's *never* ok to WAL > log something and then not perform the action. This leads to to primary > / replica getting out of sync, crash recovery potentially not completing > (because of records referencing the should-be-truncated pages), ... > > > > As a second idea, I wonder if we can defer truncation until commit > > time like smgrDoPendingDeletes mechanism. The sequence would be: > > This is mostly an issue during [auto]vacuum partially truncating the end > of the file. We intentionally release the AEL regularly to allow other > accesses to continue. > > For transactional truncations we don't go down this path (as we create a > new relfilenode). > > > > At RelationTruncate(), > > 1. WAL logging. > > 2. Remember buffers to be dropped. > > You definitely cannot do that, as explained above. Ah yes, you're right. So it seems to me currently what we can do for this issue would be to enclose the truncation operation in a critical section. IIUC it's not enough just to reverse the order of dropping buffers and physical file truncation because it cannot solve the problem of inconsistency on the standby. And as Horiguchi-san mentioned, there is no need to reverse that order if we envelop the truncation operation by a critical section because we can recover page changes during crash recovery. The strategy of writing out all dirty buffers before dropping buffers, proposed as (a) in [1], also seems not enough. Regards, [1] https://www.postgresql.org/message-id/20191207001232.klidxnm756wqxvwx%40alap3.anarazel.deDoing sync before truncation -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: