Re: PATCH: standby crashed when replay block which truncated instandby but failed to truncate in master node - Mailing list pgsql-hackers

From	Michael Paquier
Subject	Re: PATCH: standby crashed when replay block which truncated instandby but failed to truncate in master node
Date	September 24, 2019 01:40:19
Msg-id	20190924014019.GB2012@paquier.xyz
In response to	Re: PATCH: standby crashed when replay block which truncated instandby but failed to truncate in master node (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: PATCH: standby crashed when replay block which truncated instandby but failed to truncate in master node
List	pgsql-hackers

On Mon, Sep 23, 2019 at 01:45:14PM +0200, Tomas Vondra wrote:
> On Mon, Sep 23, 2019 at 03:48:50PM +0800, Thunder wrote:
>> Is this an issue?
>> Can we fix like this?
>> Thanks!
>>
>
> I do think it is a valid issue. No opinion on the fix yet, though.
> The report was sent on saturday, so patience ;-)

And for some others it was even a longer weekend.  Anyway, the problem
can be reproduced by applying the attached patch, which introduces a
failure point, and then running the following commands:
create table aa as select 1;
delete from aa;
\! touch /tmp/truncate_flag
vacuum aa;
\! rm /tmp/truncate_flag
vacuum aa; -- panic on standby
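
For readers without the attachment at hand, here is a rough guess at
what a file-triggered failure point looks like.  This is only a
sketch, not the attached patch: failure_flag_present() and
toy_physical_truncate() are hypothetical names, and the assumption is
that the injected failure fires after the truncation WAL record has
been written but before the file is physically shortened.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical failure point: report true while the flag file exists.
 * In the reproduction above, "touch /tmp/truncate_flag" arms it and
 * "rm /tmp/truncate_flag" disarms it.
 */
static bool
failure_flag_present(const char *flag_path)
{
    return access(flag_path, F_OK) == 0;
}

/*
 * Toy stand-in for the physical truncation step.  Returns 0 on success
 * and -1 when the injected failure fires, in which case *nblocks keeps
 * its old value, just as the primary keeps its untruncated file.
 */
static int
toy_physical_truncate(const char *flag_path, int *nblocks, int new_nblocks)
{
    if (failure_flag_present(flag_path))
    {
        fprintf(stderr, "injected failure: truncation skipped\n");
        return -1;
    }

    *nblocks = new_nblocks;
    return 0;
}
```

With the flag present the first call fails and leaves the block count
untouched; once the flag is removed the second call truncates, which
is the sequence the two VACUUM commands go through.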

This also points out that there is more to worry about than
interruptions.  For example, DropRelFileNodeLocalBuffers() could raise
an ERROR, and it runs before the physical truncation is done but after
the WAL record has been written and can be replayed on the standby, so
any failure in the truncation phase before the work is done would be a
problem.  However, these are failures that should not happen, and they
are reported through elog() calls.  It would be tempting to add a
critical section here, but we could still have problems if a failure
happens after the WAL record has been flushed, which means that it
would be replayed on the standby, and the surrounding comments are
clear about that.  In short, as a matter of safety I'd like to think
that what you are suggesting is rather acceptable (that is, hold
interrupts before the WAL record is written and release them after the
physical truncate), so that truncation avoids the failures that can be
avoided.
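
To make the ordering concrete, here is a toy model in C.  It is a
sketch under stated assumptions, not the proposed patch: the helpers
are hypothetical and only loosely modeled on PostgreSQL's
HOLD_INTERRUPTS(), RESUME_INTERRUPTS() and CHECK_FOR_INTERRUPTS()
macros, and ToyRelation just tracks block counts on both nodes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy interrupt state (hypothetical, not PostgreSQL's real globals). */
static int  holdoff_count = 0;          /* > 0 means interrupts are held */
static bool interrupt_pending = false;

static void hold_interrupts(void)   { holdoff_count++; }
static void resume_interrupts(void) { assert(holdoff_count > 0); holdoff_count--; }

/* Returns true if a pending interrupt is serviced here (operation aborts). */
static bool
check_for_interrupts(void)
{
    if (interrupt_pending && holdoff_count == 0)
    {
        interrupt_pending = false;
        return true;
    }
    return false;
}

typedef struct ToyRelation
{
    int primary_blocks;     /* file size on the primary */
    int standby_blocks;     /* file size on the standby after replay */
} ToyRelation;

/*
 * Truncate rel to new_blocks.  cancel_mid simulates a cancel request
 * arriving right after the WAL record is written.  Returns true if the
 * primary finished its physical truncation.
 */
static bool
toy_truncate(ToyRelation *rel, int new_blocks, bool protect, bool cancel_mid)
{
    if (protect)
        hold_interrupts();

    /* "Write the WAL record": the standby replays it no matter what. */
    rel->standby_blocks = new_blocks;

    if (cancel_mid)
        interrupt_pending = true;

    /* Without protection, the cancel is serviced before the truncate. */
    if (check_for_interrupts())
        return false;

    /* Physical truncation on the primary. */
    rel->primary_blocks = new_blocks;

    if (protect)
        resume_interrupts();
    return true;
}
```

Without protection, a cancel arriving in the window leaves the primary
at its old size while the standby has already truncated.  With
interrupts held, the cancel stays pending until after the physical
truncate, so both sides agree.  Note that this only covers
interruptions; an ERROR raised inside the window is not addressed by
holding interrupts.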

Do others have thoughts to share on the matter?
--
Michael
