Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node
Msg-id 20190927061414.GF8485@paquier.xyz
In response to Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node
List pgsql-hackers
On Thu, Sep 26, 2019 at 01:13:56AM +0900, Fujii Masao wrote:
> On Tue, Sep 24, 2019 at 10:41 AM Michael Paquier <michael@paquier.xyz> wrote:
>> This also points out that there are other things to worry about than
>> interruptions, as for example DropRelFileNodeLocalBuffers() could lead
>> to an ERROR, and this happens before the physical truncation is done
>> but after the WAL record is replayed on the standby, so any failures
>> happening at the truncation phase before the work is done would be a
>> problem.  However we are talking about failures which should not
>> happen and these are elog() calls.  It would be tempting to add a
>> critical section here, but we could still have problems if we have a
>> failure after the WAL record has been flushed, which means that it
>> would be replayed on the standby, and the surrounding comments are
>> clear about that.
>
> Could you elaborate what problem adding a critical section there occurs?

Wrapping the smgrtruncate() call in RelationTruncate() with a critical
section would make things worse from the user's perspective on the
primary, no?  If the physical truncation fails, WAL replay would still
fail on the standby, but as an ERROR raised inside a critical section
is promoted to PANIC, instead of generating an ERROR in the session of
the user attempting the TRUNCATE, the whole primary would be taken
down.
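
To make the trade-off concrete, here is a minimal toy model in plain C.
It is not PostgreSQL source: every sim_* name is hypothetical, and it
only assumes the backend's rule that an ERROR raised while the
critical-section counter is non-zero is escalated to PANIC.  The WAL
record is taken to be flushed before the physical truncation, so the
standby replays it in every case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of a truncation attempt on the primary. */
typedef enum
{
    SIM_OK,     /* physical truncation succeeded */
    SIM_ERROR,  /* only the calling session's transaction fails */
    SIM_PANIC   /* the whole primary goes down */
} SimOutcome;

/* Models the backend's critical-section nesting counter. */
static int  sim_crit_section_count = 0;

void
sim_start_crit_section(void)
{
    sim_crit_section_count++;
}

void
sim_end_crit_section(void)
{
    assert(sim_crit_section_count > 0);
    sim_crit_section_count--;
}

/* An ERROR raised inside a critical section is escalated to PANIC. */
SimOutcome
sim_report_error(void)
{
    return (sim_crit_section_count > 0) ? SIM_PANIC : SIM_ERROR;
}

/*
 * Toy version of the truncation path.  The WAL record is assumed to be
 * flushed before this point, so the standby replays it whatever
 * happens here.
 */
SimOutcome
sim_relation_truncate(bool physical_truncate_fails, bool use_crit_section)
{
    SimOutcome  outcome = SIM_OK;

    if (use_crit_section)
        sim_start_crit_section();

    if (physical_truncate_fails)
        outcome = sim_report_error();

    /* The model always unwinds so that it can be called repeatedly. */
    if (use_crit_section)
        sim_end_crit_section();

    return outcome;
}
```

In this model a failing truncation yields SIM_ERROR without the
critical section and SIM_PANIC with it.  The standby is equally exposed
in both cases, so the critical section only changes how much of the
primary is lost.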
--
Michael
