On Fri, Sep 27, 2019 at 3:14 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Sep 26, 2019 at 01:13:56AM +0900, Fujii Masao wrote:
> > On Tue, Sep 24, 2019 at 10:41 AM Michael Paquier <michael@paquier.xyz> wrote:
> >> This also points out that there are other things to worry about than
> >> interruptions, as for example DropRelFileNodeLocalBuffers() could lead
> >> to an ERROR, and this happens before the physical truncation is done
> >> but after the WAL record is replayed on the standby, so any failures
> >> happening at the truncation phase before the work is done would be a
> >> problem. However we are talking about failures which should not
> >> happen and these are elog() calls. It would be tempting to add a
> >> critical section here, but we could still have problems if we have a
> >> failure after the WAL record has been flushed, which means that it
> >> would be replayed on the standby, and the surrounding comments are
> >> clear about that.
> >
> > Could you elaborate what problem adding a critical section there occurs?
>
> Wrapping the call of smgrtruncate() within RelationTruncate() to use a
> critical section would make things worse from the user perspective on
> the primary, no? If the physical truncation fails, we would still
> fail WAL replay on the standby, but instead of generating an ERROR in
> the session of the user attempting the TRUNCATE, the whole primary
> would be taken down.

Thanks for elaborating! Understood.

But a failed truncation after the WAL record has been flushed can cause
subsequent recovery to always fail with an invalid-pages error, so the
server cannot start up. This is bad. To alleviate the situation, I think
it would be worth adding a developer parameter, something like
ignore_invalid_pages. When this parameter is set to true, the startup
process always ignores invalid-pages errors. Thoughts?
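
To make the idea a bit more concrete, here is a rough, standalone sketch of
the behavior I have in mind. It is only a toy model, not a patch: the
parameter name, the CheckResult enum and check_invalid_pages() are all
hypothetical, and downgrading the failure to a warning (rather than
silently skipping it) is just one possible reading of "ignores".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical developer parameter; off by default. */
static bool ignore_invalid_pages = false;

/* Hypothetical outcomes of the invalid-pages check during recovery. */
typedef enum
{
    CHECK_OK,       /* no references to invalid pages remain */
    CHECK_WARNING,  /* invalid pages found, but recovery continues */
    CHECK_PANIC     /* invalid pages found, recovery fails */
} CheckResult;

/*
 * Toy model of the check: n_invalid is the number of references to
 * invalid pages that WAL replay has seen and not resolved.
 */
static CheckResult
check_invalid_pages(size_t n_invalid)
{
    if (n_invalid == 0)
        return CHECK_OK;

    /* With the parameter set, report the problem but keep going. */
    return ignore_invalid_pages ? CHECK_WARNING : CHECK_PANIC;
}
```

With the parameter left at false nothing changes from today: any leftover
invalid page still stops recovery. Only a user who explicitly sets it to
true gets a server that starts up despite the invalid pages.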

Regards,
--
Fujii Masao