Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Msg-id CALj2ACWUHL-pjFEVzPpxjauUCML2_i4u2k-=pmYmnRwc7zJoPQ@mail.gmail.com
In response to Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
List pgsql-hackers
On Sat, Jan 15, 2022 at 2:59 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Hi,
>
> On Sat, Jan 15, 2022 at 02:04:12PM +0530, Bharath Rupireddy wrote:
> >
> > We had an issue where there were many mapping files generated during
> > the crash recovery and end-of-recovery checkpoint was taking a lot of
> > time. We had to manually intervene and delete some of the mapping
> > files (although it may not sound sensible) to make end-of-recovery
> > checkpoint faster. Because of the race condition between manual
> > deletion and checkpoint deletion, the unlink error occurred which
> > crashed the server and the server entered the recovery again wasting
> > the entire earlier recovery work.
>
> Maybe I'm missing something but wouldn't
> https://commitfest.postgresql.org/36/3448/ better solve the problem?

With that patch, the same error would make the new background process
proposed in that thread restart, which is again costly. Since
CheckPointSnapBuild already logs the failure at LOG level and continues,
giving CheckPointLogicalRewriteHeap the same behavior would avoid losing
the work already done.
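
To make the "LOG-only and continue" idea concrete, here is a minimal
standalone sketch. It is not the PostgreSQL code and not the proposed
patch: remove_files_log_only() and its prefix argument are hypothetical
names made up for illustration, and fprintf(stderr, ...) stands in for
what in the server would be an ereport() at LOG level. The point is only
that a failed unlink() (for example because someone else already removed
the file) is reported and skipped rather than treated as fatal.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not PostgreSQL code).
 *
 * Try to unlink every entry in "dir" whose name starts with "prefix".
 * If unlink() fails, log the reason and move on to the next entry
 * instead of aborting the whole pass.
 *
 * Returns the number of entries that could not be removed, or -1 if the
 * directory itself could not be opened.
 */
static int
remove_files_log_only(const char *dir, const char *prefix)
{
    DIR           *d;
    struct dirent *de;
    size_t         plen;
    int            failures = 0;

    assert(dir != NULL && prefix != NULL);
    plen = strlen(prefix);

    d = opendir(dir);
    if (d == NULL)
    {
        fprintf(stderr, "LOG: could not open directory \"%s\": %s\n",
                dir, strerror(errno));
        return -1;
    }

    while ((de = readdir(d)) != NULL)
    {
        char path[4096];

        /* Only consider entries matching the requested prefix. */
        if (strncmp(de->d_name, prefix, plen) != 0)
            continue;

        snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);

        if (unlink(path) != 0)
        {
            /* Log and continue; a concurrent removal shows up as ENOENT. */
            fprintf(stderr, "LOG: could not remove file \"%s\": %s\n",
                    path, strerror(errno));
            failures++;
        }
    }

    closedir(d);
    return failures;
}
```

A caller can look at the return value to decide whether leftover files
are worth a retry at the next checkpoint, which is cheaper than failing
and redoing the recovery.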

Regards,
Bharath Rupireddy.


