On Sat, Jan 15, 2022 at 2:59 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Hi,
>
> On Sat, Jan 15, 2022 at 02:04:12PM +0530, Bharath Rupireddy wrote:
> >
> > We had an issue where many mapping files were generated during crash
> > recovery, and the end-of-recovery checkpoint was taking a long time.
> > We had to intervene manually and delete some of the mapping files
> > (although it may not sound sensible) to make the end-of-recovery
> > checkpoint faster. Because of a race between the manual deletion and
> > the checkpoint's own deletion, an unlink error occurred, which
> > crashed the server; the server then entered recovery again, wasting
> > all of the earlier recovery work.
>
> Maybe I'm missing something but wouldn't
> https://commitfest.postgresql.org/36/3448/ better solve the problem?

The unlink error can also cause the new background process proposed in
that thread to restart, which is again costly. CheckPointSnapBuild
already reports such failures at LOG level and continues, so having the
same behavior in CheckPointLogicalRewriteHeap helps a lot.
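
To illustrate the LOG-and-continue behavior I mean, here is a minimal
standalone sketch. It is not the actual PostgreSQL code: unlink_or_log
and remove_files are hypothetical names, and fprintf to stderr stands in
for a LOG-level ereport.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a PostgreSQL function): try to remove a file
 * and, on failure, report it at "LOG" level instead of raising an error.
 * Returns 0 if the file was removed, -1 otherwise.
 */
static int
unlink_or_log(const char *path)
{
    assert(path != NULL);

    if (unlink(path) < 0)
    {
        /* Stand-in for a LOG-level report; the caller keeps going. */
        fprintf(stderr, "LOG:  could not remove file \"%s\": %s\n",
                path, strerror(errno));
        return -1;
    }
    return 0;
}

/*
 * Hypothetical cleanup loop: attempt every file even if some removals
 * fail (for example, because someone else already deleted the file).
 * Returns the number of files that could not be removed.
 */
static int
remove_files(const char *const *paths, int npaths)
{
    int failed = 0;

    for (int i = 0; i < npaths; i++)
    {
        if (unlink_or_log(paths[i]) != 0)
            failed++;
    }
    return failed;
}
```

The point is only that a file which has already disappeared produces a
log line and the loop moves on to the next file, rather than the whole
cleanup being aborted.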

Regards,
Bharath Rupireddy.