Hi,
On 2022-02-17 14:58:38 -0800, Nathan Bossart wrote:
> On Thu, Feb 17, 2022 at 02:28:29PM -0800, Andres Freund wrote:
> > As far as I understand, the primary concern is logical decoding serialized
> > snapshots, because a lot of them can accumulate if there e.g. is an old unused
> > / far behind slot. It should be easy to reduce the number of those snapshots
> > by e.g. eliding some redundant ones. Perhaps we could also make backends in
> > logical decoding occasionally do a bit of cleanup themselves.
> >
> > I've not seen reports of the number of mapping files being a real issue?
>
> I routinely see all four of these tasks impacting customers, but I'd say
> the most common one is the temporary file cleanup.

I took temp file cleanup and StartupReorderBuffer() "out of consideration" for
custodian, because they're not needed during normal running.

> Besides eliminating some redundant files and having backends perform some
> cleanup, what do you think about skipping the logical decoding cleanup
> during end-of-recovery/shutdown checkpoints?

I strongly disagree with that. You might then never get the cleanup done, and
just keep operating until you hit corruption issues.

> > The improvements around deleting temporary files and serialized snapshots
> > afaict don't require a dedicated process - they're only relevant during
> > startup. We could use the approach of renaming the directory out of the way as
> > done in this patchset but perform the cleanup in the startup process after
> > we're up.
>
> Perhaps this is a good place to start. As I mentioned above, IME the
> temporary file cleanup is the most common problem, so I think even getting
> that one fixed would be a huge improvement.
Cool.
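
To make the "rename out of the way, clean up once we're up" idea concrete,
here is a rough sketch. It is Python for brevity, not PostgreSQL code: the
function names are hypothetical, and the background thread merely stands in
for whatever would do the deletion after startup. The only point is that the
rename is cheap and happens up front, while the expensive recursive delete is
deferred.

```python
import os
import shutil
import tempfile
import threading
from typing import Optional


def stage_for_removal(tmp_dir: str) -> Optional[str]:
    """Move tmp_dir aside and recreate it empty.

    Returns the staging directory that now holds the old contents,
    or None if tmp_dir did not exist. (Hypothetical helper.)
    """
    if not os.path.isdir(tmp_dir):
        return None
    parent = os.path.dirname(os.path.abspath(tmp_dir))
    # Unique staging directory on the same filesystem, so the rename
    # below is a cheap metadata operation rather than a copy.
    staged = tempfile.mkdtemp(prefix="stale_tmp_", dir=parent)
    os.rename(tmp_dir, os.path.join(staged, "old"))
    os.mkdir(tmp_dir)  # startup can proceed with an empty directory
    return staged


def remove_staged(staged: str) -> None:
    """Delete the staged directory; this is the slow part."""
    shutil.rmtree(staged, ignore_errors=True)


def start_with_deferred_cleanup(tmp_dir: str) -> Optional[threading.Thread]:
    """Stage the old files now, delete them in the background."""
    staged = stage_for_removal(tmp_dir)
    if staged is None:
        return None
    worker = threading.Thread(target=remove_staged, args=(staged,), daemon=True)
    worker.start()
    return worker
```

A real implementation would also have to pick up staged directories left
behind by an earlier crash. The sketch ignores that.
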
Greetings,
Andres Freund