Thread: O(n) tasks cause lengthy startups and checkpoints
Hi hackers,

Thanks to 61752af, SyncDataDirectory() can make use of syncfs() to
avoid individually syncing all database files after a crash.  However,
as noted earlier this year [0], there are still a number of O(n) tasks
that affect startup and checkpointing that I'd like to improve.
Below, I've attempted to summarize each task and to offer ideas for
improving matters.  I'll likely split each of these into its own
thread, given there is community interest for such changes.

1) CheckPointSnapBuild(): This function loops through
   pg_logical/snapshots to remove all snapshots that are no longer
   needed.  If there are many entries in this directory, this can take
   a long time.  The note above this function indicates that this is
   done during checkpoints simply because it is convenient.  IIUC
   there is no requirement that this function actually completes for a
   given checkpoint.  My current idea is to move this to a new
   maintenance worker.

2) CheckPointLogicalRewriteHeap(): This function loops through
   pg_logical/mappings to remove old mappings and flush all remaining
   ones.  IIUC there is no requirement that the "remove old mappings"
   part must complete for a given checkpoint, but the "flush all
   remaining" portion allows replay after a checkpoint to only "deal
   with the parts of a mapping that have been written out after the
   checkpoint started."  Therefore, I think we should move the "remove
   old mappings" part to a new maintenance worker (probably the same
   one as for 1), and we should consider using syncfs() for the "flush
   all remaining" part.  (I suspect the main argument against the
   latter will be that it could cause IO spikes.)

3) RemovePgTempFiles(): This step can delay startup if there are many
   temporary files to individually remove.  This step is already
   optionally done after a crash via the remove_temp_files_after_crash
   GUC.  I propose that we have startup move the temporary file
   directories aside and create new ones, and then a separate worker
   (probably the same one from 1 and 2) could clean up the old files.

4) StartupReorderBuffer(): This step deletes logical slot data that
   has been spilled to disk.  This code appears to be written to avoid
   deleting different types of files in these directories, but AFAICT
   there shouldn't be any other files.  Therefore, I think we could do
   something similar to 3 (i.e., move the directories aside during
   startup and clean them up via a new maintenance worker).

I realize adding a new maintenance worker might be a bit heavy-handed,
but I think it would be nice to have somewhere to offload tasks that
really shouldn't impact startup and checkpointing.  I imagine such a
process would come in handy down the road, too.  WDYT?

Nathan

[0] https://postgr.es/m/32B59582-AA6C-4609-B08F-2256A271F7A5%40amazon.com
+1 to the idea. I don't see a reason why the checkpointer has to do all of that. Keeping checkpoints to the minimal essential work helps servers recover faster in the event of a crash.
RemoveOldXlogFiles() is also an O(n) operation that can at least be avoided during the end-of-recovery (CHECKPOINT_END_OF_RECOVERY) checkpoint. When a sufficient number of WAL files have accumulated and the previous checkpoint did not get a chance to clean them up, this can increase the unavailability of the server.
RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr);
Hi,

On 2021-12-01 20:24:25 +0000, Bossart, Nathan wrote:
> I realize adding a new maintenance worker might be a bit heavy-handed,
> but I think it would be nice to have somewhere to offload tasks that
> really shouldn't impact startup and checkpointing. I imagine such a
> process would come in handy down the road, too. WDYT?

-1. I think the overhead of an additional worker is disproportional here.
And there's simplicity benefits in having a predictable cleanup interlock
as well.

I think particularly for the snapshot stuff it'd be better to optimize
away unnecessary snapshot files, rather than making the cleanup more
asynchronous.

Greetings,

Andres Freund
On 12/1/21, 2:56 PM, "Andres Freund" <andres@anarazel.de> wrote:
> On 2021-12-01 20:24:25 +0000, Bossart, Nathan wrote:
>> I realize adding a new maintenance worker might be a bit heavy-handed,
>> but I think it would be nice to have somewhere to offload tasks that
>> really shouldn't impact startup and checkpointing. I imagine such a
>> process would come in handy down the road, too. WDYT?
>
> -1. I think the overhead of an additional worker is disproportional here. And
> there's simplicity benefits in having a predictable cleanup interlock as well.

Another idea I had was to put some upper limit on how much time is spent
on such tasks.  For example, a checkpoint would only spend X minutes on
CheckPointSnapBuild() before giving up until the next one.  I think the
main downside of that approach is that it could lead to unbounded growth,
so perhaps we would limit (or even skip) such tasks only for
end-of-recovery and shutdown checkpoints.  Perhaps the startup tasks
could be limited in a similar fashion.

> I think particularly for the snapshot stuff it'd be better to optimize away
> unnecessary snapshot files, rather than making the cleanup more asynchronous.

I can look into this.  Any pointers would be much appreciated.

Nathan
On Wed, Dec 1, 2021, at 9:19 PM, Bossart, Nathan wrote:
> Another idea I had was to put some upper limit on how much time is
> spent on such tasks.  For example, a checkpoint would only spend X
> minutes on CheckPointSnapBuild() before giving up until the next one.
> I think the main downside of that approach is that it could lead to
> unbounded growth, so perhaps we would limit (or even skip) such tasks
> only for end-of-recovery and shutdown checkpoints.  Perhaps the
> startup tasks could be limited in a similar fashion.
Saying that a certain task is O(n) doesn't mean it needs a separate process to
handle it. Do you have a use case, or even better, numbers (% of checkpoint /
startup time) that make your proposal worthwhile?
I would try to optimize (1) and (2). However, delayed removal can be a
long-term issue if the new routine cannot keep up with the pace of file
creation (especially if the checkpoints are far apart).
For (3), there is already a GUC that avoids the slowdown during startup.
Use it if you think the startup time is more important than the disk space
occupied by useless files.
For (4), you are forgetting that the on-disk state of replication slots is
stored in pg_replslot/SLOTNAME/state. It seems you cannot just rename the
replication slot directory and copy the state file. What happens if there is
a crash before copying the state file?
While we are talking about items (1), (2) and (4), we could probably have an
option to create some ephemeral logical decoding files on a ramdisk (similar
to the statistics directory). I wouldn't like to hijack this thread, but this
proposal could alleviate the possible issues that you pointed out. If people
are interested in this proposal, I can start a new thread about it.
On Thu, Dec 2, 2021 at 1:54 AM Bossart, Nathan <bossartn@amazon.com> wrote:
> I realize adding a new maintenance worker might be a bit heavy-handed,
> but I think it would be nice to have somewhere to offload tasks that
> really shouldn't impact startup and checkpointing. I imagine such a
> process would come in handy down the road, too. WDYT?

+1 for the overall idea of making the checkpoint faster. In fact, we here at
our team have been thinking about this problem for a while. If there are a
lot of files that checkpoint has to loop over and remove, IMO, that task can
be delegated to someone else (maybe a background worker called background
cleaner or bg cleaner; of course, we can have a GUC to enable or disable it).
The checkpoint can just write some marker files (for instance, it can write
snapshot_<cutofflsn> files with the file name itself representing the cutoff
LSN so that the new bg cleaner can remove the snapshot files; similarly, it
can write marker files for other file removals).

Having said that, a new bg cleaner deleting the files asynchronously on
behalf of checkpoint can look like overkill until we have some numbers on
what we could save with this approach. For this purpose, I did a small
experiment to figure out how much time file deletion usually takes [1] on an
SSD: for 1 million files, 8 seconds. I'm sure it will be much more on an HDD.

The bg cleaner can also be used for RemovePgTempFiles: the postmaster could
just rename pgsql_temp to something like pgsql_temp_delete, then proceed
with the server startup, and the bg cleaner can then delete the files. Also,
we could do something similar for removing/recycling old xlog files and
StartupReorderBuffer.

Another idea could be to parallelize the checkpoint, i.e. IIUC, the tasks
that checkpoint does in CheckPointGuts are independent and if we have some
counters like (how many snapshot/mapping files the server generated)

[1] on SSD:
deletion of 1000000 files took 7.930380 seconds
deletion of 500000 files took 3.921676 seconds
deletion of 100000 files took 0.768772 seconds
deletion of 50000 files took 0.400623 seconds
deletion of 10000 files took 0.077565 seconds
deletion of 1000 files took 0.006232 seconds

Regards,
Bharath Rupireddy.
On 12/1/21, 6:06 PM, "Euler Taveira" <euler@eulerto.com> wrote:
> Saying that a certain task is O(n) doesn't mean it needs a separate process to
> handle it. Did you have a use case or even better numbers (% of checkpoint /
> startup time) that makes your proposal worthwhile?

I don't have specific numbers on hand, but each of the four functions I
listed is something I routinely see impacting customers.

> For (3), there is already a GUC that would avoid the slowdown during startup.
> Use it if you think the startup time is more important that disk space occupied
> by useless files.

Setting remove_temp_files_after_crash to false only prevents temp file
cleanup during restart after a backend crash.  It is always called for
other startups.

> For (4), you are forgetting that the on-disk state of replication slots is
> stored in the pg_replslot/SLOTNAME/state. It seems you cannot just rename the
> replication slot directory and copy the state file. What happen if there is a
> crash before copying the state file?

Good point.  I think it's possible to deal with this, though.  Perhaps the
files that should be deleted on startup should go in a separate directory,
or maybe we could devise a way to ensure the state file is copied even if
there is a crash at an inconvenient time.

Nathan
On 12/1/21, 6:48 PM, "Bharath Rupireddy"
<bharath.rupireddyforpostgres@gmail.com> wrote:
> +1 for the overall idea of making the checkpoint faster. In fact, we
> here at our team have been thinking about this problem for a while. If
> there are a lot of files that checkpoint has to loop over and remove,
> IMO, that task can be delegated to someone else (maybe a background
> worker called background cleaner or bg cleaner, of course, we can have
> a GUC to enable or disable it). The checkpoint can just write some

Right.  IMO it isn't optimal to have critical things like startup and
checkpointing depend on somewhat-unrelated tasks.  I understand the
desire to avoid adding additional processes, and maybe it is a bigger
hammer than what is necessary to reduce the impact, but it seemed like a
natural solution for this problem.  That being said, I'm all for
exploring other ways to handle this.

> Another idea could be to parallelize the checkpoint i.e. IIUC, the
> tasks that checkpoint do in CheckPointGuts are independent and if we
> have some counters like (how many snapshot/mapping files that the
> server generated)

Could you elaborate on this?  Is your idea that the checkpointer would
create worker processes like autovacuum does?

Nathan
On Fri, Dec 3, 2021 at 3:01 AM Bossart, Nathan <bossartn@amazon.com> wrote:
> Right. IMO it isn't optimal to have critical things like startup and
> checkpointing depend on somewhat-unrelated tasks. I understand the
> desire to avoid adding additional processes, and maybe it is a bigger
> hammer than what is necessary to reduce the impact, but it seemed like
> a natural solution for this problem. That being said, I'm all for
> exploring other ways to handle this.

Having a generic background cleaner process (controllable via a few GUCs),
which can delete a bunch of files (snapshot, mapping, old WAL, temp files
etc.) or do some other task on behalf of the checkpointer, seems to be the
easiest solution.  I'm open to other ideas, too.

> Could you elaborate on this? Is your idea that the checkpointer would
> create worker processes like autovacuum does?

Yes, I was thinking that the checkpointer creates one or more dynamic
background workers (we can assume one background worker for now) to delete
the files.  If a threshold is crossed (say, the snapshot file count exceeds
it), the new worker gets spawned, which would then enumerate the files and
delete the unneeded ones, and the checkpointer can proceed with the other
tasks and finish the checkpointing.

Having said this, I prefer the background cleaner approach over the dynamic
background worker.  The advantage of the background cleaner is that it can
do other tasks (like other kinds of file deletion).

Another idea could be to use the existing background writer to do the file
deletion while the checkpoint is happening.  But again, this might cause
problems because the bgwriter's flushing of dirty buffers would get delayed.

Regards,
Bharath Rupireddy.
On 12/3/21, 5:57 AM, "Bharath Rupireddy"
<bharath.rupireddyforpostgres@gmail.com> wrote:
> Having a generic background cleaner process (controllable via a few
> GUCs), which can delete a bunch of files (snapshot, mapping, old WAL,
> temp files etc.) or some other task on behalf of the checkpointer,
> seems to be the easiest solution.
>
> I'm too open for other ideas.

I might hack something together for the separate worker approach, if for
no other reason than to make sure I really understand how these functions
work.  If/when a better idea emerges, we can alter course.

Nathan
On Fri, Dec 3, 2021 at 11:50 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> I might hack something together for the separate worker approach, if
> for no other reason than to make sure I really understand how these
> functions work. If/when a better idea emerges, we can alter course.

Thanks.  As I said upthread, we've been discussing the approach of
offloading some of the checkpoint tasks (like deleting snapshot files)
internally for quite some time, and I would like to share a patch that
adds a new background cleaner process (currently able to delete the
logical replication snapshot files; if required, it can be extended to do
other tasks as well).  I don't mind if it gets rejected.  Please have a
look.

Regards,
Bharath Rupireddy.
Attachment
On 12/6/21, 3:44 AM, "Bharath Rupireddy"
<bharath.rupireddyforpostgres@gmail.com> wrote:
> On Fri, Dec 3, 2021 at 11:50 PM Bossart, Nathan <bossartn@amazon.com> wrote:
>> I might hack something together for the separate worker approach, if
>> for no other reason than to make sure I really understand how these
>> functions work. If/when a better idea emerges, we can alter course.
>
> Thanks. As I said upthread we've been discussing the approach of
> offloading some of the checkpoint tasks like (deleting snapshot files)
> internally for quite some time and I would like to share a patch that
> adds a new background cleaner process (currently able to delete the
> logical replication snapshot files, if required can be extended to do
> other tasks as well). I don't mind if it gets rejected. Please have a
> look.

Thanks for sharing!  I've also spent some time on a patch set, which I
intend to share once I have handling for all four tasks (so far I have
handling for CheckPointSnapBuild() and RemovePgTempFiles()).  I'll take a
look at your patch as well.

Nathan
On 12/6/21, 11:23 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> Thanks for sharing! I've also spent some time on a patch set, which I
> intend to share once I have handling for all four tasks (so far I have
> handling for CheckPointSnapBuild() and RemovePgTempFiles()). I'll
> take a look at your patch as well.

Well, I haven't had a chance to look at your patch, and my patch set
still only has handling for CheckPointSnapBuild() and
RemovePgTempFiles(), but I thought I'd share what I have anyway.  I split
it into 5 patches:

0001 - Adds a new "custodian" auxiliary process that does nothing.
0002 - During startup, remove the pgsql_tmp directories instead of only
       clearing the contents.
0003 - Split temporary file cleanup during startup into two stages.  The
       first renames the directories, and the second clears them.
0004 - Moves the second stage from 0003 to the custodian process.
0005 - Moves CheckPointSnapBuild() to the custodian process.

This is still very much a work in progress, and I've done minimal testing
so far.

Nathan
Attachment
On Fri, Dec 10, 2021 at 2:03 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> Well, I haven't had a chance to look at your patch, and my patch set
> still only has handling for CheckPointSnapBuild() and
> RemovePgTempFiles(), but I thought I'd share what I have anyway. I
> split it into 5 patches:
>
> 0001 - Adds a new "custodian" auxiliary process that does nothing.
> 0002 - During startup, remove the pgsql_tmp directories instead of
>        only clearing the contents.
> 0003 - Split temporary file cleanup during startup into two stages.
>        The first renames the directories, and the second clears them.
> 0004 - Moves the second stage from 0003 to the custodian process.
> 0005 - Moves CheckPointSnapBuild() to the custodian process.

I don't know whether this kind of idea is good or not.

One thing we've seen a number of times now is that entrusting the same
process with multiple responsibilities often ends poorly.  Sometimes it's
busy with one thing when another thing really needs to be done RIGHT NOW.
Perhaps that won't be an issue here since all of these things are related
to checkpointing, but then the process name should reflect that rather
than making it sound like we can just keep piling more responsibilities
onto this process indefinitely.  At some point that seems bound to become
an issue.

Another issue is that we don't want to increase the number of processes
without bound.  Processes use memory and CPU resources and if we run too
many of them it becomes a burden on the system.  Low-end systems may not
have too many resources in total, and high-end systems can struggle to
fit demanding workloads within the resources that they have.  Maybe it
would be cheaper to do more things at once if we were using threads
rather than processes, but that day still seems fairly far off.

But against all that, if these tasks are slowing down checkpoints and
that's avoidable, that seems pretty important too.  Interestingly, I
can't say that I've ever seen any of these things be a problem for
checkpoint or startup speed.  I wonder why you've had a different
experience.

--
Robert Haas
EDB: http://www.enterprisedb.com
On 12/13/21, 5:54 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> I don't know whether this kind of idea is good or not.

Thanks for chiming in.  I have an almost-complete patch set that I'm
hoping to post to the lists in the next couple of days.

> One thing we've seen a number of times now is that entrusting the same
> process with multiple responsibilities often ends poorly. Sometimes
> it's busy with one thing when another thing really needs to be done
> RIGHT NOW. Perhaps that won't be an issue here since all of these
> things are related to checkpointing, but then the process name should
> reflect that rather than making it sound like we can just keep piling
> more responsibilities onto this process indefinitely. At some point
> that seems bound to become an issue.

Two of the tasks are cleanup tasks that checkpointing handles at the
moment, and two are cleanup tasks that are done at startup.  For now, all
of these tasks are somewhat nonessential.  There's no requirement that
any of these tasks complete in order to finish startup or checkpointing.
In fact, outside of preventing the server from running out of disk space,
I don't think there's any requirement that these tasks run at all.  IMO
this would have to be a core tenet of a new auxiliary process like this.

That being said, I totally understand your point.  If there were a dozen
such tasks handled by a single auxiliary process, that could cause a new
set of problems.  Your checkpointing and startup might be fast, but you
might run out of disk space because our cleanup process can't handle it
all.  So a new worker could end up becoming an availability risk as well.

> Another issue is that we don't want to increase the number of
> processes without bound. Processes use memory and CPU resources and if
> we run too many of them it becomes a burden on the system. Low-end
> systems may not have too many resources in total, and high-end systems
> can struggle to fit demanding workloads within the resources that they
> have. Maybe it would be cheaper to do more things at once if we were
> using threads rather than processes, but that day still seems fairly
> far off.

I do agree that it is important to be very careful about adding new
processes, and if a better idea for how to handle these tasks emerges, I
will readily abandon my current approach.  Upthread, Andres mentioned
optimizing unnecessary snapshot files, and I mentioned possibly limiting
how much time startup and checkpoints spend on these tasks.  I don't have
too many details for the former, and for the latter, I'm worried about
not being able to keep up.  But if the prospect of adding a new auxiliary
process for this stuff is a non-starter, perhaps I should explore that
approach some more.

> But against all that, if these tasks are slowing down checkpoints and
> that's avoidable, that seems pretty important too. Interestingly, I
> can't say that I've ever seen any of these things be a problem for
> checkpoint or startup speed. I wonder why you've had a different
> experience.

Yeah, it's difficult for me to justify why users should suffer long
periods of downtime because startup or checkpointing is taking a very
long time doing things that are arguably unrelated to startup and
checkpointing.

Nathan
On 12/13/21, 9:20 AM, "Justin Pryzby" <pryzby@telsasoft.com> wrote:
> On Mon, Dec 13, 2021 at 08:53:37AM -0500, Robert Haas wrote:
>> Another issue is that we don't want to increase the number of
>> processes without bound. Processes use memory and CPU resources and if
>> we run too many of them it becomes a burden on the system. Low-end
>> systems may not have too many resources in total, and high-end systems
>> can struggle to fit demanding workloads within the resources that they
>> have. Maybe it would be cheaper to do more things at once if we were
>> using threads rather than processes, but that day still seems fairly
>> far off.
>
> Maybe that's an argument that this should be a dynamic background worker
> instead of an auxilliary process. Then maybe it would be controlled by
> max_parallel_maintenance_workers (or something similar). The checkpointer
> would need to do these tasks itself if parallel workers were disabled or
> couldn't be launched.

I think this is an interesting idea.  I dislike the prospect of having
two code paths for all this stuff, but if it addresses the concerns about
resource usage, maybe it's worth it.

Nathan
On Mon, Dec 13, 2021 at 1:21 PM Bossart, Nathan <bossartn@amazon.com> wrote: > > But against all that, if these tasks are slowing down checkpoints and > > that's avoidable, that seems pretty important too. Interestingly, I > > can't say that I've ever seen any of these things be a problem for > > checkpoint or startup speed. I wonder why you've had a different > > experience. > > Yeah, it's difficult for me to justify why users should suffer long > periods of downtime because startup or checkpointing is taking a very > long time doing things that are arguably unrelated to startup and > checkpointing. Well sure. But I've never actually seen that happen. -- Robert Haas EDB: http://www.enterprisedb.com
On 12/13/21, 12:37 PM, "Robert Haas" <robertmhaas@gmail.com> wrote: > On Mon, Dec 13, 2021 at 1:21 PM Bossart, Nathan <bossartn@amazon.com> wrote: >> > But against all that, if these tasks are slowing down checkpoints and >> > that's avoidable, that seems pretty important too. Interestingly, I >> > can't say that I've ever seen any of these things be a problem for >> > checkpoint or startup speed. I wonder why you've had a different >> > experience. >> >> Yeah, it's difficult for me to justify why users should suffer long >> periods of downtime because startup or checkpointing is taking a very >> long time doing things that are arguably unrelated to startup and >> checkpointing. > > Well sure. But I've never actually seen that happen. I'll admit that surprises me. As noted elsewhere [0], we were seeing this enough with pgsql_tmp that we started moving the directory aside before starting the server. Discussions about handling this usually prompt questions about why there are so many temporary files in the first place (which is fair). FWIW all four functions noted in my original message [1] are things I've seen firsthand affecting users. Nathan [0] https://postgr.es/m/E7573D54-A8C9-40A8-89D7-0596A36ED124%40amazon.com [1] https://postgr.es/m/C1EE64B0-D4DB-40F3-98C8-0CED324D34CB%40amazon.com
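For illustration, the move-aside trick described here amounts to something like the following sketch. Everything below (paths, names, plain POSIX calls) is invented for the example; it is not what PostgreSQL or any released tooling actually does:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical sketch: rename a temp directory out of the way and recreate
 * an empty one, so that startup doesn't have to unlink every file first.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
static int
move_tmpdir_aside(const char *path)
{
	char		staged[1024];

	snprintf(staged, sizeof(staged), "%s.old", path);
	if (rename(path, staged) != 0)
		return -1;
	return mkdir(path, 0700);
}
```

The rename is O(1) regardless of how many files the directory holds; the staged "*.old" directory can then be deleted at leisure by some background process.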
On Mon, Dec 13, 2021 at 11:05:46PM +0000, Bossart, Nathan wrote: > On 12/13/21, 12:37 PM, "Robert Haas" <robertmhaas@gmail.com> wrote: > > On Mon, Dec 13, 2021 at 1:21 PM Bossart, Nathan <bossartn@amazon.com> wrote: > >> > But against all that, if these tasks are slowing down checkpoints and > >> > that's avoidable, that seems pretty important too. Interestingly, I > >> > can't say that I've ever seen any of these things be a problem for > >> > checkpoint or startup speed. I wonder why you've had a different > >> > experience. > >> > >> Yeah, it's difficult for me to justify why users should suffer long > >> periods of downtime because startup or checkpointing is taking a very > >> long time doing things that are arguably unrelated to startup and > >> checkpointing. > > > > Well sure. But I've never actually seen that happen. > > I'll admit that surprises me. As noted elsewhere [0], we were seeing > this enough with pgsql_tmp that we started moving the directory aside > before starting the server. Discussions about handling this usually > prompt questions about why there are so many temporary files in the > first place (which is fair). FWIW all four functions noted in my > original message [1] are things I've seen firsthand affecting users. Have we changed temporary file handling in any recent major releases, meaning is this a current problem or one already improved in PG 14. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On 12/14/21, 9:00 AM, "Bruce Momjian" <bruce@momjian.us> wrote: > Have we changed temporary file handling in any recent major releases, > meaning is this a current problem or one already improved in PG 14. I haven't noticed any recent improvements while working in this area. Nathan
On 12/14/21, 12:09 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote: > On 12/14/21, 9:00 AM, "Bruce Momjian" <bruce@momjian.us> wrote: >> Have we changed temporary file handling in any recent major releases, >> meaning is this a current problem or one already improved in PG 14. > > I haven't noticed any recent improvements while working in this area. On second thought, the addition of the remove_temp_files_after_crash GUC is arguably an improvement since it could prevent files from accumulating after repeated crashes. Nathan
On 12/13/21, 10:21 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote: > Thanks for chiming in. I have an almost-complete patch set that I'm > hoping to post to the lists in the next couple of days. As promised, here is v2. This patch set includes handling for all four tasks noted upthread. I'd still consider this a work-in-progress, as I've done minimal testing. At the very least, it should demonstrate what an auxiliary process approach might look like. Nathan
Attachment
- v2-0001-Introduce-custodian.patch
- v2-0008-Move-removal-of-spilled-logical-slot-data-to-cust.patch
- v2-0007-Use-syncfs-in-CheckPointLogicalRewriteHeap-for-sh.patch
- v2-0006-Move-removal-of-old-logical-rewrite-mapping-files.patch
- v2-0005-Move-removal-of-old-serialized-snapshots-to-custo.patch
- v2-0004-Move-pgsql_tmp-file-removal-to-custodian-process.patch
- v2-0003-Split-pgsql_tmp-cleanup-into-two-stages.patch
- v2-0002-Also-remove-pgsql_tmp-directories-during-startup.patch
Hi, On 2021-12-14 20:23:57 +0000, Bossart, Nathan wrote: > As promised, here is v2. This patch set includes handling for all > four tasks noted upthread. I'd still consider this a work-in-progress, > as I've done minimal testing. At the very least, it should > demonstrate what an auxiliary process approach might look like. This generates a compiler warning: https://cirrus-ci.com/task/5740581082103808?logs=mingw_cross_warning#L378 Greetings, Andres Freund
On Mon, Jan 3, 2022 at 2:56 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2021-12-14 20:23:57 +0000, Bossart, Nathan wrote: > > As promised, here is v2. This patch set includes handling for all > > four tasks noted upthread. I'd still consider this a work-in-progress, > > as I've done minimal testing. At the very least, it should > > demonstrate what an auxiliary process approach might look like. > > This generates a compiler warning: > https://cirrus-ci.com/task/5740581082103808?logs=mingw_cross_warning#L378 > Somehow, I am not getting these compiler warnings on the latest master head (69872d0bbe6). Here are a few minor comments on the v2 version that I thought would help: + * Copyright (c) 2021, PostgreSQL Global Development Group Time to change the year :) -- + + /* These operations are really just a minimal subset of + * AbortTransaction(). We don't have very many resources to worry + * about. + */ Incorrect formatting, the first line should be empty in the multiline code comment. -- + XLogRecPtr logical_rewrite_mappings_cutoff; /* can remove older mappings */ + XLogRecPtr logical_rewrite_mappings_cutoff_set; It looks like logical_rewrite_mappings_cutoff gets set only once and never gets reset. If that is true, then I think the logical_rewrite_mappings_cutoff_set variable can be skipped completely; setting the initial logical_rewrite_mappings_cutoff to InvalidXLogRecPtr will do the needful. -- Regards, Amul
Thanks for your review. On 1/2/22, 11:00 PM, "Amul Sul" <sulamul@gmail.com> wrote: > On Mon, Jan 3, 2022 at 2:56 AM Andres Freund <andres@anarazel.de> wrote: >> This generates a compiler warning: >> https://cirrus-ci.com/task/5740581082103808?logs=mingw_cross_warning#L378 > > Somehow, I am not getting these compiler warnings on the latest master > head (69872d0bbe6). I attempted to fix this by including time.h in custodian.c. > Here are a few minor comments on the v2 version that I thought would help: > > + * Copyright (c) 2021, PostgreSQL Global Development Group > > Time to change the year :) Fixed in v3. > + > + /* These operations are really just a minimal subset of > + * AbortTransaction(). We don't have very many resources to worry > + * about. > + */ > > Incorrect formatting, the first line should be empty in the multiline > code comment. Fixed in v3. > + XLogRecPtr logical_rewrite_mappings_cutoff; /* can remove > older mappings */ > + XLogRecPtr logical_rewrite_mappings_cutoff_set; > > It looks like logical_rewrite_mappings_cutoff gets set only once and > never gets reset. If that is true, then I think the > logical_rewrite_mappings_cutoff_set variable can be skipped completely; > setting the initial logical_rewrite_mappings_cutoff to InvalidXLogRecPtr > will do the needful. I think the problem with this is that when the cutoff is InvalidXLogRecPtr, it is taken to mean that all logical rewrite files can be removed. If we just used the cutoff variable, we could remove files we need if the custodian ran before the cutoff was set. I suppose we could initially set the cutoff to MaxXLogRecPtr to indicate that the value is not yet set, but I see no real advantage to doing it that way versus just using a bool. Speaking of which, logical_rewrite_mappings_cutoff_set obviously should be a bool. I've fixed that in v3. Nathan
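For illustration, the shape being discussed here (a cutoff plus a separate "is set" flag, rather than a sentinel alone) might look roughly like the sketch below. The types and names are simplified stand-ins for the actual patch's shared state:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for XLogRecPtr and InvalidXLogRecPtr. */
typedef uint64_t DemoRecPtr;
#define DEMO_INVALID_PTR ((DemoRecPtr) 0)

typedef struct DemoSharedState
{
	DemoRecPtr	mappings_cutoff;		/* files older than this are removable */
	bool		mappings_cutoff_set;	/* has the cutoff been published yet? */
} DemoSharedState;

/*
 * May we remove a file whose LSN is file_lsn?  Until the cutoff has been
 * published, the only safe answer is "no"; once published, an invalid
 * cutoff is taken to mean that everything may go.
 */
static bool
can_remove_mapping(const DemoSharedState *state, DemoRecPtr file_lsn)
{
	if (!state->mappings_cutoff_set)
		return false;
	if (state->mappings_cutoff == DEMO_INVALID_PTR)
		return true;
	return file_lsn < state->mappings_cutoff;
}
```

With only the sentinel value, a reader that ran before the writer published the cutoff would see "invalid" and wrongly conclude that every file is removable, which is exactly the hazard described above.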
Attachment
- v3-0008-Move-removal-of-spilled-logical-slot-data-to-cust.patch
- v3-0007-Use-syncfs-in-CheckPointLogicalRewriteHeap-for-sh.patch
- v3-0006-Move-removal-of-old-logical-rewrite-mapping-files.patch
- v3-0005-Move-removal-of-old-serialized-snapshots-to-custo.patch
- v3-0004-Move-pgsql_tmp-file-removal-to-custodian-process.patch
- v3-0003-Split-pgsql_tmp-cleanup-into-two-stages.patch
- v3-0002-Also-remove-pgsql_tmp-directories-during-startup.patch
- v3-0001-Introduce-custodian.patch
The code seems to be in good condition. All the tests are running ok with no errors.
I like the whole idea of shifting additional checkpointer jobs as much as possible to another worker. In my view, it is more appropriate to call this worker "bg cleaner" or "bg file cleaner" or something similar.
It could be useful for systems with high load, which may deal with deleting many files at once, but I'm not sure about "small" installations. An extra bg worker needs more resources to do occasional deletion of small numbers of files. I really do not know how to do it better; maybe have two different code paths switched by a GUC?
Should we also think about adding WAL preallocation into the custodian worker from the patch "Pre-allocating WAL files" [1]?
Best regards,
Maxim Orlov.
On 1/14/22, 3:43 AM, "Maxim Orlov" <orlovmg@gmail.com> wrote: > The code seems to be in good condition. All the tests are running ok with no errors. Thanks for your review. > I like the whole idea of shifting additional checkpointer jobs as much as possible to another worker. In my view, it is more appropriate to call this worker "bg cleaner" or "bg file cleaner" or something similar. > > It could be useful for systems with high load, which may deal with deleting many files at once, but I'm not sure about "small" installations. An extra bg worker needs more resources to do occasional deletion of small numbers of files. I really do not know how to do it better; maybe have two different code paths switched by a GUC? I'd personally like to avoid creating two code paths for the same thing. Are there really cases when this one extra auxiliary process would be too many? And if so, how would a user know when to adjust this GUC? I understand the point that we should introduce new processes sparingly to avoid burdening low-end systems, but I don't think we should be afraid to add new ones when it is needed. That being said, if making the extra worker optional addresses the concerns about resource usage, maybe we should consider it. Justin suggested using something like max_parallel_maintenance_workers upthread [0]. > Should we also think about adding WAL preallocation into the custodian worker from the patch "Pre-allocating WAL files" [1]? This was brought up in the pre-allocation thread [1]. I don't think the custodian process would be the right place for it, and I'm also not as concerned about it because it will generally be a small, fixed, and configurable amount of work. In any case, I don't sense a ton of support for a new auxiliary process in this thread, so I'm hesitant to go down the same path for pre-allocation. Nathan [0] https://postgr.es/m/20211213171935.GX17618%40telsasoft.com [1] https://postgr.es/m/B2ACCC5A-F9F2-41D9-AC3B-251362A0A254%40amazon.com
On Sat, Jan 15, 2022 at 12:46 AM Bossart, Nathan <bossartn@amazon.com> wrote: > > On 1/14/22, 3:43 AM, "Maxim Orlov" <orlovmg@gmail.com> wrote: > > The code seems to be in good condition. All the tests are running ok with no errors. > > Thanks for your review. > > > I like the whole idea of shifting additional checkpointer jobs as much as possible to another worker. In my view, it is more appropriate to call this worker "bg cleaner" or "bg file cleaner" or something similar. I personally prefer "background cleaner" as the new process name, in line with "background writer" and "background worker". > > It could be useful for systems with high load, which may deal with deleting many files at once, but I'm not sure about "small" installations. An extra bg worker needs more resources to do occasional deletion of small numbers of files. I really do not know how to do it better; maybe have two different code paths switched by a GUC? > > I'd personally like to avoid creating two code paths for the same > thing. Are there really cases when this one extra auxiliary process > would be too many? And if so, how would a user know when to adjust > this GUC? I understand the point that we should introduce new > processes sparingly to avoid burdening low-end systems, but I don't > think we should be afraid to add new ones when it is needed. IMO, having a GUC for enabling/disabling this new worker and its related code would be a better idea. The reason is that if postgres has no replication slots at all (which is quite possible in real stand-alone production environments) or if the file enumeration (directory traversal and file removal) is fast enough on the servers, there's no point having this new worker; the checkpointer itself can take care of the work as it is doing today. > That being said, if making the extra worker optional addresses the > concerns about resource usage, maybe we should consider it. Justin > suggested using something like max_parallel_maintenance_workers > upthread [0].
I don't think this new process should be controlled by max_parallel_maintenance_workers; instead, I prefer to have it as an auxiliary process much like "background writer", "wal writer" and so on. I think now is the time for us to run some use cases and get the perf reports to see how beneficial this new process is going to be in terms of improving the checkpoint timings. > > Should we also think about adding WAL preallocation into the custodian worker from the patch "Pre-allocating WAL files" [1]? > > This was brought up in the pre-allocation thread [1]. I don't think > the custodian process would be the right place for it, and I'm also > not as concerned about it because it will generally be a small, fixed, > and configurable amount of work. In any case, I don't sense a ton of > support for a new auxiliary process in this thread, so I'm hesitant to > go down the same path for pre-allocation. > > [0] https://postgr.es/m/20211213171935.GX17618%40telsasoft.com > [1] https://postgr.es/m/B2ACCC5A-F9F2-41D9-AC3B-251362A0A254%40amazon.com I think the idea of weaving every non-critical task into a common background process is a good idea, but let's not mix that up with the new background cleaner process here for now, at least until we get some numbers and prove that the idea proposed here will be beneficial. Regards, Bharath Rupireddy.
On 1/14/22, 11:26 PM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: > On Sat, Jan 15, 2022 at 12:46 AM Bossart, Nathan <bossartn@amazon.com> wrote: >> I'd personally like to avoid creating two code paths for the same >> thing. Are there really cases when this one extra auxiliary process >> would be too many? And if so, how would a user know when to adjust >> this GUC? I understand the point that we should introduce new >> processes sparingly to avoid burdening low-end systems, but I don't >> think we should be afraid to add new ones when it is needed. > > IMO, having a GUC for enabling/disabling this new worker and its > related code would be a better idea. The reason is that if > postgres has no replication slots at all (which is quite possible in > real stand-alone production environments) or if the file enumeration > (directory traversal and file removal) is fast enough on the servers, > there's no point having this new worker; the checkpointer itself can > take care of the work as it is doing today. IMO introducing a GUC wouldn't be doing users many favors. Their cluster might work just fine for a long time before they begin encountering problems during startups/checkpoints. Once the user discovers the underlying reason, they have to then find a GUC for enabling a special background worker that makes this problem go away. Why not just fix the problem for everybody by default? I've been thinking about what other approaches we could take besides creating more processes. The root of the problem seems to be that there are a number of tasks that are performed synchronously that can take a long time. The process approach essentially makes these tasks asynchronous so that they do not block startup and checkpointing. But perhaps this can be done in an existing process, possibly even the checkpointer.
Like the current WAL pre-allocation patch, we could do this work when the checkpointer isn't checkpointing, and we could also do small amounts of work in CheckpointWriteDelay() (or a new function called in a similar way). In theory, this would help avoid delaying checkpoints too long while doing cleanup at every opportunity to lower the chances it falls far behind. Nathan
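As a rough sketch of that idea, a cleanup routine could be written to do a bounded amount of work per invocation, so that a caller such as a checkpoint-delay hook never stalls on it for long. Everything below (the helper name, the limit, plain POSIX calls) is illustrative only, not code from the patch set:

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Remove up to max_files regular entries from dirpath; return how many
 * were removed.  A return of 0 means the directory is already clean (or
 * could not be opened).  Callers can invoke this repeatedly in small
 * increments, e.g. once per CheckpointWriteDelay() cycle.
 */
static int
cleanup_some_files(const char *dirpath, int max_files)
{
	DIR		   *dir;
	struct dirent *de;
	int			removed = 0;

	dir = opendir(dirpath);
	if (dir == NULL)
		return 0;

	while (removed < max_files && (de = readdir(dir)) != NULL)
	{
		char		path[1024];

		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
			continue;
		snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);
		if (remove(path) == 0)
			removed++;
	}
	closedir(dir);
	return removed;
}
```

The bound keeps each call cheap; the trade-off, as noted above, is that cleanup might not keep up if files accumulate faster than the increments can remove them.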
Here is a rebased patch set. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
- v4-0001-Introduce-custodian.patch
- v4-0002-Also-remove-pgsql_tmp-directories-during-startup.patch
- v4-0003-Split-pgsql_tmp-cleanup-into-two-stages.patch
- v4-0004-Move-pgsql_tmp-file-removal-to-custodian-process.patch
- v4-0005-Move-removal-of-old-serialized-snapshots-to-custo.patch
- v4-0006-Move-removal-of-old-logical-rewrite-mapping-files.patch
- v4-0007-Use-syncfs-in-CheckPointLogicalRewriteHeap-for-sh.patch
- v4-0008-Move-removal-of-spilled-logical-slot-data-to-cust.patch
Here is another rebased patch set. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
- v5-0001-Introduce-custodian.patch
- v5-0002-Also-remove-pgsql_tmp-directories-during-startup.patch
- v5-0003-Split-pgsql_tmp-cleanup-into-two-stages.patch
- v5-0004-Move-pgsql_tmp-file-removal-to-custodian-process.patch
- v5-0005-Move-removal-of-old-serialized-snapshots-to-custo.patch
- v5-0006-Move-removal-of-old-logical-rewrite-mapping-files.patch
- v5-0007-Use-syncfs-in-CheckPointLogicalRewriteHeap-for-sh.patch
- v5-0008-Move-removal-of-spilled-logical-slot-data-to-cust.patch
Hi, On 2022-02-16 16:50:57 -0800, Nathan Bossart wrote: > + * The custodian process is new as of Postgres 15. I think this kind of comment tends to age badly and not be very useful. > It's main purpose is to > + * offload tasks that could otherwise delay startup and checkpointing, but > + * it needn't be restricted to just those things. Offloaded tasks should > + * not be synchronous (e.g., checkpointing shouldn't need to wait for the > + * custodian to complete a task before proceeding). Also, ensure that any > + * offloaded tasks are either not required during single-user mode or are > + * performed separately during single-user mode. > + * > + * The custodian is not an essential process and can shutdown quickly when > + * requested. The custodian will wake up approximately once every 5 > + * minutes to perform its tasks, but backends can (and should) set its > + * latch to wake it up sooner. Hm. This kind of policy makes it easy to introduce bugs where the occasional runs mask forgotten notifications etc. > + * Normal termination is by SIGTERM, which instructs the bgwriter to > + * exit(0). s/bgwriter/.../ > Emergency termination is by SIGQUIT; like any backend, the > + * custodian will simply abort and exit on SIGQUIT. > + * > + * If the custodian exits unexpectedly, the postmaster treats that the same > + * as a backend crash: shared memory may be corrupted, so remaining > + * backends should be killed by SIGQUIT and then a recovery cycle started. This doesn't really seem useful stuff to me. > + /* > + * If an exception is encountered, processing resumes here. > + * > + * You might wonder why this isn't coded as an infinite loop around a > + * PG_TRY construct. The reason is that this is the bottom of the > + * exception stack, and so with PG_TRY there would be no exception handler > + * in force at all during the CATCH part. By leaving the outermost setjmp > + * always active, we have at least some chance of recovering from an error > + * during error recovery.
(If we get into an infinite loop thereby, it > + * will soon be stopped by overflow of elog.c's internal state stack.) > + * > + * Note that we use sigsetjmp(..., 1), so that the prevailing signal mask > + * (to wit, BlockSig) will be restored when longjmp'ing to here. Thus, > + * signals other than SIGQUIT will be blocked until we complete error > + * recovery. It might seem that this policy makes the HOLD_INTERRUPS() > + * call redundant, but it is not since InterruptPending might be set > + * already. > + */ I think it's bad to copy this comment into even more places. > + /* Since not using PG_TRY, must reset error stack by hand */ > + if (sigsetjmp(local_sigjmp_buf, 1) != 0) > + { I also think it's a bad idea to introduce even more copies of the error handling body. I think we need to unify this. And yes, it's unfair to stick you with it, but it's been a while since a new aux process has been added. > + /* > + * These operations are really just a minimal subset of > + * AbortTransaction(). We don't have very many resources to worry > + * about. > + */ Given what you're proposing this for, are you actually confident that we don't need more than this? > From d9826f75ad2259984d55fc04622f0b91ebbba65a Mon Sep 17 00:00:00 2001 > From: Nathan Bossart <bossartn@amazon.com> > Date: Sun, 5 Dec 2021 19:38:20 -0800 > Subject: [PATCH v5 2/8] Also remove pgsql_tmp directories during startup. > > Presently, the server only removes the contents of the temporary > directories during startup, not the directory itself. This changes > that to prepare for future commits that will move temporary file > cleanup to a separate auxiliary process. Is this actually safe? Is there a guarantee no process can access a temp table stored in one of these? Because without WAL guaranteeing consistency, we can't just access e.g. temp tables written before a crash. 
> +extern void RemovePgTempDir(const char *tmpdirname, bool missing_ok, > + bool unlink_all); I don't like functions with multiple consecutive booleans, they tend to get swapped around. Why not just split unlink_all=true/false into different functions? > Subject: [PATCH v5 3/8] Split pgsql_tmp cleanup into two stages. > > First, pgsql_tmp directories will be renamed to stage them for > removal. What if the target name already exists? > Then, all files in pgsql_tmp are removed before removing > the staged directories themselves. This change is being made in > preparation for a follow-up change to offload most temporary file > cleanup to the new custodian process. > > Note that temporary relation files cannot be cleaned up via the > aforementioned strategy and will not be offloaded to the custodian. This should be in the prior commit message, otherwise people will ask the same question as I did. > + /* > + * Find a name for the stage directory. We just increment an integer at the > + * end of the name until we find one that doesn't exist. > + */ > + for (int n = 0; n <= INT_MAX; n++) > + { > + snprintf(stage_path, sizeof(stage_path), "%s/%s%d", parent_path, > + PG_TEMP_DIR_TO_REMOVE_PREFIX, n); Uninterruptible loops up to INT_MAX do not seem like a good idea. > + dir = AllocateDir(stage_path); > + if (dir == NULL) > + { Why not just use stat()? That's cheaper, and there's no time-to-check-time-to-use issue here, we're the only one writing. > + if (errno == ENOENT) > + break; > + > + ereport(LOG, > + (errcode_for_file_access(), > + errmsg("could not open directory \"%s\": %m", > + stage_path))); I think this kind of lenience is just hiding bugs. 
> File > PathNameCreateTemporaryFile(const char *path, bool error_on_failure) > @@ -3175,7 +3178,8 @@ RemovePgTempFiles(bool stage, bool remove_relation_files) > */ > spc_dir = AllocateDir("pg_tblspc"); > > - while ((spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL) > + while (!ShutdownRequestPending && > + (spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL) Uh, huh? It strikes me as a supremely bad idea to have functions *silently* not do their jobs when ShutdownRequestPending is set, particularly without a huge fat comment. > { > if (strcmp(spc_de->d_name, ".") == 0 || > strcmp(spc_de->d_name, "..") == 0) > @@ -3211,6 +3215,14 @@ RemovePgTempFiles(bool stage, bool remove_relation_files) > * would create a race condition. It's done separately, earlier in > * postmaster startup. > */ > + > + /* > + * If we just staged some pgsql_tmp directories for removal, wake up the > + * custodian process so that it deletes all the files in the staged > + * directories as well as the directories themselves. > + */ > + if (stage && ProcGlobal->custodianLatch) > + SetLatch(ProcGlobal->custodianLatch); Just signalling without letting the custodian know what it's expected to do strikes me as a bad idea. > From 9c2013d53cc5c857ef8aca3df044613e66215aee Mon Sep 17 00:00:00 2001 > From: Nathan Bossart <bossartn@amazon.com> > Date: Sun, 5 Dec 2021 22:02:40 -0800 > Subject: [PATCH v5 5/8] Move removal of old serialized snapshots to custodian. > > This was only done during checkpoints because it was a convenient > place to put it. However, if there are many snapshots to remove, > it can significantly extend checkpoint time. To avoid this, move > this work to the newly-introduced custodian process. 
--- > src/backend/access/transam/xlog.c | 2 -- > src/backend/postmaster/custodian.c | 11 +++++++++++ > src/backend/replication/logical/snapbuild.c | 13 +++++++------ > src/include/replication/snapbuild.h | 2 +- > 4 files changed, 19 insertions(+), 9 deletions(-) Why does this not open us up to new xid wraparound issues? Before there was a hard bound on how long these files could linger around. Now there's not anymore. > - while ((snap_de = ReadDir(snap_dir, "pg_logical/snapshots")) != NULL) > + while (!ShutdownRequestPending && > + (snap_de = ReadDir(snap_dir, "pg_logical/snapshots")) != NULL) I really really strenuously object to these checks. > Subject: [PATCH v5 6/8] Move removal of old logical rewrite mapping files to > custodian. > If there are many such files to remove, checkpoints can take much > longer. To avoid this, move this work to the newly-introduced > custodian process. Same wraparound concerns. > +#include "postmaster/bgwriter.h" I think it's a bad idea to put these functions into bgwriter.h > From cfca62dd55d7be7e0025e5625f18d3ab9180029c Mon Sep 17 00:00:00 2001 > From: Nathan Bossart <bossartn@amazon.com> > Date: Mon, 13 Dec 2021 20:20:12 -0800 > Subject: [PATCH v5 7/8] Use syncfs() in CheckPointLogicalRewriteHeap() for > shutdown and end-of-recovery checkpoints. > > This may save quite a bit of time when there are many mapping files > to flush to disk. Seems like a mostly independent proposal. > +#ifdef HAVE_SYNCFS > + > + /* > + * If we are doing a shutdown or end-of-recovery checkpoint, let's use > + * syncfs() to flush the mappings to disk instead of flushing each one > + * individually. This may save us quite a bit of time when there are many > + * such files to flush. > + */ I am doubtful this is a good idea. This will cause all dirty files to be written back, even ones we don't need to be written back. At once. Very possibly *slowing down* the shutdown. What is even the theory of the case here?
That there's so many dirty mapping files that fsyncing them will take too long? That iterating would take too long? > From b5923b1b76a1fab6c21d6aec086219160473f464 Mon Sep 17 00:00:00 2001 > From: Nathan Bossart <nathandbossart@gmail.com> > Date: Fri, 11 Feb 2022 09:43:57 -0800 > Subject: [PATCH v5 8/8] Move removal of spilled logical slot data to > custodian. > > If there are many such files, startup can take much longer than > necessary. To handle this, startup creates a new slot directory, > copies the state file, and swaps the new directory with the old > one. The custodian then asynchronously cleans up the old slot > directory. You guessed it: I don't see what prevents wraparound issues. > 5 files changed, 317 insertions(+), 9 deletions(-) This seems like such an increase in complexity and fragility that I really doubt this is a good idea. > +/* > + * This function renames the given directory with a special suffix that the > + * custodian will know to look for. An integer is appended to the end of the > + * new directory name in case previously staged slot directories have not yet > + * been removed. > + */ > +static void > +StageSlotDirForRemoval(const char *slotname, const char *slotpath) > +{ > + char stage_path[MAXPGPATH]; > + > + /* > + * Find a name for the stage directory. We just increment an integer at the > + * end of the name until we find one that doesn't exist. > + */ > + for (int n = 0; n <= INT_MAX; n++) > + { > + DIR *dir; > + > + sprintf(stage_path, "pg_replslot/%s.to_remove_%d", slotname, n); > + > + dir = AllocateDir(stage_path); > + if (dir == NULL) > + { > + if (errno == ENOENT) > + break; > + > + ereport(ERROR, > + (errcode_for_file_access(), > + errmsg("could not open directory \"%s\": %m", > + stage_path))); > + } > + FreeDir(dir); > + > + stage_path[0] = '\0'; > + } Copy of "find free name" logic. Greetings, Andres Freund
Hi Andres, I appreciate the feedback. On Wed, Feb 16, 2022 at 05:50:52PM -0800, Andres Freund wrote: >> + /* Since not using PG_TRY, must reset error stack by hand */ >> + if (sigsetjmp(local_sigjmp_buf, 1) != 0) >> + { > > I also think it's a bad idea to introduce even more copies of the error > handling body. I think we need to unify this. And yes, it's unfair to stick > you with it, but it's been a while since a new aux process has been added. +1, I think this is useful refactoring. I might spin this off to its own thread. >> + /* >> + * These operations are really just a minimal subset of >> + * AbortTransaction(). We don't have very many resources to worry >> + * about. >> + */ > > Given what you're proposing this for, are you actually confident that we don't > need more than this? I will give this a closer look. >> +extern void RemovePgTempDir(const char *tmpdirname, bool missing_ok, >> + bool unlink_all); > > I don't like functions with multiple consecutive booleans, they tend to get > swapped around. Why not just split unlink_all=true/false into different > functions? Will do. >> Subject: [PATCH v5 3/8] Split pgsql_tmp cleanup into two stages. >> >> First, pgsql_tmp directories will be renamed to stage them for >> removal. > > What if the target name already exists? The integer at the end of the target name is incremented until we find a unique name. >> Note that temporary relation files cannot be cleaned up via the >> aforementioned strategy and will not be offloaded to the custodian. > > This should be in the prior commit message, otherwise people will ask the same > question as I did. Will do. >> + /* >> + * Find a name for the stage directory. We just increment an integer at the >> + * end of the name until we find one that doesn't exist. 
>> + */ >> + for (int n = 0; n <= INT_MAX; n++) >> + { >> + snprintf(stage_path, sizeof(stage_path), "%s/%s%d", parent_path, >> + PG_TEMP_DIR_TO_REMOVE_PREFIX, n); > > Uninterruptible loops up to INT_MAX do not seem like a good idea. I modeled this after ChooseRelationName() in indexcmds.c. Looking again, I see that it loops forever until a unique name is found. I suspect this is unlikely to be a problem in practice. What strategy would you recommend for choosing a unique name? Should we just append a couple of random characters? >> + dir = AllocateDir(stage_path); >> + if (dir == NULL) >> + { > > Why not just use stat()? That's cheaper, and there's no > time-to-check-time-to-use issue here, we're the only one writing. I'm not sure why I didn't use stat(). I will update this. >> - while ((spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL) >> + while (!ShutdownRequestPending && >> + (spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL) > > Uh, huh? It strikes me as a supremely bad idea to have functions *silently* > not do their jobs when ShutdownRequestPending is set, particularly without a > huge fat comment. The idea was to avoid delaying shutdown because we're waiting for the custodian to finish relatively nonessential tasks. Another option might be to just exit immediately when the custodian receives a shutdown request. >> + /* >> + * If we just staged some pgsql_tmp directories for removal, wake up the >> + * custodian process so that it deletes all the files in the staged >> + * directories as well as the directories themselves. >> + */ >> + if (stage && ProcGlobal->custodianLatch) >> + SetLatch(ProcGlobal->custodianLatch); > > Just signalling without letting the custodian know what it's expected to do > strikes me as a bad idea. Good point. I will work on that. 
>> From 9c2013d53cc5c857ef8aca3df044613e66215aee Mon Sep 17 00:00:00 2001 >> From: Nathan Bossart <bossartn@amazon.com> >> Date: Sun, 5 Dec 2021 22:02:40 -0800 >> Subject: [PATCH v5 5/8] Move removal of old serialized snapshots to custodian. >> >> This was only done during checkpoints because it was a convenient >> place to put it. However, if there are many snapshots to remove, >> it can significantly extend checkpoint time. To avoid this, move >> this work to the newly-introduced custodian process. >> --- >> src/backend/access/transam/xlog.c | 2 -- >> src/backend/postmaster/custodian.c | 11 +++++++++++ >> src/backend/replication/logical/snapbuild.c | 13 +++++++------ >> src/include/replication/snapbuild.h | 2 +- >> 4 files changed, 19 insertions(+), 9 deletions(-) > > Why does this not open us up to new xid wraparound issues? Before there was a > hard bound on how long these files could linger around. Now there's not > anymore. Sorry, I'm probably missing something obvious, but I'm not sure how this adds transaction ID wraparound risk. These files are tied to LSNs, and AFAIK they won't impact slots' xmins. >> +#ifdef HAVE_SYNCFS >> + >> + /* >> + * If we are doing a shutdown or end-of-recovery checkpoint, let's use >> + * syncfs() to flush the mappings to disk instead of flushing each one >> + * individually. This may save us quite a bit of time when there are many >> + * such files to flush. >> + */ > > I am doubtful this is a good idea. This will cause all dirty files to be > written back, even ones we don't need to be written back. At once. Very > possibly *slowing down* the shutdown. > > What is even the theory of the case here? That there's so many dirty mapping > files that fsyncing them will take too long? That iterating would take too > long? Well, yes. My idea was to model this after 61752af, which allows using syncfs() instead of individually fsync-ing every file in the data directory. 
However, I would likely need to introduce a GUC because 1) as you pointed out, it might be slower and 2) syncfs() doesn't report errors on older versions of Linux. TBH I do feel like this one is a bit of a stretch, so I am okay with leaving it out for now. >> 5 files changed, 317 insertions(+), 9 deletions(-) > > This seems such an increase in complexity and fragility that I really doubt > this is a good idea. I think that's a fair point. I'm okay with leaving this one out for now, too. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2022-02-16 20:14:04 -0800, Nathan Bossart wrote: > >> - while ((spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL) > >> + while (!ShutdownRequestPending && > >> + (spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL) > > > > Uh, huh? It strikes me as a supremely bad idea to have functions *silently* > > not do their jobs when ShutdownRequestPending is set, particularly without a > > huge fat comment. > > The idea was to avoid delaying shutdown because we're waiting for the > custodian to finish relatively nonessential tasks. Another option might be > to just exit immediately when the custodian receives a shutdown request. I think we should just not do either of these and let the functions finish. For the cases where shutdown really needs to be immediate there's, uhm, immediate mode shutdowns. > > Why does this not open us up to new xid wraparound issues? Before there was a > > hard bound on how long these files could linger around. Now there's not > > anymore. > > Sorry, I'm probably missing something obvious, but I'm not sure how this > adds transaction ID wraparound risk. These files are tied to LSNs, and > AFAIK they won't impact slots' xmins. They're accessed by xid. The LSN is just for cleanup. Accessing files left over from a previous transaction with the same xid wouldn't be good - we'd read wrong catalog state for decoding... Andres
On Wed, Feb 16, 2022 at 10:59:38PM -0800, Andres Freund wrote: > On 2022-02-16 20:14:04 -0800, Nathan Bossart wrote: >> >> - while ((spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL) >> >> + while (!ShutdownRequestPending && >> >> + (spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL) >> > >> > Uh, huh? It strikes me as a supremely bad idea to have functions *silently* >> > not do their jobs when ShutdownRequestPending is set, particularly without a >> > huge fat comment. >> >> The idea was to avoid delaying shutdown because we're waiting for the >> custodian to finish relatively nonessential tasks. Another option might be >> to just exit immediately when the custodian receives a shutdown request. > > I think we should just not do either of these and let the functions > finish. For the cases where shutdown really needs to be immediate > there's, uhm, immediate mode shutdowns. Alright. >> > Why does this not open us up to new xid wraparound issues? Before there was a >> > hard bound on how long these files could linger around. Now there's not >> > anymore. >> >> Sorry, I'm probably missing something obvious, but I'm not sure how this >> adds transaction ID wraparound risk. These files are tied to LSNs, and >> AFAIK they won't impact slots' xmins. > > They're accessed by xid. The LSN is just for cleanup. Accessing files > left over from a previous transaction with the same xid wouldn't be > good - we'd read wrong catalog state for decoding... Okay, that part makes sense to me. However, I'm still confused about how this is handled today and why moving cleanup to a separate auxiliary process makes matters worse. I've done quite a bit of reading, and I haven't found anything that seems intended to prevent this problem. Do you have any pointers? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2022-02-17 10:23:37 -0800, Nathan Bossart wrote: > On Wed, Feb 16, 2022 at 10:59:38PM -0800, Andres Freund wrote: > > They're accessed by xid. The LSN is just for cleanup. Accessing files > > left over from a previous transaction with the same xid wouldn't be > > good - we'd read wrong catalog state for decoding... > > Okay, that part makes sense to me. However, I'm still confused about how > this is handled today and why moving cleanup to a separate auxiliary > process makes matters worse. Right now cleanup happens every checkpoint. So cleanup can't be deferred all that far. We currently include a bunch of 32bit xids inside checkpoints, so if they're rarer than 2^31-1, we're in trouble independent of logical decoding. But with this patch cleanup of logical decoding mapping files (and other pieces) can be *indefinitely* deferred, without being noticeable. One possible way to improve this would be to switch the on-disk filenames to be based on 64bit xids. But that might also present some problems (file name length, cost of converting 32bit xids to 64bit xids). > I've done quite a bit of reading, and I haven't found anything that seems > intended to prevent this problem. Do you have any pointers? I don't know if we have an iron-clad enforcement of checkpoints happening every 2^31-1 xids. It's very unlikely to happen - you'd run out of space etc. But it'd be good to have something better than that. Greetings, Andres Freund
On Thu, Feb 17, 2022 at 11:27:09AM -0800, Andres Freund wrote: > On 2022-02-17 10:23:37 -0800, Nathan Bossart wrote: >> On Wed, Feb 16, 2022 at 10:59:38PM -0800, Andres Freund wrote: >> > They're accessed by xid. The LSN is just for cleanup. Accessing files >> > left over from a previous transaction with the same xid wouldn't be >> > good - we'd read wrong catalog state for decoding... >> >> Okay, that part makes sense to me. However, I'm still confused about how >> this is handled today and why moving cleanup to a separate auxiliary >> process makes matters worse. > > Right now cleanup happens every checkpoint. So cleanup can't be deferred all > that far. We currently include a bunch of 32bit xids inside checkpoints, so > if they're rarer than 2^31-1, we're in trouble independent of logical > decoding. > > But with this patch cleanup of logical decoding mapping files (and other > pieces) can be *indefinitely* deferred, without being noticeable. I see. The custodian should ordinarily remove the files as quickly as possible. In fact, I bet it will typically line up with checkpoints for most users, as the checkpointer will set the latch. However, if there are many temporary files to clean up, removing the logical decoding files could be delayed for some time, as you said. > One possible way to improve this would be to switch the on-disk filenames to > be based on 64bit xids. But that might also present some problems (file name > length, cost of converting 32bit xids to 64bit xids). Okay. >> I've done quite a bit of reading, and I haven't found anything that seems > > intended to prevent this problem. Do you have any pointers? > > I don't know if we have an iron-clad enforcement of checkpoints happening > every 2^31-1 xids. It's very unlikely to happen - you'd run out of space > etc. But it'd be good to have something better than that. Okay. So IIUC the problem might already exist today, but offloading these tasks to a separate process could make it more likely. 
-- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2022-02-17 13:00:22 -0800, Nathan Bossart wrote: > Okay. So IIUC the problem might already exist today, but offloading these > tasks to a separate process could make it more likely. Vastly more, yes. Before, checkpoints not happening would be a form of backpressure (though not a great one). You can't cancel them without triggering a crash-restart. Whereas custodian can be cancelled etc. As I said before, I think this is tackling things from the wrong end. Instead of moving the sometimes-expensive task out of the way while leaving it expensive, the focus should be to make the expensive task cheaper. As far as I understand, the primary concern is logical decoding serialized snapshots, because a lot of them can accumulate if there e.g. is an old unused / far behind slot. It should be easy to reduce the number of those snapshots by e.g. eliding some redundant ones. Perhaps we could also make backends in logical decoding occasionally do a bit of cleanup themselves. I've not seen reports of the number of mapping files to be a real issue? The improvements around deleting temporary files and serialized snapshots afaict don't require a dedicated process - they're only relevant during startup. We could use the approach of renaming the directory out of the way as done in this patchset but perform the cleanup in the startup process after we're up. Greetings, Andres Freund
On Thu, Feb 17, 2022 at 02:28:29PM -0800, Andres Freund wrote: > As far as I understand, the primary concern is logical decoding serialized > snapshots, because a lot of them can accumulate if there e.g. is an old unused > / far behind slot. It should be easy to reduce the number of those snapshots > by e.g. eliding some redundant ones. Perhaps we could also make backends in > logical decoding occasionally do a bit of cleanup themselves. > > I've not seen reports of the number of mapping files to be a real issue? I routinely see all four of these tasks impacting customers, but I'd say the most common one is the temporary file cleanup. Besides eliminating some redundant files and having backends perform some cleanup, what do you think about skipping the logical decoding cleanup during end-of-recovery/shutdown checkpoints? This was something that Bharath brought up a while back [0]. As I noted in that thread, startup and shutdown could still take a while if checkpoints are regularly delayed due to logical decoding cleanup, but that might still help avoid a bit of downtime. > The improvements around deleting temporary files and serialized snapshots > afaict don't require a dedicated process - they're only relevant during > startup. We could use the approach of renaming the directory out of the way as > done in this patchset but perform the cleanup in the startup process after > we're up. Perhaps this is a good place to start. As I mentioned above, IME the temporary file cleanup is the most common problem, so I think even getting that one fixed would be a huge improvement. [0] https://postgr.es/m/CALj2ACXkkSL8EBpR7m%3DMt%3DyRGBhevcCs3x4fsp3Bc-D13yyHOg%40mail.gmail.com -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2022-02-17 14:58:38 -0800, Nathan Bossart wrote: > On Thu, Feb 17, 2022 at 02:28:29PM -0800, Andres Freund wrote: > > As far as I understand, the primary concern is logical decoding serialized > > snapshots, because a lot of them can accumulate if there e.g. is an old unused > > / far behind slot. It should be easy to reduce the number of those snapshots > > by e.g. eliding some redundant ones. Perhaps we could also make backends in > > logical decoding occasionally do a bit of cleanup themselves. > > > > I've not seen reports of the number of mapping files to be a real issue? > > I routinely see all four of these tasks impacting customers, but I'd say > the most common one is the temporary file cleanup. I took temp file cleanup and StartupReorderBuffer() "out of consideration" for custodian, because they're not needed during normal running. > Besides eliminating some redundant files and having backends perform some > cleanup, what do you think about skipping the logical decoding cleanup > during end-of-recovery/shutdown checkpoints? I strongly disagree with it. Then you might never get the cleanup done, but keep on operating until you hit corruption issues. > > The improvements around deleting temporary files and serialized snapshots > > afaict don't require a dedicated process - they're only relevant during > > startup. We could use the approach of renaming the directory out of the way as > > done in this patchset but perform the cleanup in the startup process after > > we're up. > > Perhaps this is a good place to start. As I mentioned above, IME the > temporary file cleanup is the most common problem, so I think even getting > that one fixed would be a huge improvement. Cool. Greetings, Andres Freund
On Thu, Feb 17, 2022 at 03:12:47PM -0800, Andres Freund wrote: >> > The improvements around deleting temporary files and serialized snapshots >> > afaict don't require a dedicated process - they're only relevant during >> > startup. We could use the approach of renaming the directory out of the way as >> > done in this patchset but perform the cleanup in the startup process after >> > we're up. >> >> Perhaps this is a good place to start. As I mentioned above, IME the >> temporary file cleanup is the most common problem, so I think even getting >> that one fixed would be a huge improvement. > > Cool. Hm. How should this work for standbys? I can think of the following options: 1. Do temporary file cleanup in the postmaster (as it does today). 2. Pause after allowing connections to clean up temporary files. 3. Do small amounts of temporary file cleanup whenever there is an opportunity during recovery. 4. Wait until recovery completes before cleaning up temporary files. I'm not too thrilled about any of these options. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
> On Thu, Feb 17, 2022 at 03:12:47PM -0800, Andres Freund wrote: >>> > The improvements around deleting temporary files and serialized snapshots >>> > afaict don't require a dedicated process - they're only relevant during >>> > startup. We could use the approach of renaming the directory out of the way as >>> > done in this patchset but perform the cleanup in the startup process after >>> > we're up. BTW I know you don't like the dedicated process approach, but one improvement to that approach could be to shut down the custodian process when it has nothing to do. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
It seems unlikely that anything discussed in this thread will be committed for v15, so I've adjusted the commitfest entry to v16 and moved it to the next commitfest. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Fri, 18 Feb 2022 at 20:51, Nathan Bossart <nathandbossart@gmail.com> wrote: > > > On Thu, Feb 17, 2022 at 03:12:47PM -0800, Andres Freund wrote: > >>> > The improvements around deleting temporary files and serialized snapshots > >>> > afaict don't require a dedicated process - they're only relevant during > >>> > startup. We could use the approach of renaming the directory out of the way as > >>> > done in this patchset but perform the cleanup in the startup process after > >>> > we're up. > > BTW I know you don't like the dedicated process approach, but one > improvement to that approach could be to shut down the custodian process > when it has nothing to do. Having a central cleanup process makes a lot of sense. There is a long list of potential tasks for such a process. My understanding is that autovacuum already has an interface for handling additional workload types, which is how BRIN indexes are handled. Do we really need a new process? If so, let's do this now. Nathan's point that certain tasks are blocking fast startup is a good one, and higher availability is a critical end goal. The thought that we should complete these tasks during checkpoint is a good one, but checkpoints should NOT be delayed by long running tasks that are secondary to availability. Andres' point that it would be better to avoid long running tasks is good, if that is possible. That can be done better over time. This point does not block the higher level goal of better availability asap, so I support Nathan's overall proposals. -- Simon Riggs http://www.EnterpriseDB.com/
On Thu, Jun 23, 2022 at 7:58 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > Having a central cleanup process makes a lot of sense. There is a long > list of potential tasks for such a process. My understanding is that > autovacuum already has an interface for handling additional workload > types, which is how BRIN indexes are handled. Do we really need a new > process? It seems to me that if there's a long list of possible tasks for such a process, that's actually a trickier situation than if there were only one or two, because it may happen that when task X is really urgent, the process is already busy with task Y. I don't think that piggybacking more stuff onto autovacuum is a very good idea for this exact reason. We already know that autovacuum workers can get so busy that they can't keep up with the need to vacuum and analyze tables. If we give them more things to do, that figures to make it worse, at least on busy systems. I do agree that a general mechanism for getting cleanup tasks done in the background could be a useful thing to have, but I feel like it's hard to see exactly how to make it work well. We can't just allow it to spin up a million new processes, but at the same time, if it can't guarantee that time-critical tasks get performed relatively quickly, it's pretty worthless. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, 23 Jun 2022 at 14:46, Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Jun 23, 2022 at 7:58 AM Simon Riggs > <simon.riggs@enterprisedb.com> wrote: > > Having a central cleanup process makes a lot of sense. There is a long > > list of potential tasks for such a process. My understanding is that > > autovacuum already has an interface for handling additional workload > > types, which is how BRIN indexes are handled. Do we really need a new > > process? > > It seems to me that if there's a long list of possible tasks for such > a process, that's actually a trickier situation than if there were > only one or two, because it may happen that when task X is really > urgent, the process is already busy with task Y. > > I don't think that piggybacking more stuff onto autovacuum is a very > good idea for this exact reason. We already know that autovacuum > workers can get so busy that they can't keep up with the need to > vacuum and analyze tables. If we give them more things to do, that > figures to make it worse, at least on busy systems. > > I do agree that a general mechanism for getting cleanup tasks done in > the background could be a useful thing to have, but I feel like it's > hard to see exactly how to make it work well. We can't just allow it > to spin up a million new processes, but at the same time, if it can't > guarantee that time-critical tasks get performed relatively quickly, > it's pretty worthless. Most of the tasks mentioned aren't time critical. I have no objection to a new auxiliary process to execute those tasks, which can be spawned when needed. -- Simon Riggs http://www.EnterpriseDB.com/
On Thu, Jun 23, 2022 at 09:46:28AM -0400, Robert Haas wrote: > I do agree that a general mechanism for getting cleanup tasks done in > the background could be a useful thing to have, but I feel like it's > hard to see exactly how to make it work well. We can't just allow it > to spin up a million new processes, but at the same time, if it can't > guarantee that time-critical tasks get performed relatively quickly, > it's pretty worthless. My intent with this new auxiliary process is to offload tasks that aren't particularly time-critical. They are only time-critical in the sense that 1) you might eventually run out of space and 2) you might encounter wraparound with the logical replication files. But AFAICT these same risks exist today in the checkpointer approach, although maybe not to the same extent. In any case, 2 seems solvable to me outside of this patch set. I'm grateful for the discussion in this thread so far, but I'm not seeing a clear path forward. I'm glad to see threads like the one to stop doing end-of-recovery checkpoints [0], but I don't know if it will be possible to solve all of these availability concerns in a piecemeal fashion. I remain open to exploring other suggested approaches beyond creating a new auxiliary process. [0] https://postgr.es/m/CA%2BTgmobrM2jvkiccCS9NgFcdjNSgAvk1qcAPx5S6F%2BoJT3D2mQ%40mail.gmail.com -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Thu, 23 Jun 2022 at 18:15, Nathan Bossart <nathandbossart@gmail.com> wrote: > I'm grateful for the discussion in this thread so far, but I'm not seeing a > clear path forward. +1 to add the new auxiliary process. -- Simon Riggs http://www.EnterpriseDB.com/
On Fri, Jun 24, 2022 at 11:45:22AM +0100, Simon Riggs wrote: > On Thu, 23 Jun 2022 at 18:15, Nathan Bossart <nathandbossart@gmail.com> wrote: >> I'm grateful for the discussion in this thread so far, but I'm not seeing a >> clear path forward. > > +1 to add the new auxiliary process. I went ahead and put together a new patch set for this in which I've attempted to address most of the feedback from upthread. Notably, I've abandoned 0007 and 0008, added a way for processes to request specific tasks for the custodian, and removed all the checks for ShutdownRequestPending. I haven't addressed the existing transaction ID wraparound risk with the logical replication files. My instinct is that this deserves its own thread, and it might need to be considered a prerequisite to this change based on the prior discussion here. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
- v6-0001-Introduce-custodian.patch
- v6-0002-Also-remove-pgsql_tmp-directories-during-startup.patch
- v6-0003-Split-pgsql_tmp-cleanup-into-two-stages.patch
- v6-0004-Move-pgsql_tmp-file-removal-to-custodian-process.patch
- v6-0005-Move-removal-of-old-serialized-snapshots-to-custo.patch
- v6-0006-Move-removal-of-old-logical-rewrite-mapping-files.patch
Hi, On 2022-07-02 15:05:54 -0700, Nathan Bossart wrote: > + /* Obtain requested tasks */ > + SpinLockAcquire(&CustodianShmem->cust_lck); > + flags = CustodianShmem->cust_flags; > + CustodianShmem->cust_flags = 0; > + SpinLockRelease(&CustodianShmem->cust_lck); Just resetting the flags to 0 is problematic. Consider what happens if there's two tasks and the one processed first errors out. You'll lose information about needing to run the second task. > + /* TODO: offloaded tasks go here */ Seems we're going to need some sorting of which tasks are most "urgent" / need to be processed next if we plan to make this into some generic facility. > +/* > + * RequestCustodian > + * Called to request a custodian task. > + */ > +void > +RequestCustodian(int flags) > +{ > + SpinLockAcquire(&CustodianShmem->cust_lck); > + CustodianShmem->cust_flags |= flags; > + SpinLockRelease(&CustodianShmem->cust_lck); > + > + if (ProcGlobal->custodianLatch) > + SetLatch(ProcGlobal->custodianLatch); > +} With this representation we can't really implement waiting for a task or such. And it doesn't seem like a great API for the caller to just specify a mix of flags. > + /* Calculate how long to sleep */ > + end_time = (pg_time_t) time(NULL); > + elapsed_secs = end_time - start_time; > + if (elapsed_secs >= CUSTODIAN_TIMEOUT_S) > + continue; /* no sleep for us */ > + cur_timeout = CUSTODIAN_TIMEOUT_S - elapsed_secs; > + > + (void) WaitLatch(MyLatch, > + WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, > + cur_timeout * 1000L /* convert to ms */ , > + WAIT_EVENT_CUSTODIAN_MAIN); > + } I don't think we should have this thing wake up on a regular basis. We're doing way too much of that already, and I don't think we should add more. Either we need a list of times when tasks need to be processed and wake up at that time, or just wake up if somebody requests a task. 
> From 5e95666efa31d6c8aa351e430c37ead6e27acb72 Mon Sep 17 00:00:00 2001 > From: Nathan Bossart <bossartn@amazon.com> > Date: Sun, 5 Dec 2021 21:16:44 -0800 > Subject: [PATCH v6 3/6] Split pgsql_tmp cleanup into two stages. > > First, pgsql_tmp directories will be renamed to stage them for > removal. Then, all files in pgsql_tmp are removed before removing > the staged directories themselves. This change is being made in > preparation for a follow-up change to offload most temporary file > cleanup to the new custodian process. > Note that temporary relation files cannot be cleaned up via the > aforementioned strategy and will not be offloaded to the custodian. > --- > src/backend/postmaster/postmaster.c | 8 +- > src/backend/storage/file/fd.c | 174 ++++++++++++++++++++++++---- > src/include/storage/fd.h | 2 +- > 3 files changed, 160 insertions(+), 24 deletions(-) > > diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c > index e67370012f..82aa0c6307 100644 > --- a/src/backend/postmaster/postmaster.c > +++ b/src/backend/postmaster/postmaster.c > @@ -1402,7 +1402,8 @@ PostmasterMain(int argc, char *argv[]) > * Remove old temporary files. At this point there can be no other > * Postgres processes running in this directory, so this should be safe. > */ > - RemovePgTempFiles(); > + RemovePgTempFiles(true, true); > + RemovePgTempFiles(false, false); This is imo hard to read and easy to get wrong. Make it multiple functions or pass named flags in. > + * StagePgTempDirForRemoval > + * > + * This function renames the given directory with a special prefix that > + * RemoveStagedPgTempDirs() will know to look for. An integer is appended to > + * the end of the new directory name in case previously staged pgsql_tmp > + * directories have not yet been removed. > + */ It doesn't seem great to need to iterate through a directory that contains other files, potentially a significant number. 
How about having a staged_for_removal/ directory, and then only scanning that? > +static void > +StagePgTempDirForRemoval(const char *tmp_dir) > +{ > + DIR *dir; > + char stage_path[MAXPGPATH * 2]; > + char parent_path[MAXPGPATH * 2]; > + struct stat statbuf; > + > + /* > + * If tmp_dir doesn't exist, there is nothing to stage. > + */ > + dir = AllocateDir(tmp_dir); > + if (dir == NULL) > + { > + if (errno != ENOENT) > + ereport(LOG, > + (errcode_for_file_access(), > + errmsg("could not open directory \"%s\": %m", tmp_dir))); > + return; > + } > + FreeDir(dir); > + > + strlcpy(parent_path, tmp_dir, MAXPGPATH * 2); > + get_parent_directory(parent_path); > + > + /* > + * get_parent_directory() returns an empty string if the input argument is > + * just a file name (see comments in path.c), so handle that as being the > + * current directory. > + */ > + if (strlen(parent_path) == 0) > + strlcpy(parent_path, ".", MAXPGPATH * 2); > + > + /* > + * Find a name for the stage directory. We just increment an integer at the > + * end of the name until we find one that doesn't exist. > + */ > + for (int n = 0; n <= INT_MAX; n++) > + { > + snprintf(stage_path, sizeof(stage_path), "%s/%s%d", parent_path, > + PG_TEMP_DIR_TO_REMOVE_PREFIX, n); > + > + if (stat(stage_path, &statbuf) != 0) > + { > + if (errno == ENOENT) > + break; > + > + ereport(LOG, > + (errcode_for_file_access(), > + errmsg("could not stat file \"%s\": %m", stage_path))); > + return; > + } > + > + stage_path[0] = '\0'; I still dislike this approach. Loops until INT_MAX, not interruptible... Can't we prevent conflicts by adding a timestamp or such? > + } > + > + /* > + * In the unlikely event that we couldn't find a name for the stage > + * directory, bail out. > + */ > + if (stage_path[0] == '\0') > + { > + ereport(LOG, > + (errmsg("could not stage \"%s\" for deletion", > + tmp_dir))); > + return; > + } That's imo very much not ok. 
Just continuing in unexpected situations is a recipe for introducing bugs / being hard to debug. > From 43042799b96b588a446c509637b5acf570e2a325 Mon Sep 17 00:00:00 2001 > From a58a6bb70785a557a150680b64cd8ce78ce1b73a Mon Sep 17 00:00:00 2001 > From: Nathan Bossart <bossartn@amazon.com> > Date: Sun, 5 Dec 2021 22:02:40 -0800 > Subject: [PATCH v6 5/6] Move removal of old serialized snapshots to custodian. > > This was only done during checkpoints because it was a convenient > place to put it. As mentioned before, having it done as part of checkpoints provides pretty decent wraparound protection - yes, it's not theoretically perfect, but in reality it's very unlikely you can have an xid wraparound within one checkpoint. I've mentioned this before, so at the very least I'd like to see this acknowledged in the commit message. > However, if there are many snapshots to remove, it can significantly extend > checkpoint time. I'd really like to see a reproducer or profile for this... > + /* > + * Remove serialized snapshots that are no longer required by any > + * logical replication slot. > + * > + * It is not important for these to be removed in single-user mode, so > + * we don't need any extra handling outside of the custodian process for > + * this. > + */ I don't think this claim is correct. > From 0add8bb19a4ee83c6a6ec1f313329d737bf304a5 Mon Sep 17 00:00:00 2001 > From: Nathan Bossart <bossartn@amazon.com> > Date: Sun, 12 Dec 2021 22:07:11 -0800 > Subject: [PATCH v6 6/6] Move removal of old logical rewrite mapping files to > custodian. > > If there are many such files to remove, checkpoints can take much > longer. To avoid this, move this work to the newly-introduced > custodian process. As above I'd like to know why this could take that long. What are you doing that there's so many mapping files (which only exist for catalog tables!) that this is a significant fraction of a checkpoint? 
> ---
>  src/backend/access/heap/rewriteheap.c | 79 +++++++++++++++++++++++----
>  src/backend/postmaster/custodian.c    | 44 +++++++++++++++
>  src/include/access/rewriteheap.h      |  1 +
>  src/include/postmaster/custodian.h    |  5 ++
>  4 files changed, 119 insertions(+), 10 deletions(-)
>
> diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
> index 2a53826736..edeab65e60 100644
> --- a/src/backend/access/heap/rewriteheap.c
> +++ b/src/backend/access/heap/rewriteheap.c
> @@ -116,6 +116,7 @@
>  #include "lib/ilist.h"
>  #include "miscadmin.h"
>  #include "pgstat.h"
> +#include "postmaster/custodian.h"
>  #include "replication/logical.h"
>  #include "replication/slot.h"
>  #include "storage/bufmgr.h"
> @@ -1182,7 +1183,8 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
>   * Perform a checkpoint for logical rewrite mappings
>   *
>   * This serves two tasks:
>- * 1) Remove all mappings not needed anymore based on the logical restart LSN
>+ * 1) Alert the custodian to remove all mappings not needed anymore based on the
>+ *    logical restart LSN
>   * 2) Flush all remaining mappings to disk, so that replay after a checkpoint
>   *    only has to deal with the parts of a mapping that have been written out
>   *    after the checkpoint started.
> @@ -1210,6 +1212,10 @@ CheckPointLogicalRewriteHeap(void)
>  	if (cutoff != InvalidXLogRecPtr && redo < cutoff)
>  		cutoff = redo;
>
> +	/* let the custodian know what it can remove */
> +	CustodianSetLogicalRewriteCutoff(cutoff);

Setting this variable in a custodian datastructure and then fetching it from there seems architecturally wrong to me.

> +	RequestCustodian(CUSTODIAN_REMOVE_REWRITE_MAPPINGS);

What about single user mode?

ISTM that RequestCustodian() needs to either assert out if called in single user mode, or execute tasks immediately in that context.

> +
> +/*
> + * Remove all mappings not needed anymore based on the logical restart LSN saved
> + * by the checkpointer.  We use this saved value instead of calling
> + * ReplicationSlotsComputeLogicalRestartLSN() so that we don't interfere with an
> + * ongoing call to CheckPointLogicalRewriteHeap() that is flushing mappings to
> + * disk.
> + */

What interference could there be?

> +void
> +RemoveOldLogicalRewriteMappings(void)
> +{
> +	XLogRecPtr	cutoff;
> +	DIR		   *mappings_dir;
> +	struct dirent *mapping_de;
> +	char		path[MAXPGPATH + 20];
> +	bool		value_set = false;
> +
> +	cutoff = CustodianGetLogicalRewriteCutoff(&value_set);
> +	if (!value_set)
> +		return;

Afaics nothing clears value_set - is that a good idea?

Greetings,

Andres Freund
Hi Andres,

Thanks for the prompt review.

On Sat, Jul 02, 2022 at 03:54:56PM -0700, Andres Freund wrote:
> On 2022-07-02 15:05:54 -0700, Nathan Bossart wrote:
>> +	/* Obtain requested tasks */
>> +	SpinLockAcquire(&CustodianShmem->cust_lck);
>> +	flags = CustodianShmem->cust_flags;
>> +	CustodianShmem->cust_flags = 0;
>> +	SpinLockRelease(&CustodianShmem->cust_lck);
>
> Just resetting the flags to 0 is problematic. Consider what happens if there's
> two tasks and the one processed first errors out. You'll lose information
> about needing to run the second task.

I think we also want to retry any failed tasks. The way v6 handles this is by requesting all tasks after an exception. Another way to handle this could be to reset each individual flag before the task is executed, and then we could surround each one with a PG_CATCH block that resets the flag. I'll do it this way in the next revision.

>> +/*
>> + * RequestCustodian
>> + *		Called to request a custodian task.
>> + */
>> +void
>> +RequestCustodian(int flags)
>> +{
>> +	SpinLockAcquire(&CustodianShmem->cust_lck);
>> +	CustodianShmem->cust_flags |= flags;
>> +	SpinLockRelease(&CustodianShmem->cust_lck);
>> +
>> +	if (ProcGlobal->custodianLatch)
>> +		SetLatch(ProcGlobal->custodianLatch);
>> +}
>
> With this representation we can't really implement waiting for a task or
> such. And it doesn't seem like a great API for the caller to just specify a
> mix of flags.

At the moment, the idea is that nothing should need to wait for a task because the custodian only handles things that are relatively non-critical. If that changes, this could probably be expanded to look more like RequestCheckpoint(). What would you suggest using instead of a mix of flags?
>> +		/* Calculate how long to sleep */
>> +		end_time = (pg_time_t) time(NULL);
>> +		elapsed_secs = end_time - start_time;
>> +		if (elapsed_secs >= CUSTODIAN_TIMEOUT_S)
>> +			continue;			/* no sleep for us */
>> +		cur_timeout = CUSTODIAN_TIMEOUT_S - elapsed_secs;
>> +
>> +		(void) WaitLatch(MyLatch,
>> +						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
>> +						 cur_timeout * 1000L /* convert to ms */ ,
>> +						 WAIT_EVENT_CUSTODIAN_MAIN);
>> +	}
>
> I don't think we should have this thing wake up on a regular basis. We're
> doing way too much of that already, and I don't think we should add
> more. Either we need a list of times when tasks need to be processed and wake
> up at that time, or just wake up if somebody requests a task.

I agree. I will remove the timeout in the next revision.

>> diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
>> index e67370012f..82aa0c6307 100644
>> --- a/src/backend/postmaster/postmaster.c
>> +++ b/src/backend/postmaster/postmaster.c
>> @@ -1402,7 +1402,8 @@ PostmasterMain(int argc, char *argv[])
>>  	 * Remove old temporary files. At this point there can be no other
>>  	 * Postgres processes running in this directory, so this should be safe.
>>  	 */
>> -	RemovePgTempFiles();
>> +	RemovePgTempFiles(true, true);
>> +	RemovePgTempFiles(false, false);
>
> This is imo hard to read and easy to get wrong. Make it multiple functions or
> pass named flags in.

Will do.

>> + * StagePgTempDirForRemoval
>> + *
>> + * This function renames the given directory with a special prefix that
>> + * RemoveStagedPgTempDirs() will know to look for. An integer is appended to
>> + * the end of the new directory name in case previously staged pgsql_tmp
>> + * directories have not yet been removed.
>> + */
>
> It doesn't seem great to need to iterate through a directory that contains
> other files, potentially a significant number. How about having a
> staged_for_removal/ directory, and then only scanning that?
Yeah, that seems like a good idea. Will do.

>> +	/*
>> +	 * Find a name for the stage directory. We just increment an integer at the
>> +	 * end of the name until we find one that doesn't exist.
>> +	 */
>> +	for (int n = 0; n <= INT_MAX; n++)
>> +	{
>> +		snprintf(stage_path, sizeof(stage_path), "%s/%s%d", parent_path,
>> +				 PG_TEMP_DIR_TO_REMOVE_PREFIX, n);
>> +
>> +		if (stat(stage_path, &statbuf) != 0)
>> +		{
>> +			if (errno == ENOENT)
>> +				break;
>> +
>> +			ereport(LOG,
>> +					(errcode_for_file_access(),
>> +					 errmsg("could not stat file \"%s\": %m", stage_path)));
>> +			return;
>> +		}
>> +
>> +		stage_path[0] = '\0';
>
> I still dislike this approach. Loops until INT_MAX, not interruptible... Can't
> we prevent conflicts by adding a timestamp or such?

I suppose it's highly unlikely that we'd see a conflict if we used the timestamp instead. I'll do it this way in the next revision if that seems good enough.

>> From a58a6bb70785a557a150680b64cd8ce78ce1b73a Mon Sep 17 00:00:00 2001
>> From: Nathan Bossart <bossartn@amazon.com>
>> Date: Sun, 5 Dec 2021 22:02:40 -0800
>> Subject: [PATCH v6 5/6] Move removal of old serialized snapshots to custodian.
>>
>> This was only done during checkpoints because it was a convenient
>> place to put it.
>
> As mentioned before, having it done as part of checkpoints provides pretty
> decent wraparound protection - yes, it's not theoretically perfect, but in
> reality it's very unlikely you can have an xid wraparound within one
> checkpoint. I've mentioned this before, so at the very least I'd like to see
> this acknowledged in the commit message.

Will do.

>> +	/* let the custodian know what it can remove */
>> +	CustodianSetLogicalRewriteCutoff(cutoff);
>
> Setting this variable in a custodian datastructure and then fetching it from
> there seems architecturally wrong to me.

Where do you think it should go?
I previously had it in the checkpointer's shared memory, but you didn't like that the functions were declared in bgwriter.h (along with the other checkpoint stuff). If the checkpointer shared memory is the right place, should we create checkpointer.h and use that instead?

>> +	RequestCustodian(CUSTODIAN_REMOVE_REWRITE_MAPPINGS);
>
> What about single user mode?
>
> ISTM that RequestCustodian() needs to either assert out if called in single
> user mode, or execute tasks immediately in that context.

I like the idea of executing the tasks immediately since that's what happens today in single-user mode. I will try doing it that way.

>> +/*
>> + * Remove all mappings not needed anymore based on the logical restart LSN saved
>> + * by the checkpointer.  We use this saved value instead of calling
>> + * ReplicationSlotsComputeLogicalRestartLSN() so that we don't interfere with an
>> + * ongoing call to CheckPointLogicalRewriteHeap() that is flushing mappings to
>> + * disk.
>> + */
>
> What interference could there be?

My concern is that the custodian could obtain a later cutoff than what the checkpointer does, which might cause files to be concurrently unlinked and fsync'd. If we always use the checkpointer's cutoff, that shouldn't be a problem. This could probably be better explained in this comment.

>> +void
>> +RemoveOldLogicalRewriteMappings(void)
>> +{
>> +	XLogRecPtr	cutoff;
>> +	DIR		   *mappings_dir;
>> +	struct dirent *mapping_de;
>> +	char		path[MAXPGPATH + 20];
>> +	bool		value_set = false;
>> +
>> +	cutoff = CustodianGetLogicalRewriteCutoff(&value_set);
>> +	if (!value_set)
>> +		return;
>
> Afaics nothing clears value_set - is that a good idea?

I'm using value_set to differentiate the case where InvalidXLogRecPtr means the checkpointer hasn't determined a value yet versus the case where it has. In the former, we don't want to take any action. In the latter, we want to unlink all the files.
Since we're moving to a request model for the custodian, I might be able to remove this value_set stuff completely. If that's not possible, it probably deserves a better comment.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Hi,

On 2022-07-03 10:07:54 -0700, Nathan Bossart wrote:
> Thanks for the prompt review.
>
> On Sat, Jul 02, 2022 at 03:54:56PM -0700, Andres Freund wrote:
> > On 2022-07-02 15:05:54 -0700, Nathan Bossart wrote:
> >> +	/* Obtain requested tasks */
> >> +	SpinLockAcquire(&CustodianShmem->cust_lck);
> >> +	flags = CustodianShmem->cust_flags;
> >> +	CustodianShmem->cust_flags = 0;
> >> +	SpinLockRelease(&CustodianShmem->cust_lck);
> >
> > Just resetting the flags to 0 is problematic. Consider what happens if there's
> > two tasks and the one processed first errors out. You'll lose information
> > about needing to run the second task.
>
> I think we also want to retry any failed tasks.

I don't think so, at least not if it's just going to retry that task straight away - then we'll get stuck on that one task forever. If we had the ability to "queue" it at the end, to be processed after other already dequeued tasks, it'd be a different story.

> The way v6 handles this is by requesting all tasks after an exception.

Ick. That strikes me as a bad idea.

> >> +/*
> >> + * RequestCustodian
> >> + *		Called to request a custodian task.
> >> + */
> >> +void
> >> +RequestCustodian(int flags)
> >> +{
> >> +	SpinLockAcquire(&CustodianShmem->cust_lck);
> >> +	CustodianShmem->cust_flags |= flags;
> >> +	SpinLockRelease(&CustodianShmem->cust_lck);
> >> +
> >> +	if (ProcGlobal->custodianLatch)
> >> +		SetLatch(ProcGlobal->custodianLatch);
> >> +}
> >
> > With this representation we can't really implement waiting for a task or
> > such. And it doesn't seem like a great API for the caller to just specify a
> > mix of flags.
>
> At the moment, the idea is that nothing should need to wait for a task
> because the custodian only handles things that are relatively non-critical.

Which is just plainly not true as the patchset stands...
I think we're going to have to block if some cleanup as part of a checkpoint hasn't been completed by the next checkpoint - otherwise it'll just end up being way too confusing and there's absolutely no backpressure anymore.

> If that changes, this could probably be expanded to look more like
> RequestCheckpoint().
>
> What would you suggest using instead of a mix of flags?

I suspect an array of tasks with requested and completed counters or such? With a condition variable to wait on?

> >> +	/* let the custodian know what it can remove */
> >> +	CustodianSetLogicalRewriteCutoff(cutoff);
> >
> > Setting this variable in a custodian datastructure and then fetching it from
> > there seems architecturally wrong to me.
>
> Where do you think it should go? I previously had it in the checkpointer's
> shared memory, but you didn't like that the functions were declared in
> bgwriter.h (along with the other checkpoint stuff). If the checkpointer
> shared memory is the right place, should we create checkpointer.h and use
> that instead?

Well, so far I have not understood what the whole point of the shared state is, so I have a bit of a hard time answering this ;)

> >> +/*
> >> + * Remove all mappings not needed anymore based on the logical restart LSN saved
> >> + * by the checkpointer.  We use this saved value instead of calling
> >> + * ReplicationSlotsComputeLogicalRestartLSN() so that we don't interfere with an
> >> + * ongoing call to CheckPointLogicalRewriteHeap() that is flushing mappings to
> >> + * disk.
> >> + */
> >
> > What interference could there be?
>
> My concern is that the custodian could obtain a later cutoff than what the
> checkpointer does, which might cause files to be concurrently unlinked and
> fsync'd. If we always use the checkpointer's cutoff, that shouldn't be a
> problem. This could probably be better explained in this comment.

How about having a Datum argument to RequestCustodian() that is forwarded to the task?
> >> +void
> >> +RemoveOldLogicalRewriteMappings(void)
> >> +{
> >> +	XLogRecPtr	cutoff;
> >> +	DIR		   *mappings_dir;
> >> +	struct dirent *mapping_de;
> >> +	char		path[MAXPGPATH + 20];
> >> +	bool		value_set = false;
> >> +
> >> +	cutoff = CustodianGetLogicalRewriteCutoff(&value_set);
> >> +	if (!value_set)
> >> +		return;
> >
> > Afaics nothing clears value_set - is that a good idea?
>
> I'm using value_set to differentiate the case where InvalidXLogRecPtr means
> the checkpointer hasn't determined a value yet versus the case where it
> has. In the former, we don't want to take any action. In the latter, we
> want to unlink all the files. Since we're moving to a request model for
> the custodian, I might be able to remove this value_set stuff completely.
> If that's not possible, it probably deserves a better comment.

It would.

Greetings,

Andres Freund
Here's a new revision where I've attempted to address all the feedback I've received thus far. Notably, the custodian now uses a queue for registering tasks and determining which tasks to execute. Other changes include splitting the temporary file functions apart to avoid consecutive boolean flags, using a timestamp instead of an integer for the staging name for temporary directories, moving temporary directories to a dedicated directory so that the custodian doesn't need to scan relation files, ERROR-ing when something goes wrong when cleaning up temporary files, executing requested tasks immediately in single-user mode, and more. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
- v7-0001-Introduce-custodian.patch
- v7-0002-Also-remove-pgsql_tmp-directories-during-startup.patch
- v7-0003-Split-pgsql_tmp-cleanup-into-two-stages.patch
- v7-0004-Move-pgsql_tmp-file-removal-to-custodian-process.patch
- v7-0005-Move-removal-of-old-serialized-snapshots-to-custo.patch
- v7-0006-Move-removal-of-old-logical-rewrite-mapping-files.patch
On Wed, Jul 06, 2022 at 09:51:10AM -0700, Nathan Bossart wrote: > Here's a new revision where I've attempted to address all the feedback I've > received thus far. Notably, the custodian now uses a queue for registering > tasks and determining which tasks to execute. Other changes include > splitting the temporary file functions apart to avoid consecutive boolean > flags, using a timestamp instead of an integer for the staging name for > temporary directories, moving temporary directories to a dedicated > directory so that the custodian doesn't need to scan relation files, > ERROR-ing when something goes wrong when cleaning up temporary files, > executing requested tasks immediately in single-user mode, and more. Here is a rebased patch set for cfbot. There are no other differences between v7 and v8. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
- v8-0001-Introduce-custodian.patch
- v8-0002-Also-remove-pgsql_tmp-directories-during-startup.patch
- v8-0003-Split-pgsql_tmp-cleanup-into-two-stages.patch
- v8-0004-Move-pgsql_tmp-file-removal-to-custodian-process.patch
- v8-0005-Move-removal-of-old-serialized-snapshots-to-custo.patch
- v8-0006-Move-removal-of-old-logical-rewrite-mapping-files.patch
On Thu, Aug 11, 2022 at 04:09:21PM -0700, Nathan Bossart wrote: > Here is a rebased patch set for cfbot. There are no other differences > between v7 and v8. Another rebase for cfbot. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
- v9-0001-Introduce-custodian.patch
- v9-0002-Also-remove-pgsql_tmp-directories-during-startup.patch
- v9-0003-Split-pgsql_tmp-cleanup-into-two-stages.patch
- v9-0004-Move-pgsql_tmp-file-removal-to-custodian-process.patch
- v9-0005-Move-removal-of-old-serialized-snapshots-to-custo.patch
- v9-0006-Move-removal-of-old-logical-rewrite-mapping-files.patch
On Wed, Aug 24, 2022 at 09:46:24AM -0700, Nathan Bossart wrote: > Another rebase for cfbot. And another. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
- v10-0001-Introduce-custodian.patch
- v10-0002-Also-remove-pgsql_tmp-directories-during-startup.patch
- v10-0003-Split-pgsql_tmp-cleanup-into-two-stages.patch
- v10-0004-Move-pgsql_tmp-file-removal-to-custodian-process.patch
- v10-0005-Move-removal-of-old-serialized-snapshots-to-cust.patch
- v10-0006-Move-removal-of-old-logical-rewrite-mapping-file.patch
On Fri, Sep 02, 2022 at 03:07:44PM -0700, Nathan Bossart wrote: > And another. v11 adds support for building with meson. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
- v11-0001-Introduce-custodian.patch
- v11-0002-Also-remove-pgsql_tmp-directories-during-startup.patch
- v11-0003-Split-pgsql_tmp-cleanup-into-two-stages.patch
- v11-0004-Move-pgsql_tmp-file-removal-to-custodian-process.patch
- v11-0005-Move-removal-of-old-serialized-snapshots-to-cust.patch
- v11-0006-Move-removal-of-old-logical-rewrite-mapping-file.patch
On Fri, Sep 23, 2022 at 10:41:54AM -0700, Nathan Bossart wrote: > v11 adds support for building with meson. rebased -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
- v12-0001-Introduce-custodian.patch
- v12-0002-Also-remove-pgsql_tmp-directories-during-startup.patch
- v12-0003-Split-pgsql_tmp-cleanup-into-two-stages.patch
- v12-0004-Move-pgsql_tmp-file-removal-to-custodian-process.patch
- v12-0005-Move-removal-of-old-serialized-snapshots-to-cust.patch
- v12-0006-Move-removal-of-old-logical-rewrite-mapping-file.patch
On Sun, Nov 06, 2022 at 02:38:42PM -0800, Nathan Bossart wrote: > rebased another rebase for cfbot -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
- v13-0001-Introduce-custodian.patch
- v13-0002-Also-remove-pgsql_tmp-directories-during-startup.patch
- v13-0003-Split-pgsql_tmp-cleanup-into-two-stages.patch
- v13-0004-Move-pgsql_tmp-file-removal-to-custodian-process.patch
- v13-0005-Move-removal-of-old-serialized-snapshots-to-cust.patch
- v13-0006-Move-removal-of-old-logical-rewrite-mapping-file.patch
On Thu, 24 Nov 2022 at 00:19, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Sun, Nov 06, 2022 at 02:38:42PM -0800, Nathan Bossart wrote:
> > rebased
>
> another rebase for cfbot

0001 seems good to me
* I like that it sleeps forever until requested
* not sure I believe that everything it does can always be aborted out of and shut down - to achieve that you will need CHECK_FOR_INTERRUPTS() calls in the loops in patches 5 and 6 at least
* not sure why you want immediate execution of custodian tasks - I feel supporting two modes will be a lot harder. For me, I would run locally when !IsUnderPostmaster and also in an Assert build, so we can test it works right - i.e. running in its own process is just a production optimization for performance (which is the stated reason for having this)

0005 seems good from what I know
* There is no check to see if it worked in any sane time
* It seems possible that "Old" might change meaning - will that make it break/fail?

0006 seems good also
* same comments as for 5

Rather than explicitly use DEBUG1 everywhere I would have a
#define CUSTODIAN_LOG_LEVEL LOG
so we can run with it in LOG mode and then set it to DEBUG1 with a one line change in a later phase of Beta

I can't really comment with knowledge on sub-patches 0002 to 0004.

Perhaps you should aim to get 1, 5, 6 committed first and then return to the others in a later CF/separate thread?

--
Simon Riggs
http://www.EnterpriseDB.com/
Thanks for taking a look!

On Thu, Nov 24, 2022 at 05:31:02PM +0000, Simon Riggs wrote:
> * not sure I believe that everything it does can always be aborted out
> of and shutdown - to achieve that you will need a
> CHECK_FOR_INTERRUPTS() calls in the loops in patches 5 and 6 at least

I did something like this earlier, but was advised to simply let the functions finish as usual during shutdown [0]. I think this is what the checkpointer process does today, anyway.

> * not sure why you want immediate execution of custodian tasks - I
> feel supporting two modes will be a lot harder. For me, I would run
> locally when !IsUnderPostmaster and also in an Assert build, so we can
> test it works right - i.e. running in its own process is just a
> production optimization for performance (which is the stated reason
> for having this)

I added this because 0004 involves requesting a task from the postmaster, so checking for IsUnderPostmaster doesn't work. Those tasks would always run immediately. However, we could use IsPostmasterEnvironment instead, which would allow us to remove the "immediate" argument. I did it this way in v14.

I'm not sure about running locally in Assert builds. It's true that would help ensure there's test coverage for the task logic, but it would also reduce coverage for the custodian logic. And in general, I'm worried about having Assert builds use a different code path than production builds.

> 0005 seems good from what I know
> * There is no check to see if it worked in any sane time

What did you have in mind? Should the custodian begin emitting WARNINGs after a while?

> * It seems possible that "Old" might change meaning - will that make
> it break/fail?

I don't believe so.

> Rather than explicitly use DEBUG1 everywhere I would have a
> #define CUSTODIAN_LOG_LEVEL LOG
> so we can run with it in LOG mode and then set it to DEBUG1 with a one
> line change in a later phase of Beta

I can create a separate patch for this, but I don't think I've ever seen this sort of thing before. Is the idea just to help with debugging during the development phase?

> I can't really comment with knowledge on sub-patches 0002 to 0004.
>
> Perhaps you should aim to get 1, 5, 6 committed first and then return
> to the others in a later CF/separate thread?

That seems like a good idea since those are all relatively self-contained. I removed 0002-0004 in v14.

[0] https://postgr.es/m/20220217065938.x2esfdppzypegn5j%40alap3.anarazel.de

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment
On Sun, 27 Nov 2022 at 23:34, Nathan Bossart <nathandbossart@gmail.com> wrote: > > Thanks for taking a look! > > On Thu, Nov 24, 2022 at 05:31:02PM +0000, Simon Riggs wrote: > > * not sure I believe that everything it does can always be aborted out > > of and shutdown - to achieve that you will need a > > CHECK_FOR_INTERRUPTS() calls in the loops in patches 5 and 6 at least > > I did something like this earlier, but was advised to simply let the > functions finish as usual during shutdown [0]. I think this is what the > checkpointer process does today, anyway. If we say "The custodian is not an essential process and can shutdown quickly when requested.", and yet we know its not true in all cases, then that will lead to misunderstandings and bugs. If we perform a restart and the custodian is performing extra work that delays shutdown, then it also delays restart. Given the title of the thread, we should be looking to improve that, or at least know it occurred. > > * not sure why you want immediate execution of custodian tasks - I > > feel supporting two modes will be a lot harder. For me, I would run > > locally when !IsUnderPostmaster and also in an Assert build, so we can > > test it works right - i.e. running in its own process is just a > > production optimization for performance (which is the stated reason > > for having this) > > I added this because 0004 involves requesting a task from the postmaster, > so checking for IsUnderPostmaster doesn't work. Those tasks would always > run immediately. However, we could use IsPostmasterEnvironment instead, > which would allow us to remove the "immediate" argument. I did it this way > in v14. Thanks > > 0005 seems good from what I know > > * There is no check to see if it worked in any sane time > > What did you have in mind? Should the custodian begin emitting WARNINGs > after a while? I think it might be useful if it logged anything that took an "extended period", TBD. 
Maybe that is already covered by startup process logging. Please tell me that still works? > > Rather than explicitly use DEBUG1 everywhere I would have an > > #define CUSTODIAN_LOG_LEVEL LOG > > so we can run with it in LOG mode and then set it to DEBUG1 with a one > > line change in a later phase of Beta > > I can create a separate patch for this, but I don't think I've ever seen > this sort of thing before. Much of recovery is coded that way, for the same reason. > Is the idea just to help with debugging during > the development phase? "Just", yes. Tests would be desirable also, under src/test/modules. -- Simon Riggs http://www.EnterpriseDB.com/
On 2022-11-28 13:08:57 +0000, Simon Riggs wrote: > On Sun, 27 Nov 2022 at 23:34, Nathan Bossart <nathandbossart@gmail.com> wrote: > > > Rather than explicitly use DEBUG1 everywhere I would have an > > > #define CUSTODIAN_LOG_LEVEL LOG > > > so we can run with it in LOG mode and then set it to DEBUG1 with a one > > > line change in a later phase of Beta > > > > I can create a separate patch for this, but I don't think I've ever seen > > this sort of thing before. > > Much of recovery is coded that way, for the same reason. I think that's not a good thing to copy without a lot more justification than "some old code also does it that way". It's sometimes justified, but also makes code harder to read (one doesn't know what it does without looking up the #define, line length).
On Mon, Nov 28, 2022 at 1:31 PM Andres Freund <andres@anarazel.de> wrote: > On 2022-11-28 13:08:57 +0000, Simon Riggs wrote: > > On Sun, 27 Nov 2022 at 23:34, Nathan Bossart <nathandbossart@gmail.com> wrote: > > > > Rather than explicitly use DEBUG1 everywhere I would have an > > > > #define CUSTODIAN_LOG_LEVEL LOG > > > > so we can run with it in LOG mode and then set it to DEBUG1 with a one > > > > line change in a later phase of Beta > > > > > > I can create a separate patch for this, but I don't think I've ever seen > > > this sort of thing before. > > > > Much of recovery is coded that way, for the same reason. > > I think that's not a good thing to copy without a lot more justification than > "some old code also does it that way". It's sometimes justified, but also > makes code harder to read (one doesn't know what it does without looking up > the #define, line length). Yeah. If people need some of the log messages at a higher level during development, they can patch their own copies. I think there might be some argument for having a facility that lets you pick subsystems or even individual messages that you want to trace and pump up the log level for just those call sites. But I don't know exactly what that would look like, and I don't think inventing one-off mechanisms for particular cases is a good idea. -- Robert Haas EDB: http://www.enterprisedb.com
Okay, here is a new patch set. 0004 adds logic to prevent custodian tasks from delaying shutdown. I haven't added any logging for long-running tasks yet. Tasks might ordinarily take a while, so such logs wouldn't necessarily indicate something is wrong. Perhaps we could add a GUC for the amount of time to wait before logging. This feature would be off by default. Another option could be to create a log_custodian GUC that causes tasks to be logged when completed, similar to log_checkpoints. Thoughts? On Mon, Nov 28, 2022 at 01:37:01PM -0500, Robert Haas wrote: > On Mon, Nov 28, 2022 at 1:31 PM Andres Freund <andres@anarazel.de> wrote: >> On 2022-11-28 13:08:57 +0000, Simon Riggs wrote: >> > On Sun, 27 Nov 2022 at 23:34, Nathan Bossart <nathandbossart@gmail.com> wrote: >> > > > Rather than explicitly use DEBUG1 everywhere I would have an >> > > > #define CUSTODIAN_LOG_LEVEL LOG >> > > > so we can run with it in LOG mode and then set it to DEBUG1 with a one >> > > > line change in a later phase of Beta >> > > >> > > I can create a separate patch for this, but I don't think I've ever seen >> > > this sort of thing before. >> > >> > Much of recovery is coded that way, for the same reason. >> >> I think that's not a good thing to copy without a lot more justification than >> "some old code also does it that way". It's sometimes justified, but also >> makes code harder to read (one doesn't know what it does without looking up >> the #define, line length). > > Yeah. If people need some of the log messages at a higher level during > development, they can patch their own copies. > > I think there might be some argument for having a facility that lets > you pick subsystems or even individual messages that you want to trace > and pump up the log level for just those call sites. But I don't know > exactly what that would look like, and I don't think inventing one-off > mechanisms for particular cases is a good idea. 
Given this discussion, I haven't made any changes to the logging in the new patch set. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Mon, 28 Nov 2022 at 23:40, Nathan Bossart <nathandbossart@gmail.com> wrote: > > Okay, here is a new patch set. 0004 adds logic to prevent custodian tasks > from delaying shutdown. That all seems good, thanks. The last important point for me is tests, in src/test/modules probably. It might be possible to reuse the final state of other modules' tests to test cleanup, or at least integrate a custodian test into each module. -- Simon Riggs http://www.EnterpriseDB.com/
On Tue, Nov 29, 2022 at 12:02:44PM +0000, Simon Riggs wrote: > The last important point for me is tests, in src/test/modules > probably. It might be possible to reuse the final state of other > modules' tests to test cleanup, or at least integrate a custodian test > into each module. Of course. I found some existing tests for the test_decoding plugin that appear to reliably generate the files we want the custodian to clean up, so I added them there. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Tue, Nov 29, 2022 at 07:56:53PM -0800, Nathan Bossart wrote: > On Tue, Nov 29, 2022 at 12:02:44PM +0000, Simon Riggs wrote: >> The last important point for me is tests, in src/test/modules >> probably. It might be possible to reuse the final state of other >> modules' tests to test cleanup, or at least integrate a custodian test >> into each module. > > Of course. I found some existing tests for the test_decoding plugin that > appear to reliably generate the files we want the custodian to clean up, so > I added them there. cfbot is not happy with v16. AFAICT this is just due to poor placement, so here is another attempt with the tests moved to a new location. Apologies for the noise. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Wed, 30 Nov 2022 at 03:56, Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Tue, Nov 29, 2022 at 12:02:44PM +0000, Simon Riggs wrote: > > The last important point for me is tests, in src/test/modules > > probably. It might be possible to reuse the final state of other > > modules' tests to test cleanup, or at least integrate a custodian test > > into each module. > > Of course. I found some existing tests for the test_decoding plugin that > appear to reliably generate the files we want the custodian to clean up, so > I added them there. Thanks for adding the tests; I can see they run clean. The only minor thing I would personally add is a note in each piece of code to explain where the tests are for each one and/or something in the main custodian file that says tests exist within src/test/module. Otherwise, ready for committer. -- Simon Riggs http://www.EnterpriseDB.com/
On Wed, Nov 30, 2022 at 10:48 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> cfbot is not happy with v16. AFAICT this is just due to poor placement, so
> here is another attempt with the tests moved to a new location. Apologies
> for the noise.

Thanks for the patches. I spent some time reviewing the v17 patch set and here are my comments:

0001:
1. I think the custodian process needs documentation - it needs a definition in glossary.sgml and perhaps a dedicated page describing what tasks it takes care of.

2.
+	LWLockReleaseAll();
+	ConditionVariableCancelSleep();
+	AbortBufferIO();
+	UnlockBuffers();
+	ReleaseAuxProcessResources(false);
+	AtEOXact_Buffers(false);
+	AtEOXact_SMgr();
+	AtEOXact_Files(false);
+	AtEOXact_HashTables(false);
Do we need all of these in the exit path? Isn't the stuff that ShutdownAuxiliaryProcess() does enough for the custodian process? AFAICS, the custodian process uses LWLocks (which ShutdownAuxiliaryProcess() takes care of) and it doesn't access shared buffers and so on. Having said that, I'm fine to keep them for future use, and all of those cleanup functions exit early if nothing related occurs.

3.
+ * Advertise out latch that backends can use to wake us up while we're
Typo - %s/out/our

4. Is it a good idea to add log messages in the DoCustodianTasks() loop? Maybe at a debug level? The log message can say the current task the custodian is processing. And/Or setting the custodian's status on the ps display is also a good idea IMO.

0002 and 0003:
1.
+CHECKPOINT;
+DO $$
I think we need to ensure that there are some snapshot files before the checkpoint. Otherwise, it may happen that the above test case exits without the custodian process doing anything.

2. I think the best way to test the custodian process code is by adding a TAP test module to see if the custodian process actually kicks in. Perhaps, add elog(DEBUGX,...) messages to various custodian process functions and see if we see the logs in the server logs.
0004: I think the 0004 patch can be merged into 0001, 0002 and 0003 patches. Otherwise the patch LGTM. Few thoughts: 1. I think we can trivially extend the custodian process to remove any future WAL files on the old timeline, something like the attached 0001-Move-removal-of-future-WAL-files-on-the-old-timeline.text file). While this offloads the recovery a bit, the server may archive such WAL files before the custodian removes them. We can do a bit more to stop the server from archiving such WAL files, but that needs more coding. I don't think we need to do all that now, perhaps, we can give it a try once the basic custodian stuff gets in. 2. Moving RemovePgTempFiles() to the custodian can bring up the server soon. The idea is that the postmaster just renames the temp directories and informs the custodian so that it can go delete such temp files and directories. I have personally seen cases where the server spent a good amount of time cleaning up temp files. We can park it for later. 3. Moving RemoveOldXlogFiles() to the custodian can make checkpoints faster. 4. PreallocXlogFiles() - if we ever have plans to make pre-allocation more aggressive (pre-allocate more than 1 WAL file), perhaps letting custodian do that is a good idea. Again, too many tasks for a single process. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Nov 30, 2022 at 4:52 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Wed, Nov 30, 2022 at 10:48 AM Nathan Bossart > <nathandbossart@gmail.com> wrote: > > > > > > cfbot is not happy with v16. AFAICT this is just due to poor placement, so > > here is another attempt with the tests moved to a new location. Apologies > > for the noise. > > Thanks for the patches. I spent some time on reviewing v17 patch set > and here are my comments: > > 0001: > 1. I think the custodian process needs documentation - it needs a > definition in glossary.sgml and perhaps a dedicated page describing > what tasks it takes care of. > > 2. > + LWLockReleaseAll(); > + ConditionVariableCancelSleep(); > + AbortBufferIO(); > + UnlockBuffers(); > + ReleaseAuxProcessResources(false); > + AtEOXact_Buffers(false); > + AtEOXact_SMgr(); > + AtEOXact_Files(false); > + AtEOXact_HashTables(false); > Do we need all of these in the exit path? Isn't the stuff that > ShutdownAuxiliaryProcess() does enough for the custodian process? > AFAICS, the custodian process uses LWLocks (which the > ShutdownAuxiliaryProcess() takes care of) and it doesn't access shared > buffers and so on. > Having said that, I'm fine to keep them for future use and all of > those cleanup functions exit if nothing related occurs. > > 3. > + * Advertise out latch that backends can use to wake us up while we're > Typo - %s/out/our > > 4. Is it a good idea to add log messages in the DoCustodianTasks() > loop? Maybe at a debug level? The log message can say the current task > the custodian is processing. And/Or setting the custodian's status on > the ps display is also a good idea IMO. > > 0002 and 0003: > 1. > +CHECKPOINT; > +DO $$ > I think we need to ensure that there are some snapshot files before > the checkpoint. Otherwise, it may happen that the above test case > exits without the custodian process doing anything. > > 2. 
I think the best way to test the custodian process code is by > adding a TAP test module to see actually the custodian process kicks > in. Perhaps, add elog(DEBUGX,...) messages to various custodian > process functions and see if we see the logs in server logs. > > 0004: > I think the 0004 patch can be merged into 0001, 0002 and 0003 patches. > Otherwise the patch LGTM. > > Few thoughts: > 1. I think we can trivially extend the custodian process to remove any > future WAL files on the old timeline, something like the attached > 0001-Move-removal-of-future-WAL-files-on-the-old-timeline.text file). > While this offloads the recovery a bit, the server may archive such > WAL files before the custodian removes them. We can do a bit more to > stop the server from archiving such WAL files, but that needs more > coding. I don't think we need to do all that now, perhaps, we can give > it a try once the basic custodian stuff gets in. > 2. Moving RemovePgTempFiles() to the custodian can bring up the server > soon. The idea is that the postmaster just renames the temp > directories and informs the custodian so that it can go delete such > temp files and directories. I have personally seen cases where the > server spent a good amount of time cleaning up temp files. We can park > it for later. > 3. Moving RemoveOldXlogFiles() to the custodian can make checkpoints faster. > 4. PreallocXlogFiles() - if we ever have plans to make pre-allocation > more aggressive (pre-allocate more than 1 WAL file), perhaps letting > custodian do that is a good idea. Again, too many tasks for a single > process. Another comment: IIUC, there's no custodian_delay GUC as we want to avoid unnecessary wakeups for power savings (being discussed in the other thread). However, can it happen that the custodian missed to capture SetLatch wakeups by other backends? In other words, can the custodian process be sleeping when there's work to do? 
-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Nov 30, 2022 at 05:27:10PM +0530, Bharath Rupireddy wrote: > On Wed, Nov 30, 2022 at 4:52 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: >> Thanks for the patches. I spent some time on reviewing v17 patch set >> and here are my comments: Thanks for reviewing! >> 0001: >> 1. I think the custodian process needs documentation - it needs a >> definition in glossary.sgml and perhaps a dedicated page describing >> what tasks it takes care of. Good catch. I added this in v18. I stopped short of adding a dedicated page to describe the tasks because 1) there are no parameters for the custodian and 2) AFAICT none of its tasks are described in the docs today. >> 2. >> + LWLockReleaseAll(); >> + ConditionVariableCancelSleep(); >> + AbortBufferIO(); >> + UnlockBuffers(); >> + ReleaseAuxProcessResources(false); >> + AtEOXact_Buffers(false); >> + AtEOXact_SMgr(); >> + AtEOXact_Files(false); >> + AtEOXact_HashTables(false); >> Do we need all of these in the exit path? Isn't the stuff that >> ShutdownAuxiliaryProcess() does enough for the custodian process? >> AFAICS, the custodian process uses LWLocks (which the >> ShutdownAuxiliaryProcess() takes care of) and it doesn't access shared >> buffers and so on. >> Having said that, I'm fine to keep them for future use and all of >> those cleanup functions exit if nothing related occurs. Yeah, I don't think we need a few of these. In v18, I've kept the following: * LWLockReleaseAll() * ConditionVariableCancelSleep() * ReleaseAuxProcessResources(false) * AtEOXact_Files(false) >> 3. >> + * Advertise out latch that backends can use to wake us up while we're >> Typo - %s/out/our fixed >> 4. Is it a good idea to add log messages in the DoCustodianTasks() >> loop? Maybe at a debug level? The log message can say the current task >> the custodian is processing. And/Or setting the custodian's status on >> the ps display is also a good idea IMO. 
I'd like to pick these up in a new thread if/when this initial patch set is committed. The tasks already do some logging, and the checkpointer process doesn't update the ps display for these tasks today. >> 0002 and 0003: >> 1. >> +CHECKPOINT; >> +DO $$ >> I think we need to ensure that there are some snapshot files before >> the checkpoint. Otherwise, it may happen that the above test case >> exits without the custodian process doing anything. >> >> 2. I think the best way to test the custodian process code is by >> adding a TAP test module to see actually the custodian process kicks >> in. Perhaps, add elog(DEBUGX,...) messages to various custodian >> process functions and see if we see the logs in server logs. The test appears to reliably create snapshot and mapping files, so if the directories are empty at some point after the checkpoint at the end, we can be reasonably certain the custodian took action. I didn't add explicit checks that there are files in the directories before the checkpoint because a concurrent checkpoint could make such checks unreliable. >> 0004: >> I think the 0004 patch can be merged into 0001, 0002 and 0003 patches. >> Otherwise the patch LGTM. I'm keeping this one separate because I've received conflicting feedback about the idea. >> 1. I think we can trivially extend the custodian process to remove any >> future WAL files on the old timeline, something like the attached >> 0001-Move-removal-of-future-WAL-files-on-the-old-timeline.text file). >> While this offloads the recovery a bit, the server may archive such >> WAL files before the custodian removes them. We can do a bit more to >> stop the server from archiving such WAL files, but that needs more >> coding. I don't think we need to do all that now, perhaps, we can give >> it a try once the basic custodian stuff gets in. >> 2. Moving RemovePgTempFiles() to the custodian can bring up the server >> soon. 
The idea is that the postmaster just renames the temp >> directories and informs the custodian so that it can go delete such >> temp files and directories. I have personally seen cases where the >> server spent a good amount of time cleaning up temp files. We can park >> it for later. >> 3. Moving RemoveOldXlogFiles() to the custodian can make checkpoints faster. >> 4. PreallocXlogFiles() - if we ever have plans to make pre-allocation >> more aggressive (pre-allocate more than 1 WAL file), perhaps letting >> custodian do that is a good idea. Again, too many tasks for a single >> process. I definitely want to do #2. I have some patches for that upthread, but I removed them for now based on Simon's feedback. I intend to pick that up in a new thread. I haven't thought too much about the others yet. > Another comment: > IIUC, there's no custodian_delay GUC as we want to avoid unnecessary > wakeups for power savings (being discussed in the other thread). > However, can it happen that the custodian missed to capture SetLatch > wakeups by other backends? In other words, can the custodian process > be sleeping when there's work to do? I'm not aware of any way this could happen, but if there is one, I think we should treat it as a bug instead of relying on the custodian process to periodically wake up and check for work to do. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Fri, Dec 2, 2022 at 3:10 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > >> 4. Is it a good idea to add log messages in the DoCustodianTasks() > >> loop? Maybe at a debug level? The log message can say the current task > >> the custodian is processing. And/Or setting the custodian's status on > >> the ps display is also a good idea IMO. > > I'd like to pick these up in a new thread if/when this initial patch set is > committed. The tasks already do some logging, and the checkpointer process > doesn't update the ps display for these tasks today. It'll be good to have some kind of dedicated monitoring for the custodian process as it can do a "good" amount of work at times and users will have a way to know what it currently is doing - it can be logs at debug level, progress reporting via ereport_startup_progress()-sort of mechanism, ps display, pg_stat_custodian or a special function that tells some details, or some other. In any case, I agree to park this for later. > >> 0002 and 0003: > >> 1. > >> +CHECKPOINT; > >> +DO $$ > >> I think we need to ensure that there are some snapshot files before > >> the checkpoint. Otherwise, it may happen that the above test case > >> exits without the custodian process doing anything. > >> > >> 2. I think the best way to test the custodian process code is by > >> adding a TAP test module to see actually the custodian process kicks > >> in. Perhaps, add elog(DEBUGX,...) messages to various custodian > >> process functions and see if we see the logs in server logs. > > The test appears to reliably create snapshot and mapping files, so if the > directories are empty at some point after the checkpoint at the end, we can > be reasonably certain the custodian took action. I didn't add explicit > checks that there are files in the directories before the checkpoint > because a concurrent checkpoint could make such checks unreliable. I think you're right. 
I added sqls to see if the snapshot and mapping files count > 0, see [1] and the cirrus-ci members are happy too - https://github.com/BRupireddy/postgres/tree/custodian_review_2. I think we can consider adding these count > 0 checks to tests. > >> 0004: > >> I think the 0004 patch can be merged into 0001, 0002 and 0003 patches. > >> Otherwise the patch LGTM. > > I'm keeping this one separate because I've received conflicting feedback > about the idea. If we classify custodian as a process doing non-critical tasks that have nothing to do with regular server functioning, then processing ShutdownRequestPending looks okay. However, delaying these non-critical tasks such as file removals which reclaims disk space might impact the server overall especially when it's reaching 100% disk usage and we want the custodian to do its job fully before we shutdown the server. If we delay processing shutdown requests, that can impact the server overall (might delay restarts, failovers etc.), because at times there can be a lot of tasks with a good amount of work pending in the custodian's task queue. Having said above, I'm okay to process ShutdownRequestPending as early as possible, however, should we also add CHECK_FOR_INTERRUPTS() alongside ShutdownRequestPending? Also, I think it's enough to just have ShutdownRequestPending check in DoCustodianTasks(void)'s main loop and we can let RemoveOldSerializedSnapshots() and RemoveOldLogicalRewriteMappings() do their jobs to the fullest as they do today. While thinking about this, one thing that really struck me is what happens if we let the custodian exit, say after processing ShutdownRequestPending immediately or after a restart, leaving other queued tasks? The custodian will never get to work on those tasks unless the requestors (say checkpoint or some other process) requests it to do so after restart. Maybe, we don't need to worry about it. Maybe we need to worry about it. 
Maybe it's an overkill to save the custodian's task state to disk so that it can come up and do the leftover tasks upon restart. > > Another comment: > > IIUC, there's no custodian_delay GUC as we want to avoid unnecessary > > wakeups for power savings (being discussed in the other thread). > > However, can it happen that the custodian missed to capture SetLatch > > wakeups by other backends? In other words, can the custodian process > > be sleeping when there's work to do? > > I'm not aware of any way this could happen, but if there is one, I think we > should treat it as a bug instead of relying on the custodian process to > periodically wake up and check for work to do. One possible scenario is that the requestor adds its task details to the queue and sets the latch, but the custodian is in the midst of processing a task and misses this SetLatch(). However, it is guaranteed to the requester that the added task will be processed after the current task completes. I don't know of other ways the custodian could miss SetLatch().

[1]
diff --git a/contrib/test_decoding/expected/rewrite.out b/contrib/test_decoding/expected/rewrite.out
index 214a514a0a..0029e48852 100644
--- a/contrib/test_decoding/expected/rewrite.out
+++ b/contrib/test_decoding/expected/rewrite.out
@@ -163,6 +163,20 @@ DROP FUNCTION iamalongfunction();
 DROP FUNCTION exec(text);
 DROP ROLE regress_justforcomments;
 -- make sure custodian cleans up files
+-- make sure snapshot files exist for custodian to clean up
+SELECT count(*) > 0 FROM pg_ls_logicalsnapdir();
+ ?column?
+----------
+ t
+(1 row)
+
+-- make sure rewrite mapping files exist for custodian to clean up
+SELECT count(*) > 0 FROM pg_ls_logicalmapdir();
+ ?column?
+----------
+ t
+(1 row)
+
 CHECKPOINT;
 DO $$
 DECLARE

diff --git a/contrib/test_decoding/sql/rewrite.sql b/contrib/test_decoding/sql/rewrite.sql
index d66f70f837..c076809f37 100644
--- a/contrib/test_decoding/sql/rewrite.sql
+++ b/contrib/test_decoding/sql/rewrite.sql
@@ -107,6 +107,13 @@ DROP FUNCTION exec(text);
 DROP ROLE regress_justforcomments;
 -- make sure custodian cleans up files
+
+-- make sure snapshot files exist for custodian to clean up
+SELECT count(*) > 0 FROM pg_ls_logicalsnapdir();
+
+-- make sure rewrite mapping files exist for custodian to clean up
+SELECT count(*) > 0 FROM pg_ls_logicalmapdir();
+
 CHECKPOINT;
 DO $$
 DECLARE

-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Dec 02, 2022 at 12:11:35PM +0530, Bharath Rupireddy wrote: > On Fri, Dec 2, 2022 at 3:10 AM Nathan Bossart <nathandbossart@gmail.com> wrote: >> The test appears to reliably create snapshot and mapping files, so if the >> directories are empty at some point after the checkpoint at the end, we can >> be reasonably certain the custodian took action. I didn't add explicit >> checks that there are files in the directories before the checkpoint >> because a concurrent checkpoint could make such checks unreliable. > > I think you're right. I added sqls to see if the snapshot and mapping > files count > 0, see [1] and the cirrus-ci members are happy too - > https://github.com/BRupireddy/postgres/tree/custodian_review_2. I > think we can consider adding these count > 0 checks to tests. My worry about adding "count > 0" checks is that a concurrent checkpoint could make them unreliable. In other words, those checks might ordinarily work, but if an automatic checkpoint causes the files to be cleaned up just beforehand, they will fail. > Having said above, I'm okay to process ShutdownRequestPending as early > as possible, however, should we also add CHECK_FOR_INTERRUPTS() > alongside ShutdownRequestPending? I'm not seeing a need for CHECK_FOR_INTERRUPTS. Do you see one? > While thinking about this, one thing that really struck me is what > happens if we let the custodian exit, say after processing > ShutdownRequestPending immediately or after a restart, leaving other > queued tasks? The custodian will never get to work on those tasks > unless the requestors (say checkpoint or some other process) requests > it to do so after restart. Maybe, we don't need to worry about it. > Maybe we need to worry about it. Yes, tasks will need to be retried when the server starts again.
The ones in this patch set should be requested again during the next checkpoint. Temporary file cleanup would always be requested during server start, so that should be handled as well. Even today, the server might abruptly shut down while executing these tasks, and we don't have any special handling for that. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Sat, Dec 3, 2022 at 12:45 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Fri, Dec 02, 2022 at 12:11:35PM +0530, Bharath Rupireddy wrote: > > On Fri, Dec 2, 2022 at 3:10 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > >> The test appears to reliably create snapshot and mapping files, so if the > >> directories are empty at some point after the checkpoint at the end, we can > >> be reasonably certain the custodian took action. I didn't add explicit > >> checks that there are files in the directories before the checkpoint > >> because a concurrent checkpoint could make such checks unreliable. > > > > I think you're right. I added sqls to see if the snapshot and mapping > > files count > 0, see [1] and the cirrus-ci members are happy too - > > https://github.com/BRupireddy/postgres/tree/custodian_review_2. I > > think we can consider adding these count > 0 checks to tests. > > My worry about adding "count > 0" checks is that a concurrent checkpoint > could make them unreliable. In other words, those checks might ordinarily > work, but if an automatic checkpoint causes the files be cleaned up just > beforehand, they will fail. Hm. It would have been better with a TAP test module for testing the custodian code reliably. Anyway, that mustn't stop the patch getting in. If required, we can park the TAP test module for later - IMO. Others may have different thoughts here. > > Having said above, I'm okay to process ShutdownRequestPending as early > > as possible, however, should we also add CHECK_FOR_INTERRUPTS() > > alongside ShutdownRequestPending? > > I'm not seeing a need for CHECK_FOR_INTERRUPTS. Do you see one? Since the custodian has SignalHandlerForShutdownRequest as SIGINT and SIGTERM handlers, unlike StatementCancelHandler and die respectively, no need of CFI I guess. And also none of the CFI signal handler flags applies to the custodian. 
> > While thinking about this, one thing that really struck me is what > > happens if we let the custodian exit, say after processing > > ShutdownRequestPending immediately or after a restart, leaving other > > queued tasks? The custodian will never get to work on those tasks > > unless the requestors (say checkpoint or some other process) requests > > it to do so after restart. Maybe, we don't need to worry about it. > > Maybe we need to worry about it. Maybe it's an overkill to save the > > custodian's task state to disk so that it can come up and do the > > leftover tasks upon restart. > > Yes, tasks will need to be retried when the server starts again. The ones > in this patch set should be requested again during the next checkpoint. > Temporary file cleanup would always be requested during server start, so > that should be handled as well. Even today, the server might abruptly shut > down while executing these tasks, and we don't have any special handling > for that. Right. The v18 patch set posted upthread https://www.postgresql.org/message-id/20221201214026.GA1799688%40nathanxps13 looks good to me. I see the CF entry is marked RfC - https://commitfest.postgresql.org/41/3448/. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
rebased for cfbot -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, Feb 02, 2023 at 09:48:08PM -0800, Nathan Bossart wrote: > rebased for cfbot another rebase for cfbot -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Nathan Bossart <nathandbossart@gmail.com> writes: > another rebase for cfbot I took a brief look through v20, and generally liked what I saw, but there are a few things troubling me: * The comments for CustodianEnqueueTask claim that it won't enqueue an already-queued task, but I don't think I believe that, because it stops scanning as soon as it finds an empty slot. That data structure seems quite oddly designed in any case. Why isn't it simply an array of need-to-run-this-one booleans indexed by the CustodianTask enum? Fairness of dispatch could be ensured by the same state variable that CustodianGetNextTask already uses to track which array element to inspect next. While that wouldn't guarantee that tasks A and B are dispatched in the same order they were requested in, I'm not sure why we should care. * I don't much like cust_lck, mainly because you didn't bother to document what it protects (in general, CustodianShmemStruct deserves more than zero commentary). Do we need it at all? If the task-needed flags were sig_atomic_t not bool, we probably don't need it for the basic job of tracking which tasks remain to be run. I see that some of the tasks have possibly-non-atomically-assigned parameters to be transmitted, but restricting cust_lck to protect those seems like a better idea. * Not quite convinced about handle_arg_func, mainly because the Datum API would be pretty inconvenient for any task with more than one arg. Why do we need that at all, rather than saying that callers should set up any required parameters separately before invoking RequestCustodian? * Why does LookupCustodianFunctions think it needs to search the constant array? * The original proposal included moving RemovePgTempFiles into this mechanism, which I thought was probably the most useful bit of the whole thing. I'm sad to see that gone, what became of it? regards, tom lane
Hi, On 2023-04-02 13:40:05 -0400, Tom Lane wrote: > Nathan Bossart <nathandbossart@gmail.com> writes: > > another rebase for cfbot > > I took a brief look through v20, and generally liked what I saw, > but there are a few things troubling me: Just want to note that I've repeatedly objected to 0002 and 0003, i.e. moving serialized logical decoding snapshots and mapping files, to custodian, and still do. Without further work it increases wraparound risks (the filenames contain xids), and afaict nothing has been done to ameliorate that. Without those, the current patch series does not have any tasks: > * The original proposal included moving RemovePgTempFiles into this > mechanism, which I thought was probably the most useful bit of the > whole thing. I'm sad to see that gone, what became of it? Greetings, Andres Freund
On Sun, Apr 02, 2023 at 01:40:05PM -0400, Tom Lane wrote: > I took a brief look through v20, and generally liked what I saw, > but there are a few things troubling me: Thanks for taking a look. > * The comments for CustodianEnqueueTask claim that it won't enqueue an > already-queued task, but I don't think I believe that, because it stops > scanning as soon as it finds an empty slot. That data structure seems > quite oddly designed in any case. Why isn't it simply an array of > need-to-run-this-one booleans indexed by the CustodianTask enum? > Fairness of dispatch could be ensured by the same state variable that > CustodianGetNextTask already uses to track which array element to > inspect next. While that wouldn't guarantee that tasks A and B are > dispatched in the same order they were requested in, I'm not sure why > we should care. That works. Will update. > * I don't much like cust_lck, mainly because you didn't bother to > document what it protects (in general, CustodianShmemStruct deserves > more than zero commentary). Do we need it at all? If the task-needed > flags were sig_atomic_t not bool, we probably don't need it for the > basic job of tracking which tasks remain to be run. I see that some > of the tasks have possibly-non-atomically-assigned parameters to be > transmitted, but restricting cust_lck to protect those seems like a > better idea. Will do. > * Not quite convinced about handle_arg_func, mainly because the Datum > API would be pretty inconvenient for any task with more than one arg. > Why do we need that at all, rather than saying that callers should > set up any required parameters separately before invoking > RequestCustodian? I had done it this way earlier, but added the Datum argument based on feedback upthread [0]. It presently has only one proposed use, anyway, so I think it would be fine to switch it back for now. > * Why does LookupCustodianFunctions think it needs to search the > constant array? 
The order of the tasks in the array isn't guaranteed to match the order in the CustodianTask enum. > * The original proposal included moving RemovePgTempFiles into this > mechanism, which I thought was probably the most useful bit of the > whole thing. I'm sad to see that gone, what became of it? I postponed that based on advice from upthread [1]. I was hoping to start a dedicated thread for that immediately after the custodian infrastructure was committed. FWIW I agree that it's the most useful task of what's proposed thus far. [0] https://postgr.es/m/20220703172732.wembjsb55xl63vuw%40awork3.anarazel.de [1] https://postgr.es/m/CANbhV-EagKLoUH7tLEfg__VcLu37LY78F8gvLMzHrRZyZKm6sw%40mail.gmail.com -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Sun, Apr 02, 2023 at 11:42:26AM -0700, Andres Freund wrote: > Just want to note that I've repeatedly objected to 0002 and 0003, i.e. moving > serialized logical decoding snapshots and mapping files, to custodian, and > still do. Without further work it increases wraparound risks (the filenames > contain xids), and afaict nothing has been done to ameliorate that. From your feedback earlier [0], I was under the (perhaps false) impression that adding a note about this existing issue in the commit message was sufficient, at least initially. I did add such a note in 0003, but it's missing from 0002 for some reason. I suspect I left it out because the serialized snapshot file names do not contain XIDs. You cleared that up earlier [1], so this is my bad. It's been a little while since I dug into this, but I do see your point that the wraparound risk could be higher in some cases. For example, if you have a billion temp files to clean up, the custodian could be stuck on that task for a long time. I will give this some further thought. I'm all ears if anyone has ideas about how to reduce this risk. [0] https://postgr.es/m/20220702225456.zit5kjdtdfqmjujt%40alap3.anarazel.de [1] https://postgr.es/m/20220217065938.x2esfdppzypegn5j%40alap3.anarazel.de -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Nathan Bossart <nathandbossart@gmail.com> writes: > On Sun, Apr 02, 2023 at 01:40:05PM -0400, Tom Lane wrote: >> * Why does LookupCustodianFunctions think it needs to search the >> constant array? > The order of the tasks in the array isn't guaranteed to match the order in > the CustodianTask enum. Why not? It's a constant array, we can surely manage to make its order match the enum. >> * The original proposal included moving RemovePgTempFiles into this >> mechanism, which I thought was probably the most useful bit of the >> whole thing. I'm sad to see that gone, what became of it? > I postponed that based on advice from upthread [1]. I was hoping to start > a dedicated thread for that immediately after the custodian infrastructure > was committed. FWIW I agree that it's the most useful task of what's > proposed thus far. Hmm, given Andres' objections there's little point in moving forward without that task. regards, tom lane
Nathan Bossart <nathandbossart@gmail.com> writes: > It's been a little while since I dug into this, but I do see your point > that the wraparound risk could be higher in some cases. For example, if > you have a billion temp files to clean up, the custodian could be stuck on > that task for a long time. I will give this some further thought. I'm all > ears if anyone has ideas about how to reduce this risk. I wonder if a single long-lived custodian task is the right model at all. At least for RemovePgTempFiles, it'd make more sense to write it as a background worker that spawns, does its work, and then exits, independently of anything else. Of course, then you need some mechanism for ensuring that a bgworker slot is available when needed, but that doesn't seem horridly difficult --- we could have a few "reserved bgworker" slots, perhaps. An idle bgworker slot doesn't cost much. regards, tom lane
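A one-shot worker along these lines could use the existing dynamic background worker API. A rough sketch only — the entry point RemovePgTempFilesMain, the calling context, and the fallback behavior are hypothetical, not code from the patch:

```c
/* Hypothetical launcher for a spawn-work-exit temp-file cleaner,
 * called from a backend connected to shared memory. */
static void
LaunchTempFileCleaner(void)
{
    BackgroundWorker worker;
    BackgroundWorkerHandle *handle;

    memset(&worker, 0, sizeof(worker));
    snprintf(worker.bgw_name, BGW_MAXLEN, "temp file cleaner");
    snprintf(worker.bgw_type, BGW_MAXLEN, "temp file cleaner");
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
    worker.bgw_start_time = BgWorkerStart_PostmasterStart;
    worker.bgw_restart_time = BGW_NEVER_RESTART;    /* one-shot: exit when done */
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "postgres");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "RemovePgTempFilesMain");
    worker.bgw_notify_pid = 0;

    if (!RegisterDynamicBackgroundWorker(&worker, &handle))
        ereport(LOG,
                (errmsg("no background worker slot available for temp file cleanup")));
}
```

The "reserved slots" idea would address the failure branch here: with a few bgworker slots set aside for such maintenance tasks, the registration could not fail merely because user-defined workers have consumed max_worker_processes.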
On Sun, Apr 02, 2023 at 04:23:05PM -0400, Tom Lane wrote: > Nathan Bossart <nathandbossart@gmail.com> writes: >> On Sun, Apr 02, 2023 at 01:40:05PM -0400, Tom Lane wrote: >>> * Why does LookupCustodianFunctions think it needs to search the >>> constant array? > >> The order of the tasks in the array isn't guaranteed to match the order in >> the CustodianTask enum. > > Why not? It's a constant array, we can surely manage to make its > order match the enum. Alright. I'll change this. >>> * The original proposal included moving RemovePgTempFiles into this >>> mechanism, which I thought was probably the most useful bit of the >>> whole thing. I'm sad to see that gone, what became of it? > >> I postponed that based on advice from upthread [1]. I was hoping to start >> a dedicated thread for that immediately after the custodian infrastructure >> was committed. FWIW I agree that it's the most useful task of what's >> proposed thus far. > > Hmm, given Andres' objections there's little point in moving forward > without that task. Yeah. I should probably tackle that one first and leave the logical tasks for later, given there is some prerequisite work required. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Sun, Apr 02, 2023 at 04:37:38PM -0400, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> It's been a little while since I dug into this, but I do see your point
>> that the wraparound risk could be higher in some cases.  For example, if
>> you have a billion temp files to clean up, the custodian could be stuck on
>> that task for a long time.  I will give this some further thought.  I'm all
>> ears if anyone has ideas about how to reduce this risk.
>
> I wonder if a single long-lived custodian task is the right model at all.
> At least for RemovePgTempFiles, it'd make more sense to write it as a
> background worker that spawns, does its work, and then exits,
> independently of anything else.  Of course, then you need some mechanism
> for ensuring that a bgworker slot is available when needed, but that
> doesn't seem horridly difficult --- we could have a few "reserved
> bgworker" slots, perhaps.  An idle bgworker slot doesn't cost much.

This has crossed my mind.  Even if we use the custodian for several
different tasks, perhaps it could shut down while not in use.  For many
servers, the custodian process will be used sparingly, if at all.  And if
we introduce something like custodian_max_workers, perhaps we could dodge
the wraparound issue a bit by setting the default to the number of
supported tasks.  That being said, this approach adds some complexity.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
I sent this one to the next commitfest and marked it as waiting-on-author
and targeted for v17.  I'm aiming to have something that addresses the
latest feedback ready for the July commitfest.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
> On 4 Apr 2023, at 05:36, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> I sent this one to the next commitfest and marked it as waiting-on-author
> and targeted for v17.  I'm aiming to have something that addresses the
> latest feedback ready for the July commitfest.

Have you had a chance to look at this such that there is something ready?

-- 
Daniel Gustafsson
On Tue, Jul 04, 2023 at 09:30:43AM +0200, Daniel Gustafsson wrote:
>> On 4 Apr 2023, at 05:36, Nathan Bossart <nathandbossart@gmail.com> wrote:
>>
>> I sent this one to the next commitfest and marked it as waiting-on-author
>> and targeted for v17.  I'm aiming to have something that addresses the
>> latest feedback ready for the July commitfest.
>
> Have you had a chance to look at this such that there is something ready?

Not yet, sorry.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com