Thread: Generalize ereport_startup_progress infrastructure

Generalize ereport_startup_progress infrastructure

From

Bharath Rupireddy

Date:

02 August 2022, 07:25:01

Hi,

ereport_startup_progress infrastructure added by commit 9ce346e [1]
will be super-useful for reporting progress of any long-running server
operations, not just the startup process operations. For instance,
postmaster can use it for reporting progress of temp file and temp
relation file removals [2], checkpointer can use it for reporting
progress of snapshot or mapping file processing or even WAL file
processing and so on. And I'm sure there can be many places in the
code where we have while or for loops which can, at times, take a long
time to finish and having a log message there would definitely help.

Here's an attempt to generalize the ereport_startup_progress
infrastructure. The attached v1 patch places the code in elog.c/.h,
renames associated functions and variables, something like
ereport_startup_progress to ereport_progress,
log_startup_progress_interval to log_progress_report_interval and so
on.

Thoughts?

Thanks Robert for an offlist chat.

[1]
commit 9ce346eabf350a130bba46be3f8c50ba28506969
Author: Robert Haas <rhaas@postgresql.org>
Date:   Mon Oct 25 11:51:57 2021 -0400

    Report progress of startup operations that take a long time.

[2] https://www.postgresql.org/message-id/CALj2ACWeUFhhnDJKm6R5YxCsF4K7aB2pmRMvqP0BVTxdyce3EA%40mail.gmail.com

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/

Attachment

v1-0001-Generalize-ereport_startup_progress-infrastructur.patch

Re: Generalize ereport_startup_progress infrastructure

From

Robert Haas

Date:

02 August 2022, 18:40:56

On Tue, Aug 2, 2022 at 3:25 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> ereport_startup_progress infrastructure added by commit 9ce346e [1]
> will be super-useful for reporting progress of any long-running server
> operations, not just the startup process operations. For instance,
> postmaster can use it for reporting progress of temp file and temp
> relation file removals [2], checkpointer can use it for reporting
> progress of snapshot or mapping file processing or even WAL file
> processing and so on. And I'm sure there can be many places in the
> code where we have while or for loops which can, at times, take a long
> time to finish and having a log message there would definitely help.
>
> Here's an attempt to generalize the ereport_startup_progress
> infrastructure. The attached v1 patch places the code in elog.c/.h,
> renames associated functions and variables, something like
> ereport_startup_progress to ereport_progress,
> log_startup_progress_interval to log_progress_report_interval and so
> on.

I'm not averse to reusing this infrastructure in other places, but I
doubt we'd want all of those places to be controlled by a single GUC,
especially because that GUC is also the on/off switch for the feature.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Generalize ereport_startup_progress infrastructure

From

"Drouvot, Bertrand"

Date:

03 August 2022, 11:49:37

Hi,

On 8/2/22 8:40 PM, Robert Haas wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you
canconfirm the sender and know the content is safe.
 
>
>
>
> On Tue, Aug 2, 2022 at 3:25 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> ereport_startup_progress infrastructure added by commit 9ce346e [1]
>> will be super-useful for reporting progress of any long-running server
>> operations, not just the startup process operations. For instance,
>> postmaster can use it for reporting progress of temp file and temp
>> relation file removals [2], checkpointer can use it for reporting
>> progress of snapshot or mapping file processing or even WAL file
>> processing and so on. And I'm sure there can be many places in the
>> code where we have while or for loops which can, at times, take a long
>> time to finish and having a log message there would definitely help.
>>
>> Here's an attempt to generalize the ereport_startup_progress
>> infrastructure. The attached v1 patch places the code in elog.c/.h,
>> renames associated functions and variables, something like
>> ereport_startup_progress to ereport_progress,
>> log_startup_progress_interval to log_progress_report_interval and so
>> on.
> I'm not averse to reusing this infrastructure in other places, but I
> doubt we'd want all of those places to be controlled by a single GUC,
> especially because that GUC is also the on/off switch for the feature.

+1 on the idea to generalize this infrastructure in other places.

I also doubt about having one single GUC to control all the places: What 
about adding in the patch the calls to the new API where you think it 
could be useful too? (and in the same time make use of dedicated GUC(s) 
where it makes sense?)

Regards,

-- 

Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Re: Generalize ereport_startup_progress infrastructure

From

Bharath Rupireddy

Date:

04 August 2022, 04:27:48

On Wed, Aug 3, 2022 at 12:11 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Aug 2, 2022 at 3:25 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > ereport_startup_progress infrastructure added by commit 9ce346e [1]
> > will be super-useful for reporting progress of any long-running server
> > operations, not just the startup process operations. For instance,
> > postmaster can use it for reporting progress of temp file and temp
> > relation file removals [2], checkpointer can use it for reporting
> > progress of snapshot or mapping file processing or even WAL file
> > processing and so on. And I'm sure there can be many places in the
> > code where we have while or for loops which can, at times, take a long
> > time to finish and having a log message there would definitely help.
> >
> > Here's an attempt to generalize the ereport_startup_progress
> > infrastructure. The attached v1 patch places the code in elog.c/.h,
> > renames associated functions and variables, something like
> > ereport_startup_progress to ereport_progress,
> > log_startup_progress_interval to log_progress_report_interval and so
> > on.
>
> I'm not averse to reusing this infrastructure in other places, but I
> doubt we'd want all of those places to be controlled by a single GUC,
> especially because that GUC is also the on/off switch for the feature.

Thanks Robert! How about we tweak the function a bit -
begin_progress_report_phase(int timeout), so that each process can use
their own timeout interval? In  this case, do we want to retain
log_startup_progress_interval as-is specific to the startup process?
If yes, other processes might come up with their own GUCs (if they
don't want to use hard-coded timeouts) similar to
log_startup_progress_interval, which isn't the right way IMO.

I think the notion of ereport_progress feature being disabled when the
timeout is 0, makes sense to me at least.

On the flip side, what if we just have a single GUC
log_progress_report_interval (as proposed in the v1 patch)? Do we ever
want different processes to emit progress report messages at different
frequencies? Well, I can think of the startup process during standby
recovery needing to emit recovery progress report messages at a much
lower frequency than the startup process during the crash recovery.
Again, controlling the frequencies with different GUCs isn't the way
forward. But we can do something like: process 1 emits messages with a
frequency of 2*log_progress_report_interval, process 2 with  a
frequency 4*log_progress_report_interval and so on without needing
additional GUCs.

Thoughts?

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

Re: Generalize ereport_startup_progress infrastructure

From

Bharath Rupireddy

Date:

08 August 2022, 04:29:09

On Thu, Aug 4, 2022 at 9:57 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Aug 3, 2022 at 12:11 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Tue, Aug 2, 2022 at 3:25 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > ereport_startup_progress infrastructure added by commit 9ce346e [1]
> > > will be super-useful for reporting progress of any long-running server
> > > operations, not just the startup process operations. For instance,
> > > postmaster can use it for reporting progress of temp file and temp
> > > relation file removals [2], checkpointer can use it for reporting
> > > progress of snapshot or mapping file processing or even WAL file
> > > processing and so on. And I'm sure there can be many places in the
> > > code where we have while or for loops which can, at times, take a long
> > > time to finish and having a log message there would definitely help.
> > >
> > > Here's an attempt to generalize the ereport_startup_progress
> > > infrastructure. The attached v1 patch places the code in elog.c/.h,
> > > renames associated functions and variables, something like
> > > ereport_startup_progress to ereport_progress,
> > > log_startup_progress_interval to log_progress_report_interval and so
> > > on.
> >
> > I'm not averse to reusing this infrastructure in other places, but I
> > doubt we'd want all of those places to be controlled by a single GUC,
> > especially because that GUC is also the on/off switch for the feature.
>
> Thanks Robert! How about we tweak the function a bit -
> begin_progress_report_phase(int timeout), so that each process can use
> their own timeout interval? In  this case, do we want to retain
> log_startup_progress_interval as-is specific to the startup process?
> If yes, other processes might come up with their own GUCs (if they
> don't want to use hard-coded timeouts) similar to
> log_startup_progress_interval, which isn't the right way IMO.
>
> I think the notion of ereport_progress feature being disabled when the
> timeout is 0, makes sense to me at least.
>
> On the flip side, what if we just have a single GUC
> log_progress_report_interval (as proposed in the v1 patch)? Do we ever
> want different processes to emit progress report messages at different
> frequencies? Well, I can think of the startup process during standby
> recovery needing to emit recovery progress report messages at a much
> lower frequency than the startup process during the crash recovery.
> Again, controlling the frequencies with different GUCs isn't the way
> forward. But we can do something like: process 1 emits messages with a
> frequency of 2*log_progress_report_interval, process 2 with  a
> frequency 4*log_progress_report_interval and so on without needing
> additional GUCs.
>
> Thoughts?

Here's v2 patch, passing progress report interval as an input to
begin_progress_report_phase() so that the processes can use their own
intervals(hard-coded or GUC) if they wish to not use the generic GUC
log_progress_report_interval.

Thoughts?

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

Attachment

v2-0001-Generalize-ereport_startup_progress-infrastructur.patch

Re: Generalize ereport_startup_progress infrastructure

From

Robert Haas

Date:

09 August 2022, 12:35:24

On Mon, Aug 8, 2022 at 12:29 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> Here's v2 patch, passing progress report interval as an input to
> begin_progress_report_phase() so that the processes can use their own
> intervals(hard-coded or GUC) if they wish to not use the generic GUC
> log_progress_report_interval.

I don't think we should rename the GUC to be more generic. I like it
the way that it is.

I also think you should extend this patch series with 1 or 2
additional patches showing where else you think we should be using
this infrastructure.

If no such places exist, this is pointless.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Generalize ereport_startup_progress infrastructure

From

Bharath Rupireddy

Date:

09 August 2022, 15:54:37

On Tue, Aug 9, 2022 at 6:05 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Aug 8, 2022 at 12:29 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > Here's v2 patch, passing progress report interval as an input to
> > begin_progress_report_phase() so that the processes can use their own
> > intervals(hard-coded or GUC) if they wish to not use the generic GUC
> > log_progress_report_interval.
>
> I don't think we should rename the GUC to be more generic. I like it
> the way that it is.

Done.

> I also think you should extend this patch series with 1 or 2
> additional patches showing where else you think we should be using
> this infrastructure.
>
> If no such places exist, this is pointless.

I'm attaching 0002 for reporting removal of temp files and temp
relation files by postmaster.

If this looks okay, I can code 0003 for reporting processing of
snapshot, mapping and old WAL files by checkpointer.

Thoughts?

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

Attachment

Re: Generalize ereport_startup_progress infrastructure

From

Nitin Jadhav

Date:

10 August 2022, 12:50:54

> > > Here's an attempt to generalize the ereport_startup_progress
> > > infrastructure. The attached v1 patch places the code in elog.c/.h,
> > > renames associated functions and variables, something like
> > > ereport_startup_progress to ereport_progress,
> > > log_startup_progress_interval to log_progress_report_interval and so
> > > on.
> >
> > I'm not averse to reusing this infrastructure in other places, but I
> > doubt we'd want all of those places to be controlled by a single GUC,
> > especially because that GUC is also the on/off switch for the feature.
>
> Thanks Robert! How about we tweak the function a bit -
> begin_progress_report_phase(int timeout), so that each process can use
> their own timeout interval? In  this case, do we want to retain
> log_startup_progress_interval as-is specific to the startup process?
> If yes, other processes might come up with their own GUCs (if they
> don't want to use hard-coded timeouts) similar to
> log_startup_progress_interval, which isn't the right way IMO.
>
> I think the notion of ereport_progress feature being disabled when the
> timeout is 0, makes sense to me at least.
>
> On the flip side, what if we just have a single GUC
> log_progress_report_interval (as proposed in the v1 patch)? Do we ever
> want different processes to emit progress report messages at different
> frequencies? Well, I can think of the startup process during standby
> recovery needing to emit recovery progress report messages at a much
> lower frequency than the startup process during the crash recovery.
> Again, controlling the frequencies with different GUCs isn't the way
> forward. But we can do something like: process 1 emits messages with a
> frequency of 2*log_progress_report_interval, process 2 with  a
> frequency 4*log_progress_report_interval and so on without needing
> additional GUCs.
>
> Thoughts?

+1 for the idea to generalize the infrastructure.

Given two options, option-1 is to use a single GUC across all kind of
log running operations and option-2 is to use multiple GUCs (one for
each kind of long running operations), I go with option-1 because if a
user is interested to see a log message after every 10s for startup
operations (or any other long running operations) then it is likely
that he is interested to see other long running operations after every
10s only. It does not make sense to use different intervals for each
kind of long running operation here. It also increases the number of
GUCs which makes things complex. So it is a good idea to use a single
GUC here. But I am worried about the on/off switch as Robert
mentioned. How about using a new GUC to indicate features on/off. Say
"log_long_running_operations" which contains a comma separated string
which indicates the features to be enabled. For example,
"log_long_running_operations = startup, postmaster" will enable
logging for startup and postmaster operations and disables logging of
other long running operations. With this the number of GUCs will be
limited to 2 and it is simple and easy for the user.


Thanks & Regards,
Nitin Jadhav

On Thu, Aug 4, 2022 at 9:58 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Aug 3, 2022 at 12:11 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Tue, Aug 2, 2022 at 3:25 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > ereport_startup_progress infrastructure added by commit 9ce346e [1]
> > > will be super-useful for reporting progress of any long-running server
> > > operations, not just the startup process operations. For instance,
> > > postmaster can use it for reporting progress of temp file and temp
> > > relation file removals [2], checkpointer can use it for reporting
> > > progress of snapshot or mapping file processing or even WAL file
> > > processing and so on. And I'm sure there can be many places in the
> > > code where we have while or for loops which can, at times, take a long
> > > time to finish and having a log message there would definitely help.
> > >
> > > Here's an attempt to generalize the ereport_startup_progress
> > > infrastructure. The attached v1 patch places the code in elog.c/.h,
> > > renames associated functions and variables, something like
> > > ereport_startup_progress to ereport_progress,
> > > log_startup_progress_interval to log_progress_report_interval and so
> > > on.
> >
> > I'm not averse to reusing this infrastructure in other places, but I
> > doubt we'd want all of those places to be controlled by a single GUC,
> > especially because that GUC is also the on/off switch for the feature.
>
> Thanks Robert! How about we tweak the function a bit -
> begin_progress_report_phase(int timeout), so that each process can use
> their own timeout interval? In  this case, do we want to retain
> log_startup_progress_interval as-is specific to the startup process?
> If yes, other processes might come up with their own GUCs (if they
> don't want to use hard-coded timeouts) similar to
> log_startup_progress_interval, which isn't the right way IMO.
>
> I think the notion of ereport_progress feature being disabled when the
> timeout is 0, makes sense to me at least.
>
> On the flip side, what if we just have a single GUC
> log_progress_report_interval (as proposed in the v1 patch)? Do we ever
> want different processes to emit progress report messages at different
> frequencies? Well, I can think of the startup process during standby
> recovery needing to emit recovery progress report messages at a much
> lower frequency than the startup process during the crash recovery.
> Again, controlling the frequencies with different GUCs isn't the way
> forward. But we can do something like: process 1 emits messages with a
> frequency of 2*log_progress_report_interval, process 2 with  a
> frequency 4*log_progress_report_interval and so on without needing
> additional GUCs.
>
> Thoughts?
>
> --
> Bharath Rupireddy
> RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
>
>

Re: Generalize ereport_startup_progress infrastructure

From

Robert Haas

Date:

10 August 2022, 15:00:20

On Tue, Aug 9, 2022 at 11:54 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> I'm attaching 0002 for reporting removal of temp files and temp
> relation files by postmaster.
>
> If this looks okay, I can code 0003 for reporting processing of
> snapshot, mapping and old WAL files by checkpointer.

I think that someone is going to complain about the changes to
timeout.c. Some trouble has been taken to allow things like
SetLatch(MyLatch) to be unconditional. Aside from that, I am unsure
how generally safe it is to use the timeout infrastructure in the
postmaster.

From a user-interface point of view, log_postmaster_progress_interval
seems a bit awkward. It's really quite narrow, basically just checking
for one thing. I'm not sure I like adding a GUC for something that
specific, although I also don't have another idea at the moment
either. Hmm.

Maybe the checkpointer is a better candidate, but somehow I feel that
we can't consider this sort of thing separate from the existing
progress reporting that checkpointer already does. Perhaps we need to
think of changing or improving that in some way rather than adding
something wholly new alongside the existing system.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Generalize ereport_startup_progress infrastructure

From

Nathan Bossart

Date:

16 August 2022, 21:15:44

On Wed, Aug 10, 2022 at 11:00:20AM -0400, Robert Haas wrote:
> Maybe the checkpointer is a better candidate, but somehow I feel that
> we can't consider this sort of thing separate from the existing
> progress reporting that checkpointer already does. Perhaps we need to
> think of changing or improving that in some way rather than adding
> something wholly new alongside the existing system.

I agree that the checkpointer has a good chance of being a better
candidate.  Are you thinking of integrating this into log_checkpoints
somehow?  Perhaps this parameter could optionally accept an interval for
logging the progress of ongoing checkpoints.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: Generalize ereport_startup_progress infrastructure

From

Bharath Rupireddy

Date:

17 August 2022, 08:01:10

On Wed, Aug 10, 2022 at 6:21 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
>
> Given two options, option-1 is to use a single GUC across all kind of
> log running operations and option-2 is to use multiple GUCs (one for
> each kind of long running operations), I go with option-1 because if a
> user is interested to see a log message after every 10s for startup
> operations (or any other long running operations) then it is likely
> that he is interested to see other long running operations after every
> 10s only. It does not make sense to use different intervals for each
> kind of long running operation here. It also increases the number of
> GUCs which makes things complex. So it is a good idea to use a single
> GUC here.

+1.

> But I am worried about the on/off switch as Robert
> mentioned.

Are you worried that users might want to switch off the progress
report messages at process level, for instance, they want to log the
startup process' long running operations progress but not, say,
checkpointer or postmaster? IMO, a long running operation, if it is
happening in any of the processes, is a concern for the users and
having progress report log messages for them would help users debug
any issues or improve observability of the server as a whole. With
single GUC, the server log might contain progress reports of all the
long running (wherever we use this ereport_progress()) operations in
the entire server's lifecycle, which isn't bad IMO.

I'd still vote for a single GUC log_progress_report_interval without
worrying much about process-level enable/disable capability. However,
let's hear what other hackers think about this.

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

Re: Generalize ereport_startup_progress infrastructure

From

Bharath Rupireddy

Date:

17 August 2022, 08:29:58

On Wed, Aug 17, 2022 at 2:45 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Wed, Aug 10, 2022 at 11:00:20AM -0400, Robert Haas wrote:
> > Maybe the checkpointer is a better candidate, but somehow I feel that
> > we can't consider this sort of thing separate from the existing
> > progress reporting that checkpointer already does. Perhaps we need to
> > think of changing or improving that in some way rather than adding
> > something wholly new alongside the existing system.
>
> I agree that the checkpointer has a good chance of being a better
> candidate.  Are you thinking of integrating this into log_checkpoints
> somehow?  Perhaps this parameter could optionally accept an interval for
> logging the progress of ongoing checkpoints.

Certainly the checkpointer is an immediate candidate. For instance, I
can think of using ereport_progress() in CheckPointSnapBuild() for
snapshot files processing, CheckPointLogicalRewriteHeap() for mapping
files processing, BufferSync() for checkpointing dirty buffers (?),
ProcessSyncRequests() for processing fsync() requests,
RemoveOldXlogFiles(), RemoveNonParentXlogFiles()(?). I personally have
seen cases where some of these checkpoint operations take a lot of
time in production environments and a better observability would help
a lot.

However, I'm not sure if turning log_checkpoints to an integer type to
use for checkpoint progress reporting is a good idea here.

As I explained upthread [1], I'd vote for a single GUC at the entire
server level. If the users/customers request per-process or
per-operation progress report GUCs, we can then consider it.

Thoughts?

[1] https://www.postgresql.org/message-id/CALj2ACUJA73nCK_Li7v4_OOkRqwQBX14Fx2ALb7GDRwUTNGK-Q%40mail.gmail.com

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

Re: Generalize ereport_startup_progress infrastructure

From

Robert Haas

Date:

17 August 2022, 15:14:24

On Wed, Aug 17, 2022 at 4:30 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> As I explained upthread [1], I'd vote for a single GUC at the entire
> server level. If the users/customers request per-process or
> per-operation progress report GUCs, we can then consider it.

Well, I don't agree that either of the proposed new uses of this
infrastructure are the right way to solve the problems in question, so
worrying about how to name the GUCs when we have a bunch of uses of
this infrastructure seems to me to be premature. The proposed use in
the postmaster doesn't look very safe, so you either need to give up
on that or figure out a way to make it safe. The proposed use in the
checkpointer looks like it needs more design work, because it's not
clear whether or how it should interact with log_checkpoints. While I
agree that changing log_checkpoints into an integer value doesn't
necessarily make sense, having some kind of new checkpoint logging
that is completely unrelated to existing checkpoint logging doesn't
necessarily make sense to me either.

I do have some sympathy with the idea that if people care about
operations that unexpectedly run for a long time, they probably care
about all of them, and probably don't care about changing the timeout
or even the enable switch for each one individually. Honestly, it's
not very clear to me who would want to ever turn off the startup
progress stuff, or why they'd want to change the interval. I added a
GUC for it out of an abundance of caution, but I don't know why you'd
really want a different setting. Maybe there's some reason, but it's
not clear to me. At the same time, I don't think the overall picture
here is too clear yet. I'm reluctant to commit to a specific UI for a
feature whose scope we don't seem to know.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Generalize ereport_startup_progress infrastructure

From

Bharath Rupireddy

Date:

29 September 2022, 11:57:54

On Wed, Aug 17, 2022 at 8:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Well, I don't agree that either of the proposed new uses of this
> infrastructure are the right way to solve the problems in question, so
> worrying about how to name the GUCs when we have a bunch of uses of
> this infrastructure seems to me to be premature.

Agreed.

> The proposed use in
> the postmaster doesn't look very safe, so you either need to give up
> on that or figure out a way to make it safe.

Is registering a SIGALRM handler in postmaster not a good idea? Is
setting the MyLatch conditionally [1] a concern?

I agree that the handle_sig_alarm() code for postmaster may not look
good as it holds interrupts and does a bunch of other things. But is
it a bigger issue?

> The proposed use in the
> checkpointer looks like it needs more design work, because it's not
> clear whether or how it should interact with log_checkpoints. While I
> agree that changing log_checkpoints into an integer value doesn't
> necessarily make sense, having some kind of new checkpoint logging
> that is completely unrelated to existing checkpoint logging doesn't
> necessarily make sense to me either.

Hm. Yes, we cannot forget about log_checkpoints while considering
adding more logs and controls with other GUCs. We could say that one
needs to enable both log_checkpoints and the progress report GUC, but
that's not great from usability perspective.

> I do have some sympathy with the idea that if people care about
> operations that unexpectedly run for a long time, they probably care
> about all of them, and probably don't care about changing the timeout
> or even the enable switch for each one individually.

I've seen the cases myself and asked by many about the server being
unresponsive in the cases where it processes files, for instance, temp
files in postmaster after a restart or snapshot or mapping or
BufferSync() during checkpoint where this sort of progress reporting
would've helped.

Thinking of another approach for reporting file processing alone - a
GUC log_file_processing_traffic = {none, medium, high} or {0, 1, 2,
..... limit} that users can set to emit a file processing log after a
certain number of files. It doesn't require a timeout mechanism, so it
can be used by any process. But, it is specific to just files.

Similar to above but a bit generic, not specific to just file
processing, a GUC log_processing_traffic = {none, medium, high} or {0,
1, 2, ..... limit}.

Thoughts?

[1]
     /*
      * SIGALRM is always cause for waking anything waiting on the process
      * latch.
+     *
+     * Postmaster has no latch associated with it.
      */
-    SetLatch(MyLatch);
+    if (MyLatch)
+        SetLatch(MyLatch);

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com