Re: Generalize ereport_startup_progress infrastructure - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Generalize ereport_startup_progress infrastructure
Date
Msg-id CALj2ACVBej9d55dvaaC74MMrxxDNv-orirEArDZnXLSgpQPWDA@mail.gmail.com
In response to Re: Generalize ereport_startup_progress infrastructure  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Aug 17, 2022 at 8:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Well, I don't agree that either of the proposed new uses of this
> infrastructure are the right way to solve the problems in question, so
> worrying about how to name the GUCs when we have a bunch of uses of
> this infrastructure seems to me to be premature.

Agreed.

> The proposed use in
> the postmaster doesn't look very safe, so you either need to give up
> on that or figure out a way to make it safe.

Is registering a SIGALRM handler in the postmaster not a good idea? Is
setting MyLatch conditionally [1] a concern?

I agree that the handle_sig_alarm() code for postmaster may not look
good as it holds interrupts and does a bunch of other things. But is
it a bigger issue?

> The proposed use in the
> checkpointer looks like it needs more design work, because it's not
> clear whether or how it should interact with log_checkpoints. While I
> agree that changing log_checkpoints into an integer value doesn't
> necessarily make sense, having some kind of new checkpoint logging
> that is completely unrelated to existing checkpoint logging doesn't
> necessarily make sense to me either.

Hm. Yes, we cannot forget about log_checkpoints while considering
adding more logs and controls with other GUCs. We could say that one
needs to enable both log_checkpoints and the progress report GUC, but
that's not great from a usability perspective.

> I do have some sympathy with the idea that if people care about
> operations that unexpectedly run for a long time, they probably care
> about all of them, and probably don't care about changing the timeout
> or even the enable switch for each one individually.

I've seen such cases myself, and have been asked by many people about
the server being unresponsive while it processes files, for instance,
temp files in the postmaster after a restart, or snapshot or mapping
files or BufferSync() during a checkpoint, where this sort of progress
reporting would've helped.

Thinking of another approach for reporting file processing alone - a
GUC log_file_processing_traffic = {none, medium, high} or {0, 1, 2,
..... limit} that users can set to emit a file processing log after a
certain number of files. It doesn't require a timeout mechanism, so it
can be used by any process. But, it is specific to just files.
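
To make the count-based idea concrete, here is a minimal standalone
sketch. It is not a patch: log_file_processing_interval, FileProgress
and the file_progress_* functions are hypothetical names, the setting
is assumed to mean "log once every N files" (0 disables it), and
fprintf() stands in for ereport(LOG, ...).

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the proposed GUC. Assumption: 0 disables
 * reporting, N > 0 emits one log line after every N files processed.
 */
static int log_file_processing_interval = 0;

/* Per-operation counter; no timer or signal handler is involved. */
typedef struct FileProgress
{
    const char *operation;      /* description used in the log line */
    unsigned long processed;    /* files handled so far */
} FileProgress;

static void
file_progress_init(FileProgress *p, const char *operation)
{
    p->operation = operation;
    p->processed = 0;
}

/*
 * Call once per processed file. Returns true if a log line was emitted.
 * fprintf() is only a placeholder for the server's logging call.
 */
static bool
file_progress_tick(FileProgress *p)
{
    p->processed++;

    if (log_file_processing_interval <= 0)
        return false;
    if (p->processed % (unsigned long) log_file_processing_interval != 0)
        return false;

    fprintf(stderr, "LOG:  %s: processed %lu files so far\n",
            p->operation, p->processed);
    return true;
}
```

A caller would run file_progress_init() before its file loop and
file_progress_tick() once per file. Since this is only a counter check,
it would work in any process, including ones without a timeout
mechanism. The tradeoff is that the log rate follows the number of
files rather than elapsed time.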

Similar to the above but a bit more generic, not specific to just file
processing: a GUC log_processing_traffic = {none, medium, high} or {0,
1, 2, ..... limit}.

Thoughts?

[1]
     /*
      * SIGALRM is always cause for waking anything waiting on the process
      * latch.
+     *
+     * Postmaster has no latch associated with it.
      */
-    SetLatch(MyLatch);
+    if (MyLatch)
+        SetLatch(MyLatch);

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


