Re: Logging checkpoints and other slowdown causes - Mailing list pgsql-patches

From Heikki Linnakangas
Subject Re: Logging checkpoints and other slowdown causes
Date
Msg-id 46443F02.6060300@enterprisedb.com
Whole thread Raw
In response to Re: Logging checkpoints and other slowdown causes  (Greg Smith <gsmith@gregsmith.com>)
Responses Re: Logging checkpoints and other slowdown causes
Re: Logging checkpoints and other slowdown causes
List pgsql-patches
Greg Smith wrote:
> On Tue, 3 Apr 2007, Peter Eisentraut wrote:
>
>> Something that is aimed at a user should not be enabled at a "debug"
>> level.  Debug levels are for debugging, not for very high verbosity.
>
> I asked for feedback about where to log at when I intially sent the
> first version of this in and didn't hear anything back on that part, so
> I pushed these in line with other log messages I saw.  The messages for
> when checkpoints start and stop were both logged at DEBUG2, so I put
> progress reports on the other significant phases of the process there as
> well.

I agree that debug levels are not suitable for this.

I'm thinking of INFO, NOTICE or LOG. The user manual says about LOG:

LOG

     Reports information of interest to administrators, e.g., checkpoint
activity.

But looking at the code, all the checkpoint related messages are at
DEBUG-levels, nothing gets printed at LOG-level. Printing the messages
at LOG-level would bring the code in line with the documentation, but I
don't think we want to fill the log with checkpoint chatter unless the
DBA explicitly asks for that.

How about INFO? It seems like the best level for information on normal
activity of the system. The documentation however has this to say about it:

INFO

     Provides information implicitly requested by the user, e.g., during
VACUUM VERBOSE.

We should adjust the documentation, but INFO seems like the best level
to me. Or we could add a GUC variable similar to log_connections or
log_statement to control if the messages are printed or not, and use LOG.

> I don't expect these messages will be helpful for a normal user--that's
> what the new data in pg_stats_bgwriter is for.  Their main purpose of
> this patch is debugging checkpoint related performance issues at a level
> I'd expect only a developer to work at; they're also helpful for someone
> writing benchmark code.

I disagree. They would be helpful to an administrator chasing down
checkpoint related problems. E.g. checkpoints taking too long, occurring
too often (though we already have log_checkpoint_warning for that), or
to identify if transient performance problems that users complain about
coincide with checkpoints. And at least I like to have messages like
that in the log just to comfort me that everything is going ok.

> There are several patches in process floating around that aim to adjust
> either the background writer or the checkpoint process to reduce the
> impact of checkpoints.  This logging allows grading their success at
> that. As my tests with this patch in place suggest this problem is far
> from solved with any of the current suggestions, I'd like to get other
> developers looking at that problem the same way I have been; that's why
> I'd like to see some standardization on how checkpoints are
> instrumented. The fact that really advanced users might also use this
> for troubleshooting I consider a bonus rather than the main focus here.

Agreed.


Looking at the patch, I think we want one line in the log when
checkpoint starts, one when buffer flushing starts (or maybe not, if we
assume that checkpointing clog, subtrans and multixact don't take long),
one when sync-phase starts and when the checkpoint is done. We don't
need to print the times elapsed in each phase on a separate line, that's
just derived information from the other lines, unless we use different
log-levels for detail lines (like you did in your patch).

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

pgsql-patches by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: ECPG patch to use prepare for improved performance
Next
From: "Guillaume Smet"
Date:
Subject: Re: Logging checkpoints and other slowdown causes