Re: Checkpoint logging, revised patch - Mailing list pgsql-patches

From Greg Smith
Subject Re: Checkpoint logging, revised patch
Date
Msg-id Pine.GSO.4.64.0706292041560.2908@westnet.com
Whole thread Raw
In response to Checkpoint logging, revised patch  (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses Re: Checkpoint logging, revised patch
List pgsql-patches
On Fri, 29 Jun 2007, Heikki Linnakangas wrote:

> LOG:  checkpoint complete; buffers written=5869 (35.8%); write=2.081 s,
> sync=4.851 s, total=7.066 s

My original patch converted the buffers written to MB. Easier to estimate
MB/s by eye; I really came to hate multiplying by 8K. And people who have
multiple boxes with different BLCKZ could merge their logs together and
not have to worry about adjusting for that.

I found this simpler to present the results to non-PostgreSQL people
without having to explain the buffer size.  It's easy for any IT person to
follow; the idea I was working toward was "see, this log entry shows the
long pauses are because it's writing 700MB of data all at once here, right
next to that statment that took 10 seconds we found with
log_min_duration_statement".

The current equivilant of what I had would be
CheckpointStats.ckpt_bufs_written * BLCKSZ / (1024*1024) formatted as
"%.1f MB"

> One thing that's missing, that I originally hoped to achieve with this,
> is logging the cause of a checkpoint.

The way pg_stat_bgwriter keeps two separate counts for requested (which
includes timeouts) and required (out of segments) checkpoints now
satisfies the main reason I originally collected that info.  I felt it was
critical when I wrote that patch for it to be possible to distinguish
between which of the two was happening more, it's only in the nice to have
category now.

Now, onto some slightly different things here, which were all aimed at
developer-level troubleshooting.  With the nice reworking for logging
checkpoints better, there were three tiny things my original patch did
that got lost along the way that I'd suggest Tom might want to consider
putting back during the final apply.  I'll include mini pseudo-diffs here
for those so no one has to find them in my original patch, they're all
one-liners:

1) Log every time a new WAL file was created, which ties into the recent
discussion here that being a possible performance issue.  At least you can
look for it happening this way:

src/backend/access/transam/xlog.c
--- 1856,1863 ----
                                 (errcode_for_file_access(),
                                  errmsg("could not create file \"%s\":
%m", tmppath)));

+       ereport(DEBUG2, (errmsg("WAL creating and filling new file on
disk")));

         /*
          * Zero-fill the file.  We have to do this the hard way to ensure
that all

2) Add a lower-level DEBUG statement when autovaccum was finished, which
helped me in several causes figure out if that was the cause of a problem
(when really doing low-level testing, I would see a vacuum start, not know
if it was done, and then wonder if that was the cause of a slow
statement):

*** src/backend/postmaster/autovacuum.c
--- 811,814 ----
                 do_autovacuum();
+               ereport(DEBUG2,
+                               (errmsg("autovacuum: processing database
\"%s\" complete", dbname)));
         }

3) I fixed a line in postmaster.c so it formatted fork PIDs the same way
most other log statements do; most statements report it as (PID %d) and
the difference in this form seemed undesirable (I spent a lot of time at
DEBUG2 and these little things started to bug me):

*** src/backend/postmaster/postmaster.c
*** 2630,2636 ****
         /* in parent, successful fork */
         ereport(DEBUG2,
!                       (errmsg_internal("forked new backend, pid=%d
socket=%d",
                                                          (int) pid,
port->sock)));
--- 2630,2636 ----
         /* in parent, successful fork */
         ereport(DEBUG2,
!                       (errmsg_internal("forked new backend (PID %d)
socket=%d",
                                                          (int) pid,
port->sock)));

Little stuff, but all things I've found valuable on several occasions,
which suggests eventually someone else may appreciate them as well.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

pgsql-patches by date:

Previous
From: Tom Lane
Date:
Subject: Re: Checkpoint logging, revised patch
Next
From: Heikki Linnakangas
Date:
Subject: Re: Checkpoint logging, revised patch