Re: Checkpoint logging, revised patch - Mailing list pgsql-patches

From:           Greg Smith
Subject:        Re: Checkpoint logging, revised patch
Msg-id:         Pine.GSO.4.64.0706292041560.2908@westnet.com
In response to: Checkpoint logging, revised patch (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses:      Re: Checkpoint logging, revised patch
List:           pgsql-patches
On Fri, 29 Jun 2007, Heikki Linnakangas wrote:

> LOG: checkpoint complete; buffers written=5869 (35.8%); write=2.081 s,
> sync=4.851 s, total=7.066 s

My original patch converted the buffers written to MB. That makes it easier to estimate MB/s by eye; I really came to hate multiplying by 8K. And people who have multiple boxes with different BLCKSZ settings could merge their logs together and not have to worry about adjusting for that. I also found it simpler to present the results to non-PostgreSQL people without having to explain the buffer size. It's easy for any IT person to follow; the idea I was working toward was "see, this log entry shows the long pauses are because it's writing 700MB of data all at once here, right next to that statement that took 10 seconds we found with log_min_duration_statement". The current equivalent of what I had would be

    CheckpointStats.ckpt_bufs_written * BLCKSZ / (1024*1024)

formatted as "%.1f MB".

> One thing that's missing, that I originally hoped to achieve with this,
> is logging the cause of a checkpoint.

The way pg_stat_bgwriter now keeps two separate counts for requested (which includes timeouts) and required (out of segments) checkpoints satisfies the main reason I originally collected that info. When I wrote that patch I felt it was critical to be able to distinguish which of the two was happening more; that's only in the nice-to-have category now.

Now, onto some slightly different things here, all aimed at developer-level troubleshooting. With the nice reworking of checkpoint logging, there were three tiny things my original patch did that got lost along the way, which I'd suggest Tom might want to consider putting back during the final apply. I'll include mini pseudo-diffs here for those so no one has to find them in my original patch; they're all one-liners:

1) Log every time a new WAL file is created, which ties into the recent discussion here about that being a possible performance issue.
At least you can look for it happening this way:

*** src/backend/access/transam/xlog.c
--- 1856,1863 ----
  				(errcode_for_file_access(),
  				 errmsg("could not create file \"%s\": %m", tmppath)));
+ 	ereport(DEBUG2, (errmsg("WAL creating and filling new file on disk")));
  	/*
  	 * Zero-fill the file.  We have to do this the hard way to ensure that all

2) Add a lower-level DEBUG statement when autovacuum is finished, which helped me in several cases figure out whether that was the cause of a problem (when really doing low-level testing, I would see a vacuum start, not know if it was done, and then wonder if that was the cause of a slow statement):

*** src/backend/postmaster/autovacuum.c
--- 811,814 ----
  	do_autovacuum();
+ 	ereport(DEBUG2,
+ 		(errmsg("autovacuum: processing database \"%s\" complete", dbname)));
  }

3) I fixed a line in postmaster.c so it formats forked PIDs the same way most other log statements do; most statements report it as (PID %d), and the difference in this form seemed undesirable (I spent a lot of time at DEBUG2 and these little things started to bug me):

*** src/backend/postmaster/postmaster.c
*** 2630,2636 ****
  			/* in parent, successful fork */
  			ereport(DEBUG2,
  !					(errmsg_internal("forked new backend, pid=%d socket=%d",
  									 (int) pid, port->sock)));
--- 2630,2636 ----
  			/* in parent, successful fork */
  			ereport(DEBUG2,
  !					(errmsg_internal("forked new backend (PID %d) socket=%d",
  									 (int) pid, port->sock)));

Little stuff, but all things I've found valuable on several occasions, which suggests eventually someone else may appreciate them as well.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD