Home > mailing lists
Instrument checkpoint sync calls - Mailing list pgsql-hackers

From	Greg Smith
Subject	Instrument checkpoint sync calls
Date	November 14, 2010 17:45:32
Msg-id	4CE056B6.4050901@2ndquadrant.com Whole thread Raw
Responses	Re: Instrument checkpoint sync calls
List	pgsql-hackers
Tree view
Attached patch adds some logging for each individual fsync call made
during a checkpoint, along with a summary at the end.  You need to have
the following to see all of the detail:

log_checkpoints=on
log_min_messages=debug1

And here's a sample:

LOG:  checkpoint starting: immediate force wait
DEBUG:  checkpoint sync: file=1 time=1.946000 msec
DEBUG:  checkpoint sync: file=2 time=0.666000 msec
DEBUG:  checkpoint sync: file=3 time=0.004000 msec
LOG:  checkpoint sync: files=3 longest=1.946000 msec average=0.872000 msec
LOG:  checkpoint complete: wrote 3 buffers (0.1%); 0 transaction log
file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.002 s,
total=0.003 s

I think that it's reasonable for the sort of people who turn
log_checkpoints on to also get the sync summary line, thus it being
logged at LOG level.  The detail on individual syncs might go to DEBUG2
or lower instead of DEBUG1 where I put it, that part I don't have a
strong opinion on.  It's at DEBUG1 to make testing the patch without a
gigantic amount of log data also coming in easier.

Right now the code is written such that all the calls that grab timing
information are wrapped around "ifdef DEBUG_FSYNC", which is a variable
set to 1 that could be a compile-time option like DEBUG_DEADLOCK, to
allow turning this code path off at build time.  I personally think that
if you're already making an fsync call and have log_checkpoints on, the
additional overhead of also timing that fsync is minimal even on
platforms where timing is slow (I don't have such a system to test that
assumption however).  And I've seen enough value in troubleshooting
nasty checkpoint sync problems using this patch to feel it's worth
having even if it does add some overhead.

I'm a little concerned about log_checkpoints changing on me in the
middle of the execution of a checkpoint, which would cause some problems
here.  Not sure if that's something I should explicitly code for, given
that all I think it will do is screw up one of the timing results.  It
does seem a risk from the last minute self-review I just did of the code.

I'll give a sample program that stresses the system, generating slow
timing results and other types of bad behavior, along with the next
patch I submit here shortly.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us


diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 1219fcf..c72da72 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -20,6 +20,7 @@

 #include "catalog/catalog.h"
 #include "miscadmin.h"
+#include "portability/instr_time.h"
 #include "postmaster/bgwriter.h"
 #include "storage/fd.h"
 #include "storage/bufmgr.h"
@@ -29,6 +30,11 @@
 #include "utils/memutils.h"
 #include "pg_trace.h"

+/*
+ * For now just leave this on all the time, eventually this will
+ * need to be a full compiler flag like DEBUG_DEADLOCK
+ */
+#define DEBUG_FSYNC    1

 /* interval for calling AbsorbFsyncRequests in mdsync */
 #define FSYNCS_PER_ABSORB        10
@@ -927,6 +933,15 @@ mdsync(void)
     PendingOperationEntry *entry;
     int            absorb_counter;

+#ifdef DEBUG_FSYNC
+    /* Statistics on sync times */
+    int processed = 0;
+    instr_time sync_start,sync_end,diff;
+    double elapsed;
+    double longest = 0;
+    double total_elapsed = 0;
+#endif
+
     /*
      * This is only called during checkpoints, and checkpoints should only
      * occur in processes that have created a pendingOpsTable.
@@ -1069,9 +1084,32 @@ mdsync(void)
                 seg = _mdfd_getseg(reln, entry->tag.forknum,
                               entry->tag.segno * ((BlockNumber) RELSEG_SIZE),
                                    false, EXTENSION_RETURN_NULL);
+
+#ifdef DEBUG_FSYNC
+                if (log_checkpoints)
+                    INSTR_TIME_SET_CURRENT(sync_start);
+#endif
                 if (seg != NULL &&
                     FileSync(seg->mdfd_vfd) >= 0)
+                {
+
+#ifdef DEBUG_FSYNC
+                    if (log_checkpoints)
+                    {
+                        INSTR_TIME_SET_CURRENT(sync_end);
+                        diff=sync_end;
+                        INSTR_TIME_SUBTRACT(diff, sync_start);
+                        elapsed = (double) INSTR_TIME_GET_MICROSEC(diff) / 1000.0;
+                        if (elapsed > longest)
+                            longest = elapsed;
+                        total_elapsed+=elapsed;
+                        processed++;
+                        /* Reduce the log level here to DEBUG2 in final patch? */
+                        elog(DEBUG1, "checkpoint sync: file=%d time=%f msec", processed, elapsed);
+                    }
+#endif
                     break;        /* success; break out of retry loop */
+                    }

                 /*
                  * XXX is there any point in allowing more than one retry?
@@ -1113,6 +1151,13 @@ mdsync(void)
             elog(ERROR, "pendingOpsTable corrupted");
     }                            /* end loop over hashtable entries */

+#ifdef DEBUG_FSYNC
+    /* Report on sync performance metrics */
+    if (log_checkpoints && (processed > 0))
+        elog(LOG, "checkpoint sync: files=%d longest=%f msec average=%f msec",
+            processed, longest, total_elapsed / processed);
+#endif
+
     /* Flag successful completion of mdsync */
     mdsync_in_progress = false;
 }
pgsql-hackers by date:
From: Tom Lane
Date: 14 November 2010, 17:43:32
Subject: Re: pg_stat_bgwriter broken?
From: David Fetter
Date: 14 November 2010, 17:47:37
Subject: Re: wCTE behaviour
Instrument checkpoint sync calls - Mailing list pgsql-hackers

Previous

Next