Instrument checkpoint sync calls - Mailing list pgsql-hackers
From | Greg Smith |
---|---|
Subject | Instrument checkpoint sync calls |
Date | |
Msg-id | 4CE056B6.4050901@2ndquadrant.com Whole thread Raw |
Responses |
Re: Instrument checkpoint sync calls
|
List | pgsql-hackers |
Attached patch adds some logging for each individual fsync call made during a checkpoint, along with a summary at the end. You need to have the following to see all of the detail: log_checkpoints=on log_min_messages=debug1 And here's a sample: LOG: checkpoint starting: immediate force wait DEBUG: checkpoint sync: file=1 time=1.946000 msec DEBUG: checkpoint sync: file=2 time=0.666000 msec DEBUG: checkpoint sync: file=3 time=0.004000 msec LOG: checkpoint sync: files=3 longest=1.946000 msec average=0.872000 msec LOG: checkpoint complete: wrote 3 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.002 s, total=0.003 s I think that it's reasonable for the sort of people who turn log_checkpoints on to also get the sync summary line, thus it being logged at LOG level. The detail on individual syncs might go to DEBUG2 or lower instead of DEBUG1 where I put it, that part I don't have a strong opinion on. It's at DEBUG1 to make testing the patch without a gigantic amount of log data also coming in easier. Right now the code is written such that all the calls that grab timing information are wrapped around "ifdef DEBUG_FSYNC", which is a variable set to 1 that could be a compile-time option like DEBUG_DEADLOCK, to allow turning this code path off at build time. I personally think that if you're already making an fsync call and have log_checkpoints on, the additional overhead of also timing that fsync is minimal even on platforms where timing is slow (I don't have such a system to test that assumption however). And I've seen enough value in troubleshooting nasty checkpoint sync problems using this patch to feel it's worth having even if it does add some overhead. I'm a little concerned about log_checkpoints changing on me in the middle of the execution of a checkpoint, which would cause some problems here. Not sure if that's something I should explicitly code for, given that all I think it will do is screw up one of the timing results. It does seem a risk from the last minute self-review I just did of the code. I'll give a sample program that stresses the system, generating slow timing results and other types of bad behavior, along with the next patch I submit here shortly. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c index 1219fcf..c72da72 100644 --- a/src/backend/storage/smgr/md.c +++ b/src/backend/storage/smgr/md.c @@ -20,6 +20,7 @@ #include "catalog/catalog.h" #include "miscadmin.h" +#include "portability/instr_time.h" #include "postmaster/bgwriter.h" #include "storage/fd.h" #include "storage/bufmgr.h" @@ -29,6 +30,11 @@ #include "utils/memutils.h" #include "pg_trace.h" +/* + * For now just leave this on all the time, eventually this will + * need to be a full compiler flag like DEBUG_DEADLOCK + */ +#define DEBUG_FSYNC 1 /* interval for calling AbsorbFsyncRequests in mdsync */ #define FSYNCS_PER_ABSORB 10 @@ -927,6 +933,15 @@ mdsync(void) PendingOperationEntry *entry; int absorb_counter; +#ifdef DEBUG_FSYNC + /* Statistics on sync times */ + int processed = 0; + instr_time sync_start,sync_end,diff; + double elapsed; + double longest = 0; + double total_elapsed = 0; +#endif + /* * This is only called during checkpoints, and checkpoints should only * occur in processes that have created a pendingOpsTable. @@ -1069,9 +1084,32 @@ mdsync(void) seg = _mdfd_getseg(reln, entry->tag.forknum, entry->tag.segno * ((BlockNumber) RELSEG_SIZE), false, EXTENSION_RETURN_NULL); + +#ifdef DEBUG_FSYNC + if (log_checkpoints) + INSTR_TIME_SET_CURRENT(sync_start); +#endif if (seg != NULL && FileSync(seg->mdfd_vfd) >= 0) + { + +#ifdef DEBUG_FSYNC + if (log_checkpoints) + { + INSTR_TIME_SET_CURRENT(sync_end); + diff=sync_end; + INSTR_TIME_SUBTRACT(diff, sync_start); + elapsed = (double) INSTR_TIME_GET_MICROSEC(diff) / 1000.0; + if (elapsed > longest) + longest = elapsed; + total_elapsed+=elapsed; + processed++; + /* Reduce the log level here to DEBUG2 in final patch? */ + elog(DEBUG1, "checkpoint sync: file=%d time=%f msec", processed, elapsed); + } +#endif break; /* success; break out of retry loop */ + } /* * XXX is there any point in allowing more than one retry? @@ -1113,6 +1151,13 @@ mdsync(void) elog(ERROR, "pendingOpsTable corrupted"); } /* end loop over hashtable entries */ +#ifdef DEBUG_FSYNC + /* Report on sync performance metrics */ + if (log_checkpoints && (processed > 0)) + elog(LOG, "checkpoint sync: files=%d longest=%f msec average=%f msec", + processed, longest, total_elapsed / processed); +#endif + /* Flag successful completion of mdsync */ mdsync_in_progress = false; }
pgsql-hackers by date: