Thread: Instrument checkpoint sync calls
Attached patch adds some logging for each individual fsync call made during a checkpoint, along with a summary at the end. You need to have the following to see all of the detail:

log_checkpoints=on
log_min_messages=debug1

And here's a sample:

LOG:  checkpoint starting: immediate force wait
DEBUG:  checkpoint sync: file=1 time=1.946000 msec
DEBUG:  checkpoint sync: file=2 time=0.666000 msec
DEBUG:  checkpoint sync: file=3 time=0.004000 msec
LOG:  checkpoint sync: files=3 longest=1.946000 msec average=0.872000 msec
LOG:  checkpoint complete: wrote 3 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.002 s, total=0.003 s

I think that it's reasonable for the sort of people who turn log_checkpoints on to also get the sync summary line, thus it being logged at LOG level. The detail on individual syncs might go to DEBUG2 or lower instead of DEBUG1 where I put it, that part I don't have a strong opinion on. It's at DEBUG1 to make testing the patch without a gigantic amount of log data also coming in easier.

Right now the code is written such that all the calls that grab timing information are wrapped around "ifdef DEBUG_FSYNC", which is a variable set to 1 that could be a compile-time option like DEBUG_DEADLOCK, to allow turning this code path off at build time. I personally think that if you're already making an fsync call and have log_checkpoints on, the additional overhead of also timing that fsync is minimal even on platforms where timing is slow (I don't have such a system to test that assumption however). And I've seen enough value in troubleshooting nasty checkpoint sync problems using this patch to feel it's worth having even if it does add some overhead.

I'm a little concerned about log_checkpoints changing on me in the middle of the execution of a checkpoint, which would cause some problems here. Not sure if that's something I should explicitly code for, given that all I think it will do is screw up one of the timing results. It does seem a risk from the last-minute self-review I just did of the code.

I'll give a sample program that stresses the system, generating slow timing results and other types of bad behavior, along with the next patch I submit here shortly.

--
Greg Smith  2ndQuadrant US  greg@2ndQuadrant.com  Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 1219fcf..c72da72 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -20,6 +20,7 @@
 
 #include "catalog/catalog.h"
 #include "miscadmin.h"
+#include "portability/instr_time.h"
 #include "postmaster/bgwriter.h"
 #include "storage/fd.h"
 #include "storage/bufmgr.h"
@@ -29,6 +30,11 @@
 #include "utils/memutils.h"
 #include "pg_trace.h"
 
+/*
+ * For now just leave this on all the time, eventually this will
+ * need to be a full compiler flag like DEBUG_DEADLOCK
+ */
+#define DEBUG_FSYNC 1
 
 /* interval for calling AbsorbFsyncRequests in mdsync */
 #define FSYNCS_PER_ABSORB	10
@@ -927,6 +933,15 @@ mdsync(void)
 	PendingOperationEntry *entry;
 	int			absorb_counter;
 
+#ifdef DEBUG_FSYNC
+	/* Statistics on sync times */
+	int			processed = 0;
+	instr_time	sync_start,sync_end,diff;
+	double		elapsed;
+	double		longest = 0;
+	double		total_elapsed = 0;
+#endif
+
 	/*
 	 * This is only called during checkpoints, and checkpoints should only
 	 * occur in processes that have created a pendingOpsTable.
@@ -1069,9 +1084,32 @@ mdsync(void)
 				seg = _mdfd_getseg(reln, entry->tag.forknum,
 							  entry->tag.segno * ((BlockNumber) RELSEG_SIZE),
 								   false, EXTENSION_RETURN_NULL);
+
+#ifdef DEBUG_FSYNC
+				if (log_checkpoints)
+					INSTR_TIME_SET_CURRENT(sync_start);
+#endif
 				if (seg != NULL &&
 					FileSync(seg->mdfd_vfd) >= 0)
+				{
+
+#ifdef DEBUG_FSYNC
+					if (log_checkpoints)
+					{
+						INSTR_TIME_SET_CURRENT(sync_end);
+						diff=sync_end;
+						INSTR_TIME_SUBTRACT(diff, sync_start);
+						elapsed = (double) INSTR_TIME_GET_MICROSEC(diff) / 1000.0;
+						if (elapsed > longest)
+							longest = elapsed;
+						total_elapsed+=elapsed;
+						processed++;
+						/* Reduce the log level here to DEBUG2 in final patch? */
+						elog(DEBUG1, "checkpoint sync: file=%d time=%f msec",
+							 processed, elapsed);
+					}
+#endif
 					break;		/* success; break out of retry loop */
+				}
 
 				/*
 				 * XXX is there any point in allowing more than one retry?
@@ -1113,6 +1151,13 @@ mdsync(void)
 			elog(ERROR, "pendingOpsTable corrupted");
 	}							/* end loop over hashtable entries */
 
+#ifdef DEBUG_FSYNC
+	/* Report on sync performance metrics */
+	if (log_checkpoints && (processed > 0))
+		elog(LOG, "checkpoint sync: files=%d longest=%f msec average=%f msec",
+			 processed, longest, total_elapsed / processed);
+#endif
+
 	/* Flag successful completion of mdsync */
 	mdsync_in_progress = false;
 }
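The measurement pattern the patch wraps around FileSync() can be tried outside the server source. Here is a minimal standalone sketch of the same idea; the file name is a hypothetical demo, and it uses plain gettimeofday() directly where the patch uses the instr_time macros (which reduce to gettimeofday() on most platforms):

    /* Time a single fsync() call and report elapsed milliseconds,
     * mirroring what the patch does around FileSync(). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/time.h>
    #include <unistd.h>

    int main(void)
    {
        struct timeval start, end;
        int fd = open("sync_demo.tmp", O_CREAT | O_WRONLY, 0600);
        if (fd < 0) { perror("open"); return 1; }
        if (write(fd, "x", 1) != 1) { perror("write"); return 1; }

        gettimeofday(&start, NULL);           /* sync_start in the patch */
        if (fsync(fd) < 0) { perror("fsync"); return 1; }
        gettimeofday(&end, NULL);             /* sync_end in the patch */

        double elapsed = (end.tv_sec - start.tv_sec) * 1000.0 +
                         (end.tv_usec - start.tv_usec) / 1000.0;
        printf("fsync time=%.3f msec\n", elapsed);
        close(fd);
        unlink("sync_demo.tmp");
        return 0;
    }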
On Sun, Nov 14, 2010 at 22:37, Greg Smith <greg@2ndquadrant.com> wrote:
> Attached patch adds some logging for each individual fsync call made during
> a checkpoint, along with a summary at the end. You need to have the
> following to see all of the detail:
>
> log_checkpoints=on
> log_min_messages=debug1
>
> And here's a sample:
>
> LOG: checkpoint starting: immediate force wait
> DEBUG: checkpoint sync: file=1 time=1.946000 msec
> DEBUG: checkpoint sync: file=2 time=0.666000 msec
> DEBUG: checkpoint sync: file=3 time=0.004000 msec
> LOG: checkpoint sync: files=3 longest=1.946000 msec average=0.872000 msec
> LOG: checkpoint complete: wrote 3 buffers (0.1%); 0 transaction log file(s)
> added, 0 removed, 0 recycled; write=0.000 s, sync=0.002 s, total=0.003 s
>
> I think that it's reasonable for the sort of people who turn log_checkpoints
> on to also get the sync summary line, thus it being logged at LOG level.

In that case, wouldn't it make more sense to add a couple of more fields to the existing line? Easier to post-process the logfile that way...

> The detail on individual syncs might go to DEBUG2 or lower instead of
> DEBUG1 where I put it, that part I don't have a strong opinion on. It's at
> DEBUG1 to make testing the patch without a gigantic amount of log data also
> coming in easier.
>
> Right now the code is written such that all the calls that grab timing
> information are wrapped around "ifdef DEBUG_FSYNC", which is a variable set
> to 1 that could be a compile-time option like DEBUG_DEADLOCK, to allow
> turning this code path off at build time. I personally think that if you're
> already making an fsync call and have log_checkpoints on, the additional
> overhead of also timing that fsync is minimal even on platforms where timing
> is slow (I don't have such a system to test that assumption however). And
> I've seen enough value in troubleshooting nasty checkpoint sync problems
> using this patch to feel it's worth having even if it does add some
> overhead.

It sounds like it should be - but that should be possible to measure, no?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Magnus Hagander wrote:
>> I think that it's reasonable for the sort of people who turn
>> log_checkpoints on to also get the sync summary line, thus it being
>> logged at LOG level.
>
> In that case, wouldn't it make more sense to add a couple of more
> fields to the existing line? Easier to post-process the logfile that
> way...

It might. One trade-off is that if you're looking at the sync write detail, the summary comes out in a similar form. And it was easy to put in here--I'd have to return some new data out of the sync phase call in order for that to show up in the main log. If there's general buy-in on the idea, I could do all of that.

>> I personally think that if you're already making an fsync call and have
>> log_checkpoints on, the additional overhead of also timing that fsync is
>> minimal even on platforms where timing is slow (I don't have such a
>> system to test that assumption however)...
>
> It sounds like it should be - but that should be possible to measure, no?

What I was alluding to is that I know gettimeofday executes fast on my Linux system here, so even if I did measure the overhead and showed it's near zero, that doesn't mean it will be so on every platform. The "how long does it take to find out the current time on every supported PostgreSQL platform?" question is one I'd like to have an answer to, but it's hard to collect properly. All I know is that I don't have any system where it's slow to properly test again here.

--
Greg Smith  2ndQuadrant US  greg@2ndQuadrant.com  Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
On Sun, Nov 14, 2010 at 7:04 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> It might. One trade-off is that if you're looking at the sync write detail,
> the summary comes out in a similar form. And it was easy to put in
> here--I'd have to return some new data out of the sync phase call in order
> for that to show up in the main log. If there's general buy-in on the idea,
> I could do all of that.

+1 for that approach.

> What I was alluding to is that I know gettimeofday executes fast on my Linux
> system here, so even if I did measure the overhead and showed it's near zero
> that doesn't mean it will be so on every platform. The "how long does it
> take to find out the current time on every supported PostgreSQL platform?"
> question is one I'd like to have an answer to, but it's hard to collect
> properly. All I know is that I don't have any system where it's slow to
> properly test again here.

I would be very surprised if we can find a system where gettimeofday() takes a significant amount of time compared with fsync(). It might be (probably is) too expensive to stick into code paths that are heavily CPU-bounded, but surely the cost here is going to be dwarfed by the fsync(), no? Unless maybe there's no I/O to be done anyway, but that case doesn't seem important to optimize for.

I think making this code dependent on ifdef DEBUG_FSYNC is going to make it an order of magnitude less useful, so I'd really rather not do that unless we have some hard evidence that there is a real problem to worry about here. Making it conditional on log_checkpoints seems entirely sufficient to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I would be very surprised if we can find a system where gettimeofday()
> takes a significant amount of time compared with fsync(). It might be
> (probably is) too expensive to stick into code paths that are heavily
> CPU-bounded, but surely the cost here is going to be dwarfed by the
> fsync(), no? Unless maybe there's no I/O to be done anyway, but that
> case doesn't seem important to optimize for.

I'm not sure I buy that --- the whole point of spread checkpoints is that we hope the I/O happens before we actually call fsync.

> Making it conditional on log_checkpoints seems entirely sufficient to me.

But I'll agree with that. If you're turning on log_checkpoints then you've given the system permission to indulge in extra overhead for monitoring.

			regards, tom lane
So my task list is:

0) Rebase against the HEAD that just had code related to this touched today.

1) Assume that log_checkpoints is sufficient control over whether the timing overhead added is worth collecting, and therefore remove the half-baked idea of also wrapping with a compile-time option.

2) Have the sync summary returned upwards, so it can be put onto the same line as the rest of the log_checkpoint info.

All seems reasonable to me. Will rev a new patch by tomorrow.

--
Greg Smith  2ndQuadrant US  greg@2ndQuadrant.com  Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
On Mon, Nov 15, 2010 at 2:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I would be very surprised if we can find a system where gettimeofday()
>> takes a significant amount of time compared with fsync(). It might be
>> (probably is) too expensive to stick into code paths that are heavily
>> CPU-bounded, but surely the cost here is going to be dwarfed by the
>> fsync(), no? Unless maybe there's no I/O to be done anyway, but that
>> case doesn't seem important to optimize for.
>
> I'm not sure I buy that --- the whole point of spread checkpoints is
> that we hope the I/O happens before we actually call fsync.

Possibly. Even then, it's not guaranteed, because it might well be that fsync() has to do more grovelling through system data structures to determine that there is in fact no work to be done than what gettimeofday() has to do to read off a counter somewhere...

>> Making it conditional on log_checkpoints seems entirely sufficient to me.
>
> But I'll agree with that. If you're turning on log_checkpoints then
> you've given the system permission to indulge in extra overhead for
> monitoring.

...but if we agree on the way forward then the details don't matter.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
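The per-call cost of the timing primitive is straightforward to measure in isolation. A rough standalone micro-benchmark sketch (hypothetical, not part of any patch here) of what one gettimeofday() call costs on a given box:

    /* Measure the average cost of one gettimeofday() call by timing a
     * tight loop of them with gettimeofday() itself. */
    #include <stdio.h>
    #include <sys/time.h>

    int main(void)
    {
        const long iters = 1000000;
        struct timeval start, end, scratch;
        long i;

        gettimeofday(&start, NULL);
        for (i = 0; i < iters; i++)
            gettimeofday(&scratch, NULL);
        gettimeofday(&end, NULL);

        double total_usec = (end.tv_sec - start.tv_sec) * 1e6 +
                            (end.tv_usec - start.tv_usec);
        printf("%.1f nsec per gettimeofday() call\n",
               1000.0 * total_usec / iters);
        return 0;
    }

On systems with a fast clock source this typically reports tens of nanoseconds, far below the cost of even a no-op fsync(); the open question in the thread is platforms where the clock source is slow.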
On Mon, Nov 15, 2010 at 3:09 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> So my task list is:
>
> 0) Rebase against the HEAD that just had code related to this touched today.
>
> 1) Assume that log_checkpoints is sufficient control over whether the timing
> overhead added is worth collecting, and therefore remove the half-baked idea
> of also wrapping with a compile-time option.
>
> 2) Have the sync summary returned upwards, so it can be put onto the same
> line as the rest of the log_checkpoint info.
>
> All seems reasonable to me. Will rev a new patch by tomorrow.

Did this get eaten by the email goblin, or are you still working on it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Nov 15, 2010 at 12:09 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> So my task list is:
>
> 0) Rebase against the HEAD that just had code related to this touched today.
>
> 1) Assume that log_checkpoints is sufficient control over whether the timing
> overhead added is worth collecting, and therefore remove the half-baked idea
> of also wrapping with a compile-time option.
>
> 2) Have the sync summary returned upwards, so it can be put onto the same
> line as the rest of the log_checkpoint info.
>
> All seems reasonable to me. Will rev a new patch by tomorrow.

For the individual file sync times emitted under debug1, it would be very handy if the file being synced was identified, for example "relation base/16384/16523", rather than being numbered sequentially within a given checkpoint.

Cheers,
Jeff
Robert Haas wrote:
> Did this get eaten by the email goblin, or are you still working on it?

Fell behind due to an unfortunately timed bit of pneumonia. Hurray for the health benefits of cross-country flights. Will fix this up, rebase my other patch, and head toward some more review/'Fest cleanup now that I'm feeling better.

--
Greg Smith  2ndQuadrant US  greg@2ndQuadrant.com  Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
On Wed, Nov 24, 2010 at 1:02 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> Did this get eaten by the email goblin, or are you still working on it?
>
> Fell behind due to an unfortunately timed bit of pneumonia. Hurray for the
> health benefits of cross-country flights. Will fix this up, rebase my other
> patch, and head toward some more review/'Fest cleanup now that I'm feeling
> better.

Ouch. Fringe benefits of consulting. Thanks for the update.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Jeff Janes wrote:
> For the individual file sync times emitted under debug1, it would be
> very handy if the file being synced was identified, for example
> "relation base/16384/16523".

I was in the middle of looking into adding that already, so consider that part of the target for the next update I'm working on.

--
Greg Smith  2ndQuadrant US  greg@2ndQuadrant.com  Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Jeff Janes wrote:
> For the individual file sync times emitted under debug1, it would be
> very handy if the file being synced was identified, for example
> "relation base/16384/16523". Rather than being numbered sequentially
> within a given checkpoint.

I was numbering them sequentially so that it's straightforward to graph the sync times in an external analysis tool, but the relation data is helpful too. New patch reflecting all upthread suggestions is attached. The output looks like this now at DEBUG1:

LOG:  checkpoint starting: xlog
DEBUG:  checkpoint sync: number=1 file=base/16424/11645 time=11589.549000 msec
DEBUG:  checkpoint sync: number=2 file=base/16424/16438 time=16.148000 msec
DEBUG:  checkpoint sync: number=3 file=base/16424/16437 time=53.530000 msec
DEBUG:  checkpoint sync: number=4 file=base/16424/16447 time=10.214000 msec
DEBUG:  checkpoint sync: number=5 file=base/16424/11607 time=1.499000 msec
DEBUG:  checkpoint sync: number=6 file=base/16424/16425_fsm time=2.921000 msec
DEBUG:  checkpoint sync: number=7 file=base/16424/16437.1 time=4.237000 msec
DEBUG:  checkpoint sync: number=8 file=base/16424/16428_fsm time=1.654000 msec
DEBUG:  checkpoint sync: number=9 file=base/16424/16442 time=7.920000 msec
DEBUG:  checkpoint sync: number=10 file=base/16424/16428_vm time=2.613000 msec
DEBUG:  checkpoint sync: number=11 file=base/16424/11618 time=1.468000 msec
DEBUG:  checkpoint sync: number=12 file=base/16424/16437_fsm time=2.638000 msec
DEBUG:  checkpoint sync: number=13 file=base/16424/16428 time=2.883000 msec
DEBUG:  checkpoint sync: number=14 file=base/16424/16425 time=3.369000 msec
DEBUG:  checkpoint sync: number=15 file=base/16424/16437_vm time=8.686000 msec
DEBUG:  checkpoint sync: number=16 file=base/16424/16425_vm time=5.984000 msec
LOG:  checkpoint complete: wrote 2074 buffers (50.6%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=0.617 s, sync=11.715 s, total=22.167 s; sync files=16, longest=11.589 s, average=0.724 s

I kept the units for the DEBUG level ones in msec because that's a better scale for the common really short syncs during good behavior. But the summary info in seconds now appears at the end of the existing "checkpoint complete" message, so only one line to parse for those looking to analyze the gross checkpoint data. That looks to work well enough for finding situations like the big ext3 spikes. You can easily see one in this example by the fact that "longest=11.589 s" is almost the entirety of "sync=11.715 s". That's the really key thing there's currently no visibility into, that's made obvious with this patch.

This might be ready for some proper review now. I know there's at least one blatant bug still in here I haven't found yet, related to how the averages are computed. I saw this once:

LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=0.000 s, sync=0.000 s, total=0.001 s; sync files=0, longest=0.000 s, average=-9223372036854775808.-2147483 s

After an immediate checkpoint, so at least one path not quite right yet.
--
Greg Smith  2ndQuadrant US  greg@2ndQuadrant.com  Baltimore, MD

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ede6ceb..6f6eb3b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7063,10 +7063,15 @@ LogCheckpointEnd(bool restartpoint)
 {
 	long		write_secs,
 				sync_secs,
-				total_secs;
+				total_secs,
+				longest_secs,
+				average_secs;
 	int			write_usecs,
 				sync_usecs,
-				total_usecs;
+				total_usecs,
+				longest_usecs,
+				average_usecs;
+	double		average_sync_time;
 
 	CheckpointStats.ckpt_end_t = GetCurrentTimestamp();
 
@@ -7082,18 +7087,35 @@ LogCheckpointEnd(bool restartpoint)
 						CheckpointStats.ckpt_sync_end_t,
 						&sync_secs, &sync_usecs);
 
+	/*
+	 * Timing values returned from CheckpointStats are in milliseconds.
+	 * Convert to the second plus microsecond form that TimestampDifference
+	 * returns for homogeneous printing.
+	 */
+	longest_secs = (long) (CheckpointStats.ckpt_longest_sync / 1000);
+	longest_usecs = 1000 * (CheckpointStats.ckpt_longest_sync - longest_secs * 1000);
+
+	average_sync_time = CheckpointStats.ckpt_longest_sync / CheckpointStats.ckpt_sync_rels;
+	average_secs = (long) (average_sync_time / 1000);
+	average_usecs = 1000 * (average_sync_time - average_secs * 1000);
+
 	if (restartpoint)
 		elog(LOG, "restartpoint complete: wrote %d buffers (%.1f%%); "
-			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
+			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
+			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s",
 			 CheckpointStats.ckpt_bufs_written,
 			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
 			 write_secs, write_usecs / 1000,
 			 sync_secs, sync_usecs / 1000,
-			 total_secs, total_usecs / 1000);
+			 total_secs, total_usecs / 1000,
+			 CheckpointStats.ckpt_sync_rels,
+			 longest_secs, longest_usecs / 1000,
+			 average_secs, average_usecs / 1000);
 	else
 		elog(LOG, "checkpoint complete: wrote %d buffers (%.1f%%); "
 			 "%d transaction log file(s) added, %d removed, %d recycled; "
-			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
+			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
+			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s",
 			 CheckpointStats.ckpt_bufs_written,
 			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
 			 CheckpointStats.ckpt_segs_added,
@@ -7101,7 +7123,10 @@ LogCheckpointEnd(bool restartpoint)
 			 CheckpointStats.ckpt_segs_recycled,
 			 write_secs, write_usecs / 1000,
 			 sync_secs, sync_usecs / 1000,
-			 total_secs, total_usecs / 1000);
+			 total_secs, total_usecs / 1000,
+			 CheckpointStats.ckpt_sync_rels,
+			 longest_secs, longest_usecs / 1000,
+			 average_secs, average_usecs / 1000);
 }
 
 /*
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 604db21..cadd938 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -20,6 +20,7 @@
 
 #include "catalog/catalog.h"
 #include "miscadmin.h"
+#include "portability/instr_time.h"
 #include "postmaster/bgwriter.h"
 #include "storage/fd.h"
 #include "storage/bufmgr.h"
@@ -927,6 +928,13 @@ mdsync(void)
 	PendingOperationEntry *entry;
 	int			absorb_counter;
 
+	/* Statistics on sync times */
+	int			processed = 0;
+	instr_time	sync_start, sync_end, sync_diff;
+	double		elapsed;
+	double		longest = 0;
+	double		total_elapsed = 0;
+
 	/*
 	 * This is only called during checkpoints, and checkpoints should only
 	 * occur in processes that have created a pendingOpsTable.
@@ -1069,9 +1077,29 @@ mdsync(void)
 				seg = _mdfd_getseg(reln, entry->tag.forknum,
 							  entry->tag.segno * ((BlockNumber) RELSEG_SIZE),
 								   false, EXTENSION_RETURN_NULL);
+
+				if (log_checkpoints)
+					INSTR_TIME_SET_CURRENT(sync_start);
+
 				if (seg != NULL &&
 					FileSync(seg->mdfd_vfd) >= 0)
+				{
+					if (log_checkpoints)
+					{
+						INSTR_TIME_SET_CURRENT(sync_end);
+						sync_diff = sync_end;
+						INSTR_TIME_SUBTRACT(sync_diff, sync_start);
+						elapsed = (double) INSTR_TIME_GET_MICROSEC(sync_diff) / 1000.0;
+						if (elapsed > longest)
+							longest = elapsed;
+						total_elapsed += elapsed;
+						processed++;
+						elog(DEBUG1, "checkpoint sync: number=%d file=%s time=%f msec",
+							 processed, FilePathName(seg->mdfd_vfd), elapsed);
+					}
+					break;		/* success; break out of retry loop */
+				}
 
 				/*
 				 * XXX is there any point in allowing more than one retry?
@@ -1113,6 +1141,11 @@ mdsync(void)
 			elog(ERROR, "pendingOpsTable corrupted");
 	}							/* end loop over hashtable entries */
 
+	/* Return sync performance metrics for report at checkpoint end */
+	CheckpointStats.ckpt_sync_rels = processed;
+	CheckpointStats.ckpt_longest_sync = longest;
+	CheckpointStats.ckpt_agg_sync_time = total_elapsed;
+
 	/* Flag successful completion of mdsync */
 	mdsync_in_progress = false;
 }
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index fa7ae2a..ec4d9db 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -257,6 +257,13 @@ typedef struct CheckpointStatsData
 	int			ckpt_segs_added;	/* # of new xlog segments created */
 	int			ckpt_segs_removed;	/* # of xlog segments deleted */
 	int			ckpt_segs_recycled; /* # of xlog segments recycled */
+
+	int			ckpt_sync_rels;		/* # of relations synced */
+	double		ckpt_longest_sync;	/* Longest sync for one relation */
+	double		ckpt_agg_sync_time; /* The sum of all the individual sync
+									   times, which is not necessarily the
+									   same as the total elapsed time for
+									   the entire sync phase. */
 } CheckpointStatsData;
 
 extern CheckpointStatsData CheckpointStats;
2010/11/30 Greg Smith <greg@2ndquadrant.com>:
> Jeff Janes wrote:
>> For the individual file sync times emitted under debug1, it would be
>> very handy if the file being synced was identified, for example
>> "relation base/16384/16523". Rather than being numbered sequentially
>> within a given checkpoint.
>
> I was numbering them sequentially so that it's straightforward to graph the
> sync times in an external analysis tool, but the relation data is helpful
> too. New patch reflecting all upthread suggestions is attached. The output
> looks like this now at DEBUG1:
>
> LOG: checkpoint starting: xlog
> DEBUG: checkpoint sync: number=1 file=base/16424/11645 time=11589.549000 msec
> DEBUG: checkpoint sync: number=2 file=base/16424/16438 time=16.148000 msec
> DEBUG: checkpoint sync: number=3 file=base/16424/16437 time=53.530000 msec
> DEBUG: checkpoint sync: number=4 file=base/16424/16447 time=10.214000 msec
> DEBUG: checkpoint sync: number=5 file=base/16424/11607 time=1.499000 msec
> DEBUG: checkpoint sync: number=6 file=base/16424/16425_fsm time=2.921000 msec
> DEBUG: checkpoint sync: number=7 file=base/16424/16437.1 time=4.237000 msec
> DEBUG: checkpoint sync: number=8 file=base/16424/16428_fsm time=1.654000 msec
> DEBUG: checkpoint sync: number=9 file=base/16424/16442 time=7.920000 msec
> DEBUG: checkpoint sync: number=10 file=base/16424/16428_vm time=2.613000 msec
> DEBUG: checkpoint sync: number=11 file=base/16424/11618 time=1.468000 msec
> DEBUG: checkpoint sync: number=12 file=base/16424/16437_fsm time=2.638000 msec
> DEBUG: checkpoint sync: number=13 file=base/16424/16428 time=2.883000 msec
> DEBUG: checkpoint sync: number=14 file=base/16424/16425 time=3.369000 msec
> DEBUG: checkpoint sync: number=15 file=base/16424/16437_vm time=8.686000 msec
> DEBUG: checkpoint sync: number=16 file=base/16424/16425_vm time=5.984000 msec
> LOG: checkpoint complete: wrote 2074 buffers (50.6%); 0 transaction log
> file(s) added, 0 removed, 3 recycled; write=0.617 s, sync=11.715 s,
> total=22.167 s; sync files=16, longest=11.589 s, average=0.724 s
>
> I kept the units for the DEBUG level ones in msec because that's a better
> scale for the common really short syncs during good behavior. But the
> summary info in seconds now appears at the end of the existing "checkpoint
> complete" message, so only one line to parse for those looking to analyze
> the gross checkpoint data. That looks to work well enough for finding
> situations like the big ext3 spikes. You can easily see one in this example
> by the fact that "longest=11.589 s" is almost the entirety of "sync=11.715
> s". That's the really key thing there's currently no visibility into,
> that's made obvious with this patch.

Wonderful.

> This might be ready for some proper review now. I know there's at least one
> blatant bug still in here I haven't found yet, related to how the averages
> are computed. I saw this once:
>
> LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s)
> added, 0 removed, 1 recycled; write=0.000 s, sync=0.000 s, total=0.001 s;
> sync files=0, longest=0.000 s, average=-9223372036854775808.-2147483 s
>
> After an immediate checkpoint, so at least one path not quite right yet.

--
Cédric Villemain  2ndQuadrant
http://2ndQuadrant.fr/  PostgreSQL : Expertise, Formation et Support
On Tue, Nov 30, 2010 at 8:38 AM, Greg Smith <greg@2ndquadrant.com> wrote:

Hi Greg,

Thanks for the update.

> This might be ready for some proper review now. I know there's at least one
> blatant bug still in here I haven't found yet, related to how the averages
> are computed.

Yes, the blatant bug:

    average_sync_time = CheckpointStats.ckpt_longest_sync /
                        CheckpointStats.ckpt_sync_rels;

That should clearly be ckpt_agg_sync_time, not ckpt_longest_sync.

> I saw this once:
>
> LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s)
> added, 0 removed, 1 recycled; write=0.000 s, sync=0.000 s, total=0.001 s;
> sync files=0, longest=0.000 s, average=-9223372036854775808.-2147483 s
>
> After an immediate checkpoint, so at least one path not quite right yet.

Not clear what the right thing to do here is. I guess we should special case the div by zero to yield zero for the average?

The patch is in unified diff rather than context diff. Doesn't bother me, but the wiki on doing a review says...

Cheers,
Jeff
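The special case would presumably look something like this minimal sketch (the helper function is hypothetical, pulled out here only for illustration; it mirrors the guard that shows up in the next revision of the patch):

    /* Average sync time per file, yielding zero instead of dividing
     * by zero when an immediate checkpoint synced no files at all. */
    static double
    checkpoint_average_sync(double agg_sync_time, int sync_rels)
    {
        if (sync_rels <= 0)
            return 0;
        return agg_sync_time / sync_rels;
    }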
On Tue, Nov 30, 2010 at 12:15 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Yes, the blatant bug:
>
>     average_sync_time = CheckpointStats.ckpt_longest_sync /
>                         CheckpointStats.ckpt_sync_rels;
>
> That should clearly be ckpt_agg_sync_time, not ckpt_longest_sync.

I've attached a tiny patch to apply over yours, to deal with this and with the case where no files are synced.

Combining this instrumentation patch with the backend sync one already committed, the net result under log_min_messages=debug1 is somewhat undesirable in that I can now see the individual sync times for the syncs done by the checkpoint writer, but I do not get times for the syncs done by the backend (I only get informed of their existence).

I don't know what I would propose to fix this. Having the reportage of sync time of backend syncs be controlled by log_checkpoints seems somewhat perverse, but the only alternative I see is to have log_min_messages=debug1 always report times for the backend syncs. Or to have them go unreported altogether.

Cheers,
Jeff
Jeff Janes wrote:
> I've attached a tiny patch to apply over yours, to deal with this and
> with the case where no files are synced.

Thanks for that. That obvious error eluded me because in most of the patch update testing I was doing (on ext3), the longest sync was always about the same length as the total sync time.

Attached patch (in correct diff form this time!) collects up all changes. That includes elimination of a potential race condition if someone changes log_checkpoints while a long sync phase is executing. I don't know whether that can happen, and it obviously won't give accurate stats going back to the beginning of the checkpoint in that case, but it tries to defend against producing complete garbage if that value changes out from under it.

This is the first version of this patch I feel fairly good about; no open concerns left on my side. Jeff, since you're now the de-facto credited reviewer of this one by virtue of suggesting bug fixes, could you take this update out for a spin too?

> Combining this instrumentation patch with the backend sync one already
> committed, the net result under log_min_messages=debug1 is somewhat
> undesirable in that I can now see the individual sync times for the
> syncs done by the checkpoint writer, but I do not get times for the
> syncs done by the backend (I only get informed of their existence).

On a properly working system, backend syncs shouldn't be happening. So if you see them, I think the question shouldn't be "how long are they taking?", it's "how do I get rid of them?" From that perspective, knowing of their existence is sufficient to suggest the necessary tuning changes, such as dropping bgwriter_delay. When you get into a situation where they are showing up, a whole lot of them can happen in a very brief period; enough that I'd actually be concerned about the added timing overhead, something I normally don't care about very much. That's the main reason I didn't bother instrumenting those too--the situations where they happen are ones already running low on resources.

Big writes for things that can only be written out at checkpoint time, on the other hand, are unavoidable in the current design. Providing detail on them is going to be relevant unless there's a major refactoring of how those happen.

--
Greg Smith  2ndQuadrant US  greg@2ndQuadrant.com  Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ae9ed8f..6adc139 100644
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** LogCheckpointEnd(bool restartpoint)
*** 6955,6964 ****
  {
  	long		write_secs,
  				sync_secs,
! 				total_secs;
  	int			write_usecs,
  				sync_usecs,
! 				total_usecs;
  
  	CheckpointStats.ckpt_end_t = GetCurrentTimestamp();
--- 6955,6969 ----
  {
  	long		write_secs,
  				sync_secs,
! 				total_secs,
! 				longest_secs,
! 				average_secs;
  	int			write_usecs,
  				sync_usecs,
! 				total_usecs,
! 				longest_usecs,
! 				average_usecs;
! 	double		average_sync_time;
  
  	CheckpointStats.ckpt_end_t = GetCurrentTimestamp();
*************** LogCheckpointEnd(bool restartpoint)
*** 6974,6991 ****
  						CheckpointStats.ckpt_sync_end_t,
  						&sync_secs, &sync_usecs);
  
  	if (restartpoint)
  		elog(LOG, "restartpoint complete: wrote %d buffers (%.1f%%); "
! 			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
  			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 write_secs, write_usecs / 1000,
  			 sync_secs, sync_usecs / 1000,
! 			 total_secs, total_usecs / 1000);
  	else
  		elog(LOG, "checkpoint complete: wrote %d buffers (%.1f%%); "
  			 "%d transaction log file(s) added, %d removed, %d recycled; "
! 			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
  			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 CheckpointStats.ckpt_segs_added,
--- 6979,7017 ----
  						CheckpointStats.ckpt_sync_end_t,
  						&sync_secs, &sync_usecs);
  
+ 	/*
+ 	 * Timing values returned from CheckpointStats are in milliseconds.
+ 	 * Convert to the second plus microsecond form that TimestampDifference
+ 	 * returns for homogeneous printing.
+ 	 */
+ 	longest_secs = (long) (CheckpointStats.ckpt_longest_sync / 1000);
+ 	longest_usecs = 1000 * (CheckpointStats.ckpt_longest_sync -
+ 							longest_secs * 1000);
+ 
+ 	average_sync_time = 0;
+ 	if (CheckpointStats.ckpt_sync_rels > 0)
+ 		average_sync_time = CheckpointStats.ckpt_agg_sync_time /
+ 							CheckpointStats.ckpt_sync_rels;
+ 	average_secs = (long) (average_sync_time / 1000);
+ 	average_usecs = 1000 * (average_sync_time - average_secs * 1000);
+ 
  	if (restartpoint)
  		elog(LOG, "restartpoint complete: wrote %d buffers (%.1f%%); "
! 			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
! 			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
  			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 write_secs, write_usecs / 1000,
  			 sync_secs, sync_usecs / 1000,
! 			 total_secs, total_usecs / 1000,
! 			 CheckpointStats.ckpt_sync_rels,
! 			 longest_secs, longest_usecs / 1000,
! 			 average_secs, average_usecs / 1000);
  	else
  		elog(LOG, "checkpoint complete: wrote %d buffers (%.1f%%); "
  			 "%d transaction log file(s) added, %d removed, %d recycled; "
! 			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
! 			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
  			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 CheckpointStats.ckpt_segs_added,
*************** LogCheckpointEnd(bool restartpoint)
*** 6993,6999 ****
  			 CheckpointStats.ckpt_segs_recycled,
  			 write_secs, write_usecs / 1000,
  			 sync_secs, sync_usecs / 1000,
! 			 total_secs, total_usecs / 1000);
  }
  
  /*
--- 7019,7028 ----
  			 CheckpointStats.ckpt_segs_recycled,
  			 write_secs, write_usecs / 1000,
  			 sync_secs, sync_usecs / 1000,
! 			 total_secs, total_usecs / 1000,
! 			 CheckpointStats.ckpt_sync_rels,
! 			 longest_secs, longest_usecs / 1000,
! 			 average_secs, average_usecs / 1000);
  }
  
  /*
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 604db21..1f602a7 100644
*** a/src/backend/storage/smgr/md.c
--- b/src/backend/storage/smgr/md.c
***************
*** 20,25 ****
--- 20,26 ----
  
  #include "catalog/catalog.h"
  #include "miscadmin.h"
+ #include "portability/instr_time.h"
  #include "postmaster/bgwriter.h"
  #include "storage/fd.h"
  #include "storage/bufmgr.h"
*************** mdsync(void)
*** 927,932 ****
--- 928,940 ----
  	PendingOperationEntry *entry;
  	int			absorb_counter;
  
+ 	/* Statistics on sync times */
+ 	int			processed = 0;
+ 	instr_time	sync_start, sync_end, sync_diff;
+ 	double		elapsed;
+ 	double		longest = 0;
+ 	double		total_elapsed = 0;
+ 
  	/*
  	 * This is only called during checkpoints, and checkpoints should only
  	 * occur in processes that have created a pendingOpsTable.
*************** mdsync(void)
*** 1069,1077 ****
--- 1077,1107 ----
  				seg = _mdfd_getseg(reln, entry->tag.forknum,
  							  entry->tag.segno * ((BlockNumber) RELSEG_SIZE),
  								   false, EXTENSION_RETURN_NULL);
+ 
+ 				if (log_checkpoints)
+ 					INSTR_TIME_SET_CURRENT(sync_start);
+ 				else
+ 					INSTR_TIME_SET_ZERO(sync_start);
+ 
  				if (seg != NULL &&
  					FileSync(seg->mdfd_vfd) >= 0)
+ 				{
+ 					if (log_checkpoints && (! INSTR_TIME_IS_ZERO(sync_start)))
+ 					{
+ 						INSTR_TIME_SET_CURRENT(sync_end);
+ 						sync_diff = sync_end;
+ 						INSTR_TIME_SUBTRACT(sync_diff, sync_start);
+ 						elapsed = (double) INSTR_TIME_GET_MICROSEC(sync_diff) / 1000.0;
+ 						if (elapsed > longest)
+ 							longest = elapsed;
+ 						total_elapsed += elapsed;
+ 						processed++;
+ 						elog(DEBUG1, "checkpoint sync: number=%d file=%s time=%f msec",
+ 							 processed, FilePathName(seg->mdfd_vfd), elapsed);
+ 					}
+ 					break;		/* success; break out of retry loop */
+ 				}
 
  				/*
  				 * XXX is there any point in allowing more than one retry?
*************** mdsync(void)
*** 1113,1118 ****
--- 1143,1153 ----
  			elog(ERROR, "pendingOpsTable corrupted");
  	}							/* end loop over hashtable entries */
  
+ 	/* Return sync performance metrics for report at checkpoint end */
+ 	CheckpointStats.ckpt_sync_rels = processed;
+ 	CheckpointStats.ckpt_longest_sync = longest;
+ 	CheckpointStats.ckpt_agg_sync_time = total_elapsed;
+ 
  	/* Flag successful completion of mdsync */
  	mdsync_in_progress = false;
  }
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index fa7ae2a..d95f843 100644
*** a/src/include/access/xlog.h
--- b/src/include/access/xlog.h
*************** typedef struct CheckpointStatsData
*** 257,262 ****
--- 257,269 ----
  	int			ckpt_segs_added;	/* # of new xlog segments created */
  	int			ckpt_segs_removed;	/* # of xlog segments deleted */
  	int			ckpt_segs_recycled; /* # of xlog segments recycled */
+ 
+ 	int			ckpt_sync_rels;		/* # of relations synced */
+ 	double		ckpt_longest_sync;	/* Longest sync for one relation */
+ 	double		ckpt_agg_sync_time; /* The sum of all the individual sync
+ 									   times, which is not necessarily the
+ 									   same as the total elapsed time for
+ 									   the entire sync phase. */
  } CheckpointStatsData;
  
  extern CheckpointStatsData CheckpointStats;
On Sun, Dec 5, 2010 at 1:23 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Jeff Janes wrote:
>> I've attached a tiny patch to apply over yours, to deal with this and
>> with the case where no files are synced.
>
> Thanks for that. That obvious error eluded me because in most of the patch
> update testing I was doing (on ext3), the longest sync was always about the
> same length as the total sync time.
>
> Attached patch (in correct diff form this time!) collects up all changes.
> That includes elimination of a potential race condition if someone changes
> log_checkpoints while a long sync phase is executing. I don't know whether
> that can happen, and it obviously won't give accurate stats going back to
> the beginning of the checkpoint in that case, but it tries to defend against
> producing complete garbage if that value changes out from under it.

It looks to me that the checkpoint writer only reloads settings between checkpoints, so log_checkpoints cannot change under it. But if it can, the behavior of the patch is about as reasonable as one can expect, I think.

> This is the first version of this patch I feel fairly good about; no open
> concerns left on my side. Jeff, since you're now the de-facto credited
> reviewer of this one by virtue of suggesting bug fixes, could you take this
> update out for a spin too?

It applies and builds cleanly, and passes make check with --enable-cassert. The details of log_checkpoints output are not documented, so no document patches are needed. I cannot think of any reasonable way to add a check into the make check framework, so I don't think one is needed.

It does what it says and what we want, and I don't see any dangers. I haven't noticed any slow-downs. It is listed under performance, but it is intended to diagnose, not fix, performance problems.

It seems to meet the coding style. It uses existing macros, so it should be as portable as the existing PG code is. Comments seem sufficient. The use of INSTR_TIME_SET_ZERO as a flag for detecting changing log_checkpoints maybe could use a comment, but it doesn't take much thought to figure out that that is what is going on.

I could not make it crash.

I'm marking it as ready for committer.

>> Combining this instrumentation patch with the backend sync one already
>> committed, the net result under log_min_messages=debug1 is somewhat
>> undesirable in that I can now see the individual sync times for the
>> syncs done by the checkpoint writer, but I do not get times for the
>> syncs done by the backend (I only get informed of their existence).
>
> On a properly working system, backend syncs shouldn't be happening. So if
> you see them, I think the question shouldn't be "how long are they taking?",
> it's "how do I get rid of them?" From that perspective, knowing of their
> existence is sufficient to suggest the necessary tuning changes, such as
> dropping bgwriter_delay.

OK. I was trying to figure out if the syncs were entangled (all ended at about the same time, regardless of when they started). But I can see that that is basically a hacker thing to do, not production, so I can add in my own logging at the end of the backend sync for my purposes.

> When you get into a situation where they are showing up, a whole lot of them
> can happen in a very brief period;

In my test cases, the syncs that the backends were doing were almost always to the same file that the checkpoint writer was already choking on (so they are entangled simply by virtue of that). So very quickly all the backends hit the same wall and thunked to a halt. This is probably a feature of trying to use pgbench as the basis to get a very artificial model. I need to find a way to get dozens or hundreds of files going. Maybe pgbench could be hacked so that each connection used a different schema.

Cheers,
Jeff
Jeff Janes wrote:
> In my test cases, the syncs that the backends were doing were almost
> always to the same file that the checkpoint writer was already choking
> on (so they are entangled simply by virtue of that). So very quickly
> all the backends hit the same wall and thunked to a halt. This is
> probably a feature of trying to use pgbench as the basis to get a very
> artificial model.

Yes--pgbench has some problems like you describe, ones that are a bit different than the way I've seen fsync writes get in each other's way in the production systems I've looked at. That's good if you really want to provoke this behavior, which is one reason why I've used it as an example for my patches so far (the other being that it's already available in everyone's installation). But it's tough to get it to act more like a real-world system, which doesn't have quite so many localized updates, without cranking the scale way up. And that then tends to aggravate other problems too. The 8.3 checkpoint spreading work also got some useful results using the dbt-2 benchmark. I'm at the point where I think I need to return to that test program for what I'm doing now. I'd encourage you to try that out too if you get a chance.

Thanks for the feedback and the review. I hope you appreciate now why I suggested you wait for the stuff I was submitting before getting back into the sorted checkpoint topic again. That should be a lot easier to make sense of with this instrumentation in place.

--
Greg Smith  2ndQuadrant US  greg@2ndQuadrant.com  Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
On Sun, Dec 5, 2010 at 4:23 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Jeff Janes wrote:
>> I've attached a tiny patch to apply over yours, to deal with this and
>> with the case where no files are synced.
>
> Thanks for that. That obvious error eluded me because in most of the patch
> update testing I was doing (on ext3), the longest sync was always about the
> same length as the total sync time.
>
> Attached patch (in correct diff form this time!) collects up all changes.
> That includes elimination of a potential race condition if someone changes
> log_checkpoints while a long sync phase is executing. I don't know whether
> that can happen, and it obviously won't give accurate stats going back to
> the beginning of the checkpoint in that case, but it tries to defend against
> producing complete garbage if that value changes out from under it.
>
> This is the first version of this patch I feel fairly good about; no open
> concerns left on my side. Jeff, since you're now the de-facto credited
> reviewer of this one by virtue of suggesting bug fixes, could you take this
> update out for a spin too?

I took a look at this and it looks generally good, but I'm wondering why md.c is converting the results from an exact value to a floating point, only to have xlog.c turn around and convert back to an integer. I think it could just return milliseconds directly, or if you're worried about a checkpoint that takes more than 24 days to complete, seconds and microseconds. Or am I missing something?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
> I'm wondering why md.c is converting the results from an exact value to a
> floating point, only to have xlog.c turn around and convert back to an
> integer. I think it could just return milliseconds directly, or if you're
> worried about a checkpoint that takes more than 24 days to complete,
> seconds and microseconds.

md.c is printing the value as a float, so I converted early to a double and then percolated it upward from there. More an artifact of how the code grew from its initial form than an intentional decision. I see your point that making elapsed, total_elapsed, ckpt_agg_sync_time, and ckpt_longest_sync all the same type of integer that INSTR_TIME_GET_MICROSEC returns would reduce the potential for rounding aberrations. I could do another rev of the patch tomorrow with that change if you'd prefer it that way. I don't have a strong opinion about that implementation detail; given that xlog.c is printing a less fine-grained time anyway (seconds with 3 digits vs. msec with 3 digits) it seems unlikely to run into a real problem here.

--
Greg Smith  2ndQuadrant US  greg@2ndQuadrant.com  Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
On Mon, Dec 13, 2010 at 3:19 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> md.c is printing the value as a float, so I converted early to a double and
> then percolated it upward from there. More an artifact of how the code grew
> from its initial form than an intentional decision. I see your point that
> making elapsed, total_elapsed, ckpt_agg_sync_time, and ckpt_longest_sync all
> the same type of integer that INSTR_TIME_GET_MICROSEC returns would reduce
> the potential for rounding aberrations. I could do another rev of the patch
> tomorrow with that change if you'd prefer it that way. I don't have a
> strong opinion about that implementation detail; given that xlog.c is
> printing a less fine-grained time anyway (seconds with 3 digits vs. msec
> with 3 digits) it seems unlikely to run into a real problem here.

Yeah, it'd probably be OK anyway, but I think it doesn't really cost us anything to avoid the unnecessary conversion steps, so count me as a vote for doing it that way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
> I took a look at this and it looks generally good, but I'm wondering
> why md.c is converting the results from an exact value to a floating
> point, only to have xlog.c turn around and convert back to an integer.
> I think it could just return milliseconds directly, or if you're
> worried about a checkpoint that takes more than 24 days to complete,
> seconds and microseconds.

Attached patch now does something like this, except without the roll-over concern. INSTR_TIME_GET_MICROSEC returns a uint64 value. I just made that the storage format for all these values until they're converted for display. Test output:

LOG:  checkpoint starting: xlog
DEBUG:  checkpoint sync: number=1 file=base/16385/16480 time=10422.859 msec
DEBUG:  checkpoint sync: number=2 file=base/16385/16475_vm time=2896.614 msec
DEBUG:  checkpoint sync: number=3 file=base/16385/16475.1 time=57.836 msec
DEBUG:  checkpoint sync: number=4 file=base/16385/16466 time=20.080 msec
DEBUG:  checkpoint sync: number=5 file=base/16385/16463 time=74.926 msec
DEBUG:  checkpoint sync: number=6 file=base/16385/16482 time=74.263 msec
DEBUG:  checkpoint sync: number=7 file=base/16385/16475_fsm time=7.062 msec
DEBUG:  checkpoint sync: number=8 file=base/16385/16475 time=35.164 msec
LOG:  checkpoint complete: wrote 2143 buffers (52.3%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=1.213 s, sync=13.589 s, total=24.744 s; sync files=8, longest=10.422 s, average=1.698 s

This shows the hard truncation used, so 10422.859 msec becomes 10.422 s. I don't think allowing up to 0.999 ms of error there is a problem given the expected scale. But since none of the precision is lost until the end, that could be changed with only touching the final display formatting conversion of the value. Following your general idea further, why throw away any resolution inside of md.c; let xlog.c decide how to show it.

Note that I also fixed the DEBUG level lines to only show their actual precision. Before that was printing 6 digits to the right of the decimal point each time, the last three of which were always 0.

--
Greg Smith  2ndQuadrant US  greg@2ndQuadrant.com  Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1ed9687..c9778df 100644
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** LogCheckpointEnd(bool restartpoint)
*** 6955,6964 ****
  {
  	long		write_secs,
  				sync_secs,
! 				total_secs;
  	int			write_usecs,
  				sync_usecs,
! 				total_usecs;
  
  	CheckpointStats.ckpt_end_t = GetCurrentTimestamp();
--- 6955,6969 ----
  {
  	long		write_secs,
  				sync_secs,
! 				total_secs,
! 				longest_secs,
! 				average_secs;
  	int			write_usecs,
  				sync_usecs,
! 				total_usecs,
! 				longest_usecs,
! 				average_usecs;
! 	uint64		average_sync_time;
  
  	CheckpointStats.ckpt_end_t = GetCurrentTimestamp();
*************** LogCheckpointEnd(bool restartpoint)
*** 6974,6991 ****
  						CheckpointStats.ckpt_sync_end_t,
  						&sync_secs, &sync_usecs);
  
  	if (restartpoint)
  		elog(LOG, "restartpoint complete: wrote %d buffers (%.1f%%); "
! 			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
  			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 write_secs, write_usecs / 1000,
  			 sync_secs, sync_usecs / 1000,
! 			 total_secs, total_usecs / 1000);
  	else
  		elog(LOG, "checkpoint complete: wrote %d buffers (%.1f%%); "
  			 "%d transaction log file(s) added, %d removed, %d recycled; "
! 			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
  			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 CheckpointStats.ckpt_segs_added,
--- 6979,7017 ----
  						CheckpointStats.ckpt_sync_end_t,
  						&sync_secs, &sync_usecs);
  
+ 	/*
+ 	 * Timing values returned from CheckpointStats are in microseconds.
+ 	 * Convert to the second plus microsecond form that TimestampDifference
+ 	 * returns for homogeneous printing.
+ 	 */
+ 	longest_secs = (long) (CheckpointStats.ckpt_longest_sync / 1000000);
+ 	longest_usecs = CheckpointStats.ckpt_longest_sync -
+ 					(uint64) longest_secs * 1000000;
+ 
+ 	average_sync_time = 0;
+ 	if (CheckpointStats.ckpt_sync_rels > 0)
+ 		average_sync_time = CheckpointStats.ckpt_agg_sync_time /
+ 							CheckpointStats.ckpt_sync_rels;
+ 	average_secs = (long) (average_sync_time / 1000000);
+ 	average_usecs = average_sync_time - (uint64) average_secs * 1000000;
+ 
  	if (restartpoint)
  		elog(LOG, "restartpoint complete: wrote %d buffers (%.1f%%); "
! 			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
! 			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
  			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 write_secs, write_usecs / 1000,
  			 sync_secs, sync_usecs / 1000,
! 			 total_secs, total_usecs / 1000,
! 			 CheckpointStats.ckpt_sync_rels,
! 			 longest_secs, longest_usecs / 1000,
! 			 average_secs, average_usecs / 1000);
  	else
  		elog(LOG, "checkpoint complete: wrote %d buffers (%.1f%%); "
  			 "%d transaction log file(s) added, %d removed, %d recycled; "
! 			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
! 			 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
  			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 CheckpointStats.ckpt_segs_added,
*************** LogCheckpointEnd(bool restartpoint)
*** 6993,6999 ****
  			 CheckpointStats.ckpt_segs_recycled,
  			 write_secs, write_usecs / 1000,
  			 sync_secs, sync_usecs / 1000,
! 			 total_secs, total_usecs / 1000);
  }
  
  /*
--- 7019,7028 ----
  			 CheckpointStats.ckpt_segs_recycled,
  			 write_secs, write_usecs / 1000,
  			 sync_secs, sync_usecs / 1000,
! 			 total_secs, total_usecs / 1000,
! 			 CheckpointStats.ckpt_sync_rels,
! 			 longest_secs, longest_usecs / 1000,
! 			 average_secs, average_usecs / 1000);
  }
  
  /*
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 604db21..64f9027 100644
*** a/src/backend/storage/smgr/md.c
--- b/src/backend/storage/smgr/md.c
***************
*** 20,25 ****
--- 20,26 ----
  
  #include "catalog/catalog.h"
  #include "miscadmin.h"
+ #include "portability/instr_time.h"
  #include "postmaster/bgwriter.h"
  #include "storage/fd.h"
  #include "storage/bufmgr.h"
*************** mdsync(void)
*** 927,932 ****
--- 928,940 ----
  	PendingOperationEntry *entry;
  	int			absorb_counter;
  
+ 	/* Statistics on sync times */
+ 	int			processed = 0;
+ 	instr_time	sync_start, sync_end, sync_diff;
+ 	uint64		elapsed;
+ 	uint64		longest = 0;
+ 	uint64		total_elapsed = 0;
+ 
  	/*
  	 * This is only called during checkpoints, and checkpoints should only
  	 * occur in processes that have created a pendingOpsTable.
*************** mdsync(void)
*** 1069,1077 ****
--- 1077,1107 ----
  				seg = _mdfd_getseg(reln, entry->tag.forknum,
  							  entry->tag.segno * ((BlockNumber) RELSEG_SIZE),
  								   false, EXTENSION_RETURN_NULL);
+ 
+ 				if (log_checkpoints)
+ 					INSTR_TIME_SET_CURRENT(sync_start);
+ 				else
+ 					INSTR_TIME_SET_ZERO(sync_start);
+ 
  				if (seg != NULL &&
  					FileSync(seg->mdfd_vfd) >= 0)
+ 				{
+ 					if (log_checkpoints && (! INSTR_TIME_IS_ZERO(sync_start)))
+ 					{
+ 						INSTR_TIME_SET_CURRENT(sync_end);
+ 						sync_diff = sync_end;
+ 						INSTR_TIME_SUBTRACT(sync_diff, sync_start);
+ 						elapsed = INSTR_TIME_GET_MICROSEC(sync_diff);
+ 						if (elapsed > longest)
+ 							longest = elapsed;
+ 						total_elapsed += elapsed;
+ 						processed++;
+ 						elog(DEBUG1, "checkpoint sync: number=%d file=%s time=%.3f msec",
+ 							 processed, FilePathName(seg->mdfd_vfd), (double) elapsed / 1000);
+ 					}
+ 					break;		/* success; break out of retry loop */
+ 				}
 
  				/*
  				 * XXX is there any point in allowing more than one retry?
*************** mdsync(void)
*** 1113,1118 ****
--- 1143,1153 ----
  			elog(ERROR, "pendingOpsTable corrupted");
  	}							/* end loop over hashtable entries */
  
+ 	/* Return sync performance metrics for report at checkpoint end */
+ 	CheckpointStats.ckpt_sync_rels = processed;
+ 	CheckpointStats.ckpt_longest_sync = longest;
+ 	CheckpointStats.ckpt_agg_sync_time = total_elapsed;
+ 
  	/* Flag successful completion of mdsync */
  	mdsync_in_progress = false;
  }
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index fa7ae2a..484cb6c 100644
*** a/src/include/access/xlog.h
--- b/src/include/access/xlog.h
*************** typedef struct CheckpointStatsData
*** 257,262 ****
--- 257,269 ----
  	int			ckpt_segs_added;	/* # of new xlog segments created */
  	int			ckpt_segs_removed;	/* # of xlog segments deleted */
  	int			ckpt_segs_recycled; /* # of xlog segments recycled */
+ 
+ 	int			ckpt_sync_rels;		/* # of relations synced */
+ 	uint64		ckpt_longest_sync;	/* Longest sync for one relation */
+ 	uint64		ckpt_agg_sync_time; /* The sum of all the individual sync
+ 									   times, which is not necessarily the
+ 									   same as the total elapsed time for
+ 									   the entire sync phase. */
 } CheckpointStatsData;
 
 extern CheckpointStatsData CheckpointStats;
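The display-side conversion can be checked in isolation. A tiny standalone sketch (hypothetical demo value, not the committed code) of the uint64 microsecond split and the truncation that turns 10422.859 msec into 10.422 s:

    /* Split a microsecond count into seconds plus microseconds, then
     * print it the way the checkpoint-complete log line does: the
     * fractional part is truncated to milliseconds, not rounded. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t longest = 10422859;    /* 10422.859 msec, in microseconds */
        long secs = (long) (longest / 1000000);
        int  usecs = (int) (longest - (uint64_t) secs * 1000000);
        printf("longest=%ld.%03d s\n", secs, usecs / 1000);
        return 0;
    }

This prints "longest=10.422 s", matching the sample output above; at most 0.999 ms of precision is dropped, and only at display time.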
I gave this patch a look and it seems pretty good to me, except that I'm uncomfortable with the idea of mdsync filling in the details for CheckpointStats fields directly. Would it work to pass a struct (say SmgrSyncStats) from CheckPointBuffers to smgrsync and from there to mdsync, have this function fill it, and return it back so that CheckPointBuffers copies the data from this struct into CheckpointStats? Another minor nitpick: inside the block when you call FileSync, why check for log_checkpoints at all? Seems to me that just checking for zero of sync_start should be enough. Alternatively, seems simpler to just have a local var with the value of log_checkpoints at the start of mdsync and use that throughout the function. (Surely if someone turns off log_checkpoints in the middle of a checkpoint, it's not really a problem that we collect and report stats during that checkpoint.) -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
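To make that proposal concrete, the plumbing might look something like the following minimal sketch. SmgrSyncStats and the revised smgrsync() signature are hypothetical names for illustration, not anything in the patch as posted, and the CheckPointBuffers() body is abbreviated:

typedef struct SmgrSyncStats
{
	int			sync_rels;		/* # of relations synced */
	uint64		longest_sync;	/* longest single file sync, in usec */
	uint64		agg_sync_time;	/* sum of individual sync times, in usec */
} SmgrSyncStats;

extern void smgrsync(SmgrSyncStats *stats);	/* passes through to mdsync() */

void
CheckPointBuffers(int flags)
{
	SmgrSyncStats sync_stats;

	CheckpointStats.ckpt_write_t = GetCurrentTimestamp();
	BufferSync(flags);
	CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
	smgrsync(&sync_stats);		/* mdsync() fills in the counters */
	CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp();

	/* bufmgr.c, not md.c, becomes the only code touching CheckpointStats */
	CheckpointStats.ckpt_sync_rels = sync_stats.sync_rels;
	CheckpointStats.ckpt_longest_sync = sync_stats.longest_sync;
	CheckpointStats.ckpt_agg_sync_time = sync_stats.agg_sync_time;
}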
On Tue, Dec 14, 2010 at 9:29 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > I gave this patch a look and it seems pretty good to me, except Err, woops. I just committed this as-is. Sorry. > that I'm > uncomfortable with the idea of mdsync filling in the details for > CheckpointStats fields directly. Would it work to pass a struct (say > SmgrSyncStats) from CheckPointBuffers to smgrsync and from there to > mdsync, have this function fill it, and return it back so that > CheckPointBuffers copies the data from this struct into CheckpointStats? > > Another minor nitpick: inside the block when you call FileSync, why > check for log_checkpoints at all? Seems to me that just checking for > zero of sync_start should be enough. Alternatively, seems simpler to > just have a local var with the value of log_checkpoints at the start of > mdsync and use that throughout the function. (Surely if someone turns > off log_checkpoints in the middle of a checkpoint, it's not really a > problem that we collect and report stats during that checkpoint.) Neither of these things bothers me, but we can certainly discuss... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Excerpts from Robert Haas's message of Tue Dec 14 11:34:55 -0300 2010: > On Tue, Dec 14, 2010 at 9:29 AM, Alvaro Herrera > <alvherre@commandprompt.com> wrote: > > I gave this patch a look and it seems pretty good to me, except > > Err, woops. I just committed this as-is. Sorry. I noticed :-) > > that I'm > > uncomfortable with the idea of mdsync filling in the details for > > CheckpointStats fields directly. Would it work to pass a struct (say > > SmgrSyncStats) from CheckPointBuffers to smgrsync and from there to > > mdsync, have this function fill it, and return it back so that > > CheckPointBuffers copies the data from this struct into CheckpointStats? > > > > Another minor nitpick: inside the block when you call FileSync, why > > check for log_checkpoints at all? Seems to me that just checking for > > zero of sync_start should be enough. Alternatively, seems simpler to > > just have a local var with the value of log_checkpoints at the start of > > mdsync and use that throughout the function. (Surely if someone turns > > off log_checkpoints in the middle of a checkpoint, it's not really a > > problem that we collect and report stats during that checkpoint.) > > Neither of these things bothers me, but we can certainly discuss... Well, the second one was just about simplifying it, so never mind that. But referring to CheckpointStats in md.c seems to me to be a violation of modularity that ought to be fixed. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, Dec 14, 2010 at 11:47 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote: >> > that I'm >> > uncomfortable with the idea of mdsync filling in the details for >> > CheckpointStats fields directly. Would it work to pass a struct (say >> > SmgrSyncStats) from CheckPointBuffers to smgrsync and from there to >> > mdsync, have this function fill it, and return it back so that >> > CheckPointBuffers copies the data from this struct into CheckpointStats? > > But referring to CheckpointStats in md.c seems to me to be a violation > of modularity that ought to be fixed. Hmm. I guess I can't get worked up about it. We could do what you propose, but I'm not sure what purpose it would serve. It's not as if mdsync() can serve any purpose other than being the guts of a checkpoint. It seems to me that if we really wanted to get this disentangled from the checkpoint stats stuff, we'd also need to think about moving that elog(DEBUG1) I added out of the function somehow, but I think that would just be a lot of notational complexity for no particular benefit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Alvaro Herrera wrote: > I gave this patch a look and it seems pretty good to me, except that I'm > uncomfortable with the idea of mdsync filling in the details for > CheckpointStats fields directly. Would it work to pass a struct (say > SmgrSyncStats) from CheckPointBuffers to smgrsync and from there to > mdsync, have this function fill it, and return it back so that > CheckPointBuffers copies the data from this struct into CheckpointStats? > That was originally how I planned to write this bit of code. When I realized that the CheckpointStats structure was already visible there and stuffed with details that ultimately go into the same output line at the end, it just didn't seem worth the extra code complexity. The abstraction layer around md.c was not exactly airtight before I poked that extra little hole in it, and I was working from the principle that a smaller diff usually makes for a better patch. If you feel strongly that the result is a bad abstraction violation, I'll submit a patch to refactor it to pass a structure instead before the next CF. I appreciate your concern; I'm just not sure it's worth spending time on. What I'd really like to do instead is refactor out major parts of the leaky md/smgr layers altogether, but that's obviously a bigger project. > Another minor nitpick: inside the block when you call FileSync, why > check for log_checkpoints at all? Seems to me that just checking for > zero of sync_start should be enough. Alternatively, seems simpler to > just have a local var with the value of log_checkpoints at the start of > mdsync and use that throughout the function. (Surely if someone turns > off log_checkpoints in the middle of a checkpoint, it's not really a > problem that we collect and report stats during that checkpoint.) > And now you're just getting picky! This is a useful observation though, and I'll try to include that fix along with the next general checkpoint overhaul patch I submit. It doesn't seem worth the trouble of committing that minor rework on its own; I'll slip it into the next useful change that touches this area. Thanks for the hint; this would work better than what I did. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
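For concreteness, the local-variable simplification being agreed to above would amount to something like this untested sketch against the committed md.c code (collect_stats is an illustrative name, not from any posted patch):

  	int			absorb_counter;
+ 	/* Sample the GUC once so every file in this checkpoint is treated
+ 	 * consistently even if log_checkpoints changes mid-checkpoint */
+ 	bool		collect_stats = log_checkpoints;
  ...
- 				if (log_checkpoints)
+ 				if (collect_stats)
  					INSTR_TIME_SET_CURRENT(sync_start);
- 				else
- 					INSTR_TIME_SET_ZERO(sync_start);
  ...
- 					if (log_checkpoints && (!INSTR_TIME_IS_ZERO(sync_start)))
+ 					if (collect_stats)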
On Wed, Dec 15, 2010 at 9:22 AM, Greg Smith <greg@2ndquadrant.com> wrote: > patch I submit. Doesn't seem worth going through the trouble of committing > that minor rework on its own, I'll slip it into the next useful thing that > touches this area I do. Thanks for the hint, this would work better than > what I did. Well, if I'm the one committing it, I'll pull that part out again and commit it separately. Not sure if that affects your calculus, but I much prefer patches that don't try to do ancillary things along the way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > Well, if I'm the one committing it, I'll pull that part out again and > commit it separately. Not sure if that affects your calculus, but I > much prefer patches that don't try to do ancillary things along the way. I meant that I'd bundle it into the block of time I spend on that, and likely submit with something else that touches the same area. Obviously the correction patch would be better on its own when being handed over to a committer. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
On Thu, Dec 16, 2010 at 6:48 AM, Greg Smith <greg@2ndquadrant.com> wrote: > I meant that I'd bundle it into the block of time I spend on that, and > likely submit with something else that touches the same area. Obviously the > correction patch would be better on its own when being handed over to a > committer. Oh, that's very different. Never mind! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company