On Mon, Nov 7, 2016 at 4:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jonathon Nelson <jdnelson@dyn.com> writes:
> > On Mon, Nov 7, 2016 at 1:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I wonder if this is a problem similar to the autovacuum issue we fixed
> >> in da1a9d0f5, ie perhaps moving the system clock setting confuses the
> >> checkpoint timing logic.
>
> > That is more or less what we were thinking as well.
>
> Looking at the logic around this in checkpointer.c, it's pretty obvious
> that it would not behave nicely if system time goes backwards after a
> checkpoint starts; it would think it was ahead of schedule and would
> just loaf, basically, until the clock catches up to where it had been.
> There's no sanity check to notice a negative elapsed-time reading.
> But if system time goes forwards, it would think it was very far behind
> schedule and would do a burst of work, which doesn't seem to match your
> symptom.
>
> Please confirm the sign of the system clock correction that happened
> on your machine?
>
Before responding, I triple-checked everything I have. I did make a
mistake, but it's one of scale: the time went forward 1d, 57m, and 1.7s
(earlier I said it was about an hour). Prior to the event, the system clock
was all over the place; however, I cannot find evidence of any further time
corrections. This is a busy system and easily logs more than once a second,
so I chose to identify time jumps by taking the log entries (in the order
they appeared in!) and subtracting the previous entry's timestamp from each
one. If the difference was either negative or greater than 2 seconds, I set
it aside. I did not find any such events during this timeframe.
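
For anyone who wants to repeat the check, here is a rough sketch of the
procedure in Python. It is not the exact script I ran. It assumes each log
entry starts with a millisecond timestamp such as "2016-11-07 16:20:00.123"
(as a log_line_prefix beginning with %m would produce); the names
find_time_jumps, TS_FORMAT and TS_WIDTH are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout at the start of each line; adjust this to
# match your own log_line_prefix.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_WIDTH = 23  # len("2016-11-07 16:20:00.123")


def find_time_jumps(lines, max_gap=timedelta(seconds=2)):
    """Return (line_no, previous_ts, current_ts) for each suspicious step.

    Lines are compared in the order given, without sorting, so a clock
    that moved backwards shows up as a negative difference.
    """
    jumps = []
    prev = None
    for line_no, line in enumerate(lines, start=1):
        try:
            ts = datetime.strptime(line[:TS_WIDTH], TS_FORMAT)
        except ValueError:
            continue  # continuation line or unparseable prefix: skip it
        if prev is not None:
            delta = ts - prev
            # Flag a backwards step or a gap larger than the threshold.
            if delta < timedelta(0) or delta > max_gap:
                jumps.append((line_no, prev, ts))
        prev = ts
    return jumps
```

Feeding it the lines of a log file (for example, find_time_jumps(open(path)))
gives one tuple per flagged step. The 2-second threshold only makes sense on
a system that logs at least that often; a quieter system would need a larger
max_gap.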
--
Jon Nelson