Re: last_statrequest is in the future - Mailing list pgsql-hackers

From Tom Lane
Subject Re: last_statrequest is in the future
Date
Msg-id 19946.1269535540@sss.pgh.pa.us
Whole thread Raw
In response to last_statrequest is in the future  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
> ... lookee what we have here:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguar&dt=2010-03-24%2004:00:07

Modifying the code to print the actual timestamps resulted in a wealth
of information that can be dredged from the postmaster log here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguar&dt=2010-03-25%2004:00:04
Just grep for log messages like
[4baafb13.cd2:1] LOG:  last_statrequest 2010-03-25 06:58:42.118734+01 is later than collector's time 2010-03-25
06:58:40.817027+01

What this shows is that system timekeeping on jaguar is just utterly
broken.  gettimeofday() sometimes returns answers that are 5 seconds or
more skewed in different processes, and there are episodes where its
value fails to move at all in the stats collector while multiple seconds
elapse in the opinion of another process.

I wonder whether this doesn't also explain jaguar's propensity to show
some other unexplainable failures, like these recent examples:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguar&dt=2010-03-20%2004:00:03
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguar&dt=2010-03-15%2004:00:04
which are not at all unique in that machine's buildfarm history.

I had always supposed that these were caused by the extreme overall
slowness induced by CLOBBER_CACHE_ALWAYS, but now I think they are
more likely explained by system timekeeping issues.  In particular the
last-cited regression failure can be explained if now() fails to change
value across pg_sleep(0.1), and that's isomorphic to what seems to be
happening in the stats collector in some of the episodes captured today.

In short: either this machine has got broken clock hardware, or there's
a very serious bug in the kernel code for reading the clock.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: ALTER ROLE/DATABASE RESET ALL versus security
Next
From: Josh Berkus
Date:
Subject: Conditional trigger missing from pg_dump