gettimeofday is at the end of its usefulness?

From
Tom Lane
Date:
A recent question from Tim Kane prompted me to measure the overhead
costs of EXPLAIN ANALYZE, which I'd not checked in awhile.  Things
are far worse than I thought.  On my current server (by no means
lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run
at something like 110 nsec per row:

regression=# create table foo as select x as f1 from generate_series(1,1000000) x;
SELECT 1000000
regression=# vacuum foo;
VACUUM
regression=# explain analyze select * from foo;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.053..111.720 rows=1000000 loops=1)
 Planning time: 0.222 ms
 Execution time: 166.682 ms
(3 rows)

(and, btw, this is a debug build --- without assert and memory
context checks it'd be faster.)

The problem with this number is that a simple test program shows that
gettimeofday() requires about 40 nsec on this hardware.  That means
that two-thirds of the above timing measurement is overhead.

To add insult to injury, gettimeofday's output cannot be more precise than
1 microsecond, making its use for measuring sub-microsecond intervals at
best stochastic.

I looked around a bit and found that recent versions of POSIX have a
function clock_gettime() that's a bit more modern than gettimeofday():
at least, the output struct provides nsec rather than usec available
precision.  I benchmarked this, with the CLOCK_REALTIME selector, and
found that it also requires about 40nsec, while the output is actually
good to perhaps 10nsec precision.  (I base this on seeing no duplicate
readings in a tight loop, so that the value is certainly getting advanced
more often than once every 40 nsec.)

There's also a CLOCK_REALTIME_COARSE selector, which is noticeably faster
--- about 10nsec for me --- but the output appears to only advance once
every millisecond, so it's probably useless for our purposes.  The other
selectors mentioned in the Linux man page are considerably slower than
CLOCK_REALTIME for me, suggesting that they actually call into the kernel.

I also tried a loop around a bare "rdtsc" assembly instruction, finding
that that instruction takes about 10nsec.  That would be a nice
improvement over gettimeofday, except that using that directly would
involve dealing with cross-CPU skew, which seems like no fun at all.
And I don't really want to get into finding equivalents for non-Intel
architectures, either.
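
(For reference, a minimal sketch of the sort of rdtsc loop I mean, using
GCC-style inline assembly on x86_64 -- the test program attached below
only covers clock_gettime/gettimeofday, so treat this as illustrative:)

#include <stdio.h>
#include <stdint.h>

/* read the time-stamp counter (x86/x86_64, GCC/Clang inline asm) */
static inline uint64_t
read_tsc(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t) hi << 32) | lo;
}

int
main(void)
{
    const long  loops = 100000000;
    uint64_t    start = read_tsc();
    long        i;

    for (i = 0; i < loops; i++)
        (void) read_tsc();

    /* cycles per call; divide by the CPU clock in GHz to get nsec */
    printf("%.1f cycles per rdtsc\n",
           (double) (read_tsc() - start) / loops);
    return 0;
}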

Anyway it looks like clock_gettime() might be worth using on Linux
just for the more precise output.  It doesn't seem to exist on OS X
though, and I have no idea about elsewhere.

I'm curious if anybody has ideas about other things we might do for
portable high-precision timing.

            regards, tom lane

#include <stdio.h>
#include <time.h>
#include <sys/time.h>

int main(int argc, char **argv)
{
    long thistime, lasttime = 0;
    int nbogus = 0;
    int ndistinct = 0;
    int i;

    for (i=0; i<1000000000; i++)
    {
#if 1
        struct timespec tv;
        clock_gettime(CLOCK_REALTIME, &tv);
        thistime = (long) tv.tv_sec * 1000000000 + tv.tv_nsec;
#else
        struct timeval tv;
        gettimeofday(&tv, NULL);
        thistime = (long) tv.tv_sec * 1000000 + tv.tv_usec;
#endif

        if (thistime < lasttime)
            nbogus++;
        else if (thistime > lasttime)
            ndistinct++;
        lasttime = thistime;
    }

    printf("%d bogus readings\n", nbogus);
    printf("%d distinct readings\n", ndistinct);

    return 0;
}

Re: gettimeofday is at the end of its usefulness?

From
Peter Geoghegan
Date:
On Tue, May 13, 2014 at 3:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There's also a CLOCK_REALTIME_COARSE selector, which is noticeably faster
> --- about 10nsec for me --- but the output appears to only advance once
> every millisecond, so it's probably useless for our purposes.  The other
> selectors mentioned in the Linux man page are considerably slower than
> CLOCK_REALTIME for me, suggesting that they actually call into the kernel.

What Linux kernel version is in use here? Apparently, as I think
you've stated another way, more recent versions have VDSO for this,
which can make a big difference. This article seems like a sensible
guide to all of this:

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Tuning_Guide/sect-Realtime_Tuning_Guide-General_System_Tuning-gettimeofday_speedup.html

CLOCK_REALTIME_COARSE seemingly influences precision in a way that
allows user space applications to decide on their precision/cost
trade-off, rather than being forced to use the system default (that
procfs surfaces) through gettimeofday():
http://lwn.net/Articles/342018/

I can see a benefit in exposing this trade-off to Postgres code
directly. I still think that a correlated reference period will prove
useful, and while there are a number of ways to amortize the cost of
repeatedly (coarsely) getting the wall time in the ordinary course of
choosing victim buffers, it would be nice to do this too.
-- 
Peter Geoghegan



Re: gettimeofday is at the end of its usefulness?

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> On Tue, May 13, 2014 at 3:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There's also a CLOCK_REALTIME_COARSE selector, which is noticeably faster
>> --- about 10nsec for me --- but the output appears to only advance once
>> every millisecond, so it's probably useless for our purposes.  The other
>> selectors mentioned in the Linux man page are considerably slower than
>> CLOCK_REALTIME for me, suggesting that they actually call into the kernel.

> What Linux kernel version is in use here?

Ah, sorry, I should have specified.  This is RHEL6.5, current kernel
version 2.6.32-431.17.1.el6.x86_64.

> Apparently, as I think
> you've stated another way, more recent versions have VDSO for this,
> which can make a big difference. This article seems like a sensible
> guide to all of this:
>
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Tuning_Guide/sect-Realtime_Tuning_Guide-General_System_Tuning-gettimeofday_speedup.html

This appears to be talking about RHEL5, which is quite a bit older
(and, I'd guess, trailing edge for anybody who might deploy PG 9.5).
I did confirm that /proc/sys/kernel/vsyscall64 exists and has a default
setting of 1 on RHEL6.  Setting it to 0 causes gettimeofday to take
150ns, which probably represents the time for a trivial kernel call.
The MRG extension described on the linked page doesn't seem to be
implemented in stock RHEL6 (setting vsyscall64 to 2 is allowed but
doesn't change behavior compared to 1).  However, if I'm reading it
right, all that does is make gettimeofday behave like
clock_gettime(CLOCK_REALTIME_COARSE).

> CLOCK_REALTIME_COARSE seemingly influences precision in a way that
> allows user space applications to decide on their precision/cost
> trade-off, rather than being forced to use the system default (that
> procfs surfaces) through gettimeofday():
> http://lwn.net/Articles/342018/

Yeah, I think these are the same implementations exposed to apps in two
different ways, one being a system-wide switch affecting gettimeofday()
while the other allows the app source code to say which one it wants.
        regards, tom lane



Re: gettimeofday is at the end of its usefulness?

From
Greg Stark
Date:
On Tue, May 13, 2014 at 11:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I also tried a loop around a bare "rdtsc" assembly instruction, finding
> that that instruction takes about 10nsec.  That would be a nice
> improvement over gettimeofday, except that using that directly would
> involve dealing with cross-CPU skew, which seems like no fun at all.
> And I don't really want to get into finding equivalents for non-Intel
> architectures, either.

I always assumed the kernel used rdtsc to implement some of the high
performance timers. It can save the current time in a mapped page when
it schedules a process and then in the vdso syscall (ie in user-space)
it can use rdtsc to calculate the offset needed to adjust that
timestamp to the current time. This seems consistent with your
calculations that showed the 40ns overhead with +/- 10ns precision.

I actually think it would be more interesting if we could measure the
overhead and adjust for it. I don't think people are really concerned
with how long EXPLAIN ANALYZE takes to run if they could get accurate
numbers out of it.

Other profiling tools I poked at in the past ran a tight loop around
the profiling code to estimate the time it actually took and then
subtracted that from all the measurements. I think that might work for
the actual clock_gettime overhead. If we did that then we could call
it twice and measure the time spent in the rest of the EXPLAIN ANALYZE
code and subtract that plus the time for the two clock_gettimes from
the run-time...
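
(Roughly, the calibration step might look like the sketch below --
assuming clock_gettime(CLOCK_MONOTONIC) is the clock being measured,
with an arbitrary loop count:)

#include <time.h>

/* estimate the per-call overhead of the clock, in nanoseconds */
static double
clock_overhead_ns(void)
{
    const int   loops = 1000000;
    struct timespec start, end, tmp;
    int         i;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < loops; i++)
        clock_gettime(CLOCK_MONOTONIC, &tmp);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return ((end.tv_sec - start.tv_sec) * 1e9 +
            (end.tv_nsec - start.tv_nsec)) / loops;
}

Each instrumented node iteration takes two readings, so EXPLAIN ANALYZE
would subtract roughly twice that estimate per iteration, plus whatever
the surrounding instrumentation code itself costs.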

-- 
greg



Re: gettimeofday is at the end of its usefulness?

From
Tom Lane
Date:
Greg Stark <stark@mit.edu> writes:
> I actually think it would be more interesting if we could measure the
> overhead and adjust for it.

Actually, that's quite a good thought.  The overhead should be a pretty
stable number on any given machine, so in theory we could do this to
high precision.  And the numbers I just showed say that on current
x86_64 platforms, the *best we could possibly hope for* in terms of
direct overhead reduction is about 4x.  Which is something, but it
hardly makes the problem vanish.

I have a vague feeling that we discussed subtract-the-overhead once before
and thought it wasn't necessary yet.  Maybe it's time.

But we also need to be using something that gives better than 1usec
resolution.  So I'm still thinking we should use clock_gettime() where
available, and look for alternative APIs where not.
        regards, tom lane



Re: gettimeofday is at the end of its usefulness?

From
Jeff Janes
Date:
On Tuesday, May 13, 2014, Tom Lane <tgl@sss.pgh.pa.us> wrote:
A recent question from Tim Kane prompted me to measure the overhead
costs of EXPLAIN ANALYZE, which I'd not checked in awhile.  Things
are far worse than I thought.  On my current server (by no means
lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run
at something like 110 nsec per row:

regression=# create table foo as select x as f1 from generate_series(1,1000000) x;
SELECT 1000000
regression=# vacuum foo;
VACUUM
regression=# explain analyze select * from foo;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.053..111.720 rows=1000000 loops=1)
 Planning time: 0.222 ms
 Execution time: 166.682 ms
(3 rows)

(and, btw, this is a debug build --- without assert and memory
context checks it'd be faster.)

The problem with this number is that a simple test program shows that
gettimeofday() requires about 40 nsec on this hardware.  That means
that two-thirds of the above timing measurement is overhead.

I'm all for finding something better if we can, but in the meantime this is certainly not unexpected, and isn't it exactly what "explain (analyze,timing off)" was invented for?

Cheers,

Jeff

Re: gettimeofday is at the end of its usefulness?

From
Ants Aasma
Date:
On Wed, May 14, 2014 at 6:34 AM, Greg Stark <stark@mit.edu> wrote:
> I always assumed the kernel used rdtsc to implement some of the high
> performance timers. It can save the current time in a mapped page when
> it schedules a process and then in the vdso syscall (ie in user-space)
> it can use rdtsc to calculate the offset needed to adjust that
> timestamp to the current time. This seems consistent with your
> calculations that showed the 40ns overhead with +/- 10ns precision.

Both gettimeofday and clock_gettime do exactly that. [1]
clock_gettime(CLOCK_MONOTONIC) is the mode of operation we would want
to use here.

> I actually think it would be more interesting if we could measure the
> overhead and adjust for it. I don't think people are really concerned
> with how long EXPLAIN ANALYZE takes to run if they could get accurate
> numbers out of it.

Measuring would also be a good idea so we can automatically turn on
performance counters like IO timing when we know it's not obscenely
expensive.

However, subtracting the overhead will still skew the numbers somewhat
by giving more breathing time for memory and IO prefetching
mechanisms. Another option to consider would be to add a sampling
based mechanism for low overhead time attribution. It would be even
better if we could distinguish between time spent waiting on locks vs.
waiting on IO vs. waiting to be scheduled vs. actually executing.

[1] https://github.com/torvalds/linux/blob/master/arch/x86/vdso/vclock_gettime.c#L223

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: gettimeofday is at the end of its usefulness?

From
Robert Haas
Date:
On Tue, May 13, 2014 at 11:34 PM, Greg Stark <stark@mit.edu> wrote:
> I always assumed the kernel used rdtsc to implement some of the high
> performance timers. It can save the current time in a mapped page when
> it schedules a process and then in the vdso syscall (ie in user-space)
> it can use rdtsc to calculate the offset needed to adjust that
> timestamp to the current time. This seems consistent with your
> calculations that showed the 40ns overhead with +/- 10ns precision.

Crazy idea: Instead of trying to time precisely the amount of time we
spend in each node, configure a very-high frequency timer interrupt
(or background thread?) that does:

SomeGlobalVariablePointingToTheCurrentNode->profiling_counter++;
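
(Very roughly, a sketch of that using an ITIMER_PROF timer as the
"interrupt"; the node structure and names here are made up purely for
illustration:)

#include <stddef.h>
#include <signal.h>
#include <sys/time.h>

/* hypothetical per-node sample counter */
typedef struct NodeCounter
{
    volatile long profiling_counter;
} NodeCounter;

/* the executor would point this at the node currently running */
static NodeCounter * volatile current_node = NULL;

static void
profiling_tick(int signo)
{
    NodeCounter *node = current_node;

    (void) signo;
    if (node != NULL)
        node->profiling_counter++;
}

static void
start_profiling_timer(void)
{
    struct sigaction sa;
    struct itimerval it;

    sa.sa_handler = profiling_tick;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGPROF, &sa, NULL);

    /* sample every 100 usec of CPU time consumed by this process */
    it.it_interval.tv_sec = 0;
    it.it_interval.tv_usec = 100;
    it.it_value = it.it_interval;
    setitimer(ITIMER_PROF, &it, NULL);
}

Per-node time would then be approximated as samples times the sampling
interval; whether the kernel can actually deliver signals that often is
another question.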

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: gettimeofday is at the end of its usefulness?

From
Benedikt Grundmann
Date:
I posted this on this mailing list before: at Jane Street we have developed very fast code to get timing information based on the TSC if available.  It's all OCaml, but well documented and mostly just calls to C functions, so it should be easy to port to C, and we release it under a very liberal license, so it should be no problem to take the ideas:

https://github.com/janestreet/core/blob/master/lib/time_stamp_counter.mli

Hope this is useful.

Bene


On Wed, May 14, 2014 at 12:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, May 13, 2014 at 11:34 PM, Greg Stark <stark@mit.edu> wrote:
> I always assumed the kernel used rdtsc to implement some of the high
> performance timers. It can save the current time in a mapped page when
> it schedules a process and then in the vdso syscall (ie in user-space)
> it can use rdtsc to calculate the offset needed to adjust that
> timestamp to the current time. This seems consistent with your
> calculations that showed the 40ns overhead with +/- 10ns precision.

Crazy idea: Instead of trying to time precisely the amount of time we
spend in each node, configure a very-high frequency timer interrupt
(or background thread?) that does:

SomeGlobalVariablePointingToTheCurrentNode->profiling_counter++;

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: gettimeofday is at the end of its usefulness?

From
Benedikt Grundmann
Date:

On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann <bgrundmann@janestreet.com> wrote:
I posted this on this mailing list before: at Jane Street we have developed very fast code to get timing information based on the TSC if available.  It's all OCaml, but well documented and mostly just calls to C functions, so it should be easy to port to C, and we release it under a very liberal license, so it should be no problem to take the ideas:

https://github.com/janestreet/core/blob/master/lib/time_stamp_counter.mli

Hope this is useful.

Bene


Also, I'm sorry for top posting.  hackers is the only mailing list I'm on that requires this, and some others require top posting, so it runs counter to my habits and I only realized after sending...

 

On Wed, May 14, 2014 at 12:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, May 13, 2014 at 11:34 PM, Greg Stark <stark@mit.edu> wrote:
> I always assumed the kernel used rdtsc to implement some of the high
> performance timers. It can save the current time in a mapped page when
> it schedules a process and then in the vdso syscall (ie in user-space)
> it can use rdtsc to calculate the offset needed to adjust that
> timestamp to the current time. This seems consistent with your
> calculations that showed the 40ns overhead with +/- 10ns precision.

Crazy idea: Instead of trying to time precisely the amount of time we
spend in each node, configure a very-high frequency timer interrupt
(or background thread?) that does:

SomeGlobalVariablePointingToTheCurrentNode->profiling_counter++;

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: gettimeofday is at the end of its usefulness?

From
Greg Stark
Date:
On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann
<bgrundmann@janestreet.com> wrote:
> I posted this on this mailing list before at Jane Street we have developed
> very fast code to get timing information based on TSC if available.  It's
> all ocaml but well documented and mostly just calls to c functions so should
> be easy to port to C and we release it under a very liberal license so it
> should be no problem to take the ideas:

What OS do you run it on though? How fast is your implementation
compared to the kernel implementation of clock_gettime()?

Are you sure your implementation is actually faster? And are you sure
you're protected against clocks going backwards? I think you should
put some i/o in the loop in the test and start several threads running
it to make it more likely the thread is rescheduled to a different
processor during the test. I suspect you'll find the rdtsc goes
backwards sometimes or produces crazy results when switching
processors.

-- 
greg



Re: gettimeofday is at the end of its usefulness?

From
Benedikt Grundmann
Date:



On Thu, May 15, 2014 at 11:31 AM, Greg Stark <stark@mit.edu> wrote:
On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann
<bgrundmann@janestreet.com> wrote:
> I posted this on this mailing list before at Jane Street we have developed
> very fast code to get timing information based on TSC if available.  It's
> all ocaml but well documented and mostly just calls to c functions so should
> be easy to port to C and we release it under a very liberal license so it
> should be no problem to take the ideas:

What OS do you run it on though? How fast is your implementation
compared to the kernel implementation of clock_gettime()?

Are you sure your implementation is actually faster? And are you sure
you're protected against clocks going backwards? I think you should
put some i/o in the loop in the test and start several threads running
it to make it more likely the thread is rescheduled to a different
processor during the test. I suspect you'll find the rdtsc goes
backwards sometimes or produces crazy results when switching
processors.


There are benchmarks in the link I posted (obtained by a micro-benchmarking library we developed and use internally, which takes great care to obtain reliable numbers).  We use POSIX threads extensively.  We internally spend a lot of time setting up NTP and monitoring systems so that the clock never goes backwards (in other words, I wouldn't be surprised if the library does NOT work correctly when it does -- our protection is outside).  I do not believe we have seen the rdtsc going backwards on the thread context switch you mention (and, as said, we use lots of threads).  OS?  CentOS 6.5 primarily.







--
greg

Re: gettimeofday is at the end of its usefulness?

From
Andres Freund
Date:
On 2014-05-15 12:04:25 +0100, Benedikt Grundmann wrote:
> On Thu, May 15, 2014 at 11:31 AM, Greg Stark <stark@mit.edu> wrote:
> 
> > On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann
> > <bgrundmann@janestreet.com> wrote:
> > > I posted this on this mailing list before at Jane Street we have
> > developed
> > > very fast code to get timing information based on TSC if available.  It's
> > > all ocaml but well documented and mostly just calls to c functions so
> > should
> > > be easy to port to C and we release it under a very liberal license so it
> > > should be no problem to take the ideas:
> >
> > What OS do you run it on though? How fast is your implementation
> > compared to the kernel implementation of clock_gettime()?
> >
> > Are you sure your implementation is actually faster? And are you sure
> > you're protected against clocks going backwards? I think you should
> > put some i/o in the loop in the test and start several threads running
> > it to make it more likely the thread is rescheduled to a different
> > processor during the test. I suspect you'll find the rdtsc goes
> > backwards sometimes or produces crazy results when switching
> > processors.
> >
> >
> There are benchmarks in the link I posted (obtained by a micro benchmarking
> library we developed / use internally which takes great care to obtain
> reliable numbers) .  We use posix threads extensively. We internally spend
> a lot of time setting up ntp and monitoring systems so that clock backwards
> never happens (so with other words I wouldn't be surprised if the library
> does NOT work correctly when it does  -- our protection is outside).  I do
> not believe we have seen the tdtsc going backwards on thread context switch
> you mention (and as said we use lots of threads).  OS?  Centos 6.5
> primarily.

Did you test it on a server with more than one socket (i.e. not just
multiple cores, but distinct CPU packages)? That's where you'd expect
differences in the TSC to have funny effects.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: gettimeofday is at the end of its usefulness?

From
Andres Freund
Date:
Hi,

On 2014-05-13 18:58:11 -0400, Tom Lane wrote:
> Anyway it looks like clock_gettime() might be worth using on Linux
> just for the more precise output.  It doesn't seem to exist on OS X
> though, and I have no idea about elsewhere.

Agreed that using clock_gettime() would be a good idea. I'd say we
should have a wrapper around it that is able to provide nanosecond
precision. If only gettimeofday() (and whatever windows is using) is
available, we can dynamically fall back to that.

> I'm curious if anybody has ideas about other things we might do for
> portable high-precision timing.

It's far from a solve-it-all, but can we perhaps try to coalesce
repeated time measurements? We'll very frequently do a
    InstrStopNode();
    /* minimal amount of work */
    InstrStartNode();
which will measure the time twice. I think there's a fair number of
scenarios where once would be enough. I'll freely admit that I haven't
looked enough to determine how we could do that API wise.
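
(One possible shape, sketched against the existing instr_time.h macros:
a hypothetical combined call that takes a single clock reading and uses
it both to close one interval and to open the next.  The struct here is
a simplified stand-in, not the real Instrumentation:)

#include "portability/instr_time.h"

/* simplified stand-in for struct Instrumentation; illustrative only */
typedef struct Instr
{
    instr_time  starttime;      /* when the current interval began */
    instr_time  counter;        /* accumulated time */
} Instr;

/*
 * Stop timing "from" and immediately start timing "to", using one
 * clock reading instead of the two that InstrStopNode() followed by
 * InstrStartNode() would take.
 */
static void
InstrStopStartNode(Instr *from, Instr *to)
{
    instr_time  now;

    INSTR_TIME_SET_CURRENT(now);
    INSTR_TIME_ACCUM_DIFF(from->counter, now, from->starttime);
    to->starttime = now;
}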

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: gettimeofday is at the end of its usefulness?

From
Bruce Momjian
Date:
On Tue, May 13, 2014 at 06:58:11PM -0400, Tom Lane wrote:
> A recent question from Tim Kane prompted me to measure the overhead
> costs of EXPLAIN ANALYZE, which I'd not checked in awhile.  Things
> are far worse than I thought.  On my current server (by no means
> lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run
> at something like 110 nsec per row:

I assume you ran pg_test_timing too:
Testing timing overhead for 3 seconds.
Per loop time including overhead: 41.70 nsec
Histogram of timing durations:
< usec   % of total      count
     1     95.83035   68935459
     2      4.16923    2999133
     4      0.00037        268
     8      0.00004         31
    16      0.00000          1
    32      0.00000          1
 

My overhead of 41.70 nsec matches yours.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: gettimeofday is at the end of its usefulness?

From
Thom Brown
Date:
On 15 May 2014 at 19:56, Bruce Momjian <bruce@momjian.us> wrote:
On Tue, May 13, 2014 at 06:58:11PM -0400, Tom Lane wrote:
> A recent question from Tim Kane prompted me to measure the overhead
> costs of EXPLAIN ANALYZE, which I'd not checked in awhile.  Things
> are far worse than I thought.  On my current server (by no means
> lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run
> at something like 110 nsec per row:

I assume you ran pg_test_timing too:

        Testing timing overhead for 3 seconds.
        Per loop time including overhead: 41.70 nsec
        Histogram of timing durations:
        < usec   % of total      count
             1     95.83035   68935459
             2      4.16923    2999133
             4      0.00037        268
             8      0.00004         31
            16      0.00000          1
            32      0.00000          1

My overhead of 41.70 nsec matches yours.

Did this idea die, or is it still worth considering?

Thom

Re: gettimeofday is at the end of its usefulness?

From
Tom Lane
Date:
Thom Brown <thom@linux.com> writes:
> On 15 May 2014 at 19:56, Bruce Momjian <bruce@momjian.us> wrote:
>> On Tue, May 13, 2014 at 06:58:11PM -0400, Tom Lane wrote:
>>> A recent question from Tim Kane prompted me to measure the overhead
>>> costs of EXPLAIN ANALYZE, which I'd not checked in awhile.  Things
>>> are far worse than I thought.  On my current server (by no means
>>> lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run
>>> at something like 110 nsec per row:

> Did this idea die, or is it still worth considering?

We still have a problem, for sure.  I'm not sure that there was any
consensus on what to do about it.  Using clock_gettime(CLOCK_REALTIME)
if available would be a straightforward change that should ameliorate
gettimeofday()'s 1-usec-precision-limit problem; but it doesn't do
anything to fix the excessive-overhead problem.  The ideas about the
latter were all over the map, and none of them looked easy.

If you're feeling motivated to work on this area, feel free.
        regards, tom lane



Re: gettimeofday is at the end of its usefulness?

From
Jim Nasby
Date:
On 6/8/16 9:56 AM, Tom Lane wrote:
> Thom Brown <thom@linux.com> writes:
>> On 15 May 2014 at 19:56, Bruce Momjian <bruce@momjian.us> wrote:
>>> On Tue, May 13, 2014 at 06:58:11PM -0400, Tom Lane wrote:
>>>> A recent question from Tim Kane prompted me to measure the overhead
>>>> costs of EXPLAIN ANALYZE, which I'd not checked in awhile.  Things
>>>> are far worse than I thought.  On my current server (by no means
>>>> lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run
>>>> at something like 110 nsec per row:
>
>> Did this idea die, or is it still worth considering?
>
> We still have a problem, for sure.  I'm not sure that there was any
> consensus on what to do about it.  Using clock_gettime(CLOCK_REALTIME)
> if available would be a straightforward change that should ameliorate
> gettimeofday()'s 1-usec-precision-limit problem; but it doesn't do
> anything to fix the excessive-overhead problem.  The ideas about the
> latter were all over the map, and none of them looked easy.
>
> If you're feeling motivated to work on this area, feel free.

Semi-related: someone (Robert I think) recently mentioned investigating 
"vectorized" executor nodes, where multiple tuples would be processed in 
one shot. If we had that presumably the explain penalty would be a moot 
point.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: gettimeofday is at the end of its usefulness?

From
Robert Haas
Date:
On Tue, Jun 14, 2016 at 4:27 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> Semi-related: someone (Robert I think) recently mentioned investigating
> "vectorized" executor nodes, where multiple tuples would be processed in one
> shot. If we had that presumably the explain penalty would be a moot point.

Yeah, both Andres and I are interested in that, and I think he's
actively working on it.  It would be quite neat if this had the effect
of reducing EXPLAIN ANALYZE's overhead to something trivial.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: gettimeofday is at the end of its usefulness?

From
Andres Freund
Date:
On 2016-07-07 14:43:31 -0400, Robert Haas wrote:
> On Tue, Jun 14, 2016 at 4:27 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> > Semi-related: someone (Robert I think) recently mentioned investigating
> > "vectorized" executor nodes, where multiple tuples would be processed in one
> > shot. If we had that presumably the explain penalty would be a moot point.
> 
> Yeah, both Andres and I are interested in that, and I think he's
> actively working on it.  It would be quite neat if this had the effect
> of reducing EXPLAIN ANALYZE's overhead to something trivial.

I am, and it does reduce the overhead. Depends on the type of plan
though. Index nestloops e.g. don't benefit on the inner side.

Andres



Re: gettimeofday is at the end of its usefulness?

From
Haribabu Kommi
Date:
On Thu, Jun 9, 2016 at 12:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thom Brown <thom@linux.com> writes:
>> On 15 May 2014 at 19:56, Bruce Momjian <bruce@momjian.us> wrote:
>>> On Tue, May 13, 2014 at 06:58:11PM -0400, Tom Lane wrote:
>>>> A recent question from Tim Kane prompted me to measure the overhead
>>>> costs of EXPLAIN ANALYZE, which I'd not checked in awhile.  Things
>>>> are far worse than I thought.  On my current server (by no means
>>>> lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run
>>>> at something like 110 nsec per row:
>
>> Did this idea die, or is it still worth considering?
>
> We still have a problem, for sure.  I'm not sure that there was any
> consensus on what to do about it.  Using clock_gettime(CLOCK_REALTIME)
> if available would be a straightforward change that should ameliorate
> gettimeofday()'s 1-usec-precision-limit problem; but it doesn't do
> anything to fix the excessive-overhead problem.  The ideas about the
> latter were all over the map, and none of them looked easy.
>
> If you're feeling motivated to work on this area, feel free.

How about using both CLOCK_REALTIME and CLOCK_REALTIME_COARSE as the
clock IDs in clock_gettime() wherever applicable?  The COARSE option
would be used wherever no precise timing calculation is required,
because on my laptop there is a significant performance difference
(about 8x) compared to CLOCK_REALTIME.

If that sounds fine, I will try to update the code and send a patch.
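
(For what it's worth, the difference can be reproduced with a small
stand-alone test along these lines; loop count arbitrary, and older
glibc may need -lrt:)

#include <stdio.h>
#include <time.h>

static void
time_clock(clockid_t id, const char *name)
{
    const long  loops = 10000000;
    struct timespec start, end, tmp;
    long        i;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < loops; i++)
        clock_gettime(id, &tmp);
    clock_gettime(CLOCK_MONOTONIC, &end);

    printf("%s: %.1f ns per call\n", name,
           ((end.tv_sec - start.tv_sec) * 1e9 +
            (end.tv_nsec - start.tv_nsec)) / loops);
}

int
main(void)
{
    time_clock(CLOCK_REALTIME, "CLOCK_REALTIME");
#ifdef CLOCK_REALTIME_COARSE
    time_clock(CLOCK_REALTIME_COARSE, "CLOCK_REALTIME_COARSE");
#endif
    return 0;
}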

Regards,
Hari Babu
Fujitsu Australia



Re: gettimeofday is at the end of its usefulness?

From
Haribabu Kommi
Date:


On Thu, Aug 25, 2016 at 3:12 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
On Thu, Jun 9, 2016 at 12:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thom Brown <thom@linux.com> writes:
>> On 15 May 2014 at 19:56, Bruce Momjian <bruce@momjian.us> wrote:
>>> On Tue, May 13, 2014 at 06:58:11PM -0400, Tom Lane wrote:
>>>> A recent question from Tim Kane prompted me to measure the overhead
>>>> costs of EXPLAIN ANALYZE, which I'd not checked in awhile.  Things
>>>> are far worse than I thought.  On my current server (by no means
>>>> lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run
>>>> at something like 110 nsec per row:
>
>> Did this idea die, or is it still worth considering?
>
> We still have a problem, for sure.  I'm not sure that there was any
> consensus on what to do about it.  Using clock_gettime(CLOCK_REALTIME)
> if available would be a straightforward change that should ameliorate
> gettimeofday()'s 1-usec-precision-limit problem; but it doesn't do
> anything to fix the excessive-overhead problem.  The ideas about the
> latter were all over the map, and none of them looked easy.
>
> If you're feeling motivated to work on this area, feel free.

How about using both CLOCK_REALTIME and CLOCK_REALTIME_COARSE as the
clock IDs in clock_gettime() wherever applicable?  The COARSE option
would be used wherever no precise timing calculation is required,
because on my laptop there is a significant performance difference
(about 8x) compared to CLOCK_REALTIME.

If that sounds fine, I will try to update the code and send a patch.

Attached a patch that replaces most of the gettimeofday() function
calls, except timeofday() (user callable) and the GetCurrentTimestamp
functions.

I didn't yet add any configure checks, or the fallback logic to
gettimeofday() for the case where the clock_gettime function is not
available.

Any comments on proceeding further?

Regards,
Hari Babu
Fujitsu Australia
Attachment

Re: [HACKERS] gettimeofday is at the end of its usefulness?

From
Tom Lane
Date:
Haribabu Kommi <kommi.haribabu@gmail.com> writes:
> Attached a patch that replaces most of the getimeofday function calls,
> except timeofday(user callable) and GetCurrentTimestamp functions.
> Didn't add any configure checks in case if the clock_gettime function is
> not available, the fallback logic to gettimeofday function call.

Well, of course, configure is the hard part.

I got interested in this area again pursuant to a question from Joel
Jacobson, and looked around to see if things had changed any since
2014.  One pleasant surprise is that Apple got around to implementing
clock_gettime() as of the current macOS release (10.12 "Sierra").
That means that pretty much all interesting platforms now have
clock_gettime(), which removes one possible objection to starting to
use it.  However, it seems like there is not a lot of commonality
to the best "clockid" to use.

In theory, according to the POSIX spec, CLOCK_MONOTONIC would be what we
want to use for time interval measurement (EXPLAIN ANALYZE), since that
would be impervious to possible clock setting changes.  But some
implementations might only optimize the more common CLOCK_REALTIME,
and I found that there are a lot of platform-specific clock IDs that
we might want to consider.

On Linux (RHEL6, 2.4GHz x86_64), I find that gettimeofday(),
clock_gettime(CLOCK_MONOTONIC), and clock_gettime(CLOCK_REALTIME)
all take about 40ns.  Of course gettimeofday() only has 1us resolution,
but the other two have perhaps 10ns resolution (I get no duplicate
readings in a tight loop).  Other documented clockids include
    CLOCK_REALTIME_COARSE: about 10ns to read, but only 1ms resolution
    CLOCK_MONOTONIC_COARSE: about 12ns to read, but only 1ms resolution
    CLOCK_MONOTONIC_RAW: full resolution but very slow, ~145ns to read
So CLOCK_MONOTONIC seems to be the thing to use here.  It won't buy
us anything speed-wise but the extra resolution will be nice.
However, we need to do more research to see if this holds true on
other popular distros.

On macOS (10.12.2, 2.7GHz x86_64), clock_gettime(CLOCK_REALTIME)
is actually a shade faster than gettimeofday: 40ns versus 46ns.
But it's only giving 1us resolution, no better than gettimeofday.
CLOCK_MONOTONIC is also 1us and it takes 75ns to read.  But there's
a CLOCK_MONOTONIC_RAW that takes 44ns to read and seems to offer
full precision -- no duplicate readings in a tight loop.  There's
also CLOCK_MONOTONIC_RAW_APPROX which can be read in 23ns but
the resolution is only around half an ms.

I also tried FreeBSD 11.0 on another Mac (2.3GHz x86_64),
and found that gettimeofday as well as basically all their
clock_gettime variants run in 27 to 28 ns; and clock_gettime
reliably delivers full precision, except for CLOCK_SECOND which
is intentionally truncated to 1s precision.  So there would be
no need to work with anything but CLOCK_MONOTONIC here.

However, it seems that these impressive results date back only to
June 2012, cf
https://github.com/freebsd/freebsd/commit/13a9f42818f6b89a72b3e40923be809b490400d8
and at least as of that commit, only x86 and x86_64 had the fast
clock_gettime code.  Older FreeBSD, or FreeBSD on another architecture,
is likely to be a lot worse.  But I lack an installation to try.

I also tried OpenBSD 6.0 on that same Mac, and got pretty horrid
results: gettimeofday, CLOCK_REALTIME, and CLOCK_MONOTONIC all
take about 613ns to read.  Ouch.  And so does time(NULL); double
ouch.  Evidently there's no optimization on this platform and
what we're seeing is the minimum cost for a kernel call.  Still,
we do get better precision from clock_gettime than gettimeofday,
so we might as well switch.

So it seems like the configure support we'd need is to detect
whether clock_gettime is available (note on Linux there's also
a library requirement, -lrt), and we would also need a way to
provide a platform-specific choice of clockid; we at least need
enough smarts to use CLOCK_MONOTONIC_RAW on macOS.
        regards, tom lane



Re: [HACKERS] gettimeofday is at the end of its usefulness?

From
Greg Stark
Date:


On Dec 26, 2016 10:35 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

So it seems like the configure support we'd need is to detect
whether clock_gettime is available (note on Linux there's also
a library requirement, -lrt), and we would also need a way to
provide a platform-specific choice of clockid; we at least need
enough smarts to use CLOCK_MONOTONIC_RAW on macOS.

This seems like something that really should be checked at runtime. It's very specific to the specific kernel you're running on, not the build environment, and it can hopefully be measured in only a second or even a fraction of a second. The only problem would be if other things running on the system made the test results unpredictable, so that you had a small chance of getting a very suboptimal choice and re-rolling the dice each time you restarted...

Re: [HACKERS] gettimeofday is at the end of its usefulness?

From
Andres Freund
Date:
On 2016-12-27 01:35:05 +0000, Greg Stark wrote:
> On Dec 26, 2016 10:35 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> 
> 
> So it seems like the configure support we'd need is to detect
> whether clock_gettime is available (note on Linux there's also
> a library requirement, -lrt), and we would also need a way to
> provide a platform-specific choice of clockid; we at least need
> enough smarts to use CLOCK_MONOTONIC_RAW on macOS.
> 
> This seems like something that really should be checked at runtime. It's
> very specific to the specific kernel you're running on, not the build
> environment, and it can hopefully be measured in only a second or even a
> fraction of a second. The only problem would be if other things running
> on the system made the test results unpredictable so that you had a small
> chance of getting a very suboptimal choice and re-rolling the dice each time
> you restarted...

I'm pretty strongly against doing performance measurements at
startup. Both the delay and the potential for differing test results
seem like pretty bad consequences.

Greetings,

Andres Freund



Re: [HACKERS] gettimeofday is at the end of its usefulness?

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2016-12-27 01:35:05 +0000, Greg Stark wrote:
>> On Dec 26, 2016 10:35 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>>> So it seems like the configure support we'd need is to detect
>>> whether clock_gettime is available (note on Linux there's also
>>> a library requirement, -lrt), and we would also need a way to
>>> provide a platform-specific choice of clockid; we at least need
>>> enough smarts to use CLOCK_MONOTONIC_RAW on macOS.

>> This seems like something that really should be checked at runtime.

> I'm pretty strongly against doing performance measurements at
> startup. Both the delay and the potential for differing test results
> seem like pretty bad consequences.

Yeah, that doesn't sound great to me either.  And I don't entirely
see the point, at least not with what we know now.  I am a bit concerned
that we'll find out there are popular platforms where clock_gettime
compiles but fails with ENOSYS, or some similarly unhelpful behavior.
But we won't find that out if we don't try.
        regards, tom lane



Re: [HACKERS] gettimeofday is at the end of its usefulness?

From
Florian Weimer
Date:
* Tom Lane:

> On Linux (RHEL6, 2.4GHz x86_64), I find that gettimeofday(),
> clock_gettime(CLOCK_MONOTONIC), and clock_gettime(CLOCK_REALTIME)
> all take about 40ns.  Of course gettimeofday() only has 1us resolution,
> but the other two have perhaps 10ns resolution (I get no duplicate
> readings in a tight loop).  Other documented clockids include
>     CLOCK_REALTIME_COARSE: about 10ns to read, but only 1ms resolution
>     CLOCK_MONOTONIC_COARSE: about 12ns to read, but only 1ms resolution
>     CLOCK_MONOTONIC_RAW: full resolution but very slow, ~145ns to read
> So CLOCK_MONOTONIC seems to be the thing to use here.  It won't buy
> us anything speed-wise but the extra resolution will be nice.
> However, we need to do more research to see if this holds true on
> other popular distros.

Isn't this very specific to kernel and glibc versions, depending on
things like CONFIG_HZ settings and what level of vDSO support has been
backported?



Re: [HACKERS] gettimeofday is at the end of its usefulness?

From
Tom Lane
Date:
Florian Weimer <fw@deneb.enyo.de> writes:
> * Tom Lane:
>> On Linux (RHEL6, 2.4GHz x86_64), I find that gettimeofday(),
>> clock_gettime(CLOCK_MONOTONIC), and clock_gettime(CLOCK_REALTIME)
>> all take about 40ns.  Of course gettimeofday() only has 1us resolution,
>> but the other two have perhaps 10ns resolution (I get no duplicate
>> readings in a tight loop).

> Isn't this very specific to kernel and glibc versions, depending on
> things like CONFIG_HZ settings and what level of vDSO support has been
> backported?

No doubt, but I have yet to find a platform where clock_gettime() exists
but performs worse than gettimeofday().  Do you know of one?
        regards, tom lane



Re: [HACKERS] gettimeofday is at the end of its usefulness?

From
Thomas Munro
Date:
On Tue, Dec 27, 2016 at 10:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I also tried FreeBSD 11.0 on another Mac (2.3GHz x86_64),
> and found that gettimeofday as well as basically all their
> clock_gettime variants run in 27 to 28 ns; and clock_gettime
> reliably delivers full precision, except for CLOCK_SECOND which
> is intentionally truncated to 1s precision.  So there would be
> no need to work with anything but CLOCK_MONOTONIC here.
>
> However, it seems that these impressive results date back only to
> June 2012, cf
> https://github.com/freebsd/freebsd/commit/13a9f42818f6b89a72b3e40923be809b490400d8
> and at least as of that commit, only x86 and x86_64 had the fast
> clock_gettime code.  Older FreeBSD, or FreeBSD on another architecture,
> is likely to be a lot worse.  But I lack an installation to try.

That commit is in every 'production' and 'legacy' release of
FreeBSD[1], meaning as far back as 9.3 (expected to be EoL in the next
few days), because it landed in 9.2 (EoL).  ARM support landed in
FreeBSD 11.0[2].  That leaves the following architectures without
fast-path time functions:

macaque:freebsd munro$ git grep 'trivial-vdso_tc.c'
lib/libc/mips/sys/Makefile.inc:SRCS+=   trivial-vdso_tc.c
lib/libc/powerpc/Makefile.inc:SRCS+=    trivial-vdso_tc.c
lib/libc/powerpc64/Makefile.inc:SRCS+=  trivial-vdso_tc.c
lib/libc/powerpcspe/Makefile.inc:SRCS+= trivial-vdso_tc.c
lib/libc/riscv/sys/Makefile.inc:SRCS+=  trivial-vdso_tc.c
lib/libc/sparc64/Makefile.inc:SRCS+=    trivial-vdso_tc.c

[1] https://www.freebsd.org/releases/
[2] https://github.com/freebsd/freebsd/commit/80e8626b434515d16b3576174438526755336810

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] gettimeofday is at the end of its usefulness?

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Tue, Dec 27, 2016 at 10:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> However, it seems that these impressive results date back only to
>> June 2012, cf
>> https://github.com/freebsd/freebsd/commit/13a9f42818f6b89a72b3e40923be809b490400d8
>> and at least as of that commit, only x86 and x86_64 had the fast
>> clock_gettime code.  Older FreeBSD, or FreeBSD on another architecture,
>> is likely to be a lot worse.  But I lack an installation to try.

> That commit is in every 'production' and 'legacy' release of
> FreeBSD[1], meaning as far back as 9.3 (expected to be EoL in the next
> few days), because it landed in 9.2 (EoL).

I'm unclear on whether there's any significant number of people running
out-of-support *BSD releases.  If it's not something we have to worry
about, fine.

> That leaves the following architectures without
> fast-path time functions:

> macaque:freebsd munro$ git grep 'trivial-vdso_tc.c'
> lib/libc/mips/sys/Makefile.inc:SRCS+=   trivial-vdso_tc.c
> lib/libc/powerpc/Makefile.inc:SRCS+=    trivial-vdso_tc.c
> lib/libc/powerpc64/Makefile.inc:SRCS+=  trivial-vdso_tc.c
> lib/libc/powerpcspe/Makefile.inc:SRCS+= trivial-vdso_tc.c
> lib/libc/riscv/sys/Makefile.inc:SRCS+=  trivial-vdso_tc.c
> lib/libc/sparc64/Makefile.inc:SRCS+=    trivial-vdso_tc.c

Yeah, I just finished getting results from FreeBSD 10.3 on PPC
(1.33GHz G4 laptop): gettimeofday takes about 1180 ns and clock_gettime
about 1200 ns.  That difference seems to be repeatable, but since it's
only 2% I'm not too fussed about it.  Interestingly, it's very easy
to tell that it is entering the kernel, because time(1) shows a
significant fraction of system time:

$ time ./testclock
0 bogus readings
100000000 distinct readings
      117.96 real        26.80 user        90.31 sys

The same test on platforms with vDSO support shows zero system time.
        regards, tom lane



Re: [HACKERS] gettimeofday is at the end of its usefulness?

From
Tom Lane
Date:
Haribabu Kommi <kommi.haribabu@gmail.com> writes:
> Attached a patch that replaces most of the getimeofday function calls,
> except timeofday(user callable) and GetCurrentTimestamp functions.

I looked at this for awhile and could not convince myself that it's
a good idea.  Trying to do s/gettimeofday/clock_gettime/g is not going
to do much for us except create portability headaches.  According
to my tests, clock_gettime is not noticeably faster than gettimeofday
on any platform, except that if you use nonstandard clockids like
CLOCK_REALTIME_COARSE then on *some* platforms it's a little bit quicker,
at the cost of being a great deal less precise.  But we'd have to research
the existence and effects of nonstandard clockids on every platform.
So AFAICS the only clear advantage to switching is the extra precision
available from clock_gettime.

But ... most of the places you've touched in this patch have neither any
need for sub-microsecond precision nor any great need to worry about
shaving a few ns off the time taken by the call.  As far as I can find,
the only place where it's actually worth our trouble to deal with it is
instr_time.h (ie, EXPLAIN ANALYZE and a few other uses).

So I think we should do something more like the attached.


One issue I did not resolve in this WIP patch is what to do with this
gem of abstraction violation in pgbench:

        /* no, print raw transactions */
#ifndef WIN32

        /* This is more than we really ought to know about instr_time */
        if (skipped)
            fprintf(logfile, "%d " INT64_FORMAT " skipped %d %ld %ld",
                    st->id, st->cnt, st->use_file,
                    (long) now->tv_sec, (long) now->tv_usec);
        else
            fprintf(logfile, "%d " INT64_FORMAT " %.0f %d %ld %ld",
                    st->id, st->cnt, latency, st->use_file,
                    (long) now->tv_sec, (long) now->tv_usec);
#else

        /* On Windows, instr_time doesn't provide a timestamp anyway */
        if (skipped)
            fprintf(logfile, "%d " INT64_FORMAT " skipped %d 0 0",
                    st->id, st->cnt, st->use_file);
        else
            fprintf(logfile, "%d " INT64_FORMAT " %.0f %d 0 0",
                    st->id, st->cnt, latency, st->use_file);
#endif

We could either rip out the non-Windows code path entirely, or do
something about providing an honest elapsed-time measurement, perhaps
by doing INSTR_TIME_GET_DOUBLE() on the diff from run start to "now".
Given that we're calling fprintf, I doubt that the extra arithmetic
needed for that is a big problem.
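
(Sketching the latter: "run_start" below is a hypothetical instr_time
captured once when the benchmark run begins, not an existing pgbench
variable:)

        instr_time      run_elapsed;

        run_elapsed = *now;
        INSTR_TIME_SUBTRACT(run_elapsed, run_start);

        if (skipped)
            fprintf(logfile, "%d " INT64_FORMAT " skipped %d %.6f",
                    st->id, st->cnt, st->use_file,
                    INSTR_TIME_GET_DOUBLE(run_elapsed));
        else
            fprintf(logfile, "%d " INT64_FORMAT " %.0f %d %.6f",
                    st->id, st->cnt, latency, st->use_file,
                    INSTR_TIME_GET_DOUBLE(run_elapsed));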

            regards, tom lane

diff --git a/configure b/configure
index 0f143a0..e5dd6fb 100755
*** a/configure
--- b/configure
*************** if test "$ac_res" != no; then :
*** 9055,9060 ****
--- 9055,9116 ----

  fi

+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing clock_gettime" >&5
+ $as_echo_n "checking for library containing clock_gettime... " >&6; }
+ if ${ac_cv_search_clock_gettime+:} false; then :
+   $as_echo_n "(cached) " >&6
+ else
+   ac_func_search_save_LIBS=$LIBS
+ cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+ /* end confdefs.h.  */
+
+ /* Override any GCC internal prototype to avoid an error.
+    Use char because int might match the return type of a GCC
+    builtin and then its argument prototype would still apply.  */
+ #ifdef __cplusplus
+ extern "C"
+ #endif
+ char clock_gettime ();
+ int
+ main ()
+ {
+ return clock_gettime ();
+   ;
+   return 0;
+ }
+ _ACEOF
+ for ac_lib in '' rt posix4; do
+   if test -z "$ac_lib"; then
+     ac_res="none required"
+   else
+     ac_res=-l$ac_lib
+     LIBS="-l$ac_lib  $ac_func_search_save_LIBS"
+   fi
+   if ac_fn_c_try_link "$LINENO"; then :
+   ac_cv_search_clock_gettime=$ac_res
+ fi
+ rm -f core conftest.err conftest.$ac_objext \
+     conftest$ac_exeext
+   if ${ac_cv_search_clock_gettime+:} false; then :
+   break
+ fi
+ done
+ if ${ac_cv_search_clock_gettime+:} false; then :
+
+ else
+   ac_cv_search_clock_gettime=no
+ fi
+ rm conftest.$ac_ext
+ LIBS=$ac_func_search_save_LIBS
+ fi
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_clock_gettime" >&5
+ $as_echo "$ac_cv_search_clock_gettime" >&6; }
+ ac_res=$ac_cv_search_clock_gettime
+ if test "$ac_res" != no; then :
+   test "$ac_res" = "none required" || LIBS="$ac_res $LIBS"
+
+ fi
+
  # Solaris:
  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing fdatasync" >&5
  $as_echo_n "checking for library containing fdatasync... " >&6; }
*************** fi
*** 12520,12526 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! for ac_func in cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 12576,12582 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index b9831bc..77aa3b4 100644
*** a/configure.in
--- b/configure.in
*************** AC_SEARCH_LIBS(getopt_long, [getopt gnug
*** 1016,1021 ****
--- 1016,1022 ----
  AC_SEARCH_LIBS(crypt, crypt)
  AC_SEARCH_LIBS(shm_open, rt)
  AC_SEARCH_LIBS(shm_unlink, rt)
+ AC_SEARCH_LIBS(clock_gettime, [rt posix4])
  # Solaris:
  AC_SEARCH_LIBS(fdatasync, [rt posix4])
  # Required for thread_test.c on Solaris
*************** PGAC_FUNC_WCSTOMBS_L
*** 1415,1421 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! AC_CHECK_FUNCS([cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l])

  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
--- 1416,1422 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l])

  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 42a3fc8..b9dfdd4 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***************
*** 105,110 ****
--- 105,113 ----
  /* Define to 1 if you have the `class' function. */
  #undef HAVE_CLASS

+ /* Define to 1 if you have the `clock_gettime' function. */
+ #undef HAVE_CLOCK_GETTIME
+
  /* Define to 1 if you have the <crtdefs.h> header file. */
  #undef HAVE_CRTDEFS_H

diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index ceb8b79..199668c 100644
*** a/src/include/pg_config.h.win32
--- b/src/include/pg_config.h.win32
***************
*** 75,80 ****
--- 75,83 ----
  /* Define to 1 if you have the `class' function. */
  /* #undef HAVE_CLASS */

+ /* Define to 1 if you have the `clock_gettime' function. */
+ /* #undef HAVE_CLOCK_GETTIME */
+
  /* Define to 1 if you have the `crypt' function. */
  /* #undef HAVE_CRYPT */

diff --git a/src/include/portability/instr_time.h b/src/include/portability/instr_time.h
index 16caf6e..b10d6f0 100644
*** a/src/include/portability/instr_time.h
--- b/src/include/portability/instr_time.h
***************
*** 4,13 ****
   *      portable high-precision interval timing
   *
   * This file provides an abstraction layer to hide portability issues in
!  * interval timing.  On Unix we use gettimeofday(), but on Windows that
!  * gives a low-precision result so we must use QueryPerformanceCounter()
!  * instead.  These macros also give some breathing room to use other
!  * high-precision-timing APIs on yet other platforms.
   *
   * The basic data type is instr_time, which all callers should treat as an
   * opaque typedef.  instr_time can store either an absolute time (of
--- 4,13 ----
   *      portable high-precision interval timing
   *
   * This file provides an abstraction layer to hide portability issues in
!  * interval timing.  On Unix we use clock_gettime() if available, else
!  * gettimeofday().  On Windows, gettimeofday() gives a low-precision result
!  * so we must use QueryPerformanceCounter() instead.  These macros also give
!  * some breathing room to use other high-precision-timing APIs.
   *
   * The basic data type is instr_time, which all callers should treat as an
   * opaque typedef.  instr_time can store either an absolute time (of
***************
*** 54,59 ****
--- 54,147 ----

  #ifndef WIN32

+ #ifdef HAVE_CLOCK_GETTIME
+
+ /* Use clock_gettime() */
+
+ #include <time.h>
+
+ /*
+  * The best clockid to use according to the POSIX spec is CLOCK_MONOTONIC,
+  * since that will give reliable interval timing even in the face of changes
+  * to the system clock.  However, POSIX doesn't require implementations to
+  * provide anything except CLOCK_REALTIME, so fall back to that if we don't
+  * find CLOCK_MONOTONIC.
+  *
+  * Also, some implementations have nonstandard clockids with better properties
+  * than CLOCK_MONOTONIC.  In particular, as of macOS 10.12, Apple provides
+  * CLOCK_MONOTONIC_RAW which is both faster to read and higher resolution than
+  * their version of CLOCK_MONOTONIC.
+  */
+ #if defined(__darwin__) && defined(CLOCK_MONOTONIC_RAW)
+ #define PG_INSTR_CLOCK    CLOCK_MONOTONIC_RAW
+ #elif defined(CLOCK_MONOTONIC)
+ #define PG_INSTR_CLOCK    CLOCK_MONOTONIC
+ #else
+ #define PG_INSTR_CLOCK    CLOCK_REALTIME
+ #endif
+
+ typedef struct timespec instr_time;
+
+ #define INSTR_TIME_IS_ZERO(t)    ((t).tv_nsec == 0 && (t).tv_sec == 0)
+
+ #define INSTR_TIME_SET_ZERO(t)    ((t).tv_sec = 0, (t).tv_nsec = 0)
+
+ #define INSTR_TIME_SET_CURRENT(t)    ((void) clock_gettime(PG_INSTR_CLOCK, &(t)))
+
+ #define INSTR_TIME_ADD(x,y) \
+     do { \
+         (x).tv_sec += (y).tv_sec; \
+         (x).tv_nsec += (y).tv_nsec; \
+         /* Normalize */ \
+         while ((x).tv_nsec >= 1000000000) \
+         { \
+             (x).tv_nsec -= 1000000000; \
+             (x).tv_sec++; \
+         } \
+     } while (0)
+
+ #define INSTR_TIME_SUBTRACT(x,y) \
+     do { \
+         (x).tv_sec -= (y).tv_sec; \
+         (x).tv_nsec -= (y).tv_nsec; \
+         /* Normalize */ \
+         while ((x).tv_nsec < 0) \
+         { \
+             (x).tv_nsec += 1000000000; \
+             (x).tv_sec--; \
+         } \
+     } while (0)
+
+ #define INSTR_TIME_ACCUM_DIFF(x,y,z) \
+     do { \
+         (x).tv_sec += (y).tv_sec - (z).tv_sec; \
+         (x).tv_nsec += (y).tv_nsec - (z).tv_nsec; \
+         /* Normalize after each add to avoid overflow/underflow of tv_nsec */ \
+         while ((x).tv_nsec < 0) \
+         { \
+             (x).tv_nsec += 1000000000; \
+             (x).tv_sec--; \
+         } \
+         while ((x).tv_nsec >= 1000000000) \
+         { \
+             (x).tv_nsec -= 1000000000; \
+             (x).tv_sec++; \
+         } \
+     } while (0)
+
+ #define INSTR_TIME_GET_DOUBLE(t) \
+     (((double) (t).tv_sec) + ((double) (t).tv_nsec) / 1000000000.0)
+
+ #define INSTR_TIME_GET_MILLISEC(t) \
+     (((double) (t).tv_sec * 1000.0) + ((double) (t).tv_nsec) / 1000000.0)
+
+ #define INSTR_TIME_GET_MICROSEC(t) \
+     (((uint64) (t).tv_sec * (uint64) 1000000) + (uint64) ((t).tv_nsec / 1000))
+
+ #else                            /* !HAVE_CLOCK_GETTIME */
+
+ /* Use gettimeofday() */
+
  #include <sys/time.h>

  typedef struct timeval instr_time;
*************** typedef struct timeval instr_time;
*** 113,120 ****
--- 201,213 ----

  #define INSTR_TIME_GET_MICROSEC(t) \
      (((uint64) (t).tv_sec * (uint64) 1000000) + (uint64) (t).tv_usec)
+
+ #endif   /* HAVE_CLOCK_GETTIME */
+
  #else                            /* WIN32 */

+ /* Use QueryPerformanceCounter() */
+
  typedef LARGE_INTEGER instr_time;

  #define INSTR_TIME_IS_ZERO(t)    ((t).QuadPart == 0)
*************** GetTimerFrequency(void)
*** 149,154 ****
--- 242,248 ----
      QueryPerformanceFrequency(&f);
      return (double) f.QuadPart;
  }
+
  #endif   /* WIN32 */

  #endif   /* INSTR_TIME_H */


Re: [HACKERS] gettimeofday is at the end of its usefulness?

From
Haribabu Kommi
Date:


On Fri, Dec 30, 2016 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Haribabu Kommi <kommi.haribabu@gmail.com> writes:
> Attached a patch that replaces most of the getimeofday function calls,
> except timeofday(user callable) and GetCurrentTimestamp functions.

I looked at this for awhile and could not convince myself that it's
a good idea.  Trying to do s/gettimeofday/clock_gettime/g is not going
to do much for us except create portability headaches.  According
to my tests, clock_gettime is not noticeably faster than gettimeofday
on any platform, except that if you use nonstandard clockids like
CLOCK_REALTIME_COARSE then on *some* platforms it's a little bit quicker,
at the cost of being a great deal less precise.  But we'd have to research
the existence and effects of nonstandard clockids on every platform.
So AFAICS the only clear advantage to switching is the extra precision
available from clock_gettime.

But ... most of the places you've touched in this patch have neither any
need for sub-microsecond precision nor any great need to worry about
shaving a few ns off the time taken by the call.  As far as I can find,
the only place where it's actually worth our trouble to deal with it is
instr_time.h (ie, EXPLAIN ANALYZE and a few other uses).

So I think we should do something more like the attached.

Thanks for your valuable input.

As the gettimeofday() function is obsolete and any further enhancements
are likely to happen only to clock_gettime(), I changed it in many
places.

Yes, I agree that unless clock_gettime() performs faster than
gettimeofday() on all platforms, we can retain the gettimeofday()
calls.

Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] gettimeofday is at the end of its usefulness?

From
Florian Weimer
Date:
* Tom Lane:

> Florian Weimer <fw@deneb.enyo.de> writes:
>> * Tom Lane:
>>> On Linux (RHEL6, 2.4GHz x86_64), I find that gettimeofday(),
>>> clock_gettime(CLOCK_MONOTONIC), and clock_gettime(CLOCK_REALTIME)
>>> all take about 40ns.  Of course gettimeofday() only has 1us resolution,
>>> but the other two have perhaps 10ns resolution (I get no duplicate
>>> readings in a tight loop).
>
>> Isn't this very specific to kernel and glibc versions, depending on
>> things like CONFIG_HZ settings and what level of vDSO support has been
>> backported?
>
> No doubt, but I have yet to find a platform where clock_gettime() exists
> but performs worse than gettimeofday().  Do you know of one?

ppc64le with all the vDSO fixes for clock_gettime?

glibc has some test cases which fail because clock_gettime gives
inconsistent results.  This has been fixed in current kernels, but I
don't know if everyone uses them.



Re: [HACKERS] gettimeofday is at the end of its usefulness?

From
Tom Lane
Date:
Florian Weimer <fw@deneb.enyo.de> writes:
> glibc has some test cases which fail because clock_gettime gives
> inconsistent results.  This has been fixed in current kernels, but I
> don't know if everyone uses them.

hmm, for which clock IDs?
        regards, tom lane