Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? - Mailing list pgsql-hackers

From	Zsolt Parragi
Subject	Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date	April 7 10:32:43
Msg-id	CAN4CZFPDWoXTQHSd8xhv_Q9UmWX2QunMX-cKD_UTenzbcY4PeQ@mail.gmail.com
In response to	Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? (Lukas Fittl <lukas@fittl.com>)
Responses	Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
List	pgsql-hackers

> It's intentionally uint64, per this comment above it:
>
> * Note we utilize unsigned integers even though ticks are stored as a signed
> * value to encourage compilers to generate better assembly, since we can be
> * sure these values are not negative.
>
> In my earlier Compiler Explorer tests that did actually make a
> difference for the generated assembly.

Isn't that comment more about ticks_per_ns_scaled?

For max_ticks_no_overflow the only use is with a cast to int64, so I
didn't expect much difference in the assembly. I have now checked both
locally and on godbolt, and I don't see any difference at all: making
max_ticks_no_overflow int64 and removing that cast generates exactly
the same code.
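
As a rough illustration of the two variants being compared (a sketch
only: the function names, the comparison, and the constant value are
hypothetical, not taken from the patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits; in the patch the value is computed at runtime. */
static uint64_t max_ticks_no_overflow_u = 1000000;
static int64_t  max_ticks_no_overflow_s = 1000000;

/* Variant A: unsigned global, cast to int64 at the single point of use. */
bool
fits_with_unsigned_global(int64_t ticks)
{
    return ticks <= (int64_t) max_ticks_no_overflow_u;
}

/* Variant B: signed global, no cast needed. */
bool
fits_with_signed_global(int64_t ticks)
{
    return ticks <= max_ticks_no_overflow_s;
}
```

Since the comparison is done on int64 in both cases, the cast is only a
reinterpretation of the same 64 bits, which is why I would not expect
the compiler to emit anything different for the two.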

For ticks_per_ns_scaled, gcc 9-10 actually generates one extra mov
instruction with int64, but that's not present in more recent
versions.

With recent compiler versions the only difference is div vs. idiv and
shr vs. sar (unsigned vs. signed). idiv is slower than div on Intel, so
that is a point for keeping ticks_per_ns_scaled unsigned.

On Arm I see the same kind of difference: lsr vs. asr and udiv vs. sdiv.

https://godbolt.org/z/4r5GTbrs3

(The main gcc vs. clang difference seems to be clang's 32-bit division
optimization.)
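
To make the div/idiv and shr/sar point concrete, here is a minimal
sketch. It is not the code behind the godbolt link, and the function
names and the shift amount are made up for illustration. Compiled for
x86-64 with optimization, I would expect the unsigned version to use
div and shr and the signed one idiv and sar; for non-negative inputs
both return the same value.

```c
#include <stdint.h>

/* Hypothetical shift amount, chosen only for the example. */
#define EXAMPLE_SHIFT 4

/* Unsigned operands: unsigned divide and logical right shift. */
uint64_t
div_shift_unsigned(uint64_t a, uint64_t b)
{
    return (a / b) >> EXAMPLE_SHIFT;
}

/*
 * Signed operands: signed divide and arithmetic right shift.
 * Only meant to be called with non-negative values here, matching the
 * "we can be sure these values are not negative" assumption.
 */
int64_t
div_shift_signed(int64_t a, int64_t b)
{
    return (a / b) >> EXAMPLE_SHIFT;
}
```

The results are identical as long as the inputs are non-negative, so
the choice between the two only affects which instructions get emitted.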
