Thread: Use CLOCK_MONOTONIC_COARSE for instr_time when available

Use CLOCK_MONOTONIC_COARSE for instr_time when available

From
杨江华
Date:

Dear PostgreSQL Hackers,

This patch modifies the instr_time.h header to prefer CLOCK_MONOTONIC_COARSE when available. CLOCK_MONOTONIC_COARSE is a lower-resolution (4 ms) but faster clock, so using it reduces the overhead of frequent timestamp retrievals. This change is expected to improve performance, especially in scenarios with frequent timing operations.


Key Changes:

CLOCK_MONOTONIC_COARSE is used when available, offering faster performance with slightly reduced precision.

For macOS, CLOCK_MONOTONIC_RAW remains the preferred choice due to its higher resolution.

CLOCK_MONOTONIC is used as a fallback when neither of the above options is available.


Performance Improvements:


In testing with a workload that performs a COUNT(*) operation on a table containing 100 million rows, we observed a noticeable performance improvement after applying this patch.


SQL to Reproduce:

-- Create table and insert 10 million rows
CREATE TABLE t1(a int);
INSERT INTO t1
SELECT * FROM generate_series(1, 10000000);

-- Disable parallel query
SET max_parallel_workers_per_gather = 0;
SET max_parallel_workers = 0;

-- Run the query and check execution time
EXPLAIN ANALYZE SELECT COUNT(*) FROM t1;
SELECT COUNT(*) FROM t1;

Before the Patch:

EXPLAIN ANALYZE Execution Time: 4914 ms

Perf Results:

33.97% of time spent in [vdso] __vdso_clock_gettime

5.28% in heapgettup_pagemode

4.44% in InstrStopNode


After the Patch:

EXPLAIN ANALYZE Execution Time: 3114 ms (down from 4914 ms)

Perf Results:

12.45% of time spent in ExecInterpExpr

9.18% in [vdso] __vdso_clock_gettime

6.92% in ExecScan

Less time is spent in clock_gettime, leading to more efficient execution.


The execution time of EXPLAIN ANALYZE SELECT COUNT(*) FROM t1; after the patch is much closer to the actual time of SELECT COUNT(*) FROM t1;, which means the overhead of timing operations has been significantly reduced.


This change provides around a 20-30% reduction in execution time for the tested query.
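As a quick check of the figures above, here is a minimal sketch that recomputes the reduction from the two reported EXPLAIN ANALYZE timings. Taken at face value, 4914 ms and 3114 ms correspond to a reduction of roughly 37%, somewhat more than the 20-30% quoted. The variable names are illustrative only.

```python
# EXPLAIN ANALYZE execution times reported in this message, in milliseconds.
before_ms = 4914
after_ms = 3114

# Relative reduction in reported execution time.
reduction = (before_ms - after_ms) / before_ms
print(f"Reduction: {reduction:.1%}")
```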


Patch Details:

From 91d61b8c9a60f0e19b73e03c1a0e46d2dc64573d Mon Sep 17 00:00:00 2001
From: Jianghua Yang <yjhjstz@gmail.com>
Date: Thu, 27 Mar 2025 01:58:58 +0800
Subject: [PATCH] Use CLOCK_MONOTONIC_COARSE for instr_time when available

This patch modifies `instr_time.h` to prefer `CLOCK_MONOTONIC_COARSE`
when available. `CLOCK_MONOTONIC_COARSE` provides a lower resolution
but faster alternative for timing operations, which can reduce the
overhead of frequent timestamp retrievals.

On macOS, `CLOCK_MONOTONIC_RAW` remains the preferred choice when
available, as it provides high-resolution timestamps. Otherwise,
`CLOCK_MONOTONIC` is used as a fallback.

Author: Jianghua Yang
---
 src/include/portability/instr_time.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

I believe this change will result in better performance for many PostgreSQL users, especially those with high-frequency timing operations. I look forward to your feedback on this patch.


Best regards,

Jianghua Yang



Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

From
杨江华
Date:

Correction: the reproduction script should insert 100 million rows, not 10 million.

-- Create table and insert 100 million rows
CREATE TABLE t1(a int);
INSERT INTO t1 SELECT * FROM generate_series(1, 100000000);

-- Disable parallel query
SET max_parallel_workers_per_gather = 0;
SET max_parallel_workers = 0;

-- Run the query and check execution time
EXPLAIN ANALYZE SELECT COUNT(*) FROM t1;



Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

From
Michael Paquier
Date:
On Wed, Mar 26, 2025 at 11:14:47AM -0700, 杨江华 wrote:
> This patch modifies the instr_time.h header to prefer CLOCK_MONOTONIC_COARSE
> when available. By using CLOCK_MONOTONIC_COARSE, we can leverage a lower
> resolution(4ms) but faster alternative for timing operations, which reduces
> the overhead of frequent timestamp retrievals. This change is expected to
> provide performance improvements, especially in scenarios with frequent
> timing operations.
>
> *Key Changes:*
>
> • *CLOCK_MONOTONIC_COARSE* is used when available, offering faster
> performance with slightly reduced precision.
>
> • For macOS, *CLOCK_MONOTONIC_RAW* remains the preferred choice due to its
> higher resolution.
>
> • *CLOCK_MONOTONIC* is used as a fallback when neither of the above options
> is available.

-#if defined(__darwin__) && defined(CLOCK_MONOTONIC_RAW)
+#ifdef CLOCK_MONOTONIC_COARSE
+#define PG_INSTR_CLOCK CLOCK_MONOTONIC_COARSE
+#elif defined(__darwin__) && defined(CLOCK_MONOTONIC_RAW)

Why would we want to make this the default?  CLOCK_MONOTONIC_COARSE
could show benefits in some code paths.  Now, it can also have a
precision of a few milliseconds, and we have a bunch of code paths
that rely on clock_gettime() to be more precise than that so it could
lead to random decisions.  You could make that configurable with a
GUC, but it would mean plastering some decision-making in instr_time.h
based on such a GUC, which would likely be annoying performance-wise.

We are at the end of the v18 development cycle, so it is going to get
some time before you get any review.  Good to see that you are
tracking this patch in the commit fest:
https://commitfest.postgresql.org/patch/5669/
--
Michael
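The resolution concern raised above can be checked on a given machine. The following is a minimal sketch, assuming Linux: Python's `time` module does not export `CLOCK_MONOTONIC_COARSE`, so the numeric id 6 from the Linux headers is used here as an assumption, and `resolution_ns` is a hypothetical helper name. On many Linux kernels the coarse clock reports a resolution of one scheduler tick (1 to 10 ms depending on the kernel's HZ setting), while `CLOCK_MONOTONIC` reports 1 ns.

```python
import time
from typing import Optional

# Assumption: Linux, where CLOCK_MONOTONIC_COARSE has clock id 6.
# Python's time module does not export this constant.
CLOCK_MONOTONIC_COARSE = 6


def resolution_ns(clock_id: int) -> Optional[float]:
    """Return the resolution the OS reports for a clock, in nanoseconds.

    Returns None if the clock id is not supported on this platform.
    """
    try:
        return time.clock_getres(clock_id) * 1e9
    except OSError:
        return None


fine = resolution_ns(time.CLOCK_MONOTONIC)
coarse = resolution_ns(CLOCK_MONOTONIC_COARSE)
print(f"CLOCK_MONOTONIC        resolution: {fine} ns")
print(f"CLOCK_MONOTONIC_COARSE resolution: {coarse} ns")
```

Any code path that compares intervals shorter than the coarse resolution would see them collapse to zero or to a full tick.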


Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

From
Jianghua Yang
Date:
That makes sense, but we can identify the code paths that need `CLOCK_MONOTONIC` and keep using it there.

I have now added a configure option, `--with-clock-monotonic-coarse`.


Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

From
Tom Lane
Date:
杨江华 <yjhjstz@gmail.com> writes:
> This patch modifies the instr_time.h header to prefer CLOCK_MONOTONIC_COARSE
> when available.

As far as I know, our usage of instr_time really needs the highest
resolution available, because we are usually trying to measure pretty
short intervals.  You say that this patch reduces execution time,
and I imagine that's true ... but I wonder if it doesn't do so at
the cost of totally destroying the reliability of the output numbers.

            regards, tom lane
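To illustrate the reliability concern above, here is a minimal simulation, assuming a coarse clock that only advances in 4 ms ticks (the resolution quoted in the original proposal). `coarse` and `measure` are hypothetical helpers, not PostgreSQL functions. A short interval is reported either as zero or as a whole tick, depending only on whether it happens to straddle a tick boundary.

```python
TICK_NS = 4_000_000  # assumed coarse-clock tick: 4 ms, in nanoseconds


def coarse(t_ns: int) -> int:
    """Round a nanosecond timestamp down to the most recent coarse tick."""
    return t_ns - (t_ns % TICK_NS)


def measure(start_ns: int, duration_ns: int) -> int:
    """Length of an interval as a coarse clock would report it."""
    return coarse(start_ns + duration_ns) - coarse(start_ns)


# A 39 microsecond interval that falls entirely inside one tick.
short = measure(start_ns=1_000_000, duration_ns=39_000)

# The same 39 microsecond interval, starting just before a tick boundary.
straddle = measure(start_ns=TICK_NS - 10_000, duration_ns=39_000)

print(f"inside one tick: {short} ns")
print(f"across a tick:   {straddle} ns")
```

This matches the 0.000 ms values that show up for short intervals in the patched EXPLAIN ANALYZE output posted later in the thread.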



Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

From
wenhui qiu
Date:
Hi,

> As far as I know, our usage of instr_time really needs the highest
> resolution available, because we are usually trying to measure pretty
> short intervals.  You say that this patch reduces execution time,
> and I imagine that's true ... but I wonder if it doesn't do so at
> the cost of totally destroying the reliability of the output numbers. 
I strongly agree. It seems like focusing on the small stuff while missing the big picture.

Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

From
Andres Freund
Date:
Hi,

On 2025-03-26 23:09:42 -0400, Tom Lane wrote:
> 杨江华 <yjhjstz@gmail.com> writes:
> > This patch modifies the instr_time.h header to prefer CLOCK_MONOTONIC_COARSE
> > when available.
> 
> As far as I know, our usage of instr_time really needs the highest
> resolution available, because we are usually trying to measure pretty
> short intervals.  You say that this patch reduces execution time,
> and I imagine that's true ... but I wonder if it doesn't do so at
> the cost of totally destroying the reliability of the output numbers.

The reason, on x86, the timestamp querying has a somewhat high overhead is
that the "accurate" "read the tsc" instruction serves as a barrier for
out-of-order execution. With modern highly out-of-order execution that means
we'll wait for all scheduled instructions to finish before determining the
current time, multiple times for each tuple.  That of course slows things down
substantially.

There's a patch to use the version of rdtsc that does *not* have barrier
semantics:
https://postgr.es/m/CAP53PkzO2KpscD-tgFW_V-4WS%2BvkniH4-B00eM-e0bsBF-xUxg%40mail.gmail.com

Greetings,

Andres Freund



Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

From
Jianghua Yang
Date:

I agree, so this patch only affects explain analyze.

1. This change to CLOCK_MONOTONIC_COARSE only affects EXPLAIN ANALYZE and does not impact other modules.

The patch introduces optional support for CLOCK_MONOTONIC_COARSE specifically within the INSTR_TIME instrumentation framework. The modifications are guarded by the compile-time macro USE_CLOCK_MONOTONIC_COARSE and are only used when gathering timing data for performance instrumentation. Given that INSTR_TIME is mainly used in EXPLAIN ANALYZE, and there are no changes to runtime or planner logic, only diagnostic output is affected; core execution paths and other modules are left untouched.


2. With this modification, EXPLAIN ANALYZE produces timing results that are closer to real-world wall-clock time, making performance debugging more accurate.


By using CLOCK_MONOTONIC_COARSE, which has lower overhead compared to CLOCK_MONOTONIC, the patch improves the efficiency of timing collection in EXPLAIN ANALYZE. While it may slightly reduce precision, the resulting measurements more closely reflect actual elapsed time observed by users, especially in performance-sensitive environments. This makes EXPLAIN ANALYZE outputs more reliable and helpful for developers diagnosing query performance bottlenecks.

--- original version

explain analyze select count(*) from t1;
                                        Thu 27 Mar 2025 01:31:20 AM CST (every 1s)

                                                        QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1852876.63..1852876.64 rows=1 width=8) (actual time=4914.037..4914.038 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=0.00..1570796.90 rows=112831890 width=0) (actual time=0.039..3072.303 rows=100000000 loops=1)
 Planning Time: 0.132 ms
 Execution Time: 4914.072 ms
(4 rows)

Time: 4914.676 ms (00:04.915)


--- with patch applied

postgres=# explain analyze select count(*) from t1;
                                                        QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1692478.40..1692478.41 rows=1 width=8) (actual time=3116.164..3116.164 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=0.00..1442478.32 rows=100000032 width=0) (actual time=0.000..2416.127 rows=100000000 loops=1)
 Planning Time: 0.000 ms
 Execution Time: 3116.164 ms
(4 rows)

Time: 3114.059 ms (00:03.114)
postgres=# select count(*) from t1;
   count  
-----------
 100000000
(1 row)

Time: 2086.130 ms (00:02.086)
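The three timings above can be turned into an approximate per-row instrumentation cost. This is a rough sketch that assumes the plain `SELECT COUNT(*)` run (2086.130 ms) is a fair baseline for both EXPLAIN ANALYZE runs, which the thread does not confirm. `overhead_ns_per_row` is a hypothetical helper.

```python
# Timings reported in this message, in milliseconds.
explain_unpatched_ms = 4914.072
explain_patched_ms = 3116.164
plain_select_ms = 2086.130  # assumed to be a valid baseline for both runs
rows = 100_000_000


def overhead_ns_per_row(explain_ms: float, baseline_ms: float, n_rows: int) -> float:
    """Extra time EXPLAIN ANALYZE adds per row, in nanoseconds."""
    return (explain_ms - baseline_ms) * 1e6 / n_rows


unpatched = overhead_ns_per_row(explain_unpatched_ms, plain_select_ms, rows)
patched = overhead_ns_per_row(explain_patched_ms, plain_select_ms, rows)
print(f"unpatched: {unpatched:.1f} ns/row, patched: {patched:.1f} ns/row")
```

Under that assumption the instrumentation adds about 28 ns per row before the patch and about 10 ns per row after it.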

