Thread: Use CLOCK_MONOTONIC_COARSE for instr_time when available
Dear PostgreSQL Hackers,
This patch modifies the instr_time.h header to prefer CLOCK_MONOTONIC_COARSE when available. CLOCK_MONOTONIC_COARSE is a lower-resolution (about 4 ms) but faster clock, which reduces the overhead of frequent timestamp retrievals. This change is expected to improve performance, especially in scenarios with frequent timing operations.
Key Changes:
• CLOCK_MONOTONIC_COARSE is used when available, offering faster performance with slightly reduced precision.
• For macOS, CLOCK_MONOTONIC_RAW remains the preferred choice due to its higher resolution.
• CLOCK_MONOTONIC is used as a fallback when neither of the above options is available.
Performance Improvements:
In testing with a workload that performs a COUNT(*) operation on a table containing 100 million rows, we observed a noticeable performance improvement after applying this patch.
SQL to Reproduce:
-- Create table and insert 10 million rows
CREATE TABLE t1(a int);
INSERT INTO t1
SELECT * FROM generate_series(1, 10000000);
-- Disable parallel query
SET max_parallel_workers_per_gather = 0;
SET max_parallel_workers = 0;
-- Run the query and check execution time
EXPLAIN ANALYZE SELECT COUNT(*) FROM t1;
SELECT COUNT(*) FROM t1;
Before the Patch:
• EXPLAIN ANALYZE Execution Time: 4914 ms
• Perf Results:
• 33.97% of time spent in [vdso] __vdso_clock_gettime
• 5.28% in heapgettup_pagemode
• 4.44% in InstrStopNode
After the Patch:
• EXPLAIN ANALYZE Execution Time: 3114 ms (down from 4914 ms)
• Perf Results:
• 12.45% of time spent in ExecInterpExpr
• 9.18% in [vdso] __vdso_clock_gettime
• 6.92% in ExecScan
• Less time is spent in clock_gettime, so the instrumented query runs more efficiently.
The execution time of EXPLAIN ANALYZE SELECT COUNT(*) FROM t1; after the patch is much closer to the actual time of SELECT COUNT(*) FROM t1;, which means the overhead of timing operations has been significantly reduced.
For the tested query, this corresponds to roughly a 37% reduction in EXPLAIN ANALYZE execution time (4914 ms to 3114 ms).
Patch Details:
From 91d61b8c9a60f0e19b73e03c1a0e46d2dc64573d Mon Sep 17 00:00:00 2001
From: Jianghua Yang <yjhjstz@gmail.com>
Date: Thu, 27 Mar 2025 01:58:58 +0800
Subject: [PATCH] Use CLOCK_MONOTONIC_COARSE for instr_time when available
This patch modifies `instr_time.h` to prefer `CLOCK_MONOTONIC_COARSE`
when available. `CLOCK_MONOTONIC_COARSE` provides a lower resolution
but faster alternative for timing operations, which can reduce the
overhead of frequent timestamp retrievals.
On macOS, `CLOCK_MONOTONIC_RAW` remains the preferred choice when
available, as it provides high-resolution timestamps. Otherwise,
`CLOCK_MONOTONIC` is used as a fallback.
Author: Jianghua Yang
---
 src/include/portability/instr_time.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
I believe this change will result in better performance for many PostgreSQL users, especially those with high-frequency timing operations. I look forward to your feedback on this patch.
Best regards,
Jianghua Yang
Correction: the reproduction script should insert 100 million rows, not 10 million.
-- Create table and insert 100 million rows
CREATE TABLE t1(a int);
INSERT INTO t1 SELECT * FROM generate_series(1, 100000000);
-- Disable parallel query
SET max_parallel_workers_per_gather = 0;
SET max_parallel_workers = 0;
-- Run the query and check execution time
EXPLAIN ANALYZE SELECT COUNT(*) FROM t1;
On Wed, Mar 26, 2025 at 11:14:47AM -0700, 杨江华 wrote:
> This patch modifies the instr_time.h header to prefer CLOCK_MONOTONIC_COARSE
> when available. By using CLOCK_MONOTONIC_COARSE, we can leverage a lower
> resolution(4ms) but faster alternative for timing operations, which reduces
> the overhead of frequent timestamp retrievals. This change is expected to
> provide performance improvements, especially in scenarios with frequent
> timing operations.
>
> *Key Changes:*
>
> • *CLOCK_MONOTONIC_COARSE* is used when available, offering faster
> performance with slightly reduced precision.
>
> • For macOS, *CLOCK_MONOTONIC_RAW* remains the preferred choice due to its
> higher resolution.
>
> • *CLOCK_MONOTONIC* is used as a fallback when neither of the above options
> is available.
-#if defined(__darwin__) && defined(CLOCK_MONOTONIC_RAW)
+#ifdef CLOCK_MONOTONIC_COARSE
+#define PG_INSTR_CLOCK CLOCK_MONOTONIC_COARSE
+#elif defined(__darwin__) && defined(CLOCK_MONOTONIC_RAW)
Why would we want to make this the default? CLOCK_MONOTONIC_COARSE
could show benefits in some code paths. Now, it can also have a
precision of a few milliseconds, and we have a bunch of code paths
that rely on clock_gettime() to be more precise than that so it could
lead to random decisions. You could make that configurable with a
GUC, but it would mean plastering some decision-making in instr_time.h
based on such a GUC, which would likely be annoying performance-wise.
We are at the end of the v18 development cycle, so it is going to take
some time before you get any review. Good to see that you are
tracking this patch in the commit fest:
https://commitfest.postgresql.org/patch/5669/
--
Michael
杨江华 <yjhjstz@gmail.com> writes:
> This patch modifies the instr_time.h header to prefer CLOCK_MONOTONIC_COARSE
> when available.
As far as I know, our usage of instr_time really needs the highest
resolution available, because we are usually trying to measure pretty
short intervals. You say that this patch reduces execution time,
and I imagine that's true ... but I wonder if it doesn't do so at
the cost of totally destroying the reliability of the output numbers.
regards, tom lane
> As far as I know, our usage of instr_time really needs the highest
> resolution available, because we are usually trying to measure pretty
> short intervals. You say that this patch reduces execution time,
> and I imagine that's true ... but I wonder if it doesn't do so at
> the cost of totally destroying the reliability of the output numbers.
I agree, so this patch only affects explain analyze.
1. This change to CLOCK_MONOTONIC_COARSE only affects EXPLAIN ANALYZE and does not impact other modules.
The patch introduces optional support for CLOCK_MONOTONIC_COARSE specifically within the INSTR_TIME instrumentation framework. The modifications are guarded by the compile-time macro USE_CLOCK_MONOTONIC_COARSE and are only used when gathering timing data for performance instrumentation. Given that INSTR_TIME is mainly used in EXPLAIN ANALYZE, and there are no changes to runtime or planner logic, only diagnostic output is affected, leaving core execution paths and other modules untouched.
2. With this modification, EXPLAIN ANALYZE produces timing results that are closer to real-world wall-clock time, making performance debugging more accurate.
By using CLOCK_MONOTONIC_COARSE, which has lower overhead compared to CLOCK_MONOTONIC, the patch improves the efficiency of timing collection in EXPLAIN ANALYZE. While it may slightly reduce precision, the resulting measurements more closely reflect actual elapsed time observed by users, especially in performance-sensitive environments. This makes EXPLAIN ANALYZE outputs more reliable and helpful for developers diagnosing query performance bottlenecks.
--- original version
explain analyze select count(*) from t1;
Thu 27 Mar 2025 01:31:20 AM CST (every 1s)
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1852876.63..1852876.64 rows=1 width=8) (actual time=4914.037..4914.038 rows=1 loops=1)
-> Seq Scan on t1 (cost=0.00..1570796.90 rows=112831890 width=0) (actual time=0.039..3072.303 rows=100000000 loops=1)
Planning Time: 0.132 ms
Execution Time: 4914.072 ms
(4 rows)
Time: 4914.676 ms (00:04.915)
--- with patch applied
postgres=# explain analyze select count(*) from t1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1692478.40..1692478.41 rows=1 width=8) (actual time=3116.164..3116.164 rows=1 loops=1)
-> Seq Scan on t1 (cost=0.00..1442478.32 rows=100000032 width=0) (actual time=0.000..2416.127 rows=100000000 loops=1)
Planning Time: 0.000 ms
Execution Time: 3116.164 ms
(4 rows)
Time: 3114.059 ms (00:03.114)
postgres=# select count(*) from t1;
count
-----------
100000000
(1 row)
Time: 2086.130 ms (00:02.086)
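The instrumentation overhead implied by these timings can be worked out with a short C sketch. The constants are copied from the psql "Time:" lines above; the function and macro names are hypothetical, and the calculation assumes that the plain SELECT COUNT(*) timing applies equally to the original and patched builds, since that query is not instrumented.

```c
/* Timings copied from the psql "Time:" lines above, in milliseconds. */
#define SKETCH_ROWS            100000000.0
#define SKETCH_PLAIN_MS        2086.130    /* SELECT COUNT(*) */
#define SKETCH_ANALYZE_OLD_MS  4914.676    /* EXPLAIN ANALYZE, original version */
#define SKETCH_ANALYZE_NEW_MS  3114.059    /* EXPLAIN ANALYZE, with patch applied */

/* Extra time per row attributable to instrumentation, in nanoseconds. */
static double
sketch_overhead_ns_per_row(double analyze_ms, double plain_ms, double rows)
{
    return (analyze_ms - plain_ms) * 1e6 / rows;
}

/* Instrumentation overhead relative to the uninstrumented query, in percent. */
static double
sketch_overhead_percent(double analyze_ms, double plain_ms)
{
    return (analyze_ms - plain_ms) / plain_ms * 100.0;
}
```

With these numbers the overhead drops from about 28.3 ns to about 10.3 ns per row, or from roughly 136% to roughly 49% of the plain query time. The patched plan also reports a node startup time and a Planning Time of 0.000 ms, which is what a clock that only advances once per tick would produce for very short intervals.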
Hi,
On 2025-03-26 23:09:42 -0400, Tom Lane wrote:
> 杨江华 <yjhjstz@gmail.com> writes:
> > This patch modifies the instr_time.h header to prefer CLOCK_MONOTONIC_COARSE
> > when available.
>
> As far as I know, our usage of instr_time really needs the highest
> resolution available, because we are usually trying to measure pretty
> short intervals. You say that this patch reduces execution time,
> and I imagine that's true ... but I wonder if it doesn't do so at
> the cost of totally destroying the reliability of the output numbers.
The reason, on x86, the timestamp querying has a somewhat high overhead is
that the "accurate" "read the tsc" instruction serves as a barrier for
out-of-order execution. With modern highly out-of-order execution that means
we'll wait for all scheduled instructions to finish before determining the
current time, multiple times for each tuple. That of course slows things down
substantially.
There's a patch to use the version of rdtsc that does *not* have barrier
semantics:
https://postgr.es/m/CAP53PkzO2KpscD-tgFW_V-4WS%2BvkniH4-B00eM-e0bsBF-xUxg%40mail.gmail.com
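The difference between the two kinds of TSC read can be illustrated with a stand-alone C sketch. It is not taken from the linked patch, and all sketch_ names are hypothetical. It assumes that the ordered read in question behaves like RDTSCP, which waits for earlier instructions to execute, while plain RDTSC does not. On non-x86 builds the sketch degrades to stubs that return 0.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <x86intrin.h>

/* RDTSCP support is advertised in CPUID leaf 0x80000001, EDX bit 27. */
static int
sketch_have_rdtscp(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx >> 27) & 1;
}

/* RDTSC: may execute out of order relative to surrounding instructions. */
static uint64_t
sketch_tsc_unordered(void)
{
    return __rdtsc();
}

/* RDTSCP: waits for earlier instructions to execute before reading the counter. */
static uint64_t
sketch_tsc_ordered(void)
{
    unsigned int aux;

    return __rdtscp(&aux);
}
#else
/* Not x86: there is no TSC, so the sketch degrades to stubs. */
static int sketch_have_rdtscp(void) { return 0; }
static uint64_t sketch_tsc_unordered(void) { return 0; }
static uint64_t sketch_tsc_ordered(void) { return 0; }
#endif

/* Average cost, in TSC ticks, of one counter read over `iterations` reads. */
static double
sketch_ticks_per_read(int ordered, int iterations)
{
    volatile uint64_t sink = 0;
    uint64_t begin = sketch_tsc_unordered();

    for (int i = 0; i < iterations; i++)
        sink += ordered ? sketch_tsc_ordered() : sketch_tsc_unordered();
    return (double) (sketch_tsc_unordered() - begin) / iterations;
}
```

Comparing sketch_ticks_per_read(1, n) with sketch_ticks_per_read(0, n) gives a rough idea of the extra cost of the ordered read on a given machine. Callers should check sketch_have_rdtscp() first. A tight loop like this understates the effect described above, because there is little other in-flight work for the ordered read to wait on.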
Greetings,
Andres Freund