Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing
Date
Msg-id Z7xvM8rxe9dvpc8Z@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi,

On Mon, Feb 24, 2025 at 06:15:53AM -0500, Andres Freund wrote:
> Hi,
> 
> On 2025-02-24 10:54:54 +0000, Bertrand Drouvot wrote:
> > a051e71e28a added the "timing tracking" in the WAL code path with "only"
> > track_io_timing enabled (and track_wal_io_timing still false). Here, in this thread,
> > we remove unnecessary INSTR_TIME_SET_CURRENT()/INSTR_TIME_ACCUM_DIFF() calls when
> > both track_io_timing and track_wal_io_timing were enabled.
> >
> > So I think that your main concern comes from the fact that a051e71e28a added the
> > "timing tracking" in the WAL code path with "only" track_io_timing enabled.
> 
> Correct.
> 
> 
> > That's a concern we also had and discussed in [1]. The question was: should
> > we track the WAL timing stats in pg_stat_io only when track_wal_io_timing is
> > enabled or relying only on track_io_timing would be enough?
> >
> > We ran several benchmarks ([2], [3] and [4]) and based on the results we reached
> > the conclusion that to rely only on track_io_timing to display the WAL timing
> > stats in pg_stat_io was "good" enough.
> 
> Sorry to be blunt, but those benchmarks miss the mark. The benchmark
> 
> - emits WAL in a transactional fashion, with fairly small transactions, where
>   the disk is fairly slow. It's absolutely guaranteed that the bottleneck is
>   just the WAL flush disk time.  You're doing ~1k-2k TPS - the timing overhead
>   would have to be *ginormous* to show up.
> 
> - emits WAL at a low concurrency thus lock contention not really an issue.
> 
> - likely is executed so on a system with very cheap timing functions - but
>   often production workloads unfortunately don't. While it's not quite as
>   common as it used to be, virtualized systems often have considerably slower
>   clocks.
> 
> 
> On this system it'll likely be fine overhead-wise:
> 
> $ echo tsc | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
> tsc
> 
> $ src/bin/pg_test_timing/pg_test_timing
> Testing timing overhead for 3 seconds.
> Per loop time including overhead: 19.17 ns
> Histogram of timing durations:
>   < us   % of total      count
>      1     98.08521  153484414
>      2      1.91421    2995376
>      4      0.00040        619
>      8      0.00014        225
>     16      0.00002         35
>     32      0.00001          9
>     64      0.00001         10
>    128      0.00000          3
>    256      0.00000          2
>    512      0.00000          1
> 
> 
> On this system however, I'd not want to bet on it:
> 
> $ echo hpet | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
> $ src/bin/pg_test_timing/pg_test_timing
> Testing timing overhead for 3 seconds.
> Per loop time including overhead: 1228.91 ns
> Histogram of timing durations:
>   < us   % of total      count
>      1      0.15574       3802
>      2     78.51065    1916598
>      4     21.26778     519188
>      8      0.02069        505
>     16      0.03957        966
>     32      0.00356         87
>     64      0.00193         47
>    128      0.00008          2
> 
> 

I agree that we've to put everything in the picture (system with or without
cheap timing functions, lock contention and WAL flush disk time) and that we
can certainly find configuration/workload that would get benefits from a
dedicated track_wal_io_timing GUC.

PFA a patch to re-introduce the track_wal_io_timing GUC and to ensure that the
WAL write and fsync timing activities are tracked in pg_stat_io (and
pg_stat_get_backend_io()) only if both track_io_timing and track_wal_io_timing
are enabled.

Note that to change the a051e71e28a behavior, the attached patch adds an extra
"track_io_guc" parameter to pgstat_count_io_op_time() (like pgstat_prepare_io_time
already had in a051e71e28a).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Rahila Syed
Date:
Subject: Re: Enhancing Memory Context Statistics Reporting
Next
From: wenhui qiu
Date:
Subject: Re: Anti join confusion