Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing
Msg-id xiwwevyjw336dgpqpqwvcuzh7riurucjvdaywsysbmfq5qfmzv@w2jfhutsvs66
In response to Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing
List pgsql-hackers
Hi,

On 2025-02-24 10:54:54 +0000, Bertrand Drouvot wrote:
> a051e71e28a added the "timing tracking" in the WAL code path with "only"
> track_io_timing enabled (and track_wal_io_timing still false). Here, in this thread,
> we remove unnecessary INSTR_TIME_SET_CURRENT()/INSTR_TIME_ACCUM_DIFF() calls when
> both track_io_timing and track_wal_io_timing are enabled.
>
> So I think that your main concern comes from the fact that a051e71e28a added the
> "timing tracking" in the WAL code path with "only" track_io_timing enabled.

Correct.


> That's a concern we also had and discussed in [1]. The question was: should
> we track the WAL timing stats in pg_stat_io only when track_wal_io_timing is
> enabled, or would relying only on track_io_timing be enough?
>
> We ran several benchmarks ([2], [3] and [4]) and based on the results we reached
> the conclusion that relying only on track_io_timing to display the WAL timing
> stats in pg_stat_io was "good" enough.

Sorry to be blunt, but those benchmarks miss the mark. The benchmark

- emits WAL in a transactional fashion, with fairly small transactions, where
  the disk is fairly slow. It's absolutely guaranteed that the bottleneck is
  just the WAL flush disk time.  You're doing ~1k-2k TPS - the timing overhead
  would have to be *ginormous* to show up.

- emits WAL at low concurrency, so lock contention is not really an issue.

- is likely executed on a system with very cheap timing functions - but
  production workloads often, unfortunately, are not. While it's not quite as
  common as it used to be, virtualized systems often have considerably slower
  clocks.


On this system it'll likely be fine overhead-wise:

$ echo tsc | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

$ src/bin/pg_test_timing/pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 19.17 ns
Histogram of timing durations:
  < us   % of total      count
     1     98.08521  153484414
     2      1.91421    2995376
     4      0.00040        619
     8      0.00014        225
    16      0.00002         35
    32      0.00001          9
    64      0.00001         10
   128      0.00000          3
   256      0.00000          2
   512      0.00000          1


On this system, however, I'd not want to bet on it:

$ echo hpet | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
$ src/bin/pg_test_timing/pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 1228.91 ns
Histogram of timing durations:
  < us   % of total      count
     1      0.15574       3802
     2     78.51065    1916598
     4     21.26778     519188
     8      0.02069        505
    16      0.03957        966
    32      0.00356         87
    64      0.00193         47
   128      0.00008          2


Greetings,

Andres Freund


