Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing - Mailing list pgsql-hackers
From: Bertrand Drouvot
Subject: Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing
Date:
Msg-id: Z7xvM8rxe9dvpc8Z@ip-10-97-1-34.eu-west-3.compute.internal
In response to: Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi,

On Mon, Feb 24, 2025 at 06:15:53AM -0500, Andres Freund wrote:
> Hi,
>
> On 2025-02-24 10:54:54 +0000, Bertrand Drouvot wrote:
> > a051e71e28a added the "timing tracking" in the WAL code path with "only"
> > track_io_timing enabled (and track_wal_io_timing still false). Here, in this thread,
> > we remove unnecessary INSTR_TIME_SET_CURRENT()/INSTR_TIME_ACCUM_DIFF() calls when
> > both track_io_timing and track_wal_io_timing were enabled.
> >
> > So I think that your main concern comes from the fact that a051e71e28a added the
> > "timing tracking" in the WAL code path with "only" track_io_timing enabled.
>
> Correct.
>
> > That's a concern we also had and discussed in [1]. The question was: should
> > we track the WAL timing stats in pg_stat_io only when track_wal_io_timing is
> > enabled or relying only on track_io_timing would be enough?
> >
> > We ran several benchmarks ([2], [3] and [4]) and based on the results we reached
> > the conclusion that to rely only on track_io_timing to display the WAL timing
> > stats in pg_stat_io was "good" enough.
>
> Sorry to be blunt, but those benchmarks miss the mark. The benchmark
>
> - emits WAL in a transactional fashion, with fairly small transactions, where
>   the disk is fairly slow. It's absolutely guaranteed that the bottleneck is
>   just the WAL flush disk time. You're doing ~1k-2k TPS - the timing overhead
>   would have to be *ginormous* to show up.
>
> - emits WAL at a low concurrency, thus lock contention is not really an issue.
>
> - likely is executed on a system with very cheap timing functions - but
>   often production workloads unfortunately don't. While it's not quite as
>   common as it used to be, virtualized systems often have considerably slower
>   clocks.
>
> On this system it'll likely be fine overhead-wise:
>
> $ echo tsc | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
> tsc
>
> $ src/bin/pg_test_timing/pg_test_timing
> Testing timing overhead for 3 seconds.
> Per loop time including overhead: 19.17 ns
> Histogram of timing durations:
>    < us   % of total      count
>       1     98.08521  153484414
>       2      1.91421    2995376
>       4      0.00040        619
>       8      0.00014        225
>      16      0.00002         35
>      32      0.00001          9
>      64      0.00001         10
>     128      0.00000          3
>     256      0.00000          2
>     512      0.00000          1
>
> On this system however, I'd not want to bet on it:
>
> $ echo hpet | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
> $ src/bin/pg_test_timing/pg_test_timing
> Testing timing overhead for 3 seconds.
> Per loop time including overhead: 1228.91 ns
> Histogram of timing durations:
>    < us   % of total      count
>       1      0.15574       3802
>       2     78.51065    1916598
>       4     21.26778     519188
>       8      0.02069        505
>      16      0.03957        966
>      32      0.00356         87
>      64      0.00193         47
>     128      0.00008          2

I agree that we have to take the whole picture into account (systems with or
without cheap timing functions, lock contention, and WAL flush disk time), and
that we can certainly find configurations/workloads that would benefit from a
dedicated track_wal_io_timing GUC.

PFA a patch to re-introduce the track_wal_io_timing GUC and to ensure that the
WAL write and fsync timing activities are tracked in pg_stat_io (and
pg_stat_get_backend_io()) only if both track_io_timing and track_wal_io_timing
are enabled.

Note that, to change the a051e71e28a behavior, the attached patch adds an extra
"track_io_guc" parameter to pgstat_count_io_op_time() (like the one
pgstat_prepare_io_time() already had as of a051e71e28a).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
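To make the proposed gating concrete, here is a minimal, self-contained C sketch
of the behavior described above: the clock is read only when both
track_io_timing and track_wal_io_timing are enabled, so leaving either GUC off
avoids the timing overhead entirely on systems with an expensive clock source.
The helper names and types below are hypothetical illustrations, not the actual
pgstat_prepare_io_time()/pgstat_count_io_op_time() code touched by the attached
patch.

/*
 * Minimal sketch only -- hypothetical helpers, not the patched PostgreSQL
 * functions.  It illustrates the gating: timing is captured only when both
 * track_io_timing and track_wal_io_timing are enabled.
 */
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* stand-ins for the two GUCs discussed in this thread */
static bool track_io_timing = true;
static bool track_wal_io_timing = false;

typedef struct
{
    bool            enabled;
    struct timespec start;
} wal_io_timer;

/* Read the clock only if both GUCs are on. */
static wal_io_timer
wal_io_start_timing(void)
{
    wal_io_timer t = {.enabled = track_io_timing && track_wal_io_timing};

    if (t.enabled)
        clock_gettime(CLOCK_MONOTONIC, &t.start);
    return t;
}

/* Accumulate the elapsed time (in ns) only if the timer was started. */
static void
wal_io_accumulate_timing(const wal_io_timer *t, uint64_t *total_ns)
{
    if (t->enabled)
    {
        struct timespec end;
        int64_t         elapsed;

        clock_gettime(CLOCK_MONOTONIC, &end);
        elapsed = (int64_t) (end.tv_sec - t->start.tv_sec) * 1000000000 +
                  (end.tv_nsec - t->start.tv_nsec);
        *total_ns += (uint64_t) elapsed;
    }
}

A WAL write path would then call wal_io_start_timing() before the write/fsync
and wal_io_accumulate_timing() afterwards; with either GUC disabled, both calls
reduce to a couple of branch checks and no clock read at all.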
Attachment