Re: pg_stat_io not tracking smgrwriteback() is confusing - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: pg_stat_io not tracking smgrwriteback() is confusing |
Date | |
Msg-id | 20230424211432.u2z6v6mceemznd6t@awork3.anarazel.de Whole thread Raw |
In response to | Re: pg_stat_io not tracking smgrwriteback() is confusing (Melanie Plageman <melanieplageman@gmail.com>) |
Responses |
Re: pg_stat_io not tracking smgrwriteback() is confusing
|
List | pgsql-hackers |
Hi, On 2023-04-24 16:39:36 -0400, Melanie Plageman wrote: > On Wed, Apr 19, 2023 at 10:23:26AM -0700, Andres Freund wrote: > > Hi, > > > > I noticed that the numbers in pg_stat_io dont't quite add up to what I > > expected in write heavy workloads. Particularly for checkpointer, the numbers > > for "write" in log_checkpoints output are larger than what is visible in > > pg_stat_io. > > > > That partially is because log_checkpoints' "write" covers way too many things, > > but there's an issue with pg_stat_io as well: > > > > Checkpoints, and some other sources of writes, will often end up doing a lot > > of smgrwriteback() calls - which pg_stat_io doesn't track. Nor do any > > pre-existing forms of IO statistics. > > > > It seems pretty clear that we should track writeback as well. I wonder if it's > > worth doing so for 16? It'd give a more complete picture that way. The > > counter-argument I see is that we didn't track the time for it in existing > > stats either, and that nobody complained - but I suspect that's mostly because > > nobody knew to look. > > Not complaining about making pg_stat_io more accurate, but what exactly > would we be tracking for smgrwriteback()? I assume you are talking about > IO timing. AFAICT, on Linux, it does sync_file_range() with > SYNC_FILE_RANGE_WRITE, which is asynchronous. Wouldn't we just be > tracking the system call overhead time? It starts blocking once "enough" IO is in flight. For things like an immediate checkpoint, that can happen fairly quickly, unless you have a very fast IO subsystem. So often it'll not matter whether we track smgrwriteback(), but when it matter, it can matter a lot. As an example, I inited' a pgbench w/ scale 1000, on a decent but not great NVMe SSD. Created dirty data with: c=96;/srv/dev/build/m-opt/src/bin/pgbench/pgbench --random-seed=0 -n -M prepared -c$c -j$c -T30 -P1 and then measured the checkpoint: perf trace -s -p $pid_of_checkpointer psql -x -c "SELECT pg_stat_reset_shared('io')" -c "checkpoint" postgres (367444), 1891280 events, 100.0% syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ sync_file_range 359176 0 4560.670 0.002 0.013 238.955 10.36% pwrite64 582964 0 2874.634 0.003 0.005 0.156 0.06% fsync 242 0 251.631 0.001 1.040 42.166 18.81% openat 317 65 2.171 0.002 0.007 0.068 5.69% rename 69 0 1.973 0.012 0.029 0.084 5.81% fdatasync 1 0 1.543 1.543 1.543 1.543 0.00% unlink 150 137 1.278 0.002 0.009 0.062 10.69% close 250 0 0.694 0.001 0.003 0.007 3.14% newfstatat 140 68 0.667 0.001 0.005 0.022 7.26% write 5 0 0.067 0.005 0.013 0.025 24.55% lseek 14 0 0.050 0.001 0.004 0.018 33.87% getdents64 8 0 0.047 0.002 0.006 0.022 39.51% kill 3 0 0.029 0.008 0.010 0.011 10.18% epoll_wait 2 1 0.006 0.000 0.003 0.006 100.00% read 1 0 0.004 0.004 0.004 0.004 0.00% Log output: 2023-04-24 14:11:59.234 PDT [367444][checkpointer][:0][] LOG: checkpoint starting: immediate force wait 2023-04-24 14:12:09.236 PDT [367444][checkpointer][:0][] LOG: checkpoint complete: wrote 595974 buffers (28.4%); 0 WAL file(s)added, 0 removed, 68 recycled; write=9.740 s, sync=0.057 s, total=10.002 s; sync files=27, longest=0.043 s, average=0.003s; distance=4467386 kB, estimate=4467386 kB; lsn=6/E5D33F98, redo lsn=6/E5D33F28 # SELECT writes, write_time, fsyncs, fsync_time FROM pg_stat_io WHERE backend_type = 'checkpointer'; ┌────────┬────────────────────┬────────┬────────────┐ │ writes │ write_time │ fsyncs │ fsync_time │ ├────────┼────────────────────┼────────┼────────────┤ │ 595914 │ 4002.1730000000002 │ 24 │ 46.359 │ └────────┴────────────────────┴────────┴────────────┘ Greetings, Andres Freund
pgsql-hackers by date: