Thread: Add WAL read stats to pg_stat_wal
Hi, While working on [1], I was in need to know WAL read stats (number of times and amount of WAL data read from disk, time taken to read) to measure the benefit. I had to write a developer patch to capture WAL read stats as pg_stat_wal currently emits WAL write stats. With recent works on pg_stat_io which emit data read IO stats too, I think it's better to not miss WAL read stats. It might help others who keep an eye on IOPS of the production servers for various reasons. The WAL read stats I'm thinking useful are wal_read_bytes - total amount of WAL read, wal_read - total number of times WAL is read from disk, wal_read_time - total amount of time spent reading WAL (tracked only when an existing GUC track_wal_io_timing is on). I came up with a patch and attached it here. The WAL readers that add to WAL read stats are WAL senders, startup process and other backends using xlogreader for logical replication or pg_walinspect SQL functions. They all report stats to shared memory by calling pgstat_report_wal() in appropriate locations. In standby mode, calling pgstat_report_wa() for every record seems to be costly. Therefore, I chose to report stats every 1024 WAL records (a random number, suggestions for a better a way are welcome here). Note that the patch needs a bit more work, per [2]. With the patch, the WAL senders (processes exiting after checkpointer) will generate stats and we need to either let all or only one WAL sender to write stats to disk. Allowing one WAL sender to write might be tricky. Allowing all WAL senders to write might make too many writes to the stats file. And, we need a lock to let only one process write. I can't think of a best way here at the moment. Thoughts? [1] https://www.postgresql.org/message-id/CALj2ACXKKK=wbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54+Na=Q@mail.gmail.com [2] /* * Write out stats after shutdown. This needs to be called by exactly one * process during a normal shutdown, and since checkpointer is shut down * very late... * * Walsenders are shut down after the checkpointer, but currently don't * report stats. If that changes, we need a more complicated solution. */ before_shmem_exit(pgstat_before_server_shutdown, 0); -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On 2023-02-16 23:39:00 +0530, Bharath Rupireddy wrote: > While working on [1], I was in need to know WAL read stats (number of > times and amount of WAL data read from disk, time taken to read) to > measure the benefit. I had to write a developer patch to capture WAL > read stats as pg_stat_wal currently emits WAL write stats. With recent > works on pg_stat_io which emit data read IO stats too, I think it's > better to not miss WAL read stats. It might help others who keep an > eye on IOPS of the production servers for various reasons. The WAL > read stats I'm thinking useful are wal_read_bytes - total amount of > WAL read, wal_read - total number of times WAL is read from disk, > wal_read_time - total amount of time spent reading WAL (tracked only > when an existing GUC track_wal_io_timing is on). I doesn't really seem useful to have this in pg_stat_wal, because you can't really figure out where those reads are coming from. Are they crash recovery? Walsender? ...? I think this'd better be handled by adding WAL support for pg_stat_io. Then the WAL reads would be attributed to the relevant backend type, making it easier to answer such questions. Going forward I want to add support for seeing pg_stat_io for individual connections, which'd then automatically support this feature for the WAL reads as well. Eventually I think pg_stat_wal should only track wal_records, wal_fpi, wal_buffers_full and fill the other columns from pg_stat_io. However, this doesn't "solve" the following issue: > Note that the patch needs a bit more work, per [2]. With the patch, > the WAL senders (processes exiting after checkpointer) will generate > stats and we need to either let all or only one WAL sender to write > stats to disk. Allowing one WAL sender to write might be tricky. > Allowing all WAL senders to write might make too many writes to the > stats file. And, we need a lock to let only one process write. I can't > think of a best way here at the moment. > > Thoughts? > > [1] https://www.postgresql.org/message-id/CALj2ACXKKK=wbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54+Na=Q@mail.gmail.com > [2] > /* > * Write out stats after shutdown. This needs to be called by exactly one > * process during a normal shutdown, and since checkpointer is shut down > * very late... > * > * Walsenders are shut down after the checkpointer, but currently don't > * report stats. If that changes, we need a more complicated solution. > */ > before_shmem_exit(pgstat_before_server_shutdown, 0); I wonder if we should keep the checkpointer around for longer. If we have checkpointer signal postmaster after it wrote the shutdown checkpoint, postmaster could signal walsenders to shut down, and checkpointer could do some final work, like writing out the stats. I suspect this could be useful for other things as well. It's awkward that we don't have a place to put "just before shutting down" type tasks. And checkpointer seems well suited for that. Greetings, Andres Freund
At Thu, 16 Feb 2023 11:11:38 -0800, Andres Freund <andres@anarazel.de> wrote in > I wonder if we should keep the checkpointer around for longer. If we have > checkpointer signal postmaster after it wrote the shutdown checkpoint, > postmaster could signal walsenders to shut down, and checkpointer could do > some final work, like writing out the stats. > I suspect this could be useful for other things as well. It's awkward that we > don't have a place to put "just before shutting down" type tasks. And > checkpointer seems well suited for that. I totally agree that it will be useful, but I'm not quite sure how checkpointer would be able to let postmaster know about that state without requiring access to shared memory. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Feb 17, 2023 at 12:41 AM Andres Freund <andres@anarazel.de> wrote: > > On 2023-02-16 23:39:00 +0530, Bharath Rupireddy wrote: > > While working on [1], I was in need to know WAL read stats (number of > > times and amount of WAL data read from disk, time taken to read) to > > measure the benefit. I had to write a developer patch to capture WAL > > read stats as pg_stat_wal currently emits WAL write stats. With recent > > works on pg_stat_io which emit data read IO stats too, I think it's > > better to not miss WAL read stats. It might help others who keep an > > eye on IOPS of the production servers for various reasons. The WAL > > read stats I'm thinking useful are wal_read_bytes - total amount of > > WAL read, wal_read - total number of times WAL is read from disk, > > wal_read_time - total amount of time spent reading WAL (tracked only > > when an existing GUC track_wal_io_timing is on). > > I doesn't really seem useful to have this in pg_stat_wal, because you can't > really figure out where those reads are coming from. Are they crash recovery? > Walsender? ...? Yes, that's the limitation with what I've proposed. > I think this'd better be handled by adding WAL support for pg_stat_io. Then > the WAL reads would be attributed to the relevant backend type, making it > easier to answer such questions. Going forward I want to add support for > seeing pg_stat_io for individual connections, which'd then automatically > support this feature for the WAL reads as well. > > Eventually I think pg_stat_wal should only track wal_records, wal_fpi, > wal_buffers_full and fill the other columns from pg_stat_io. pg_stat_io being one place for all IO related information sounds apt and useful. And similarly, we might want to push write/read/flush info from pg_stat_slru to pg_stat_io. > However, this doesn't "solve" the following issue: > > > Note that the patch needs a bit more work, per [2]. With the patch, > > the WAL senders (processes exiting after checkpointer) will generate > > stats and we need to either let all or only one WAL sender to write > > stats to disk. Allowing one WAL sender to write might be tricky. > > Allowing all WAL senders to write might make too many writes to the > > stats file. And, we need a lock to let only one process write. I can't > > think of a best way here at the moment. > > > > Thoughts? > > > > [1] https://www.postgresql.org/message-id/CALj2ACXKKK=wbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54+Na=Q@mail.gmail.com > > [2] > > /* > > * Write out stats after shutdown. This needs to be called by exactly one > > * process during a normal shutdown, and since checkpointer is shut down > > * very late... > > * > > * Walsenders are shut down after the checkpointer, but currently don't > > * report stats. If that changes, we need a more complicated solution. > > */ > > before_shmem_exit(pgstat_before_server_shutdown, 0); > > I wonder if we should keep the checkpointer around for longer. If we have > checkpointer signal postmaster after it wrote the shutdown checkpoint, > postmaster could signal walsenders to shut down, and checkpointer could do > some final work, like writing out the stats. > > I suspect this could be useful for other things as well. It's awkward that we > don't have a place to put "just before shutting down" type tasks. And > checkpointer seems well suited for that. Yes, there are some places that still assume checkpointer is the last process to exit, for instance see [1]. If we can truly make it happen, it'll be useful. I'll come up with more thoughts (and perhaps a patch) on this soon. [1] /* * Checkpointer is the last process to shut down, so we ask it to hold * the keys for a range of other tasks required most of which have * nothing to do with checkpointing at all. * * For various reasons, some config values can change dynamically so * the primary copy of them is held in shared memory to make sure all * backends see the same value. We make Checkpointer responsible for * updating the shared memory copy if the parameter setting changes * because of SIGHUP. */ -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Feb 20, 2023 at 10:51 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Thu, 16 Feb 2023 11:11:38 -0800, Andres Freund <andres@anarazel.de> wrote in > > I wonder if we should keep the checkpointer around for longer. If we have > > checkpointer signal postmaster after it wrote the shutdown checkpoint, > > postmaster could signal walsenders to shut down, and checkpointer could do > > some final work, like writing out the stats. > > I suspect this could be useful for other things as well. It's awkward that we > > don't have a place to put "just before shutting down" type tasks. And > > checkpointer seems well suited for that. > > I totally agree that it will be useful, but I'm not quite sure how > checkpointer would be able to let postmaster know about that state > without requiring access to shared memory. The checkpointer can either set a flag in shared memory (CheckpointerShmem or XLogCtl) or send a multiplexed SIGUSR1 (of course, this one too needs shared memory access for PMSignalState) or SIGUSR2 (pqsignal(SIGUSR2, dummy_handler); /* unused, reserve for children */) if we don't want shared memory access after it writes a shutdown checkpoint. Having said that, what's the problem if we use shared memory to report the shutdown checkpoint to the postmaster? In case of abnormal shutdown where shared memory gets corrupted, we don't even write a shutdown checkpoint, no? In such a case, the postmaster doesn't send SIGUSR2 to the checkpointer, instead it sends SIGQUIT. AFICS, using shared memory doesn't seem to have any problem. Do you have any other thoughts in mind? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 2023-02-20 14:21:39 +0900, Kyotaro Horiguchi wrote: > I totally agree that it will be useful, but I'm not quite sure how > checkpointer would be able to let postmaster know about that state > without requiring access to shared memory. SendPostmasterSignal(PMSIGNAL_SHUTDOWN_CHECKPOINT_COMPLETE); or such. Greetings, Andres Freund
At Mon, 20 Feb 2023 08:29:06 -0800, Andres Freund <andres@anarazel.de> wrote in > Hi, > > On 2023-02-20 14:21:39 +0900, Kyotaro Horiguchi wrote: > > I totally agree that it will be useful, but I'm not quite sure how > > checkpointer would be able to let postmaster know about that state > > without requiring access to shared memory. > > SendPostmasterSignal(PMSIGNAL_SHUTDOWN_CHECKPOINT_COMPLETE); > or such. Ah, that's it. Thanks! regarsd. -- Kyotaro Horiguchi NTT Open Source Software Center
At Mon, 20 Feb 2023 20:15:00 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > Having said that, what's the problem if we use shared memory to report > the shutdown checkpoint to the postmaster? In case of abnormal > shutdown where shared memory gets corrupted, we don't even write a > shutdown checkpoint, no? In such a case, the postmaster doesn't send > SIGUSR2 to the checkpointer, instead it sends SIGQUIT. AFICS, using > shared memory doesn't seem to have any problem. Do you have any other > thoughts in mind? I had a baseless belief that postmaster doesn't touch shared memory, but as Andres suggested, SendPostmasterSignal() already does that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center