Thread: [BUG]: the walsender does not update its IO statistics until it exits
Hi hackers, while doing some tests for [1], I observed that $SUBJECT. To observe this behavior on master: 1. create a logical replication slot postgres=# SELECT * FROM pg_create_logical_replication_slot('logical_slot', 'test_decoding', false, true); slot_name | lsn --------------+------------ logical_slot | 0/40749508 (1 row) 2. create a table and add some data postgres=# create table bdt (a int); CREATE TABLE postgres=# insert into bdt select a from generate_series(1,10000) a ; INSERT 0 10000 3. starts pg_recvlogical that way pg_recvlogical -d postgres -S logical_slot -f - --no-loop --start 4. query pg_stat_io postgres=# select backend_type,object,context,reads,read_bytes from pg_stat_io where backend_type = 'walsender'; backend_type | object | context | reads | read_bytes --------------+---------------+-----------+-------+------------ walsender | relation | bulkread | 0 | 0 walsender | relation | bulkwrite | 0 | 0 walsender | relation | init | 0 | 0 walsender | relation | normal | 6 | 49152 walsender | relation | vacuum | 0 | 0 walsender | temp relation | normal | 0 | 0 walsender | wal | init | | walsender | wal | normal | 0 | 0 (8 rows) The non zeros stats that we see here are due to the pgstat_report_stat() call in PostgresMain() but not to the walsender decoding activity per say (proof is that you can see that the wal object values are empty while it certainly had to read some WAL). 5. Once ctrl-c is done for pg_recvlogical then we get: postgres=# select backend_type,object,context,reads,read_bytes from pg_stat_io where backend_type = 'walsender'; backend_type | object | context | reads | read_bytes --------------+---------------+-----------+-------+------------ walsender | relation | bulkread | 0 | 0 walsender | relation | bulkwrite | 0 | 0 walsender | relation | init | 0 | 0 walsender | relation | normal | 9 | 73728 walsender | relation | vacuum | 0 | 0 walsender | temp relation | normal | 0 | 0 walsender | wal | init | | walsender | wal | normal | 98 | 793856 (8 rows) Now we can see that the numbers increased for the relation object and that we get non zeros numbers for the wal object too (which makes fully sense). With the attached patch applied, we would get the same numbers already in step 4. (means the stats are flushed without the need to wait for the walsender to exit). Remarks: R1. The extra flush are done in WalSndLoop(): I believe this is the right place for them. R2. A test is added in 035_standby_logical_decoding.pl: while this TAP test is already racy ([2]) that looks like a good place as we don't want pg_recvlogical to stop/exit. R3. The test can also be back-patched till 16_STABLE as 035_standby_logical_decoding.pl has been introduced in 16 (and so do pg_stat_io). R4. The test fails if the extra flushs are not applied/done, which makes fully sense. [1]: https://www.postgresql.org/message-id/flat/Z3zqc4o09dM/Ezyz%40ip-10-97-1-34.eu-west-3.compute.internal [2]: https://www.postgresql.org/message-id/Z6oRgmD8m7zBo732%40ip-10-97-1-34.eu-west-3.compute.internal Looking forward to your feedback, Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: [BUG]: the walsender does not update its IO statistics until it exits
From
Michael Paquier
Date:
On Tue, Feb 25, 2025 at 01:42:08PM +0000, Bertrand Drouvot wrote: > Now we can see that the numbers increased for the relation object and that we > get non zeros numbers for the wal object too (which makes fully sense). > > With the attached patch applied, we would get the same numbers already in > step 4. (means the stats are flushed without the need to wait for the walsender > to exit). @@ -2793,6 +2794,12 @@ WalSndLoop(WalSndSendDataCallback send_data) if (pq_flush_if_writable() != 0) WalSndShutdown(); + /* + * Report IO statistics + */ + pgstat_flush_io(false); + (void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO); + /* If nothing remains to be sent right now ... */ if (WalSndCaughtUp && !pq_is_send_pending()) { That's bad, worse for a logical WAL sender, because it means that we have no idea what kind of I/O happens in this process until it exits, and logical WAL senders could loop forever, since v16 where we've begun tracking I/O. A non-forced periodic flush like you are proposing here sounds OK to me, but the position of the flush could be positioned better in the loop. If there is a SIGUSR2 (aka got_SIGUSR2 is true), WAL senders would shut down, so it seems rather pointless to do a flush just before exiting the process in WalSndDone(), no? I'd suggest to move the flush attempt closer to where we wait for some activity, just after WalSndKeepaliveIfNecessary(). -- Michael
Attachment
Re: [BUG]: the walsender does not update its IO statistics until it exits
From
Bertrand Drouvot
Date:
Hi, On Wed, Feb 26, 2025 at 03:37:10PM +0900, Michael Paquier wrote: > On Tue, Feb 25, 2025 at 01:42:08PM +0000, Bertrand Drouvot wrote: > > Now we can see that the numbers increased for the relation object and that we > > get non zeros numbers for the wal object too (which makes fully sense). > > > > With the attached patch applied, we would get the same numbers already in > > step 4. (means the stats are flushed without the need to wait for the walsender > > to exit). > > @@ -2793,6 +2794,12 @@ WalSndLoop(WalSndSendDataCallback send_data) > if (pq_flush_if_writable() != 0) > WalSndShutdown(); > > + /* > + * Report IO statistics > + */ > + pgstat_flush_io(false); > + (void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO); > + > /* If nothing remains to be sent right now ... */ > if (WalSndCaughtUp && !pq_is_send_pending()) > { > > That's bad, worse for a logical WAL sender, because it means that we > have no idea what kind of I/O happens in this process until it exits, > and logical WAL senders could loop forever, since v16 where we've > begun tracking I/O. Yeah... And while the example shared up-thread is related to logical walsender, the same issue exists for a physical walsender. OTOH, It's also great to see that the new stats that have been added (the WAL ones) helped to spot the issue. > A non-forced periodic flush like you are proposing here sounds OK to > me, Thanks for looking at it! > but the position of the flush could be positioned better in the > loop. If there is a SIGUSR2 (aka got_SIGUSR2 is true), WAL senders > would shut down, That's true for a physical walsender but I'm not sure it is for a logical walsender (due to the "sentPtr == replicatedPtr" check in WalSndDone()). > so it seems rather pointless to do a flush just > before exiting the process in WalSndDone(), no? I'd suggest to move > the flush attempt closer to where we wait for some activity, just > after WalSndKeepaliveIfNecessary(). Yeah I think that makes sense, done that way in the attached. Speaking about physical walsender, I moved the test to 001_stream_rep.pl instead (would also fail without the fix). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On 2025-02-26 15:37:10 +0900, Michael Paquier wrote: > That's bad, worse for a logical WAL sender, because it means that we > have no idea what kind of I/O happens in this process until it exits, > and logical WAL senders could loop forever, since v16 where we've > begun tracking I/O. FWIW, I think medium term we need to work on splitting stats flushing into two separate kinds of flushes: 1) non-transactional stats, which should be flushed at a regular interval, unless a process is completely idle 2) transaction stats, which can only be flushed at transaction boundaries, because before the transaction boundary we don't know if e.g. newly inserted rows should be counted as live or dead So far we have some timer logic for 2), but we have basically no support for 1). Which means we have weird ad-hoc logic in various kinds of non-plain-connection processes. And that will often have holes, as Bertrand noticed here. I think it's also bad that we don't have a solution for 1), even just for normal connections. If a backend causes a lot of IO we might want to know about that long before the longrunning transaction commits. I suspect the right design here would be to have a generalized form of the timeout mechanism we have for 2). For that we'd need to make sure that pgstat_report_stat() can be safely called inside a transaction. The second part would be to redesign the IdleStatsUpdateTimeoutPending mechanism so it is triggered independent of idleness, without introducing unacceptable overhead - I think that's doable. Greetings, Andres Freund