Thread: [BUG]: the walsender does not update its IO statistics until it exits

[BUG]: the walsender does not update its IO statistics until it exits

From
Bertrand Drouvot
Date:
Hi hackers,

while doing some tests for [1], I observed that $SUBJECT.

To observe this behavior on master:

1. create a logical replication slot

postgres=# SELECT * FROM pg_create_logical_replication_slot('logical_slot', 'test_decoding', false, true);
  slot_name   |    lsn
--------------+------------
 logical_slot | 0/40749508
(1 row)

2. create a table and add some data

postgres=# create table bdt (a int);
CREATE TABLE
postgres=# insert into bdt select a from generate_series(1,10000) a ;
INSERT 0 10000

3. starts pg_recvlogical that way

pg_recvlogical -d postgres -S logical_slot -f - --no-loop --start

4. query pg_stat_io

postgres=# select backend_type,object,context,reads,read_bytes from pg_stat_io where backend_type = 'walsender';
 backend_type |    object     |  context  | reads | read_bytes
--------------+---------------+-----------+-------+------------
 walsender    | relation      | bulkread  |     0 |          0
 walsender    | relation      | bulkwrite |     0 |          0
 walsender    | relation      | init      |     0 |          0
 walsender    | relation      | normal    |     6 |      49152
 walsender    | relation      | vacuum    |     0 |          0
 walsender    | temp relation | normal    |     0 |          0
 walsender    | wal           | init      |       |
 walsender    | wal           | normal    |     0 |          0
(8 rows)

The non zeros stats that we see here are due to the pgstat_report_stat() call in
PostgresMain() but not to the walsender decoding activity per say (proof is that
you can see that the wal object values are empty while it certainly had to read
some WAL).

5. Once ctrl-c is done for pg_recvlogical then we get:

postgres=# select backend_type,object,context,reads,read_bytes from pg_stat_io where backend_type = 'walsender';
 backend_type |    object     |  context  | reads | read_bytes
--------------+---------------+-----------+-------+------------
 walsender    | relation      | bulkread  |     0 |          0
 walsender    | relation      | bulkwrite |     0 |          0
 walsender    | relation      | init      |     0 |          0
 walsender    | relation      | normal    |     9 |      73728
 walsender    | relation      | vacuum    |     0 |          0
 walsender    | temp relation | normal    |     0 |          0
 walsender    | wal           | init      |       |
 walsender    | wal           | normal    |    98 |     793856
(8 rows)

Now we can see that the numbers increased for the relation object and that we
get non zeros numbers for the wal object too (which makes fully sense).

With the attached patch applied, we would get the same numbers already in
step 4. (means the stats are flushed without the need to wait for the walsender
to exit).

Remarks:

R1. The extra flush are done in WalSndLoop(): I believe this is the right place
for them.
R2. A test is added in 035_standby_logical_decoding.pl: while this TAP test
is already racy ([2]) that looks like a good place as we don't want pg_recvlogical
to stop/exit.
R3. The test can also be back-patched till 16_STABLE as 035_standby_logical_decoding.pl
has been introduced in 16 (and so do pg_stat_io).
R4. The test fails if the extra flushs are not applied/done, which makes fully
sense.

[1]: https://www.postgresql.org/message-id/flat/Z3zqc4o09dM/Ezyz%40ip-10-97-1-34.eu-west-3.compute.internal
[2]: https://www.postgresql.org/message-id/Z6oRgmD8m7zBo732%40ip-10-97-1-34.eu-west-3.compute.internal

Looking forward to your feedback,

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: [BUG]: the walsender does not update its IO statistics until it exits

From
Michael Paquier
Date:
On Tue, Feb 25, 2025 at 01:42:08PM +0000, Bertrand Drouvot wrote:
> Now we can see that the numbers increased for the relation object and that we
> get non zeros numbers for the wal object too (which makes fully sense).
>
> With the attached patch applied, we would get the same numbers already in
> step 4. (means the stats are flushed without the need to wait for the walsender
> to exit).

@@ -2793,6 +2794,12 @@ WalSndLoop(WalSndSendDataCallback send_data)
         if (pq_flush_if_writable() != 0)
             WalSndShutdown();

+        /*
+         * Report IO statistics
+         */
+        pgstat_flush_io(false);
+        (void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
+
         /* If nothing remains to be sent right now ... */
         if (WalSndCaughtUp && !pq_is_send_pending())
         {

That's bad, worse for a logical WAL sender, because it means that we
have no idea what kind of I/O happens in this process until it exits,
and logical WAL senders could loop forever, since v16 where we've
begun tracking I/O.

A non-forced periodic flush like you are proposing here sounds OK to
me, but the position of the flush could be positioned better in the
loop.  If there is a SIGUSR2 (aka got_SIGUSR2 is true), WAL senders
would shut down, so it seems rather pointless to do a flush just
before exiting the process in WalSndDone(), no?  I'd suggest to move
the flush attempt closer to where we wait for some activity, just
after WalSndKeepaliveIfNecessary().
--
Michael

Attachment

Re: [BUG]: the walsender does not update its IO statistics until it exits

From
Bertrand Drouvot
Date:
Hi,

On Wed, Feb 26, 2025 at 03:37:10PM +0900, Michael Paquier wrote:
> On Tue, Feb 25, 2025 at 01:42:08PM +0000, Bertrand Drouvot wrote:
> > Now we can see that the numbers increased for the relation object and that we
> > get non zeros numbers for the wal object too (which makes fully sense).
> > 
> > With the attached patch applied, we would get the same numbers already in
> > step 4. (means the stats are flushed without the need to wait for the walsender
> > to exit).
> 
> @@ -2793,6 +2794,12 @@ WalSndLoop(WalSndSendDataCallback send_data)
>          if (pq_flush_if_writable() != 0)
>              WalSndShutdown();
>  
> +        /*
> +         * Report IO statistics
> +         */
> +        pgstat_flush_io(false);
> +        (void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
> +
>          /* If nothing remains to be sent right now ... */
>          if (WalSndCaughtUp && !pq_is_send_pending())
>          {
> 
> That's bad, worse for a logical WAL sender, because it means that we
> have no idea what kind of I/O happens in this process until it exits,
> and logical WAL senders could loop forever, since v16 where we've
> begun tracking I/O.

Yeah... And while the example shared up-thread is related to logical walsender,
the same issue exists for a physical walsender.

OTOH, It's also great to see that the new stats that have been added (the WAL
ones) helped to spot the issue.

> A non-forced periodic flush like you are proposing here sounds OK to
> me,

Thanks for looking at it!

> but the position of the flush could be positioned better in the
> loop.  If there is a SIGUSR2 (aka got_SIGUSR2 is true), WAL senders
> would shut down,

That's true for a physical walsender but I'm not sure it is for a logical
walsender (due to the "sentPtr == replicatedPtr" check in WalSndDone()).

> so it seems rather pointless to do a flush just
> before exiting the process in WalSndDone(), no?  I'd suggest to move
> the flush attempt closer to where we wait for some activity, just
> after WalSndKeepaliveIfNecessary().

Yeah I think that makes sense, done that way in the attached.

Speaking about physical walsender, I moved the test to 001_stream_rep.pl instead
(would also fail without the fix).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: [BUG]: the walsender does not update its IO statistics until it exits

From
Andres Freund
Date:
Hi,

On 2025-02-26 15:37:10 +0900, Michael Paquier wrote:
> That's bad, worse for a logical WAL sender, because it means that we
> have no idea what kind of I/O happens in this process until it exits,
> and logical WAL senders could loop forever, since v16 where we've
> begun tracking I/O.

FWIW, I think medium term we need to work on splitting stats flushing into two
separate kinds of flushes:
1) non-transactional stats, which should be flushed at a regular interval,
   unless a process is completely idle
2) transaction stats, which can only be flushed at transaction boundaries,
   because before the transaction boundary we don't know if e.g. newly
   inserted rows should be counted as live or dead

So far we have some timer logic for 2), but we have basically no support for
1). Which means we have weird ad-hoc logic in various kinds of
non-plain-connection processes. And that will often have holes, as Bertrand
noticed here.

I think it's also bad that we don't have a solution for 1), even just for
normal connections. If a backend causes a lot of IO we might want to know
about that long before the longrunning transaction commits.

I suspect the right design here would be to have a generalized form of the
timeout mechanism we have for 2).

For that we'd need to make sure that pgstat_report_stat() can be safely called
inside a transaction.  The second part would be to redesign the
IdleStatsUpdateTimeoutPending mechanism so it is triggered independent of
idleness, without introducing unacceptable overhead - I think that's doable.



Greetings,

Andres Freund