Re: Fwd: [BUG]: the walsender does not update its IO statistics until it exits - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Fwd: [BUG]: the walsender does not update its IO statistics until it exits
Date
Msg-id Z/N7EGnE3lCHGi2Z@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: Fwd: [BUG]: the walsender does not update its IO statistics until it exits  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Fwd: [BUG]: the walsender does not update its IO statistics until it exits
List pgsql-hackers
Hi,

On Mon, Apr 07, 2025 at 03:31:14PM +0900, Michael Paquier wrote:
> On Fri, Apr 04, 2025 at 09:33:46PM +0530, vignesh C wrote:
> > The new test added currently passes even without the patch. It would
> > be ideal to have a test that fails without the patch and passes once
> > the patch is applied.
> 
> Right.  The subscription test and logical WAL senders passes without
> the patch, because we are able to catch some WAL activity through
> pgoutput.  The recovery test for physical WAL sender fails without the
> patch on timeout.
> 
> We could need something more advanced here for the logical case, where
> we could use pg_recvlogical started in the background with a hardcoded
> endpos, or just kill the pg_recvlogical command once we have checked
> the state of the stats.  I am not sure if this is worth the cycles
> spent on,

Another option for the logical walsender is to reset the stats once the
subscription is created like in the attached. With the attached in place the
test now fails without the patch in place (and passes with the patch).

> One thing that I'm a bit unhappy about in the patch is the choice to
> do the stats updates in WalSndLoop() for the logical WAL sender case,
> because this forces an extra GetCurrentTimestamp() call for each loop,
> and that's never a cheap system call in what can be a hot code path.
> How about doing the calculations in WalSndWaitForWal() for the logical
> part, relying on the existing GetCurrentTimestamp() done there?
> That's also where the waits are handled for the logical part, so there
> may be a good point in keeping this code more symmetric for now,
> rather than split it.

Well, while I was working on a reproducer case for the logical walsender (that
lead to the attached), I realized that the test was looping for relatively long
time before passing. Then I was changing the patch to do *exactly* what you did
propose in v7 and the logical walsender test now passes quickly enough...

So it looks like that the idea of using a "shared" code path was not that good
and that spliting physical/logical as done in v7 is a good choice.

But...

> TBH, so I would be happy with just the physical case checked
> in TAP as it's simpler because streaming replication makes that easy
> to work with.

As we now have 2 code paths I think we "really" need 2 tests. The attached
(to apply on top of v7) seems to do the job.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Bernd Helmle
Date:
Subject: Re: Modern SHA2- based password hashes for pgcrypto
Next
From: Fujii Masao
Date:
Subject: Re: Make prep_status() message translatable