Fwd: [BUG]: the walsender does not update its IO statistics until it exits - Mailing list pgsql-hackers

From: Xuneng Zhou
Subject: Fwd: [BUG]: the walsender does not update its IO statistics until it exits
Msg-id: CABPTF7VreDnD3YiWzx_=PpLRdgOQsH8Xp3fTGMc+r9rGpc3WLg@mail.gmail.com
In response to: [BUG]: the walsender does not update its IO statistics until it exits (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses: Re: Fwd: [BUG]: the walsender does not update its IO statistics until it exits
List: pgsql-hackers
From: Xuneng Zhou <xunengzhou@gmail.com>
Date: Thu, Mar 13, 2025 at 19:15
Subject: Re: [BUG]: the walsender does not update its IO statistics until it exits
To: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Hi,
Thanks for working on this! I'm glad to see that the patch (https://www.postgresql.org/message-id/flat/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal) has been committed.
Regarding patch 0001, the optimization in pgstat_backend_have_pending_cb looks good:
 bool
 pgstat_backend_have_pending_cb(void)
 {
-	return (!pg_memory_is_all_zeros(&PendingBackendStats,
-									sizeof(struct PgStat_BackendPending)));
+	return backend_has_iostats;
 }
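For context, this follows the usual pending-stats flag pattern: set the flag wherever backend IO is counted, clear it once the pending stats are flushed. A minimal sketch of how backend_has_iostats could be maintained (the function and field names follow pgstat_backend.c, but treat the exact bodies as an assumption rather than a quote of the committed patch):

static bool backend_has_iostats = false;

void
pgstat_count_backend_io_op(IOObject io_object, IOContext io_context,
						   IOOp io_op, uint32 cnt, uint64 bytes)
{
	PendingBackendStats.pending_io.counts[io_object][io_context][io_op] += cnt;
	PendingBackendStats.pending_io.bytes[io_object][io_context][io_op] += bytes;

	/* A one-byte flag check is much cheaper than scanning the whole
	 * struct for zeros with pg_memory_is_all_zeros(). */
	backend_has_iostats = true;
}

/* In the flush path, once the pending IO numbers have been copied into
 * shared stats, reset both the struct and the flag: */
MemSet(&PendingBackendStats.pending_io, 0, sizeof(PendingBackendStats.pending_io));
backend_has_iostats = false;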
Additionally, the function pgstat_flush_backend includes the check:

+	if (!pgstat_backend_have_pending_cb())
+		return false;
However, I think we might need to revise the comment (and possibly the function name) for clarity:
/*
 * Check if there are any backend stats waiting to be flushed.
 */
Originally, this function was intended to check multiple types of backend statistics, which made sense when PendingBackendStats was the centralized structure for various pending backend stats. However, PgStat_PendingWalStats was removed from PendingBackendStats earlier, and this patch now introduces the backend_has_iostats variable, so the function's scope is even narrower. That narrowed scope no longer aligns with the function's name and its comment.
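For illustration only, the revised function might look something like this (the name pgstat_backend_have_iostats is hypothetical, chosen here just to show the idea; any rename would also need to be reflected where the callback is registered):

/*
 * Check if there are any backend IO stats waiting to be flushed.
 */
bool
pgstat_backend_have_iostats(void)
{
	return backend_has_iostats;
}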
Hi,
On Mon, Mar 03, 2025 at 10:51:19AM +0900, Michael Paquier wrote:
> On Fri, Feb 28, 2025 at 10:39:31AM +0000, Bertrand Drouvot wrote:
> > That sounds like a good idea, to measure the impact of those extra calls and
> > see if we'd need to mitigate them. I'll collect some data.
So I ran some tests using only one walsender (given that the extra lock you mentioned above is "only" for this particular backend).
=== Test with pg_receivewal
I used one pg_receivewal process and ran a workload like this:
pgbench -n -c8 -j8 -T60 -f <(echo "SELECT pg_logical_emit_message(true, 'test', repeat('0', 1));";)
I did not measure any noticeable extra lag (I measured the time it took for pg_size_pretty(sent_lsn - write_lsn) from pg_stat_replication to return to zero).
During the pgbench run, a "perf record --call-graph fp -p <walsender_pid>" reported (perf report -n):
1. pgstat_flush_backend() appears at about 3%
2. pg_memory_is_all_zeros() at about 2.8%
3. pgstat_flush_io() at about 0.4%
So it does not look like what we're adding here can be seen as a primary bottleneck.
That said, it looks like there is room for improvement in pgstat_flush_backend(): relying on a "have_iostats"-like variable would be cheaper than those pg_memory_is_all_zeros() calls.
That's done in 0001 attached; with it, pgstat_flush_backend() now appears at about 0.2%.
=== Test with pg_recvlogical
Now, it does not look like pg_receivewal had many IO stats to report (judging by the pg_stat_get_backend_io() output for the walsender).
Doing the same test with "pg_recvlogical -d postgres -S logical_slot -f /dev/null --start" produces many more IO stats.
What I observe without the "have_iostats" optimization is:
1. I did not measure any noticeable extra lag
2. pgstat_flush_io() at about 5.5% (pgstat_io_flush_cb() at about 5.3%)
3. pgstat_flush_backend() at about 4.8%
and with the "have_iostats" optimization I now see pgstat_flush_backend() at
about 2.51%.
So it does not look like what we're adding here can be seen as a primary bottleneck either, but it is probably worth implementing the "have_iostats" optimization attached.
Also, while I did not measure any noticeable extra lag, given that pgstat_flush_io() shows up at about 5.5% and pgstat_flush_backend() at about 2.5%, it could still make sense to reduce the frequency of the flush calls. Thoughts?
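One way to do that would be to throttle the flushes on a timer in the walsender loop. A minimal sketch, assuming the existing pgstat_flush_io()/pgstat_flush_backend() entry points; the interval value, the helper name maybe_flush_stats, and its exact call site are illustrative assumptions, not a committed design:

/* Flush walsender stats at most once per interval. */
#define WALSENDER_STATS_FLUSH_INTERVAL_MS	1000

static TimestampTz last_stats_flush = 0;

static void
maybe_flush_stats(void)
{
	TimestampTz now = GetCurrentTimestamp();

	if (!TimestampDifferenceExceeds(last_stats_flush, now,
									WALSENDER_STATS_FLUSH_INTERVAL_MS))
		return;

	/* Flush IO and per-backend IO stats; nowait=false may take the
	 * stats lock briefly, which should be fine at this frequency. */
	pgstat_flush_io(false);
	(void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
	last_stats_flush = now;
}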
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com