Re: GUC-ify walsender MAX_SEND_SIZE constant - Mailing list pgsql-hackers

From Andres Freund
Subject Re: GUC-ify walsender MAX_SEND_SIZE constant
Msg-id 20240423220001.fevuhwirldhi3rkb@awork3.anarazel.de
In response to Re: GUC-ify walsender MAX_SEND_SIZE constant  (Jakub Wartak <jakub.wartak@enterprisedb.com>)
Responses Re: GUC-ify walsender MAX_SEND_SIZE constant
List pgsql-hackers
Hi,

On 2024-04-23 14:47:31 +0200, Jakub Wartak wrote:
> On Tue, Apr 23, 2024 at 2:24 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > > Any news, comments, etc. about this thread?
> >
> > FWIW, I'd still be in favor of doing a GUC-ification of this part, but
> > at this stage I'd need more time to do a proper study of a case where
> > this shows benefits to prove your point, or somebody else could come
> > in and show it.
> >
> > Andres has objected to this change, on the ground that this was not
> > worth it, though you are telling the contrary.  I would be curious to
> > hear from others, first, so as we gather more opinions to reach a
> > consensus.

I think it's a bad idea to make it configurable. It's just one more GUC that
nobody has a chance of realistically tuning.  I'm not saying we shouldn't
improve the code - just that making MAX_SEND_SIZE configurable doesn't really
seem like a good answer.

FWIW, I have a hard time believing that MAX_SEND_SIZE is going to be the
only or even primary issue with high latency, high bandwidth storage devices.



> First: it's very hard to get *reliable* replication setup for
> benchmark, where one could demonstrate correlation between e.g.
> increasing MAX_SEND_SIZE and observing benefits (in sync rep it is
> easier, as you are simply stalled in pgbench). Part of the problem are
> the following things:

Depending on the workload, it's possible to measure streaming-out performance
without actually regenerating WAL. E.g. by using pg_receivewal to stream the
data out multiple times.


Another way to get fairly reproducible WAL workloads is to drive
pg_logical_emit_message() from pgbench; that tends to have a lot less
variability than running tpcb-like or such.
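
A minimal sketch of what such a pgbench-driven workload could look like. The
helper names, the 'bench' prefix, the 8 kB payload, and the client count and
duration are all illustrative assumptions, not taken from this thread; the
only PostgreSQL pieces relied on are pg_logical_emit_message(transactional,
prefix, content) and the pgbench options -n, -f, -c and -T.

```python
import shlex


def emit_message_sql(payload_bytes: int = 8192, prefix: str = "bench") -> str:
    """Build a one-statement pgbench script that emits a logical message.

    transactional=false is an assumption here: the message is written to WAL
    without being tied to the surrounding transaction.
    """
    return (
        "SELECT pg_logical_emit_message("
        f"false, '{prefix}', repeat('x', {payload_bytes}));\n"
    )


def pgbench_argv(script_path: str, clients: int = 8, seconds: int = 60,
                 dbname: str = "postgres") -> list[str]:
    """Build a pgbench command line that runs the custom script.

    -n skips the vacuum step, since the standard pgbench tables are unused.
    """
    return ["pgbench", "-n", "-f", script_path,
            "-c", str(clients), "-T", str(seconds), dbname]


if __name__ == "__main__":
    # Print the script body and the command; save the SQL as emit.sql first.
    print(emit_message_sql(), end="")
    print(shlex.join(pgbench_argv("emit.sql")))
```

Varying payload_bytes and the client count changes the WAL volume per second
without involving heap or index writes.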


> Second: once you perform above and ensure that there are no network or
> I/O stalls back then I *think* I couldn't see any impact of playing
> with MAX_SEND_SIZE from what I remember as probably something else is
> saturated first.

My understanding of Majid's use-case for tuning MAX_SEND_SIZE is that the
bottleneck is storage, not network. The reason MAX_SEND_SIZE affects that is
that it determines the max size passed to WALRead(), which in turn determines
how much we read from the OS at once.  If the storage has high latency but
also high throughput, and readahead is disabled or just not aggressive enough
after crossing segment boundaries, larger reads reduce the number of times
you're likely to be blocked waiting for read IO.

Which is also why I think that making MAX_SEND_SIZE configurable is a really
poor proxy for improving the situation.
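
To make the read-size argument above concrete, here is a toy model, not a
measurement. It assumes every read blocks for the full device latency (no
effective readahead) and that the device then transfers at full throughput.
The 1 ms latency and 1 GiB/s throughput are hypothetical numbers; 128 kB is,
as far as I know, the current MAX_SEND_SIZE with the default XLOG_BLCKSZ.

```python
import math

SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size


def blocking_reads(total_bytes: int, read_size: int) -> int:
    """Number of read calls needed to cover total_bytes in read_size chunks."""
    return math.ceil(total_bytes / read_size)


def estimated_read_time(total_bytes: int, read_size: int,
                        latency_s: float, throughput_bps: float) -> float:
    """Toy model: each read pays full latency, plus the total transfer time."""
    n_reads = blocking_reads(total_bytes, read_size)
    return n_reads * latency_s + total_bytes / throughput_bps


if __name__ == "__main__":
    latency = 0.001              # hypothetical: 1 ms per read
    throughput = 1024 ** 3       # hypothetical: 1 GiB/s
    for read_kib in (128, 1024):
        size = read_kib * 1024
        t = estimated_read_time(SEGMENT_BYTES, size, latency, throughput)
        print(f"{read_kib:5d} KiB reads: "
              f"{blocking_reads(SEGMENT_BYTES, size):4d} reads, {t:.4f} s")
```

In this model the latency term dominates at small read sizes, which is the
effect a larger MAX_SEND_SIZE papers over; working readahead would shrink the
same term without any tuning knob.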

We're imo much better off working on read_stream.[ch] support for reading WAL.

Greetings,

Andres Freund


