Re: BitmapHeapScan streaming read user and prelim refactoring - Mailing list pgsql-hackers

From: Jakub Wartak
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Msg-id: CAKZiRmwK-=_hgr1OU5uML7k-eREr+FAXTE2FuKDFxyFrudZRUg@mail.gmail.com
In response to: Re: BitmapHeapScan streaming read user and prelim refactoring (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Fri, Feb 14, 2025 at 7:16 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

> On 2025-02-14 18:36:37 +0100, Tomas Vondra wrote:
> > All of this is true, ofc, but maybe it's better to have a tool providing
> > at least some advice
>
> I agree, a tool like that would be useful!
>
> One difficulty is that the relevant parameter space is really large, making it
> hard to keep the runtime in a reasonable range...

It doesn't need to be perfect, for sure. I was about to abandon this
proposal (the argument for dynamic/burstable IO is hard to argue
with), but then I saw some data that made me write this. I have a
strong feeling that the whole effort of the community might go
unnoticed if the real-world configuration of e_io_c stays at what it
is today. The distribution of e_io_c values on real-world
installations looks more like this:
    e_io_c   share
    1        66%
    200      17%
    300      3%
    16       2%
    8        1%

200 seems to be an EDB thing. As per [1], even Azure Flex has 1 by default.
I asked the R1 model and it literally told me to set this:
Example for SSDs: effective_io_concurrency = 200
Example for HDDs: effective_io_concurrency = 2

Funny, so the current default (1) is effectively telling me: use half
the platters of an HDD, in 2026+ (that's when people will actually
start to pg_upgrade), potentially on PCIe Gen 6.0 NVMes by then :^)

> > I'd definitely not want initdb to do this automatically, though. Getting
> > good numbers is fairly expensive (in time and I/O), can be flaky, etc.
>
> Yea.

Why not? We are not talking about perfect results. If we constrained
it to just a few seconds and capped it (to still get something
conservative, while still allowing a higher e_io_c where it might
matter), this would give read streaming (and its consumers, such as
this $thread) and AIO at least some chance to shine, wouldn't it? I
do understand the value should be conservative, but without values of
at least 4..8 hardly anyone will notice the benefits (?)

Wouldn't MIN(best_estimated_eioc/VCPUs < 1 ? 1 :
best_estimated_eioc/VCPUs, 8) be saner?
After all, the OS could hint us too (e.g. /sys values like
nr_requests or queue_depth).
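
To make that arithmetic concrete, a minimal sketch of the heuristic
(the probe feeding best_estimated_eioc is hypothetical, as is the
function name):

/* Hypothetical sketch of the proposed clamp, not actual PostgreSQL code:
 * scale a measured best e_io_c down by vCPU count, floor 1, cap 8. */
#include <stdio.h>
#include <unistd.h>

static int proposed_default_eioc(int best_estimated_eioc)
{
    long vcpus = sysconf(_SC_NPROCESSORS_ONLN);
    int  v = (int) (best_estimated_eioc / (vcpus > 0 ? vcpus : 1));

    if (v < 1)
        v = 1;                  /* never below the current default */
    return v < 8 ? v : 8;       /* conservative cap */
}

int main(void)
{
    /* e.g. a probe found 64 concurrent I/Os optimal; on a 16-vCPU
     * machine that yields a default of 4 */
    printf("default e_io_c = %d\n", proposed_default_eioc(64));
    return 0;
}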

I cannot stop thinking how wasteful e_io_c=1 seems to be, with all
those IO stalls and context switches; you even mentioned the impact
of CPU power-saving idling too.

> > But maybe having a tool that gives you a bunch of numbers, as input for
> > manual tuning, would be good enough?
>
> I think it'd be useful. I'd perhaps make it an SQL callable tool though, so
> it can be run in cloud environments.

Right, you could even make it SQL-callable and still run it when
initdb runs. It could also take a max_runtime parameter to limit its
duration (the longer the measurement, the more accurate the result).
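
As a strawman of what such a time-capped probe could look like
(nothing below is a real pg_* API; the e_io_c-style prefetching is
mimicked with posix_fadvise, the pre-AIO mechanism):

/* Toy probe, hypothetical, not a proposed PostgreSQL tool: for each
 * prefetch depth, hint blocks 'depth' ahead of the ones actually read
 * and report reads/s, spending max_runtime seconds in total. */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define BLCKSZ 8192
#define SEQLEN 4096

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(int argc, char **argv)
{
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s file [max_runtime]\n", argv[0]);
        return 1;
    }

    int    fd = open(argv[1], O_RDONLY);
    off_t  nblocks = lseek(fd, 0, SEEK_END) / BLCKSZ;
    double budget = (argc > 2) ? atof(argv[2]) : 5.0;   /* max_runtime */
    int    depths[] = {1, 2, 4, 8, 16, 32};
    char   buf[BLCKSZ];
    off_t  seq[SEQLEN];

    for (size_t d = 0; d < sizeof(depths) / sizeof(depths[0]); d++)
    {
        double slice = budget / 6.0, start = now_sec();
        long   reads = 0;

        for (int i = 0; i < SEQLEN; i++)
            seq[i] = random() % nblocks;    /* a random "bitmap" to scan */

        while (now_sec() - start < slice)
            for (int i = 0; i < SEQLEN && now_sec() - start < slice; i++)
            {
                if (i + depths[d] < SEQLEN)     /* prefetch 'depth' ahead */
                    posix_fadvise(fd, seq[i + depths[d]] * BLCKSZ, BLCKSZ,
                                  POSIX_FADV_WILLNEED);
                if (pread(fd, buf, BLCKSZ, seq[i] * BLCKSZ) > 0)
                    reads++;
            }
        printf("depth %3d: %.0f reads/s\n", depths[d], reads / slice);
    }
    close(fd);
    return 0;
}

Each depth gets an equal slice of the budget, so halving max_runtime
simply halves the accuracy of every data point.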

> > As you say, it's not just about the hardware (and how that changes over
> > time because of "burst" credits etc.), but also about the workload.
> > Would it be possible to track something, and adjust this dynamically
> > over time? And then adjust the prefetch distance in some adaptive way?
>
> Yes, I do think so!  It's not trivial, but I think we eventually do want it.
>
> Melanie has worked on this a fair bit, fwiw.
>
> My current thinking is that we'd want something very roughly like TCP
> BBR. Basically, it predicts the currently available bandwidth not just via
> lost packets - the traditional approach - but also by building a continually
> updated model of "bytes in flight" and latency and uses that to predict what
> the achievable bandwidth is.[..]

Sadly that doesn't sound like PG18, right? (Or did I miss some
thread? I've tried to watch Melanie's presentation, though.)
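
Just to check my own understanding of the BBR analogy, a toy model
(all names made up, nothing to do with the actual patches): estimate
the delivery rate and minimum latency from completed I/Os, and steer
the prefetch distance toward their product, i.e. the I/O equivalent
of TCP's bandwidth-delay product:

/* Toy BBR-ish controller, hypothetical names only: the "pipe" holds
 * about rate * latency I/Os, so aim the prefetch distance there. */
#include <stdio.h>

typedef struct PrefetchModel
{
    double rate_ewma;   /* completed I/Os per second, smoothed */
    double min_lat;     /* lowest recent per-I/O latency, seconds */
    int    distance;    /* prefetch distance to use next, in blocks */
} PrefetchModel;

static void prefetch_model_update(PrefetchModel *m, int completed,
                                  double elapsed, double lat)
{
    m->rate_ewma = 0.9 * m->rate_ewma + 0.1 * (completed / elapsed);
    if (lat < m->min_lat)
        m->min_lat = lat;

    m->distance = (int) (m->rate_ewma * m->min_lat) + 1; /* ~BDP in I/Os */
    if (m->distance > 256)
        m->distance = 256;      /* some hard cap */
}

int main(void)
{
    PrefetchModel m = {0.0, 1.0, 1};

    /* pretend batches of 32 I/Os complete every 1 ms at 500 us latency */
    for (int i = 0; i < 50; i++)
        prefetch_model_update(&m, 32, 0.001, 0.0005);
    printf("prefetch distance converges to %d\n", m.distance);
    return 0;
}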

-J.

[1] https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/server-parameters-table-resource-usage-asynchronous-behavior?pivots=postgresql-17


