From: Tomas Vondra
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date:
Msg-id: b3b7b67a-db39-4061-9f75-723a12272fe6@vondra.me
In response to: Re: BitmapHeapScan streaming read user and prelim refactoring (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On 2/14/25 18:14, Andres Freund wrote:
> Hi,
> 
> On 2025-02-14 10:04:41 +0100, Jakub Wartak wrote:
>> Is there any reason we couldn't have new pg_test_iorates (similar to
>> other pg_test_* proggies), that would literally do this and calibrate
>> the best e_io_c during initdb and put the result into
>> postgresql.auto.conf (pg_test_iorates --adjust-auto-conf), that way we
>> would avoid user questions on how to come up with an optimal value?
> 
> Unfortunately I think this is a lot easier said than done:
> 
> - The optimal depth depends a lot on your IO patterns, there's a lot of things
>   between fully sequential and fully random.
> 
>   You'd really need to test different IO sizes and different patterns. The
>   test matrix for that gets pretty big.
> 
> - The performance characteristics of storage heavily changes over time.
> 
>   This is particularly true in cloud environments, where disk/VM combinations
>   will be "burstable", allowing higher throughput for a while, but then not
>   anymore.  Measuring during either of those states will not be great for the
>   other state.
> 
> - e_io_c is per-query-node, but impacts the whole system. If you set
>   e_io_c=1000 on a disk with a metered IOPS of say 1k/s you might get a
>   slightly higher throughput for a bitmap heap scan, but also your commit
>   latency for concurrent transactions will go through the roof, because
>   your fdatasync() will be behind a queue of 1k reads that are all
>   throttled.
> 

All of this is true, ofc, but maybe it's better to have a tool providing
at least some advice? I'd imagine pg_test_fsync is affected by many of
those issues too (considering both are about I/O).

I'd definitely not want initdb to do this automatically, though. Getting
good numbers is fairly expensive (in time and I/O), can be flaky, etc.

But maybe having a tool that gives you a bunch of numbers, as input for
manual tuning, would be good enough?

As you say, it's not just about the hardware (and how that changes over
time because of "burst" credits etc.), but also about the workload.
Would it be possible to track something about the workload, and adjust
this dynamically over time? That is, adjust the prefetch distance in
some adaptive way?

Perhaps it's a bit naive, but say we know what % of requests is served
from cache, how long the requests took, etc. We opportunistically
increase the prefetch distance, and check the cache hit ratio after a
while. Did it help (the hit % increased, requests consumed less time)?
Then maybe try another step. If not, maybe try prefetching less?
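
In hypothetical C, roughly like this (all names and constants are
invented for illustration; a real version would live in the prefetch
machinery, keep per-node state, need some damping, etc.):

#include <stdbool.h>
#include <stdio.h>

typedef struct PrefetchState
{
    int     distance;       /* current prefetch distance */
    int     step;           /* +1 = probing upward, -1 = backing off */
    double  prev_hit_ratio; /* cache hit ratio in the previous window */
    double  prev_avg_lat;   /* avg read latency in the previous window */
} PrefetchState;

/*
 * Called once per measurement window with fresh stats: keep moving the
 * prefetch distance in the current direction while things improve,
 * reverse direction as soon as they don't.
 */
static void
prefetch_adapt(PrefetchState *st, double hit_ratio, double avg_lat)
{
    bool    improved = hit_ratio > st->prev_hit_ratio ||
                       avg_lat < st->prev_avg_lat;

    if (!improved)
        st->step = -st->step;   /* the last step hurt, reverse course */

    st->distance += st->step * 4;

    /* clamp to some sane bounds */
    if (st->distance < 1)
        st->distance = 1;
    if (st->distance > 512)
        st->distance = 512;

    st->prev_hit_ratio = hit_ratio;
    st->prev_avg_lat = avg_lat;
}

int
main(void)
{
    PrefetchState st = {16, +1, 0.0, 1e9};

    /* synthetic (hit ratio, avg latency in us) windows, for illustration */
    double  samples[][2] = {
        {0.50, 900}, {0.55, 800}, {0.60, 750}, {0.58, 780}, {0.57, 790},
    };

    for (int i = 0; i < 5; i++)
    {
        prefetch_adapt(&st, samples[i][0], samples[i][1]);
        printf("window %d: prefetch distance now %d\n", i, st.distance);
    }
    return 0;
}

The hard parts are presumably picking the window length and keeping it
from oscillating, but that seems more tractable than a one-shot
calibration at initdb time.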


regards

-- 
Tomas Vondra