Re: BitmapHeapScan streaming read user and prelim refactoring - Mailing list pgsql-hackers

From Andres Freund
Subject Re: BitmapHeapScan streaming read user and prelim refactoring
Date
Msg-id xif2lgn7obsi5brj7llkzomcia2pn5nwqlyjnkjruknknclbws@vgw2kaldktxw
In response to Re: BitmapHeapScan streaming read user and prelim refactoring  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
Hi,

On 2025-02-14 18:18:47 +0100, Tomas Vondra wrote:
> FWIW this does not change anything in the detection of sequential access
> patterns, discussed nearby, because the benchmarks started before Andres
> looked into that. If needed, I can easily rerun these tests, I just need
> a patch to apply.
> 
> But if there really is some sort of issue, it'd make sense why it's much
> worse on the older SATA SSDs, while NVMe devices perform somewhat
> better. Because AFAICS the NVMe devices are better at handling random
> I/O with shorter queues.

I think the results are complicated because there are two counteracting
factors influencing performance:

1) read stream doing larger reads -> considerably faster
2) read stream not doing prefetching -> more IO stalls


1) will be a bigger boon on disks where you're not bottlenecked as much by
interface limits. Whereas SATA is limited to ~500MB/s, NVMe started out at
3GB/s. So this gain will matter more on NVMes.
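
As a rough illustration of why larger reads pay off more on NVMe, here is a
toy model (not a measurement): throughput is the request rate times the read
size, capped by the interface limit. The 500MB/s and 3GB/s limits are the ones
quoted above; the request rate and the two read sizes are hypothetical values
picked only to make the arithmetic visible.

```python
# Toy model: throughput = requests/s * read size, capped by the interface.
# REQUESTS_PER_SEC and the read sizes are made-up illustration values.
MB = 1_000_000

INTERFACE_LIMIT_MB_S = {"SATA": 500, "NVMe": 3000}  # limits quoted in the mail
REQUESTS_PER_SEC = 20_000                           # hypothetical
SMALL_READ = 8 * 1024                               # hypothetical: one 8kB block
LARGE_READ = 128 * 1024                             # hypothetical: combined read


def throughput_mb_s(read_size_bytes, limit_mb_s,
                    requests_per_sec=REQUESTS_PER_SEC):
    """Throughput in MB/s for a fixed request rate, capped by the interface."""
    uncapped = requests_per_sec * read_size_bytes / MB
    return min(uncapped, limit_mb_s)


def speedup(limit_mb_s):
    """Gain from switching SMALL_READ to LARGE_READ under a given limit."""
    return (throughput_mb_s(LARGE_READ, limit_mb_s)
            / throughput_mb_s(SMALL_READ, limit_mb_s))


for name, limit in INTERFACE_LIMIT_MB_S.items():
    print(f"{name}: {throughput_mb_s(SMALL_READ, limit):.0f} MB/s -> "
          f"{throughput_mb_s(LARGE_READ, limit):.0f} MB/s "
          f"({speedup(limit):.1f}x)")
```

Under these assumptions the SATA device hits its 500MB/s ceiling and gains
about 3x, while the NVMe device stays below its limit and gains the full 16x.
Real devices will not keep a constant request rate across read sizes, so treat
this only as a sketch of the capping effect.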


At least on my machine 2) is what causes CPU idle states to kick in, which is
what causes a good bit of the slowdown.  How expensive the idle states are,
how quickly they kick in, etc. seems to depend a lot on CPU model, BIOS
settings and "platform settings" (mainboard manufacturer settings).

The worse a disk is at random IO, the longer the stalls are (adding time), and
the deeper the idle state that can be reached (further increasing latency).
I.e. SATA will be worse.


It might be interesting to run the benchmark with cpu idle states disabled, at
least on the subset of cores you run the test on. E.g.
  cpupower -c 13 idle-set -D1
will disable idle states that have a transition latency of 1us or worse for
core 13.
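
To see in advance which states such a threshold would affect, the per-state
exit latencies can be read from the Linux cpuidle sysfs directory. The sketch
below assumes the usual layout under /sys/devices/system/cpu/cpuN/cpuidle/ and
treats a state as affected when its latency is equal to or higher than the
threshold, which is how the cpupower-idle-set man page describes -D. The
function names are made up for this example.

```python
from pathlib import Path

SYSFS_CPU = "/sys/devices/system/cpu"


def read_idle_states(cpu, sysfs_root=SYSFS_CPU):
    """Return the idle states of one CPU, ordered by state index."""
    cpuidle = Path(sysfs_root) / f"cpu{cpu}" / "cpuidle"
    if not cpuidle.is_dir():
        return []  # no cpuidle support exposed (e.g. some VMs)
    state_dirs = sorted(cpuidle.glob("state[0-9]*"),
                        key=lambda p: int(p.name[len("state"):]))
    states = []
    for state_dir in state_dirs:
        disable_file = state_dir / "disable"
        states.append({
            "index": int(state_dir.name[len("state"):]),
            "name": (state_dir / "name").read_text().strip(),
            # exit latency in microseconds
            "latency_us": int((state_dir / "latency").read_text()),
            "disabled": (disable_file.exists()
                         and disable_file.read_text().strip() != "0"),
        })
    return states


def states_at_or_above(states, latency_us):
    """States that a threshold of latency_us would cover (latency >= value)."""
    return [s for s in states if s["latency_us"] >= latency_us]
```

For example, states_at_or_above(read_idle_states(13), 1) lists the states on
core 13 with an exit latency of at least 1us.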

Sometimes disabling idle states for all cores will have deleterious effects,
due to reducing the thermal budget for turbo boost. E.g. on my older
workstation a core can boost to 3.4GHz if the whole system is at -E (all idle
states enabled) and only 3GHz at -D0 (all idle states disabled).

Instead of disabling idle states, you could also just monitor them (cpupower
monitor <benchmark> or turbostat --quiet <benchmark>).
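
If neither tool is at hand, a cruder form of monitoring is to diff the
kernel's own per-state counters around the benchmark. This is only a sketch:
it assumes the cpuidle sysfs files "usage" (number of entries) and "time"
(residency in microseconds), assumes state names are unique per CPU, and only
looks at one core, so the benchmark should be pinned to that core. The
function names are hypothetical.

```python
import subprocess
from pathlib import Path

SYSFS_CPU = "/sys/devices/system/cpu"


def snapshot_idle_counters(cpu, sysfs_root=SYSFS_CPU):
    """Return {state name: (entry count, residency in us)} for one CPU."""
    cpuidle = Path(sysfs_root) / f"cpu{cpu}" / "cpuidle"
    counters = {}
    if not cpuidle.is_dir():
        return counters
    for state_dir in cpuidle.glob("state[0-9]*"):
        name = (state_dir / "name").read_text().strip()
        usage = int((state_dir / "usage").read_text())
        time_us = int((state_dir / "time").read_text())
        counters[name] = (usage, time_us)
    return counters


def counter_delta(before, after):
    """Per-state (entries, residency us) accumulated between two snapshots."""
    return {
        name: (after[name][0] - before[name][0],
               after[name][1] - before[name][1])
        for name in after if name in before
    }


def idle_delta_during(cmd, cpu, sysfs_root=SYSFS_CPU):
    """Run cmd (an argv list) and report idle-state activity on cpu meanwhile."""
    before = snapshot_idle_counters(cpu, sysfs_root)
    subprocess.run(cmd, check=True)
    after = snapshot_idle_counters(cpu, sysfs_root)
    return counter_delta(before, after)
```

A call could look like
idle_delta_during(["taskset", "-c", "13", "./run-benchmark.sh"], cpu=13),
where run-benchmark.sh stands in for whatever starts the test. Many entries
into deep states during the run would point at the IO stalls described above.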


Greetings,

Andres Freund


