Re: BitmapHeapScan streaming read user and prelim refactoring - Mailing list pgsql-hackers
From: Melanie Plageman
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Msg-id: CAAKRu_Z57k4=Jka_roxAzE4B0bJxYg_eaWcQ6doH9uf580702Q@mail.gmail.com
In response to: Re: BitmapHeapScan streaming read user and prelim refactoring (Tomas Vondra <tomas@vondra.me>)
List: pgsql-hackers
On Sun, Feb 9, 2025 at 9:27 AM Tomas Vondra <tomas@vondra.me> wrote:
>
> 2) ryzen
> --------
>
> This "new" machine has multiple types of storage. The cached results (be
> it in shared buffers or in page cache) are not very interesting. 0003
> helps a bit (~15%), but other than that it's just random noise.
>
> The "uncached" results starting on page 23 are much more interesting. In
> general 0001, 0002 and 0004 have little impact; it seems to be just
> random noise. So in the rest I'll focus on 0003.
>
> For the single nvme device (device: data), it seems mostly fine. It's
> green, even though there are a couple "localized regressions" for eic=0.
> I haven't looked into those yet.
>
> For the nvme RAID (device: raid-nvme), it looks almost exactly the
> same, except that with parallel query (page 27) there's a clear area of
> regression with eic=1 (look for the "column" of red cells). That's a bit
> unfortunate, because eic=1 is the default value.

It'll be hard to look into all of these, so I think I'll focus on trying
to reproduce something with eic=1 on my machine. So far, I can reproduce
a regression with the following steps and the data file attached.
# initdb and get set up with shared_buffers 1GB
psql -c "create table bitmap_scan_test (a bigint, b bigint, c text) with (fillfactor = 25)"
psql -c "copy bitmap_scan_test from '/tmp/bitmap_scan_test.data'"
psql -c "create index on bitmap_scan_test (a)"
psql -c "vacuum analyze"
psql -c "checkpoint"
pg_ctl stop
echo 3 | sudo tee /proc/sys/vm/drop_caches
pg_ctl start

psql -c "SET max_parallel_workers_per_gather = 4;" \
     -c "SET effective_io_concurrency = 1;" \
     -c "SET parallel_setup_cost = 0;" \
     -c "SET parallel_tuple_cost = 0;" \
     -c "SET enable_seqscan = off;" \
     -c "SET enable_indexscan = off;" \
     -c "SET work_mem = 65536;"

psql -c "EXPLAIN SELECT * FROM bitmap_scan_test WHERE (a BETWEEN -33 AND 10015) OFFSET 1000000;"
psql -c "SELECT * FROM bitmap_scan_test WHERE (a BETWEEN -33 AND 10015) OFFSET 1000000;"

It's not a huge regression, and the planner doesn't naturally pick a
parallel bitmap heap scan for this, but I don't have a SATA drive right
now, so I focused on something I could reproduce.

One thing I noticed when I was playing around with the script is that,
depending on the values chosen by random(), there were differences in
timing. From your script, it looks like $from and $to won't be the same
for master and the patch each time (they are set in the innermost
nesting level, below where $build is set). Am I understanding correctly?

> Anyway, the results look sensible. It might be good to investigate some
> of the regressions, and I'll try doing that if I find the time. But I
> don't think that's necessarily a blocker - every patch of this type will
> have some hardware where the heuristics don't quite do the right thing by
> default. Which is why we have GUCs to tune it if appropriate.

Yea, I definitely won't be able to look into all of the regressions. So,
I guess we have to ask if we are willing to make the tradeoff.

- Melanie
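To illustrate the loop-nesting point about $from/$to: a minimal bash
sketch (hypothetical, not the actual benchmark script, whose variable
and build names are assumptions here) of drawing the random bounds once
per iteration, above the per-build loop, so master and the patched
build are timed against the same range:

```shell
#!/bin/bash
# Hypothetical driver sketch: gen_range is called in the OUTER loop,
# so every build in the inner loop queries identical bounds.

gen_range() {
    # Mimics the ranges in the reproduction query above (assumption).
    from=$(( RANDOM % 10000 - 33 ))
    to=$(( from + RANDOM % 10000 ))
}

for run in 1 2 3; do
    gen_range                       # same $from/$to for every build below
    for build in master patched; do
        # placeholder for: start $build, time the bitmap heap scan query
        echo "run=$run build=$build from=$from to=$to"
    done
done
```

If gen_range were instead called inside the $build loop, each build
would see different bounds and the timings would not be comparable.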