Re: BitmapHeapScan streaming read user and prelim refactoring - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: BitmapHeapScan streaming read user and prelim refactoring
Date
Msg-id CAAKRu_Ysrkav8oBbOm68AjE2h4wd=qTscifcuL3AsfFXvA2_CQ@mail.gmail.com
Whole thread Raw
In response to Re: BitmapHeapScan streaming read user and prelim refactoring  (Andres Freund <andres@anarazel.de>)
Responses Re: BitmapHeapScan streaming read user and prelim refactoring
List pgsql-hackers
Thanks for taking a look. I've pushed the patch to increase the
default effective_io_concurrency.

On Tue, Mar 11, 2025 at 8:07 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2025-03-10 19:45:38 -0400, Melanie Plageman wrote:
> > From 7b35b1144bddf202fb4d56a9b783751a0945ba0e Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman@gmail.com>
> > Date: Mon, 10 Mar 2025 17:17:38 -0400
> > Subject: [PATCH v35 1/5] Increase default effective_io_concurrency to 16
>
> > Moreover, when bitmap heap scan is converted to using the read stream
> > API, a prefetch distance of 1 will prevent read combining which is quite
> > detrimental to performance.
>
> Hm? This one surprises me. Doesn't the read stream code take some pains to
> still perform IO combining when effective_io_concurrency=1? It does work for
> seqscans, for example?

I went back and tried to figure out what I meant by this. When Thomas
and I were investigating various BHS regressions, we did see that
effective_io_concurrency 1 performed particularly badly on the read
stream branch (as compared to master). One thing we noticed was very
little read combining (especially for parallel bitmap heap scan, but
that's a separate bundle of issues). Re-reading the read stream code,
though, you are right that we should have no issue read combining when
eic is 1. In that case, with the default io_combine_limit, max pinned
buffers will be 32, which is definitely enough for read combining.

Anyway, I've removed this text from the commit message.

> > diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> > index d2fa5f7d1a9..8c4409fc8bf 100644
> > --- a/doc/src/sgml/config.sgml
<-- snip-->
> > +         <literal>0</literal> to disable issuance of asynchronous I/O requests.
> > +         The default is <literal>16</literal> on supported systems, otherwise
> > +         <literal>0</literal>. Currently, this setting only affects bitmap heap
> > +         scans.
> >          </para>
>
> I'd probably use this as an occasion to remove "Currently, this setting only
> affects bitmap heap" sentence - afaict it's been wrong for a while and got
> more wrong since vacuum started to use read streams...

Well, technically vacuum uses maintenance_io_concurrency. Bitmap heap
scan is the only one using effective_io_concurrency for prefetching
(since sequential scan doesn't do prefetching). So, it is technically
true. Sequential scans will try to combine IOs only up to the
io_combine_limit so effective_io_concurrency doesn't really matter for
them. I removed this sentence anyway, though. Other read stream users
(including non-in-core ones) could start using
effective_io_concurrency and then it might be more confusing anyway.

- Melanie



pgsql-hackers by date:

Previous
From: Sami Imseih
Date:
Subject: Re: making EXPLAIN extensible
Next
From: Heikki Linnakangas
Date:
Subject: Re: Rename functions to alloc/free things in reorderbuffer.c