Re: BitmapHeapScan streaming read user and prelim refactoring - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: BitmapHeapScan streaming read user and prelim refactoring |
Date | |
Msg-id | CAAKRu_aqveUgWRhKjDMj-uVFjiKd_XFutXO-31nhEfproswT9g@mail.gmail.com |
In response to | Re: BitmapHeapScan streaming read user and prelim refactoring (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: BitmapHeapScan streaming read user and prelim refactoring
|
List | pgsql-hackers |
On Mon, Feb 10, 2025 at 4:24 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Feb 10, 2025 at 1:11 PM Tomas Vondra <tomas@vondra.me> wrote:
> > Certainly for the "localized" regressions, and cases when bitmapheapscan
> > would not be picked. The eic=1 case makes me a bit more nervous, because
> > it's default and affects NVMe storage. Would be good to know why is
> > that, or perhaps consider bumping up eic default. Not sure.
>
> I'm relatively upset by the fact that effective_io_concurrency is
> measured in units that are, AFAIUI, completely stupid. The original
> idea was that you would set it to the number of spindles you have. But
> from what I have heard, that didn't actually work: you needed to set
> it to a significantly higher number. But in 2025, you probably don't
> even have spindles any more, because you're probably on SSD or some
> other modern storage medium rather than a rotating hard drive. And if
> you do have spindles, do you know how many you have? Is that even a
> meaningful concept?

I had taken to thinking of it as "queue depth". But I think that's not
really accurate. Do you know why we set it to 1 as the default? I
thought it was because the default should be just prefetch one block
ahead. But if it was meant to be spindles, was it that most people had
one spindle (I don't really know what a spindle is)?

> I am not saying that it's this patch's job to replace
> effective_io_concurrency with something better. I think adopting the
> streaming read interface is pretty important, and if it works out that
> we should also change the default effective_io_concurrency from 1 to 2
> or 17 or 42715 in the same release, fine.

Yes, I think we should probably change it to be higher.

However, this exercise today made me realize that it is going to be
pretty difficult to completely emulate the IO behavior of parallel
bitmap heap scans on master with this patch. The way the different
processes and the two iterators happened to interact had all sorts of
random side effects. (See, for example, this [1] bug we discovered and
didn't end up fixing because it would be hard and we figured we would
replace this all with the read stream API).

> But I think that in the
> slightly longer term, it would be a really good idea for someone to
> propose a more sensible model. It's hard to think of a clearer case of
> a parameter being rendered meaningless by the march of technology. The
> closest analogue that comes to mind is our sorting implementation used
> to have a hard coded number of tape drives, when the underlying
> implementation was a bunch of files all on the same filesystem, but
> that wasn't a user-settable parameter.

I imagine if I said that we should start calling it prefetch_distance,
you would say that no one would know what to configure it to and that
would be just as bad.

- Melanie

[1] https://www.postgresql.org/message-id/20240315211449.en2jcmdqxv5o6tlz%40alap3.anarazel.de