Re: Pluggable Storage - Andres's take - Mailing list pgsql-hackers
From | Amit Khandekar |
---|---|
Subject | Re: Pluggable Storage - Andres's take |
Date | |
Msg-id | CAJ3gD9ccZF1WUBaZ6V4K64pX4-R3BBFE8ZJ_CjQVrR=LLApa+Q@mail.gmail.com Whole thread Raw |
In response to | Re: Pluggable Storage - Andres's take (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Pluggable Storage - Andres's take
|
List | pgsql-hackers |
On Thu, 21 Feb 2019 at 04:17, Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Feb 8, 2019 at 5:18 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > In the attached v1 patch, the prefetch_distance is calculated as > > effective_io_concurrency + 10. Also it has some cosmetic changes. > > I did a little brief review of this patch and noticed the following things. > > +} PrefetchState; > That name seems too generic. Ok, so something like XidHorizonPrefetchState ? On similar lines, does prefetch_buffer() function name sound too generic as well ? > > +/* > + * An arbitrary way to come up with a pre-fetch distance that grows with io > + * concurrency, but is at least 10 and not more than the max effective io > + * concurrency. > + */ > > This comment is kinda useless, because it only tells us what the code > does (which is obvious anyway) and not why it does that. Saying that > your formula is arbitrary may not be the best way to attract support > for it. Well, I had checked the way the number of drive spindles (effective_io_concurrency) is used to calculate the prefetch distance for bitmap heap scans (ComputeIoConcurrency). Basically I think the intention behind that method is to come up with a number that makes it highly likely that we pre-fetch a block of each of the drive spindles. But I didn't get how that exactly works, all the less for non-parallel bitmap scans. Same is the case for the pre-fetching that we do here for xid-horizon stuff, where we do the block reads sequentially. Me and Andres discussed this offline, and he was of the opinion that this formula won't help here, and instead we just keep a constant distance that is some number greater than effective_io_concurrency. I agree that instead of saying "arbitrary" we should explain why we have done that, and before that, come up with an agreed-upon formula. > > + for (i = prefetch_state->next_item; i < nitems && count < prefetch_count; i++) > > It looks strange to me that next_item is stored in prefetch_state and > nitems is passed around as an argument. Is there some reason why it's > like that? We could keep the max count in the structure itself as well. There isn't any specific reason for not keeping it there. It's just that this function prefetch_state () is not a general function for maintaining a prefetch state that spans across function calls; so we might as well just pass the max count to that function instead of having another field in that structure. I am not inclined specifically towards either of the approaches. > > + /* prefetch a fixed number of pages beforehand. */ > > Not accurate -- the number of pages we prefetch isn't fixed. It > depends on effective_io_concurrency. Yeah, will change that in the next patch version, according to what we conclude about the prefetch distance calculation. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
pgsql-hackers by date: