Gregory Stark wrote:
> I want to know if we're interested in the more invasive patch restructuring
> the buffer manager. My feeling is that we probably are eventually. But I
> wonder if people wouldn't feel more comfortable taking baby steps at first
> which will have less impact in cases where it's not being heavily used.
>
> It's a lot more work to do the invasive patch and there's not much point in
> doing it if we're not interested in taking it when it's ready.

I like the simplicity of this patch. My biggest worry is the impact on
performance when the posix_fadvise calls are not helping. I'd like to
see some testing of that. How expensive are the extra bufmgr hash table
lookups, compared to all the other stuff that's involved in a bitmap
heap scan?
Another thing I wonder is whether this approach can easily be extended
to other types of scans than bitmap heap scans. Normal index scans seem
most interesting; they can generate a lot of random I/O. I don't see any
big problems there, except again the worst-case performance: If you
issue AdviseBuffer calls for all the heap tuples in an index page, in
the naive way, you can issue hundreds of posix_fadvise calls. But what
if they're all actually on the same page? Surely you only want to go
through the buffer manager once (like we do in the scan, thanks to
ReleaseAndReadBuffer()) in that case, and only call posix_fadvise once.
But again, maybe you can convince me that it's a non-issue by some
benchmarking.
> Here's an updated patch. It's mainly just a CVS update but it also includes
> some window dressing: an autoconf test, some documentation, and some fancy
> arithmetic to auto-tune the amount of prefetching based on a more real-world
> parameter "effective_spindle_count".

I like the "effective_spindle_count". That's very intuitive.
The formula to turn that into preread_pages looks complex, though. I can
see the reasoning behind it, but I doubt things behave that beautifully
in the real world. (That's not important right now; we have plenty of
time to tweak it after more testing.)
> It also includes printing out how much
> the prefetching target got ramped up to in the explain output -- though I'm
> not sure we really want that in the tree, but it's nice for performance
> testing.

I don't understand the ramp up logic. Can you explain what that's for
and how it works?
--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com