From: Gregory Stark
Subject: Re: posix_fadvise v22
Msg-id: 87y6xhmwxq.fsf@oxford.xeocode.com
In response to: Re: posix_fadvise v22 (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

Tom Lane <tgl@sss.pgh.pa.us> writes:

> "Robert Haas" <robertmhaas@gmail.com> writes:
>> OK, here's an update of Greg's patch with the runtime configure test
>> ripped out, some minor documentation tweaks, and a few unnecessary
>> whitespace diff hunks quashed.  I think this is about ready for
>> committer review.
>
> I've started to look through this, and the only part I seriously don't
> like is the nbtsearch.c changes.  I've got three complaints about that:
>
> * Doing it inside the index AMs is wrong, or at the very least forces
> us to do it over for each index AM (which the patch fails to do).

ok

> * As coded, it generates prefetch bursts that are much too large and too
> widely spaced to be effective, not to mention that they entirely
> ignore the effective_io_concurrency control knob as well as the order
> in which the pages will actually be needed.  I wonder now whether
> Robert's inability to see any benefit came because he was testing
> indexscans and not bitmap scans.

Well, experiments showed that it was very effective, even more so than for
bitmap scans. So much so that it nearly obliterated bitmap scans' advantage
over index scans.

I originally thought, like you, that all that logic was integral to the thing,
but eventually came around to the opposite view. That logic is there to
overcome a fundamental problem with bitmap scans: there's no logical group to
prefetch, just a potentially unbounded stream of pages. Index scans just don't
have that problem, so they don't need that extra logic.

> * It's only accidental that it's not kicking in during a bitmap
> indexscan and bollixing up the much-more-carefully-written
> nodeBitmapHeapscan prefetch logic.

ok.

> What I intend to do over the next day or so is commit the prefetch
> infrastructure and the bitmap scan prefetch logic, but I'm bouncing the
> indexscan part back for rework.  I think that it should be implemented
> in or near index_getnext() and pay attention to
> effective_io_concurrency.
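
For reference, the hint itself is just posix_fadvise(POSIX_FADV_WILLNEED) per
block. A minimal standalone sketch of issuing hints capped by a distance knob
(prefetch_blocks and the io_concurrency parameter are illustrative names
standing in for effective_io_concurrency, not actual PostgreSQL API):

```c
#define _XOPEN_SOURCE 600       /* for posix_fadvise */
#include <fcntl.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/*
 * Hint the kernel that `count` blocks starting at `start` will be needed
 * soon, issuing at most `io_concurrency` hints.  Returns the number of
 * hints actually issued.
 */
static int
prefetch_blocks(int fd, long start, int count, int io_concurrency)
{
    int         issued = 0;

    for (int i = 0; i < count && issued < io_concurrency; i++)
    {
        if (posix_fadvise(fd, (off_t) (start + i) * BLCKSZ, BLCKSZ,
                          POSIX_FADV_WILLNEED) == 0)
            issued++;
    }
    return issued;
}
```

The point being that the call is non-blocking, so the cost of issuing a hint
is small; the hard part is deciding which blocks and how far ahead.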

So to do that I would have a few questions.

a) ISTM necessary to keep a dynahash of previously prefetched pointers around
   to avoid repeatedly prefetching the same pages. That could get quite large
   though. Or do you think it would be fine to visit the buffer cache,
   essentially using that as the hash, for every index pointer?

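On (a), what I have in mind is roughly the following: a toy open-addressing
table standing in for a dynahash, where insertion tells you whether a hint is
worth issuing. All names here are made up for illustration:

```c
#include <stdbool.h>
#include <string.h>

#define PREFETCH_HASH_SIZE 1024 /* illustrative; a dynahash would grow */

typedef struct PrefetchSeen
{
    long        blocks[PREFETCH_HASH_SIZE];
    bool        used[PREFETCH_HASH_SIZE];
} PrefetchSeen;

static void
seen_init(PrefetchSeen *s)
{
    memset(s->used, 0, sizeof(s->used));
}

/*
 * Return true the first time a block number is seen (i.e. a prefetch hint
 * should be issued), false on repeats.  Open addressing, no eviction.
 */
static bool
seen_insert(PrefetchSeen *s, long blkno)
{
    unsigned    h = (unsigned) (blkno * 2654435761u) % PREFETCH_HASH_SIZE;

    while (s->used[h])
    {
        if (s->blocks[h] == blkno)
            return false;       /* already prefetched */
        h = (h + 1) % PREFETCH_HASH_SIZE;
    }
    s->used[h] = true;
    s->blocks[h] = blkno;
    return true;
}
```

The sizing worry above is exactly this table: with no eviction it grows with
the number of distinct heap blocks the scan touches.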
b) How would index_getnext keep two read pointers like bitmap heap scans? I
   think this would require adding a new index AM API similar to your
   tuplestore API where the caller can maintain multiple read pointers in the
   scan. And I'm not sure how that would work with mark/restore.

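On (b), the two-pointer shape I mean, sketched over a plain array of heap
block numbers rather than a real index scan (IndexScanSketch, scan_getnext,
and target_distance are invented names; the real thing would have to live
behind the index AM API):

```c
typedef struct IndexScanSketch
{
    const long *tids;           /* heap blocks in scan order (simplified) */
    int         ntids;
    int         fetch_pos;      /* next tuple the executor will consume */
    int         prefetch_pos;   /* how far ahead hints have been issued */
    int         target_distance;    /* cf. effective_io_concurrency */
} IndexScanSketch;

/*
 * Keep the prefetch pointer target_distance ahead of the fetch pointer;
 * returns the number of new hints "issued".
 */
static int
scan_advance(IndexScanSketch *scan)
{
    int         issued = 0;
    int         limit = scan->fetch_pos + scan->target_distance;

    if (limit > scan->ntids)
        limit = scan->ntids;
    while (scan->prefetch_pos < limit)
    {
        /* real code: posix_fadvise/PrefetchBuffer on tids[prefetch_pos] */
        scan->prefetch_pos++;
        issued++;
    }
    return issued;
}

static long
scan_getnext(IndexScanSketch *scan)
{
    if (scan->fetch_pos >= scan->ntids)
        return -1;
    scan_advance(scan);
    return scan->tids[scan->fetch_pos++];
}
```

The mark/restore problem is that both positions, not just fetch_pos, would
have to be saved and restored consistently.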
c) I would be afraid that pushing the index scan to reach for the next index
   leaf page prematurely (and not just an async prefetch, but a blocking read)
   would cause extra random I/O which would slow down the critical path of
   reading the current index tuples. So I think we would still want to pause
   when we hit the end of the current leaf page. That would require some form
   of feedback in the index AM API as well.

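To make (c) concrete, the prefetch loop I'd want is bounded by the current
leaf page rather than by the distance target alone, so it can never force a
blocking read of the next leaf. A sketch (LeafPage and prefetch_within_leaf
are illustrative names):

```c
/* Heap block numbers pointed to by the current leaf page (simplified). */
typedef struct LeafPage
{
    const long *heap_blocks;
    int         nitems;
} LeafPage;

/*
 * Issue hints only for pointers on the already-read leaf page, starting at
 * `from`, up to `max_hints`.  Stops at the page boundary: stepping right to
 * the next leaf would put a random read on the critical path.
 */
static int
prefetch_within_leaf(const LeafPage *leaf, int from, int max_hints)
{
    int         issued = 0;

    for (int i = from; i < leaf->nitems && issued < max_hints; i++)
    {
        /* real code: posix_fadvise/PrefetchBuffer on heap_blocks[i] */
        issued++;
    }
    return issued;
}
```

The feedback I mean in the AM API is essentially the return value here: the
caller learns it ran out of pointers on this leaf and can resume hinting only
after the next leaf has been read anyway.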
-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning

