
From: Gregory Stark
Subject: Re: Commitfest patches
Msg-id: 8763uxdotq.fsf@oxford.xeocode.com
In response to: Re: Commitfest patches (Martijn van Oosterhout <kleptog@svana.org>)
List: pgsql-hackers
"Martijn van Oosterhout" <kleptog@svana.org> writes:

> - I think normal index scans could benefit from this (it was measurable
> when I was playing with AIO in postgres a few years back).

I don't want to torture any existing code paths to get prefetching to work.
Heikki suggested I take advantage of the page-at-a-time index scanning,
though, which does sound like it could be convenient.

> - I think the number of preread_count is far too small, given you get a
> benefit even if you only have one spindle.
> - I don't understand the ramp-up method either.

The idea is that I was deathly afraid of being accused of doing unnecessary
additional I/O. So I was worried about the user doing something like SELECT
... LIMIT 1, or worse, starting a select and discarding it after fetching
only a handful of records.

I also didn't want to start the bitmap scan by doing hundreds of syscalls
before we even return the first record.

So I figured it's safer to read only the first block to begin with. Only if
the user goes on to the next record do we try prefetching the next block. For
each record the user reads we bump the prefetch distance up by one, until we
hit the goal value for the size of RAID array we're using.

That also nicely spreads out the syscalls so we get one prefetch between each
record returned.
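
To make the ramp-up concrete, here's a minimal standalone sketch of that
policy. It's illustrative only -- "preread_count" and "target" are made-up
names, and in the real scan the prefetch itself would be an advisory read
along the lines of posix_fadvise(..., POSIX_FADV_WILLNEED):

/*
 * Minimal sketch of the linear ramp-up -- not the actual patch code.
 * preread_count is the current prefetch distance; target stands in
 * for the goal derived from the RAID array size.
 */
#include <stdio.h>

int
main(void)
{
    int     target = 16;        /* illustrative goal distance */
    int     preread_count = 0;  /* start by reading just one block */
    int     rec;

    for (rec = 1; rec <= 20; rec++)
    {
        /*
         * The user came back for another record, so widen the window
         * by one block.  The first record never triggers a prefetch.
         */
        if (rec > 1 && preread_count < target)
            preread_count++;

        /*
         * Here the scan would issue one advisory read for the block
         * preread_count blocks ahead, then return record 'rec'.
         */
        printf("record %2d: prefetch distance %d\n", rec, preread_count);
    }
    return 0;
}

A scan that stops after the first record never prefetches anything, so the
worst case for SELECT ... LIMIT 1 is zero extra I/O.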

We also don't know how densely packed the records are on the pages. If
they're densely packed then we'll already have prefetched a whole batch of
blocks by the time we reach the second page. But if there's only one record
per page and we're already on the second page, I figured that indicates we'll
be reading many pages with sparsely distributed records. So we ramp the
prefetch up exponentially, doubling it each time we move to the next page.
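
As pseudo-C, the combined update rule might look like this (again just a
sketch with made-up names; new_page is true when the scan has just crossed
onto another heap page):

/*
 * Illustrative combined update rule, which would slot into the loop
 * in the previous sketch in place of the plain increment.
 */
static int
next_prefetch_distance(int current, int target, int new_page)
{
    if (new_page)
    {
        /* sparse matches: double the window at each page boundary */
        int     doubled = (current > 0) ? current * 2 : 1;

        return (doubled < target) ? doubled : target;
    }

    /* dense matches: grow by one block per record returned */
    return (current < target) ? current + 1 : target;
}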

The net result is that a bitmap scan processing lots of pointers quickly
reaches the point where it's prefetching the number of blocks indicated by
effective_spindle_count. If you're just processing the first few tuples it
only reads a small number of extra pages. And if you're only processing one
record it won't prefetch at all.
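
Roughly speaking the goal value amounts to keeping a few requests in flight
per spindle, something like the following (the constant here is arbitrary,
purely for illustration, and not necessarily how the goal is actually
derived):

/*
 * Hypothetical mapping from effective_spindle_count to the prefetch
 * goal -- an assumption for illustration only.
 */
#define REQUESTS_PER_SPINDLE 4      /* assumed constant */

static int
prefetch_goal(int effective_spindle_count)
{
    return effective_spindle_count * REQUESTS_PER_SPINDLE;
}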

> People spend a lot of time worrying about hundreds of posix_fadvise()
> calls but you don't need anywhere near that much to be effective. With
> AIO I limited the number of outstanding requests to a dozen and it was
> still useful. You lose nothing by capping the number of requests at any
> point.

Well, you leave money on the table. But yes, I'm trying to be conservative
about how much to prefetch when we don't know that it's in our favour.

>> I want to know if we're interested in the more invasive patch restructuring
>> the buffer manager. My feeling is that we probably are eventually. But I
>> wonder if people wouldn't feel more comfortable taking baby steps at first
>> which will have less impact in cases where it's not being heavily used.
>
> I think the way it is now is neat and simple and enough for now.

Thanks.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

