Re: Prereading using posix_fadvise - Mailing list pgsql-hackers

From Gregory Stark
Subject Re: Prereading using posix_fadvise
Date
Msg-id 87ve36yd6o.fsf@oxford.xeocode.com
In response to Re: Prereading using posix_fadvise (was Re: Commitfest patches)  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers

Someone wrote:
>>>
>>> Should we consider only telling the kernel X pages ahead, meaning when
>>> we are on page 10 we tell it about page 16?

The patch I posted specifically handles bitmap heap scans. It does in fact
prefetch only a limited number of pages from the bitmap stream, based on a GUC,
but it tries to be a bit clever about ramping up gradually.

The real danger here, imho, is doing read-ahead for blocks the client never
ends up reading. By ramping up the read-ahead gradually as the client reads
records, we protect against that.
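
To make the ramp-up idea concrete, here is a minimal sketch in C. It is not
the code from the patch: the PrefetchState struct, the function names, the
doubling rule and the 8kB BLOCK_SIZE are all assumptions made up for
illustration, and max_target merely stands in for the GUC. The only real
interface used is posix_fadvise() with POSIX_FADV_WILLNEED.

```c
#define _POSIX_C_SOURCE 200112L   /* for posix_fadvise() */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

#define BLOCK_SIZE 8192           /* assumed block size, in bytes */

/* Hypothetical per-scan prefetch state (not the patch's actual struct). */
typedef struct PrefetchState
{
    int target;      /* current prefetch distance, in blocks */
    int max_target;  /* upper bound; stands in for the GUC */
    int issued;      /* index of the next stream entry not yet advised */
} PrefetchState;

static void
prefetch_init(PrefetchState *st, int max_target)
{
    assert(max_target >= 0);
    st->target = 0;
    st->max_target = max_target;
    st->issued = 0;
}

/* Assumed ramp-up rule: 0 -> 1 -> 2 -> 4 -> ... capped at max_target. */
static int
prefetch_next_target(int target, int max_target)
{
    if (target >= max_target)
        return max_target;
    if (target == 0)
        return 1;
    if (target > max_target / 2)
        return max_target;
    return target * 2;
}

/*
 * Call once per block the scan actually reads. "blocks" is the ordered
 * list of block numbers coming out of the bitmap, "current" is the index
 * of the block being read now. Returns the number of blocks advised.
 */
static int
prefetch_advance(int fd, PrefetchState *st,
                 const long *blocks, int nblocks, int current)
{
    int hinted = 0;

    /* Widen the window only because the client really read a block. */
    st->target = prefetch_next_target(st->target, st->max_target);

    /* Never advise the block being read now, or earlier ones. */
    if (st->issued <= current)
        st->issued = current + 1;

    while (st->issued < nblocks && st->issued <= current + st->target)
    {
        /* Advice is only a hint, so the return value is ignored. */
        (void) posix_fadvise(fd, (off_t) blocks[st->issued] * BLOCK_SIZE,
                             BLOCK_SIZE, POSIX_FADV_WILLNEED);
        st->issued++;
        hinted++;
    }
    return hinted;
}
```

With this rule a scan that stops after a handful of rows has only asked the
kernel for a handful of blocks, while a scan that keeps going reaches the
full prefetch distance after a few reads.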

> Heikki Linnakangas wrote:
>> 
>> Yes. You don't want to fire off thousands of posix_fadvise calls 
>> upfront. That'll just flood the kernel, and it will most likely ignore 
>> any advice after the first few hundred or so. I'm not sure what the 
>> appropriate amount of read ahead would be, though. Probably depends a 
>> lot on the OS and hardware, and needs to be adjustable.

"Bruce Momjian" <bruce@momjian.us> writes:
>
> And if you read-ahead too far the pages might get pushed out of the
> kernel cache before you ask to read them.

While these concerns aren't entirely baseless, the actual experiments seem to
show the point of diminishing returns is pretty far out there. Look at the
graphs below, keeping in mind that the X axis is the number of blocks
prefetched.

http://archives.postgresql.org/pgsql-hackers/2007-12/msg00088.php

The pink case is analogous to a bitmap index scan where the blocks are read in
order. In that case the point of diminishing returns is reached around 64
pages. But performance doesn't actually dip until around 512 pages. And even
prefetching 8,192 blocks the negative impact on performance is still much less
severe than using a smaller-than-optimal prefetch size.

This is on a piddly little 3-way RAID. On a larger RAID you would want even
larger prefetch sizes.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

