Re: Prereading using posix_fadvise (was Re: Commitfest patches) - Mailing list pgsql-hackers

From Zeugswetter Andreas OSB SD
Subject Re: Prereading using posix_fadvise (was Re: Commitfest patches)
Msg-id E1539E0ED7043848906A8FF995BDA57902EB1583@m0143.s-mxs.net
In response to Prereading using posix_fadvise (was Re: Commitfest patches)  (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses Re: Prereading using posix_fadvise (was Re: Commitfest patches)
List pgsql-hackers
Heikki wrote:
> It seems that the worst case for this patch is a scan on a table that
> doesn't fit in shared_buffers, but is fully cached in the OS cache. In
> that case, the posix_fadvise calls would be a certain waste of time.

I think this is a misunderstanding: the fadvise is not issued to read the
whole table, and it is not issued for table scans at all (and if it were,
it would only advise for the next N pages).

So it has nothing to do with table size. The fadvise calls need to be
(and are) limited to what can be used in the near future, not to the
whole statement.
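To make the "next N pages" bound concrete, here is a minimal C sketch, assuming a file read in PostgreSQL's default 8 kB blocks (BLCKSZ) and a reader whose position only moves forward. PrefetchState, next_prefetch_range() and prefetch_ahead() are hypothetical names for illustration, not the patch's code. Each call advises at most N blocks past the current one and skips blocks already covered, so the number of posix_fadvise(POSIX_FADV_WILLNEED) hints tracks what will be read soon, not the size of the table.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* exposes posix_fadvise() */
#endif
#include <fcntl.h>
#include <sys/types.h>

#define BLCKSZ 8192L  /* PostgreSQL's default page size */

/* Hypothetical per-reader state: how far ahead has already been advised. */
typedef struct {
    long advised_upto; /* first block number NOT yet advised */
    long window;       /* N: maximum number of blocks to advise ahead */
} PrefetchState;

/*
 * Compute the blocks to advise when the reader is at block `cur`:
 * at most `window` blocks past `cur`, skipping blocks already advised
 * and never going past `nblocks`. Returns the count (0 if nothing new)
 * and stores the first block to advise in *start.
 */
long next_prefetch_range(PrefetchState *st, long cur, long nblocks, long *start)
{
    long first = cur + 1;
    long end = cur + 1 + st->window;

    if (first < st->advised_upto)
        first = st->advised_upto;
    if (end > nblocks)
        end = nblocks;
    if (first >= end)
        return 0;
    *start = first;
    st->advised_upto = end;
    return end - first;
}

/* Issue one WILLNEED hint for the computed range; returns 0 or an errno value. */
int prefetch_ahead(int fd, PrefetchState *st, long cur, long nblocks)
{
    long start = 0;
    long count = next_prefetch_range(st, cur, nblocks, &start);

    if (count == 0)
        return 0;
    return posix_fadvise(fd, (off_t) start * BLCKSZ, (off_t) count * BLCKSZ,
                         POSIX_FADV_WILLNEED);
}
```

With window = 4, a reader at block 0 advises blocks 1 to 4; moving on to block 1 adds only block 5, and near the end of the file the range is clipped to the last block.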

For example: the next N relevant index pages one level down, or the N
relevant heap pages that one index leaf page points at. In the index case
N may not need to be limited at all, since there is a natural limit on
how many pointers fit on one page.
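For the index case, a minimal sketch of collecting the heap pages that one leaf page points at, assuming a simplified (block, offset) heap pointer. HeapTid and heap_blocks_to_prefetch() are hypothetical, not PostgreSQL's actual index structures. Several TIDs often land on the same heap page, so duplicates are dropped, and one WILLNEED hint per returned block would then be issued.

```c
#include <stddef.h>

/* Simplified stand-in for an index tuple's heap pointer (hypothetical). */
typedef struct {
    unsigned int   block;  /* heap block number */
    unsigned short offset; /* line pointer within that block */
} HeapTid;

/*
 * Collect the distinct heap blocks referenced by one leaf page's TIDs,
 * in first-seen order, stopping after `limit` blocks. The caller would
 * issue one POSIX_FADV_WILLNEED per returned block. Returns the count.
 */
size_t heap_blocks_to_prefetch(const HeapTid *tids, size_t ntids,
                               unsigned int *out, size_t limit)
{
    size_t n = 0;

    for (size_t i = 0; i < ntids && n < limit; i++) {
        int seen = 0;

        /* A linear scan is fine: one leaf page holds only a few hundred TIDs. */
        for (size_t j = 0; j < n; j++) {
            if (out[j] == tids[i].block) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            out[n++] = tids[i].block;
    }
    return n;
}
```

Passing limit = ntids effectively removes the bound, leaving the page's own capacity as the only limit.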

In general, I think separate reader processes (or threads :-) that read
directly into the buffer pool would be a more portable and efficient
implementation; for example, they could use scatter/gather I/O. So we
should make sure that the fadvise solution does not obstruct that path,
but I don't think it does.
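As an illustration of the scatter/gather part only (not of the reader processes themselves), the following sketch reads several consecutive blocks into separate, not necessarily adjacent, buffer slots with a single preadv() call. Note that preadv() is a Linux/BSD extension rather than POSIX; read_blocks_scattered() and the 16-slot cap are assumptions made for the example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* glibc: exposes preadv() */
#endif
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>

#define BLCKSZ 8192
#define MAX_SLOTS 16   /* arbitrary cap for the sketch */

/*
 * Read `nblocks` consecutive blocks starting at `first_block` from `fd`
 * into `nblocks` separate buffer slots. Returns bytes read, or -1 with
 * errno set (EINVAL for a bad slot count, otherwise set by preadv).
 */
ssize_t read_blocks_scattered(int fd, long first_block, char *slots[], int nblocks)
{
    struct iovec iov[MAX_SLOTS];

    if (nblocks < 1 || nblocks > MAX_SLOTS) {
        errno = EINVAL;
        return -1;
    }
    for (int i = 0; i < nblocks; i++) {
        iov[i].iov_base = slots[i];   /* each slot is one BLCKSZ-sized buffer */
        iov[i].iov_len = BLCKSZ;
    }
    /* One system call fills all slots from consecutive file offsets. */
    return preadv(fd, iov, nblocks, (off_t) first_block * BLCKSZ);
}
```

In a real buffer pool each slot would be a free buffer page; a short read (fewer than nblocks * BLCKSZ bytes, e.g. at end of file) would still need handling.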

Gregory wrote:
>> A more invasive form of this patch would be to assign and pin a buffer when
>> the preread is done. That would mean subsequently we would have a pinned buffer
>> ready to go and not need to go back to the buffer manager a second time. We
>> would instead just "complete" the i/o by issuing a normal read call.

I guess you would rather need to mark the buffer as reserved for this
page, but let whichever backend needs it first pin it and issue the read.
I think the fadviser should not pin it in advance, since it cannot
guarantee that it will actually read the page [soon]. Rather, remember the
buffer, and later check it and pin it for the read; otherwise you might be
blocking the buffer. Still, I think doing something like this might be
good, since it avoids issuing duplicate fadvises.
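A minimal sketch of the "mark but don't pin" idea, using a hypothetical four-slot buffer table (Buffer, advise_page() and read_page() are illustrative only, not PostgreSQL's buffer manager API). The prefetch side only records which page a buffer is reserved for; whichever backend asks for the page first pins it and performs the real read, and a second advise for the same page is suppressed, so no duplicate fadvise is issued.

```c
#include <stdbool.h>

#define NBUFFERS 4

typedef enum { BUF_FREE, BUF_ADVISED, BUF_VALID } BufState;

/* Hypothetical, much-simplified buffer descriptor. */
typedef struct {
    long     page;   /* page this buffer is reserved for (-1 if none) */
    BufState state;
    int      pins;   /* backends currently using the buffer */
} Buffer;

static Buffer pool[NBUFFERS];

void pool_init(void)
{
    for (int i = 0; i < NBUFFERS; i++)
        pool[i] = (Buffer){ .page = -1, .state = BUF_FREE, .pins = 0 };
}

static int lookup(long page)
{
    for (int i = 0; i < NBUFFERS; i++)
        if (pool[i].page == page)
            return i;
    return -1;
}

/*
 * Prefetch side: reserve a free buffer for `page` WITHOUT pinning it.
 * Returns true if the caller should issue posix_fadvise(); false if the
 * page is already advised or cached, or no free buffer is available.
 */
bool advise_page(long page)
{
    if (lookup(page) >= 0)
        return false;             /* already known: skip duplicate fadvise */
    for (int i = 0; i < NBUFFERS; i++) {
        if (pool[i].state == BUF_FREE) {
            pool[i] = (Buffer){ .page = page, .state = BUF_ADVISED, .pins = 0 };
            return true;
        }
    }
    return false;                 /* no free buffer: just skip the hint */
}

/*
 * Reader side: the first backend that needs the page pins the buffer and,
 * if it was only advised, performs the actual read. Returns the buffer
 * index, or -1 if the page has no buffer (normal allocation not shown).
 */
int read_page(long page, bool *did_read)
{
    int i = lookup(page);

    *did_read = false;
    if (i < 0)
        return -1;
    pool[i].pins++;
    if (pool[i].state == BUF_ADVISED) {
        /* the real read(2) into the buffer would happen here */
        pool[i].state = BUF_VALID;
        *did_read = true;
    }
    return i;
}
```

Because an advised buffer keeps pins == 0, a real replacement policy could still reclaim it if the page is never read (eviction is not modelled here), which avoids the blocking problem that pinning in advance would cause.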

Andreas

