
FW: [PATCH] Prefetch index pages for B-Tree index scans - Mailing list pgsql-hackers

From	John Lumby
Subject	FW: [PATCH] Prefetch index pages for B-Tree index scans
Date	November 1, 2012 19:41:29
Msg-id	COL116-W28048CDCCE2DC5D30C4340A3600@phx.gbl
In response to	Re: [PATCH] Prefetch index pages for B-Tree index scans (John Lumby <johnlumby@hotmail.com>)
List	pgsql-hackers

Claudio wrote :
>
> Check the latest patch, it contains heap page prefetching too.
>

Oh yes, I see. I missed that; I was looking in the wrong place.
I do have one question about the way you did it: by placing the
heap-page prefetch calls in _bt_next, which effectively means inside
a call from the index AM's index_getnext_tid to btgettuple, are you sure
you are synchronizing your prefetches of heap pages with the index AM's
ReadBuffer calls on heap pages? I.e., are you complying with this comment
from nodeBitmapHeapscan.c about prefetching heap pages in
the bitmap-index-scan case:

* We issue prefetch requests *after* fetching the current page to try
* to avoid having prefetching interfere with the main I/O.

I can't really tell whether your design conforms to this, nor do I
know whether it is important, but I decided to do it in the same manner,
and so implemented the heap-page prefetching in index_fetch_heap.
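
For illustration only (this is not code from either patch), the ordering that comment describes can be sketched outside PostgreSQL with plain POSIX calls: read the current block first, and only then hint the kernel about the next one. The function name read_then_prefetch, the BLOCK_SIZE constant, and the block-number arguments are all hypothetical; the real code works through the buffer manager rather than raw file descriptors.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for pread() and posix_fadvise() */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed page size for this sketch (PostgreSQL's default BLCKSZ is 8192). */
#define BLOCK_SIZE 8192

/*
 * Hypothetical helper: read block `cur` synchronously, and only afterwards
 * advise the kernel that block `next` will be wanted soon. Pass next < 0
 * to skip the hint. Returns the byte count from pread(), or -1 on error.
 */
static ssize_t
read_then_prefetch(int fd, long cur, long next, char *buf)
{
    /* Main I/O first, so the hint cannot get in front of it. */
    ssize_t n = pread(fd, buf, BLOCK_SIZE, (off_t) cur * BLOCK_SIZE);

    if (n < 0)
        return -1;

    /* The advice is only a hint, so its result is deliberately ignored. */
    if (next >= 0)
        (void) posix_fadvise(fd, (off_t) next * BLOCK_SIZE,
                             BLOCK_SIZE, POSIX_FADV_WILLNEED);

    return n;
}
```

A caller walking a list of heap block numbers would call this once per block, passing the following block number as `next`.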

>
> async_io indeed may make that logic obsolete, but it's not redundant
> posix_fadvise calls that are the trouble there, but the fact that the
> kernel stops doing read-ahead when a call to posix_fadvise comes. I noticed
> the performance hit, and checked the kernel's code. It effectively
> changes the prediction mode from sequential to fadvise, negating the
> (assumed) kernel's prefetch logic.
>
I did not know that. Very interesting.


>
> I've mused about the possibility of batching async_io requests, and using
> the scatter/gather API instead of sending tons of requests to the
> kernel. I think doing so would enable a zero-copy path that could very
> possibly imply big speed improvements when memory bandwidth is the
> bottleneck.

I think you are totally correct on this point. If I recall correctly, the
glibc (librt) aio does have an lio_listio, but it is either a no-op
or just loops over the list, I forget which (I don't have its source right now),
but in any case I am sure there is potential for implementing such a facility.
To be really effective, though, it should be implemented in the kernel itself,
which we don't have today.
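
As a rough sketch of the interface being discussed (not code from either patch), here is how a batch of block reads can be submitted through POSIX lio_listio in a single call. The helper name read_blocks_batched and the BLOCK_SIZE and MAX_BATCH constants are hypothetical. This shows only the API shape; it says nothing about how a given libc services the list internally, and on older glibc it needs linking with -lrt.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for the POSIX AIO declarations */
#endif
#include <aio.h>
#include <string.h>
#include <sys/types.h>

#define BLOCK_SIZE 8192   /* assumed page size for this sketch */
#define MAX_BATCH  16     /* arbitrary cap on requests per batch */

/*
 * Hypothetical helper: read `nblocks` blocks, identified by block number,
 * into bufs[0..nblocks-1] with one lio_listio() submission.
 * Returns 0 if every block was read in full, -1 otherwise.
 */
static int
read_blocks_batched(int fd, const long *blocks, int nblocks,
                    char bufs[][BLOCK_SIZE])
{
    struct aiocb  cbs[MAX_BATCH];
    struct aiocb *list[MAX_BATCH];
    int           i;

    if (nblocks < 1 || nblocks > MAX_BATCH)
        return -1;

    /* Describe each read as one control block in the list. */
    for (i = 0; i < nblocks; i++)
    {
        memset(&cbs[i], 0, sizeof cbs[i]);
        cbs[i].aio_fildes = fd;
        cbs[i].aio_offset = (off_t) blocks[i] * BLOCK_SIZE;
        cbs[i].aio_buf = bufs[i];
        cbs[i].aio_nbytes = BLOCK_SIZE;
        cbs[i].aio_lio_opcode = LIO_READ;
        list[i] = &cbs[i];
    }

    /* LIO_WAIT: return only after all listed requests have completed. */
    if (lio_listio(LIO_WAIT, list, nblocks, NULL) != 0)
        return -1;

    /* Collect the per-request status and byte count. */
    for (i = 0; i < nblocks; i++)
    {
        if (aio_error(&cbs[i]) != 0 || aio_return(&cbs[i]) != BLOCK_SIZE)
            return -1;
    }
    return 0;
}
```

Using LIO_NOWAIT instead would let the caller continue and pick up completions later; LIO_WAIT is used here only to keep the sketch short.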

John
